
May 2024

Joined Harvard Kirchhausen Lab

Research · Completed

ML researcher at Harvard Medical School applying 3D vision transformers to cryo-electron tomography. The lab images subcellular structures at near-atomic resolution; my job was to make the resulting volumes interpretable at scale.

● What I shipped

  • Trained on multi-node DGX clusters: A100 / H100 GPUs, NVLink intra-node, InfiniBand inter-node, RAID + NVMe storage tier.
  • PyTorch FSDP with bf16 mixed precision and activation checkpointing to fit large 3D ViTs.
  • Diagnosed a rendezvous (RDZV) backend issue affecting InfiniBand multi-node training and filed PyTorch issue #144779.
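
For readers who have not used the FSDP setup named in the second bullet, here is a minimal sketch of the general pattern. It is not the lab's training code. `Block`, `bf16_policy`, `add_activation_checkpointing`, and `shard` are hypothetical names, the tiny transformer block only stands in for one 3D ViT layer, and the checkpoint wrapper is imported from a private PyTorch module path that may move between versions.

```python
import functools

import torch
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointImpl,
    apply_activation_checkpointing,
    checkpoint_wrapper,
)
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision
from torch.distributed.fsdp.wrap import ModuleWrapPolicy


class Block(nn.Module):
    """Hypothetical pre-norm transformer block standing in for one 3D ViT layer."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


def bf16_policy() -> MixedPrecision:
    # Parameters, gradient reduction, and buffers all run in bfloat16.
    return MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    )


def add_activation_checkpointing(model: nn.Module) -> nn.Module:
    # Wrap every Block so its activations are recomputed during backward
    # instead of being stored, trading compute for memory.
    wrapper = functools.partial(
        checkpoint_wrapper, checkpoint_impl=CheckpointImpl.NO_REENTRANT
    )
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=wrapper,
        check_fn=lambda m: isinstance(m, Block),
    )
    return model


def shard(model: nn.Module) -> nn.Module:
    # Assumes torch.distributed.init_process_group() has already run
    # (for example under torchrun) and a CUDA device is selected.
    sharded = FSDP(
        model,
        auto_wrap_policy=ModuleWrapPolicy({Block}),  # one FSDP unit per Block
        mixed_precision=bf16_policy(),
        device_id=torch.cuda.current_device(),
    )
    return add_activation_checkpointing(sharded)
```

In a real job, `shard(model)` would be called once per rank after process-group initialization. The checkpointing helper also works on its own on a single CPU process, which is convenient for checking that the wrapped model still runs forward and backward.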

● Stack

PyTorch · FSDP · CUDA · InfiniBand · NCCL · DGX · cryo-ET

● Links