
May 2024

Joined Harvard Kirchhausen Lab

Research · Completed

ML researcher at Harvard Medical School, applying 3D vision transformers (3D ViTs) to lattice light-sheet microscopy (LLSM). The lab images live-cell subcellular dynamics at ~3 nm; my job was to make the resulting 4D volumes (3D over time) interpretable at scale.

● What I shipped

  • Trained on multi-node DGX clusters (A100 / H100 GPUs) with NVLink within a node, InfiniBand between nodes, and a RAID + NVMe storage tier.
  • Used PyTorch DDP with bf16 mixed precision and activation checkpointing to fit large 3D ViTs in GPU memory.
  • Diagnosed a Rendezvous (RDZV) backend issue affecting InfiniBand multi-node training and filed it as PyTorch issue #144779.
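
To give a feel for why bf16 and activation checkpointing matter for 3D ViTs, here is a back-of-the-envelope sketch of per-sample activation memory. It is an illustration only, not the lab's model or measured numbers: the function names (`num_tokens`, `activation_bytes`), the `tensors_per_block` constant, and the example volume and model sizes are all hypothetical, and the estimate assumes attention scores are materialized (no fused attention kernel) and that every saved activation is stored at the same width.

```python
from math import prod


def num_tokens(volume_shape, patch):
    """Tokens from cutting a (D, H, W) volume into non-overlapping cubic patches."""
    if any(s % patch for s in volume_shape):
        raise ValueError("each volume dimension must be divisible by the patch size")
    return prod(s // patch for s in volume_shape)


def activation_bytes(tokens, dim, depth, heads, bytes_per_el, checkpointed,
                     tensors_per_block=16):
    """Rough per-sample activation memory kept for the backward pass.

    Assumptions (illustrative, not measured):
      * each block saves `tensors_per_block` tensors of shape (tokens, dim);
      * attention scores of shape (heads, tokens, tokens) are saved once per block;
      * with checkpointing, only each block's input is stored, plus one block's
        full activations while that block is being recomputed.
    """
    per_block = (tensors_per_block * tokens * dim + heads * tokens * tokens) * bytes_per_el
    if checkpointed:
        return depth * tokens * dim * bytes_per_el + per_block
    return depth * per_block


if __name__ == "__main__":
    GIB = 2 ** 30
    # Hypothetical example: a 128^3 volume, 8^3 patches, ViT-Base-like sizes.
    n = num_tokens((128, 128, 128), patch=8)
    cases = [
        ("fp32", 4, False),
        ("bf16", 2, False),
        ("bf16 + checkpointing", 2, True),
    ]
    print(f"tokens per volume: {n}")
    for label, width, ckpt in cases:
        size = activation_bytes(n, dim=768, depth=12, heads=12,
                                bytes_per_el=width, checkpointed=ckpt)
        print(f"{label}: {size / GIB:.3f} GiB per sample")
```

Under these assumptions the example volume yields 4,096 tokens, and the estimate falls from about 11.25 GiB (fp32) to about 5.6 GiB (bf16) to roughly 0.54 GiB (bf16 with checkpointing) per sample. Checkpointing pays for that with one extra forward pass per block during the backward pass.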

● Stack

PyTorch · DDP · CUDA · InfiniBand · NCCL · DGX · LLSM · 3D ViT

● Links