2025

SpatialDINO - 3D self-supervised vision transformer for LLSM

Research · Completed

Designed and trained a 3D self-supervised vision transformer for label-free segmentation and tracking of subcellular dynamics in lattice light-sheet microscopy (LLSM), pre-trained on 2.4 TB / 180k volumes across 24 A100 GPUs. On downstream evaluation, it outperformed a prior approach co-led by Nobel laureate Eric Betzig.

● What I shipped

Adapted DINO-style self-supervised self-distillation to 3D: student/teacher ViTs over LLSM volumes with native 3D iBOT block masking.
Introduced KMeans content-aware 3D cropping, no-positional-encoding 3D ViTs (NoPE), and a 3D adaptation of SINDER for singular-defect repair.
Built a streaming encoder with token-store + online softmax for full-volume inference at million-token sequence lengths.
Beat the prior SOTA — including the Nobel-laureate-led approach — on downstream subcellular structure prediction.
Released as a first-author bioRxiv preprint; engineering log at /writing/spatialdino-lessons.
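
The "online softmax" used in the streaming encoder can be illustrated with a small sketch. This is not the SpatialDINO implementation: it uses NumPy rather than PyTorch or Triton, and `streaming_attention` and the chunked key/value iterator are hypothetical names standing in for reads from a token store. It shows only the general idea, which is that attention over a very long sequence can be accumulated chunk by chunk from a running maximum and a running denominator, without ever holding the full score matrix.

```python
import numpy as np

def streaming_attention(q, kv_chunks):
    """Attention of queries q (n_q, d) over keys/values delivered in chunks.

    kv_chunks yields (k, v) pairs shaped (n_chunk, d) and (n_chunk, d_v).
    Assumes at least one non-empty chunk. Only running statistics are kept,
    so the full (n_q, n_tokens) score matrix is never materialised.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    running_max = np.full((q.shape[0], 1), -np.inf)  # largest score seen so far
    running_sum = np.zeros((q.shape[0], 1))          # softmax denominator
    acc = None                                       # unnormalised weighted sum

    for k, v in kv_chunks:
        scores = (q @ k.T) * scale                   # (n_q, n_chunk)
        new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
        # Rescale previously accumulated statistics to the new maximum.
        correction = np.exp(running_max - new_max)
        weights = np.exp(scores - new_max)
        if acc is None:
            acc = np.zeros((q.shape[0], v.shape[-1]))
        running_sum = running_sum * correction + weights.sum(axis=-1, keepdims=True)
        acc = acc * correction + weights @ v
        running_max = new_max

    return acc / running_sum
```

Because each chunk only updates three small arrays per query, peak memory scales with the chunk size rather than with the total token count, which is what makes million-token sequences tractable in principle.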

● Stack

PyTorch · DINO · 3D ViT · Self-Supervised Learning · LLSM · DDP · Triton
