Competition for Particle Detection in 3D Cryo-Electron Tomography

Develop an ML model for annotating subcellular structures and proteins in CryoET data

Status

Completed

Ended

February 5, 2025

Length

3 months

$75,000 in prizes

931 Teams / 6855 Entrants

27,971 Submissions

76 Countries

Impact

Read the paper on the phantom sample and reference annotations

Competition datasets 10445 and 10446 contain protein complexes that mimic cellular CryoET data, along with annotations from months of particle picking.

Calling all ML developers! Can you beat the competition winners with your annotations?

bioRxiv paper preview: 'Annotating CryoET Volumes: A Machine Learning Challenge'

10 Winning Teams

1st Place

Score: 0.78759

Daddies

Members: Christof Henkel, Eugene Khvedchenya

The 1st place solution employs an ensemble approach combining segmentation models (3D UNets with ResNet & B3 encoders) and object detection models (SegResNet and DynUnet backbones) from MONAI. Segmentation uses weighted CrossEntropy loss (256:1 positive:negative weighting), while detection implements a modified PP-Yolo loss with IoU-based point-point similarity metrics. Models are trained on 96×96×96 patches with inference on larger volumes, and both approaches are merged through a novel scaling technique that aligns feature map distributions before object detection post-processing. Performance is optimized by converting models to TensorRT, achieving a 200% speedup and enabling parallel inference on two T4 GPUs.
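To illustrate what a 256:1 positive:negative weighting does to a cross-entropy loss, here is a minimal sketch in plain Python. It is not the winning team's code: it assumes a binary (particle vs. background) simplification of their multi-class setup, and the function name `weighted_cross_entropy`, the normalization by the sum of weights, and the example values are all illustrative assumptions.

```python
import math


def weighted_cross_entropy(probs, labels, pos_weight=256.0, neg_weight=1.0):
    """Mean weighted binary cross-entropy over a flat list of voxels.

    probs  -- predicted foreground probabilities, each in (0, 1)
    labels -- 1 for particle voxels, 0 for background voxels
    The 256:1 default mirrors the ratio quoted in the text; dividing by
    the sum of weights is an assumption made for this sketch.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = pos_weight if y == 1 else neg_weight
        # Probability the model assigned to the true class of this voxel.
        p_true = p if y == 1 else 1.0 - p
        total += -w * math.log(max(p_true, eps))
        weight_sum += w
    return total / weight_sum


# One particle voxel and one background voxel.
missed_particle = weighted_cross_entropy([0.1, 0.1], [1, 0])
false_alarm = weighted_cross_entropy([0.9, 0.9], [1, 0])
print(missed_particle > false_alarm)
```

With this weighting, under-predicting a rare particle voxel is penalized far more heavily than over-predicting on background, which is the usual motivation for such a ratio when foreground voxels are scarce.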

2nd Place

Score: 0.78381

LuoZiqian&Lion

Members: Ziqian Luo, Shuo Wang

The 2nd place solution employs an ensemble of multiple lightweight segmentation models with parameter sizes ranging from 873K to 14.2M, including architectures such as UNet3D, VoxResNet, VoxHRNet, SegResNet, DenseVNet, and UNet2E3D. The models are trained using Tversky Loss, Dice Loss, and Cross-Entropy Loss with customized mask radii for each particle type, utilizing InstanceNorm3d and PReLU for enhanced training stability. After segmentation, particle centroids are computed using CC3D and filtered based on voxel count statistics, with an ensemble strategy involving averaging of 7 to 10 complementary models and test-time augmentation.
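The post-processing step described above (connected components, then centroids, then a voxel-count filter) can be sketched in plain Python as follows. This is an illustration only: the team used the CC3D library, whereas this sketch uses a hand-written flood fill with 6-connectivity, a single `min_voxels` threshold stands in for their voxel-count statistics, and the function name `component_centroids` is hypothetical.

```python
from collections import deque


def component_centroids(mask, min_voxels=1):
    """Return (z, y, x) centroids of 6-connected foreground components.

    mask       -- nested lists indexed as mask[z][y][x]; truthy = foreground
    min_voxels -- components with fewer voxels are discarded as noise
    """
    depth, height, width = len(mask), len(mask[0]), len(mask[0][0])
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    centroids = []
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if not mask[z][y][x] or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill collects one connected component.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                voxels = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    voxels.append((cz, cy, cx))
                    for dz, dy, dx in neighbours:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        inside = (0 <= nz < depth and 0 <= ny < height
                                  and 0 <= nx < width)
                        if inside and mask[nz][ny][nx] and (nz, ny, nx) not in seen:
                            seen.add((nz, ny, nx))
                            queue.append((nz, ny, nx))
                # Keep only components that are large enough to be a particle.
                if len(voxels) >= min_voxels:
                    n = len(voxels)
                    centroids.append(
                        tuple(sum(v[i] for v in voxels) / n for i in range(3))
                    )
    return centroids


# A single z-slice with a 2x2 blob and one isolated voxel.
toy_mask = [[
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]]
print(component_centroids(toy_mask, min_voxels=2))
```

In practice the threshold would be chosen per particle type, since the expected blob size depends on the mask radius used for that class.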

3rd Place

Score: 0.78351

ONCE UPON A MOON

Members: tangtang1999

The 3rd place solution implements an ensemble of 3D UNet models with ResNet101 backbone, trained using Cross Entropy loss on all seven particle classes including the non-scored beta-amylase. Training utilizes smaller input dimensions (64×128×128) while inference benefits from larger dimensions (64×256×256), coupled with Exponential Moving Average (EMA) with a decay of 0.995 for model stabilization. The final submission consists of a 4-fold average ensemble (from 7-fold cross-validation) with test-time augmentation including flips along x, y, z axes and 90-degree rotations in the x-y plane.
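The Exponential Moving Average mentioned above can be sketched as a simple update rule applied to the model weights after each training step. This is a minimal illustration, assuming weights are held as a flat list of floats; the function name `ema_update` and the toy values are hypothetical, and only the decay of 0.995 comes from the text.

```python
def ema_update(ema_weights, model_weights, decay=0.995):
    """Blend current model weights into the running average.

    new_ema = decay * old_ema + (1 - decay) * current_weight
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


# Toy run: the raw weight jitters around 1.0 while the EMA drifts smoothly.
ema = [0.0]
for step in range(2000):
    raw = [1.0 + (0.1 if step % 2 == 0 else -0.1)]
    ema = ema_update(ema, raw)
print(ema)
```

Because each step moves the average by only 0.5% of the gap to the current weights, step-to-step noise is damped, and the averaged weights are the ones typically used for validation and inference.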

4th Place

Score: 0.78306

yu4u & tattaka

Members: Yusuke Uchida, Takaaki Fukui

5th Place

Score: 0.78252

Youssef Ouertani

Members: Youssef Ouertani

6th Place

Score: 0.78022

tomoon33

Members: Tomoki Uchiyama

7th Place

Score: 0.77708

kobakos

Members: Koki Kobayashi

8th Place

Score: 0.77612

I Cryo Everyteim

Members: Sergio Alvarez da Silva Junior, Naoki Hashimoto, Sirapoab Chaikunsaeng, Sahil Barnwal

9th Place

Score: 0.77274

Avengers

Members: Koki Wada

10th Place

Score: 0.77263

Josef Slavicek

Members: Josef Slavicek

Competition Details

About the Competition

We held a competition for the development of machine learning algorithms to overcome a major bottleneck limiting biomedical discoveries—the annotation and analysis of high resolution 3D images from advanced imaging technologies.

Goal

Advance the understanding of cell biology through machine learning algorithms to annotate particles in 3D images of cells captured by cryoET.

The resulting algorithms were able to perform robust annotation of particles of variable shapes and sizes within the hundreds of 3D images in the competition dataset after being trained on a limited set of available reference annotations from the same dataset.

Competition Data

Competition Deposition Name:

CZII - 2024 CryoET Object Identification Challenge

Experimental and simulated training data for the CryoET Object Identification Challenge. Each dataset contains tilt series, alignments, tomograms and ground truth annotations for six protein complexes (Apo-ferritin, Beta-amylase, Beta-galactosidase, cytosolic ribosome (80S), thyroglobulin and VLP). Curation procedures are described in detail in the accompanying paper. Details on how the dataset was used in the competition are available on Kaggle.

Challenge Resources

To reduce the onboarding time for competitors, an extensive set of example notebooks covering several models was provided.

These notebooks also leverage the copick library for handling cryoET datasets, for which we provided PyTorch Datasets and utility functions to simplify the creation of data loaders, metadata tracking, and model performance analysis.

These example notebooks can be found in the GitHub repository CZII ML Challenge Notebooks.

In addition, the CryoET Data Portal provides multiple annotated datasets of protein complexes in situ which can be used to tune machine learning algorithms to the crowded nature of in situ samples.

What is CryoET?

Overview

Cryo-electron tomography (CryoET) is an imaging technique that enables 3D visualization of the cell at sub-nanometer resolution. Unlike other high-resolution imaging techniques, it images the sample in a cryogenic (frozen) condition that preserves cellular architecture, so the detailed view includes protein structures in their natural biological context. Three-dimensional tomograms are generated from many images of a thin slice through a cell, taken while tilting the specimen in multiple directions.

A given tomogram is typically only about 200 nanometers thick—approximately five hundred times thinner than a sheet of paper—yet packed with information about the structures of the cellular machinery driving health and disease.

For more information about CryoET basics, check out the educational articles from the CryoET Data Portal documentation site.

Glossary of Terms

80S ribosome

Ribosomes are the molecular machines that translate the genetic information from the intermediary mRNA templates into proteins. The 80S ribosome is specifically the eukaryotic ribosome, and it is abundant in cell lysate since translation of mRNA is a constant activity within the cell.

Apo-ferritin

A 24-subunit globular protein that binds and transports iron in all cells and tissues. Apoferritin refers to the iron-free form of the protein.

Annotating

The process of identifying proteins or membranes of interest in noisy 3D tomograms.
See also labeling and picking.

Competition Contributors

Contributors are listed alphabetically by last name:

David Agard
Richa Agarwal
Ashley Anderson
Jeremy Asuncion
Rodrigo Baltazar
Tristan Bepler
Nina Borja
Alister Burt
Bridget Carragher
Mikala Caton
Anchi Cheng
Chi-Li Chiu
Yongbaek Cho
Ellaine Chou
Bryan Chu
Charlie Dubbledam
Kira Evans
Kirsty Ewing
Jessica Gadling
Lorenzo Gaifas
Kyle Harrington
Matthias Haury
Utz Heinrich Ermel
Norbert Hill
Erin Hoops
Timmy Huang
Peng Jin
Ann E Jones
Saugat Kandel
Kandarp Khandwala
Robert Kiewisz
Dari Kimanius
Mykhailo Kopylov
Justine Larsen
Manuel Leonetti
Donghui Li
Emma Lundberg
Kristen Maitland
Gorica Margulis
Dannielle McCarthy
Elizabeth Montabana
Ben Nelson
Jun Xi Ni
Stephani Otte
Mohammadreza Paraan
Noeli Pazsoldan
Ariana Peck
Clinton Potter
Janeece Pourroy
Dana Sadgat
Simon Sander
Jonathan Schwartz
Daniel Serwas
Shu-Hsien Sheu
Hannah Siems
Trent Smith
Andrew Sweet
Shivanshi Vaid
Madhuri Vangipuram
Manasa Venkatakrishnan
Carmela Villegas
Thorsten Wagner
Eric Wang
Zun Shi Wang
Feng Wang
Samantha Yammine
Yue Yu
Zhuowen Zhao
Shawn Zheng
Ellen Zhong