The release of Tahoe-100, the world’s largest single-cell drug-perturbation dataset, marks a pivotal shift toward building virtual cell models: computational models that simulate how cells respond to drugs and genetic edits. With 100 million single-cell data points spanning 50 cancer models and 1,200 drug treatments, the resource lets machine learning models move beyond descriptive correlation toward a causal understanding of cellular dynamics, because the data pair each observed cell state with a deliberate intervention. Unlike traditional hypothesis-driven research, this large-scale, unbiased data generation allows AI to identify novel drug targets and predict therapeutic outcomes more accurately. Combined with the Arc Institute’s scBaseCamp, a 230-million-cell observational dataset, it forms a robust foundation for systems biology. The goal is to reduce the high failure rate of clinical trials by replacing slow, manual experimentation with scalable in silico simulations that capture the complex, context-dependent nature of human disease.
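To make the scale more concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes, purely for illustration, a full factorial design (every drug treatment applied to every cancer model) with cells spread evenly across conditions; the actual experimental layout may differ, and the helper names are hypothetical.

```python
# Rough scale estimate for Tahoe-100, using the figures quoted in the text.
# ASSUMPTION (illustrative only): every treatment is applied to every cancer
# model and cells are distributed evenly; the real design may differ.
TOTAL_CELLS = 100_000_000
CANCER_MODELS = 50
DRUG_TREATMENTS = 1_200


def conditions(models: int, treatments: int) -> int:
    """Number of model x treatment combinations in a full factorial design."""
    return models * treatments


def mean_cells_per_condition(total_cells: int, models: int, treatments: int) -> float:
    """Average number of cells per combination under even allocation."""
    return total_cells / conditions(models, treatments)


if __name__ == "__main__":
    n = conditions(CANCER_MODELS, DRUG_TREATMENTS)
    avg = mean_cells_per_condition(TOTAL_CELLS, CANCER_MODELS, DRUG_TREATMENTS)
    print(f"{n:,} conditions, ~{avg:,.0f} cells each")
```

Under these assumptions the script prints `60,000 conditions, ~1,667 cells each`. That is, each drug-by-model combination would be observed in on the order of a thousand or more individual cells, the kind of per-condition replication that lets a model learn response distributions rather than single averaged profiles.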