Innovation in generative algorithms
Our team is experienced in developing generative algorithms and has contributed influential papers on both generative flow networks (GFlowNets) and conditional flow matching. We also recently released FoldFlow, a family of generative algorithms that extends the flow-matching paradigm to equivariant generation on SE(3), the group of 3D rigid-body motions.
FoldFlow-Base: The first simulation-free model over SE(3)
FoldFlow-OT: Accelerates our base model with Riemannian optimal transport
FoldFlow-SFM: Extends FoldFlow-OT to stochastic flows
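FoldFlow-OT is described above as using Riemannian optimal transport on SE(3). As a much simpler illustration of the underlying idea, here is a minimal sketch of minibatch optimal-transport pairing in ordinary Euclidean space, assuming a squared-distance cost: within a batch, source and target samples are re-paired to minimize the total transport cost, which tends to give straighter, less tangled paths for a flow to learn. This is not FoldFlow's implementation, and the names `ot_pairing` and `transport_cost` are hypothetical. Brute force over permutations only works for tiny batches; a real pipeline would use a Hungarian-style solver such as `scipy.optimize.linear_sum_assignment`.

```python
import itertools

import numpy as np


def transport_cost(x0, x1):
    """Mean squared Euclidean distance between paired rows of x0 and x1."""
    return float(((x0 - x1) ** 2).sum(axis=1).mean())


def ot_pairing(x0, x1):
    """Exact minibatch OT pairing under a squared-Euclidean cost.

    Brute-forces all permutations, so it is only practical for tiny batches.
    Returns `perm` such that x1[perm][i] is the target matched to x0[i].
    """
    n = len(x0)
    # cost[i, j] = ||x0[i] - x1[j]||^2
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)
    rows = np.arange(n)
    best = min(itertools.permutations(range(n)),
               key=lambda p: cost[rows, list(p)].sum())
    return np.array(best)


# Toy batch: sources on the line y = 0, targets on y = 1, listed in reverse order.
x0 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
x1 = np.array([[3.0, 1.0], [2.0, 1.0], [1.0, 1.0], [0.0, 1.0]])

perm = ot_pairing(x0, x1)
print("pairing by index:", transport_cost(x0, x1))        # 6.0
print("OT pairing:      ", transport_cost(x0, x1[perm]))  # 1.0
print("perm:", perm.tolist())                              # [3, 2, 1, 0]
```

Pairing by index sends each source diagonally across the batch, while the OT pairing moves each point straight up, cutting the mean squared cost from 6.0 to 1.0.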
This family of algorithms is more stable and faster to train than diffusion-based models. Whereas diffusion-based models start from an unstructured prior (Gaussian noise), our models can start from any source distribution, including a known protein conformation, which helps us capture protein dynamics.
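To make "simulation-free" and "any source distribution" concrete, here is a minimal sketch of conditional flow matching in plain Euclidean space, assuming straight-line paths x_t = (1 - t) x0 + t x1 and a toy linear velocity model fitted by least squares. Training only regresses the velocity target x1 - x0 at random times t, with no ODE simulation; the ODE is integrated only at generation time. The source samples are deliberately non-Gaussian, clustered around a made-up "known" structure. Everything here (function names, the linear model, the data) is hypothetical and far simpler than FoldFlow, which works on SE(3) with neural networks.

```python
import numpy as np


def cfm_batch(x0, x1, rng):
    """Build a conditional flow matching regression batch from paired samples.

    Straight-line path: x_t = (1 - t) * x0 + t * x1, whose velocity is x1 - x0.
    No ODE is simulated here; training is plain regression on these triples.
    """
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0
    return xt, t, ut


def features(x, t):
    """Inputs to a hypothetical linear velocity model v(x, t) = [x, t, 1] @ W."""
    return np.hstack([x, t, np.ones_like(t)])


def fit_velocity(xt, t, ut):
    """Minimize the CFM squared-error loss for the linear model (least squares)."""
    W, *_ = np.linalg.lstsq(features(xt, t), ut, rcond=None)
    return W


def euler_sample(W, x0, n_steps=50):
    """Generate by integrating dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * (features(x, t) @ W)
    return x


rng = np.random.default_rng(0)

# Non-Gaussian source: points uniformly scattered around a made-up "known"
# structure (an illustrative stand-in for a known conformation).
known = np.array([1.0, -2.0, 0.5])
x0 = known + rng.uniform(-0.1, 0.1, size=(256, 3))

# Hypothetical target: the same points after a fixed displacement.
shift = np.array([0.3, 0.0, -0.2])
x1 = x0 + shift

xt, t, ut = cfm_batch(x0, x1, rng)
W = fit_velocity(xt, t, ut)
samples = euler_sample(W, x0)
print("max endpoint error:", np.abs(samples - x1).max())
```

Because every pair in this toy shares the same displacement, the true velocity field is constant and the linear model recovers it almost exactly. Real protein backbones need a learned network and, in FoldFlow's case, paths defined on SE(3) rather than straight lines in R^3.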
Generate the right data at scale
We work closely with established biotechnology companies. By combining their strengths in massively scalable synthetic biology with our generative modelling, we’ve built a solution that brings together the best technology in both fields. In each round of the binding assay, we test hundreds of thousands of completely novel protein designs for properties such as binding affinity and specificity. A follow-up, massively parallel functional assay then measures how the same set of proteins affects signalling in individual living cells. We derive structural insights by applying machine reasoning to 3D structures, and we run the assays as a continuous process so that each design iteration improves our generative algorithms.
Massive-scale computation
We run machine learning workloads across 1000+ GPUs. Our core library is built on open-source Ray, and we collaborate with Anyscale on parallelizing our workflows. We also collaborate with Nvidia, Amazon Web Services (AWS), and Google Cloud Platform (GCP), and have received research credits through these collaborations.
Patient-centric generative design
The drug development path is full of forks in the road, most of which lead nowhere, so each approved medicine carries the cost of hundreds of failures. Finding the right path is a problem of modelling how a drug will behave inside the human body. If we had a model that could predict ahead of time which protein designs will work, bringing a drug to market could cost a fraction of the current $2B. We believe the solution is to test millions of protein designs on patient-derived samples and learn a phenotypic model from the results. Methods such as microfluidics make this possible while accounting for tumour and patient heterogeneity. Eventually, generating the right drug for a patient and choosing the right patient for the drug (that is, a set of biomarkers for selecting clinical-trial patients) will merge into a single machine-learning-assisted process.