Decoding Specific Biology
AI/ML isn't a magic bullet, but when focused on decoding specific biological questions, it can quickly generate new insights and hypotheses that might otherwise be missed. That's the approach we take at OneThree.
Each of our technologies addresses a specific biological question, such as binding to a given target class, gene essentiality in specific genotypes, or target- and structure-derived toxicity. We begin by working with experts to fully understand the problem and the complexities that scientists must consider when tackling it today. We then design the machine learning models best suited to that question. For instance, if the problem concerns signaling pathways, we may use Bayesian networks or label propagation algorithms, which can capture the underlying pathway structure; if the problem instead involves a large amount of image-like information (such as chemical structures), we may rely on a neural network/deep learning architecture.
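To make the pathway case concrete, here is a minimal sketch of label propagation over a toy signaling network. The graph, seed gene, and parameters are illustrative only, not drawn from our platform:

```python
# Minimal label propagation sketch: diffuse a seed score over a toy
# gene network. All nodes, edges, and parameters are made up.
import numpy as np

def propagate_labels(adj, seed, alpha=0.85, iters=50):
    """Iterate f <- alpha * W_norm @ f + (1 - alpha) * seed."""
    deg = adj.sum(axis=1, keepdims=True)
    w = adj / np.maximum(deg, 1)  # row-normalized adjacency
    f = seed.astype(float)
    for _ in range(iters):
        f = alpha * (w @ f) + (1 - alpha) * seed
    return f

# Toy pathway: a 0-1-2-3 chain plus a branch 1-4; gene 0 is a known hit.
adj = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
], dtype=float)
seed = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

scores = propagate_labels(adj, seed)
# Genes closer to the seeded hit in the network receive higher scores.
```

The appeal of this family of methods is that the graph structure itself (the pathway wiring) shapes the prediction, so a high score is directly interpretable as network proximity to known biology.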
Once we have built our model, we go back to the experimental team to thoroughly evaluate how the hypotheses and insights derived from the platform could impact the development process. This step often involves a feedback loop in which we collect new data to test and improve the platform. This close partnership between the experimentalists and the machine learning scientists is fundamental to our approach.
This not only allows us to better design and test new systems; it also ensures that the output of any predictive algorithm can be used to answer direct mechanistic questions and uncover new biology. It's our way of unboxing the AI "black box."
Some of the biggest breakthroughs in biology came from scientists considering all the data available to them and drawing insights across it, yet most existing computational approaches were built and optimized for only a limited number of data types. We spent years working in biology and chemistry labs and knew that each type of data is only one piece of a much bigger puzzle, and we brought this mentality into OneThree.
Before we begin a new project, we take the time to understand all the data types out there and how they could be integrated together. We then build machine learning models that are specifically designed to handle these different types of data. Some examples:
- When we predict compound toxicity, we integrate 100+ features from 12 different data types, including compound and protein structures, genetic interaction networks, CRISPR and shRNA screens, and tissue-specific genomic profiles. This layered data integration gives us a better understanding of the underlying cause of any predicted adverse event.
- When identifying synthetic lethal (SL) genes, we first mined journal articles and published knockdown screens as the foundation for our machine learning models. We then integrated genomic data from cancer patients and cell lines, known metabolic pathways, data from model organisms, and 30+ gene-level features to predict new SL pairs. This approach yielded much higher accuracy and a greater number of predicted SL pairs that could be exploited therapeutically.
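As a rough illustration of what "layering" data types means in practice, the sketch below standardizes per-data-type feature blocks and concatenates them into a single model-ready matrix. The block names, dimensions, and values are invented for the example:

```python
# Sketch: combine feature blocks from different data types into one matrix.
# Block names and sizes are hypothetical, not our actual feature set.
import numpy as np

def zscore(block):
    """Standardize each column so data types on different scales mix fairly."""
    return (block - block.mean(axis=0)) / block.std(axis=0)

rng = np.random.default_rng(0)
n_compounds = 4
blocks = {
    "chemical_structure": rng.normal(size=(n_compounds, 8)),  # e.g. fingerprints
    "protein_targets":    rng.normal(size=(n_compounds, 5)),  # e.g. binding scores
    "genomic_profile":    rng.normal(size=(n_compounds, 6)),  # e.g. expression
}

# One row per compound; 8 + 5 + 6 = 19 columns after layering.
X = np.hstack([zscore(b) for b in blocks.values()])
```

A downstream classifier trained on `X` can then weigh evidence from every data type at once, which is the intuition behind the accuracy gains we see from layering.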
In these and other predictive tasks, we've seen the power of this approach. In fact, across the board we see up to a 20% increase in accuracy as we layer on more data types.
Additionally, we've formed data generation partnerships with Weill Cornell and other companies and institutes to ensure we always have a stream of new, high-quality data feeding into our platform.
Looking Beyond One Part of the Pipeline
It's just as important to know whether a drug can treat brain cancer as it is to know whether that drug will cause severe side effects when it binds proteins in the liver, or whether the compound can even cross the blood-brain barrier in the first place. Over 95% of compounds that enter preclinical development fail before ever reaching patients, often because issues like these are overlooked in early discovery. For this reason, we built our platform to provide insight across multiple parts of the drug development pipeline. Since 2013, we've built 10 different algorithms into our platform to provide insights across steps like target selection, compound mechanisms, toxicity and metabolism, and patient biomarkers at the beginning of the development process. Not only does this provide greater mechanistic insight, it also helps ensure a higher success rate for drugs entering preclinical development.
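One way to picture this multi-step view is as a set of predictive checks a compound must clear together, rather than efficacy alone. The sketch below is purely hypothetical; the predictor names, scores, and thresholds are made up for illustration:

```python
# Hypothetical sketch: screen a compound against several early-development
# criteria at once. Predictors and thresholds are illustrative only.
def passes_early_checks(compound, predictors, thresholds):
    """Return True only if every predicted property clears its threshold."""
    return all(pred(compound) >= thresholds[name]
               for name, pred in predictors.items())

predictors = {
    "efficacy":     lambda c: c["efficacy_score"],
    "safety":       lambda c: 1.0 - c["liver_tox_score"],
    "bbb_crossing": lambda c: c["bbb_permeability"],
}
thresholds = {"efficacy": 0.7, "safety": 0.8, "bbb_crossing": 0.5}

candidate = {"efficacy_score": 0.9, "liver_tox_score": 0.1,
             "bbb_permeability": 0.6}
ok = passes_early_checks(candidate, predictors, thresholds)
```

Screening on all criteria up front, instead of optimizing efficacy first and discovering a toxicity or permeability problem years later, is the design choice this section describes.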