Predicting Genotype-Specific Synergistic Combinations
(Before jumping in, I want to give a shout-out to Coryandar Gilvary, a current graduate student at Weill Cornell. She led this work, was lead author on the method, and is one of the best scientists working on machine learning for drug development I've met. This post relates to work done at Weill Cornell University, where OneThree’s core technology was developed.)
One area that's always interested us at OneThree is the idea of genotype-specific drug combinations. Overall, drug combinations are one of the most exciting ways drug development is evolving. As we improve our understanding of the complexity of certain diseases, drug combinations offer a way to target a disease in multiple different ways and potentially limit the degree of acquired drug resistance. Additionally, right around the time we were brainstorming, the NCI released its ALMANAC database, which contains combinations of over 100 single drugs, each tested on 60 cancer cell lines, along with the degree of drug synergy reported in each cell line.
When we dived into this database we observed something pretty interesting: the majority of synergistic drug combinations were context-specific. What this meant was that it was important to define drug synergy in terms of a particular genotype or cancer subtype rather than broadly. This makes experimental discovery of drug combinations very difficult, because not only do you have to test many combinations (4,950 unique pairs for just 100 single drugs), but you also have to test each one against multiple different cell lines to pinpoint this context-specificity. However, this is a perfect opportunity for AI to make an impact: if we can build a system that integrates sample information to predict synergistic combinations, it could address some of the issues in screening thousands of combinations across many different samples.
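To put a rough number on that screening burden, here is a back-of-the-envelope sketch. The function name is made up for illustration, and it assumes one experiment per drug pair per cell line, ignoring dose levels and replicates (which would only make things worse). For 100 drugs there are 100 × 99 = 9,900 ordered pairs, which is 4,950 pairs once you ignore order.

```python
from math import comb


def screening_burden(n_drugs: int, n_cell_lines: int) -> tuple[int, int]:
    """Return (unordered drug pairs, pair-by-cell-line experiments).

    Illustrative assumption: a single experiment per pair per cell line,
    with no dose matrix and no replicates.
    """
    pairs = comb(n_drugs, 2)  # n * (n - 1) / 2 unordered pairs
    return pairs, pairs * n_cell_lines


pairs, experiments = screening_burden(100, 60)
print(pairs, experiments)  # 4950 297000
```

Even under these generous assumptions, a 100-drug panel across 60 cell lines is close to 300,000 experiments, which is why a predictive shortlist is so attractive.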
To accomplish this, we built a model that computes multiple different similarity metrics for each pair of drugs in a potential combination. These similarity metrics fall into 4 main categories: similarities based on the drug's structure, its targets, its effects, and cell line perturbations. For instance, a similarity based on how two drugs' targets affect metabolic pathways would fall into the target-based category. We then used a multi-task learning model to integrate these similarities and predict the degree of synergy in each cell line for each combination of drugs.
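To make the feature idea a bit more concrete, here is a minimal sketch of turning a drug pair into one similarity score per category. Everything in it is an assumption for illustration: the Jaccard (Tanimoto-style) overlap stands in for whatever metrics the real method uses, and the drug records, feature names, and target labels are invented placeholders rather than real data.

```python
# The four categories described above; the set-based encoding is an assumption.
CATEGORIES = ("structure", "targets", "effects", "perturbations")


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items divided by all items in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(drug_a: dict, drug_b: dict) -> list[float]:
    """One similarity score per category for a candidate drug pair."""
    return [jaccard(drug_a[c], drug_b[c]) for c in CATEGORIES]


# Hypothetical drugs described by sets of made-up annotations.
drug_a = {
    "structure": {"frag1", "frag2", "frag3"},
    "targets": {"T1"},
    "effects": {"effect1"},
    "perturbations": {"geneA", "geneB"},
}
drug_b = {
    "structure": {"frag2", "frag3", "frag4"},
    "targets": {"T1", "T2"},
    "effects": {"effect2"},
    "perturbations": {"geneA", "geneB"},
}

features = pair_features(drug_a, drug_b)
print(features)  # [0.5, 0.5, 0.0, 1.0]
```

In a multi-task setup, a vector like this would be the shared input for a pair, and the model would produce one synergy prediction per cell line, so what is learned for one cell line can inform the others.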
When we tested this method on held-out test sets from the NCI ALMANAC database, we observed a dramatic increase in accuracy (AUROC = 0.92) compared to traditional methods for predicting combination synergy. We also validated our data integration hypothesis by showing that the model achieved its highest overall AUROC when combining the multiple types of similarities.
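For anyone unfamiliar with the metric, AUROC is the probability that a randomly chosen synergistic combination gets a higher predicted score than a randomly chosen non-synergistic one. The sketch below computes it directly from that definition. The labels and scores are toy values made up for illustration, not results from our model.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. This is the pairwise
    (Mann-Whitney) formulation, fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy held-out set: 1 = synergistic, 0 = not synergistic (invented values).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(auroc(toy_labels, toy_scores), 3))  # 0.889
```

A score of 0.5 means the ranking is no better than chance and 1.0 means every synergistic combination is ranked above every non-synergistic one.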
Since validating the method computationally, we've identified some novel predicted combinations that we're in the process of validating experimentally, and I'll update this post as soon as we get those results!
Till next time!