If scientists run enough variations of a lab experiment, perhaps AI models could, someday, predict the results of future versions of that study.
A new Seattle-based startup called Synthesize Bio believes it may be able to achieve that sooner rather than later, starting with gene-expression studies. Synthesize has already generated some promising results in predicting these types of lab experiments, according to a preprint published Tuesday.
Synthesize will officially launch on Wednesday, its co-CEOs, Jeff Leek and Robert Bradley, exclusively told Endpoints News, as it releases its AI model, called GEM-1. The 15-employee biotech was founded in 2023 and raised a previously undisclosed $10 million seed round in October 2023, led by Madrona Ventures. Other investors include Sahsen Ventures, AI2 Incubator, Inner Loop Capital, and MD Venture Investments.
Leek and Bradley came up with the idea behind Synthesize in the hallways of the Fred Hutchinson Cancer Center. Leek is the cancer center’s chief data officer, and Bradley is the scientific director of a translational data science research center.
They wanted to see if biotech could follow the trajectory that companies like OpenAI and Anthropic took in building large language models trained on nearly all of the internet’s data.
“We felt like this is an idea where we believe it can transform how scientists go about their work,” Bradley said in an interview.
The Fred Hutch duo built GEM-1 over the last two years. The model is trained on a large and varied collection of gene-expression experiments that they curated and annotated from public datasets. Synthesize is already at work on the next-generation version of the model, called GEM-2.
“This is the worst version of the GEM model you will ever see,” Leek said. “It’s only going to get better over time.”
Gene-expression experiments and single-cell studies have been a hot space within the AI bio field. Large organizations, like the Chan Zuckerberg Initiative and the Arc Institute, are championing moonshot projects that view these types of experiments as key building blocks in eventually making models of virtual cells. Virtual cells could allow more experiments traditionally done in the lab to be accurately simulated by computers. Other startups, like Noetik, Tahoe Therapeutics, and Xaira Therapeutics, are also taking their own twists on building large or rich datasets using some of these experiments.
Synthesize believes its model can tackle a wider range of gene-expression experiments than the rest.
To test that belief, the team compared GEM-1’s predictions with real-world results. They set a cut-off date of mid-2024, trained the model on experimental data added to public databases before that date, and then used experiments added after that date as tests.
Overall, Synthesize’s team found GEM-1 “yielded accuracy comparable to the best-possible performance estimated by comparing the results of matched lab experiments,” according to the preprint.
The preprint also found that about 95% of these new experiments were similar to those in GEM-1’s training data, meaning the model had already seen experiments that studied similar types of cells or similar perturbations.
GEM-1’s predictions varied in accuracy. The model nailed the effect on gene expression of certain molecules, like the hair-loss drug finasteride and an alpha blocker called doxazosin. But predictions for other compounds, like the HIV treatment dolutegravir or the cancer drug afatinib, were less than 50% accurate, according to the preprint.
One area of active research is confidence intervals, which could give users a sense of the quality of a prediction. Bradley said the broader AI field is also working on this, as it’s an important problem. Such estimates could be critical in telling the finasteride and afatinib predictions apart.
“Our goal is not to replace experiments,” Bradley said. “Our goal is to have a seamless blend of wet-lab work and AI experimentation. Experiments are critical. They’re always going to be the gold standard.”
Instead, GEM-1 could allow scientists on finite budgets to consider more data for experiments they may not have the time or resources to run. It could also enable scientists to get a sense of results on experiments that are impossible or impractical today, Bradley said. For instance, the model could help predict a drug’s activity in tissue where you can’t take a patient’s biopsy, he said.
Leek said there are blind spots, where GEM-1’s performance is wanting. That’s why they are eager to release the model openly, including free access on its website, to gain a sense of how researchers use it and its real-world performance.