PubChem is a gold mine for chemists and biologists on the hunt for new drugs. As the world’s largest public database of small-molecule screening data, it warehouses thousands of experiments on hundreds of thousands of compounds, amounting to millions of data values.
Yet the value of this repository of chemical compounds, and of their potential as therapeutic agents, has been limited because researchers cannot easily search or compare its voluminous data sets or integrate them with other data sources.
The work of Stephan Schürer, Ph.D., and Vance Lemmon, Ph.D., powered by a federal stimulus grant, is now enabling scientists to access and analyze complex biological data sets in just minutes.
Just 18 months after receiving a $1.5 million National Institutes of Health stimulus grant, Stephan Schürer, Ph.D., research assistant professor of molecular and cellular pharmacology, and Vance Lemmon, Ph.D., professor of neurological surgery and the Walter G. Ross Distinguished Chair in Developmental Neuroscience, together with their team of programmers and computer scientists, have developed and released an ontology for bioassays and accompanying software. The tools enable scientists to retrieve, analyze, and compare PubChem’s diverse biological data sets in minutes.
A controlled vocabulary that enables computers to decipher complex concepts and relationships, the BioAssay Ontology resolves the key problem the Schürer-Lemmon team set out to tackle: the lack of standardized entries in PubChem and other chemical-compound databases. When researchers upload screening results to PubChem, they do so without annotations or with ad hoc ones, making it impossible for a computer to search the assays or answer complex queries about them.
“Humans know my mother’s mother is my grandmother, but unless I introduce the properties of relationships, the computer just knows grandmother as another word,” Schürer explains. “So in addition to terminology, we’re giving the computer basic knowledge of how assays are related. That way it can answer more interesting questions and identify other potentially relevant information. All we need is for people to use the terms.”
How widely the ontology will be used remains to be seen, but there’s no doubt the Schürer-Lemmon team has already met the stimulus grant’s goal of producing a high-impact breakthrough quickly. Researchers now have the tools to rapidly identify compounds that may target a specific disease, accelerating the first phase of the cumbersome drug design process.