“Present chemical data storage methodologies place many restrictions on the use of stored data. The absence of sufficient high-quality metadata prevents intelligent computer access to data without human intervention. This creates barriers to the automation of data mining in activities such as quantitative structure−activity relationship modelling. The application of Semantic Web technologies to chemical data is shown to reduce these limitations.” – ‘Bringing Chemical Data onto the Semantic Web’ by Taylor, Gledhill, Essex, Frey, Harris and De Roure – Journal of Chemical Information and Modeling
Chemistry on the Semantic Web
While the Semantic Web has been very popular in general Life Science contexts (see, for example, information about SWAT4LS), it has so far not been an active field of development for chemistry in general, or for computational chemistry in particular. Nonetheless, a few specific developments are important for understanding the current status of research in this area. One of the most important achievements is the creation of the ChEBI ontology (Chemical Entities of Biological Interest). ChEBI is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds, and it is an OBO-compliant ontology. The Open Biomedical Ontologies (OBO) is an umbrella organization for ontologies and structured, shared, and controlled vocabularies for use across all biological and biomedical domains. Before ChEBI, the Gene Ontology (GO) was already a high-profile and very well-known ontology. It facilitates describing the action of gene products in a biological context, and it is run by the Gene Ontology Consortium.
Another interesting achievement is the ChemINF ontology. The ChemINF ontology deals with chemical information entities and is focused on data-driven research and integration of calculated properties (descriptors) of chemical entities within the Semantic Web context.
From the general molecular data perspective, ChemSpider, the large database run by the Royal Society of Chemistry, is a very important resource. ChemSpider describes itself as “a free chemical structure database providing fast text and structure search access to over 34 million structures from hundreds of data sources”, and it is also an ideal integration, search and sharing platform for chemistry, one of critical importance for the field.
From the identification and addressing perspective, the application of Semantic Web technologies to chemistry can benefit from the use of the IUPAC International Chemical Identifier (InChI). Developed between 2000 and 2004, InChI can be used to create unique identifiers (URIs) for chemical substances.
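As a rough illustration of the idea, the sketch below turns an InChI string into a URI by percent-encoding it and appending it to a base namespace. The namespace http://example.org/inchi/ and the helper names inchi_to_uri and uri_to_inchi are assumptions made up for this example; they are not part of the InChI standard or of any particular service, and a real deployment would need to agree on its own namespace.

```python
from urllib.parse import quote, unquote

# Hypothetical namespace, chosen only for illustration.
BASE = "http://example.org/inchi/"


def inchi_to_uri(inchi: str, base: str = BASE) -> str:
    """Build a URI by percent-encoding the whole InChI string."""
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    # safe="" also encodes '/', which InChI uses to separate its layers.
    return base + quote(inchi, safe="")


def uri_to_inchi(uri: str, base: str = BASE) -> str:
    """Recover the original InChI string from a URI built above."""
    if not uri.startswith(base):
        raise ValueError("URI is not in the expected namespace")
    return unquote(uri[len(base):])


water = "InChI=1S/H2O/h1H2"  # standard InChI for water
uri = inchi_to_uri(water)
print(uri)
print(uri_to_inchi(uri) == water)
```

Because the encoding is reversible, two data sets that describe the same substance with the same InChI would arrive at the same URI, which is what makes such identifiers useful for linking data.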
Finally, there is no doubt that the first steps toward applying the Semantic Web to chemistry were made a long time ago by the UK-based scientists Peter Murray-Rust and Henry Rzepa, who created CML – Chemical Markup Language. You can read about this interesting development in Peter Murray-Rust’s article in Nature.