Chemical Semantics, Inc. has released its first web portal (www.chemsem.com), where users can publish results and search for results published by themselves or others. The description here is sure to change but indicates some of the current and expected capabilities of the portal.
The portal
There are two ways to publish on the portal. The first is through various software packages that produce computational chemistry results. The number of these will expand as time progresses. The first example, used to develop the basic ideas, is HyperChem, Release 9. The second way to publish is to prepare a CSX file that defines the publication, the molecular system and the calculations, and then upload that file to the portal, where it is translated to a Turtle (*.ttl) file and the data placed on the semantic web. One might also upload an output file of a computational package and have that output parsed and put into CSX form first.
Publish Button
The first publication procedure, which uses HyperChem 9 or later, will not only publish HyperChem calculations but any third-party calculations that HyperChem has imported such as those from Gamess, Gaussian, Mopac, etc.
For example, HyperChem can parse a Gamess output file. The following screen shot shows three Gamess output files containing results for a single point ab initio SCF calculation, a geometry optimization of the structure and a vibrational analysis calculation. Once imported into HyperChem 9, these results can be published just as if they were computed by HyperChem.
While parsing output files is certainly possible, Chemical Semantics, Inc. expects to work with developers of these computational chemistry packages to help them install their own “Publish Button”.
In addition to the title of the publication, the authors, their organizations and e-mail, and the publication abstract, pushing the Publish Button (which initially creates a CSX file) adds a number of other things to the publication. The Login Data… Button allows entering data so that the Publisher Package can use a login ID and password to directly publish results. The Content Button allows choices to be made of what is published among the available results. The Flags Button allows the author to define the current state of the publication, as shown below, or its Visibility (Private, Protected, or Public). A private publication can be seen only by the authors, a protected publication can be shown to anyone to whom the authors send a URI with a key, and a public publication can be seen by anyone logged in to the portal.
It is also possible to add tags (essentially keywords) to any publication. These may help in searching. A common set of tags is available as well as custom tags set by the authors.
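The three visibility levels described above amount to a small access rule, which can be sketched as follows. This is only an illustration of the rule as stated in the text, not the portal's implementation: the `Publication` class, the `can_view` function and the `access_key` field are hypothetical names, and the sketch assumes that a valid key alone is enough to view a protected publication.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Visibility(Enum):
    PRIVATE = "private"
    PROTECTED = "protected"
    PUBLIC = "public"


@dataclass
class Publication:
    # Hypothetical record; the field names are assumptions, not the CSX schema.
    title: str
    authors: Set[str]                  # login IDs of the authors
    visibility: Visibility
    access_key: Optional[str] = None   # only meaningful for PROTECTED


def can_view(pub: Publication, user: Optional[str], key: Optional[str] = None) -> bool:
    """Return True if `user` (a logged-in ID, or None) may view `pub`."""
    # Authors can always see their own publications.
    if user is not None and user in pub.authors:
        return True
    # Public: anyone who is logged in to the portal.
    if pub.visibility is Visibility.PUBLIC:
        return user is not None
    # Protected: anyone holding the key the authors sent with the URI
    # (assumption: no login is required in addition to the key).
    if pub.visibility is Visibility.PROTECTED:
        return pub.access_key is not None and key == pub.access_key
    # Private: authors only, already handled above.
    return False
```

For instance, `can_view(pub, "bob")` is false for a private publication authored by "alice", while the same call is true once the visibility flag is set to Public.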
Logging In and Manual Upload of CSX
In addition to publishing by just metaphorically (not metaphorical in HyperChem 9!) hitting a “Publish Button”, many scientists will publish by creating a CSX file and uploading it to the portal site. As described earlier, Chemical Semantics, Inc. has created a new CSX standard, similar to CML, for holding all the required details of a computational chemistry publication. This includes details about the authors, the title of the publication, etc., more or less as just shown above for HyperChem 9’s Publish Button. The information transferred to the portal by hitting the Publish Button is that stored in a CSX file. Multiple pathways can be expected to produce these CSX files as time progresses. The conversion from CSX to RDF occurs at the portal server.
One first has to log in to the Portal. One can register if one does not yet have an ID and Password. After entering the portal, one is met with a list of one’s own publications, but one can also inspect other publications, depending upon their visibility (Private, Protected, Public).
One can peruse all publications based upon Author, Title, Category, Tag, etc. and then view any publication.
Viewing Publications
The View button to the right of any publication allows you to view details of a publication. Of the various Tabs, the Basics Tab is shown below:
Each publication requires a Unique Name to be used in generating the URI for this publication. Normally these will be assigned at the server level by the portal. In the present case the unique name is generated temporarily by HyperChem and is HyperChem10, the 10th publication since the counter was reset. This name, together with the date, generates the unique URI shown at the bottom of the Basics page. Thus,
http://purl.org/chem/pub/2013-08-24-hyperchem10
uniquely identifies this publication. It is dereferenceable and can be passed to friends to access the publication if so desired.
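The way the name and date combine can be sketched in a few lines. This is inferred from the example URI above rather than taken from the portal's code: the function name `publication_uri` is hypothetical, and the lower-casing of the unique name and the ISO date format are assumptions read off the example.

```python
from datetime import date

# Base of the publication URIs, as it appears in the example above.
BASE = "http://purl.org/chem/pub/"


def publication_uri(unique_name: str, published: date) -> str:
    """Join the ISO date and the lower-cased unique name into a URI.

    Assumption: the pattern is <base><YYYY-MM-DD>-<name in lower case>,
    inferred from the single example given in the text.
    """
    return f"{BASE}{published.isoformat()}-{unique_name.lower()}"


uri = publication_uri("HyperChem10", date(2013, 8, 24))
print(uri)
```

With the unique name HyperChem10 and the date 24 August 2013, this reproduces the URI shown above.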
The other tabs (Results, Molecules, etc.) allow access to other facets of the publication. The Results Tab shows a summary of the molecular system and calculation.
The Molecules Tab gives a 3D rendering of the molecule that can be rotated, zoomed, panned, etc.
The Wave Function Tab shows information on the orbitals, etc. of a wave function. The Graph Tab shows a rendering of the RDF graph that can be expanded, explored, etc.
The Data Sets Tab describes the files (CSX, TTL, etc.) associated with the publication, which can be downloaded as desired. For example, HyperChem 9 can read and display any of the results in a CSX file, as will other third-party software packages as Chemical Semantics grows.
Data Federation
One of the problems with existing databases is that the data exists in isolated silos. The individual databases are difficult to merge, and there is general difficulty in sharing data because of a lack of universal agreement on the database schema, column names, etc. A fundamental aspect of the semantic web is its ability to federate data, i.e. make data available globally. This comes about because of the global data standards that have been set, because one can merge individual ontologies easily and because two separate graph databases can be merged just by adding a single link (predicate) from one graph to another.
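The last point, merging two graphs with a single link, can be illustrated with a toy model in which a graph is simply a set of (subject, predicate, object) triples. This is a minimal sketch, not how the portal or a triple store works internally: all `example.org` URIs and the `federate` and `describe` helpers are hypothetical, and `owl:sameAs` is used here as one common choice of linking predicate.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two independent graphs; every example.org URI is a made-up placeholder.
PORTAL_MOL = "http://example.org/portal/mol/methyl-chloride"
EXTERNAL_MOL = "http://example.org/external/compound/42"

portal_graph: Set[Triple] = {
    (PORTAL_MOL, "http://example.org/portal/vocab#formula", "CH3Cl"),
}
external_graph: Set[Triple] = {
    (EXTERNAL_MOL, "http://example.org/external/vocab#commonName", "Methyl Chloride"),
}


def federate(a: Set[Triple], b: Set[Triple], link: Triple) -> Set[Triple]:
    """Merge two graphs: the union of their triples plus one linking triple."""
    return a | b | {link}


def describe(graph: Set[Triple], node: str) -> Set[Tuple[str, str]]:
    """Collect (predicate, object) facts for `node` and for every node
    reachable from it through owl:sameAs links, in either direction."""
    seen: Set[str] = set()
    todo = [node]
    facts: Set[Tuple[str, str]] = set()
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in graph:
            if p == OWL_SAME_AS and current in (s, o):
                todo.append(o if s == current else s)
            elif s == current:
                facts.add((p, o))
    return facts


merged = federate(portal_graph, external_graph, (PORTAL_MOL, OWL_SAME_AS, EXTERNAL_MOL))
for predicate, value in sorted(describe(merged, PORTAL_MOL)):
    print(predicate, "->", value)
```

No schema had to be reconciled: once the single linking triple is added, a question asked of one node also returns what the other graph knows about the same molecule.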
An example of this federation is available at the Chemical Semantics portal by clicking on the Data Federation Tab. If the molecule for the current publication is Methyl Chloride, then clicking on the tab brings up something like the following page:
This displays the information about Methyl Chloride that exists at the ChemSpider site of the Royal Society of Chemistry (RSC) and the Chemical Entities of Biological Interest (ChEBI) site of the European Molecular Biology Laboratory (EMBL-EBI).
Archiving
In addition to Publishing, the Chemical Semantics, Inc. portal can be used simply to archive arbitrary files. The upload feature recognizes CSX and semantic web files but is capable of storing any file along with title, author, abstract, tags and all the other features of our portal. Thus, a “publication” can include the archiving of arbitrary files that are worth keeping. They can be found easily using the features of the portal and the semantic web and subsequently downloaded. Thus our portal also has the features of a cloud archive.
Virtuoso and SPARQL Queries
The portal includes a SPARQL end point, i.e. a web site where SPARQL queries can be made. As indicated above, we use the Virtuoso software:
The above example does not show the namespaces at the start of the query for lack of space. The example is a query for a specific URI,
http://purl.org/chem/pub/2013-08-12-hyperchem174
The query is for all vibrational frequencies and intensities for this molecule (cyclopropane) associated with the publication noted.
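For readers who have not seen SPARQL, the general shape of such a query can be sketched as below. This is not the query used at the portal: the `ex:` vocabulary (`ex:hasCalculation`, `ex:hasVibrationalMode`, `ex:frequency`, `ex:intensity`) and the endpoint address are hypothetical placeholders, since the actual ontology terms and endpoint URL are not given here. Only the publication URI is taken from the text. The sketch builds the query text and the GET request URL defined by the standard SPARQL protocol (a `query` parameter), without sending anything.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The publication URI quoted in the text.
PUBLICATION = "http://purl.org/chem/pub/2013-08-12-hyperchem174"

# Placeholder endpoint; the real address of the portal's endpoint is an assumption.
ENDPOINT = "http://example.org/sparql"


def build_query(publication_uri: str) -> str:
    """Return a SPARQL SELECT for frequencies and intensities.

    All ex: terms are hypothetical stand-ins for the real ontology.
    """
    return f"""PREFIX ex: <http://example.org/hypothetical-vocab#>
SELECT ?frequency ?intensity
WHERE {{
  <{publication_uri}> ex:hasCalculation ?calc .
  ?calc ex:hasVibrationalMode ?mode .
  ?mode ex:frequency ?frequency ;
        ex:intensity ?intensity .
}}
ORDER BY ?frequency"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a SPARQL protocol GET."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Decode the `query` parameter back out of a request URL."""
    return parse_qs(urlsplit(url).query)["query"][0]


query = build_query(PUBLICATION)
url = build_request_url(ENDPOINT, query)
print(query)
```

The PREFIX line at the top plays the role of the namespace declarations mentioned above. With the real vocabulary substituted, the same text could be pasted into the endpoint's query form.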
Hopefully, this gives you the flavor of SPARQL queries. Some experience in forming SPARQL queries is necessary with the existing software. An English-language front end would be useful.