Highlights paper #1

OntoMaton: Google spreadsheets meet NCBO BioPortal services

Bioinformatics. 2013 February 15; 29(4): 525–527.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3570217/

Eamonn Maguire1 eamonn.maguire@st-annes.ox.ac.uk
Alejandra Gonzalez-Beltran1 alejandra.gonzalez.beltran@gmail.com
Patricia Whetzel2 whetzel@stanford.edu
Susanna-Assunta Sansone1 sa.sansone@gmail.com
Philippe Rocca-Serra1 proccaserra@gmail.com

1 United Kingdom Oxford e-Research Centre, University of Oxford
2 United States Stanford Center for Biomedical Informatics Research, Stanford University

Academic research is about the exchange of ideas and opinions and, increasingly, the exchange of data. The research environment is also one of interaction and collaboration. For many years, collaboration took the form of email exchanges, often augmented by attachments of textual or tabular documents. Recipients would then review, alter, amend, or unintentionally degrade the initial document before possibly returning it to its original owner, who was then burdened with merging changes and resolving conflicts. The process is well known to anyone involved in writing manuscripts, and to data managers who deal with spreadsheets circulating between consortium members assigned to specific duties in geographically distinct locations. A typical case arises when samples collected during the in-life portion of a trial are sent to third-party specialists for molecular characterization.
While computer scientists have long used version control systems, these tools may be intimidating or unwieldy to users from other backgrounds. The advent of virtualized infrastructures, marketed as cloud solutions, now offers a number of options that circumvent the need to circulate documents. However, not all solutions offer the version tracking and collaborative editing of a spreadsheet document found in Google Spreadsheets. While this web-based spreadsheet solution does not match the functionality of the ubiquitous Microsoft Excel, it provides enough for the most basic actions. Furthermore, it provides an API and the possibility of creating scripts and apps hosted on Google servers to augment and enhance its capabilities. It is this last element that has been harnessed to create a service for data managers working in the biodomain. OntoMaton, an application able to call a terminology service directly from within Google Spreadsheets, provides the means to limit free-text description by normalizing annotation with controlled vocabularies. Standardisation of metadata descriptors is a key requirement for many data managers.
The purpose of this highlight is to introduce OntoMaton, a solution bringing NCBO BioPortal annotation and vocabulary lookup services to Google Spreadsheets. While the OntoMaton widget is entirely agnostic to format, it allows the creation of ad hoc custom templates for data collection with the ability, if required, to restrict columns or fields to specific ontologies, ontology branches or terms. A demonstration will show how it supports annotation of experimental datasets when combined with the ISA-Tab syntax and how it has been integrated in a broader ecosystem of tools supporting semantic publication, namely ISA2RDF and ISA2OWL.
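The idea of restricting a column to specific ontologies can be illustrated with a minimal Python sketch. It is not OntoMaton's implementation or configuration format: the column headers, the `COLUMN_RESTRICTIONS` mapping, the helper names, and the term identifiers (deliberately written as placeholders) are all assumptions made for illustration. The check only compares the ontology prefix of each cell value against the set allowed for its column.

```python
# Hypothetical restrictions: column header -> allowed ontology acronyms.
# These names are illustrative only, not an OntoMaton configuration.
COLUMN_RESTRICTIONS = {
    "Organism": {"NCBITAXON"},
    "Disease": {"DOID", "EFO"},
}


def is_allowed(column, term_id):
    """Return True if the term's ontology prefix is permitted in the column."""
    allowed = COLUMN_RESTRICTIONS.get(column)
    if allowed is None:
        return True  # columns without a restriction accept any value
    prefix, sep, local_id = term_id.partition(":")
    # A restricted cell must look like PREFIX:local-id with an allowed prefix.
    return bool(sep) and bool(local_id) and prefix.upper() in allowed


def find_violations(rows):
    """Return (row_index, column, value) for every cell that breaks a restriction."""
    return [
        (i, column, value)
        for i, row in enumerate(rows)
        for column, value in row.items()
        if not is_allowed(column, value)
    ]


# Placeholder identifiers, not real ontology terms.
rows = [
    {"Organism": "NCBITAXON:placeholder-1", "Disease": "DOID:placeholder-2", "Notes": "free text"},
    {"Organism": "rat", "Disease": "MESH:placeholder-3", "Notes": ""},
]
print(find_violations(rows))
```

In this sketch the second row is flagged twice: once for a free-text organism and once for a disease term drawn from an ontology outside the allowed set. Restricting to an ontology branch or to individual terms would need the ontology hierarchy itself, which is omitted here.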



Highlights paper #2

Three Ontologies to define phenotype measurement data

Front Genet. 2012; 3: 87.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3361058/

Mary Shimoyama1 shimoyama@mcw.edu
Rajni Nigam1 rnigam@mcw.edu            
Leslie McIntosh2 lmcintosh@wustl.edu        
Rakesh Nagarajan2 rakesh@wustl.edu        
Treva Rice2 treva@wubios.wustl.edu        
Dc Rao2 rao@wubios.wustl.edu       
Melinda Dwinell1 mrdwinell@mcw.edu

1 United States Medical College of Wisconsin
2 United States Washington University
  
Ontologies have become widely used to annotate various types of data, such as genes, proteins and genomic variations, to indicate a characteristic, a function or participation in a larger process. They have also been used for annotations that indicate interactions or associations, such as disease, phenotype or drug-gene interactions. There has been little use of multiple ontologies to standardize individual elements of multi-domain records, such as phenotype and clinical measurements. This paper outlines an approach for standardizing phenotype and clinical measurement records using multiple ontologies created for that purpose. These include the Clinical Measurement Ontology (CMO), Measurement Method Ontology (MMO), and Experimental Condition Ontology (XCO), used to annotate: 1) what was measured, 2) how it was measured, and 3) under what conditions it was measured. The Clinical Measurement Ontology includes terms for morphological, blood, cell, organ system, movement, chemical response and other measurements commonly taken in animal laboratories and medical clinics, while the Measurement Method Ontology includes terms for both ex vivo and in vivo methods. The Experimental Condition Ontology includes branches and terms for diet, activity, chemicals, light and atmosphere, surgical manipulations and other types of conditions used both as stressors and in controlled situations. Using the ontologies described in this paper and following the approach outlined, nearly 60,000 phenotype measurements for rat have been integrated into the PhenoMiner project http://rgd.mcw.edu/phenotypes/. Sources for these measurements have included the literature, direct submissions from researchers, and large-scale public datasets such as The National BioResource Project for the Rat in Japan http://www.anim.med.kyoto-u.ac.jp and PhysGen Program for Genomic Applications http://pga.mcw.edu/.
The multi-ontology approach also made it possible to provide a dynamic query interface that allows users to develop queries beginning with any of the ontologies representing a major element of the record and filter results progressively using the other ontologies.
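A minimal Python sketch can illustrate a record annotated along three ontology facets and a query that starts from any facet and is then narrowed by the others. It is not the PhenoMiner implementation: the `MeasurementRecord` class, the `refine` helper, the values, and the term identifiers (deliberately written as placeholders) are assumptions made for illustration. Matching is by exact term only, whereas a real ontology-driven query would typically also match descendant terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementRecord:
    """One measurement annotated with a term from each of three ontologies."""
    clinical_measurement: str    # what was measured (CMO facet)
    measurement_method: str      # how it was measured (MMO facet)
    experimental_condition: str  # under what conditions (XCO facet)
    value: float


# Placeholder identifiers and values, not real ontology terms or data.
RECORDS = [
    MeasurementRecord("CMO:measurement-a", "MMO:method-x", "XCO:condition-1", 1.0),
    MeasurementRecord("CMO:measurement-a", "MMO:method-y", "XCO:condition-1", 2.0),
    MeasurementRecord("CMO:measurement-b", "MMO:method-x", "XCO:condition-2", 3.0),
    MeasurementRecord("CMO:measurement-a", "MMO:method-x", "XCO:condition-2", 4.0),
]


def refine(records, facet, term):
    """Keep only the records whose given facet is annotated with exactly this term."""
    return [r for r in records if getattr(r, facet) == term]


# Start from any facet (here the method), then narrow with the other two.
by_method = refine(RECORDS, "measurement_method", "MMO:method-x")
by_condition = refine(by_method, "experimental_condition", "XCO:condition-2")
by_measurement = refine(by_condition, "clinical_measurement", "CMO:measurement-a")
print(len(by_method), len(by_condition), len(by_measurement))
```

Because every facet is a separate field of the same record, the order in which the filters are applied does not change the final result; it only changes which intermediate result sets the user sees.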
In addition to the ontologies described in this paper, another important contribution is the use of multiple ontologies to standardize a multifaceted data record. This contrasts with the most common use of ontologies, which is to create a descriptive annotation for a data object that is already standardized through another mechanism, such as the use of the Gene Ontology with genes that are standardized through nomenclature and unique identifiers. It also contrasts with the use of multiple ontologies simply to create a composite term for annotation. The value of this paper can be seen in the use of these ontologies for multiple groups and purposes, and in the adoption of the multiple-ontology approach for multifaceted clinical records for research.