Highlights paper #1
OntoMaton: Google spreadsheets meet NCBO BioPortal services
Bioinformatics. 2013 February 15; 29(4): 525–527.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3570217/
Eamonn Maguire1 eamonn.maguire@st-annes.ox.ac.uk
Alejandra Gonzalez-Beltran1 alejandra.gonzalez.beltran@gmail.com
Patricia Whetzel2 whetzel@stanford.edu
Susanna-Assunta Sansone1 sa.sansone@gmail.com
Philippe Rocca-Serra1 proccaserra@gmail.com
Academic research is about the exchange of ideas and
opinions and, increasingly, it also requires the exchange of
data. The research environment is also one of interaction and
collaborative activity. For many years, collaboration took
the form of email exchanges, often with text documents or
spreadsheets attached. Recipients would then review, alter,
amend, or inadvertently degrade the initial document before
possibly returning it to its original owner, who was then
burdened with merging changes and resolving conflicts. The
process is well known to anyone involved in writing
manuscripts, and to data managers who must deal with
spreadsheets circulating between consortium members assigned
to specific duties in geographically distinct locations. A
typical case arises when samples collected during the in-life
portion of a trial are sent to third-party specialists for
molecular characterization.
While computer scientists have long used version control
systems, these tools may be intimidating or unwieldy for
users from other backgrounds. The advent of virtualized
infrastructures, marketed as cloud solutions, now offers a
number of options that remove the need to circulate
documents. However, not all of these solutions provide the
version tracking and collaborative editing of spreadsheet
documents found in Google Spreadsheets. While this web-based
spreadsheet does not match the functionality of the
ubiquitous Microsoft Excel, it provides enough for the most
basic operations. Furthermore, it offers an API for
programmatic access and the possibility of creating
scripts/apps hosted on Google servers that extend its
capabilities.
It is this last feature that we have harnessed to create a
service for data managers working in the biological domain.
OntoMaton, an application that calls a terminology service
directly from within Google spreadsheets, provides the means
to limit free-text descriptions by normalizing annotations
with controlled vocabularies. Standardisation of metadata
descriptors is a key requirement for many data managers.
The purpose of this highlight is to introduce OntoMaton, a
solution bringing NCBO BioPortal annotation and vocabulary
lookup services to Google spreadsheets. The OntoMaton widget
is entirely format-agnostic, yet it supports the creation of
ad hoc custom templates for data collection that can, if
required, restrict columns or fields to specific ontologies,
ontology branches or terms. A demonstration will show how it
supports annotation of experimental datasets when combined
with the ISA-Tab syntax, and how it has been integrated into
a broader ecosystem of tools supporting semantic publication,
namely ISA2RDF and ISA2OWL.
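The column restriction described above can be illustrated with a minimal sketch. It assumes a hypothetical local term index standing in for a BioPortal lookup; the `Term`, `ColumnRule`, `validate_row`, `INDEX` and `RULES` names and the `DEMO:` identifiers are invented for illustration and are not OntoMaton's or BioPortal's API. A rule may restrict a column to whole ontologies, to a branch (the term or one of its ancestors must be a given root class), or to an explicit list of terms.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A controlled-vocabulary term as a terminology service might describe it."""
    term_id: str
    label: str
    ontology: str  # ontology acronym
    ancestors: frozenset = frozenset()  # IDs of all superclasses


@dataclass
class ColumnRule:
    """Hypothetical template rule; an empty set means 'no restriction at this level'."""
    ontologies: set = field(default_factory=set)
    branch_roots: set = field(default_factory=set)
    term_ids: set = field(default_factory=set)

    def accepts(self, term):
        if self.term_ids:  # an explicit term list overrides the other levels
            return term.term_id in self.term_ids
        if self.ontologies and term.ontology not in self.ontologies:
            return False
        if self.branch_roots:  # the term itself or one of its ancestors must be a root
            return bool((term.ancestors | {term.term_id}) & self.branch_roots)
        return True


def validate_row(row, rules, index):
    """Return (column, value, reason) tuples for cells that break a column rule."""
    problems = []
    for column, value in row.items():
        rule = rules.get(column)
        if rule is None:
            continue  # unrestricted column: free text stays allowed
        term = index.get(value.strip().lower())
        if term is None:
            problems.append((column, value, "not a known term"))
        elif not rule.accepts(term):
            problems.append((column, value, "outside allowed vocabulary"))
    return problems


# Hypothetical stand-in for a terminology lookup, keyed by lower-case label.
INDEX = {
    "liver": Term("DEMO:0002", "liver", "DEMO_ANAT", frozenset({"DEMO:0001"})),
    "heart": Term("DEMO:0003", "heart", "DEMO_ANAT", frozenset({"DEMO:0001"})),
    "rat": Term("DEMO:0100", "rat", "DEMO_TAXON"),
}

# "organ" is limited to one ontology branch; "organism" to a single term.
RULES = {
    "organ": ColumnRule(ontologies={"DEMO_ANAT"}, branch_roots={"DEMO:0001"}),
    "organism": ColumnRule(term_ids={"DEMO:0100"}),
}

print(validate_row({"organ": "Liver", "organism": "rat", "notes": "any text"}, RULES, INDEX))
# → []
print(validate_row({"organ": "rat", "organism": "mouse"}, RULES, INDEX))
# → [('organ', 'rat', 'outside allowed vocabulary'), ('organism', 'mouse', 'not a known term')]
```

Columns without a rule (such as `notes` here) are left as free text, so restrictions apply only where a template asks for them.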
Highlights paper #2
Three Ontologies to define phenotype measurement data
Front Genet. 2012; 3: 87.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3361058/
Mary Shimoyama1 shimoyama@mcw.edu
Rajni Nigam1 rnigam@mcw.edu
Leslie McIntosh2 lmcintosh@wustl.edu
Rakesh Nagarajan2 rakesh@wustl.edu
Treva Rice2 treva@wubios.wustl.edu
Dc Rao2 rao@wubios.wustl.edu
Melinda Dwinell1 mrdwinell@mcw.edu
1 Medical College of Wisconsin, United States
2 Washington University, United States
Ontologies have become widely used to annotate various types of data, such as genes, proteins and genomic variations, with a characteristic, a function, or participation in a larger process. They have also been used for annotations that indicate interactions or associations, such as disease and phenotype associations or drug-gene interactions. However, multiple ontologies have rarely been used to standardize the individual elements of multi-domain records such as phenotype and clinical measurements. This paper outlines an approach for standardizing phenotype and clinical measurement records using multiple ontologies created for that purpose: the Clinical Measurement Ontology (CMO), the Measurement Method Ontology (MMO) and the Experimental Condition Ontology (XCO), used to annotate 1) what was measured, 2) how it was measured, and 3) under what conditions it was measured. The Clinical Measurement Ontology includes terms for morphological, blood, cell, organ system, movement, chemical response and other measurements commonly taken in animal laboratories and medical clinics, while the Measurement Method Ontology includes terms for both ex vivo and in vivo methods. The Experimental Condition Ontology includes branches and terms for diet, activity, chemicals, light and atmosphere, surgical manipulations and other types of conditions, used both as stressors and in controlled situations.
Using these ontologies and following the approach outlined, nearly 60,000 rat phenotype measurements have been integrated into the PhenoMiner project http://rgd.mcw.edu/phenotypes/. Sources for these measurements include the literature, direct submissions from researchers, and large-scale public datasets such as The National BioResource Project for the Rat in Japan http://www.anim.med.kyoto-u.ac.jp and the PhysGen Program for Genomic Applications http://pga.mcw.edu/.
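To make the what/how/conditions split concrete, here is a minimal sketch of a measurement record whose three facets are each filled from a different ontology. It assumes that term identifiers carry their ontology prefix (`CMO:`, `MMO:`, `XCO:`); the `MeasurementRecord` class, `FACET_PREFIXES` mapping and `demo` identifiers are hypothetical placeholders, not real accessions or the PhenoMiner schema.

```python
from dataclasses import dataclass

# Hypothetical mapping: which ontology (by ID prefix) annotates each facet.
FACET_PREFIXES = {"measurement": "CMO", "method": "MMO", "conditions": "XCO"}


@dataclass(frozen=True)
class MeasurementRecord:
    """One phenotype measurement; each facet is standardized by its own ontology."""
    value: float
    unit: str
    measurement: str  # CMO term: what was measured
    method: str  # MMO term: how it was measured
    conditions: tuple  # XCO terms: under what conditions (may be several)

    def facet_errors(self):
        """Return (facet, term_id) pairs annotated from the wrong ontology."""
        errors = []
        for facet, prefix in FACET_PREFIXES.items():
            annotation = getattr(self, facet)
            term_ids = annotation if isinstance(annotation, tuple) else (annotation,)
            for term_id in term_ids:
                if not term_id.startswith(prefix + ":"):
                    errors.append((facet, term_id))
        return errors


# Placeholder IDs, not real ontology accessions.
good = MeasurementRecord(150.0, "mmHg", "CMO:demo-sbp", "MMO:demo-telemetry",
                         ("XCO:demo-high-salt-diet",))
swapped = MeasurementRecord(150.0, "mmHg", "MMO:demo-telemetry", "CMO:demo-sbp",
                            ("XCO:demo-high-salt-diet",))
print(good.facet_errors())     # → []
print(swapped.facet_errors())  # → [('measurement', 'MMO:demo-telemetry'), ('method', 'CMO:demo-sbp')]
```

Because each facet draws on its own vocabulary, a record whose facets are swapped or mis-annotated can be caught mechanically.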
The multi-ontology approach also made it possible to provide a dynamic query interface that lets users build queries beginning with any of the ontologies representing a major element of the record, and then progressively filter the results using the other ontologies.
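The progressive filtering can be sketched as follows. This is a hypothetical in-memory illustration: the `Row` type, `matches` and `progressive_query` functions and the `demo` term IDs are invented and do not reflect the PhenoMiner implementation. Each row carries one CMO term, one MMO term and a set of XCO terms; a query may start from any of these facets, and each step narrows the previous result.

```python
from typing import FrozenSet, NamedTuple


class Row(NamedTuple):
    """Hypothetical flattened measurement: one term per facet plus the value."""
    cmo: str  # what was measured
    mmo: str  # how it was measured
    xco: FrozenSet[str]  # conditions (a record may have several)
    value: float


def matches(row, facet, term):
    """True if `row` is annotated with `term` for the given facet."""
    annotation = getattr(row, facet)
    if isinstance(annotation, frozenset):
        return term in annotation
    return annotation == term


def progressive_query(rows, steps):
    """Apply (facet, term) filters in the order given, starting from any facet.

    Returns the surviving rows and the number of rows left after each step.
    """
    counts = []
    for facet, term in steps:
        rows = [r for r in rows if matches(r, facet, term)]
        counts.append(len(rows))
    return rows, counts


# Placeholder IDs, not real ontology accessions.
ROWS = [
    Row("CMO:demo-sbp", "MMO:demo-telemetry", frozenset({"XCO:demo-high-salt"}), 150.0),
    Row("CMO:demo-sbp", "MMO:demo-tail-cuff", frozenset({"XCO:demo-high-salt"}), 140.0),
    Row("CMO:demo-sbp", "MMO:demo-telemetry", frozenset({"XCO:demo-control-diet"}), 120.0),
    Row("CMO:demo-heart-wt", "MMO:demo-weighing", frozenset({"XCO:demo-control-diet"}), 1.2),
]

# Start from the conditions facet, then narrow by measurement and method.
result, counts = progressive_query(ROWS, [
    ("xco", "XCO:demo-high-salt"),
    ("cmo", "CMO:demo-sbp"),
    ("mmo", "MMO:demo-telemetry"),
])
print(counts)                     # → [2, 2, 1]
print([r.value for r in result])  # → [150.0]
```

Reporting the count after each step mirrors the idea of letting users see how each added ontology constraint narrows the result set.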
Beyond the ontologies themselves, another important contribution of this work is the use of multiple ontologies to standardize a multifaceted data record. This contrasts with the most common use of ontologies, which is to attach a descriptive annotation to a data object that is already standardized through another mechanism; for example, Gene Ontology terms are applied to genes that are already standardized through nomenclature and unique identifiers. It also differs from using multiple ontologies merely to build a composite term for annotation. The value of this work can be seen in the use of these ontologies by multiple groups for multiple purposes, and in the adoption of the multi-ontology approach for multifaceted clinical research records.