Invited Talk

Annotation of genes with Gene Ontology terms in an evolutionary context
Pascale Gaudet, Rama Balakrishnan, James Hu, Eva Huala, Ranjana Kishore, Suzanna Lewis, Donghui Li, Brenley McIntosh, Huaiyu Mi, Li Ni, Paul D. Thomas.

The Gene Ontology (GO) has been widely used to annotate the functions of gene products based on published experimental results obtained in a number of different “model” organisms. The GO Consortium has implemented a project to integrate and review annotations made for related genes in different organisms, and to use this information to improve both the quality and quantity of GO annotations. Quality is addressed through simultaneous review of experimental annotations for related genes in different organisms, leading to the removal of erroneous or misleading annotations and to increased annotation consistency. Quantity is addressed by using experimental annotations from a few model organisms to infer annotations for related genes in many other organisms. The basic approach has been published (Gaudet et al., Briefings in Bioinformatics 12:449, 2011): annotations are reviewed in the context of a phylogenetic tree of a gene family, and a model of function evolution within the family is then created. We will discuss our progress and lessons learned, as well as challenges and implications for further development of both curation tools and automated approaches to large-scale functional annotation.
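The core idea - lifting experimental annotations to an ancestor in the family tree and propagating them to unannotated descendants - can be illustrated with a toy sketch. The tree, gene names and the choice of a ">= 2 annotated children" support rule are invented for illustration and do not reproduce the published curation protocol; GO:0006281 is the real GO term for DNA repair.

```python
# Toy sketch of phylogeny-based annotation transfer: experimental GO
# annotations on a few model-organism genes are lifted to their common
# ancestor, then copied to descendants lacking experimental evidence.
# The tree, genes and support threshold are hypothetical examples.

family_tree = {                    # ancestor -> child genes
    "anc1": ["gene_mouse", "gene_fly", "gene_human"],
}
experimental = {                   # gene -> experimentally supported GO terms
    "gene_mouse": {"GO:0006281"},  # GO:0006281 = DNA repair
    "gene_fly":   {"GO:0006281"},
}

def infer_by_descent(tree, annotations):
    """Annotate an ancestor with terms shared by >= 2 annotated children,
    then copy those terms to children that lack experimental evidence."""
    inferred = {}
    for anc, children in tree.items():
        support = {}
        for child in children:
            for term in annotations.get(child, set()):
                support[term] = support.get(term, 0) + 1
        ancestral = {t for t, n in support.items() if n >= 2}
        for child in children:
            if child not in annotations and ancestral:
                inferred[child] = set(ancestral)
    return inferred

print(infer_by_descent(family_tree, experimental))
# {'gene_human': {'GO:0006281'}} - inherited by descent from anc1
```

In the real workflow, a curator reviews the inferred ancestral functions against the tree before any propagation, which is what catches erroneous or inconsistent experimental annotations.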

Industry Talk

HYDRA - a SPARQL engine for data federation and self-service querying in the Life Sciences
Alexandre Riazanov

Knowledge workers in biomedical research and biotech, as well as clinicians, spend much of their time finding and integrating information from multiple heterogeneous and distributed resources, such as online biomedical databases, nomenclatures, clinical databases and specialised analytical programs. The state-of-the-art approaches to data integration - data warehousing and workflow scripting - are limited in scope and are also quite costly.
A possible answer to this challenge is data federation. In this paradigm, specialised middleware allows multiple heterogeneous, distributed resources to be queried together as a single database. Ideally, for economic reasons, the querying should be self-service, so that scientists, biotechnologists and clinicians can create and run ad hoc queries without relying on programmers or IT specialists.
To implement data federation and self-service querying, we leverage the power of SADI (Semantic Automated Discovery and Integration) Web services that can be fully automatically discovered, composed and called by client programs, in particular by specialised SPARQL engines. Practically, these unique properties of SADI mean that a collection of databases and algorithms can be queried as a single RDF database. Technically, this is achieved by requiring compliant HTTP services to input and output RDF and by attaching simple descriptions fully defining what the services do. Also, the schema of the virtual database represented by a network of SADI services is essentially a controlled vocabulary containing concepts and relations from the subject domain, e.g., biology, chemistry or health care, which can be directly understood by non-technical users. This semantic exposition of the data represented by SADI services facilitates self-service ad hoc querying.
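Conceptually, a federating engine fetches RDF from several services, treats the union as one virtual graph, and answers a conjunctive query (the core of a SPARQL basic graph pattern) by joining triple patterns. The following minimal sketch simulates this in plain Python; the "services", genes, proteins and predicates are all invented for illustration, and real SADI/HYDRA machinery (service discovery, HTTP, reasoning) is deliberately omitted.

```python
# Triples (subject, predicate, object) as two hypothetical services
# would return them.
gene_service = [
    ("BRCA1", "encodes", "BRCA1_protein"),
    ("TP53",  "encodes", "TP53_protein"),
]
pathway_service = [
    ("BRCA1_protein", "participatesIn", "DNA_repair"),
    ("TP53_protein",  "participatesIn", "apoptosis"),
]

# Data federation: the client sees a single merged ("virtual") graph.
virtual_graph = gene_service + pathway_service

def match(graph, predicate, obj=None):
    """Yield (subject, object) bindings for one triple pattern;
    obj=None acts as a variable, like ?x in SPARQL."""
    for s, p, o in graph:
        if p == predicate and (obj is None or o == obj):
            yield s, o

# Query a domain expert could phrase in controlled vocabulary:
# "Which genes encode a protein that participates in DNA repair?"
proteins = {s for s, _ in match(virtual_graph, "participatesIn", "DNA_repair")}
genes = sorted(s for s, o in match(virtual_graph, "encodes") if o in proteins)
print(genes)  # ['BRCA1']
```

The join across the two services is exactly the step that, in the SADI setting, is driven by the service descriptions rather than by hand-written integration code.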
In this talk we will present HYDRA - a reasoning-enabled SPARQL query engine for SADI services, developed by IPSNP Computing Inc. We will discuss the performance and usability requirements driving the development of the engine and demonstrate its current capabilities on a number of queries from the Life Sciences domain. We will briefly outline some case studies in Bioinformatics and Clinical Intelligence in which HYDRA has been used. We will also briefly inform the audience about our plans for future functionality development and announce a closed beta-tester subscription.

Biography: Alexandre Riazanov is the CTO of IPSNP Computing Inc, based in Saint John, Canada, leading the R&D work on the development of HYDRA and HYDRA-based products. He holds a PhD from The University of Manchester, UK, and is best known for his early work on the prize-winning Vampire reasoner. He has both extensive industrial R&D experience and an academic research track record. His main areas of expertise are semantic querying of data, efficient implementation of automated reasoning, and reasoning-based natural language processing. His recent work also includes applied research on the use of Semantic Web services for Bioinformatics and Clinical Intelligence.

Highlights paper

Intelligent systems for biological pathway integration, modeling, analysis and visualization

Daniela Rosu, Chiara Pastrello, Sara Rahmati, Giuseppe Agapito, David Otasek, Igor Jurisica.

Biomedical researchers use models of biological systems to integrate diverse types of information - high-throughput datasets, functional annotations, orthology data and expert knowledge about biochemical reactions and biological pathways - in order to develop new hypotheses and answer complex questions, such as what type of system perturbation may result in a desired change in cellular function, what factors cause disease, and whether patients will respond to a given treatment.
The rapidly growing amount of data and knowledge in scientific articles and public datasets increases the complexity of these models daily, making the work of keeping them up to date ever more challenging and highlighting the pressing need for computational techniques to help with the associated tasks. Among such methods, diverse semantic web technologies, from linked data to ontologies, are increasingly becoming core components of a wide range of tools central to systems biology. These tools enable scientists to combine quantitative and qualitative biological data with knowledge represented as formal concepts and rules, in order to generate hypotheses and test whether they are sound and consistent, to answer complex queries, and to perform advanced tasks such as assembling individual reactions into potential new pathways.
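One of the advanced tasks mentioned above - assembling individual reactions into a candidate pathway - amounts to chaining reactions whose products feed the substrates of others. The short sketch below illustrates the idea with a depth-first search; the reaction identifiers and metabolites are hypothetical examples, not curated pathway data, and real systems additionally reason over formal ontology classes and multi-substrate reactions.

```python
# Each hypothetical reaction converts one substrate into one product.
reactions = {
    "R1": ("glucose", "G6P"),
    "R2": ("G6P", "F6P"),
    "R3": ("F6P", "pyruvate"),
    "R4": ("citrate", "isocitrate"),
}

def assemble(reactions, start, goal):
    """Chain reactions from a start to a goal metabolite (depth-first).
    Returns the list of reaction IDs forming a candidate pathway,
    or None if no chain exists."""
    def walk(current, used):
        if current == goal:
            return used
        for rid, (substrate, product) in reactions.items():
            if substrate == current and rid not in used:
                found = walk(product, used + [rid])
                if found is not None:
                    return found
        return None
    return walk(start, [])

print(assemble(reactions, "glucose", "pyruvate"))  # ['R1', 'R2', 'R3']
```

In a semantic setting the same traversal runs over RDF-encoded reactions, so ontology reasoning can match a product to a substrate even when the two are described at different levels of specificity.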
Semantic biological pathway modelling, in particular, has been studied for some time, but it is still at an early stage of development. In this talk, we discuss challenges and experiences in the design and construction of pathway representation models, as well as tools and strategies for using these models for visualization and data integration.

Highlights paper

Benchmarking Infrastructure for Mutation Text Mining
Artjom Klein, Alexandre Riazanov, Matthew Hindle and Christopher J.O. Baker.

In biomedical text mining, the development of robust pipelines, the publication of results and the running of comparative evaluations are greatly hindered by the lack of adequate benchmarking facilities. Benchmarks - annotated corpora - are usually designed and created for specific tasks and come with hard-coded evaluation metrics. Comparative evaluation between tools, and evaluation of these tools on different gold-standard data sets, are important for performance verification and adoption, but they are hindered by the diversity and heterogeneity of the formats and annotation schemas of corpora and systems. Well-known text mining frameworks such as UIMA and GATE include functionality for the integration and evaluation of text mining tools, again based on hard-coded evaluation metrics. Unlike these approaches, we leverage semantic technologies to provide flexible, ad hoc authoring of evaluation metrics. We report on a centralized, community-oriented annotation and benchmarking infrastructure supporting the development, testing and comparative evaluation of text mining systems. We have deployed this infrastructure to evaluate the performance of mutation text mining systems.
The generic nature of the solution makes it flexible, easily extendable, re-usable and adaptable to new domains. The flexibility of SPARQL allows ad hoc search and analysis of corpora, and the implementation of evaluation metrics, without requiring programming skills. To our knowledge, this is the first benchmarking infrastructure for mutation text mining.
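The kind of metric that such an infrastructure lets users author declaratively - in the actual system, as a SPARQL query over RDF-encoded annotations - boils down to set comparisons between system output and a gold corpus. A minimal sketch, with invented document IDs and mutation mentions:

```python
# Gold-standard and system annotations as (document, mutation) pairs.
# All identifiers below are hypothetical examples.
gold   = {("doc1", "R273H"), ("doc1", "G12D"), ("doc2", "V600E")}
system = {("doc1", "R273H"), ("doc2", "V600E"), ("doc2", "Q61L")}

tp = len(gold & system)            # true positives: mentions in both sets
precision = tp / len(system)       # fraction of system output that is correct
recall    = tp / len(gold)         # fraction of gold mentions recovered
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.67 0.67 0.67
```

Expressing the same computation as a SPARQL query over a shared annotation schema is what frees evaluators from reimplementing the metric for each corpus format.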