Biomedical Informatics



The Biomedical Informatics team is developing techniques to provide easier and faster access to valuable information buried in biomedical texts, saving time and cost for biomedical researchers and clinicians, and potentially enabling new insights and discoveries.


With so much health data now being collected and stored electronically, there are huge opportunities for Biomedical Informatics research to bring innovations to healthcare.  The Biomedical Informatics team at NICTA is developing strategies for knowledge-based analysis of biomedical data.  "Knowledge-based" methods take advantage of background information available in relevant databases, in structured resources such as ontologies, and in textual sources such as the published literature or clinical records, to provide context for the interpretation and understanding of biomedical data.

Our research addresses both direct analysis of biomedical textual data sources, to provide easier and faster access to the valuable information buried in those texts, and the use of that information in analysis and modelling of non-textual biomedical data.  We take advantage of a variety of underlying technologies including clustering, pattern recognition, natural language processing and general data mining to query, summarise and discover relationships between biomedical datasets.

A recent example of our work is LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites; Verspoor et al., 2012), a new method for high-quality protein active site prediction that combines protein structure modelling and text mining.  The results of this method can be used as a starting point for drug design.  This work was developed in collaboration with computational biologists at Los Alamos National Laboratory.

In addition to saving time and cost for biomedical researchers and clinicians, this research potentially enables new insights and discoveries.  We are actively working on both genomic and clinical data sets, and with the help of our external partners and research collaborators, hope to combine the two to address the challenges of personalised medicine. 

Processing and managing clinical records and research literature is a critical component of biomedical research and clinical practice. Our biomedical research partners, based in hospitals and institutes in the Melbourne biomedical precinct, have identified the cost of this work as a significant bottleneck, and see a strong need for methods for making sense of large volumes of text. Existing technology addresses neither of these specific needs, nor the broader problems of searching and summarising massive specialised collections.

We are developing "text mining" technologies to (semi-)automatically discover and visualise information from genetic and other biomedical research and clinical documents. Drawing on the team's leading strengths in information retrieval and natural language processing, we aim to develop and apply text mining techniques to a variety of practical problems faced by biomedical researchers.

We are developing fundamental algorithms and tools in the context of specific applications, basing our activities on issues identified as significant by our biomedical research collaborators. These researchers are investigating cutting-edge biomedical and clinical topics in world-leading research institutes, and have found that working with text is a critical bottleneck. We are developing innovative technologies that address specific problems identified by our partners, prioritising those where the technology is likely to be of broad value.

Our Research

Language Technology and Information Retrieval have long histories in the medical domain, dating back to the 1960s. Since the publication of the draft human genome sequence in 2001, researchers in both areas have become increasingly involved in the information management challenges that have arisen from the rate at which new publications are being added to the bibliome.  However, despite the recent flurry of activity by the information retrieval, machine learning, and language technology communities in the biomedical arena, there has not been an enthusiastic uptake of these technologies by biomedical researchers. In an invited talk at the ACL-BioNLP workshop in 2007, Alfonso Valencia (Centro Nacional de Biotecnologia, Spain) stated that there was a growing gulf between what computer science researchers perceived to be of interest to biomedical personnel and the pain points these people were actually experiencing on a day-to-day basis.  He implored researchers to focus their efforts on tasks that really mattered to the biomedical community.

The aim of this program is to bridge the gap between biomedical information needs and the research focus of language technology (LT) and information retrieval (IR). We are ideally positioned to do this, given Melbourne's status as Australia's biomedical research hub and our established links with our biomedical collaborators.

Our current focus is on the following research themes:


  • Fact extraction. This is the process of finding relationships between biological entities in the biomedical literature.  In conjunction with one of our partners we are investigating the use of fact extraction for the task of curating locus-specific databases, i.e. databases containing information about the mutations associated with a specific gene.

  • Information visualisation and analysis involves constructing non-standard views of document collections and the information contained therein. We are applying statistical topic-mapping techniques to provide abstractions of large volumes of text, providing high-level views and allowing a user to more easily see topic-based relationships that may be present.

  • Information retrieval is concerned with information organisation and retrieval tasks such as document search, clustering and filtering. The collation of relevant documents is an essential preprocessing step in our information management architecture.  The success of the technologies developed in the summarisation and fact extraction strands is dependent on the correct identification of relevant documents.
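The dependency between the retrieval and fact extraction themes can be illustrated with a minimal sketch. It is not the team's implementation: it assumes a simple TF-IDF ranking as the retrieval step and sentence-level co-occurrence as the extraction step, and the gene lexicon, the mutation pattern, and the function names are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical gene lexicon; a real curation pipeline would use a curated resource.
GENE_LEXICON = {"BRCA1", "BRCA2", "TP53"}

# Simplified pattern for protein-level substitutions such as "R1699W".
# Assumption: one-letter amino acid codes only; real nomenclature is more varied.
MUTATION_PATTERN = re.compile(
    r"\b[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]\b"
)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query, documents):
    """Return (score, index) pairs, best match first, using a TF-IDF score."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for index, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log((1 + n_docs) / (1 + df[term]))
            for term in query_terms
        )
        scores.append((score, index))
    # Sort by descending score, breaking ties by document order.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


def extract_facts(document):
    """Return (gene, mutation) pairs that co-occur within a single sentence."""
    facts = []
    for sentence in re.split(r"(?<=[.!?])\s+", document):
        candidates = re.findall(r"\b[A-Z0-9]+\b", sentence)
        genes = [word for word in candidates if word in GENE_LEXICON]
        mutations = MUTATION_PATTERN.findall(sentence)
        facts.extend((gene, mut) for gene in genes for mut in mutations)
    return facts
```

In this sketch, extraction would be run only on documents that `rank_documents` scores above zero for a query such as "BRCA1 mutation", so an irrelevant document never reaches `extract_facts`. Co-occurrence is a deliberately crude stand-in for relationship extraction; it is used here only to keep the example short.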

What will this research achieve?

The outcomes of this project will be a customisable platform for document processing and a suite of tools that use language technology to detect information pertinent to the task at hand, such as location names or biomedical terms.

Another major outcome will be tools for organising search results in ways that make information more accessible, including clustering of related results and summarisation of documents.

These tools are designed to interface with different search engines, so that the search capabilities of existing engines can be upgraded.

Current Biomedical Collaborators

Further Information

For further information, contact Karin Verspoor: karin.verspoor<at>