Processing and management of clinical records and research literature is a critical component of biomedical research and clinical practice. Our biomedical research partners, based in hospitals and institutes in the Melbourne biomedical precinct, have identified the cost of this processing as a significant bottleneck in their work, and see a strong need for methods for making sense of large volumes of text. Existing technology addresses neither of these specific needs, nor the broader problems of searching and summarising massive specialised collections.
BioTALA --- BioMedical Text and Language Applications --- is developing "text mining" technologies to (semi-)automatically discover and visualise information from genetic and other biomedical research and clinical documents. Drawing on the team's leading strengths in information retrieval and natural language processing, we aim to develop and apply text mining techniques to a variety of practical problems faced by biomedical researchers.
We are developing fundamental algorithms and tools in the context of specific applications, basing our activities on issues identified as significant by our biomedical research collaborators. These researchers are investigating cutting-edge biomedical and clinical research topics in world-leading research institutes, and have found that issues with text are a critical bottleneck. We are developing innovative technologies that address specific problems identified by our partners, focusing on cases where the resulting technologies are likely to be of broad value.
Our Research
Language Technology and Information Retrieval have long histories in the medical domain dating back to the 1960s. Since the completion of the human genome project in 2001, researchers in both areas have become increasingly involved in the information management challenges that have arisen from the rate at which new publications are being added to the bibliome. However, despite the recent flurry of activity by the information retrieval, machine learning, and language technology communities in the biomedical arena, there has not been an enthusiastic uptake of these technologies by biomedical researchers. In an invited talk at the ACL-BioNLP workshop in 2007, Alfonso Valencia (Centro Nacional de Biotecnologia, Spain) stated that there is a growing gulf between what computer science researchers perceive to be of interest to biomedical personnel and the pain points these people actually experience on a day-to-day basis. He implored researchers to focus their efforts on tasks that really matter to the biomedical community.
The aim of the BioTALA project is to bridge the gap between biomedical information needs and the focus of language technology and information retrieval (LT/IR) research. BioTALA is ideally positioned to do this, given Melbourne's status as Australia's biomedical research hub and our established links with our biomedical partners.
Our current focus is on the following research themes:
- Fact extraction is the process of finding relationships between biological entities in the biomedical literature. In conjunction with one of our partners, we are investigating the use of fact extraction for the task of curating locus-specific databases, i.e. databases containing information about the mutations associated with a specific gene.
- Text summarisation is the process of generating a more concise representation of textual data. For example, we are investigating the application of summarisation to the task of automatically updating clinical evidence-based reviews.
- Information retrieval is concerned with information organisation and retrieval tasks such as document search, clustering and filtering. The collation of relevant documents is an essential preprocessing step in our information management architecture. The success of the technologies developed in the summarisation and fact extraction strands is dependent on the correct identification of relevant documents.
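
The dependency between these themes, where retrieval feeds extraction, can be illustrated with a minimal sketch. It is not BioTALA's implementation: the toy documents, the simple TF-IDF scoring, the regular expressions, and the function names `rank` and `extract_facts` are all assumptions made for illustration. Real systems would use far more robust entity recognition than the patterns shown here.

```python
import math
import re
from collections import Counter

# Toy abstracts, written for this illustration only (hypothetical data).
DOCS = {
    "d1": "The BRCA1 mutation c.68_69delAG is associated with breast cancer.",
    "d2": "We describe a new clustering method for search results.",
    "d3": "A TP53 variant, c.743G>A, was found in several tumour samples.",
}

# Assumed, deliberately naive patterns: an upper-case gene-like symbol and
# an HGVS-style coding-DNA change such as "c.743G>A".
GENE = re.compile(r"\b[A-Z][A-Z0-9]{2,}\b")
MUTATION = re.compile(r"\bc\.[0-9_]+[A-Za-z>]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return ids of documents matching the query, best TF-IDF score first."""
    n = len(docs)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts.values() for term in c)
    scores = {
        doc_id: sum(c[t] * math.log(n / df[t]) for t in tokenize(query) if t in c)
        for doc_id, c in counts.items()
    }
    hits = [doc_id for doc_id in scores if scores[doc_id] > 0]
    return sorted(hits, key=lambda doc_id: -scores[doc_id])


def extract_facts(text):
    """Pair every gene-like symbol with every mutation in the same text."""
    genes = GENE.findall(text)
    mutations = MUTATION.findall(text)
    return [(g, m) for g in genes for m in mutations]


if __name__ == "__main__":
    # Step 1: retrieval selects the documents worth processing.
    relevant = rank("mutation variant", DOCS)
    # Step 2: extraction runs only on what retrieval returned.
    for doc_id in relevant:
        print(doc_id, extract_facts(DOCS[doc_id]))
```

Because extraction only sees what retrieval returns, a relevant document that is missed at the first step can never contribute a fact at the second, which is why correct identification of relevant documents matters so much.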

What will this research achieve?
The outcomes of this project will include a customisable platform for document processing and a suite of language technology tools that process documents by detecting information pertinent to the task at hand, such as location names or biomedical terms.
Another major outcome will be tools for organising search results in ways that make information more accessible, including clustering of related results and summarisation of documents.
These tools are designed to interface with different search engines, so that the search capabilities of existing engines can be upgraded.