- Genomic information retrieval engine: Our Genomic IR engine uses biomedical knowledge resources and recognition of entities (e.g. gene names) for query expansion and concept-based document retrieval, together with an improved ranking algorithm. (This was the third-best performing system at the TREC 2007 Genomics IR track, a research forum for evaluating IR systems.) See Stokes et al. (2009);
- Statistical text analysis techniques for interpreting whole collections of documents, as well as IR result sets. We have applied topic-modeling techniques to interpreting document collections (Newman et al., 2010) and the MeSH ontology (Newman et al., 2009a), and have also proposed a methodology for external evaluation of the topic models themselves (Newman et al., 2009b);
- Integrated search and topic-based visualisation system: To facilitate more "abstract" views of document collections and the information they contain, we have integrated a topic-based visualisation framework with a biomedical search engine, allowing semantic relationships to be visualised as an aid to developing queries that give better results. (This system was a semi-finalist in the Elsevier Grand Challenge, 2008);
- Data collection tools (a web-based annotation tool and a search-proxy interface for collecting biomedical queries): Supervised machine learning techniques, popular in language technologies, require the collection and annotation of data, in particular documents. Such data is extremely scarce in the biomedical domain, so we have developed web-based tools that our collaborators can use to annotate documents and collect task-specific query logs;
- Biomedical text mining: We have recently begun work on "text mining", i.e. using techniques from Human Language Technology to extract valuable information directly from documents. Ideally, such information would be used to populate a biomedical database without the high human labour cost currently required (see the Fact Extraction research stream described above). The two novel approaches we are exploring are processing tables, which are a rich source of information within biomedical documents (Wong et al., 2009), and combining machine learning with grammar-based language technology approaches (Mackinlay et al., 2009).
For a full list of all our research publications, please click here.