

The BioLemmatizer is a domain-specific lemmatization tool for the morphological analysis of biomedical literature. It is tailored to the biological domain through the integration of several published lexical resources related to molecular biology. It focuses on the inflectional morphology of English, including the plural forms of nouns, the conjugations of verbs, and the comparative and superlative forms of adjectives and adverbs. The BioLemmatizer retrieves lemmas using a lexicon that covers an exhaustive list of inflected word forms and their corresponding lemmas in both general English and the biomedical domain, together with a set of rules that generalise morphological transformations to heuristically handle words not found in the lexicon.
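The lexicon-first, rules-as-fallback strategy described above can be illustrated with a toy sketch. This is not the BioLemmatizer's code, lexicon, or rule set: the names `LEXICON`, `RULES`, and `lemmatize`, the handful of entries, and the suffix rules are all hypothetical. Keying on a coarse part-of-speech tag is also an assumption made here, because inflectional rules differ by word class.

```python
"""Toy lexicon-plus-rules lemmatizer (illustrative only, not BioLemmatizer's code)."""

# Hypothetical miniature lexicon: (inflected form, coarse POS) -> lemma.
# A real lexicon would cover general English and biomedical vocabulary at scale.
LEXICON = {
    ("mice", "NOUN"): "mouse",
    ("bound", "VERB"): "bind",
    ("better", "ADJ"): "good",
    ("mitochondria", "NOUN"): "mitochondrion",
}

# Hypothetical suffix rules per POS: (suffix to strip, replacement), tried in order.
# Rules such as ("ss", "ss") act as guards that keep the word unchanged.
RULES = {
    "NOUN": [("ies", "y"), ("ss", "ss"), ("us", "us"), ("s", "")],
    "VERB": [("ies", "y"), ("ied", "y"), ("ing", ""), ("ed", ""), ("s", "")],
    "ADJ": [("iest", "y"), ("ier", "y"), ("est", ""), ("er", "")],
}


def lemmatize(word: str, pos: str) -> str:
    """Return a lemma: lexicon lookup first, then the first matching suffix rule."""
    form = word.lower()
    # 1. Exact lexicon lookup handles irregular and domain-specific forms.
    if (form, pos) in LEXICON:
        return LEXICON[(form, pos)]
    # 2. Heuristic fallback for words the lexicon does not cover.
    for suffix, replacement in RULES.get(pos, []):
        # Require a stem of at least two characters so short words are left alone.
        if form.endswith(suffix) and len(form) - len(suffix) >= 2:
            return form[: -len(suffix)] + replacement
    # 3. No lexicon entry and no applicable rule: return the form unchanged.
    return form


if __name__ == "__main__":
    for w, p in [("mice", "NOUN"), ("antibodies", "NOUN"), ("binding", "VERB"), ("earliest", "ADJ")]:
        print(w, p, "->", lemmatize(w, p))
```

Irregular forms such as "mice" or "bound" can only be resolved by the lexicon, while regular forms such as "antibodies" or "modified" fall through to the rules. Naive suffix stripping also makes mistakes (for example, "phosphorylated" would become "phosphorylat"), which is one reason a broad lexicon matters.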


The BioLemmatizer was developed through a collaboration between NICTA's Karin Verspoor and members of the Center for Computational Pharmacology at the University of Colorado School of Medicine.

If you use the BioLemmatizer to support academic research, please cite the following paper:

Haibin Liu, Tom Christiansen, William A Baumgartner Jr, and Karin Verspoor.  BioLemmatizer: a lemmatization tool for morphological processing of biomedical text.  Journal of Biomedical Semantics, 2012, 3:3.   
