Research Publications
Improving MeSH Classification of Biomedical Articles using Citation Contexts

Medical Subject Headings (MeSH) are used to index the majority of databases generated by the National Library of Medicine. MeSH terms are designed to make information, such as scientific articles, more retrievable and accessible to users of systems such as PubMed. This paper proposes a novel method for automatically assigning MeSH terms to biomedical publications that takes advantage of the citation references pointing to these publications. Our findings show that analysing the citation references that point to a document can provide a useful source of terms that are not present in the document itself. The use of these citation contexts, as they are known, can thus help to provide a richer document feature representation, which in turn can help improve text mining and information retrieval applications, in our case MeSH term classification. In this paper, we also explore new methods of selecting and utilising citation contexts. In particular, we assess the effect of weighting the importance of citation terms (found in the citation contexts) according to two aspects: (i) the section of the paper they appear in and (ii) their distance to the citation marker.
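The abstract does not give the exact weighting functions, so the following is only a minimal sketch of the idea under stated assumptions: each term in a citation context is scored by a per-section weight multiplied by a weight that decays with its token distance to the citation marker. The section weights, the `1 / (1 + decay * d)` decay, the `[CIT]` marker convention, and the function names are all hypothetical choices for illustration, not the method used in the paper.

```python
import re
from collections import defaultdict

# Hypothetical per-section weights: illustrative values, not taken from the paper.
SECTION_WEIGHTS = {
    "introduction": 0.8,
    "methods": 1.0,
    "results": 1.2,
    "discussion": 1.0,
}
DEFAULT_SECTION_WEIGHT = 1.0

# Tokens are either a bracketed marker such as "[CIT]" or a run of letters/digits.
TOKEN_RE = re.compile(r"\[[^\]]+\]|[A-Za-z0-9]+")


def distance_weight(distance: int, decay: float = 0.5) -> float:
    """Assumed decay 1 / (1 + decay * d): terms adjacent to the marker (d = 1) score highest."""
    return 1.0 / (1.0 + decay * distance)


def weight_citation_terms(context: str, section: str, marker: str = "[CIT]",
                          decay: float = 0.5) -> dict[str, float]:
    """Score lower-cased terms in one citation context by section and distance to the marker."""
    tokens = TOKEN_RE.findall(context)
    marker_positions = [i for i, tok in enumerate(tokens) if tok == marker]
    if not marker_positions:
        return {}  # no citation marker in this context: nothing to weight

    section_weight = SECTION_WEIGHTS.get(section.lower(), DEFAULT_SECTION_WEIGHT)
    scores: dict[str, float] = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok == marker:
            continue
        # Token distance to the nearest citation marker in the context.
        d = min(abs(i - p) for p in marker_positions)
        scores[tok.lower()] += section_weight * distance_weight(d, decay)
    return dict(scores)


context = "Gene expression was measured using RNA-seq [CIT] in tumour samples."
print(weight_citation_terms(context, section="Methods"))
```

Here "seq" and "in" (one token from the marker) receive the largest weights, while "gene" (seven tokens away) receives much less. A real pipeline would likely also remove stopwords and map terms to concepts before weighting.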
We conduct intrinsic and extrinsic evaluations of citation term quality. For the intrinsic evaluation, we rely on the UMLS Metathesaurus conceptual database to explore the semantic characteristics of the mined citation terms. We also analyse the “informativeness” of these terms using a class-entropy measure. For the extrinsic evaluation, we run a series of automatic document classification experiments over MeSH terms. Our experimental evaluation shows that citation contexts contain terms that are related to the original document, and that integrating this knowledge yields better classification performance than two state-of-the-art MeSH classification systems: MeSHUP and MTI. Our experiments also demonstrate that taking the Section and Distance factors into account can lead to statistically significant improvements in citation feature quality, thus opening the way for better document feature representation in other biomedical text processing applications.
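The abstract does not define its class-entropy measure precisely. A common formulation, assumed here, treats each term t as a distribution over the MeSH classes of the documents it appears in and computes H(t) = -Σ_c p(c|t) log2 p(c|t); a lower entropy means the term is concentrated in fewer classes and is therefore more informative. The sketch below, with the hypothetical helper `class_entropy` and made-up toy data, shows that computation.

```python
import math
from collections import Counter
from typing import Iterable, Optional


def class_entropy(term: str, documents: Iterable[tuple[set[str], set[str]]]) -> Optional[float]:
    """Entropy (in bits) of the MeSH-class distribution of documents containing `term`.

    Each document is a (terms, mesh_labels) pair; a multi-labelled document
    contributes one count to each of its labels. Returns None if the term never occurs.
    """
    label_counts = Counter()
    for terms, labels in documents:
        if term in terms:
            label_counts.update(labels)
    total = sum(label_counts.values())
    if total == 0:
        return None
    entropy = 0.0
    for count in label_counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Toy data (hypothetical): (terms in the document, MeSH labels assigned to it).
docs = [
    ({"insulin", "glucose"}, {"Diabetes Mellitus"}),
    ({"insulin", "receptor"}, {"Diabetes Mellitus"}),
    ({"receptor", "tumour"}, {"Neoplasms"}),
]
for t in ("insulin", "receptor"):
    print(t, class_entropy(t, docs))
# insulin 0.0
# receptor 1.0
```

In this toy example, "insulin" co-occurs with a single class (entropy 0.0, the most informative case), while "receptor" is split evenly between two classes (entropy 1.0 bit).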
