The two-day Louhi workshop brings together academia, industry, healthcare providers and policy makers interested in natural-language processing of written and spoken health communication. The programme consists of invited talks, scientific presentations and industry demonstrations.
Louhi 2013 takes place at the NICTA ATP Laboratory (13 Garden Street, Eveleigh NSW 2015, Australia; phone: +61 2 9376 2000), situated in the heart of Sydney at the Australian Technology Park. This is one of NICTA's five laboratories in Australia and is the organisation's headquarters. Please come to the Level 5 Reception first. From there you will be directed to the seminar rooms where the workshop takes place: L4.59 (scientific programme) and L4.60 (morning and afternoon teas).
The workshop dinner takes place at Nick's Seafood Restaurant (The Promenade, Cockle Bay Wharf, Sydney NSW 2000, Australia) on Monday 11 February 2013, from 7 pm to 10 pm. We have chosen their set menu 1, with a mezze plate, main course and dessert, as well as a two-hour beverage package. The menu includes fetta, marinated olives, taramasalata, smoked salmon, grilled vegetables, marinated octopus and sourdough; golden fried king prawns with snow pea leaves, tomato salad and mango mayonnaise; oven-baked breast of chicken with leeks, semi-dried tomatoes, artichoke, red capsicum and port jus; grilled John Dory with sweet potato mash and sweet corn salsa; white chocolate and raspberry crème brûlée with chocolate macadamia shortbread; and vanilla panna cotta with summer berries and raspberry coulis. Sounds delicious!
Chair: Hanna Suominen
|Welcome and overview||Hanna Suominen|
Chair: Sumithra Velupillai
|Preliminary Evaluation of Speech Recognition for Capturing Patient Information at Nursing Shift Changes: Accuracy in Speech to Text and User Preferences for Recorders||Hanna Suominen, Jim Basilakis, Maree Johnson, Paula Sanchez, Linda Dawson, Leif Hanlen and Barbara Kelly|
|Vocabulary In Discharge Summaries – The Patients’ and the Nurses’ Perspective||Veronika Laippala, Riitta Danielsson-Ojala, Kirsi Aantaa, Tapio Salakoski and Sanna Salanterä|
|Automatic De-Identification of Electronic Health Records: An Australian Perspective||Guido Zuccon, Mitchel Strachan, Anthony Nguyen, Anton Bergheim and Narelle Grayson|
|Optimizing the Dimensionality of Clinical Term Spaces for the Improvement of an Automatic Diagnosis Coder||Aron Henriksson and Martin Hassel|
|Information Extraction from Medication Prescriptions Within Drug Administration Data||Andrew Mackinlay and Karin Verspoor|
|Effect of Additional In-domain Parallel Corpora in Biomedical Statistical Machine Translation||Antonio Jimeno and Aurelie Neveol|
|Assessors Assessing Assessments||Karin Friberg Heppin and Anni Järvelin|
Chair: Hercules Dalianis
|Statistical Parsing of Varieties of Clinical Finnish||Veronika Laippala, Timo Viljanen, Antti Airola, Jenna Nyblom, Sanna Salanterä, Tapio Salakoski and Filip Ginter|
|Cross-Language Detection of Linguistic and Semantic Regularities in Pharmacovigilance Terms||Marie Dupuch, Thierry Hamon and Natalia Grabar|
|Morphostatistical Approach to Medical Document Classification||Raul Sirel|
|CLEFeHealth and Artificial Intelligence in Medicine||Hanna Suominen|
Chair: Guido Zuccon
|A Machine Learning Approach Towards Early Detection of Frequent Health Care Users||Antti Airola, Tapio Pahikkala, Heljä Lundgren-Laine, Anne Santalahti, Sanna Salanterä and Tapio Salakoski|
Chair: Veronika Laippala
|An Approach for Automatic Multi-label Classification of Medical Sentences||Abeed Sarker, Diego Molla and Cecile Paris|
|Announcement of the 5th Louhi||5th Louhi Organisers|
|Porting a Rule-based Assertion Classifier for Clinical Text from English to Swedish||Sumithra Velupillai, Maria Skeppstedt, Maria Kvist, Danielle Mowery, Brian Chapman, Hercules Dalianis and Wendy Chapman|
Chair: Leif Hanlen
|NICTA eHealth||Leif Hanlen|
17:00 - 17:25
|Stability of Text Mining Techniques for Identifying Cancer Staging||David Martinez, Lawrence Cavedon and Graham Pitson|
|19:00 - 22:00||Workshop dinner, Nick's Seafood Restaurant|
A/Prof Pierre Zweigenbaum
LIMSI-CNRS and ERTIM-INALCO, France
Medical Information Extraction and Multilingualism
Biomedical natural language processing is, as in most other domains, more advanced for English than for other languages. Nevertheless, when clinical texts are concerned, processing "local" languages is a requirement. This talk will outline the issues involved in building a set of components that process clinical texts in a language other than English, and the paths available for doing so, illustrating them with the case of French. For components that rely on expert knowledge, the main tasks are collecting suitable resources and eliciting knowledge. For data-driven components, preparing training corpora is the key point, and it runs up against privacy constraints. The size needed for these corpora may be reduced by domain adaptation of existing components, either from general domains such as news, or from a closer domain and genre such as the biomedical literature. Leveraging resources and systems designed for English is another path to follow. This can take the form of "localising" resources, with which some successes have been recorded, but it may also build on parallel corpora to transfer annotations from one language to another. Finally, some issues in the translation of biomedical texts from English into other languages will be considered.
Pierre Zweigenbaum is a CNRS Senior Researcher at LIMSI, where he leads the Natural Language Processing group, and a part-time Professor at INALCO, where he teaches Natural Language Engineering. Before 2007, he had conducted research at Assistance Publique - Paris Hospitals and INSERM for over twenty years. His main research interests are Information Extraction, Question Answering, Computational Terminology and Comparable Corpora, with applications to the medical domain. He chairs the recently created Francophone Special Interest Group of IMIA, which fosters the development of resources and tools to process French clinical texts.
Jon Patrick
University of Sydney, School of Information Technologies and Health Language Laboratories, Australia
Industrial Strength Language Engineering Methods - Automatic Extraction of Cancer Registry Content from Radiology Reports
The experimental work required for successful industrial-strength language engineering differs from what computational linguists need for scientific publication. Important elements of computational linguistics, such as 10-fold cross-validation, have little value in this work. Quality is pursued through continuous improvement of the gold standard, using linguists to resolve the language issues, supported by software dedicated to their functional needs. Increasing the productivity of the linguists in the manual annotation, correction and validation of the corpus is also an important infrastructural issue to address. The dominant issue for industrial research is deciding when annotation can stop while still satisfying the client's accuracy specifications, so active learning has a valuable role in prioritising documents for annotation. A further issue particular to industrial projects is the extent to which different sources of nominally the same data can be relied upon to form an extension of a single corpus, or instead constitute a different corpus requiring a different computational language model for the same extraction task. An industrial project that extracts pertinent content from the radiology reports of multiple radiological services for use in a population-based cancer registry will be used to illustrate these issues for language engineering.
Jon Patrick has led R&D groups in Language Technology and Clinical Information Systems. In 2005 he was awarded the Eureka Prize for the development of a natural language processing system that detects financial scams on the internet. Since 2005 he has focused on language technologies for enhancing clinical information systems and related topics. He is a member of the WHO committee designing the information model for ICD-11, and he is an Australian representative on the Implementation Committee of the IHTSDO for the design and use of SNOMED CT. His team's technologies have been installed in a number of Sydney hospitals. He has conducted extensive research on the use of information technology in Emergency Departments and is a well-known critic of the technologies currently being deployed there. In 2012 he left the University of Sydney to pursue his interests in R&D consulting in Health IT and NLP.
Prof Hercules Dalianis
DSV-Stockholm University, Sweden
Over ten per cent of all in-patients today contract a hospital-acquired infection (HAI), which causes much suffering for the patients and immense costs for society. Health care managers cannot easily obtain statistics on the level of HAI in each clinic, which they would need in order to take preventive action. In this talk we explain what an HAI is and how one can detect it using both textual information and structured data from the patient record. We will show both rule-based and machine learning methods for detecting HAI, and also discuss the difficulties involved in detecting it.
Hercules Dalianis, born 20 July 1959, is a professor in Computer and Systems Sciences at Stockholm University. He held a three-year guest professorship at CST, University of Copenhagen, during 2002-2005, funded by Nordforsk, the Nordic council. He received his Ph.D. in 1996, was a postdoctoral researcher at the University of Southern California/ISI in Los Angeles in 1997, and was also a postdoctoral researcher (forskarassistent) at NADA, KTH, Stockholm, in 1999-2003. Dalianis works at the interface between industry and academia, with the aim of making research results useful to society. He specializes in human language technology: making computers understand and process human-language text, and also produce text automatically. He currently works in clinical text mining, aiming to improve health care through better electronic patient record systems, better presentation of patient records, and the extraction of valuable information for clinical researchers as well as for lay persons such as patients.
List of Accepted Papers
Research Papers with a Long Oral Presentation
Antti Airola, Tapio Pahikkala, Heljä Lundgren-Laine, Anne Santalahti, Sanna Salanterä and Tapio Salakoski: A Machine Learning Approach Towards Early Detection of Frequent Health Care Users
Marie Dupuch, Thierry Hamon and Natalia Grabar: Cross-Language Detection of Linguistic and Semantic Regularities in Pharmacovigilance Terms
Aron Henriksson and Martin Hassel: Optimizing the Dimensionality of Clinical Term Spaces for the Improvement of an Automatic Diagnosis Coder
Veronika Laippala, Riitta Danielsson-Ojala, Kirsi Aantaa, Tapio Salakoski and Sanna Salanterä: Vocabulary In Discharge Summaries – The Patients’ and the Nurses’ Perspective
Veronika Laippala, Timo Viljanen, Antti Airola, Jenna Nyblom, Sanna Salanterä, Tapio Salakoski and Filip Ginter: Statistical Parsing of Varieties of Clinical Finnish
Andrew Mackinlay and Karin Verspoor: Information Extraction from Medication Prescriptions Within Drug Administration Data
David Martinez, Lawrence Cavedon and Graham Pitson: Stability of Text Mining Techniques for Identifying Cancer Staging
Abeed Sarker, Diego Molla and Cecile Paris: An Approach for Automatic Multi-label Classification of Medical Sentences
Hanna Suominen, Jim Basilakis, Maree Johnson, Paula Sanchez, Linda Dawson, Leif Hanlen and Barbara Kelly: Preliminary Evaluation of Speech Recognition for Capturing Patient Information at Nursing Shift Changes: Accuracy in Speech to Text and User Preferences for Recorders
Sumithra Velupillai, Maria Skeppstedt, Maria Kvist, Danielle Mowery, Brian Chapman, Hercules Dalianis and Wendy Chapman: Porting a Rule-based Assertion Classifier for Clinical Text from English to Swedish
Guido Zuccon, Mitchel Strachan, Anthony Nguyen, Anton Bergheim and Narelle Grayson: Automatic De-Identification of Electronic Health Records: An Australian Perspective
Research Papers with a Short Oral Presentation
Karin Friberg Heppin and Anni Järvelin: Assessors Assessing Assessments
Antonio Jimeno and Aurelie Neveol: Effect of Additional In-domain Parallel Corpora in Biomedical Statistical Machine Translation
Raul Sirel: Morphostatistical Approach to Medical Document Classification