Mayo Clinic, IBM launch open-source consortium
Biomedical informatics researchers at the Mayo Clinic and IBM have launched the Open Health Natural Language Processing (NLP) Consortium, which is establishing an open-source space to promote past and current development efforts, including information extraction from electronic medical records (EMRs).
As part of the launch, Mayo Clinic and IBM have released their clinical NLP technologies into the public domain via the web. The site, http://www.ohnlp.org, will allow the approximately 2,000 researchers and developers working on clinical language systems worldwide to contribute code and further develop the systems.
"Large-scale information extraction from the clinical narrative is a vital component in advancing translational research and patient care," says Guergana Savova, PhD, medical informatics specialist and Mayo's NLP lead on the project. "It 'unlocks' the clinical textual data that resides in huge repositories. Such technology would allow for large-scale data aggregation, analyses and usage -- just imagine the power of data from millions of patients."
The two clinical text solutions released as open source by Mayo Clinic and IBM are each aimed at processing a specific type of note. Clinical notes describe patient-physician encounters, while pathology reports center on tissue findings.
Mayo's open-source solution, the clinical Text Analysis and Knowledge Extraction System (cTAKES), focuses on processing patient-centric clinical notes. IBM's medKAT (medical Knowledge Analysis Tool) is a modular, flexible system based on the Unstructured Information Management Architecture (UIMA) that uses advanced NLP techniques to extract structured information from unstructured data sources, such as pathology reports, clinical notes, discharge summaries and medical literature. It has been designed to operate within institutional systems or databases of any size, according to the company.