NLP: the missing framework

Rahul Bhargava rhlbhrgv at gmail.com
Fri Feb 1 04:11:29 GMT 2013


Hi Eric,

Perhaps a tough problem executed well will make Haskell NLP stand out.

A while back, I came across a real world application that should be
addressable using existing Haskell NLP tools in the healthcare domain, with
large dictionaries, rules and logic.

It relates to ICD-10, the 10th revision of the International Statistical
Classification of Diseases and Related Health Problems (ICD)
<http://en.wikipedia.org/wiki/International_Statistical_Classification_of_Diseases_and_Related_Health_Problems>,
a medical classification list by the World Health Organization (WHO).

The classification was not designed to be computer processable and has been
criticised for being arbitrary at times. Nonetheless, the current deadline for
the United States to begin using the Clinical Modification, ICD-10-CM
<http://en.wikipedia.org/wiki/ICD-10_Clinical_Modification>, for diagnosis
coding and the Procedure Coding System, ICD-10-PCS
<http://en.wikipedia.org/wiki/ICD-10_Procedure_Coding_System>, for inpatient
hospital procedure coding is October 1, 2014.

ICD-10 is being adopted by several countries, with some country-specific
modifications, though clinics and hospitals are reluctant because of the
overheads involved. I was attempting to facilitate its adoption, since it is
the prevailing standard, in a low-resource setting during a migration to
electronic health records at an NGO.

Also there's "SNOMED CT (Systematized Nomenclature Of Medicine Clinical
Terms) [2,3], a systematically organised computer processable collection of
medical terms providing codes, terms, synonyms and definitions covering
diseases, findings, procedures, microorganisms, substances, etc. It allows
a consistent way to index, store, retrieve, and aggregate clinical data
across specialties and sites of care." It requires a license, though there
is no charge in IHTSDO Member countries <http://www.ihtsdo.org/members/>,
nor for approved research projects and public good uses.

SNOMED CT's relational statements are triplets.

The interpretation of these triplets is based on the semantics of a simple
description logic.

forall x: instance-of (x, *Common cold*) -> exists y: instance-of (y, *Virus*)
and causative-agent (y, x)

equivalently,

*Common cold* subClassOf causative-agent some *Virus*
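
To make the triplet reading concrete, here is a minimal Haskell sketch. It
assumes concepts and attributes are plain strings; real SNOMED CT uses numeric
identifiers, which are left out here. The names Triple, toAxiom, targetsOf and
exampleKB are hypothetical, made up for illustration, and not part of any
existing library.

```haskell
-- Concepts and attributes are plain strings in this sketch (an assumption;
-- real SNOMED CT identifies them by numeric codes).
type Concept   = String
type Attribute = String

-- One relational statement: (source concept, attribute, target concept).
data Triple = Triple
  { source    :: Concept
  , attribute :: Attribute
  , target    :: Concept
  } deriving (Eq, Show)

-- Render a triple as the existential restriction it stands for.
toAxiom :: Triple -> String
toAxiom (Triple s a t) = s ++ " subClassOf " ++ a ++ " some " ++ t

-- All targets related to a concept through a given attribute.
targetsOf :: [Triple] -> Concept -> Attribute -> [Concept]
targetsOf kb c a =
  [ target t | t <- kb, source t == c, attribute t == a ]

-- The single statement used as the example in this message.
exampleKB :: [Triple]
exampleKB = [ Triple "Common cold" "causative-agent" "Virus" ]
```

In GHCi, map toAxiom exampleKB gives the subClassOf line above, and
targetsOf exampleKB "Common cold" "causative-agent" gives ["Virus"].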


There was some rule-based Java software available from NLM. An example
implementation, iMagic, is online: "(Interactive Map-Assisted Generation of
ICD Codes) Algorithm utilizes the SNOMED CT to ICD-10-CM Map in a real-time,
interactive manner to generate ICD-10-CM codes."
http://imagic.nlm.nih.gov/imagic/code/map
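
As a rough idea of what a rule-based map lookup might look like in Haskell,
here is a small sketch. It is not the iMagic algorithm: it assumes map entries
are ordered rules guarded by patient context, with the first matching rule
winning. The types Patient and MapRule, the function lookupCode, and every
identifier in exampleRules are hypothetical placeholders, not real SNOMED CT
or ICD-10-CM codes.

```haskell
import Data.Maybe (listToMaybe)

-- Patient context a map rule may depend on (hypothetical; only age here).
data Patient = Patient { ageYears :: Int } deriving Show

-- One map rule: a source concept, a guard on the patient, and a target code.
data MapRule = MapRule
  { fromConcept :: String
  , applies     :: Patient -> Bool
  , toCode      :: String
  }

-- The first rule whose source matches and whose guard holds wins,
-- so the order of the rule list matters.
lookupCode :: [MapRule] -> String -> Patient -> Maybe String
lookupCode rules c p =
  listToMaybe [ toCode r | r <- rules, fromConcept r == c, applies r p ]

-- Placeholder identifiers only, not real SNOMED CT or ICD-10-CM codes.
exampleRules :: [MapRule]
exampleRules =
  [ MapRule "concept-A" ((< 18) . ageYears) "code-A-child"
  , MapRule "concept-A" (const True)        "code-A-default"
  ]
```

Putting the catch-all rule last keeps the lookup total for known concepts,
while an unknown concept yields Nothing and can be handed to a human coder.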

This space seems particularly addressable by the Grammatical Framework.


Rahul

[1] http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/ICD10PCS/
[2] http://en.wikipedia.org/wiki/SNOMED_CT
[3] http://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html

PS There are similar ontologies, though not as complex, proposed by the FAO.

On 19 January 2013 14:09, Eric Kow <eric.kow at gmail.com> wrote:

> Hi NLP Haskellers,
> [...]
>
> How can we push things a little bit more in the right direction in the
> Haskell NLP world? What is the right direction?
>
> See comments as well.
>
> --
> Eric Kow <http://erickow.com>
>