Information Extraction and Natural Language Processing ...


4th edition of the Symposium sur l'Ingénierie de l'Information Médicale
23 and 24 November 2017, Toulouse

Keynote address: Stefan Schulz, Medical University of Graz (Austria)
purl.org/steschu

Annotating clinical narratives with SNOMED CT
Aspects of Reliability and Semantic Interoperability

Typical information engineering workflow
[Diagram: Data Acquisition -> Data -> Representation -> Reasoning -> Result. Successive builds branch the same data into Reasoning A / Result A and Reasoning B / Result B, then into Representation A and Representation B, and add the labels "reliability" and "validity".]

Focus on data acquisition for information engineering
[Diagram: Data Acquisition A yields data DA and Data Acquisition B yields data DB; each passes through Representation and Reasoning to Result A and Result B.]

Data reliability and data interoperability
[Diagram: high when DA = DB; low when DA and DB differ. A "Query:" element is shown against the data from both acquisitions.]

Focus of the talk

- Analysis of coded extracts from clinical texts
- Inter-annotator agreement (reliability)
- Reasons for inter-annotator disagreement
- Discussion: how to improve agreement
Assumption: better data reliability -> better semantic interoperability of clinical data

Annotating clinical narratives with SNOMED CT

Coding
[Diagram: reality -> observation -> observed representation -> map -> standardised representation (vocabularies, ontologies).]

Annotation
[Diagram: observation, interpretation -> intermediate representation (clinical text) -> map -> standardised symbols representation.]
Sample text (pathology report, translated from German): "Whipple resection specimen": An as yet unopened resection specimen consisting of a distal stomach with a lesser curvature length of 9.5 cm and a greater curvature length of 13.5 cm, and a duodenal portion 14 cm in length. 2 cm aboral to the pylorus the small-bowel wall shows an hourglass-shaped stenosis. Abundant viscous, sanguinolent mucus in the gastric and duodenal lumen;

SNOMED CT
- Huge clinical reference terminology
- Representable as OWL EL
- (Quasi-)ontological definitional and qualifying axioms
- eHealth standard, maintained by a transnational SDO
- Multiple hierarchies
- ~300,000 "concepts"
- Preferred terms and synonyms in several languages
- Covers disorders, procedures, body parts, substances, devices, organisms, qualities

Annotation: Sources of complexity
Human language
- words, multiword terms
- syntactic structures
- relations at various levels
Clinical language
- Compact
- Paragrammatical
- Context-dependent
map
- Best text span to annotate?
- Naive or analytic annotation?
SNOMED CT
Ontology
- classes
- relations
- logical constructors
- axioms
Terminology
- concepts
- preferred terms
- synonyms
- definitions
SNOMED CT
- Ill-defined concepts
- Similar concepts
- Pre-coordination vs. postcoordination
- Complex annotations (> 1 concept / term)

Examples: clinical text and candidate SNOMED CT concepts (FSNs)
- "... the duodenum. The mucosa is ...": 'Duodenal structure (body structure)'? 'Duodenal mucous membrane structure (body structure)'? 'Mucous membrane structure (body structure)'?
- "Hemorrhagic shock after RTA": 'Traffic accident on public road (event)'? Both 'Traffic accident on public road (event)' and 'Renal tubular acidosis (disorder)'? 'Traffic accident on public road (event)' or 'Renal tubular acidosis (disorder)'?
- "travel history of suspected dengue": 'Suspected dengue (situation)'? 'Suspected (qualifier value)' plus 'Dengue (disorder)'?

Coding / Annotation guidelines

Examples:
1. German coding guidelines for ICD and OPS: 171 pages
   http://www.dkgev.de/media/file/21502.Deutsche_Kodierrichtlinien_Version_2016.pdf
2. Using SNOMED CT in CDA models: 147 pages
   http://www.snomed.org/resource/resource/249
3. CHEMDNER-patents, annotation of chemical entities in a patent corpus: annotation manual, 30 pages
   http://www.biocreative.org/media/store/files/2015/cemp_patent_guidelines_v1.pdf
4. CRAFT Concept Annotation guidelines: 47 pages
   http://bionlp-corpora.sourceforge.net/CRAFT/guidelines/CRAFT_concept_annotation_guidelines.pdf
5. Gene Ontology Annotation conventions: 7 pages
   http://geneontology.org/page/go-annotation-conventions
Complex rule sets, requiring intensive training

Annotation experiments in ASSESS-CT

Annotation experiments of clinical narratives in ASSESS-CT
- ASSESS-CT: EU support action on the fitness of SNOMED CT as an EU core reference terminology
- Domain experts annotate 60 samples of clinical documents with SNOMED CT
- 1/3 of samples annotated twice
- Support: webinars, annotation guidelines
http://assess-ct.eu/fileadmin/assess_ct/final_brochure/assessct_final_brochure.pdf

Sample text (medication list):
Nitroglycerin pump spray as required
Amantadine bds
Allopurinol 300 tablet every other day (last dose on 20091130)
Mefenamic acid 500 mg up to 3x daily for pain in conjunction with simultaneous administration of a drug to protect the stomach e. g. Pantoprazole 40mg.
Torasemide bds
Melperone 50 mg p. m.

Sample text:
7 Intact teeth are in the mouth. Fractures are visible on the medians of Mandible and Maxilla the fragments are dislocated. Normal mucous membranes in mouth pharynx and on the larynx. Hyoid and thyroid cartilage are intact. Fragmental fractures of the two upper vertebrae of the cervical spine. Otherwise the cervical spine is intact. Oesophagus as well as trachea are torn at the lower end of the neck.

SNOMED CT code groups shown beside the sample texts (codes within a group separated by semicolons):
387404004;385074009;225761000
372763006;229799001
387135004;385055001;225760004
387185008;258684004;229798009;22253000
79970003;416118004;373517009;69695003
395821003;258684004
318034005;229799001
442519006;258684004;422133006
11163003;245543004;123851003
263172003;263156006;260528009
123735002
17621005;33044003;71248005
21387005;52940008;11163003
13321001;207984009;207983003
122494005;11163003
262793000;282459005;261122009;123958008

Principal quantitative results (English)
- Concept coverage [95% CI], SNOMED CT, text annotations, English: .86 [.82-.88]
- Term coverage [95% CI], SNOMED CT, text annotations, English: .68 [.64-.70]
- Inter-annotator agreement, Krippendorff's Alpha [95% CI], SNOMED CT, text annotations: .37 [.33-.41]
(similar results with alternative annotation task, using non-SNOMED UMLS extract)
Krippendorff, Klaus (2013). Content analysis: An introduction to its methodology, 3rd edition. Thousand Oaks, CA: Sage.

Agreement map: SNOMED annotations
[Figure legend: green = agreement; yellow = only annotated by one coder; red = disagreement; white = no annotations.]

Systematic error analysis
- Creation of gold standard for SNOMED CT
- 20 English text samples annotated twice
- 208 NPs
- Analysis of English SNOMED CT annotations by two additional terminology experts
- Consensus finding, according to pre-established annotation guidelines
- Inspection, analysis and classification of text annotation disagreements
- Presentation of some disagreement cases for SNOMED CT
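The inter-annotator agreement reported above is Krippendorff's alpha. As a minimal sketch of how such a value can be computed for nominal labels, the function below builds the coincidence matrix and returns alpha = 1 - D_o / D_e. It assumes each annotated unit carries exactly one label per annotator (for example a whole code group treated as one string); how ASSESS-CT defined units and handled multi-code annotations is not stated on the slides, and the sample data here is a toy example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels that the
    annotators assigned to one unit (None marks a missing annotation).
    """
    coincidences = Counter()
    for labels in units:
        labels = [label for label in labels if label is not None]
        m = len(labels)
        if m < 2:
            continue  # a unit labelled by fewer than two annotators is not pairable
        # every ordered pair of labels from different annotators, weighted 1/(m-1)
        for i, j in permutations(range(m), 2):
            coincidences[(labels[i], labels[j])] += 1.0 / (m - 1)

    totals = Counter()
    for (label, _other), weight in coincidences.items():
        totals[label] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one unit labelled by two annotators")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Toy data: three text chunks, two annotators, code groups as labels.
    chunks = [
        ["387404004;385074009", "387404004;385074009"],
        ["372763006", None],
        ["11163003", "123735002"],
    ]
    print(round(krippendorff_alpha_nominal(chunks), 3))
```

Treating a whole code group as a single nominal label is the strictest choice: two annotators who share two of three codes still count as disagreeing. A set-based distance would be more lenient but requires a different alpha variant.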

Reasons for disagreement

Human issues

Lack of domain knowledge / carelessness
- Tokens: "IV"
- Annotator #1: 'Structure of abductor hallucis muscle (body structure)'
- Annotator #2: 'Abducens nerve structure (body structure)'
- Gold standard: 'Abducens nerve structure (body structure)'

Retrieval error (synonym not recognised)
- Tokens: "Glibenclamide"
- Annotator #1: 'Glyburide (substance)'
- Annotator #2: (no annotation)
- Gold standard: 'Glyburide (substance)'

Ontology issues (I)

Logical polysemy ("dot categories")*
- Tokens: "Lymphoma"
- Annotator #1: 'Malignant lymphoma (disorder)'
- Annotator #2: 'Malignant lymphoma category (morphologic abnormality)'
- Gold standard: 'Malignant lymphoma (disorder)'
*Alexandra Arapinis, Laure Vieu: A plea for complex categories in ontologies. Applied Ontology 10(3-4): 285-296 (2015)

Ontological issues (II)

Incomplete definitions
- Tokens: "Motor: normal bulk and tone"
- Annotator #1: 'Skeletal muscle structure (body structure)', 'Normal (qualifier value)'
- Annotator #2: 'Muscle finding (finding)', 'Normal (qualifier value)'
- Gold standard: 'Skeletal muscle normal (finding)'

- Tokens: "Former smoker"
- Annotator #1: 'In the past (qualifier value)', 'Smoker (finding)'
- Annotator #2: 'History of (contextual qualifier) (qualifier value)', 'Smoker (finding)'
- Gold standard: 'Ex-smoker (finding)'

Ontological issues (III)

Navigational concepts
- Tokens: "palpebral fissure"
- Annotator #1: 'Finding of measures of palpebral fissure (finding)'
- Annotator #2: 'Structure of palpebral fissure (body structure)'
- Gold standard: 'Measure of palpebral fissure (observable entity)'

Fuzzy, undefined qualifiers
- Tokens: "Significant bleeding"
- Annotator #1: 'Significant (qualifier value)', 'Bleeding (finding)'
- Annotator #2: 'Severe (severity modifier) (qualifier value)', 'Bleeding (finding)'
- Gold standard: 'Moderate (severity modifier) (qualifier value)', 'Bleeding (finding)'

Interface term (synonym) issues
- Tokens: "Blood extravasation"
- Annotator #1: 'Blood (substance)', 'Extravasation (morphologic abnormality)'
- Annotator #2: 'Hemorrhage (morphologic abnormality)'
- Gold standard: 'Hemorrhage (morphologic abnormality)'
- Synonym callout on the slide: "extravasation of blood"

- Tokens: "anxious"
- Annotator #1: 'Anxiety (finding)'
- Annotator #2: 'Worried (finding)'
- Gold standard: 'Anxiety (finding)'
- Synonym callout on the slide: "anxious cognitions"

Prevention and remediation of annotation disagreements

Rationales:
- More principled SNOMED CT coding of EHR content
- More principled binding of SNOMED CT codes to clinical models
- Consistent manual annotations for training corpora and reference standards
- Improvement of performance of NLP-based annotations

Preventive measures

Prevention: annotation processes
Training with continuous feedback
- Early detection of inter-annotator disagreement triggers guideline enforcement / revision

Tooling
- Optimised concept retrieval (fuzzy, substring, synonyms)
- Guideline enforcement by appropriate tools
- Postcoordination support (complex syntactic expressions instead of simple concept grouping)

Prevention: improve SNOMED CT quality
Fill gaps
- Add missing equivalence axioms
- Self-explaining labels, text definitions where necessary
- Preference rules to manage polysemy
Strengthen ontological foundations
- Upper-level ontology alignment
- Better distinction between domain entities and information entities
- Overhaul problematic subhierarchies, especially qualifiers

Prevention: improve content maintenance
Data-driven terminology maintenance
- Harvest notorious disagreements between annotations from clinical datasets
- Detect imbalances by analysing concept frequency and co-occurrence between comparable institutions
Community processes
- Crowdsourcing of interface terms by languages, dialects, specialties, user groups (ASSESS-CT: interface terminologies to be maintained separately from reference terminologies)

Remediation of annotation disagreements

Exploit ontological dependencies / implications
- Concept A: 'Mast cell neoplasm (disorder)'; Concept B: 'Mast cell neoplasm (morphologic abnormality)'; Dependency: A subclassOf AssociatedMorphology some B
- Concept A: 'Isosorbide dinitrate (product)'; Concept B: 'Isosorbide dinitrate (substance)'; Dependency: A subclassOf HasActiveIngredient some B
- Concept A: 'Palpation (procedure)'; Concept B: 'Palpation - action (qualifier value)'; Dependency: A subclassOf Method some B
- Concept A: 'Blood pressure taking (procedure)'; Concept B: 'Blood pressure (observable entity)'; Dependency: A subclassOf hasOutcome some B
- Concept A: 'Increased size (finding)'; Concept B: 'Increased (qualifier value)'; Dependency: A subclassOf isBearerOf some B
- Concept A: 'Finding of heart rate (finding)'; Concept B: 'Heart rate (observable entity)'; Dependency: A subclassOf Interprets some B

Experiment
Gold standard expansion:
- Step 1: include concepts linked by attributive relations: A subclassOf Rel some B
- Step 2: include additional first-level taxonomic relations: A subclassOf B
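The two expansion steps can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the experiment: the relation tables are hand-written toy dictionaries (a real run would read them from a SNOMED CT release), step 2 is interpreted as adding the direct parents of each gold concept, and the lenient F measure is a hypothetical scoring rule in which an annotation counts as correct if it matches a gold concept or one of its expansions.

```python
def expand_gold(gold, attribute_targets, parents, step):
    """Map each gold concept to the set of concepts accepted as a match.

    step 0: the gold concept only
    step 1: plus concepts B with  A subclassOf Rel some B
    step 2: plus direct parents B with  A subclassOf B  (assumed reading)
    """
    accepted = {}
    for concept in gold:
        matches = {concept}
        if step >= 1:
            matches |= set(attribute_targets.get(concept, ()))
        if step >= 2:
            matches |= set(parents.get(concept, ()))
        accepted[concept] = matches
    return accepted


def lenient_f_measure(annotated, accepted):
    """Hypothetical lenient F measure against an expanded gold standard."""
    annotated = set(annotated)
    all_accepted = set().union(*accepted.values())
    true_positives = len(annotated & all_accepted)
    found = sum(1 for matches in accepted.values() if annotated & matches)
    precision = true_positives / len(annotated) if annotated else 0.0
    recall = found / len(accepted) if accepted else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy relation table built from two dependency examples on the slide.
ATTRIBUTE_TARGETS = {
    "Mast cell neoplasm (disorder)": ["Mast cell neoplasm (morphologic abnormality)"],
    "Palpation (procedure)": ["Palpation - action (qualifier value)"],
}
PARENTS = {}  # left empty here; direct parents would come from the terminology

gold = ["Mast cell neoplasm (disorder)", "Palpation (procedure)"]
annotated = ["Mast cell neoplasm (morphologic abnormality)", "Palpation (procedure)"]

for step in (0, 1, 2):
    accepted = expand_gold(gold, ATTRIBUTE_TARGETS, PARENTS, step)
    print(step, lenient_f_measure(annotated, accepted))
```

The sketch also shows why expansion can only help when the linking relation exists in the terminology: a gold concept with no entry in the relation tables is matched exactly as before.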

Language of text sample: English
F measure by gold standard expansion:
- no expansion: 0.28
- expansion step 1: 0.28
- expansion step 2: 0.29
Only insignificant improvement
- possibly due to missing relations in SNOMED CT (see "former smoker" and "skeletal muscle normal" examples)
- just a side issue
- requires more investigation

Conclusions

Poor agreement hampers SNOMED CT use:
- Clinical decision support, cohort building, content retrieval, summarisation, analytics
- (but this is not specific to SNOMED CT, cf. ASSESS-CT)
Prevention & Remediation:
- Education, tooling, guidelines
- Large-scale SNOMED CT content and structure improvement
- High coverage local interface terminologies, representing real language of clinicians
Outlook
- "Learning systems" for improvement of terminology content / structure / tooling
- "Clinical big data": pooling of non-re-identifiable annotations from multiple institutions
- Community efforts for interface terminology creation and maintenance
- Post processing of SNOMED CT annotations: stream of codes -> text knowledge graph*
*Martínez-Costa, Kalra, Schulz. Semantic enrichment of clinical models towards semantic interoperability. JAMIA 2015 May;22(3):565-76

Thanks for your attention
Slides will be made accessible at purl.org/steschu

Acknowledgements: ASSESS CT team: José Antonio Miñarro-Giménez, Catalina Martínez-Costa, Daniel Karlsson, Kirstine Rosenbeck Gøeg, Kornél Markó, Benny Van Bruwaene, Ronald Cornet, Marie-Christine Jaulent, Päivi Hämäläinen, Heike Dewenter,

Reza Fathollah Nejad, Sylvia Thun, Veli Stroetmann, Dipak Kalra
Contact: [email protected]

Ecosystem of semantic assets
[Diagram: Process Models, Information Models, Guideline Models and Terminologies.]

Reference Terminologies
- describe and standardize, in a neutral, language-independent sense:
  - the meaning of domain terms
  - the properties of the objects that these terms denote
- Representational units are commonly called "concepts"
- RTs enhanced by formal descriptions = "Ontologies"

Aggregation Terminologies (Classifications)
- Systems of non-overlapping classes in single hierarchies, for data aggregation and ordering
- aka classifications, e.g. the WHO classifications
- Typically used for health statistics and reimbursement

Reference and aggregation terminologies represent / organize the domain
- They are not primarily representations of language
- They use human language labels as a means to univocally describe the entities they denote, independently of the language actually used in human communication

User Interface Terminology (language specific)
- Collections of terms used in written and oral communication within a group of users
- Terms often ambiguous
- Entries in user interface terminologies to be further specified by language, dialect, time, (sub)domain, user group

Example: User Interface Terminology (e.g. Portuguese) and Reference Terminology
- [chemistry] "Ca", "Cálcio" -> 5540006 | Calcium (substance) |
- [oncology] "Ca", "Câncer", "Carcinoma" -> 68453008 | Carcinoma (morphologic abnormality) |

[Diagram: a Core Reference Terminology surrounded by reference terminologies RT1 to RT4 and aggregation terminologies AT1 to AT4, linked to the User Interface Terminology and to Information Models, Process Models and Guideline Models.]
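The Portuguese "Ca" example shows why interface terms need context before they can be bound to a reference terminology concept. A minimal sketch, assuming a hypothetical in-memory table keyed by language, domain and term (the two concept identifiers and labels are taken from the slide; the table layout and function names are illustrative only):

```python
# Reference terminology: language-independent concepts (identifiers from the slide).
REFERENCE_CONCEPTS = {
    "5540006": "Calcium (substance)",
    "68453008": "Carcinoma (morphologic abnormality)",
}

# User interface terminology: (language, domain, term) -> concept identifier.
INTERFACE_TERMS = {
    ("pt", "chemistry", "ca"): "5540006",
    ("pt", "chemistry", "cálcio"): "5540006",
    ("pt", "oncology", "ca"): "68453008",
    ("pt", "oncology", "câncer"): "68453008",
    ("pt", "oncology", "carcinoma"): "68453008",
}


def resolve(term, language, domain):
    """Return the concept identifier for a term in a given context, or None."""
    return INTERFACE_TERMS.get((language, domain, term.casefold()))


def candidates(term, language):
    """Return every concept the term may denote when the domain is unknown."""
    key = term.casefold()
    return {
        concept
        for (lang, _domain, entry), concept in INTERFACE_TERMS.items()
        if lang == language and entry == key
    }


if __name__ == "__main__":
    for domain in ("chemistry", "oncology"):
        concept = resolve("Ca", "pt", domain)
        print(domain, concept, REFERENCE_CONCEPTS[concept])
    print(sorted(candidates("Ca", "pt")))
```

Keeping the ambiguous surface forms in a separate table, rather than as synonyms inside the reference terminology, matches the separation between interface and reference terminologies described on the slides.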

MUG-GIT: Creation of German Interface Terminology for SNOMED CT
[Workflow diagram. Recoverable elements:
- Inputs: all SCT descriptions (EN); clinical corpus (DE); reference corpus (DE)
- Filter concepts with identical terms across translations: translatable vs. non-translatable SCT descriptions (EN)
- Chunker and POS tags: n-grams (EN), n-grams (DE)
- Rules and rule execution: char translation rule acquisition, token translations, untranslated tokens, new token translations
- Human curation: correct most frequent mistranslations, remove wrong translations, check POS tags, normalise adjectives, add synonyms; result: curated n-gram translations (DE)
- Phrase generation rules and term reassembling heuristics: raw full terms (DE)
- Human validation dependent on use cases: e.g. input for official translation; e.g. starting point for crowdsourcing process for interface term generation; lexicon for NLP approaches]

ngram core vocabulary
Each entry: English n-gram; number of tokens; number shown on the slide (column header not preserved); curated German translations with POS tags
- vaginal; 1; 1478; vaginales|JJ, Scheiden-
- fluoroscopic guidance; 2; 1477; Durchleuchtungskontrolle|NN|F
- disc; 1; 1476; Scheibe|NN|F
- lower limb; 2; 1473; unteres|JJ Extremität|NN|F, Bein|NN|N
- brain; 1; 1468; Gehirn|NN|N, Hirn|NN|N, Encephalon|NN|N
- preparation; 1; 1464; Zubereitung|NN|F, Aufbereitung|NN|F, Präparation|NN|F
- method; 1; 1463; Verfahren|NN|N, Methode|NN|F
- of bone; 2; 1462; des Knochens, _Knochen_
- Red; 1; 1455; rotes|JJ
- Monitoring; 1; 1453; Überwachung|NN|F, Monitoring|NN|N
- Computed; 1; 1453; berechnetes|JJ, Computer-
- phalanx; 1; 1449; Phalanx|NN|F
- subsp.; 1; 1449; (no translation shown)
- anastomosis; 1; 1447; Anastomose|NN|F, Anastomosierung|NN|F
- vessel; 1; 1446; Blutgefäß|NN|N, Gefäß|NN|N
- Computed tomography; 2; 1443; Computertomographie|NN|F
- uterus; 1; 1436; Uterus|NN|M, Gebärmutter|NN|F
- difficulty; 1; 1432; Schwierigkeit|NN|F
- elbow; 1; 1429; Ellbogen|NN|M, Cubitus|NN|M, Ellbogengelenk|NN|N
- high; 1; 1429; hohes|JJ
- food; 1; 1423; Lebensmittel|NN|N, Speise|NN|F, Nahrungsmittel|NN|N
- Observation; 1; 1423; Beobachtung|NN|F
- using fluoroscopic; 2; 1422; (no translation shown)
- unable; 1; 1421; unfähiges|JJ
- Peripheral; 1; 1419; peripheres|JJ
- unable to; 2; 1418; unfähig zu
- Vascular; 1; 1417; vaskuläres|JJ, Gefäß-
- using fluoroscopic guidance; 3; 1416; mit Durchleuchtungskontrolle
- Benign neoplasm; 2; 1415; gutartiges|JJ Neubildung|NN|F, gutartiges|JJ Neoplasie|NN|F, benignes|JJ Neoplasie|NN|F

Machine-generated interface terms
Each entry: term ID; SNOMED CT concept; generated German term
- 20170315_240011_002; 126952004 Neoplasm of brain; Gehirnneubildung
- 20170315_240011_003; 126952004 Neoplasm of brain; Neubildung des Hirns
- 20170315_240011_004; 126952004 Neoplasm of brain; Hirnneubildung
- 20170315_240011_005; 126952004 Neoplasm of brain; Neoplasie des Gehirns
- 20170315_240011_006; 126952004 Neoplasm of brain; Gehirnneoplasie
- 20170315_240011_007; 126952004 Neoplasm of brain; Neoplasie des Hirns
- 20170315_240011_008; 126952004 Neoplasm of brain; Hirnneoplasie
- 20170315_240011_009; 126952004 Neoplasm of brain; Neoplasma des Gehirns
- 20170315_240011_010; 126952004 Neoplasm of brain; Gehirnneoplasma
- 20170315_240011_011; 126952004 Neoplasm of brain; Neoplasma des Hirns
- 20170315_240011_012; 126952004 Neoplasm of brain; Hirnneoplasma
- 20170315_241010_001; 126953009 Neoplasm of cerebrum; Neubildung des Großhirns
- 20170315_241010_002; 126953009 Neoplasm of cerebrum; Neoplasie des Großhirns
- 20170315_241010_003; 126953009 Neoplasm of cerebrum; Neoplasma des Großhirns
- 20170315_242015_001; 126954003 Neoplasm of frontal lobe; Neubildung des Frontallappens
- 20170315_242015_002; 126954003 Neoplasm of frontal lobe; Neubildung des Lobus frontalis
- 20170315_242015_003; 126954003 Neoplasm of frontal lobe; Neoplasie des Frontallappens
- 20170315_242015_004; 126954003 Neoplasm of frontal lobe; Neoplasie des Lobus frontalis
- 20170315_242015_005; 126954003 Neoplasm of frontal lobe; Neoplasma des Frontallappens
- 20170315_242015_006; 126954003 Neoplasm of frontal lobe; Neoplasma des Lobus frontalis
- 20170315_243013_001; 126955002 Neoplasm of temporal lobe; Neubildung des Temporallappens
- 20170315_243013_002; 126955002 Neoplasm of temporal lobe; Neubildung des Lobus temporalis
- 20170315_243013_003; 126955002 Neoplasm of temporal lobe; Neoplasie des Temporallappens
- 20170315_243013_004; 126955002 Neoplasm of temporal lobe; Neoplasie des Lobus temporalis
- 20170315_243013_005; 126955002 Neoplasm of temporal lobe; Neoplasma des Temporallappens
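The listing above suggests that full terms are assembled by combining the translation variants of each n-gram. A minimal sketch of that idea, assuming two hand-written patterns (genitive phrase and noun compound) and hypothetical synonym lists; the actual MUG-GIT phrase generation rules and term reassembling heuristics are not given on the slides:

```python
from itertools import product

# Translation variants of the head noun "neoplasm" (taken from the listing).
HEAD_VARIANTS = ["Neubildung", "Neoplasie", "Neoplasma"]

# Variants of the site "brain": (compound stem, genitive phrase).
# The genitive forms are written out by hand in this sketch.
SITE_VARIANTS = [("Gehirn", "des Gehirns"), ("Hirn", "des Hirns")]


def generate_terms(heads, sites):
    """Combine every head variant with every site variant using two patterns."""
    terms = []
    for head, (stem, genitive) in product(heads, sites):
        terms.append(f"{head} {genitive}")      # e.g. "Neoplasie des Hirns"
        terms.append(f"{stem}{head.lower()}")   # e.g. "Hirnneoplasie"
    return terms


if __name__ == "__main__":
    for term in generate_terms(HEAD_VARIANTS, SITE_VARIANTS):
        print(term)
```

With three head variants, two site variants and two patterns this yields twelve candidates: the eleven terms listed for 126952004 plus "Neubildung des Gehirns", which does not appear in the excerpt. Generated candidates of this kind would still need the human validation step named in the workflow.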
