Case Study of Named Entity Recognition in Biomedicine

How well can named entity recognition (NER) work in biomedicine when there is no labelled data to train on?











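In this article we take a closer look at QuickUMLS. QuickUMLS [1] is a tool for extracting medical concepts (for example, diseases, symptoms or drugs) from text. It is based on the Unified Medical Language System (UMLS). As the title of [1] says, it is a fast, unsupervised approach, so it needs no labelled training data. We evaluate QuickUMLS on the MedMentions dataset [2].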





Figure 1. A schematic description of how QuickUMLS works. The inputs are a string and a UMLS database converted into a simstring database. The model returns the best matches, together with their concept identifiers and semantic types.

NER in biomedicine

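Named entity recognition (NER) is the task of finding mentions of entities in text and assigning each mention a type. In the general domain, typical entity types are people, organisations and places.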





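NER in biomedicine is harder. The same concept can be written in many ways, for example as an abbreviation (of "hospital") or with different words altogether (for "alcohol"). At the same time, a single word such as "alcohol" can refer to different concepts depending on the context. This makes biomedical NER a difficult task, and one we work on at Slimmer AI.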





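To handle this variation, we need a resource that groups all the ways of referring to something under a single concept. In biomedicine, one such resource is the Unified Medical Language System (UMLS). In UMLS, different names for the same thing (for instance, the different words for "alcohol") are linked to one concept.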





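Each concept in UMLS has three parts. The first is a Concept Unique Identifier (CUI). The second is a set of names that refer to the concept. The third is one or more semantic types (STY) that describe what kind of concept it is, for example T046 (Pathologic Function) or T101 (Patient or Disabled Group). UMLS is very large and brings together many source vocabularies. We used UMLS release 2020AB.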
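To make this structure concrete, here is a minimal Python sketch of a name-to-concept index. It is not how UMLS or QuickUMLS store data internally. The class `Concept` and the function `lookup` are hypothetical, and the entries are copied from the QuickUMLS output shown later in this article. The point is that different surface forms, such as "hemorrhage" and "Hemorrhages", lead back to the same CUI.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Simplified UMLS concept: one CUI, several names, one or more semantic types."""
    cui: str
    names: list[str]
    semtypes: set[str] = field(default_factory=set)


# Illustrative entries copied from the QuickUMLS output later in this article
# (T046 = Pathologic Function, T101 = Patient or Disabled Group).
CONCEPTS = [
    Concept("C0019080", ["hemorrhage", "Hemorrhages"], {"T046"}),
    Concept("C0030705", ["patient", "*^patient"], {"T101"}),
]

# Index every name, so that each synonym leads back to the same concept.
NAME_TO_CONCEPT = {name: c for c in CONCEPTS for name in c.names}


def lookup(name: str) -> Concept | None:
    """Exact-match lookup; returns None for names that are not in the index."""
    return NAME_TO_CONCEPT.get(name)


hit = lookup("Hemorrhages")
print(hit.cui, hit.semtypes)  # C0019080 {'T046'}
```

Exact lookup like this fails on any spelling that is not in the index. That limitation is what the approximate matching in QuickUMLS addresses.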





MedMentions

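For evaluation we used MedMentions. It contains 4,392 abstracts from papers released on PubMed in 2016, with about 352K mentions annotated with UMLS concepts (each linked to a CUI). About 34K unique concepts are annotated in total, roughly 1% of UMLS. So even this large corpus covers only a small part of UMLS.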





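MedMentions links each mention to a CUI. Here, however, we only look at semantic types, so we treat the task as NER rather than full concept linking. UMLS has 127 semantic types, and not all of them are equally useful. MedMentions therefore also provides a subset, st21pv ("21 semantic types from preferred vocabularies"), in which the annotations are restricted to 21 semantic types. This is the subset we use.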





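The MedMentions paper reports an F1 score of 45.3 [2]. Models based on BlueBERT [3] and BioBERT [4] reach an F1 score of 56.3 [5]. These supervised models are our reference point on MedMentions.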





QuickUMLS: NER without training data

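BERT models and QuickUMLS work very differently, so why compare them at all? The key difference is the need for training data. BERT models are fine-tuned on labelled data, whereas QuickUMLS is unsupervised and needs none. In practice, labelled data for a specific task is often not available. In that situation there are two options: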





  1. Label data yourself and train a supervised model. This takes a lot of time and effort.





  2. Use a method that needs no labelled data for the task at all. This is called zero-shot learning.





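Zero-shot learning (ZSL) is a setting in which a model has to handle a task, or classes, for which it saw no labelled examples during training.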





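We test this approach on MedMentions. MedMentions is annotated with UMLS concepts, and QuickUMLS draws its concepts from UMLS as well. We can therefore run QuickUMLS on MedMentions directly, without using any of its training data.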





How QuickUMLS works

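QuickUMLS works in several steps. First, the text is tokenized with spaCy. Next, candidate n-grams are built from the tokens, and n-grams that are unlikely to be concepts are filtered out. For each remaining n-gram, QuickUMLS looks for UMLS terms whose similarity to it is above a threshold [1]. UMLS is very large, so comparing every n-gram with every term would be far too slow. Instead, QuickUMLS uses simstring [6], an algorithm for fast approximate dictionary matching. To use QuickUMLS, you need a local copy of UMLS, from which QuickUMLS builds its simstring database. For example, here is what QuickUMLS returns for "patient" and "hemorrhage" with the (default) similarity threshold of 0.7: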





patient:





{'term': 'Inpatient', 'cui': 'C1548438', 'similarity': 0.71, 'semtypes': {'T078'}, 'preferred': 1},
{'term': 'Inpatient', 'cui': 'C1549404', 'similarity': 0.71, 'semtypes': {'T078'}, 'preferred': 1},
{'term': 'Inpatient', 'cui': 'C1555324', 'similarity': 0.71, 'semtypes': {'T058'}, 'preferred': 1},
{'term': '*^patient', 'cui': 'C0030705', 'similarity': 0.71, 'semtypes': {'T101'}, 'preferred': 1},
{'term': 'patient', 'cui': 'C0030705', 'similarity': 1.0, 'semtypes': {'T101'}, 'preferred': 0},
{'term': 'inpatient', 'cui': 'C0021562', 'similarity': 0.71, 'semtypes': {'T101'}, 'preferred': 0}



hemorrhage:





{'term': 'No hemorrhage', 'cui': 'C1861265', 'similarity': 0.72, 'semtypes': {'T033'}, 'preferred': 1},
{'term': 'hemorrhagin', 'cui': 'C0121419', 'similarity': 0.7, 'semtypes': {'T116', 'T126'}, 'preferred': 1},
{'term': 'hemorrhagic', 'cui': 'C0333275', 'similarity': 0.7, 'semtypes': {'T080'}, 'preferred': 1},
{'term': 'hemorrhage', 'cui': 'C0019080', 'similarity': 1.0, 'semtypes': {'T046'}, 'preferred': 0},
{'term': 'GI hemorrhage', 'cui': 'C0017181', 'similarity': 0.72, 'semtypes': {'T046'}, 'preferred': 0},
{'term': 'Hemorrhages', 'cui': 'C0019080', 'similarity': 0.7, 'semtypes': {'T046'}, 'preferred': 0}



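"patient" is matched exactly (similarity 1.0) to concept C0030705, which has semantic type T101 (Patient or Disabled Group). However, several other concepts, such as "Inpatient", also pass the threshold. For "hemorrhage", the candidates even include "No hemorrhage", which means the opposite.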
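Where do scores such as 0.71 and 0.7 come from? QuickUMLS delegates candidate retrieval to simstring. The sketch below only recomputes a similarity score for a pair of strings. It assumes the score is the Jaccard similarity of case-sensitive character trigrams without padding. Under that assumption, it reproduces the values in the listings above: 5/7 ≈ 0.714 for "Inpatient", exactly 0.7 for "Hemorrhages" and 8/11 ≈ 0.727 for "No hemorrhage" (listed as 0.72). The function names are our own.

```python
def char_trigrams(s: str) -> set[str]:
    """Character trigrams of s, case-sensitive and without padding."""
    if len(s) < 3:
        return {s} if s else set()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(x: str, y: str) -> float:
    """|X & Y| / |X | Y| over the trigram sets of x and y."""
    a, b = char_trigrams(x), char_trigrams(y)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


pairs = [
    ("patient", "Inpatient"),         # 5 shared trigrams out of 7 in total -> 0.714
    ("hemorrhage", "Hemorrhages"),    # "Hem" != "hem", so 7 out of 10 -> 0.7
    ("hemorrhage", "No hemorrhage"),  # 8 out of 11 -> 0.727
    ("hemorrhage", "hemorrhage"),     # identical strings -> 1.0
]
for query, term in pairs:
    print(f"{query!r} vs {term!r}: {jaccard(query, term):.3f}")
```

The "Hemorrhages" case shows why case matters here. The capitalised first trigram does not match, so a plural that differs by only two characters scores the same 0.7 as the unrelated protein name "hemorrhagin".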





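We first ran QuickUMLS with its default settings. This unmodified version is our baseline model. We evaluated it with seqeval and compared it with the BERT results from [5] (Table 1).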
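seqeval scores entities rather than individual tokens. A prediction only counts if its whole span and its type match the gold annotation. The sketch below shows the same exact-match idea on (start, end, semantic type) triples. It is a simplified stand-in, not seqeval's API: seqeval works on BIO tag sequences and derives the spans itself. The spans in the example are made up.

```python
from __future__ import annotations


def span_prf(gold: set[tuple[int, int, str]],
             pred: set[tuple[int, int, str]]) -> tuple[float, float, float]:
    """Entity-level precision, recall and F1: a prediction counts only if its
    start, end and type all match a gold span exactly."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical toy spans: (start offset, end offset, UMLS semantic type).
gold = {(0, 7, "T101"), (20, 30, "T046"), (40, 47, "T103")}
pred = {(0, 7, "T101"), (20, 30, "T033"), (50, 55, "T046")}
print(span_prf(gold, pred))
```

Here only one of the three predictions matches, so precision, recall and F1 are all 1/3. The second prediction has the right span but the wrong semantic type, and it earns no credit. This is one reason string matchers that ignore context score low on this metric.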





╔═══╦══════╦═══════╗
β•‘   β•‘ BERT β•‘ QUMLS β•‘
╠═══╬══════╬═══════╣
β•‘ P β•‘  .53 β•‘   .27 β•‘
β•‘ R β•‘  .58 β•‘   .36 β•‘
β•‘ F β•‘  .56 β•‘   .31 β•‘
β•šβ•β•β•β•©β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•
Table 1. Precision (P), recall (R) and F1 (F) of BERT and QuickUMLS (QUMLS) on MedMentions



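How good is this? QuickUMLS scores well below BERT (F1 of .31 against .56), but it did not use a single labelled example. Let's see whether we can narrow the gap.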





Improving QuickUMLS

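There are several ways to improve QuickUMLS. The first is the tokenizer. QuickUMLS uses spaCy to split text into tokens, and by default it loads en_core_web_sm, a model trained on general-domain English rather than biomedical text. We therefore replaced it with the en_core_sci_sm model from scispacy [7], which is trained on biomedical data. The results are in Table 2.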
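Why does the tokenizer matter? QuickUMLS only ever looks up strings that are built from whole tokens, so the tokenizer decides which candidates exist in the first place. The minimal sketch below generates contiguous token n-grams. It joins them with spaces and caps their length at a `max_len` we chose for illustration. It leaves out the candidate filtering that QuickUMLS applies. The two tokenizations are hypothetical and are not actual spaCy or scispacy output.

```python
from __future__ import annotations


def candidate_ngrams(tokens: list[str], max_len: int = 3) -> list[str]:
    """All contiguous token n-grams of up to max_len tokens, joined by spaces."""
    candidates = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            candidates.append(" ".join(tokens[start:end]))
    return candidates


# Two hypothetical tokenizations of the same phrase (not real spaCy output).
coarse = ["IL-2-dependent", "cells"]
fine = ["IL-2", "-", "dependent", "cells"]

print(candidate_ngrams(coarse))  # ['IL-2-dependent', 'IL-2-dependent cells', 'cells']
print("IL-2" in candidate_ngrams(fine))  # True
```

If the tokenizer keeps a compound as a single token, no similarity threshold can recover a concept that is hidden inside it.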





╔═══╦══════╦═══════╦═════════╗
β•‘   β•‘ BERT β•‘ QUMLS β•‘ + Spacy β•‘
╠═══╬══════╬═══════╬═════════╣
β•‘ P β•‘  .53 β•‘   .27 β•‘     .29 β•‘
β•‘ R β•‘  .58 β•‘   .36 β•‘     .37 β•‘
β•‘ F β•‘  .56 β•‘   .31 β•‘     .32 β•‘
β•šβ•β•β•β•©β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•
Table 2. Results with the scispacy tokenizer (+ Spacy)



The gain is small: F1 rises from .31 to .32. QuickUMLS still makes many errors, so next we look at its parameters.





Tuning QuickUMLS

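By default, QuickUMLS uses a similarity threshold of 0.7. The similarity measure can also be changed, with the options "Jaccard", "cosine", "overlap" and "dice". We ran a grid search over both parameters. The best result came from a threshold of 0.99 combined with SimString's "Jaccard" measure (Table 3). This is a clear improvement, but QuickUMLS is still far behind BERT.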
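A grid search like this can be sketched as follows. The four measures use the standard set-similarity definitions given for SimString [6]. QuickUMLS's own implementation may differ in details. In the real experiment, the evaluation function would run QuickUMLS on MedMentions and return F1. Here it is replaced by a hypothetical toy dictionary matcher, and `grid_search`, `toy_accuracy`, `TERMS` and `TOY_GOLD` are our own names.

```python
from __future__ import annotations

import itertools
import math
from typing import Callable


def trigrams(s: str) -> set[str]:
    """Case-sensitive character trigrams without padding (an assumption, as before)."""
    return {s[i:i + 3] for i in range(len(s) - 2)} if len(s) >= 3 else {s}


# Standard set-similarity measures over feature sets X and Y, as defined for SimString [6].
MEASURES: dict[str, Callable[[set[str], set[str]], float]] = {
    "jaccard": lambda x, y: len(x & y) / len(x | y),
    "dice": lambda x, y: 2 * len(x & y) / (len(x) + len(y)),
    "cosine": lambda x, y: len(x & y) / math.sqrt(len(x) * len(y)),
    "overlap": lambda x, y: len(x & y) / min(len(x), len(y)),
}


def grid_search(evaluate: Callable[[float, str], float],
                thresholds: list[float]) -> tuple[float, str, float]:
    """Score every (threshold, measure) pair; return the best pair and its score."""
    results = [(t, m, evaluate(t, m)) for t, m in itertools.product(thresholds, MEASURES)]
    return max(results, key=lambda r: r[2])


# Hypothetical stand-in for "run QuickUMLS on MedMentions and compute F1":
# a two-term dictionary and four toy mentions with their expected CUIs.
TERMS = {"hemorrhage": "C0019080", "patient": "C0030705"}
TOY_GOLD = [("hemorrhage", "C0019080"), ("Hemorrhages", "C0019080"),
            ("patient", "C0030705"), ("Inpatient", None)]


def toy_accuracy(threshold: float, measure: str) -> float:
    sim = MEASURES[measure]
    correct = 0
    for mention, expected in TOY_GOLD:
        # Best-scoring dictionary term for this mention.
        score, cui = max((sim(trigrams(mention), trigrams(term)), cui)
                         for term, cui in TERMS.items())
        predicted = cui if score >= threshold else None
        correct += predicted == expected
    return correct / len(TOY_GOLD)


print(grid_search(toy_accuracy, [0.5, 0.7, 0.9, 0.99]))
```

Even this toy data shows the trade-off the grid search has to balance. Under every measure, "Inpatient" scores higher against "patient" than "Hemorrhages" does against "hemorrhage". As a result, any threshold that accepts the correct plural also accepts the wrong match. Note also that the overlap coefficient gives 1.0 to any string whose trigrams contain all of the query's trigrams, which makes it the most permissive of the four.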





╔═══╦══════╦═══════╦═════════╦════════╗
β•‘   β•‘ BERT β•‘ QUMLS β•‘ + Spacy β•‘ + Grid β•‘
╠═══╬══════╬═══════╬═════════╬════════╣
β•‘ P β•‘  .53 β•‘   .27 β•‘     .29 β•‘    .37 β•‘
β•‘ R β•‘  .58 β•‘   .36 β•‘     .37 β•‘    .37 β•‘
β•‘ F β•‘  .56 β•‘   .31 β•‘     .32 β•‘    .37 β•‘
β•šβ•β•β•β•©β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•
Table 3. Results after the grid search over threshold and similarity measure (+ Grid)



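Another source of errors is ambiguity. For a single span, QuickUMLS often returns several candidate concepts with the same similarity score; "alcohol" is one example. QuickUMLS itself has no good way to choose between such candidates.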





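One way to choose is to use prior knowledge about the semantic types: some types occur much more often than others. When candidates are tied, the more frequent type is the better bet. Adding such priors gives another small improvement (Table 4).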
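Priors can be combined with QuickUMLS output in many ways. The sketch below shows one simple option. It is our own assumption, not necessarily the scheme behind Table 4. QuickUMLS's similarity stays the primary criterion, and the relative frequency of a candidate's semantic type only breaks ties. The candidates are the three tied 0.7 matches for "hemorrhage" from the listing above. The label list used to estimate the priors is hypothetical; in practice it would come from annotated data. `estimate_priors` and `pick_with_priors` are our own names.

```python
from __future__ import annotations

from collections import Counter


def estimate_priors(semtype_labels: list[str]) -> dict[str, float]:
    """Relative frequency of each semantic type in a list of labels."""
    counts = Counter(semtype_labels)
    total = sum(counts.values())
    return {st: n / total for st, n in counts.items()}


def pick_with_priors(candidates: list[dict], priors: dict[str, float]) -> dict:
    """Prefer the highest similarity; break ties with the most probable semantic type."""
    def rank(c: dict) -> tuple[float, float]:
        prior = max((priors.get(st, 0.0) for st in c["semtypes"]), default=0.0)
        return c["similarity"], prior
    return max(candidates, key=rank)


# Tied candidates for "hemorrhage", copied from the QuickUMLS output above.
candidates = [
    {"term": "hemorrhagin", "cui": "C0121419", "similarity": 0.7, "semtypes": {"T116", "T126"}},
    {"term": "hemorrhagic", "cui": "C0333275", "similarity": 0.7, "semtypes": {"T080"}},
    {"term": "Hemorrhages", "cui": "C0019080", "similarity": 0.7, "semtypes": {"T046"}},
]

# Hypothetical labels; in practice these would come from annotated data.
priors = estimate_priors(["T046", "T046", "T046", "T080", "T116"])
print(pick_with_priors(candidates, priors)["term"])  # Hemorrhages
```

Because the prior only breaks ties, it never overrides a better string match. A variant that multiplies similarity by the prior would let a frequent type win over a slightly closer match, which may or may not be desirable.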





╔═══╦══════╦═══════╦═════════╦════════╦══════════╗
β•‘   β•‘ BERT β•‘ QUMLS β•‘ + Spacy β•‘ + Grid β•‘ + Priors β•‘
╠═══╬══════╬═══════╬═════════╬════════╬══════════╣
β•‘ P β•‘  .53 β•‘   .27 β•‘     .29 β•‘    .37 β•‘      .39 β•‘
β•‘ R β•‘  .58 β•‘   .36 β•‘     .37 β•‘    .37 β•‘      .39 β•‘
β•‘ F β•‘  .56 β•‘   .31 β•‘     .32 β•‘    .37 β•‘      .39 β•‘
β•šβ•β•β•β•©β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•β•
Table 4. Results with semantic type priors (+ Priors)



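Taken together, these changes raise the F1 score of QuickUMLS from .31 to .39. Note that the best threshold turned out to be 0.99, so the fuzzy matching that QuickUMLS is built around hardly plays a role anymore. Even so, QuickUMLS remains well below BERT (.56).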





Conclusion

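What do we take away from this? First, QuickUMLS is far behind supervised models: even after tuning, its F1 score (.39) is well below that of BERT (.56). Second, many of its errors come from ambiguity. A word such as "alcohol" can refer to more than one concept, and string matching alone cannot decide which one is meant.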





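Still, QuickUMLS has clear advantages: it is fast and needs no labelled data, so it can be applied to a new problem straight away. When no annotated data is available, it is a reasonable place to start.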





NER in biomedicine remains an active topic in our R&D work. QuickUMLS is available on GitHub.















[1] L. Soldaini and N. Goharian, QuickUMLS: a fast, unsupervised approach for medical concept extraction, (2016), MedIR workshop, SIGIR

[2] S. Mohan and D. Li, MedMentions: a large biomedical corpus annotated with UMLS concepts, (2019), arXiv preprint arXiv:1902.09476

[3] Y. Peng, Q. Chen and Z. Lu, An empirical study of multi-task learning on BERT for biomedical text mining, (2020), arXiv preprint arXiv:2005.02799

[4] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C.H. So and J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, (2020), Bioinformatics, 36(4)

[5] K.C. Fraser, I. Nejadgholi, B. De Bruijn, M. Li, A. LaPlante and K.Z.E. Abidine, Extracting UMLS concepts from medical text using general and domain-specific deep learning models, (2019), arXiv preprint arXiv:1910.01274

[6] N. Okazaki and J.I. Tsujii, Simple and efficient algorithm for approximate dictionary matching, (2010), Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)

[7] M. Neumann, D. King, I. Beltagy and W. Ammar, ScispaCy: fast and robust models for biomedical natural language processing, (2019), arXiv preprint arXiv:1902.07669




