: β β , , (NER)? , , "Machine Learning Deep Learning" .
. , , .
, QuickUMLS. QuickUMLS [1] β β (, , ) , (UMLS). . QuickUMLS . QuickUMLS MedMentions [2].
, NER
, , NER. NER (, , , . .) . , , , . , , " ", , β , - . , , "" β , "", , .
NER , , , , "." (hospital), " / " "/" (alcohol). , , , . , "alcohol" " alcohol" [ , , alcohol]. , , , , . NER . Slimmer AI, .
, , , , , . (UMLS), , . , "" "", . , "alcohol" .
UMLS (CUI), , (STY), , , , . , UMLS , , β . UMLS 2020AB, , 3 . , .
MedMentions
MedMentions. 4 392 ( ), Pubmed 2016 ; 352 K ( CUI) UMLS. 34 β 1 % UMLS. , UMLS , .
, MedMentions CUI, . , , UMLS . UMLS 127 , . MedMentions β st21pv, , , 21 .
45,3 F- [2]. , BlueBERT [3] BioBERT [4], 56,3 , [5]. , , , . , . MedMentions.
QuickUMLS:
BERT QuickUMLS , , . , QuickUMLS β , . , , , , . :
. , .
. , , . β zero-shot.
Zero-shot learning (ZSL) β , , , , .
, , MedMentions. , MedMentions UMLS, . , MedMentions , .
QuickUMLS
QuickUMLS . spacy. n-, , , -. , n-, . [1]. UMLS , , n-. , simstring [6]. QuickUMLS, , UMLS . , β β, ( ) 0,7, :
patient:
{"term": "Inpatient", "cui": "C1548438", "similarity": 0.71, "semtypes": {"T078"}, "preferred": 1},
{"term": "Inpatient", "cui": "C1549404", "similarity": 0.71, "semtypes": {"T078"}, "preferred": 1},
{"term": "Inpatient", "cui": "C1555324", "similarity": 0.71, "semtypes": {"T058"}, "preferred": 1},
{"term": "*^patient", "cui": "C0030705", "similarity": 0.71, "semtypes": {"T101"}, "preferred": 1},
{"term": "patient", "cui": "C0030705", "similarity": 1.0, "semtypes": {"T101"}, "preferred": 0},
{"term": "inpatient", "cui": "C0021562", "similarity": 0.71, "semtypes": {"T101"}, "preferred": 0}
hemmorhage:
{"term": "No hemorrhage", "cui": "C1861265", "similarity": 0.72, "semtypes": {"T033"}, "preferred": 1},
{"term": "hemorrhagin", "cui": "C0121419", "similarity": 0.7, "semtypes": {"T116", "T126"}, "preferred": 1},
{"term": "hemorrhagic", "cui": "C0333275", "similarity": 0.7, "semtypes": {"T080"}, "preferred": 1},
{"term": "hemorrhage", "cui": "C0019080", "similarity": 1.0, "semtypes": {"T046"}, "preferred": 0},
{"term": "GI hemorrhage", "cui": "C0017181", "similarity": 0.72, "semtypes": {"T046"}, "preferred": 0},
{"term": "Hemorrhages", "cui": "C0019080", "similarity": 0.7, "semtypes": {"T046"}, "preferred": 0}
, βpatientβ (T101) (C0030705). ββ , "No hemmorhage". , , .
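Some of the similarity values in the output above can be reproduced by hand. The sketch below is only an illustration, not the actual QuickUMLS or simstring implementation: it assumes unpadded, lowercased character trigrams and Jaccard similarity, and it uses a tiny hypothetical term list (`TOY_TERMS`) in place of UMLS. The names `char_ngrams`, `jaccard` and `approximate_match` are made up for this example.

```python
def char_ngrams(text, n=3):
    """Return the set of lowercased character n-grams of `text`."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return len(x & y) / len(x | y)


# Hypothetical stand-in for the UMLS term list.
TOY_TERMS = ["patient", "Inpatient", "hemorrhage", "hemorrhagin", "alcohol"]


def approximate_match(query, terms=TOY_TERMS, threshold=0.7):
    """Return (term, similarity) pairs whose similarity reaches the threshold."""
    scored = ((term, jaccard(query, term)) for term in terms)
    return [(term, sim) for term, sim in scored if sim >= threshold]


if __name__ == "__main__":
    for query in ("patient", "hemorrhage"):
        for term, sim in approximate_match(query):
            print(f"{query} -> {term}: {sim:.2f}")
```

Under these assumptions "patient" and "Inpatient" share 5 of 7 distinct trigrams, which rounds to the 0.71 listed above, and "hemorrhage" and "hemorrhagin" share 7 of 10, which gives 0.7.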
QuickUMLS , , 1, . () β (baseline model). seqeval , [5].
╔═══╦══════╦═══════╗
║   ║ BERT ║ QUMLS ║
╠═══╬══════╬═══════╣
║ P ║ .53  ║ .27   ║
║ R ║ .58  ║ .36   ║
║ F ║ .56  ║ .31   ║
╚═══╩══════╩═══════╝
1 β
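The table above reports precision (P), recall (R) and F-score (F). As a rough sketch of how entity-level scores of this kind can be computed, the snippet below compares predicted and gold entities by exact match. It is a pure-Python illustration, not the seqeval implementation, and it assumes an entity counts as correct only when its start, end and label all agree. The function name `prf` and the example spans are invented.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) triples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: three gold entities, four predictions.
gold = [(0, 7, "T101"), (12, 22, "T046"), (30, 37, "T078")]
predicted = [
    (0, 7, "T101"),    # correct
    (12, 22, "T033"),  # right span, wrong semantic type
    (30, 37, "T078"),  # correct
    (40, 47, "T046"),  # spurious
]

p, r, f = prf(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

A prediction with the right span but the wrong semantic type costs twice here: it counts as a false positive and leaves the gold entity unmatched.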
, ? , , . , .
QuickUMLS
QuickUMLS . -, , , QuickUMLS, spacy, . . en_core_web_sm. , , . spacy scispacy [7], en_core_sci_sm. - .
╔═══╦══════╦═══════╦═════════╗
║   ║ BERT ║ QUMLS ║ + Spacy ║
╠═══╬══════╬═══════╬═════════╣
║ P ║ .53  ║ .27   ║ .29     ║
║ R ║ .58  ║ .36   ║ .37     ║
║ F ║ .56  ║ .31   ║ .32     ║
╚═══╩══════╩═══════╩═════════╝
2 β scispacy
, . QuickUMLS , - . , ββ : , , , .
QuickUMLS
QuickUMLS 0,7 . , , βJaccardβ, βcosineβ, βoverlapβ βdiceβ. , . 0,99, , SimString βJaccardβ, . , BERT.
╔═══╦══════╦═══════╦═════════╦════════╗
║   ║ BERT ║ QUMLS ║ + Spacy ║ + Grid ║
╠═══╬══════╬═══════╬═════════╬════════╣
║ P ║ .53  ║ .27   ║ .29     ║ .37    ║
║ R ║ .58  ║ .36   ║ .37     ║ .37    ║
║ F ║ .56  ║ .31   ║ .32     ║ .37    ║
╚═══╩══════╩═══════╩═════════╩════════╝
3 β
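The "+ Grid" column appears to come from a search over the similarity measure and the threshold. The sketch below shows what such a search could look like. It assumes the standard set-based definitions of the four measures over character trigrams, and it scores each setting on a handful of invented string pairs (`TOY_PAIRS`) rather than on MedMentions. All function names here are hypothetical, and the toy result says nothing about the best setting on real data.

```python
from itertools import product
from math import sqrt


def char_ngrams(text, n=3):
    """Return the set of lowercased character n-grams of `text`."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


# Standard set-based definitions (assumed to match the measure names).
MEASURES = {
    "jaccard": lambda x, y: len(x & y) / len(x | y),
    "dice": lambda x, y: 2 * len(x & y) / (len(x) + len(y)),
    "cosine": lambda x, y: len(x & y) / sqrt(len(x) * len(y)),
    "overlap": lambda x, y: len(x & y) / min(len(x), len(y)),
}


def similarity(a, b, measure="jaccard"):
    return MEASURES[measure](char_ngrams(a), char_ngrams(b))


# Invented pairs: (text span, dictionary term, should they match?)
TOY_PAIRS = [
    ("patient", "Inpatient", False),
    ("hemorrhage", "hemorrhage", True),
    ("alcohol", "Alcohol", True),
    ("hemorrhage", "hemorrhagin", False),
]


def toy_accuracy(measure, threshold):
    """Fraction of toy pairs where the match decision agrees with the label."""
    hits = sum(
        (similarity(a, b, measure) >= threshold) == label
        for a, b, label in TOY_PAIRS
    )
    return hits / len(TOY_PAIRS)


def grid_search(score_fn, measures, thresholds):
    """Return the first (measure, threshold) pair with the highest score."""
    return max(product(measures, thresholds), key=lambda mt: score_fn(*mt))


best = grid_search(toy_accuracy, list(MEASURES), [0.7, 0.8, 0.9, 0.99])
print(best, toy_accuracy(*best))
```

For any pair of strings the four measures are ordered as jaccard ≤ dice ≤ cosine ≤ overlap, so the same threshold is strictest under Jaccard and most permissive under overlap.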
, , , , . , , , βalcoholβ. , , , . , , , , . .
╔═══╦══════╦═══════╦═════════╦════════╦══════════╗
║   ║ BERT ║ QUMLS ║ + Spacy ║ + Grid ║ + Priors ║
╠═══╬══════╬═══════╬═════════╬════════╬══════════╣
║ P ║ .53  ║ .27   ║ .29     ║ .37    ║ .39      ║
║ R ║ .58  ║ .36   ║ .37     ║ .37    ║ .39      ║
║ F ║ .56  ║ .31   ║ .32     ║ .37    ║ .39      ║
╚═══╩══════╩═══════╩═════════╩════════╩══════════╝
4 β
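One way a prior could help with ambiguous terms such as "alcohol" is to count, on annotated training data, how often each surface form was labelled with each semantic type, and then to prefer the most frequent type among the candidates. The sketch below illustrates that idea only. It is an assumption about what a prior might look like, not a description of the exact procedure behind the "+ Priors" column. The function names and the toy annotations are hypothetical, and the semantic type codes are used purely as example labels.

```python
from collections import Counter, defaultdict


def fit_priors(training_annotations):
    """Count semantic types per surface form and overall.

    `training_annotations` is an iterable of (term, semantic_type) pairs.
    """
    per_term = defaultdict(Counter)
    overall = Counter()
    for term, semtype in training_annotations:
        per_term[term.lower()][semtype] += 1
        overall[semtype] += 1
    return per_term, overall


def disambiguate(term, candidates, per_term, overall):
    """Pick the candidate type seen most often for this term.

    Falls back to the overall type frequency when the term is unseen
    or when the per-term counts are tied. `candidates` must be non-empty.
    """
    counts = per_term.get(term.lower(), Counter())
    return max(candidates, key=lambda s: (counts[s], overall[s]))


# Invented training annotations.
training = [
    ("alcohol", "T109"),
    ("alcohol", "T109"),
    ("alcohol", "T168"),
    ("patient", "T101"),
]
per_term, overall = fit_priors(training)

print(disambiguate("Alcohol", ["T168", "T109"], per_term, overall))
print(disambiguate("ethanol", ["T168", "T109"], per_term, overall))
```

Both calls pick "T109": the first because it is the most frequent label for "alcohol" in the toy data, the second because the unseen term falls back to the overall counts.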
, , , QuickUMLS. , 0,99, , QuickUMLS. , QuickUMLS.
: ?
, . -, , . , , : , , βalcoholβ , . -, , . β β. . β β β, ββ. - , , UMLS , . , :
, , QuickUMLS, . , , , . , QuickUMLS , .
, NER . , R&D . QuickUMLS , . , , , . QuickUMLS , github. , , , , , .
β , β : , , , .
[1] L. Soldaini and N. Goharian, QuickUMLS: a fast, unsupervised approach for medical concept extraction, (2016), MedIR workshop, SIGIR.
[2] S. Mohan and D. Li, MedMentions: a large biomedical corpus annotated with UMLS concepts, (2019), arXiv preprint arXiv:1902.09476.
[3] Y. Peng, Q. Chen, and Z. Lu, An empirical study of multi-task learning on BERT for biomedical text mining, (2020), arXiv preprint arXiv:2005.02799.
[4] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C.H. So, and J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, (2020), Bioinformatics, 36(4).
[5] K.C. Fraser, I. Nejadgholi, B. De Bruijn, M. Li, A. LaPlante, and K.Z.E. Abidine, Extracting UMLS concepts from medical text using general and domain-specific deep learning models, (2019), arXiv preprint arXiv:1910.01274.
[6] N. Okazaki and J.I. Tsujii, Simple and efficient algorithm for approximate dictionary matching, (2010, August), In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010).
[7] M. Neumann, D. King, I. Beltagy, and W. Ammar, ScispaCy: Fast and robust models for biomedical natural language processing, (2019), arXiv preprint arXiv:1902.07669.