Crosslingual Argument Mining in the Medical Domain

  1. Yeginbergen, Anar
  2. Agerri, Rodrigo
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2024

Issue: 73

Pages: 296-312

Type: Article

Abstract

Technology based on Artificial Intelligence has great potential for developing assistants that help medical professionals make decisions, which in many cases rely on processing large amounts of unstructured text. In this context, argument mining (AM) can help structure textual data into argumentative components and the discourse relations between them. However, as is still the case for many Natural Language Processing tasks, the vast majority of work on computational argumentation in the medical domain has focused exclusively on English. In this article we investigate several strategies for performing AM on medical texts in a language such as Spanish, for which no manually annotated data exist. Our work shows that automatically translating and projecting annotations from English into a given target language such as Spanish is an effective way to generate annotated data without manual annotation. Moreover, we show experimentally that translating and projecting obtains better results than methods based on the crosslingual transfer capabilities of multilingual language models. Finally, we use the automatically generated Spanish data to improve the original results in English, thus providing a fully automatic data augmentation strategy.
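
The "translate and project" idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' actual projection procedure, so everything below is an assumption made for illustration: labels are BIO tags over tokens, word alignments are given as (source index, target index) pairs, and each source span is projected onto the smallest contiguous target range covering its aligned tokens. The function names `extract_spans` and `project_labels`, the label `Claim`, and the example sentences are hypothetical.

```python
from typing import List, Tuple


def extract_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) for each BIO span in `labels`."""
    spans = []
    start, kind = None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        inside = lab.startswith("I-") and kind == lab[2:]
        if start is not None and not inside:
            spans.append((start, i, kind))
            start, kind = None, None
        if lab.startswith("B-") or (lab.startswith("I-") and start is None):
            start, kind = i, lab[2:]
    return spans


def project_labels(
    src_labels: List[str],
    alignments: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO labels from source tokens onto target tokens.

    Assumed heuristic: each source span maps to the contiguous target
    range between its leftmost and rightmost aligned target tokens.
    """
    tgt_labels = ["O"] * tgt_len
    for start, end, kind in extract_spans(src_labels):
        tgt_idx = [t for s, t in alignments if start <= s < end]
        if not tgt_idx:
            continue  # no aligned target tokens: the span is dropped
        lo, hi = min(tgt_idx), max(tgt_idx)
        tgt_labels[lo] = f"B-{kind}"
        for t in range(lo + 1, hi + 1):
            tgt_labels[t] = f"I-{kind}"
    return tgt_labels


# Hypothetical example: English source, Spanish machine translation.
src_tokens = ["Aspirin", "significantly", "reduced", "pain", "."]
src_labels = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O"]
tgt_tokens = ["La", "aspirina", "redujo", "significativamente", "el", "dolor", "."]
# Word alignments as (source index, target index); assumed to come from an aligner.
alignments = [(0, 1), (1, 3), (2, 2), (3, 5), (4, 6)]

projected = project_labels(src_labels, alignments, len(tgt_tokens))
for token, label in zip(tgt_tokens, projected):
    print(f"{token}\t{label}")
```

In this sketch the unaligned article "La" stays outside the span, while "el" is absorbed because it falls between aligned tokens. Whether that behaviour is desirable depends on the annotation guidelines, which is one reason projected data are usually checked or filtered before training.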
