Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation

dc.catalogador: dfo
dc.contributor.author: Messina Gallardo, Pablo Alfredo
dc.contributor.author: Vidal, René
dc.contributor.author: Parra Santander, Denis Alejandro
dc.contributor.author: Soto, Álvaro
dc.contributor.author: Araujo, Vladimir
dc.date.accessioned: 2025-06-12T20:55:51Z
dc.date.available: 2025-06-12T20:55:51Z
dc.date.issued: 2024
dc.description.abstract: Advancing representation learning in specialized fields like medicine remains challenging due to the scarcity of expert annotations for text and images. To tackle this issue, we present a novel two-stage framework designed to extract high-quality factual statements from free-text radiology reports in order to improve the representations of text encoders and, consequently, their performance on various downstream tasks. In the first stage, we propose a Fact Extractor that leverages large language models (LLMs) to identify factual statements from well-curated domain-specific datasets. In the second stage, we introduce a Fact Encoder (CXRFE) based on a BERT model fine-tuned with objective functions designed to improve its representations using the extracted factual data. Our framework also includes a new embedding-based metric (CXRFEScore) for evaluating chest X-ray text generation systems, leveraging both stages of our approach. Extensive evaluations show that our fact extractor and encoder outperform current state-of-the-art methods in tasks such as sentence ranking, natural language inference, and label extraction from radiology reports. Additionally, our metric proves to be more robust and effective than existing metrics commonly used in the radiology report generation literature. The code of this project is available at https://github.com/PabloMessina/CXR-Fact-Encoder.
dc.description.funder: ANID
dc.description.funder: Instituto Milenio en Ingeniería e Inteligencia Artificial para la Salud
dc.description.funder: Centro Nacional de Inteligencia Artificial
dc.description.funder: National Institutes of Health
dc.description.funder: Fondecyt
dc.fuente.origen: SCOPUS
dc.identifier.doi: 10.18653/v1/2024.findings-acl.236
dc.identifier.issn: 0736-587X
dc.identifier.scopusid: SCOPUS_ID:85205302635
dc.identifier.uri: https://repositorio.uc.cl/handle/11534/104664
dc.language.iso: en
dc.pagina.final: 3986
dc.pagina.inicio: 3955
dc.publisher: Association for Computational Linguistics (ACL)
dc.relation.ispartof: Findings of the Association for Computational Linguistics: ACL
dc.revista: Proceedings of the Annual Meeting of the Association for Computational Linguistics
dc.rights: open access
dc.subject.ddc: 620
dc.subject.dewey: Engineering
dc.title: Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation
dc.type: conference paper
sipa.trazabilidad: SCOPUS;2024-10-27
Files
Original bundle
Name: 2024.findings-acl.236.pdf
Size: 3.39 MB
Format: Adobe Portable Document Format
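
Illustrative sketch: the abstract describes CXRFEScore, an embedding-based metric that compares factual statements extracted from a generated chest X-ray report against those from a reference report. The snippet below shows the generic shape of such a metric, assuming a HuggingFace BERT-style encoder with mean pooling and a soft precision/recall over cosine similarities; the model name, pooling, and scoring details are placeholder assumptions, not the authors' implementation (that lives at https://github.com/PabloMessina/CXR-Fact-Encoder).

# Minimal sketch of an embedding-based fact-comparison metric in the spirit of
# CXRFEScore. Model choice, mean pooling, and the soft precision/recall scoring
# are illustrative assumptions, not the paper's actual configuration.
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder encoder
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(facts):
    """Return one L2-normalized, mean-pooled embedding per factual statement."""
    batch = tokenizer(facts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()      # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # mean over real tokens
    return F.normalize(pooled, dim=-1)

def embedding_f1(generated_facts, reference_facts):
    """Soft precision/recall over pairwise cosine similarities of fact embeddings."""
    sim = embed(generated_facts) @ embed(reference_facts).T   # (G, R) cosine matrix
    precision = sim.max(dim=1).values.mean().item()           # best reference match per generated fact
    recall = sim.max(dim=0).values.mean().item()              # best generated match per reference fact
    return 2 * precision * recall / (precision + recall)

print(embedding_f1(
    ["small right pleural effusion"],
    ["there is a small right-sided pleural effusion", "no pneumothorax"],
))

In the paper itself, the facts being compared come from the Fact Extractor and the embeddings from the fine-tuned Fact Encoder (CXRFE), rather than from a generic BERT checkpoint as used in this sketch.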