Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation

dc.catalogador: dfo
dc.contributor.author: Messina Gallardo, Pablo Alfredo
dc.contributor.author: Vidal, René
dc.contributor.author: Parra Santander, Denis Alejandro
dc.contributor.author: Soto, Álvaro
dc.contributor.author: Araujo, Vladimir
dc.date.accessioned: 2025-06-12T20:55:51Z
dc.date.available: 2025-06-12T20:55:51Z
dc.date.issued: 2024
dc.description.abstract: Advancing representation learning in specialized fields like medicine remains challenging due to the scarcity of expert annotations for text and images. To tackle this issue, we present a novel two-stage framework designed to extract high-quality factual statements from free-text radiology reports in order to improve the representations of text encoders and, consequently, their performance on various downstream tasks. In the first stage, we propose a Fact Extractor that leverages large language models (LLMs) to identify factual statements from well-curated domain-specific datasets. In the second stage, we introduce a Fact Encoder (CXRFE) based on a BERT model fine-tuned with objective functions designed to improve its representations using the extracted factual data. Our framework also includes a new embedding-based metric (CXRFEScore) for evaluating chest X-ray text generation systems, leveraging both stages of our approach. Extensive evaluations show that our fact extractor and encoder outperform current state-of-the-art methods in tasks such as sentence ranking, natural language inference, and label extraction from radiology reports. Additionally, our metric proves to be more robust and effective than existing metrics commonly used in the radiology report generation literature. The code of this project is available at https://github.com/PabloMessina/CXR-Fact-Encoder.
dc.description.funder: ANID
dc.description.funder: Instituto Milenio en Ingeniería e Inteligencia Artificial para la Salud
dc.description.funder: Centro Nacional de Inteligencia Artificial
dc.description.funder: National Institutes of Health
dc.description.funder: Fondecyt
dc.fuente.origen: SCOPUS
dc.identifier.doi: 10.18653/v1/2024.findings-acl.236
dc.identifier.issn: 0736-587X
dc.identifier.scopusid: SCOPUS_ID:85205302635
dc.identifier.uri: https://repositorio.uc.cl/handle/11534/104664
dc.language.iso: en
dc.pagina.final: 3986
dc.pagina.inicio: 3955
dc.publisher: Association for Computational Linguistics (ACL)
dc.relation.ispartof: Findings of the Association for Computational Linguistics: ACL
dc.revista: Proceedings of the Annual Meeting of the Association for Computational Linguistics
dc.rights: open access
dc.subject.ddc: 620
dc.subject.dewey: Engineering
dc.title: Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation
dc.type: conference paper
sipa.trazabilidad: SCOPUS;2024-10-27
Files
Original bundle
Name: 2024.findings-acl.236.pdf
Size: 3.39 MB
Format: Adobe Portable Document Format
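
Illustrative sketch: the abstract describes CXRFEScore, an embedding-based metric that compares factual statements extracted from a generated chest X-ray report against those from a reference report. The snippet below shows the generic shape of such a metric, assuming a HuggingFace BERT-style encoder with mean pooling and a soft precision/recall over cosine similarities; the model name, pooling, and scoring details are placeholder assumptions, not the authors' implementation (that lives at https://github.com/PabloMessina/CXR-Fact-Encoder).

# Minimal sketch of an embedding-based fact-comparison metric in the spirit of
# CXRFEScore. Model choice, mean pooling, and the soft precision/recall scoring
# are illustrative assumptions, not the paper's actual configuration.
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder encoder
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(facts):
    """Return one L2-normalized, mean-pooled embedding per factual statement."""
    batch = tokenizer(facts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()      # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # mean over real tokens
    return F.normalize(pooled, dim=-1)

def embedding_f1(generated_facts, reference_facts):
    """Soft precision/recall over pairwise cosine similarities of fact embeddings."""
    sim = embed(generated_facts) @ embed(reference_facts).T   # (G, R) cosine matrix
    precision = sim.max(dim=1).values.mean().item()           # best reference match per generated fact
    recall = sim.max(dim=0).values.mean().item()              # best generated match per reference fact
    return 2 * precision * recall / (precision + recall)

print(embedding_f1(
    ["small right pleural effusion"],
    ["there is a small right-sided pleural effusion", "no pneumothorax"],
))

In the paper itself, the facts being compared come from the Fact Extractor and the embeddings from the fine-tuned Fact Encoder (CXRFE), rather than from a generic BERT checkpoint as used in this sketch.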