NLP modeling recommendations for restricted data availability in clinical settings

Villena, Fabián; Bravo-Marquez, Felipe; Dunstan Escudero, Jocelyn Mariel

dc.article.number	116
dc.catalogador	pva
dc.contributor.author	Villena, Fabián
dc.contributor.author	Bravo-Marquez, Felipe
dc.contributor.author	Dunstan Escudero, Jocelyn Mariel
dc.date.accessioned	2025-03-24T20:26:55Z
dc.date.available	2025-03-24T20:26:55Z
dc.date.issued	2025
dc.date.updated	2025-03-09T01:03:32Z
dc.description.abstract	Background: Clinical decision-making in healthcare often relies on unstructured text data, which can be challenging to analyze using traditional methods. Natural Language Processing (NLP) has emerged as a promising solution, but its application in clinical settings is hindered by restricted data availability and the need for domain-specific knowledge. Methods: We conducted an experimental analysis to evaluate the performance of various NLP modeling paradigms on multiple clinical NLP tasks in Spanish. These tasks included referral prioritization and referral specialty classification. We simulated three clinical settings with varying levels of data availability and evaluated the performance of four foundation models. Results: Clinical-specific pre-trained language models (PLMs) achieved the highest performance across tasks. For referral prioritization, clinical PLMs attained an 88.85% macro F1 score when fine-tuned. In referral specialty classification, the same models achieved a 53.79% macro F1 score, surpassing domain-agnostic models. Continuing pre-training with environment-specific data improved model performance, but the gains were marginal compared to the computational resources required. Few-shot learning with large language models (LLMs) demonstrated lower performance but showed potential in data-scarce scenarios. Conclusions: Our study provides evidence-based recommendations for clinical NLP practitioners on selecting modeling paradigms based on data availability. We highlight the importance of considering data availability, task complexity, and institutional maturity when designing and training clinical NLP models. Our findings can inform the development of effective clinical NLP solutions in real-world settings.
dc.fechaingreso.objetodigital	2025-03-09
dc.format.extent	13 pages
dc.fuente.origen	Biomed Central
dc.identifier.citation	BMC Medical Informatics and Decision Making. 2025 Mar 07;25(1):116
dc.identifier.doi	10.1186/s12911-025-02948-2
dc.identifier.uri	https://doi.org/10.1186/s12911-025-02948-2
dc.identifier.uri	https://repositorio.uc.cl/handle/11534/102972
dc.information.autoruc	Escuela de Ingeniería; Dunstan Escudero, Jocelyn Mariel; S/I; 1285723
dc.issue.numero	1
dc.language.iso	en
dc.nota.acceso	full content
dc.revista	BMC Medical Informatics and Decision Making
dc.rights	open access
dc.rights.holder	The Author(s)
dc.rights.license	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Artificial intelligence
dc.subject	Natural language processing
dc.subject	Data availability
dc.subject.ddc	000
dc.subject.dewey	Ciencias de la computación	es_ES
dc.title	NLP modeling recommendations for restricted data availability in clinical settings
dc.type	article
dc.volumen	25
sipa.codpersvinculados	1285723

Files

Original bundle

Name: 12911_2025_Article_2948.pdf
Size: 1.43 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.98 KB
Format: Item-specific license agreed to upon submission

Collections

Journal articles