Browsing by Author "Dunstan, Jocelyn"

Now showing 1 - 4 of 4
  • Entity normalization in a Spanish medical corpus using a UMLS-based lexicon: findings and limitations
    (2024) Baez, Pablo; Campillos-Llanos, Leonardo; Nunez, Fredy; Dunstan, Jocelyn
    Entity normalization is a common strategy to resolve ambiguities by mapping all the synonym mentions to a single concept identifier in standard terminology. Normalizing medical entities is challenging, especially for languages other than English, where lexical variation is considerably under-represented. Here, we report a new linguistic resource for medical entity normalization in Spanish. We applied a UMLS-based medical lexicon (MedLexSp) to automatically normalize mentions from 2000 medical referrals of the Chilean Waiting List Corpus. Three medical students manually revised the automatic normalization. The inter-coder agreement was computed, and the distribution of concepts, errors, and linguistic sources of variation was analyzed. The automatic method normalized 52% of the mentions, compared to 91% after manual revision. The lowest agreement between automatic and automatic-manual normalization was observed for Finding, Disease, and Procedure entities. Errors in normalization were associated with ortho-typographic, semantic, and grammatical linguistic inadequacies, mainly of the hyponymy/hyperonymy, polysemy/metonymy, and acronym-abbreviation types. This new resource can enrich dictionaries and lexicons with new mentions to improve the functioning of modern entity normalization methods. The linguistic analysis offers insight into the sources of lexical variety in the Spanish clinical environment related to error generation using lexicon-based normalization methods. This article also introduces a workflow that can serve as a benchmark for comparison in studies replicating our analysis in Romance languages.
  • Natural language processing for clinical text in Spanish: the case of waiting lists in Chile (Procesamiento de lenguaje natural para texto clínico en español: el caso de las listas de espera en Chile)
    (2022) Báez, Pablo; Arancibia, Antonia Paz; Chaparro, Matías Ignacio; Bucarey, Tomás; Núñez, Fredy R.; Dunstan, Jocelyn
    Waiting lists for new specialist consultations in Chile that are not covered by the Explicit Health Guarantees plan (Plan de Garantías Explícitas en Salud) have grown as a result of the SARS-CoV-2 coronavirus (COVID-19) pandemic. This is a problem because of the delay in resolving and prioritizing each case referred to the secondary level of health care. The aim of this article is to present the waiting-list problem in the Chilean health system and to address it as an example of the application of Natural Language Processing (NLP) techniques. Specifically, we describe a methodology for recognizing key information in medical narratives. We currently have a set of medical referrals manually annotated during the development of the Chilean Waiting List Corpus, as well as a subset of 2,000 referrals in which the annotated medical entities were automatically normalized to Unified Medical Language System concepts using the MedLexSp lexicon. This and other linguistic resources and NLP tools are being developed by the NLP in Medicine group of the Center for Mathematical Modeling (Centro de Modelamiento Matemático) at the Universidad de Chile and by other groups across the country; they are relevant contributions that can be transferred to the Chilean health system to support the management of clinical text in Spanish.
  • The problem of estimation and forecasting of obesity prevalence using sparsely collected data
    (2024) Rojo-Gonzalez, Luis; Dunstan, Jocelyn; Cuadrado, Cristobal; Avalos, Denisse; Moraga-Correa, Javier; Troncoso, Nelson; Vasquez, Oscar C.
    The problem of estimating and forecasting population nutritional status has been addressed in the literature, with successful results when data are available and frequently collected over time. However, most low- and middle-income countries collect nutritional status data sparsely, and the resulting uncertainty or absence of information may negatively affect policymakers' decisions. In this context, the problem of estimating and forecasting obesity prevalence from sparsely collected cross-sectional data is formally stated, and a novel sequential approach to address it is proposed. Specifically, this work describes nutritional status dynamics with a system of nonlinear difference equations in which the transition probabilities are unknown parameters, because the cross-sectional data are sparsely collected. An artificial-neural-network-like model is then proposed through its equivalent nonlinear programming model, which uses the difference equations as constraints together with bounds on the transition probabilities taken from the literature. In addition, comprehensive data collection and information analysis processes for computing demographic parameters are defined. Because the model is non-convex, an optimal solution is characterized, termed stable, and then assessed in terms of its goodness-of-fit. Computational experiments and a resolution scheme based on a rolling-horizon forecasting/back-casting approach and divergence metrics are proposed. To illustrate the usefulness of this approach, Chile is used as a case study. Results show an accuracy of up to 90%, with the forecast obese population (BMI ≥ 30.0 kg/m²) for 2024 reaching 30.6% (95% CI: 28.4-32.8%) for men and 32.6% (95% CI: 29.1-36.0%) for women.
  • Training and intrinsic evaluation of lightweight word embeddings for the clinical domain in Spanish
    (2022) Chiu, Carolina; Villena, Fabián; Martin, Kinan; Núñez, Fredy R.; Besa Correa, Cecilia; Dunstan, Jocelyn
    Resources for Natural Language Processing (NLP) are less numerous for languages other than English. In the clinical domain, where these resources are vital for obtaining new knowledge about human health and disease, creating new resources for Spanish is imperative. One of the most common approaches in NLP is word embeddings: dense vector representations of words that take each word's context into account. This vector representation is usually the first step in NLP tasks such as text classification and information extraction. Therefore, to enrich Spanish-language NLP tools, we built a Spanish clinical corpus from waiting-list diagnostic suspicions, a biomedical corpus from medical journals, and term sequences sampled from the Unified Medical Language System (UMLS). These three corpora can be used to train word embedding models from scratch with the Word2vec and fastText algorithms. Furthermore, to validate the quality of the computed embeddings, we adapted several English evaluation datasets, including some tests that, to the best of our knowledge, have not previously been used in Spanish. These translations were validated by two bilingual clinicians following an ad hoc validation standard for the translation. Although contextualized word embeddings currently receive enormous attention, computing and deploying them requires specialized hardware and very large training corpora. Our static embeddings can be used in clinical applications with limited computational resources. The intrinsic tests validated here can help groups working on both static and contextualized word embeddings. We are releasing the training corpus and the embeddings with this publication.
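The first entry above describes entity normalization as mapping every synonym mention to a single concept identifier through a UMLS-based lexicon (MedLexSp), and reports the share of mentions that could be normalized automatically. The following is a minimal sketch of exact-match lexicon lookup under stated assumptions: the toy lexicon, the `CUI-000x` identifiers and the accent-stripping preprocessing are all hypothetical, and this is not MedLexSp's format or the authors' pipeline. It also computes coverage (normalized mentions divided by all mentions), the kind of figure behind the 52% automatic rate reported in that abstract.

```python
import unicodedata

# HYPOTHETICAL toy lexicon: surface form -> concept identifier.
# The identifiers are placeholders, not real UMLS CUIs or MedLexSp entries.
TOY_LEXICON = {
    "diabetes mellitus": "CUI-0001",
    "dm": "CUI-0001",                  # acronym mapped to the same concept
    "hipertension arterial": "CUI-0002",
    "hta": "CUI-0002",                 # acronym mapped to the same concept
}


def normalize_surface(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace (an assumed preprocessing step)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    no_accents = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return " ".join(no_accents.split())


def normalize_mentions(mentions, lexicon):
    """Map each mention to a concept ID, or None when the lexicon has no entry."""
    index = {normalize_surface(form): cid for form, cid in lexicon.items()}
    return [index.get(normalize_surface(m)) for m in mentions]


def coverage(concept_ids):
    """Fraction of mentions that received a concept identifier."""
    if not concept_ids:
        return 0.0
    return sum(cid is not None for cid in concept_ids) / len(concept_ids)


if __name__ == "__main__":
    mentions = ["Diabetes  Mellitus", "HTA", "hipertensión arterial", "dolor lumbar"]
    ids = normalize_mentions(mentions, TOY_LEXICON)
    print(ids)            # ['CUI-0001', 'CUI-0002', 'CUI-0002', None]
    print(coverage(ids))  # 0.75
```

Exact lookup after light normalization handles case, accent and acronym variants that are listed in the lexicon, but not hyponymy, polysemy or unlisted abbreviations; those are the error types the abstract attributes to lexicon-based methods.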
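The obesity-prevalence entry above models nutritional status with difference equations whose transition probabilities are unknown. The sketch below shows only the forward-projection idea in a deliberately simplified form: a linear Markov-chain update p(t+1)[j] = sum over i of p(t)[i] * P[i][j] across three assumed states. The state names, the transition matrix and the starting prevalence are hypothetical illustrative values, not estimates from the study, and the paper's actual model is nonlinear, includes demographic parameters, and estimates the probabilities by nonlinear programming, none of which is reproduced here.

```python
from typing import List

# HYPOTHETICAL state order; not necessarily the categories used in the paper.
STATES = ("normal", "overweight", "obese")

# HYPOTHETICAL yearly transition probabilities (row = current state, column = next state).
# Illustrative values only, NOT estimates from the study.
TRANSITIONS = [
    [0.90, 0.09, 0.01],
    [0.05, 0.85, 0.10],
    [0.01, 0.04, 0.95],
]


def check_stochastic(matrix: List[List[float]], tol: float = 1e-9) -> None:
    """Raise ValueError unless every row is a probability distribution."""
    for row in matrix:
        if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError("each row must be non-negative and sum to 1")


def step(prevalence: List[float], matrix: List[List[float]]) -> List[float]:
    """One linear difference-equation step: p_next[j] = sum_i p[i] * P[i][j]."""
    n = len(prevalence)
    return [sum(prevalence[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def project(prevalence: List[float], matrix: List[List[float]], years: int) -> List[List[float]]:
    """Return the prevalence vectors for year 0 through year `years`."""
    check_stochastic(matrix)
    trajectory = [list(prevalence)]
    for _ in range(years):
        trajectory.append(step(trajectory[-1], matrix))
    return trajectory


if __name__ == "__main__":
    start = [0.40, 0.35, 0.25]  # hypothetical initial prevalence
    for year, p in enumerate(project(start, TRANSITIONS, 3)):
        print(year, dict(zip(STATES, (round(x, 4) for x in p))))
```

Because each row of the matrix sums to 1, the projected prevalences keep summing to 1; fitting such a matrix to sparse cross-sectional surveys is the harder inverse problem the paper addresses.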
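The word-embedding entry above validates static embeddings with intrinsic tests adapted from English datasets. One widely used form of intrinsic evaluation is word similarity: compute the cosine similarity of each word pair and report the Spearman rank correlation with human similarity ratings. The sketch below assumes that form of test; the abstract does not say which metrics each adapted dataset uses. The toy vectors, word pairs and human scores are hypothetical, and pairs with out-of-vocabulary words are skipped, which is one common convention rather than necessarily the authors' choice.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings: Dict[str, List[float]],
             pairs: List[Tuple[str, str, float]]) -> Tuple[float, int]:
    """Return (Spearman rho, number of pairs covered by the vocabulary)."""
    model, human = [], []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:  # skip out-of-vocabulary pairs
            model.append(cosine(embeddings[w1], embeddings[w2]))
            human.append(gold)
    return spearman(model, human), len(model)


# HYPOTHETICAL toy vectors and human ratings, for illustration only.
TOY_EMBEDDINGS = {
    "fiebre": [1.0, 0.0, 0.0],
    "hipertermia": [0.9, 0.1, 0.0],
    "tos": [0.3, 0.95, 0.0],
    "fractura": [0.0, 0.0, 1.0],
}
TOY_PAIRS = [
    ("fiebre", "hipertermia", 9.0),
    ("fiebre", "tos", 4.0),
    ("fiebre", "fractura", 1.0),
    ("tos", "glucosa", 5.0),  # "glucosa" is out of vocabulary, so this pair is skipped
]

if __name__ == "__main__":
    rho, covered = evaluate(TOY_EMBEDDINGS, TOY_PAIRS)
    print(f"Spearman rho = {rho:.3f} over {covered} pairs")
```

Reporting the number of covered pairs next to rho matters for small domain-specific vocabularies, since skipped pairs can make two models' scores incomparable.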

Libraries - Pontificia Universidad Católica de Chile. Central offices address: Av. Vicuña Mackenna 4860, Santiago de Chile.
