Browsing by Author "Delfino Yurin, Alejandro"
Now showing 1 - 3 of 3
- Evaluating GPT-4o in high-stakes medical assessments: performance and error analysis on a Chilean anesthesiology exam (2025). Altermatt Couratier, Fernando René; Neyem, Andrés; Sumonte Fuenzalida, Nicolás Ignacio; Villagrán Gutiérrez, Ignacio Andrés; Mendoza Rocha, Marcelo; Lacassie Quiroga, Héctor; Delfino Yurin, Alejandro. Background: Large language models (LLMs) such as GPT-4o have the potential to transform clinical decision-making, patient education, and medical research. Despite impressive performance in generating patient-friendly educational materials and assisting in clinical documentation, concerns remain regarding the reliability, subtle errors, and biases that can undermine their use in high-stakes medical settings. Methods: A multi-phase experimental design was employed to assess the performance of GPT-4o on the Chilean anesthesiology exam (CONACEM), which comprised 183 questions covering four cognitive domains (Understanding, Recall, Application, and Analysis) based on Bloom’s taxonomy. Thirty independent simulation runs were conducted with systematic variation of the model’s temperature parameter to gauge the balance between deterministic and creative responses. The generated responses underwent qualitative error analysis using a refined taxonomy with the categories “Unsupported Medical Claim,” “Hallucination of Information,” “Sticking with Wrong Diagnosis,” “Non-medical Factual Error,” “Incorrect Understanding of Task,” “Reasonable Response,” “Ignore Missing Information,” and “Incorrect or Vague Conclusion.” Two board-certified anesthesiologists performed independent annotations, with disagreements resolved by a third expert. Statistical evaluations, including one-way ANOVA, non-parametric tests, chi-square, and linear mixed-effects modeling, were used to compare performance across domains and analyze error frequency. Results: GPT-4o achieved an overall accuracy of 83.69%. Performance varied significantly by cognitive domain, with the highest accuracy observed in the Understanding (90.10%) and Recall (84.38%) domains, and lower accuracy in Application (76.83%) and Analysis (76.54%). Among the 120 incorrect responses, unsupported medical claims were the most common error (40.69%), followed by vague or incorrect conclusions (22.07%). Co-occurrence analyses revealed that unsupported claims often appeared alongside imprecise conclusions, highlighting a trend of compounded errors, particularly in tasks requiring complex reasoning. Inter-rater reliability for error annotation was robust, with a mean Cohen’s kappa of 0.73. Conclusions: While GPT-4o exhibits strengths in factual recall and comprehension, its limitations in handling higher-order reasoning and diagnostic judgment are evident through frequent unsupported medical claims and vague conclusions. These findings underscore the need for improved domain-specific fine-tuning, enhanced error mitigation strategies, and integrated knowledge verification mechanisms prior to clinical deployment.
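The abstract above reports inter-rater reliability as Cohen's kappa. As a minimal sketch of how that statistic is computed for two annotators, the snippet below uses hypothetical labels; the function name `cohens_kappa` and the example data are assumptions for illustration, not the study's code or annotations.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of four incorrect responses (illustrative only).
annotator_1 = ["unsupported", "unsupported", "vague", "vague"]
annotator_2 = ["unsupported", "unsupported", "vague", "unsupported"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the raters agree on three of four items (observed agreement 0.75) against a chance agreement of 0.5, which gives a kappa of 0.5.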
- Process-oriented feedback for ultrasound-guided central venous access training: a randomized controlled trial (2025). Fuente Sanhueza, René Francisco de la; Gálvez Yanjarí, Víctor Andrés; Delfino Yurin, Alejandro; Lira, Ricardo; Hurtado, Claudia; Munoz-Gama, Jorge; Sepúlveda Fernández, Marcos Ernesto. Background: Process mining is an emerging discipline that allows for the analysis of procedural executions performed in a training context. It provides objective information about adherence to a normative procedural model (similarity), the number of repetitions of steps (reworks), and performance metrics, which can be used as objective feedback for trainees to guide learning through a process-oriented feedback approach. The aim of this study was to assess whether interventions based on information derived from process mining analysis improve the attainment of procedural proficiency. Methods: Twenty anaesthesia and emergency medicine residents participated in a training program on ultrasound-guided internal jugular central venous catheter placement that took place in a simulated environment. The participants were randomized into a process-oriented training group (n=10), which received supplementary interventions during training according to the information obtained with process mining tools, and a control group (n=10), for whom the simulation-based training program was unchanged. Video recordings of each student were obtained before and after the training. Two blinded observers evaluated each recording using a global rating scale (primary outcome) and a checklist. Procedure execution time and process-oriented metrics (rework and similarity) were measured. The pre- and post-training performance indicators were compared within groups and between groups. The interrater reliability of the global rating scale scores was calculated using the intraclass correlation coefficient. We used the Wilcoxon signed-rank test for intragroup comparisons and the Mann-Whitney test for intergroup comparisons. Statistical significance was set at P<.05, adjusted for multiple comparisons. Results: There were no differences between groups in the pretraining measures. After training, both groups showed improved performance in ultrasound-guided central venous catheter placement compared with their pretraining performance. The global scale results, checklist results, and execution times were not significantly different between the control and process-oriented groups. However, the process-oriented group showed a significant improvement in similarity to the expected performance and a greater reduction in rework than did the control group.
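The abstract above names the Mann-Whitney test for between-group comparisons. As a rough sketch of the statistic behind that test, the snippet below computes the U statistic by direct pairwise comparison. The function name `mann_whitney_u` and the sample values are hypothetical, and the p-value step is left out; in practice a statistics library would be used for the full test.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (xi, yj) in which xi exceeds yj, with ties
    counted as half a pair. U_x + U_y always equals len(x) * len(y).
    """
    if not x or not y:
        raise ValueError("Both samples must be non-empty.")
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical post-training scores for two small groups (illustrative only).
group_a = [3, 4, 5]
group_b = [1, 2, 3]
u_a, u_b = mann_whitney_u(group_a, group_b)
```

A U value close to 0 or to the product of the sample sizes indicates that one group's values tend to rank above the other's; the test then compares U against its null distribution.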
- Uso de Direct Observation of Procedural Skills (DOPS) como instrumento evaluativo para la obtención de habilidades procedimentales en estudiantes y profesionales de enfermería: scoping review [Use of Direct Observation of Procedural Skills (DOPS) as an assessment instrument for the acquisition of procedural skills in nursing students and professionals: scoping review] (2024). Osorio Gutiérrez, Bernardita de los Ángeles; Delfino Yurin, Alejandro; Pontificia Universidad Católica de Chile. Escuela de Medicina. Training programs today are built on Competency-Based Medical Education (CBME), which consists of demonstrating the knowledge, skills, and attitudes required for medical practice. Chile has been no exception and has also had to adapt to these changes, rethinking the profile that new nursing professionals should have. In this regard, it is necessary to ensure that students have attained the various clinical competencies during their training program in order to provide safe care, since real care settings expose nursing students to realities that cannot be conveyed through a text, among them procedural activities. At present, and worldwide, the assessment of procedural skills is largely subjective. However, efforts have been made to standardize the instruments, given that assessment is fundamental to guaranteeing safe care. Meanwhile, Chilean schools predominantly use dichotomous checklists accompanied by comments rather than structured feedback. In some cases the same checklists are even applied to assess different procedures. The most widely used framework in medical education for assessing professional competence is the pyramid created by Miller. The apex of the pyramid corresponds to “does” and refers to competence demonstrated in a real professional situation and context, together with frequent, formative feedback, since it engages students and gives them a critical view of their competencies. One of the instruments used to assess competence in real contexts is Direct Observation of Procedural Skills (DOPS), a tool particularly useful for objective and systematic formative assessment. It also gives students the opportunity to receive constructive and timely feedback. This type of instrument has had good results in medical assessment in specialties such as anesthesia, otorhinolaryngology, gynecology, and pediatrics. Given the inconsistency of procedural assessment instruments at the national level, it is highly relevant to learn from international experiences with useful instruments such as DOPS forms. The aim is to develop a line of research in this area and to incorporate these forms as a formative assessment tool at the national level, in order to train professionals who demonstrate the minimum procedural competencies for safe care. Proposal: A scoping review to systematically map the research on the application of DOPS forms as an instrument for assessing procedures in clinical nursing practice. The review will be conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) protocol.
