A methodology for developing dermatological datasets: lessons from retrospective data collection for AI-based applications

dc.article.number251
dc.catalogadoryvc
dc.contributor.authorPedro Pérez, Alma Alheli
dc.contributor.authorRomero Jofré, Pamela Ignacia
dc.contributor.authorVidaurre, Soledad
dc.contributor.authorCabanas, Ana M.
dc.contributor.authorGalaz, Atsuko
dc.contributor.authorHidalgo Acuña, Leonel Esteban
dc.contributor.authorCarrasco, Karina
dc.contributor.authorTamez-Peña, José Gerardo
dc.contributor.authorDíaz-Domínguez, Ricardo
dc.contributor.authorNavarrete Dechent, Cristian Patricio
dc.contributor.authorMery Quiroz, Domingo Arturo
dc.date.accessioned2025-11-18T16:11:24Z
dc.date.available2025-11-18T16:11:24Z
dc.date.issued2025
dc.date.updated2025-11-09T01:04:18Z
dc.description.abstractPurpose: The integration of artificial intelligence into dermatological research has underscored the need for robust, well-structured dermatological datasets. However, these datasets vary widely in how they are developed, and there is currently no standard methodology for creating them. This work identifies three pressing needs in building dermatological datasets focused on skin tumor classification: the need for multimodal datasets, the definition of minimum metadata requirements, and the inclusion of underrepresented populations to address the scarcity of health data. Methods: We propose a practical methodology for creating dermatological datasets from clinical records, incorporating both images and patient metadata. The process consists of four key stages: obtaining institutional review board approval and analyzing clinical information sources, data recording and structuring, processing of clinical data and images, and quality assessment. This methodology was derived from hands-on experience in building two datasets, one from a Chilean population and one from a Mexican population. Results: The methodology allows the creation of well-structured datasets by simplifying data organization and enabling replication. Each step includes practical guidance for dealing with typical challenges, such as image metadata categorization and technical validation by dermatologists and computer scientists. Conclusion: Our contribution offers a reproducible, scalable, and interdisciplinary framework for creating dermatological datasets, especially useful for countries that are beginning to create such datasets. In addition to the methodological proposal, we highlight common pitfalls and offer recommendations to mitigate them.
dc.fechaingreso.objetodigital2025-11-09
dc.format.extent14 pages
dc.fuente.origenSelf-archived
dc.identifier.citationBMC Medical Research Methodology. 2025 Nov 05;25(1):251
dc.identifier.doi10.1186/s12874-025-02706-y
dc.identifier.urihttps://doi.org/10.1186/s12874-025-02706-y
dc.identifier.urihttps://repositorio.uc.cl/handle/11534/107026
dc.information.autorucEscuela de Ingeniería; Pedro Pérez, Alma Alheli; S/I; 1186437
dc.information.autorucEscuela de Ingeniería; Romero Jofré, Pamela Ignacia; S/I; 1049369
dc.information.autorucEscuela de Medicina; Hidalgo Acuña, Leonel Esteban; S/I; 1095646
dc.information.autorucEscuela de Medicina; Navarrete Dechent, Cristian Patricio; 0000-0003-4040-3640; 156251
dc.information.autorucEscuela de Ingeniería; Mery Quiroz, Domingo Arturo; 0000-0003-4748-3882; 102382
dc.issue.numero25
dc.language.isoen
dc.nota.accesofull content
dc.revistaBMC Medical Research Methodology
dc.rightsopen access
dc.rights.holderThe Author(s)
dc.rights.licenseCC BY-NC-ND Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.urihttps://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subjectDataset methodology
dc.subjectSkin cancer
dc.subjectClinical metadata
dc.subjectDermatology
dc.subject.ddc610
dc.titleA methodology for developing dermatological datasets: lessons from retrospective data collection for AI-based applications
dc.typearticle
sipa.codpersvinculados1186437
sipa.codpersvinculados1049369
sipa.codpersvinculados1095646
sipa.codpersvinculados156251
sipa.codpersvinculados102382
Files
Original bundle
Name: 12874_2025_Article_2706.pdf
Size: 1.86 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.98 KB
Format: Item-specific license agreed upon to submission