Browsing by Author "Rocco, Victor"
- A pseudonymized corpus of occupational health narratives for clinical entity recognition in Spanish (Springer Nature, 2024)
  Dunstan Escudero, Jocelyn Mariel; Vakili, Thomas; Miranda Huerta, Luis Alberto; Villena, Fabián; Aracena, Claudio; Quiroga Curin, Tamara Nancy; Vera, Paulina; Viteri Valenzuela, Sebastián; Rocco, Victor
  Despite the high creation cost, annotated corpora are indispensable for robust natural language processing systems. In the clinical field, in addition to annotating medical entities, corpus creators must also remove personally identifiable information (PII). This has become increasingly important in the era of large language models where unwanted memorization can occur. This paper presents a corpus annotated to anonymize personally identifiable information in 1,787 anamneses of work-related accidents and diseases in Spanish. Additionally, we applied a previously released model for Named Entity Recognition (NER) trained on referrals from primary care physicians to identify diseases, body parts, and medications in this work-related text. We analyzed the differences between the models and the gold standard curated by a physician in detail. Moreover, we compared the performance of the NER model on the original narratives, in narratives where personal information has been masked, and in texts where the personal data is replaced by another similar surrogate value (pseudonymization). Within this publication, we share the annotation guidelines and the annotated corpus.
- Sex differences in work-related accidents extracted from free text in Spanish using natural language processing (2025)
  Dunstan Escudero, Jocelyn Mariel; Campaña Herrera, Valentina Andrea; Miranda, Luis; Ladron De Guevara Jara, Rocio Helena; Pincheira, Pablo; Rocco, Victor; Moyano Dávila, Daniela Paz
  Evidence from the global north shows that women and men differ significantly in rates of work accidents and occupational disease. However, more data is needed for countries elsewhere. Methods: Using natural language processing (NLP), we extracted accident mechanisms from 350,000 admission reports from the largest occupational health provider in Chile. Using the same technique, we also normalized occupations written in free text, following the nomenclature of the International Labour Organization (ILO). Results: We found that a man is affected in 57.3% of accidents and a woman in 42.7%. The most common occupation for men is operator, while for women it is related to cleaning duties. The most common form of accident for women is falling from the same height, while for men it is contact with sharp objects. In this work, we demonstrate the power of NLP in the massive analysis of work-related accidents by reporting the use of large language models with human expert annotation to evaluate mechanism extraction. Conclusion: By sharing our prompts and code, we aim to help other institutions and countries extract crucial information from free text and map it to the controlled vocabulary of the ILO. Future work includes the analysis of commuting accidents and occupational diseases.
- When the Tides Come, Where Will We Go? Modeling the Impacts of Sea Level Rise on the Greater Boston, Massachusetts, Transport and Land Use System (2017)
  Han, Yafei; Zegras, P. Christopher; Rocco, Victor; Dowd, Michael; Murga, Mikel; CEDEUS (Chile)
  For coastal urban areas, an increase in flooding is one of the clearest climate change threats. The research presented in this paper demonstrates how a land use-transport model can be used to forecast the short- and longer-term impacts of a potential 4-ft sea level rise in greater Boston, Massachusetts, by 2030. The short-term scenario represents the immediate transport system response to inundation, which provides a measure of resiliency in the case of an extreme event, such as a storm surge. In the short run, the results reveal that transit captive users will suffer more. Transit, in general, displays less resiliency, at least in part because of the center city's vulnerability and Boston's radial transit system. Trip distances would modestly decrease, and average travel speeds would go down by more than 50%. Rail transit ridership would be decimated, and overall transit usage would go down by 66%. The longer-term scenario predicts how households and firms would prefer to relocate in the so-called new equilibrium when more than 10 mi² of land disappears and the transport network inundations become permanent. Assuming no supply constraints, new residential growth centers would emerge on the peripheries of the inundated zones, primarily in the inner-core suburbs. Some regional urban centers and traditional industrial towns would boom. Firms would be hit harder, because of their heavy concentration in the inner core; firm relocation would largely follow households. Transit usage would again be decimated, but walking trips would increase. Results, however, should be viewed as cautious speculation.
