Named entity recognition and relation extraction from medical record entries for ontology population
Description
There has been a significant increase in the number of Electronic Health Records (EHRs) that accommodate unstructured data, such as text and natural language observations. Consequently, there is growing interest in using this data to improve health care. Manual analysis of these data is not feasible because of their large and still increasing volume. Therefore, there is a need for an approach that automatically structures this information so that it can assist health professionals in data analysis, treatment recommendations, and disease diagnosis, among other applications.
An evaluation of the literature in this area has identified demand for addressing this problem in Portuguese. However, there is still a limited number of studies using real data from the health sector. One research opportunity identified is the use of resources based on the Transformers architecture and the application of their results to data structuring in ontologies.
In this context, this work aims to develop a model for processing unstructured data from EHRs to support the activity of updating an ontology. The contributions of this research fall into two related areas. First, it aims to support the development of applications in EHR systems for oncology by enhancing their capacity to use unstructured data. Second, the research focuses on experimenting with and proposing advances in computational approaches for entity recognition and relation extraction, as well as integrating them with an ontology. The study was carried out as a case study in a company operating in the field of oncology. Detailed analyses were conducted of a system widely used for EHRs in oncology clinics. As a result of this analysis, one of the distinctive features of the work is the creation of unpublished datasets of entities and relations drawn from medical progress notes, containing 1,622 annotated documents, comprising 146,769 entities and 111,716 relations.
Another unique aspect of the work is the adaptation of a domain ontology to represent the structured data of this case study. Finally, experiments were conducted with approaches to extract entities and relations from text, achieving results such as 78.24% accuracy in the exams domain and 72.87% in the diagnostics domain. In addition, an ontology focused on oncology was built and integrated into the model, encompassing approximately 181 classes, 14 data properties, 12 object properties, and over 200 individuals. Healthcare specialists evaluated the model, which reached 73.52% accuracy relative to their analysis, and the usability study showed excellent acceptance. The training of models on real oncology data and the construction of a knowledge base through an ontology stand out as distinguishing features of the work.
CNPQ – Conselho Nacional de Desenvolvimento Científico e Tecnológico