Exploring text classification methods in oncological medical notes using machine learning and deep learning

Schwertner, Marco Antonio

dc.contributor.advisor	Rigo, Sandro José
dc.contributor.author	Schwertner, Marco Antonio
dc.date.accessioned	2020-11-25T17:48:54Z
dc.date.accessioned	2022-09-22T19:41:07Z
dc.date.available	2020-11-25T17:48:54Z
dc.date.available	2022-09-22T19:41:07Z
dc.date.issued	2020-08-24
dc.identifier.uri	https://hdl.handle.net/20.500.12032/63782
dc.description.abstract	With advances in preventive and personalized medicine, and with technological improvements that let patients interact more easily with their healthcare information, the volume of healthcare data collected has increased. A substantial part of these data is recorded in an unstructured format as natural-language free text, which makes it harder for Clinical Decision Support Systems (CDSS) to process. Consequently, healthcare professionals are overwhelmed when keeping up to date with a patient’s healthcare information, because they need more time to gather and analyze it manually. Furthermore, defining an oncology diagnosis and its treatment plan is a complex decision-making process, because it is affected by a broad range of parameters. The main objective of this research is to apply several text classification methods to non-synthetic oncology clinical notes corpora to help with this decision-making process. First, the corpora were obtained from an oncology EHR system from three different oncology clinics. Two versions of the corpora were created: the per-clinical-event version, with one medical note per record; and the per-patient version, with one record per patient containing his or her medical notes. Then, these corpora were preprocessed to improve the performance of the classifiers. As the last step, several machine learning methods and one deep learning text classification method were trained on these corpora, with each patient’s diagnosis as enriched data. The following machine learning and deep learning classification methods were applied: Multilayer Perceptron (MLP) neural network, Logistic Regression, Decision Tree classifier, Random Forest classifier, K-nearest neighbors (KNN) classifier, and Long Short-Term Memory (LSTM).
An additional experiment with an MLP classifier was performed to evaluate the influence of the preprocessing step on the results; it found that the classifier’s mean accuracy rose from 26.1% to 86.7% with the per-clinical-event corpus, and to 93.9% with the per-patient corpus. The best-performing classifier was the MLP with 2 hidden layers (800 and 500 neurons), which achieved 93.90% accuracy, a Macro F1 score of 93.61%, and a Weighted F1 score of 93.99%. The experiments were performed on a dataset of 3,308 medical notes from a small oncology clinic.	en
dc.description.sponsorship	None	pt_BR
dc.language	en	pt_BR
dc.publisher	Universidade do Vale do Rio dos Sinos	pt_BR
dc.rights	openAccess	pt_BR
dc.subject	Artificial intelligence	en
dc.subject	Inteligência artificial	pt_BR
dc.title	Exploring text classification methods in oncological medical notes using machine learning and deep learning	en
dc.type	Dissertation	pt_BR
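
The abstract reports both a Macro F1 and a Weighted F1 score. As a minimal sketch of how these two averages differ, the Python below computes per-class F1 one-vs-rest and then averages it two ways. The class labels "A", "B", "C" and the toy predictions are hypothetical and are not taken from the dissertation's data; the function names are illustrative only, and this is not presented as the author's evaluation code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of true labels."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical, imbalanced toy labels (not from the dissertation).
y_true = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "C"]
y_pred = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "A"]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.4f}, weighted F1 = {weighted:.4f}")
```

In this toy case the rare class "C" is always misclassified, which pulls the macro average down much more than the weighted one. When the two scores are close, as in the figures reported in the abstract, performance is similar across frequent and infrequent classes.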

Files in this item

Files	Size	Format	View
Marco Antônio Schwertner_.pdf	4.127Mb	application/pdf	View/Open

This item appears in the following Collection(s)

Documentos - UNISINOS
