Show simple record

dc.contributor.advisor     Recamonde-Mendoza, Mariana            pt_BR
dc.contributor.author      Rodrigues, Diego Dimer                pt_BR
dc.date.accessioned        2023-05-13T03:26:41Z                  pt_BR
dc.date.issued             2023                                  pt_BR
dc.identifier.uri          http://hdl.handle.net/10183/258017    pt_BR
dc.description.abstract    Machine learning (ML) is a rapidly growing field of computer science that has found many fruitful applications in several domains, including Health. However, ML is also highly susceptible to bias, which raises concerns about the potential of ML models to inflict harm. Bias can come from various sources, such as the design of the algorithm, the selection of data, and the strategies underlying data collection. Thus, data scientists must be vigilant in ensuring that the developed models do not perpetuate social disparities based on gender, religion, sexual orientation, or ethnicity. This work aims to explore pre-training bias metrics to investigate the existence of bias in Health data. The metrics also analyze how protected attributes and their correlated features are distributed with respect to the target attribute, giving insight into how the trained model may produce biased predictions. Our goal is to evaluate pre-training bias metrics in three different health datasets and assess the impact of bias on the performance of ML algorithms. Our experiments involve artificially modified versions of the dataset that increase the values of the pre-training bias metrics to favor privileged classes, as well as lower these values to reduce the discrepancy in the data and the risk of bias. We trained models using four supervised learning algorithms: Logistic Regression, Decision Tree, Random Forest, and K-Nearest Neighbors. Each algorithm was tested on six to ten different training sets, with varying random seeds used to split the data in each iteration. We evaluated the performance of the trained models using the same test sets for every dataset variation, reporting Accuracy and F1-Score. By analyzing pre-training bias metrics and the predictive performance of the models, this study demonstrates that performance can be significantly affected by a skewed data distribution and that performance metrics may sometimes mask the bias incorporated by the algorithm. In some cases, classification errors may be more pronounced in one group (e.g., the disadvantaged group), accentuating specific errors such as false positives and false negatives, which may have different implications depending on the clinical prediction problem under analysis.    en
dc.format.mimetype         application/pdf    pt_BR
dc.language.iso            por                pt_BR
dc.rights                  Open Access        en
dc.subject                 Bias               en
dc.subject                 Machine learning    pt_BR
dc.subject                 Health              pt_BR
dc.subject                 Bias metrics        en
dc.subject                 Data                pt_BR
dc.subject                 Pre-training        en
dc.subject                 Model evaluation    en
dc.title                   Assessing pre-training bias in Health data and estimating its impact on machine learning algorithms    pt_BR
dc.type                    Undergraduate thesis    pt_BR
dc.identifier.nrb          001168640                                     pt_BR
dc.degree.grantor          Universidade Federal do Rio Grande do Sul     pt_BR
dc.degree.department       Instituto de Informática                      pt_BR
dc.degree.local            Porto Alegre, BR-RS                           pt_BR
dc.degree.date             2023                                          pt_BR
dc.degree.graduation       Ciência da Computação: Ênfase em Ciência da Computação: Bacharelado    pt_BR
dc.degree.level            undergraduate                                 pt_BR
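
The abstract above outlines a concrete pipeline: compute pre-training bias metrics on a health dataset, then train four classifiers over several seed-varied splits and report Accuracy and F1-Score. The sketch below illustrates that pipeline under stated assumptions; it is not the thesis code. The dataset path, the column names ("sex" as the protected attribute, "outcome" as the binary target), and the choice of Class Imbalance (CI) and Difference in Proportions of Labels (DPL) as the pre-training metrics are all illustrative assumptions, since the record does not specify which metrics or columns the thesis uses.

    # Hypothetical sketch: pre-training bias metrics plus a seed-varied
    # train/evaluate loop with the four algorithms named in the abstract.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score, f1_score

    def class_imbalance(df, protected, privileged):
        """CI = (n_privileged - n_disadvantaged) / n_total, in [-1, 1]."""
        n_a = (df[protected] == privileged).sum()
        n_d = (df[protected] != privileged).sum()
        return (n_a - n_d) / (n_a + n_d)

    def diff_proportions_labels(df, protected, privileged, label, positive=1):
        """DPL = P(y=positive | privileged) - P(y=positive | disadvantaged)."""
        priv = df[df[protected] == privileged]
        disadv = df[df[protected] != privileged]
        return (priv[label] == positive).mean() - (disadv[label] == positive).mean()

    # Hypothetical file and column names; features are assumed already numeric.
    df = pd.read_csv("health_dataset.csv")
    print("CI :", class_imbalance(df, "sex", privileged="male"))
    print("DPL:", diff_proportions_labels(df, "sex", "male", label="outcome"))

    X, y = df.drop(columns=["outcome"]), df["outcome"]
    models = {
        "LogReg": lambda: LogisticRegression(max_iter=1000),
        "Tree":   DecisionTreeClassifier,
        "Forest": RandomForestClassifier,
        "kNN":    KNeighborsClassifier,
    }
    for name, make in models.items():
        for seed in range(10):  # six to ten seed-varied splits per the abstract
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=0.3, random_state=seed, stratify=y)
            clf = make().fit(X_tr, y_tr)
            pred = clf.predict(X_te)
            print(f"{name} seed={seed} "
                  f"acc={accuracy_score(y_te, pred):.3f} "
                  f"f1={f1_score(y_te, pred):.3f}")

Varying the seed only in the training split, while scoring on fixed test sets as the abstract describes, would require holding out the test partition once per dataset variation before the loop; the simplified loop above re-splits everything for brevity.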



This item is licensed under a Creative Commons License
