Type of Final Thesis:
Supervisor: Konstantin Pandl
Research Group: Critical Information Infrastructures
Archive Number: 4.684
Status of Thesis: Open
Date of start: 2020-10-15
Research on machine learning (ML)-based solutions for the healthcare industry has demonstrated that they enable faster and more accurate treatment of patients. ML solutions, however, need large amounts of data to train. A major barrier to the adoption of ML in healthcare is the combination of strict information privacy requirements and the difficulty of sharing data. Incentivizing data sharing (e.g., through financial compensation) could enable larger data sets to be collected and, ultimately, better ML solutions to be created. Data valuation methods exist in theoretical research (e.g., the Shapley value from game theory), but it remains unclear whether and how well they work for ML in general and for the healthcare industry specifically. Your thesis can shed light on this question!
Possible topics include, but are not limited to:
- Evaluation of the suitability of data valuation methods (e.g., based on the Shapley value) for machine learning on healthcare data
This is an umbrella topic, since topics of interest change rapidly. A specific topic will be selected during the first meeting.
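To make the Shapley-value idea concrete, the following is a minimal sketch of exact Shapley-value data valuation. It rests on several assumptions that do not come from this topic description. The utility U(S) of a training subset S is taken to be the validation accuracy of a 1-nearest-neighbor classifier on one-dimensional toy data. An empty training set is assigned utility 0. The function names `one_nn_accuracy` and `exact_shapley` and the toy data are hypothetical. Each training point i receives the weighted sum of its marginal contributions U(S ∪ {i}) - U(S) over all subsets S of the other points, with weight |S|!(n - |S| - 1)!/n!.

```python
from itertools import combinations
from math import factorial


def one_nn_accuracy(train, val):
    """Hypothetical utility U(S): validation accuracy of a 1-nearest-neighbor
    classifier trained on the (feature, label) pairs in `train`.
    Assumption: an empty training set has utility 0."""
    if not train:
        return 0.0
    correct = 0
    for x, y in val:
        # Predict the label of the closest training point (1-D features).
        _, pred = min(train, key=lambda p: abs(p[0] - x))
        correct += pred == y
    return correct / len(val)


def exact_shapley(train, val, utility=one_nn_accuracy):
    """Exact Shapley value of every training point, obtained by enumerating
    all subsets of the other points. The cost grows as O(2^n), so this is
    only feasible for very small training sets."""
    n = len(train)
    values = []
    for i in range(n):
        others = [train[j] for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                subset = list(subset)
                marginal = utility(subset + [train[i]], val) - utility(subset, val)
                phi += weight * marginal
        values.append(phi)
    return values


# Hypothetical toy data: (feature, label) pairs; (4.9, 0) is deliberately mislabeled.
train = [(0.0, 0), (5.0, 1), (4.9, 0)]
val = [(0.2, 0), (4.8, 1)]
values = exact_shapley(train, val)
print(values)  # approximately [0.25, 0.25, 0.0]
```

In this toy example, the mislabeled point receives a value of 0, while each correctly labeled point receives 0.25. The values sum to U(N) - U(∅) = 0.5, as the efficiency property of the Shapley value requires. Because the exact computation enumerates 2^(n-1) subsets per point, it does not scale beyond a handful of data points, which is one motivation for the more efficient algorithms studied in the introductory literature.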
Introductory literature and material:
- Jia, Ruoxi, et al. "Efficient task-specific data valuation for nearest neighbor algorithms." arXiv preprint arXiv:1908.08619 (2019).
- Jia, Ruoxi, et al. "An Empirical and Comparative Analysis of Data Valuation with Scalable Algorithms." arXiv preprint arXiv:1911.07128 (2019).
- Wang, Tianhao, et al. "A Principled Approach to Data Valuation for Federated Learning." arXiv preprint arXiv:2009.06192 (2020).