Anders is Enhancing AI Performance and Explainability through Data Reduction

Published 8 August 2024

Guo Feng Anders Yeo

AI researcher | Applied maths | Statistics | PhD candidate

LinkedIn: Anders Yeo

The modern push for AI and data-driven decision making, coupled with the ease of automated data collection, has resulted in excessively large datasets. The size of these datasets poses challenges for existing data mining and machine learning techniques. At the same time, data preprocessing remains an integral step in data mining and machine learning, as low-quality data leads to low-quality results and outcomes.

Data reduction algorithms provide a means to address both of these problems, and they also belong in the toolbox of explainable artificial intelligence techniques. The figure below is a primer on three novel optimisation-based wrapper data reduction algorithms: two instance selection algorithms (SpFixedIS and SpIS) and one hybrid selection algorithm (SpIFS).

The figure below was created by Anders in collaboration with Graphics et al. as part of a DHCRC visual communication activity.


Diagram illustrating three novel optimization-based wrapper data reduction algorithms: Stochastic Perturbation Fixed Instance Selection (SpFixedIS), Stochastic Perturbation Instance Selection (SpIS), and Stochastic Perturbation Instance and Feature Selection (SpIFS). Each section contains a performance graph against the percentage of instances or features, with annotations explaining the purpose and benefits of each algorithm. SpFixedIS seeks to maximize performance for a specified number of instances. SpIS identifies a minimal subset of instances for sufficient performance. SpIFS simultaneously selects instances and features to balance performance and reduction rate. Logos and visual elements are included for the Digital Health CRC.
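
To make the idea of a fixed-budget wrapper instance selector more concrete, here is a minimal Python sketch. It is not the SpFixedIS algorithm itself: the search is a simple random-swap hill climb standing in for the stochastic perturbation optimiser, the wrapped model is a hand-written 1-nearest-neighbour classifier, and the names `make_blobs`, `nn_accuracy` and `select_fixed_instances` are hypothetical helpers invented for this illustration. It only shows the general wrapper pattern: propose a subset of exactly k training instances, score it by the performance of a model trained on that subset, and keep the subset if it is no worse.

```python
import random


def make_blobs(n, seed=0):
    """Toy data (an assumption for this demo): two 2-D Gaussian classes."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((point, label))
    return data


def nn_accuracy(train, valid):
    """Accuracy of a 1-nearest-neighbour classifier fit on `train`."""
    correct = 0
    for x, y in valid:
        nearest = min(
            train,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)),
        )
        correct += nearest[1] == y
    return correct / len(valid)


def select_fixed_instances(train, valid, k, iterations=200, seed=0):
    """Random-swap search for k training instances (illustrative only)."""
    if not 0 < k < len(train):
        raise ValueError("k must satisfy 0 < k < len(train)")
    rng = random.Random(seed)
    selected = set(rng.sample(range(len(train)), k))
    best = nn_accuracy([train[i] for i in sorted(selected)], valid)
    for _ in range(iterations):
        # Perturb the current subset: swap one selected instance
        # for one unselected instance, keeping the size fixed at k.
        out_idx = rng.choice(sorted(selected))
        in_idx = rng.choice([i for i in range(len(train)) if i not in selected])
        candidate = (selected - {out_idx}) | {in_idx}
        score = nn_accuracy([train[i] for i in sorted(candidate)], valid)
        # Wrapper step: accept the swap if the model does no worse.
        if score >= best:
            selected, best = candidate, score
    return sorted(selected), best


data = make_blobs(60, seed=1)
train, valid = data[:40], data[40:]
indices, accuracy = select_fixed_instances(train, valid, k=6)
print(f"kept {len(indices)} of {len(train)} instances, "
      f"validation accuracy {accuracy:.2f}")
```

In the same spirit, an SpIS-style variant would let the subset size vary and stop at the smallest subset that reaches a target performance, and an SpIFS-style variant would perturb a feature mask alongside the instance subset. Both are left out here because the source does not specify how the published algorithms handle those choices.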

Emerging leaders in digital health