What-If Probing of ML Models

Although this paper describes a tool rather than a single analytical methodology, an assumed composite workflow can be extracted by synthesising the tool's three core workflows (Section 6: sense-making, hypothetical probing, fairness evaluation) with the analytical progressions demonstrated in the three case studies (Sections 5.1–5.3). Each case study follows a different path through the tool's capabilities, and together they describe a coherent analytical pattern for black-box ML model understanding with the What-If Tool. The assumed workflow has five stages.

Data Understanding. The analyst examines feature distributions in the test dataset, identifying imbalances, anomalous distributions, and data quality issues that may influence model behaviour.

Model Performance Overview. Model predictions are computed on the test dataset and visualised through customisable multi-view displays: confusion matrices, score comparison scatterplots, feature-based histograms, and small multiples. The analyst identifies global performance patterns, regions of model agreement and disagreement, and areas warranting deeper investigation, forming initial observations of model behaviour.

Hypothetical Investigation. The analyst selects data points, features, and subgroups to probe, then iteratively investigates model behaviour through feature editing with re-inference, nearest counterfactual identification, and partial dependence analysis. Each cycle reveals feature sensitivities, unexpected responses, and decision boundary characteristics. The analyst assesses whether the behavioural understanding gained is sufficient and refines the focus of the investigation until the key questions are answered.

Fairness Evaluation. The analyst slices the dataset by individual features and feature intersections, computing per-slice performance and fairness metrics. Initial visualisations reveal performance disparities across demographic subgroups. The analyst then iteratively explores threshold optimisation strategies (demographic parity, equal opportunity, equal accuracy), each time visualising the resulting performance changes and identifying how disparities shift under the new configuration, and continues until a satisfactory fairness–performance trade-off is found.

Knowledge Synthesis. The analyst integrates data characteristics, performance patterns, behavioural findings, and fairness assessments into a comprehensive understanding of model behaviour, feature effects, and fairness properties, supporting model improvement and deployment decisions.
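To make the Hypothetical Investigation stage concrete, the following is a minimal sketch of its three probing operations (feature editing with re-inference, nearest counterfactual identification, and partial dependence) written in plain NumPy. It is not the What-If Tool's own code or API. The logistic model, its weights, the synthetic test data, the L1 distance, and all function names are assumptions introduced purely for illustration.

```python
import numpy as np

# Hypothetical stand-in for a black-box classifier: a fixed logistic model.
# The weights and bias are illustrative assumptions, not taken from the paper.
WEIGHTS = np.array([1.5, -2.0, 0.5])
BIAS = -0.25


def predict_proba(X):
    """Return the positive-class score for each row of X."""
    X = np.atleast_2d(X)
    return 1.0 / (1.0 + np.exp(-(X @ WEIGHTS + BIAS)))


def edit_and_reinfer(x, feature, new_value):
    """Edit one feature of a single data point and re-run inference.

    Returns the score before and after the edit.
    """
    edited = np.array(x, dtype=float)
    edited[feature] = new_value
    return float(predict_proba(x)[0]), float(predict_proba(edited)[0])


def nearest_counterfactual(X, index, threshold=0.5):
    """Index of the closest point (L1 distance) whose predicted class differs
    from that of X[index], or None if every point has the same class."""
    labels = predict_proba(X) >= threshold
    candidates = np.flatnonzero(labels != labels[index])
    if candidates.size == 0:
        return None
    distances = np.abs(X[candidates] - X[index]).sum(axis=1)
    return int(candidates[np.argmin(distances)])


def partial_dependence(X, feature, grid):
    """Mean score over the dataset with `feature` forced to each grid value."""
    means = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        means.append(float(predict_proba(X_mod).mean()))
    return np.array(means)


# Synthetic test set (assumption): 200 points with 3 numeric features.
rng = np.random.default_rng(0)
X_test = rng.normal(size=(200, 3))

# One probing cycle on the first data point.
before, after = edit_and_reinfer(X_test[0], feature=1, new_value=X_test[0, 1] + 1.0)
cf_index = nearest_counterfactual(X_test, index=0)
pd_curve = partial_dependence(X_test, feature=0, grid=np.linspace(-2, 2, 5))
```

In this sketch, comparing `before` and `after` shows how sensitive one prediction is to a single feature, `cf_index` points at the most similar test point that the model classifies differently, and `pd_curve` summarises the average effect of a feature across the dataset. An analyst would repeat such cycles with different points, features, and subgroups until the questions of interest are answered.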

Workflow Summary

Graphical view

ATWL Representation