Metric comparisons

FairBench's main goal is to generate fairness reports that contain many types of evaluations. Reports look like the example below: rows correspond to base metrics computed for each sensitive attribute dimension, while columns hold expansions and reductions that summarize metric comparisons across all protected groups or subgroups (for details on how to replicate this process with custom metrics, see here). Below we present the comparison mechanisms computed by built-in report methods, as well as the value at which each comparison indicates full bias. Scan reports for particularly bad values so that you can notify stakeholders and explore why those values occur.

--- Example multireport --- 
Metric          min             wmean           gini            minratio        maxdiff         maxbarea        maxrarea        maxbdcg        
auc             0.861           0.877           0.013           0.948           0.047           0.048           0.071           0.055          
avgscore        0.110           0.239           0.234           0.363           0.193           0.682           0.631           0.749          
tophr           0.667           0.778           0.100           0.667           0.333           nan             nan             nan            
toprec          0.001           0.002           0.392           0.121           0.004           nan             nan             nan            
avghr           0.389           0.592           0.220           0.389           0.611           0.611           0.611           0.696          
avgrepr         0.000           1.000           0.500           0.000           1.499           1.499           1.000           1.499   
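The comparison columns of a row can be recomputed by hand from that row's per-group values. The sketch below is not FairBench code: the helper names are hypothetical, and it assumes (as the example numbers suggest, though the text does not state it) that minratio is the smallest ratio and maxdiff the largest absolute difference over all pairs of groups, and that gini is the standard Gini coefficient of the group values.

```python
from itertools import combinations


def ratio(a, b):
    """Ratio of the smaller to the larger value; 1.0 when both are zero."""
    hi = max(a, b)
    return min(a, b) / hi if hi > 0 else 1.0


def comparison_columns(values):
    """Summarize one report row from its per-group values (hypothetical helper).

    `values` maps each group name to the base metric computed on that group;
    values are assumed to be non-negative.
    """
    xs = list(values.values())
    n = len(xs)
    mean = sum(xs) / n
    pairs = list(combinations(xs, 2))
    # Gini coefficient: sum of |x_i - x_j| over all ordered pairs / (2 * n^2 * mean)
    gini = sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mean) if mean > 0 else 0.0
    return {
        "min": min(xs),
        "gini": gini,
        # worst pairwise ratio and largest pairwise gap (a single group shows no disparity)
        "minratio": min((ratio(a, b) for a, b in pairs), default=1.0),
        "maxdiff": max((abs(a - b) for a, b in pairs), default=0.0),
    }


row = comparison_columns({"group1": 0.667, "group2": 1.0})
print({name: round(value, 3) for name, value in row.items()})
```

For two groups scoring 0.667 and 1.0, this gives min 0.667, minratio 0.667, maxdiff 0.333 and gini of about 0.1, which is consistent with the tophr row above.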

Performance

These report columns assess the performance of analysed systems while accounting explicitly for the existence of subgroups.
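As a rough illustration of performance that accounts for subgroups, the sketch below computes accuracy separately per group and then reports the worst group and a weighted mean. It assumes that columns such as min and wmean work this way and that wmean weights each group by its size; the text does not state the weights, and `performance_columns` is a hypothetical helper rather than a FairBench function.

```python
def performance_columns(labels, predictions, membership):
    """Worst-group and size-weighted accuracy (hypothetical helper).

    `membership` maps each group name to a list of 0/1 flags, one per sample,
    so groups may overlap; empty groups are skipped.
    """
    per_group = {}
    for group, flags in membership.items():
        hits = [y == p for y, p, f in zip(labels, predictions, flags) if f]
        if hits:
            per_group[group] = (sum(hits) / len(hits), len(hits))
    total_size = sum(size for _, size in per_group.values())
    return {
        # accuracy of the worst-served group
        "min": min(acc for acc, _ in per_group.values()),
        # assumed weighting: each group counts in proportion to its size
        "wmean": sum(acc * size for acc, size in per_group.values()) / total_size,
    }


labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 1, 1, 1, 0]
membership = {"a": [1, 1, 1, 0, 0, 0], "b": [0, 0, 0, 1, 1, 1]}
print(performance_columns(labels, predictions, membership))
```

Here group a is fully correct and group b scores 2/3, so min is 2/3 and wmean is 5/6. With disjoint groups that cover every sample, the size-weighted mean equals overall accuracy; the two can differ when groups overlap.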

Performance comparisons

These report columns directly compare the values of base metrics between groups. Depending on the report type, comparisons are made between every pair of groups or subgroups defined in the sensitive attribute fork, between each group and the total population, or between each group and its complement in the population.
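The three ways of choosing what to compare can be sketched as follows, using accuracy as the base metric and the largest absolute difference as the reduction. The strategy names and helpers are hypothetical labels for the three report types described above, not FairBench identifiers, and each group and its complement are assumed to be non-empty.

```python
from itertools import combinations


def accuracy(labels, predictions, mask):
    """Accuracy over the samples whose mask flag is set (mask assumed non-empty)."""
    hits = [y == p for y, p, m in zip(labels, predictions, mask) if m]
    return sum(hits) / len(hits)


def compared_pairs(labels, predictions, membership, strategy):
    """Return (value, reference) accuracy pairs for one comparison strategy.

    strategy: "pairwise", "vs_total" or "vs_complement" (hypothetical names).
    """
    values = {g: accuracy(labels, predictions, m) for g, m in membership.items()}
    if strategy == "pairwise":
        return [(values[a], values[b]) for a, b in combinations(values, 2)]
    if strategy == "vs_total":
        total = accuracy(labels, predictions, [1] * len(labels))
        return [(values[g], total) for g in values]
    if strategy == "vs_complement":
        return [
            (values[g], accuracy(labels, predictions, [1 - f for f in m]))
            for g, m in membership.items()
        ]
    raise ValueError(f"unknown strategy: {strategy}")


def maxdiff(pairs):
    """Largest absolute difference among the compared pairs."""
    return max(abs(a - b) for a, b in pairs)


labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 1, 1, 1, 0]
membership = {"a": [1, 1, 1, 0, 0, 0], "b": [0, 0, 0, 1, 1, 1]}
for strategy in ("pairwise", "vs_total", "vs_complement"):
    print(strategy, maxdiff(compared_pairs(labels, predictions, membership, strategy)))
```

With two complementary groups scoring 1.0 and 2/3, pairwise and complement comparisons both report a gap of 1/3, while comparing each group against the total population (accuracy 5/6) reports only 1/6. The choice of report type therefore changes how large the same disparity looks.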

Curve comparisons

These report columns compare the underlying curves of base measures. Which groups are compared is again determined by the report type. The base measures must keep track of curve explanations; otherwise, this comparison is not possible.
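As an illustration only, one way to compare two curves is the area between them, with a column then keeping the largest area over all compared pairs. This is an assumption about what a column like maxbarea could measure, not FairBench's documented definition; the helpers are hypothetical, pairs are formed as in a pairwise report, and each curve is assumed to be a list of (x, y) points with strictly increasing x (curves with vertical steps would need those steps merged first).

```python
from bisect import bisect_right
from itertools import combinations


def interpolate(curve, x):
    """Linearly interpolate y at x; curve is a list of (x, y) with strictly increasing x."""
    xs = [px for px, _ in curve]
    if x <= xs[0]:
        return curve[0][1]
    if x >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def area_between(curve_a, curve_b):
    """Approximate area between two curves: trapezoidal rule on |a - b| over the merged grid."""
    grid = sorted({x for x, _ in curve_a} | {x for x, _ in curve_b})
    gaps = [abs(interpolate(curve_a, x) - interpolate(curve_b, x)) for x in grid]
    return sum(
        (grid[i + 1] - grid[i]) * (gaps[i] + gaps[i + 1]) / 2
        for i in range(len(grid) - 1)
    )


def max_curve_area(curves):
    """Largest area between the curves of any two groups (pairwise comparison)."""
    return max(area_between(a, b) for a, b in combinations(curves.values(), 2))


curves = {
    "group1": [(0.0, 0.0), (1.0, 1.0)],
    "group2": [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)],
}
print(max_curve_area(curves))
```

Here the two curves differ by x on [0, 0.5] and by 1 - x on [0.5, 1], so the area between them is 0.25. The trapezoidal rule is exact for piecewise-linear curves that do not cross between grid points and only approximate otherwise. This also shows why curve columns need curve explanations: a base measure that keeps only a single number per group has nothing to integrate.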