FairBench reports implement several definitions of fairness that
quantify imbalances between groups of people (e.g.,
different genders) in terms of the groups obtaining different
assessments by base performance metrics. These assessments
are often further reduced across groups
of samples with different sensitive attribute values.
Here, we present the base metrics that reports use
to assess AI systems. All metrics are computed
on a subset of 'sensitive samples', which forms
the group being examined each time. Outputs are
wrapped into explainable objects that keep track of
relevant metadata.
- Classification
- Ranking
- Regression
## Classification
Classification metrics assess binary predictions. Unless stated
otherwise, the following arguments need to be provided:
| Argument | Role | Values |
|----------|------|--------|
| predictions | system output | binary array |
| labels | prediction target | binary array |
| sensitive | sensitive attribute | fork of arrays with elements in \([0,1]\) (either binary or fuzzy) |
### accuracy
Computes the accuracy for correctly predicting provided binary labels
for sensitive data samples. Returns a float in the range
\([0,1]\).
Explanation:
number of samples, true predictions
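The computation can be sketched in NumPy; here fuzzy sensitive memberships are treated as sample weights, which is an illustrative assumption rather than a statement of FairBench's exact internals:

```python
import numpy as np

def accuracy(predictions, labels, sensitive):
    # Weighted fraction of correct predictions among sensitive samples.
    correct = (predictions == labels).astype(float)
    return float(np.sum(sensitive * correct) / np.sum(sensitive))

predictions = np.array([1, 0, 1, 1])
labels      = np.array([1, 0, 0, 1])
sensitive   = np.array([1.0, 1.0, 0.0, 0.0])  # examined group: first two samples
print(accuracy(predictions, labels, sensitive))  # 1.0
```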
### pr
Computes the positive rate of binary predictions for sensitive
data samples. Returns a float in the range
\([0,1]\). This metric does not use the `labels` argument.
Explanation:
number of samples, positive predictions
### positives
Computes the number of positive predictions for
sensitive data samples. Returns a float in the range
\([0,\infty)\).
This metric does not use the `labels`
Explanation:
number of samples
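Both quantities admit short sketches; as before, fuzzy sensitive memberships are assumed to act as sample weights for illustration:

```python
import numpy as np

def pr(predictions, sensitive):
    # Weighted positive rate within the sensitive group.
    return float(np.sum(sensitive * predictions) / np.sum(sensitive))

def positives(predictions, sensitive):
    # (Weighted) count of positive predictions in the sensitive group.
    return float(np.sum(sensitive * predictions))

predictions = np.array([1, 1, 0, 1])
sensitive   = np.array([1.0, 1.0, 1.0, 0.0])
print(pr(predictions, sensitive))         # ≈ 0.667
print(positives(predictions, sensitive))  # 2.0
```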
### tpr
Computes the true positive rate of binary predictions for sensitive
data samples. Returns a float in the range
\([0,1]\).
Explanation:
number of samples, number of positives, number of true positives
### tnr
Computes the true negative rate of binary predictions for sensitive
data samples. Returns a float in the range
\([0,1]\).
Explanation:
number of samples, number of negatives, number of true negatives
### fpr
Computes the false positive rate of binary predictions for sensitive
data samples. Returns a float in the range
\([0,1]\).
Explanation:
number of samples, number of negatives, number of false positives
### fnr
Computes the false negative rate of binary predictions for sensitive
data samples. Returns a float in the range
\([0,1]\).
Explanation:
number of samples, number of positives, number of false negatives
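The four rates can be sketched from a weighted confusion matrix; treating fuzzy sensitive memberships as sample weights is again an illustrative assumption:

```python
import numpy as np

def rates(predictions, labels, sensitive):
    # Confusion-based rates within the sensitive group.
    w = sensitive.astype(float)
    pos = np.sum(w * labels)        # weighted label positives
    neg = np.sum(w * (1 - labels))  # weighted label negatives
    tp = np.sum(w * predictions * labels)
    tn = np.sum(w * (1 - predictions) * (1 - labels))
    return {
        "tpr": tp / pos,
        "tnr": tn / neg,
        "fpr": 1 - tn / neg,  # false positive rate complements tnr
        "fnr": 1 - tp / pos,  # false negative rate complements tpr
    }

predictions = np.array([1, 0, 1, 0])
labels      = np.array([1, 1, 0, 0])
sensitive   = np.ones(4)
print(rates(predictions, labels, sensitive))
```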
## Ranking
Ranking metrics assess scores that aim to approach
provided labels. The following arguments need to be provided:
| Argument | Role | Values |
|----------|------|--------|
| scores | system output | array with elements in \([0,1]\) |
| labels | prediction target | binary array |
| sensitive | sensitive attribute | fork of arrays with elements in \([0,1]\) (either binary or fuzzy) |
### auc
Computes the area under curve of the receiver operating
characteristics for sensitive data samples.
Returns a float in the range
\([0,1]\).
Explanation:
number of samples, the receiver operating characteristic curve
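One way to sketch this is via the Mann-Whitney U statistic, which equals ROC AUC; a binary sensitive mask is assumed here for simplicity, and this need not mirror FairBench's internal implementation:

```python
import numpy as np

def auc(scores, labels, sensitive):
    # ROC AUC restricted to the sensitive group: the fraction of
    # positive-negative pairs ranked correctly (ties count half).
    mask = sensitive.astype(bool)
    s, y = scores[mask], labels[mask]
    pos, neg = s[y == 1], s[y == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

scores    = np.array([0.9, 0.8, 0.3, 0.1])
labels    = np.array([1, 0, 1, 0])
sensitive = np.ones(4)
print(auc(scores, labels, sensitive))  # 0.75
```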
### phi
Computes the score mass of
sensitive data samples compared
to the total scores.
Returns a float in the range
\([0,1]\).
Explanation:
number of samples, sensitive scores
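A minimal sketch of the score mass ratio, with fuzzy memberships weighting each sample's score contribution (an illustrative assumption):

```python
import numpy as np

def phi(scores, sensitive):
    # Score mass of the sensitive group relative to the total score mass.
    return float(np.sum(sensitive * scores) / np.sum(scores))

scores    = np.array([0.5, 0.5, 1.0])
sensitive = np.array([1.0, 1.0, 0.0])
print(phi(scores, sensitive))  # 0.5
```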
### tophr
Computes the hit rate, i.e., precision, for a set number of
top scores for sensitive data samples. This is
used to assess recommendation systems. By default, the
top-3 hit rate is analysed.
Returns a float in the range
\([0,1]\).
Explanation:
number of samples, top scores, true top scores
| Optional argument | Role | Values |
|-------------------|------|--------|
| top | parameter | integer in the range \([1,\infty)\) |
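A sketch of top-k hit rate, assuming a binary sensitive mask for simplicity (FairBench's handling of fuzzy memberships may differ):

```python
import numpy as np

def tophr(scores, labels, sensitive, top=3):
    # Hit rate (precision) among the top-k scored sensitive samples.
    mask = sensitive.astype(bool)
    s, y = scores[mask], labels[mask]
    top_idx = np.argsort(-s)[:top]  # indices of the k highest scores
    return float(np.mean(y[top_idx]))

scores    = np.array([0.9, 0.7, 0.6, 0.2, 0.1])
labels    = np.array([1, 0, 1, 1, 0])
sensitive = np.ones(5)
print(tophr(scores, labels, sensitive, top=3))  # ≈ 0.667
```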
### toprec
Computes the recall for a set number of
top scores for sensitive data samples. This is
used to assess recommendation systems. By default, the
top-3 recall is analysed.
Returns a float in the range
\([0,1]\).
Explanation:
number of samples, top scores, true top scores
| Optional argument | Role | Values |
|-------------------|------|--------|
| top | parameter | integer in the range \([1,\infty)\) |
### topf1
Computes the f1-score for a set number of
top scores for sensitive data samples. This is
the harmonic mean of tophr and toprec and is
used to assess recommendation systems. By default, the
top-3 f1 is analysed.
Returns a float in the range
\([0,1]\).
Explanation:
number of samples, top scores, true top scores
| Optional argument | Role | Values |
|-------------------|------|--------|
| top | parameter | integer in the range \([1,\infty)\) |
### tophr
Computes the average hit rate (precision)
across different numbers of top scores
with correct predictions. By default, the
top-3 average precision is computed.
Returns a float in the range
\([0,1]\).
Explanation:
number of samples, top scores, hr curve
| Optional argument | Role | Values |
|-------------------|------|--------|
| top | parameter | integer in the range \([1,\infty)\) |
## Regression
Regression metrics assess scores that aim to reproduce
desired target scores. The following arguments need to be provided:
| Argument | Role | Values |
|----------|------|--------|
| scores | system output | any float array |
| targets | prediction target | any float array |
| sensitive | sensitive attribute | fork of arrays with elements in \([0,1]\) (either binary or fuzzy) |
### max_error
Computes the maximum absolute error between scores and targets
for sensitive data samples. Returns a float in the range
\([0,\infty)\).
Explanation: ---
### mae
Computes the mean of the absolute error between scores and targets
for sensitive data samples. Returns a float in the range
\([0,\infty)\).
Explanation:
number of samples, sum of absolute errors
### mse
Computes the mean of the square error between scores and targets
for sensitive data samples. Returns a float in the range
\([0,\infty)\).
Explanation:
number of samples, sum of square errors
### rmse
Computes the square root of mse.
Returns a float in the range
\([0,\infty)\).
Explanation:
number of samples, sum of square errors
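The three error metrics above can be sketched together; fuzzy sensitive memberships are treated as sample weights, which is an illustrative assumption:

```python
import numpy as np

def mae(scores, targets, sensitive):
    # Weighted mean absolute error over sensitive samples.
    return float(np.sum(sensitive * np.abs(scores - targets)) / np.sum(sensitive))

def mse(scores, targets, sensitive):
    # Weighted mean square error over sensitive samples.
    return float(np.sum(sensitive * (scores - targets) ** 2) / np.sum(sensitive))

def rmse(scores, targets, sensitive):
    # Square root of mse.
    return float(np.sqrt(mse(scores, targets, sensitive)))

scores    = np.array([1.0, 2.0, 4.0])
targets   = np.array([1.0, 3.0, 2.0])
sensitive = np.ones(3)
print(mae(scores, targets, sensitive))   # 1.0
print(mse(scores, targets, sensitive))   # ≈ 1.667
print(rmse(scores, targets, sensitive))  # ≈ 1.291
```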
### r2
Computes the r2 score between scores
and target values, adjusted for the
provided degrees of freedom (default is zero).
Returns a float in the range
\((-\infty,1]\),
where larger values correspond to better
estimation; models that always output the
target mean are evaluated to zero.
Explanation:
number of samples, sum of square errors, degrees of freedom
| Optional argument | Role | Values |
|-------------------|------|--------|
| deg_freedom | parameter | integer in the range \([0,\infty)\) |
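A sketch using the standard adjusted coefficient of determination; both the adjustment formula and the binary sensitive mask are assumptions for illustration, and the exact adjustment FairBench applies may differ:

```python
import numpy as np

def r2(scores, targets, sensitive, deg_freedom=0):
    # Coefficient of determination on sensitive samples, adjusted for
    # the given degrees of freedom (deg_freedom=0 gives the plain R^2).
    mask = sensitive.astype(bool)
    s, t = scores[mask], targets[mask]
    n = len(t)
    ss_res = np.sum((t - s) ** 2)
    ss_tot = np.sum((t - np.mean(t)) ** 2)
    plain = 1 - ss_res / ss_tot
    return float(1 - (1 - plain) * (n - 1) / (n - deg_freedom - 1))

targets = np.array([1.0, 2.0, 3.0, 4.0])
# A model that always predicts the target mean scores zero.
print(r2(np.full(4, targets.mean()), targets, np.ones(4)))  # 0.0
print(r2(targets, targets, np.ones(4)))                     # 1.0
```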
### pinball
Computes the pinball deviation between scores
and target values for a balance parameter alpha
(default is 0.5).
Returns a float in the range
\([0,\infty)\),
where smaller values correspond to better
estimation.
Explanation:
number of samples
| Optional argument | Role | Values |
|-------------------|------|--------|
| alpha | parameter | float in the range \([0,1]\) |
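A sketch of the mean pinball (quantile) loss; a binary sensitive mask is assumed for simplicity:

```python
import numpy as np

def pinball(scores, targets, sensitive, alpha=0.5):
    # Mean pinball loss over sensitive samples: alpha weights
    # under-estimation, 1 - alpha weights over-estimation.
    mask = sensitive.astype(bool)
    s, t = scores[mask], targets[mask]
    diff = t - s
    loss = np.where(diff >= 0, alpha * diff, (alpha - 1) * diff)
    return float(np.mean(loss))

scores  = np.array([0.0, 2.0])  # one under- and one over-estimate
targets = np.array([1.0, 1.0])
print(pinball(scores, targets, np.ones(2)))  # 0.5
```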