Forks

Forks declare variables with several named values, called branches. Branches are typically binary arrays of attribute values. For instance, there may a branch for each gender or race to capture which data samples exhibit those protected attribute values.

  1. Fork definition
  2. Intersectional analysis

Fork definition

Generate forks by passing keyword arguments to a constructor. In the snippet below men, women, and nonbinary are branch names. Branch values can be anything, though they will usually be arrays. Provide any number of branches with any names and access their values like object members, as shown below.

Info

When working with specific backends, branch values are internally converted to appropriate data types (e.g., arrays or tensors).

import fairbench as fb
import numpy as np

sensitive = fb.Fork(men=np.array([1, 1, 0, 0, 0]), 
                    women=np.array([0, 0, 1, 1, 0]),
                    nonbinary=np.array([0, 0, 0, 0, 1]))
print(sensitive.nonbinary)
# [0, 0, 0, 0, 1]

To set some (or all) branches programmatically pass a dictionary as a positional argument like so:

sensitive = fb.Fork({"non-binary": np.array([0, 0, 0, 0, 1])}, 
                    men=np.array([1, 1, 0, 0, 0]), 
                    women=np.array([0, 0, 1, 1, 0]))

To create forks by analysing categorical values found in iterables prepend the latter with the categories@ operator. For example, the following code snippet creates two branches genderMan,genderWoman and stores binary membership to each of those.

fork = fb.Fork(gender=fb.categories@["Man", "Woman", "Man", "Woman", "Nonbin"])
print(fork)
# genderMan: [1, 0, 1, 0, 0]
# genderWoman: [0, 1, 0, 1, 0]
# genderNonbin: [0, 0, 0, 0, 1]

Add the outcomes of any number of category analyses to a fork. Use positional arguments (instead of named keyword arguments, such as gender in the above example) to avoid prepending the keyword argument's name to branch names. Any iterable can be analysed into categories instead of a list, including categorical tensors or arrays.

Intersectional analysis

For more than one sensitive attribute, add the branches you would declare for every attribute more branches of the same fork. For instance, the following is a valid fork that considers three gender attribute values and one binary sensitive attribute value for young vs old people:

import numpy as np

sensitive = fb.Fork(fb.categories@["Man", "Woman", "Man", "Woman", "Nonbin"], 
                    IsOld=np.array([0, 1, 0, 1, 0]))
# Man: [1, 0, 1, 0, 0]
# Woman: [0, 1, 0, 1, 0]
# Nonbin: [0, 0, 0, 0, 1]
# IsOld0.0 [1, 0, 0, 0, 1]
# IsOld1.0 [0, 1, 1, 1, 0]

For more than one sensitive attributes, branches capturing the values of different attributes will have overlapping non-zeroes. Thus, you might want to consider intersectional instead of more naive definitions of fairness by creating all branch combinations with at least one data sample per:

sensitive = sensitive.intersectional()