Forks
Forks are flexible data structures that store and manage multidimensional sensitive attributes. They are made up of named data elements called branches. Each branch represents a specific sensitive attribute value. For example, you might have a branch for each gender, which would be a binary array indicating the presence of that attribute in data samples.
As a preview of a common case, here is how to define a sensitive attribute fork for intersectional fairness (for a full example using this definition, see the quickstart):
import fairbench as fb

gender = ...  # iterable (e.g., list) of gender attribute for each data sample
race = ...  # iterable (e.g., list) of race attribute for each data sample
sensitive = fb.Fork(fb.categories @ gender, fb.categories @ race)  # one branch per attribute value
sensitive = sensitive.intersectional()  # create the non-empty branch combinations
Creating forks
Generate forks by passing keyword arguments to a constructor. For example, men, women, and nonbinary could be branch names. Branch values can be anything, though they will usually be lists, numpy arrays, or deep learning tensors. Provide any number of branches with any names, and access their values like members of the fork object.
To set branch names programmatically or use names with invalid characters, pass a dictionary mapping names to values as a positional argument. You can do this in addition to branches declared via keyword arguments. Access branch names as strings by treating the fork as a dictionary.
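To make the two ways of naming branches concrete, here is a minimal pure-Python sketch. The ToyFork class is a hypothetical stand-in written only for illustration; it is not FairBench's implementation and skips everything a real fork does beyond storing and looking up branches.

```python
class ToyFork:
    """Hypothetical stand-in for a fork; NOT FairBench's implementation."""

    def __init__(self, *dicts, **branches):
        self._branches = {}
        for mapping in dicts:
            # Positional dictionaries: names are plain strings, so any character is allowed.
            self._branches.update(mapping)
        # Keyword arguments: names must be valid Python identifiers.
        self._branches.update(branches)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so fall back to the branches.
        try:
            return self.__dict__["_branches"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Dictionary-style access by branch name.
        return self._branches[name]


sensitive = ToyFork(
    {"non-binary": [0, 0, 0, 0, 1]},  # name with a character that is invalid in identifiers
    men=[1, 1, 0, 0, 0],
    women=[0, 0, 1, 1, 0],
)
print(sensitive.men)            # member-style access
print(sensitive["non-binary"])  # dictionary-style access
```

The sketch mixes a positional dictionary with keyword arguments in one call, mirroring the combination described above.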
For multiple sensitive attributes and attribute values, add branches for each attribute value to the same fork. For instance, you can have branches for different gender values and a binary attribute for age.
To add multiple sensitive attributes without worrying about conflicting branch names, pass dictionaries as keyword arguments. This prepends the argument name to all generated branch names.
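The idea of prepending the argument name can be sketched in plain Python as follows. The helper merge_with_prefix is hypothetical, and joining the two names by simple concatenation is an assumption made for this illustration; the exact branch-name format that FairBench generates is not specified here.

```python
def merge_with_prefix(**attributes):
    """Merge per-attribute dictionaries, prepending each keyword name to its branch names.

    Hypothetical helper for illustration; plain concatenation is an assumed naming format.
    """
    merged = {}
    for attribute, branches in attributes.items():
        for value, membership in branches.items():
            merged[f"{attribute}{value}"] = membership
    return merged


# Both attributes contain an "Other" value, which would clash without a prefix.
branches = merge_with_prefix(
    gender={"Man": [1, 0, 0], "Woman": [0, 1, 0], "Other": [0, 0, 1]},
    race={"Black": [1, 0, 0], "White": [0, 0, 1], "Other": [0, 1, 0]},
)
print(sorted(branches))
```

Because every name carries its attribute, the two "Other" branches stay distinct instead of one overwriting the other.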
Info
When working with specific backends, branch values are internally converted to appropriate data types (e.g., arrays or tensors).
Unpacking
To create forks by analyzing categorical values found in iterables, use the categories@ operator. For example, applied to a list with entries "Man" and "Woman", this operator creates two branches storing binary membership for each.
You can add the outcomes of multiple category analyses to a fork. Use keyword arguments to prepend the argument name to branch names, or pass all category analyses as positional arguments to simply merge their branches. Any Python iterable can be analyzed into categories, including lists, pandas DataFrame columns, categorical tensors, and numpy arrays.
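What a category analysis produces can be pictured with a small pure-Python sketch. The function binary_membership is a hypothetical illustration of the idea (one binary branch per distinct value); it is not how the library implements the operator, and it returns plain lists rather than backend arrays or tensors.

```python
def binary_membership(values):
    """Return {category: [0 or 1 per sample]} for every distinct value in `values`.

    Hypothetical helper for illustration only.
    """
    values = list(values)  # accept any iterable, including generators
    categories = dict.fromkeys(values)  # distinct values, in first-seen order
    return {c: [1 if v == c else 0 for v in values] for c in categories}


gender = ["Man", "Woman", "Woman", "Man", "Woman"]
branches = binary_membership(gender)
print(branches)
# {'Man': [1, 0, 0, 1, 0], 'Woman': [0, 1, 1, 0, 1]}
```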
Intersectionality
When dealing with multiple sensitive attributes, branches for different attributes will often have overlapping non-zero values. This means that certain groups may intersect. For example, some Black people may also be women.
To consider intersectional definitions of fairness, create all branch combinations with at least one data sample by using the intersectional method of forks.
You may want to allow empty intersections, because some report types can handle them. To do so, do not use the intersectional method; instead, explicitly combine the outcomes of categorical analysis for multiple attributes with the bitwise and operator (&).
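The difference between dropping and keeping empty intersections can be sketched on plain binary lists. The function intersect, its keep_empty flag, and the "&" used to join branch names are all assumptions made for this illustration; they are not part of the library's API.

```python
from itertools import product


def intersect(first, second, keep_empty=False):
    """Combine every branch of `first` with every branch of `second` by elementwise AND.

    Hypothetical helper; the "&"-joined names are an assumed naming format.
    """
    combined = {}
    for (name_a, a), (name_b, b) in product(first.items(), second.items()):
        membership = [x & y for x, y in zip(a, b)]  # bitwise and on 0/1 values
        if keep_empty or any(membership):
            combined[f"{name_a}&{name_b}"] = membership
    return combined


gender = {"Man": [1, 0, 0, 1], "Woman": [0, 1, 1, 0]}
race = {"Black": [0, 1, 1, 0], "White": [1, 0, 0, 1]}

print(intersect(gender, race))                   # only combinations with at least one sample
print(intersect(gender, race, keep_empty=True))  # all four combinations, including empty ones
```

In this toy data no sample is both "Man" and "Black", so that combination appears only when empty intersections are kept.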