library(rethnicity)
#> ══ WARNING: ══════════════════════════════════════════════════════ rethnicity ══
#> ! This package predicts race from names, with inherent limitations and bias risks. Use cautiously.
#> ! Critically examine methodology and results for biases and ethical implications.
#> ✖ Results should not be considered definitive and must NOT be used for discrimination of any kind.
#> ✖ Intended for academic research purposes only, NOT for commercial use.
#> ══ INFO: ═════════════════════════════════════════════════════════ rethnicity ══
#> ℹ For detailed documentation, visit: rethnicity homepage (<https://fangzhou-xie.github.io/rethnicity/index.html>) and methodology paper (<https://www.sciencedirect.com/science/article/pii/S2352711021001874>).
#>
#> ══ CITATION: ═════════════════════════════════════════════════════ rethnicity ══
#> ℹ Please use `citation("rethnicity")` to cite my work, thanks!
I built this package to help applied researchers study ethnic equality and inequality. More specifically, it provides a method for predicting race from names. I designed the package so that the method is powered by deep learning models without requiring users to install deep learning libraries, which is often a daunting task. As a consequence, the methods in this package are not designed to be updated, fine-tuned, or trained on custom datasets. This is the trade-off one has to accept for ease of use.
That said, from version 0.2.0 onward, I provide two additional lower-level functions, predict_fullname and predict_lastname, which allow users to provide their own customized models. (Prior to v0.2.0 there was only one function, predict_ethnicity. It is still the RECOMMENDED one for most people.)
Since the package disables training by design, you need to train your own model in Keras and then convert the trained model to .json format using the frugally-deep project.
If you are reading this vignette, you most likely know what you are doing and have heard of Keras. Otherwise, you will have to stick to the default function, predict_ethnicity.
You can refer to the following links to see how I trained the models and create your own version: fullname model, lastname model.
Before training the model, you need to process your dataset. The outcome variable must be encoded as integers before keras.utils.to_categorical() can turn it into one-hot vectors, and you need to know the mapping between the integers and the labels. For example, 0, 1, 2, 3 refer to asian, black, hispanic, white, respectively. You will need this mapping later, and we will call it labels = c("asian", "black", "hispanic", "white").
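The label mapping can be sketched in plain Python, so that it runs without Keras installed. The names `LABELS`, `encode`, and `one_hot` are hypothetical and chosen only for illustration; in an actual training script, `keras.utils.to_categorical(codes, num_classes=4)` would be expected to produce the same one-hot rows as a NumPy array.

```python
# Illustrative only: LABELS, encode and one_hot are hypothetical names.
# The order of LABELS defines the integer codes (0, 1, 2, 3) and should match
# the `labels` vector later passed to predict_fullname / predict_lastname.
LABELS = ["asian", "black", "hispanic", "white"]
LABEL_TO_INT = {label: i for i, label in enumerate(LABELS)}


def encode(races):
    """Map race strings to their integer class codes."""
    return [LABEL_TO_INT[r] for r in races]


def one_hot(codes, num_classes):
    """Pure-Python stand-in for one-hot encoding integer class codes."""
    return [[1.0 if j == c else 0.0 for j in range(num_classes)] for c in codes]


races = ["white", "asian", "hispanic"]
codes = encode(races)
targets = one_hot(codes, num_classes=len(LABELS))
print(codes)
print(targets)
```

Keeping the list order fixed in one place is the point of the sketch: if the order used at training time differs from the `labels` vector used at prediction time, the probability columns will be attached to the wrong names.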
Just remember to save the model without the optimizer (more on the frugally-deep website):
model.save('keras_model.h5', include_optimizer=False)
Then, use the convert_model.py script to convert your model into .json format. This is what I did as well. If you include the optimizer in the saved model, you will encounter an error in the conversion process.
python convert_model.py keras_model.h5 keras_model.json
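The conversion can also be driven from Python. The sketch below only assembles the command shown above and checks that a file parses as JSON; `build_convert_command` and `is_valid_json` are hypothetical helpers, and the location of `convert_model.py` on your machine is an assumption.

```python
import json
import sys


def build_convert_command(h5_path, json_path, script="convert_model.py"):
    """Return the argv list for the conversion command shown above.

    `script` is assumed to point at your local copy of convert_model.py.
    """
    return [sys.executable, script, h5_path, json_path]


def is_valid_json(path):
    """Return True if the file at `path` exists and parses as JSON."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            json.load(fh)
        return True
    except (OSError, ValueError):
        return False


cmd = build_convert_command("keras_model.h5", "keras_model.json")
print(" ".join(cmd[1:]))
# To actually run the conversion (requires the script and Keras to be available):
# import subprocess; subprocess.run(cmd, check=True)
```

Calling `is_valid_json("keras_model.json")` after the conversion is a cheap sanity check before handing the path to R. It does not verify that the file is a model the package can load.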
Now that the model is trained and converted, you need the path to the resulting model file. In this example I load one of the default models shipped with the package instead of training a new one.
# remember the list of labels we mentioned?
labels <- c("asian", "black", "hispanic", "white")
# change to your own model file path
model_path <- system.file("models", "fullname_aligned_distill.json", package = "rethnicity", mustWork = TRUE)
# run the prediction
predict_fullname(firstnames = "Alan", lastnames = "Turing", labels = labels, model_path = model_path)
#> firstname lastname prob_asian prob_black prob_hispanic prob_white race
#> 1 Alan Turing 0.02842531 0.2051059 0.02074102 0.7457278 white
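In the output above, the race column matches the label with the largest predicted probability. The sketch below reproduces that reading in Python with the printed numbers; `pick_label` is a hypothetical helper, and treating `race` as a plain argmax is an assumption based on this single output row rather than on the package source.

```python
# Labels in the same order as the prob_* columns of the output above.
LABELS = ["asian", "black", "hispanic", "white"]
# Probabilities copied from the printed row for "Alan Turing".
probs = [0.02842531, 0.2051059, 0.02074102, 0.7457278]


def pick_label(probabilities, labels):
    """Return the label whose probability is largest (hypothetical helper)."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


predicted = pick_label(probs, LABELS)
print(predicted)  # prints: white
```

The example also shows why the order of `labels` matters: the model only returns a vector of numbers, and the names are attached by position.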
In fact, the same workflow also works if you tweak the code to predict gender from names.