Humanities datasets are collections of human experiences and interactions, preserved by humans. This means they are subject to biases, background assumptions, and emotions. In Lab series two, during the visualization project, I really got to work with datasets. Inequity can enter through the human agent: their biases, their environment, and so on. It can also enter through the structure on which the dataset is hosted.
In the curation of humanities datasets, inequity might arise even in the selection process. How do curators decide what is worthy or unworthy of being added to a dataset? Do biases come into play? For example, if a person biased against women decides to create a dataset of the greatest American writers, they may deliberately exclude women from the list. As digital humanists, I think it is very important for us to question the veracity of the datasets we interact with.
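One simple way to start questioning a dataset is to audit its representation before trusting or reusing it. Here is a minimal sketch in Python; the writer entries and the "gender" field are invented for illustration, not drawn from any real dataset:

```python
# A minimal sketch (hypothetical data and field names) of auditing a
# dataset's representation before trusting or reusing it.
from collections import Counter

# Hypothetical "greatest American writers" dataset entries.
writers = [
    {"name": "Writer A", "gender": "male"},
    {"name": "Writer B", "gender": "male"},
    {"name": "Writer C", "gender": "female"},
    {"name": "Writer D", "gender": "male"},
]

# Count how many entries fall into each group and report the share.
counts = Counter(entry["gender"] for entry in writers)
total = len(writers)
for group, n in counts.items():
    print(f"{group}: {n}/{total} ({n / total:.0%})")
```

A skewed breakdown does not prove bias on its own, but it tells you which questions to ask the curators next.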
Inequity might also arise from the environment in which people live. People may judge content they are familiar with in their own country as more reliable and valuable than content from unfamiliar environments. For example, non-Western philosophy papers are often looked down upon because they are not published in Western languages. In this way, potentially valuable content is ignored, which can create inequity in the curation of datasets.
Inequity or equity might also arise from the structure or organization that hosts the datasets. Data might be influenced by the mission or goals of the organization. Projects like Project Gutenberg, DH ToyChest, and OPenn can be important agents for equity in digital humanities.
I also think the government has a role to play. The invisible hand of the government is upon the internet, and I think certain datasets might be compromised by governmental influence. For example, if the narrative of a dataset does not align with governmental interests or current propaganda, that dataset might be tampered with.
As responsible digital humanists, I think it is our duty as eventual curators and analyzers of humanities datasets to ensure that there is equity in the datasets we put out there. I also think it is important that we report or try to correct any inequity we spot in the datasets we come across.