For Lab Series 2, I decided to curate a dataset from DH Toy Chest. The dataset, named “North American Slave Narratives” on DH Toy Chest, contains 344 North American slave narratives. I chose it because I wanted to paint as accurate a picture as I could of the lives of the men and women who lived in oppression in that era. These are their own personal accounts: the way they saw the world in which they lived. I believe there are lessons to be learnt from these texts, and I hoped the text visualization would reveal what their lives were like in spite of the oppression they faced.
The dataset consists of autobiographies, journal entries, and general commentary written by North American slaves of the period. After curating the dataset, I had to use my discretion as the editor to select or remove particular pieces of text. Cleaning involved removing lines of text that I judged not relevant to the narratives: the Preface, Index, Appendix, and Editor's note were taken out. I felt that the contribution of these sections was minimal compared to the actual accounts.
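A step like this can be automated. The sketch below is a minimal illustration of stripping front and back matter from a plain-text narrative; the heading names and the all-caps-heading convention are my own assumptions about how these files are laid out, not a description of the actual dataset's formatting:

```python
import re

# Hypothetical section headings to strip; real headings vary by narrative.
BOILERPLATE_HEADINGS = ["PREFACE", "INDEX", "APPENDIX", "EDITOR'S NOTE"]

def strip_section(text, heading):
    """Remove the section that starts at `heading` and runs until the
    next all-caps heading line (or the end of the text)."""
    pattern = re.compile(
        rf"^{re.escape(heading)}\s*$.*?(?=^[A-Z][A-Z' ]+\s*$|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    return pattern.sub("", text)

def clean_narrative(text):
    """Drop every boilerplate section, keeping the narrative itself."""
    for heading in BOILERPLATE_HEADINGS:
        text = strip_section(text, heading)
    return text
```

On a file laid out this way, `clean_narrative` would drop the editorial apparatus while leaving the chapters of the account untouched.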
Another important part of the cleaning process was editing the text itself. Some of the texts had grammatical errors. One of my main goals was to keep the accounts as true to their narratives as possible, and I was not sure whether changing some of the actual words the writers used would take away from that authenticity. Eventually, I settled for correcting errors that were made repeatedly, rather than one-off errors.
After editing the text, I compressed it into a file and loaded it into the tool Voyant. I decided to use Voyant because it offers many tools for presenting textual data. I particularly like the Cirrus tool because it shows the words that come up most often in your text in a visually appealing format. Using Cirrus required another round of editing: I noticed words like “mr”, “mrs” and “man” appearing, so I adjusted the controls to exclude them from the list of words to be considered. This was another instance where my discretion as editor, and my main objective for the visualization, came into play.

To further visualize the text, I used the “Trends” tool. I searched keywords such as “sad”, “free” and “slavery” to find out which accounts had the highest frequency of those words. I also searched for words like “laugh”, “joy” and “love” and contrasted them with the words carrying negative undertones, to see if I could find any one account with generally high frequencies in both categories.

Finally, I used the “BubbleLines” tool. With search terms such as “war”, I was able to trace, along the timeline of each account, where war was mentioned most heavily. This was particularly useful because it helped to carve out distinct sections of each text that I could analyze further.
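The Cirrus-style count with excluded terms can be sketched outside Voyant as well. This is only an illustration: the stoplist below reflects the words I filtered out by hand, and the tokenization is a simplification of my own, not Voyant's actual method:

```python
import re
from collections import Counter

# Terms excluded from the count, mirroring the words I removed in Cirrus;
# this stoplist is my own choice, not Voyant's built-in list.
EXTRA_STOPWORDS = {"mr", "mrs", "man"}

def top_words(text, n=10, stopwords=EXTRA_STOPWORDS):
    """Return the n most frequent words, skipping the excluded terms."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)
```

Running this over each narrative gives the same kind of ranked word list that Cirrus renders as a word cloud, which makes it easy to check how much the exclusions change the picture.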
The dataset project presented the data in a way that let me think of the writers behind the narratives and tune in to what life might have been like for them. It helped me see how their lives were disrupted by periods of civil unrest, and how lives will always bear the effects of injustice regardless of the passage of time.