Ngrams and History: facts of words

One of my keenest interests in both my academic and personal worlds has been the Protestant movement. An avenue of research that I am considering for another Master’s degree (why would I be THAT crazy?!) is the role and impact of the Protestant movement on the socio-political face of the 18th, 19th, and 20th centuries. A great way to begin that path of inquiry is by doing some distant reading using text-analysis programs like Google Ngrams. These programs look at the frequency of a given query throughout a corpus within a given time frame.

Of specific interest to me was the occurrence of the word Protestant in English (as it’s the only language I’m completely fluent in currently) between 1600 and 2000. I chose these dates because they would let me see whether any trend existed in the use of the word Protestant prior to my centuries in question, without focusing on the rise of Protestantism in the 16th century (Martin Luther’s 95 Theses and King Henry VIII’s separation from the Catholic Church). Even though these two events sparked the beginnings of the Protestant movement across Europe, I wanted to focus on more recent expressions of their impact.

I began by searching for Protestants using Google Ngram Viewer, running both a capitalized and a non-capitalized version of the word. Searching for the plural increases the likelihood that the hits refer to groups of individuals, rather than to texts about a single denomination.

[Screenshot: Google Ngram Viewer results for Protestants and protestants]
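For anyone curious what a frequency-by-year search is doing under the hood, here is a minimal Python sketch. It is my own toy illustration, not Google's actual method: the function name `relative_frequency_by_year`, the `case_sensitive` switch, and the two sample "documents" are all made up for the example, and the real Ngram Viewer normalizes and smooths its counts in ways this does not attempt.

```python
from collections import Counter

def relative_frequency_by_year(docs, term, case_sensitive=True):
    """Return {year: share of tokens equal to `term`} for (year, text) pairs.

    Hypothetical helper for illustration only.
    """
    hits = Counter()
    totals = Counter()
    target = term if case_sensitive else term.lower()
    for year, text in docs:
        # Crude tokenization: split on whitespace, trim common punctuation.
        tokens = [t.strip('.,;:!?"()') for t in text.split()]
        tokens = [t for t in tokens if t]
        if not case_sensitive:
            tokens = [t.lower() for t in tokens]
        hits[year] += sum(1 for t in tokens if t == target)
        totals[year] += len(tokens)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Invented sample data, standing in for a real dated corpus.
sample_docs = [
    (1675, "The Protestants and the protestants met."),
    (1850, "No mention here at all."),
]

capitalized_only = relative_frequency_by_year(sample_docs, "Protestants")
either_case = relative_frequency_by_year(sample_docs, "Protestants", case_sensitive=False)
```

With the switch off, Protestants and protestants are counted together, which is roughly the difference between the two lines in the Ngram Viewer chart.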

There’s a very clear spike around the 1675 mark in the capitalized version of the word that I find fascinating, and smaller spikes around 1725 and 1750. As the Ngram Viewer searches through all English-language texts in Google Books, it isn’t entirely surprising that those last two periods see a spike, as two predominantly Protestant nations were experiencing increased tensions at that point (thanks, King George III). While this doesn’t particularly offer answers (and actually poses new questions for me), it does provide an interesting look into the uses of the word protestant. What were people writing about when they used the lower-case version of the word? Is this a text-recognition software issue, or was it intentional?

I then searched for the term using the Time Magazine Corpus, searching for just the frequency of the word Protestant. Fun fact: it did not like me capitalizing the word. So I searched for the frequency of the word without the capital P and found 4515 occurrences of the word Protestant between 1923 and 2000.

[Screenshot: Time Magazine Corpus frequency results for protestant]

Interesting indeed. I then ran a collocation search covering the 3 words before and after the search term. Below is a screenshot of the top 32 words by frequency.

[Screenshot: Time Magazine Corpus collocates of protestant, top 32 by frequency]
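A collocation search like this one can be sketched in a few lines of Python. This is only my rough guess at the idea, not how the Time Magazine Corpus actually implements it: the function name `collocates`, the `window` parameter, and the sample sentence are invented for illustration.

```python
from collections import Counter

def collocates(tokens, node, window=3):
    """Count the words appearing within `window` tokens of each `node` hit.

    Hypothetical helper; matching is case-insensitive.
    """
    counts = Counter()
    lowered = [t.lower() for t in tokens]
    node = node.lower()
    for i, tok in enumerate(lowered):
        if tok == node:
            left = lowered[max(0, i - window):i]     # up to 3 words before
            right = lowered[i + 1:i + 1 + window]    # up to 3 words after
            counts.update(left + right)
    return counts

# Invented sample sentence, standing in for real magazine text.
sample_tokens = "the protestant church and the protestant clergy met".split()
neighbors = collocates(sample_tokens, "protestant", window=3)
top_words = neighbors.most_common(3)
```

Sorting the resulting counts from highest to lowest gives a list like the one in the screenshot, just on a much smaller scale.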

The Protestant church must play a bigger role (at least in the news) in the 20th century than I originally thought! What luck! But this doesn’t line up with the findings from the Google Ngram Viewer, where it showed a decline in the usage of the word Protestants after 1850. More questions!

Having had some interesting results with these two programs, I decided to try a targeted corpus with a wider range than just the Time Magazine Corpus. Going to Bookworm, I searched through the HTRC (HathiTrust Research Center) corpus, which draws on the HathiTrust Digital Library, a digitized collection of public domain texts from multiple institutions. Searching Protestant across all texts once again yielded a spike, this time in the early 1800s.

[Screenshot: Bookworm HTRC results for Protestant]

This spike is much more pronounced than the spikes from the Google Ngram Viewer, and I’m not entirely certain what could account for it, outside of the targeted digitization efforts of the institutions that are part of the HathiTrust Digital Library.

As an aside, I did try searching for Protestant in the Open Library corpus with more targeted search fields, but I was either barking up the wrong tree or being a bit too specific in my queries, as I found no results.

[Screenshot: Bookworm Open Library results for Protestant]

While these datasets don’t answer ANY questions at all, they serve as good starting points for directions of inquiry later. I may not be embarking on that second Master’s degree just yet (let’s get through toddler years first maybe), but when I decide to start the research, I’ve got the tools to help me get started, and hopefully more texts will be digitized for me to distance-read by that point.
