500 Billion Words Open a Window on Culture

Erez Lieberman Aiden, a junior fellow at the Society of Fellows at Harvard, and Jean-Baptiste Michel, a postdoctoral fellow, assembled a mammoth database with Google, according to an article by Patricia Cohen in the New York Times.

The database, culled from nearly 5.2 million digitized books available for free downloads and searches, will open a landscape of possibilities for research in education and the humanities.

The intended audience is scholarly. However, a simple online tool allows almost anyone to plug in a string of up to five words and see a graph that charts the phrase’s use over time. According to Cohen, this can be highly addictive.

She says that with a click you can see that “women” in comparison with “men” is rarely mentioned until the early 1970s; the lines cross paths about 1986.

Mickey Mouse and Marilyn Monroe are not as popular as Jimmy Carter.  There are many more references in English than in Chinese to “Tiananmen Square” after 1989.  The word “grilling” ascends over the years and outpaces “roasting” and “frying” by 2004.

Says Lieberman Aiden, “The goal is to give an 8-year-old the ability to browse cultural trends throughout history, as recorded in books.”

Their study, published today in the journal Science, offers a “tantalizing taste of the rich buffet of research opportunities now open to literature, history and other liberal arts professors who may have previously avoided quantitative analysis,” writes Cohen. 

The journal is taking the unusual step of making the paper available online to non-subscribers.

Steven Pinker, a linguist at Harvard who collaborated on the Science paper’s section about language evolution, has been studying changes in grammar and past tense forms for two decades.

The researchers tracked the way English verbs that did not add “ed” at the end for the past tense (“learnt,” for example) evolved to conform to the common pattern, as in “learned.”

“When I saw they had this database, I was quite energized,” says Pinker.  “There is so much ignorance.  We’ve had to speculate what might have happened to the language.” 

The information about verb changes “makes the results more convincing and more complete,” he says.  “What we report in this paper is just the beginning.”

The data set can be downloaded for users to build their own search tools.

Writes Cohen:

Working with a version of the data set that included Hebrew and started in 1800, researchers measured the endurance of fame, finding that written references to celebrities faded twice as quickly in the mid-20th century as they did in the early 19th.

Looking at inventions, they found technological advances took, on average, 66 years to be adopted by the larger culture in the early 1800s and only 27 years between 1880 and 1920.

Humanities scholars, however, have been muted in their response to the article, finding some claims exaggerated. Louis Menand, an English professor at Harvard, says he is troubled that among the 13 named authors there is not a single humanist listed.

“There’s not even a historian of the book connected to the project.”

Lieberman Aiden says the researchers don’t want humanists to accept any specific claims: “We’re just throwing a lot of interesting pieces on the table.”

Researchers are calling the method “culturomics.”

Read Patricia Cohen’s entire article at http://www.nytimes.com/2010/12/17/books/17words.html?_r=1&scp=2&sq=Patricia%20Cohen&st=cse

tutoring in Columbus OH:  Adrienne Edwards  614-579-6021  or email  aedwardstutor@columbus.rr.com

