Big Data As A Lens On Human Culture

Erez Aiden and Jean-Baptiste Michel

In the late 13th century, a new invention, eyeglasses, spread like wildfire through Italy. In a few decades they went from non-existent to merely exotic to utterly commonplace. Fashion and function combined in an early triumph of wearable technology. Inevitably, people began to experiment with the technology. They found they could combine lenses into compound lenses to make microscopes and telescopes. And of course, Galileo's telescope didn't just change religion, it ushered in the modern world.

Nathan Myhrvold (who also founded Microsoft Research and wrote the book on molecular gastronomy) is the most successful dinosaur hunter of modern times. His team uses detailed geological maps and satellite images to figure out where they are most likely to find bones. Since 1999 they have found nine T. rex skeletons; in the previous 90 years only 18 had ever been found.

Usage determines which word variants survive. We say 'drove' rather than 'drived' because 'drove' gets used a lot. But 'throve' has gone, replaced by the more regular 'thrived'. A change still in progress is 'wedded', which is becoming more common than 'wed'.

Burn/burnt, dwell/dwelt, learn/learnt, smell/smelt, spell/spelt, spill/spilt and spoil/spoilt all follow the same rule and so prop each other up in terms of frequency. They have stayed irregular for a lot longer than would be expected based on how often they are used. But the alliance is breaking up. Now dwell/dwelt is the only one that is consistently irregular. The US preference for burned, learned, smelled, spelled, spilled and spoiled has spread to the UK.

By 1900, English had about 550,000 words, and didn't add many by 1950. But from 1950 to 2000, it almost doubled in size, adding over 8000 words a year, or more than 20 words a day. (Based on a Zipf score that meant a word appeared at least 50 times in the 50 billion words collected by Google Books scans, that is, at least once per billion words.)
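The arithmetic behind those figures can be checked with a short sketch. It uses only the numbers quoted in the paragraph above; the function and constant names are made up for illustration, and a 365-day year is assumed.

```python
# Figures quoted in the text; names here are illustrative, not from the source.
CORPUS_SIZE = 50_000_000_000   # words collected from the book scans
MIN_OCCURRENCES = 50           # appearances needed to count as a word
WORDS_ADDED_PER_YEAR = 8000    # "over 8000 words a year"


def occurrences_per_billion(count, corpus_size):
    """Convert a raw count into a rate per billion words of text."""
    return count * 1_000_000_000 / corpus_size


def words_per_day(words_per_year, days_per_year=365):
    """Average number of new words per day (assumes a 365-day year)."""
    return words_per_year / days_per_year


threshold = occurrences_per_billion(MIN_OCCURRENCES, CORPUS_SIZE)
daily = words_per_day(WORDS_ADDED_PER_YEAR)

print(f"Cut-off: {threshold:g} occurrence(s) per billion words")
print(f"New words per day: {daily:.1f}")
```

So the 50-in-50-billion cut-off is the same as one appearance per billion words, and 8000 words a year works out to a little under 22 a day.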

For centuries, the Vatican library didn't have an index. If you wanted to find a book, you had to ask if anyone knew where it was.

