Bits of Books - Books by Title

Dataclysm

Who We Are When We Think No One's Looking

Christian Rudder

Describes those who disparage modern media such as Twitter as "the weathered sentinels atop Fortress English".

Twitter has recorded more words in the last 2 years than all other writing to date - books, papers, magazines etc. This database, plus the Google library of 30 million digitized books, has given rise to a new field of study called culturomics.

The OK Cupid site found that when Apple launched smartphones, message length dropped by 1/3, to around 100 characters, mainly because people were now typing on a tiny keyboard. In fact, the best messages, the ones which got the highest response rate, are now only 40 to 60 characters long.

Templates work. This guy is trying to pick up women who smoke and are into art:

"I'm a smoker too. I picked it up when back-packing in May. It used to be a drinking thing, but now I wake up and fuck, I want a cigarette. I sometimes wish I worked in a Mad Men office. Have you seen the Le Corbusier exhibit at MoMa? It sounds pretty interesting. I just saw a Frank Gehry (sp?) display last week in Montreal, and how he used computer modelling to design a crazy house in Ohio."

42 different women got this message.


On OK Cupid people are given about 300 parameters they can use to evaluate potential dates. Many of these people choose as deal-breakers. But the data shows that the only 2 questions that actually predict future compatibility are 1) Do you like scary movies? and 2) Have you ever travelled to a foreign country by yourself? As long as both people give the same answers to these, you probably have the basis of a relationship.

So, do people over-select, simply because they can?

OK Cupid ran a short-lived Blind Date experiment - you could only see the first name of the other party - no pics, no data. About 10,000 people tried it, and all the data showed they got a lot out of it - they met, and got on well with, people who they otherwise would not have taken the time with.

Author suggests that we erect these artificial criteria which actually aren't relevant, and by doing so we restrict the people we are prepared to try. To paraphrase Jagger: you can always get what you want, but it's hard to get what you need.

Interesting take on beauty. Well established that good-looking people have irrational advantages - better hired, paid, promoted; better treated by justice system. Beautiful women get treated by employers in same way as they do on OK Cupid. Author suggests that this works like the Peter Principle - the beautiful women get promoted into areas where their incompetence shows up, so eventually they have to be sacked. (Old joke - "Have you been shagging my secretary?" "No" "Are you sure?" "Absolutely" "OK then you can fire her".) So this then contributes to the overall image of women not being 'good enough' to do the top jobs - the wrong criteria are being used, so the wrong people are being promoted.


"... phones and services like Twitter demand their own adaptations. The eternal here is that writing, like life itself, abides. It changes form. It replicates in odd ways, it finds unexpected niches...we are living through writing's Cambrian explosion, not its mass extinction."


Is Google auto-complete accentuating stereotypes? A user starts to type an unrelated query, and other people's prejudices jump in the way, often suggesting ideas you hadn't even contemplated. It's not PC to say certain things, but suddenly you are shown that a significant number of people are searching on those terms.

Twitter/Facebook mobs attacking some careless/stupid/vapid posting - equivalent of Old Testament stonings: lots of emotional excitement and righteous wrath, but very little thought or responsibility.

Bass beer's triangle logo was the first registered trademark in the world, and they make a lot of that fact right on the label. What they don't tell you is that the Bass clerk happened to be first in line the day the British Trademark Registration Act took effect. Brands themselves go back forever - found in Egyptian tombs.


It is (much) harder to get a million followers on Twitter than it is to make a million dollars. There were 300,000 Americans who reported an income over $1 million in 2011; right now there are 2600 Twitter accounts with over a million followers, of which half are in the US.
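The comparison can be worked through with the figures quoted above. This is only a back-of-envelope sketch: the variable names are made up for illustration, and it treats "half are in the US" as exactly 50%.

```python
# Figures as quoted in the note above; the names are illustrative only.
millionaire_earners_us = 300_000    # Americans reporting income over $1 million in 2011
million_follower_accounts = 2_600   # Twitter accounts with over a million followers
us_share = 0.5                      # assumption: "half are in the US" read as exactly 50%

# US-based accounts with a million followers.
us_million_follower_accounts = int(million_follower_accounts * us_share)

# How many million-dollar earners there are per million-follower account.
ratio = millionaire_earners_us / us_million_follower_accounts

print(us_million_follower_accounts)  # 1300
print(round(ratio))                  # 231
```

So on these numbers there are roughly 230 Americans earning a million dollars for every US Twitter account with a million followers.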

Downside of social media - employers etc using data to categorise you. Not just the drunken pics or racist abuse that most people are canny enough to edit out, but the inferences that can be drawn about IQ or whether you are likely to use drugs. That gives you no choice but to game the system - to beat the machine you have to act like a machine, which means you've lost to the machine.

(New Scientist)

FOUR years ago I interviewed Sam Yagan, then CEO of OKCupid, about the mathematics underlying his free matchmaking site. Yagan explained how they had cracked the love problem. The algorithms underlying his site couldn't understand human emotion, but that didn't matter. He simply had to chop people's behaviour up into morsels a computer could digest: grist into a data mill.

I came away with the sense that something big was on the horizon: a time when machines could predict your idiosyncrasies without ever understanding them. Our shared future wouldn't involve machines condensing into human-like androids, but great whirring server farms abstracting messy humans into clean mathematical patterns.

Dataclysm by Christian Rudder maps what that future might look like. Having co-founded OKCupid with Yagan and two friends, Rudder wrote the OKTrends blog to share the company's quirky discoveries, from which camera makes you most attractive (Panasonic Micro) to the best chat-up line ("How's it going?").

When internet conglomerate IAC (owners of paid dating site Match.com) bought OKCupid in 2011, an OKTrends blog post detailing why you should never pay for online dating vanished overnight, and the blog ceased publishing.

Now Christian Rudder has returned, clutching a book packed with discoveries gleaned from the deluge of big data. Dataclysm is packed with the kinds of bon mots and revelations that made the OKTrends blog such a success.

The book is wider in scope, however: Rudder draws from big data sets - Google searches, Twitter updates, illicitly obtained Facebook data passed shiftily between researchers like bags of weed - to draw out subtle patterns in politics, sexuality, identity and behaviour that are only revealed with distance and aggregation.

The true aim of Rudder's book is not an examination of us, but of the data itself.

Given just a few metrics, commercial and governmental algorithms can prise open your private life and glean secrets never spoken aloud. From your history of purchases of everything from vitamins to handbags, for example, such algorithms can determine whether you are breaking up with a partner, or heading for bankruptcy. They know when you are pregnant - and what your due date is. They are already used routinely to sell you things, and, of course, to sell you.

The book has a couple of understandable but important limitations. It is firmly focused on the US, so its conclusions may not be globally relevant. Then, as Rudder himself points out, he has to be careful not to draw too many grand inferences from his store of impressive, but not gigantic, data sets. It should also be noted that while the book is punctuated by clean, compelling graphics, some of his trend lines buck and sawtooth in ways that hint at big variances that go unacknowledged in the text.

It's frustrating that Rudder spends so long examining differences between established groups of people, and less time using trends to identify hitherto hidden subcultures. It's also frustrating that he often doesn't pursue the questions he raises. For example, he documents how threatening mobs gather on Twitter, and how the search results of dating sites all too easily acquire a racist edge, but in neither case does he explore how measuring such patterns might help us combat them. (Maybe no one has really tried.)

Dataclysm will entertain those who want to know how machines see us. It also serves as a call to action, showing us how server farms running everything from home shopping to homeland security turn us into easily digested data products. Rudder's message is clear: in this particular sausage factory, we are the pigs.


(Scientific American)

One of the beautiful things about digital data, besides its sheer volume, is that it has both physical and social dimensions. A piece of paper has two axes, space-time has four. String theory predicts that our physical existence requires somewhere between 10 and 26 dimensions. Our emotional universe surely has that many and more. And in combining these spaces - our interior landscape with our external world - we can portray existence with a new depth.

Websites and smartphones are gathering ample location data. Tweets are geotagged with latitude and longitude; Facebook asks for your hometown, your college town, your current home; many apps know the very building you're standing in. We can layer identity, emotion, behavior, and belief over our physical spaces and see what new understandings emerge. We can look at how location shapes a person, and how people have laid new borders over our old Earth.

The boundaries of many communities were created by fiat or accident - or both. The United States and the USSR split Korea on the 38th parallel because that line stood out on a map in an officer's National Geographic. Earlier that same month, Germany was divided into zones of occupation that reflected, more than anything else, whose troops were standing where at the time. Many of our own American states were created by royal charter or act of Congress, their borders drawn by people who would never see the land in person.

For websites, political and natural borders are just another set of data points to consider. When information - fluid, unbounded, abstract - is your currency, the physical world with its many arbitrary limits is most often a nuisance. At OkCupid, rivers are an endless irritant to the distance-matching algorithms. Queens is both half a mile and a world away from Manhattan. Try explaining that to a computer. The problem is that when a person is online, he or she is both of the world and removed from it. But that duality also means we can remix our physical spaces along new lines, ones perhaps more meaningful than those drawn by plate tectonics or the dictates of some piece of parchment.

Smartphones, each one with a tiny GPS pinging, have revolutionized cartography. Matthew Zook, a geographer at the University of Kentucky, has partnered with data scientists there to create what they call the DOLLY Project (Digital OnLine Life and You) - it's a searchable repository of every geotagged tweet since December 2011, meaning Zook and his team have compiled billions of interrelated sentiments, each with a latitude and longitude attached. DOLLY is an incredibly versatile resource, the output of which is only now being explored. For Zook, it's already had a few highly personal applications. In February 2012, his office in Lexington was shaken by an earthquake, and he turned to the database to see the psychological aftershocks. The map below shows the density of reaction on Twitter, plotted over the physical epicenter of the fault. Here we see contours of surprise laid over the shifting Earth:

Zook discovered that the quake's emotional epicenter was just northwest of the seismic one, in Hazard, Kentucky.

But Zook's map shows people's instantaneous reaction to an event that lasted a split second. Surveying Kentuckians later, even with infinite effort, he couldn't have generated a true report - not only do emotions change in the remembering, but media coverage and talk about the quake would've hopelessly polluted the data. People with smartphones don't make seismographs obsolete but Zook's plot reflects the 'impact' of the earthquake in a much more direct way than the old Richter scale. Knowing nothing else about a quake, if it were your job to distribute aid to victims, the contours of the Twitter reaction would be a far better guide than the traditional shockwaves around an epicenter model.

Even though each one is transitory, tweets collected together can capture more than ephemera. A demonstration of DOLLY's power on YouTube shows it tracking the Dutch holiday of Sint Maarten, a sort of Germanic Halloween where children go door-to-door singing for candy. In the data, you see people celebrating not only in the major population centers of the northern Netherlands, as you'd expect, but also in Western Belgium - the tweets reconnect old Holland to Flanders, its cultural cousin. Thus we watch an animated visualization of GPS-enabled data points, and see shadows of the Habsburgs.

Given the power of what we can already see through software like DOLLY, the lack of longitudinal data is especially painful. On today's research corpus, time often feels like a phantom limb. Twitter currently gives us so much of that multidimensional promise: We have every emotion, we have every spot on the globe, but we still have only a few years to work with. In Europe, where the combination of geography, culture, and language has been so volatile over the centuries, imagine being able to track the Alsace-Lorraine as it changed hands - German, French, German, French - each government imposing its culture on the people, as if the region were a house taking on coats of paint. Or imagine the Caribbean basin in the late fifteenth century and being able to watch first the soldiers, then their religion, then their language overwhelm the land, Arawak to Aztec. To see the ebb and fracture of a culture over decades is what DOLLY was built for. All it needs now is the decades themselves.

Geocultural insights can be found in other sources, too, and though in most of them you lose the immediacy of Twitter, you get a different kind of depth in its place. When websites pose questions directly to their users, we have a chance not only to refine borders but to show they don’t really exist as normally conceived.

Below are 1 million answers to 'Should burning the flag be illegal?' collected by OkCupid. Here my mapping software drew no political or natural boundaries, it just organized belief according to latitude and longitude. This is truly a nation defined by its principles, or, as you can see, two nations: Urban and Rural. You can even see where one encroaches on the other: the rural communities up the Hudson River and in Northern California's wine country, built up with Big City money, have Big City opinions as well.
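Organizing answers purely by latitude and longitude can be sketched as a simple grid count. This is not the author's mapping software, only a minimal illustration of the idea: the function name `grid_shares`, the one-degree cell size, and the six sample points are all invented for the example.

```python
from collections import defaultdict

def grid_shares(answers, cell_deg=1.0):
    """Group (lat, lon, said_yes) answers into square cells of cell_deg degrees
    and return the share of 'yes' answers in each cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [yes count, total count]
    for lat, lon, said_yes in answers:
        # Floor division puts every point in the cell whose corner lies to its south-west.
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        if said_yes:
            counts[cell][0] += 1
        counts[cell][1] += 1
    return {cell: yes / total for cell, (yes, total) in counts.items()}

# Made-up toy answers to "Should burning the flag be illegal?" (not real data).
sample = [
    (40.7, -74.0, False),
    (40.8, -73.9, False),
    (44.4, -100.3, True),
    (44.6, -100.9, True),
    (44.1, -100.2, False),
]

shares = grid_shares(sample)
for cell, share in sorted(shares.items()):
    print(cell, round(share, 2))
```

No state or county lines enter the calculation. Colouring each cell by its share is what lets a pattern such as urban versus rural show up on its own.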

Similarly, and in support of the earlier Google Trends finding that homosexuality is universal, we see that same-sex searches have no borders, no state, no country. Below is a plot of gay porn downloads, by IP address, taken from the largest torrent network, Pirate Bay. This map, too, is without any pre-drawn guides, and as opposed to the OkCupid plot above, its theme is solidarity: from Edmonton and Calgary down to Monterrey and Chihuahua, this is just where people live.

There are as many ways to draw maps as there are sources of data. We've been slowly working our way up off the page, building a psychological dimension - how we feel about the flag, porn - on top of our maps. But it's possible to go the other way: Data can tie abstractions back down to Earth. Take cleanliness, again via OkCupid. This is how often people say they shower:

On the one hand, the broad trend merely reflects the weather: Where it's hot, people shower more. But down in the details there are a pair of good stories. In Jersey's lightness, you can read the gym/tan/laundry grooming obsession of Pauly D and the Situation - Jersey is much more fastidious than the surrounding states. And in Vermont you find the opposite philosophy: The crunchiness is more than just a stereotype. Vermont's the most unwashed state overall, and truly an outlier compared to its immediate neighbors. According to Google the state animal is the Morgan Horse. It should be a white guy with dreads.

Politics, weather, Walmart, and certainly earthquakes all have a strong connection to the physical world, but in some of our data we can begin to see an exclusively inner geography. Take lust, which in theory, should have no state. But here we see it does, and a surprising one:

This pattern comes up again and again on OkCupid - the north central and west of the country is more sexually open, more sexually adventurous, and more sexually aggressive. Up the Pacific Coast you'd perhaps expect such unconventional attitudes, but for many of these red-meat states, it goes against type. Politically, OkCupid's users in, say, the Dakotas are as conservative as their reputation. Their profile text isn't much different from anyone else's. For all other indicators, the states should not be dark, but in the data we see a mysterious sexual intensification. This unexpected pattern reveals a further power in Internet data; we can now discover communities that transcend geography, rather than reflect it.

This data above does not prove that the Mountain Time Zone is one big high-plains makeout party. In fact, the explanation is rather banal: If you are looking for people to have sex with in a place like Pierre, South Dakota, your local options are limited. So you try a dating site to find what you want. It's simple selection bias in our data, but there's meaning there: Where people can't find satisfaction in person, they create alternative digital communities. On a dating site, that means communities with similar sexual interests. On other sites with more diverse aims, where the users aren't just there to flirt in groups of two (and occasionally three), you get something richer.

Reddit is the fulfillment of that earliest ambition of the Internet - to bring far-flung people together to talk, debate, share, spread news, and laugh. To collapse space and create personal closeness. Here, I've plotted the 200 most popular topics, and this is something you could properly call 'the United States of Reddit.' It's a geography like the Craigslist division we saw before - made, in fact, by a similar algorithm - but instead of physical geography, it plots a geography of interests, of the collective Reddit psyche. And it shows distinct yet connected communities. The size of each state corresponds to the popularity of the topic, and the software put 'like with like,' according to cross-commenting between subreddits.

My favorite game, Magic: The Gathering (magicTCG), is correctly surrounded by its unfortunate natural friends MensRights, whowouldwin, and mylittlepony. Similarly, many sports (nfl, nba, formula1, and so on) are grouped at the bottom. Everything pokemon is clustered over to the left. British problems, along the right edge, is next to australia and soccer. It also makes sense that the most popular subreddits are in the center - that is, not too far from anything. The red tint corresponds to how tight-knit each subreddit is. It shows the degree to which the people posting post only there. The darker the red, the more isolated the thread. This whole thing is an abstraction, but it shows how people can locate themselves by what they find interesting or funny or important rather than where they happen to sleep at night. It's a map of one particular collective consciousness.
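The two quantities behind the map, how much two subreddits share commenters and how many of a subreddit's commenters post only there, can be sketched as follows. The excerpt does not give the actual algorithm, so this is an assumed stand-in: overlap is measured with Jaccard similarity, and the user names and memberships are invented toy data.

```python
# Hypothetical toy data: which (made-up) users commented in which subreddit.
commenters = {
    "nfl": {"ann", "bob", "cat"},
    "nba": {"bob", "cat", "dan"},
    "pokemon": {"eve", "fay"},
}

def overlap(a, b, commenters):
    """Jaccard similarity of two subreddits' commenter sets (an assumed measure
    of 'cross-commenting'): shared commenters divided by all commenters."""
    users_a, users_b = commenters[a], commenters[b]
    return len(users_a & users_b) / len(users_a | users_b)

def isolation(name, commenters):
    """Share of a subreddit's commenters who comment nowhere else."""
    elsewhere = set().union(
        *(users for sub, users in commenters.items() if sub != name)
    )
    own = commenters[name]
    return len(own - elsewhere) / len(own)

print(overlap("nfl", "nba", commenters))      # 0.5
print(overlap("nfl", "pokemon", commenters))  # 0.0
print(isolation("pokemon", commenters))       # 1.0
```

In a layout like the one described, pairs with high overlap would be placed close together, and a subreddit with high isolation would get the darker red tint.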
