Thursday, December 16, 2010

History through Google Books Ngrams

Each of these graphs plots over time the fraction that one word constitutes of all words in books mined from the Google Books project.
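
For the curious, the quantity on the vertical axis can be sketched in a few lines. This is only an illustration of the "fraction of all words" idea, not Google's actual pipeline: the function name `yearly_fraction` and the toy corpus are invented here, and the sketch assumes matching is case-sensitive, so "black" and "Black" would be counted separately.

```python
from collections import Counter


def yearly_fraction(word, tokens_by_year):
    """Return {year: share of that year's tokens that equal `word`}.

    Matching is case-sensitive (an assumption of this sketch).
    """
    fractions = {}
    for year, tokens in sorted(tokens_by_year.items()):
        counts = Counter(tokens)
        total = sum(counts.values())
        # Guard against a year with no tokens at all.
        fractions[year] = counts[word] / total if total else 0.0
    return fractions


# Toy data, invented purely for illustration.
toy_corpus = {
    1850: ["the", "canal", "and", "the", "railroad"],
    1900: ["the", "railroad", "and", "the", "railroad", "line"],
}

print(yearly_fraction("railroad", toy_corpus))
```

Dividing by the yearly total is what makes different years comparable, since far more books were printed in 1900 than in 1850.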

Westward expansion of the U.S.

One invested in railroads, the other, canals.

New religions of the last 200 years.


Note how people didn't start capitalizing "black" until the mid '60s, even though it had been used as a descriptor as often as "colored" for a long time. Note the different curves for "negress" and "Negress." Capital "Negress" rose in usage alongside "Negro woman" and was nearly as common in the '40s, but half as common in the '70s. Lowercase "negress" was mostly replaced by capitalized versions, which were then replaced as well.

Industrialization of food.

Pawpaws were more popular than blueberries, and gooseberries were more popular than raspberries!







Notice how HIV lags AIDS, and condom is correlated more with HIV than with AIDS.




Israel and Palestine as a people and as a place

The names of Muslims

It's 2:00 AM, and there are still worlds to explore just off the top of my head. This is a treasure trove. Google Books Team, this is an amazing accomplishment. I used to envision that this is how history would be done in 50 years, when the digital age has ensured that anything that someone cares enough about to transfer is preserved. Now we're getting to do that sort of history, today.

Monday, July 19, 2010


The movie is great. Go see it. What I'm about to write will be entirely spoiler free.

Inception will leave you distrusting your own perceptions in the manner of Descartes's Demon, more so than The Matrix ever did. It does this, I think, because the experience of the characters in the movie entering and leaving dreams is so similar to our experience entering and leaving the movie. "Waking up" from a good movie feels very much like waking up from a dream.

When we talk about "suspending disbelief," it isn't supposed to be something you do, but rather something that happens automatically if the movie succeeds in drawing you in. When dreaming, we tolerate wild violations of physics and causality without it ever decreasing our emotional involvement. The objective of a good movie is to tap into that level of credulity, and it wouldn't surprise me if someone discovered that the same pathways in our brains are involved.

When Christopher Nolan was writing the dialogue about how addictive it is to be an architect of dreams, he must really have been talking about how addictive it is to be an architect of shared dreams: movies.

Thursday, June 24, 2010

Men's Wearhouse is missing a major business opportunity.

Every male in the U.S. who goes to a prom or is someone's groomsman has likely been fitted for a tuxedo at the Men's Wearhouse. Theoretically, the Men's Wearhouse should have the most comprehensive database of men's sizing information of any company. There have been a lot of weddings among my close friends this year, and I assumed that after the first fitting I'd be able to rent subsequent tuxes without going to the store in person. Amazingly, according to one of the attendants, they throw away all information about the customer after 6 months.

This loss goes beyond convenient tuxedo rentals. With this much information about me, they could be delivering clothes custom-tailored overseas to my doorstep at a price no other company could match. That much opportunity is certainly worth the cost of a few dozen more machines. Ancillary data produced by your business can become more valuable than the profits from the business itself. Don't throw anything away.

Sunday, May 23, 2010

Janelle Monáe (Wow)

Imagine if Lauryn Hill became a geek, teamed up with Outkast, and wrote two concept albums about an android uprising in the futuristic city of Metropolis. The result would sound something like Janelle Monáe's inaugural two albums, Metropolis and The ArchAndroid, the first three acts of a six-act story. The songs flow seamlessly (literally) from one to the next, usually jumping genres in the process. She can wail, she can scat, she can croon, she can rap, she can dance (oh she can dance), and she can sound like GLaDOS when necessary. Orchestral interludes bookend the chapters; in one, she puts words to a snippet of Clair de Lune.

Pitchfork loves her too. Historically, the intersection of the albums I like and the albums they bother to review has been approximately 0, so these two recommendations are about as independent as they come. Go have a listen! Every track can be heard for free at the links above.

My favorites so far: "Sincerely, Jane", "Tightrope", and "Oh, Maker".

Sunday, March 07, 2010

Tax-time gripes

California Use-Tax

It's really not reasonable for me to try to add up all the money I've spent online (but only on items used in California, not those purchased as gifts for someone out of state). This is the largest expenditure for me outside of rent and travel, and I don't remember everything I've bought or from where. Searching for "receipt" in Gmail helps, but it's really a lost cause. I do my best, but most people I talk to don't seem to bother, despite it being the law. This is by far the most time-consuming part of filing my return, and doing it only hurts me.
A few things confuse me about it.
  1. Is shipping and handling included in the taxable portion? It's not "used" in California.
  2. Do online retailers have enough information themselves that they could provide some sort of automated assistance in tallying these?
  3. Have the courts decided whether use-taxes are constitutional despite being a tax on interstate trade?
  4. Am I a chump for bothering?
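The tally itself is simple once the receipts are found; the finding is the hard part. Here is a rough sketch of the arithmetic, with everything in it hypothetical: the `Purchase` record, the `use_tax_owed` function, the sample amounts, and especially the rate, which is a placeholder and not the actual California figure. It also sidesteps question 1 by assuming `amount` already holds whatever portion turns out to be taxable.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Purchase:
    seller: str
    amount: Decimal            # assumed to be the taxable portion only
    used_in_california: bool   # False for gifts sent out of state


def use_tax_owed(purchases, rate):
    """Return (taxable total, tax due) for items used in California."""
    taxable = sum(
        (p.amount for p in purchases if p.used_in_california),
        Decimal("0"),
    )
    # Round the tax to whole cents.
    return taxable, (taxable * rate).quantize(Decimal("0.01"))


# Invented sample data; the rate is a placeholder, not the real one.
PLACEHOLDER_RATE = Decimal("0.08")
sample = [
    Purchase("bookseller", Decimal("40.00"), True),
    Purchase("gift shop", Decimal("25.00"), False),
    Purchase("electronics", Decimal("60.00"), True),
]

taxable, tax = use_tax_owed(sample, PLACEHOLDER_RATE)
print(taxable, tax)
```

Using `Decimal` rather than floats keeps the cents exact, which matters for anything that ends up on a tax form.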
Filing my taxes inevitably involves getting data from financial institutions that I haven't had any interaction with all year, and thus, remembering my various passwords. If you are designing a password system, please don't place restrictions on the content of the password. It's your responsibility as an administrator to keep your users' passwords from being brute-forced. Do not offload this responsibility to them.
Arbitrary restrictions I've seen today alone while trying to get forms from various financial agencies:
  • Must have at least 1 number.
  • Must have at least 2 numbers.
  • Must have at least one capital letter and 1 number.
  • Cannot have special characters.
  • Must be greater than 6 characters, but no more than 15.
  • Must be greater than 6 characters but no more than 8.
If I can't have a small secure set of passwords to use everywhere that I can reasonably remember, I'm just going to email them to myself, thus defeating every high-minded security best practice you are trying to implement.
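
To make "keep your users' passwords from being brute-forced" concrete: one common server-side defense is to limit how many wrong guesses an account accepts before it is temporarily locked. The sketch below is a minimal illustration under my own assumptions, not a production design. The class name `LoginThrottle`, the thresholds, and the in-memory dictionary are all hypothetical; a real system would persist this state and probably also throttle by IP address.

```python
import time


class LoginThrottle:
    """Lock an account for a while after too many failed logins."""

    def __init__(self, max_failures=5, lockout_seconds=300,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self._clock = clock
        # username -> (consecutive failures, time the lock expires)
        self._state = {}

    def allowed(self, username):
        """True if a login attempt may be made for this user right now."""
        _, locked_until = self._state.get(username, (0, float("-inf")))
        return self._clock() >= locked_until

    def record_failure(self, username):
        count, locked_until = self._state.get(username, (0, float("-inf")))
        count += 1
        if count >= self.max_failures:
            # Too many misses: start a lockout and reset the counter.
            locked_until = self._clock() + self.lockout_seconds
            count = 0
        self._state[username] = (count, locked_until)

    def record_success(self, username):
        # A correct password clears the failure history.
        self._state.pop(username, None)
```

With a cap like this, even a short password takes an attacker a very long time to guess online, and the user gets to keep a password they can actually remember.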