Be still, our beating heart: someone out there thought it would be fun to talk about. . . wait for it. . . you're right, the suspense is killing us, too. . . metadata. Digitizing books isn't just about digitizing words; it's about digitizing information about the works that contain those words.

Too much fun. Also a boon for those who think statistics can be relaxing. You know who you are, and you promised to get help.

We've been managing book metadata in basically the same way since Callimachus cataloged the 400,000 scrolls in the Alexandrian Library in the third century BC. Callimachus listed the library's contents on scrolls, medieval librarians used ledgers, and we use card catalogs, now mostly electronic. But until information started moving online, the basic strategy had always been the same: arrange the books one way on the shelves, keep the metadata physically separate from them, and arrange the metadata in whatever ways are convenient.

Now, though, we have the opportunity to try new and different things. Not that the Dewey Decimal System isn't dandy, but it can be a bit arbitrary. So it goes:

You may file a field guide to the birds under natural history, while someone else files it under great examples of the illustrative art and I file it under good eating.

Like all good stories about metadata, this one is a call to arms with an unfortunate roadblock. Based on yesterday’s reading assignment, let’s see if we can identify the fatal flaw:

Second, we’re going to need massive collections of metadata about each book. Some of this metadata will come from the publishers. But much of it will come from users who write reviews, add comments and annotations to the digital text, and draw connections between, for example, chapters in two different books.

Hint: Amazon and patents.

