Tuesday, January 20, 2015

Finding ISBNs in the digits of π

For some reason, a blog post from 2010 about searching for ISBNs in the first fifty million digits of π suddenly became popular on the net again at the end of last week (mid-January 2015). The only problem is that Geoff, the author, only looks for ISBN-13s, which all start with the sequence "978". There aren't many occurrences of "978" in even the first fifty million digits of π, so it's not hard to check them all to see if they are the beginning of a potential ISBN, and then find out if that potential ISBN was ever assigned to a book. But he completely ignores all of the ISBN-10s that might be hidden in π. So, since I already have code to validate ISBN checksums and to look up ISBNs in OCLC WorldCat, I decided to check for ISBN-10s myself.

I don't have easy access to the first fifty million digits of π, but I did manage to find the first million digits online without too much difficulty.

An ISBN-10 is a ten-character string that uniquely identifies a book. An example is "0-13-152414-3". The dashes are optional and exist mostly to make the number easier for humans to read, just like the dashes in a phone number. The first character of an ISBN-10 indicates the language in which the book is published: 0 and 1 are for English, 2 is for French, and so on. The last character of the ISBN is a "check digit", which is supposed to help systems figure out whether the ISBN is correct. It will catch many common types of errors, like swapping two characters in the ISBN: "0-13-125414-3" is invalid.
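For readers who want to try the check digit themselves, here is a minimal sketch in Python. It is not the code I actually used; the function name is_valid_isbn10 is made up for this illustration. It follows the standard ISBN-10 rule: multiply the ten characters by the weights 10 down to 1, add everything up, and accept the ISBN if the total is divisible by 11. A final "X" stands for the value 10.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Return True if the string has a correct ISBN-10 check digit.

    Illustrative helper (hypothetical name), not the author's original code.
    """
    chars = isbn.replace("-", "").upper()  # dashes are optional
    if len(chars) != 10:
        return False

    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # "X" is only allowed as the check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights 10, 9, ..., 1

    return total % 11 == 0


print(is_valid_isbn10("0-13-152414-3"))  # the example above
print(is_valid_isbn10("0-13-125414-3"))  # the same ISBN with two digits swapped
```

The first call should report a valid ISBN and the second an invalid one. Since π contains no "X", an ISBN-10 ending in "X" can never turn up in its digits.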

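Here are the first one hundred digits of π after the decimal point:

3.
1415926535 8979323846 2643383279 5028841971 6939937510
5820974944 5923078164 0628620899 8628034825 3421170679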

To search for "potential (English) ISBN-10s", all one needs to do is find every 0 or 1 in the first 999,990 digits of π and check whether the ten-digit sequence starting with that 0 or 1 ends in a valid check digit. (There is a "1" three digits from the end, but not enough digits are left after it to make a full ISBN, so we can stop early.) The sequence "1415926535", the first ten digits after the decimal point, fails the test because "5" is not the correct check digit; the sequence "0781640628", which starts at the 65th digit after the decimal point, passes and is a potential ISBN.
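The search described above can be sketched in a few lines of Python. Again, this is an illustration rather than my actual code: the names PI_100 and potential_isbn10s are invented here, and the positions it reports are counted from 1 starting at the first digit after the decimal point, which is an assumption and may not match the numbering used elsewhere in this post. It runs on the first hundred decimal digits only; feeding it a longer digit string works the same way.

```python
from typing import Iterator, Tuple

# First 100 digits of pi after the decimal point, in groups of ten.
PI_100 = (
    "1415926535" "8979323846" "2643383279" "5028841971" "6939937510"
    "5820974944" "5923078164" "0628620899" "8628034825" "3421170679"
)


def potential_isbn10s(digits: str, leading: str = "01") -> Iterator[Tuple[int, str]]:
    """Yield (position, candidate) for each ten-digit window that starts
    with one of the `leading` digits and has a correct ISBN-10 check digit.

    Hypothetical helper; positions are 1-based (an assumption).
    """
    # The last window that still has ten digits starts at len(digits) - 10.
    for start in range(len(digits) - 9):
        if digits[start] not in leading:
            continue
        window = digits[start:start + 10]
        # ISBN-10 rule: weights 10 down to 1, total divisible by 11.
        total = sum((10 - i) * int(d) for i, d in enumerate(window))
        if total % 11 == 0:
            yield start + 1, window


for position, candidate in potential_isbn10s(PI_100):
    print(position, candidate)
```

Among the candidates printed is "0781640628" at position 65, while "1415926535" at position 1 is rejected. A candidate found this way is only a potential ISBN: whether it was ever assigned to a book still has to be looked up separately.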

There are approximately 200,000 zeros and ones in the first million digits of π, but "only" 18,273 of them appear at the beginning of a potential ISBN-10. Checking those 18,273 potentials against the WorldCat bibliographic database results in 1,168 valid ISBNs. The first one is at position 3,102: ISBN 0306803844, for the book The evolution of weapons and warfare by Trevor N. Dupuy. The last one is at position 996,919: ISBN 0415597234 for the book Exploring language assessment and testing : language in action by Anthony Green.

Here's the full dataset.