I don't have easy access to the first fifty million digits of π, but I did manage to find the first million digits online without too much difficulty.
An ISBN-10 is a ten-character string that uniquely identifies a book. An example is "0-13-152414-3". The dashes are optional and exist mostly to make the number easier for humans to read, just like the dashes in a phone number. The first character of an ISBN-10 indicates the language in which the book is published: 0 and 1 are for English, 2 is for French, and so on. The last character of the ISBN is a "check digit", which is supposed to help systems figure out whether the ISBN is correct. It will catch many common types of errors, like swapping two characters in the ISBN: "0-13-125414-3" is invalid.
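Here's a minimal sketch of that check-digit test, assuming the standard ISBN-10 rule: multiply the ten characters by the weights 10 down to 1, add the products up, and require the total to be divisible by 11 (a final "X" stands for the value 10). The function name is_valid_isbn10 is just an illustrative choice, not something from a library.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Return True if `isbn` passes the standard ISBN-10 check-digit test."""
    # Dashes and spaces are only there for human readers, so drop them.
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False

    total = 0
    for position, ch in enumerate(chars):
        if ch in "0123456789":
            value = int(ch)
        elif ch == "X" and position == 9:
            value = 10  # "X" is only allowed as the check digit
        else:
            return False
        # Weights run 10, 9, ..., 1 from left to right.
        total += (10 - position) * value

    return total % 11 == 0


print(is_valid_isbn10("0-13-152414-3"))  # True
print(is_valid_isbn10("0-13-125414-3"))  # False
```

For the example above the weighted sum is 110, which is divisible by 11. Swapping the "5" and the "2" changes the sum to 107, which is not, so the swapped version is rejected.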
Here are the first one hundred digits of π:
3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067
To search for "potential (English) ISBN-10s", all one needs to do is search for every 0 or 1 in the first 999,990 digits of π (there is a "1" three digits from the end, but then there aren't enough digits left over to find a full ISBN, so we can stop early) and check to see if the ten-digit sequence starting with that 0 or 1 has a valid check digit at the end. The sequence "1415926535", which begins right after the decimal point, fails the test, because "5" is not the correct check digit; but the sequence "0781640628", which appears about two-thirds of the way through the digits, is a potential ISBN.
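The search can be sketched in a few lines of Python. This is only an illustration run over the one hundred digits quoted above, not the actual tool. The names PI_DIGITS, has_valid_check_digit and potential_isbn10s are made up for the example, and the offsets it reports are 0-based with the leading "3" counted as offset 0, which is an assumption and need not match the position numbering used elsewhere in this post. Since the digits of π never contain an "X", the check only has to handle 0 through 9.

```python
from typing import Iterator, Tuple

# The one hundred digits quoted above, with the decimal point removed.
PI_DIGITS = (
    "3"
    "1415926535" "8979323846" "2643383279" "5028841971" "6939937510"
    "5820974944" "5923078164" "0628620899" "8628034825" "342117067"
)


def has_valid_check_digit(candidate: str) -> bool:
    """Standard ISBN-10 test: weights 10 down to 1, sum divisible by 11."""
    total = sum((10 - i) * int(d) for i, d in enumerate(candidate))
    return total % 11 == 0


def potential_isbn10s(
    digits: str, first_digits: str = "01"
) -> Iterator[Tuple[int, str]]:
    """Yield (offset, candidate) for each ten-digit window that starts with
    one of `first_digits` and has a valid check digit."""
    # Stop nine characters before the end: later starts can't hold ten digits.
    for offset in range(len(digits) - 9):
        if digits[offset] in first_digits:
            candidate = digits[offset:offset + 10]
            if has_valid_check_digit(candidate):
                yield offset, candidate


for offset, candidate in potential_isbn10s(PI_DIGITS):
    print(offset, candidate)
```

Passing a different first_digits string (for example "2" for French) would scan for other language groups with the same loop.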
There are approximately 200,000 zeros and ones in the first million digits of π, but "only" 18,273 of them appear at the beginning of a potential ISBN-10. Checking those 18,273 potentials against the WorldCat bibliographic database results in 1,168 valid ISBNs. The first one is at position 3,102: ISBN 0306803844, for the book The evolution of weapons and warfare by Trevor N. Dupuy. The last one is at position 996,919: ISBN 0415597234 for the book Exploring language assessment and testing : language in action by Anthony Green.
Here's the full dataset.
4 comments:
Many thanks for this David, I've updated my original post with your further research - now, to start analysing your dataset for trends... :)
The next, ultra-geeky step is to parse the ISBNs to identify the publishers involved...
Hello David,
Thanks for that, fun. Any reason to limit your tool to English ISBNs, apart from performance questions?
Since it takes less than a second to check all the zeros and ones to see if they're valid, it would take less than ten seconds to check all the other languages' potential ISBNs. The problem is that it would take a long time to run those ISBNs against WorldCat to see if they were actually used, and WorldCat's data on non-English books is probably not as good as its data on English content.