Saturday, December 09, 2006

Handbooks and Manuals

It's nice to see that I'm not the only person who compares recipes across different editions of the Joy of Cooking. The changes that Karen mentions for the one recipe she describes are minor, but they are indicative of how handbooks and manuals change from one edition to the next, and of why one shouldn't throw out a handbook just because a new edition has been printed. The "Joy" is an interesting example in a couple of ways:

The Joy of Cooking is a snapshot of existing practice in American cookery. Because the "Joy" is recording trends, rather than breaking ground, it's an indicator of what household cooks were doing around that time. For example, the 1975 edition uses a lot of canned and packaged ingredients, and it hints at the beginning of the "ethnic" cooking trend (although the recipe for "refried beans" seems to have been created by somebody unfamiliar with Mexico). By 1997, the packaged ingredients are mostly eliminated (aside from "traditional" canned goods like stewed tomatoes) in favour of fresh ingredients.

The Joy of Cooking documents social trends. Aside from the "ethnic foods" I've already mentioned, it's possible to observe how broader social trends and current events affect the "Joy". The 1964 edition's section About Water includes instructions for purifying your water and an admonition not to use water that has been exposed to radioactive fallout. This section has disappeared from the 1975 edition, but the section on mixing cocktails survived from '64 to '75, only to be cut from the 1997 edition. Interestingly enough, the 1975 edition introduces a description of how to tap your own maple trees, which disappeared in the next edition. Unless this is a quirky nod to the hippie back-to-the-earth movement of the late '60s and early '70s, it's an odd thing to include.

For more information about the social history of 20th C. American cooking, see

Shapiro, Laura. 2004. Something from the oven: Reinventing dinner in 1950s America. New York, N.Y.: Viking.

which does a wonderful job of explaining the hideous food my mother cooked for me when I was a child.

Just to show that the usefulness of handbooks as historical source documents is not limited to the domestic world, I will never throw out my other favourite handbook, the 13th ed. of The Chicago manual of style, since it's the edition of that illustrious orange bible that has an entire chapter devoted to describing the printing process, both letterpress and the newer lithographic process.

Thursday, November 16, 2006

Software Tools

I'm a systems programmer, and a tool-maker, and I think that every library would benefit from having a software tool maker around. Being a tool maker means that I write small, relatively simple programs that only do one thing. I've never written an editor, but I've written lots of data conversion and simple analysis programs: programs that read one or two files and produce one or two files of output, and I always rely on having a command line close by to run my programs.

When I became a librarian, my need to write tools decreased, but it didn't disappear. Lists are the bane of collection librarians: we regularly receive spreadsheets full of bibliographic data for books, or generate lists of journal titles from JCR, which we then have to use as checklists to find out how many of the titles we own, and how many of the titles that the accreditation board expects us to own are absent. When the list of titles is brief, this process isn't too painful, even though it mostly involves cutting ISBNs or titles out of a spreadsheet, pasting them into the web OPAC, and making a note in a column of the spreadsheet. Unfortunately, the lists of titles are rarely brief. For most categories of journals in JCR, there are fewer than one hundred titles, which is most of a day's work. I simplified this for myself by writing code for the emacs editor that would automatically query the ISBN in the catalogue for me, eliminating some of the cutting and pasting and speeding the process up somewhat. Unfortunately, such primitive tools are insufficient when faced with a list of six hundred e-books and a need to determine which titles we already own, especially when the ISBN in the list may be for a different format than the one we own.

So I wrote a program. The challenge is figuring out how to get information out of the catalogue: the web OPAC is useless for programs, since they can't easily parse nicely formatted HTML tables, and the system doesn't provide a simple web service interface like SRU for querying the database. Fortunately, my catalogue has a (bad) Z39.50 server, and it's possible to find Z39.50 client modules for most scripting languages nowadays, so I just used Z39.50 to talk to my catalogue. Of course, this will only tell me if I own exactly the same edition of a book as the one that the publisher told me about, and I know that's not always the case, since we commonly buy the paper edition rather than the hardcover, and we also already own electronic versions of some books. This is where the whole "Web 2.0" thing takes over. OCLC provides a cross-ISBN server, xISBN, that is a simple web service: it takes an ISBN as input, and it returns an XML document that is a list of "related" ISBNs: the paper, cloth, and electronic versions, and whatever else they think might be associated with it.
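The xISBN side of this is simple enough to sketch in a few lines of Python. The response below is a stand-in I've invented to illustrate the shape of the service, not the exact schema the OCLC server used; the element names are assumptions.

```python
import xml.etree.ElementTree as ET

# A stand-in for the XML an xISBN-style service might return for one query
# ISBN: a flat list of related ISBNs (paper, cloth, electronic, ...).
# The element names here are illustrative, not the exact OCLC schema.
SAMPLE_RESPONSE = b"""\
<?xml version='1.0' encoding='UTF-8'?>
<idlist>
  <isbn>0131103628</isbn>
  <isbn>0131103709</isbn>
  <isbn>9780131103627</isbn>
</idlist>
"""

def related_isbns(xml_bytes):
    """Pull every related ISBN out of an xISBN-style response."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter('isbn')]

print(related_isbns(SAMPLE_RESPONSE))
```

In the real tool, the XML would of course come back over HTTP from the xISBN service rather than from a hard-coded string; the parsing step is the same either way.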

Adding xISBN into the mix means that if we don't own the exact ISBN given in the spreadsheet, then I ship it off to OCLC and check the ISBNs in the returned list to see if we have one of the related titles. In a perfect world, I'd record this information in a new column in the spreadsheet, indicating whether we owned the title or a related title, and providing a link to the web OPAC so that the librarian could click from the spreadsheet into the OPAC to check circulation statistics and other useful staff information. But reading and writing Excel files is non-trivial, and storing a URL in a CSV means you end up with a URL displayed in Excel, rather than a friendly link, so I just write out an HTML file that is a list of the titles we own, as links into the OPAC, as desired. After having spent five or six hours programming (aka "having fun"), it took a mere three minutes to process the list of six hundred computer science titles and identify the one hundred thirty titles that we own. But now I've got the tool built, so when this comes up again, or when I need to check journal holdings, it'll take no time at all.
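The final, HTML-writing step is the least glamorous part, but it's also easy to sketch. The OPAC URL pattern below is a made-up placeholder; any real catalogue has its own search-URL syntax.

```python
import html
import io

# Hypothetical OPAC search URL; a real catalogue has its own pattern.
OPAC_URL = 'http://catalogue.example.edu/search/i?SEARCH={isbn}'

def owned_titles_report(rows):
    """Given (title, isbn) pairs for the titles we decided we own,
    return an HTML list of links into the OPAC."""
    out = io.StringIO()
    out.write('<ul>\n')
    for title, isbn in rows:
        url = OPAC_URL.format(isbn=isbn)
        out.write('  <li><a href="%s">%s</a></li>\n' % (url, html.escape(title)))
    out.write('</ul>\n')
    return out.getvalue()

report = owned_titles_report([('Structure and Interpretation', '0262011530')])
print(report)
```

Writing HTML instead of wrestling with Excel is exactly the shortcut described above: the librarian gets clickable links, and I get to skip the spreadsheet round-trip.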

Web 2.0, and by extension Library 2.0, is about providing modular services so that users can build what they want for whatever reason they have. On the web, this is mostly for social or "fun" purposes, but the same philosophy also improves work productivity. Peter Murray spoke at the recent Future of the ILS symposium, sponsored by the University of Windsor, about the importance of componentized business processes for users. Building the right components for our business processes also makes our business more flexible and easier to mash up. This is a big part of what the library vendors are missing: they think they know how we should use our data, when even we don't know how we want to use it. But that, as they say, is a story for another day.

Thursday, October 19, 2006

Conference Going

There's an ad playing on TV for some sort of nacho chip (Doritos, maybe): a group of travelers are stuck in the airport waiting for a delayed flight and describing the events that they're not sorry to be late for (MAN: I'm going to see my wife... and her lawyer).

The problem with the ad is that one of the sad crew expresses his mock disappointment about missing his "podiatrists' conference". I've never met a professional, in any field, who was anything but genuinely excited at the idea of spending time in a huge meeting hall listening to other people talk about the next big thing in their field, whether that field was computer programming, teaching, running a McDonald's franchise, or librarianship.

I just got back from Access, and I'm totally excited about chatbots, diacriticals, and the 2-click web. The biggest challenge about getting back to the office after a fantastic conference is trying to make the conference buzz last long enough that you get through the accumulated email before cynicism sets in.

So, for the past week, I've been talking about what we need to do to make MPOW's website better and reading about information architecture and thinking about usability testing, focus groups, and making something for the users, and wondering how much I'll be able to pull off.

Wish me luck.

Tuesday, September 12, 2006

Gender differences in IT

I've been reading a lot of the postings across various blogs recently about the challenges that women entering IT face, even in librarianship, which is otherwise dominated by women. I'm not going to bother linking to all of the various posts that I've read, or commented on; they're not hard to find. I'm going to tell a story (which I do a lot of).

I'm not sure how many people cited Margolis and Fisher's Unlocking the clubhouse: Women in computing, but enough did that I checked it out from MPOW. One of the interesting things that they report as a common thread through all the women undergraduates that they spoke to was the fact that even women that were doing very well in computer science (to the point of being on the dean's honor list in some cases) didn't feel that they were doing as well as some of their male classmates. I have my own anecdote to add to the weight of their evidence.

When I was in high school, one of my good female friends always felt very insecure about her mathematical skills. She took, and passed, calculus, but always felt that math and she didn't get along that well. We lost touch when we went to different universities, but reconnected after university when we were both living in Toronto with our respective partners. Over dinner one night, she told us all that she had been going through some boxes that she'd finally moved out of her mother's house, and had come across all her high school report cards. She was amazed to discover that she had consistently earned As in all of the mathematics classes that she took. She could only remember her insecurity, and basically "ignored" or didn't recognize the objective evidence of the grades that she was getting. She knew that she was bad at math, and only the distance of time could give her the ability to read the grade and acknowledge that she must have been good at it.

And now I can see how her behaviour was not unique to her, but was part of a pattern of behaviour that women seem to be more prone to than men (at least according to the information gathered by Margolis and Fisher). Margolis and Fisher declare that women switching majors out of computer science (or technology in general) cannot simply be treated as "their choice"; it is an institutional problem that must be addressed by the institution. My friend who is a faculty member in Engineering (and the "artist in residence" for the Civil Engineering department) is actively involved in the province's Go Eng Girl program, which encourages high school girls to continue in technology, so I like to think that my school is working at addressing this issue. I'll be telling her to read Margolis and Fisher as soon as I'm finished.

(Unfortunately, I also remember the "I want to be an engineer, just like my mom!" recruiting posters from when I was an undergraduate twenty years ago, and the engineering society that paraded "Lady Godiva" through campus at the same time.)

Monday, July 24, 2006

ResourceShelf: Maps: Weird and Wild in the Parks Map

ResourceShelf : Maps: Weird and Wild in the Parks Map brings out the latent nationalist in me. The quotation they have selected from the "National" Geographic website:
Between the U.S. and Canada, there are more than 148 million acres in the National Park System.

There is no "National Park System" that spans the United States and Canada. There is a National Park system in the United States [of America] and a National Park system in Canada.

Friday, May 26, 2006

On Academic Silos and a Failure to Communicate

In today's email alert from Web of Science, I found the following citation:
Vidakovic, Jovana, and Milos Rackovic. 2006. Generating content and display of library catalogue cards using XML technology. Software: Practice and Experience 36(5): 513-524.

I haven't read the article, but I did browse the reference list. It cites Kevin Clarke's Medlane/XMLMARC Update, and the Library of Congress's MARC XML site. But it doesn't cite my two minor contributions, nor anything by Roy Tennant, Dick Miller, or even McDonough's paper on SGML and USMARC, which was published in 1998 [1]! Now, I can understand that somebody working in Serbia might not have ready access to the literature, but I would expect the referees for a journal of the stature of SP&E to have some familiarity with the area.

This has come up before, especially in Computer Science: if it's not published in the journals they read, and if they can't find it via Google or CiteSeer, then it hasn't been published. The world is becoming more interdisciplinary every day, and researchers who don't search using the traditional tools for locating literature (or, in this case, don't even seem to talk to researchers in related fields) are going to miss previous, relevant work.

[1] McDonough, Jerome P. 1998. SGML and the USMARC standard: Applying markup to bibliographic data. Technical Services Quarterly 15(3): 21-33.

Thursday, April 20, 2006

The Transparent Library

It's not hard to justify yourself to your funding sources when every single researcher needs to actually walk (or send a grad student) into the building to get the information that they need. But that's much less true today than it was five years ago, especially in the STM arena. Three years ago, before we had our OpenURL resolver, and before Google Scholar, the library was still visible, since anybody searching an online bibliographic database had to use the library's catalogue (or A-Z list of journals) to find out if the article she wanted was available, and if it was print or electronic. Today, the library doesn't exist, except as a little pop-up window that gets in the way between the database and the fulltext.

Because we're making ourselves invisible, it's that much more important to brand the services we provide. Most databases give institutions the option of customizing the OpenURL link in some useful way. If your library is still using a default "SFX" link, you're making two big mistakes:
  1. Washington State University found that nobody knows what "SFX" is, so the only people that found full-text by clicking on the "SFX" button were the people that click on everything. [1]
  2. People that click on a generic OpenURL, or a "full text" link provided by the database (vide PubMed), don't know that it works because the library paid lots of money for it.
As an example of this second item, the nursing librarian told me the other day that the nursing faculty love Google Scholar because "It's so easy to find articles!" They don't realize that the links they find are working, for the most part, because we've paid for access.

We've customized our SFX links to clearly indicate that it's taking you to something that is available because of the University, and we've suppressed database-provided full-text links as much as possible, in order to reduce confusion on the part of the user, or so we thought. There's been a discussion on the web4lib mailing list over the last couple of days about the inconvenience to the user of having to click on every single SFX link to find out which articles were available online, with some people arguing that we don't want to encourage students to limit themselves to just what's online. I have swung around to the idea that the earlier we can present the user with a clear indication that the article is available right now, the better.

As a result of this change of mind, I'm starting to come around to registering MPOW with Google Scholar. Historically my concerns were both practical and philosophical: we don't want to give Google our holdings because it's a workload (even if a minimal one), and because the whole point of OpenURL was that we didn't have to configure our holdings into the databases; that was the resolver's job. But the branding opportunity of making it clear in Google that the link works because of the university can only be a good thing for the library's visibility. Now, if only the branded "OpenURL" links that Google generates were even close to useful, I'd be really happy.
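Mechanically, the link being branded here is nothing exotic: an OpenURL is just a resolver's base URL with a key-encoded-value ContextObject appended as a query string. A minimal sketch, assuming a hypothetical resolver address (each institution registers its own):

```python
from urllib.parse import urlencode

# Hypothetical resolver base URL; every institution has its own.
RESOLVER = 'http://resolver.example.edu/openurl'

def article_openurl(journal, volume, issue, spage, issn):
    """Build an OpenURL 1.0 (KEV format) link for a journal article."""
    ctx = {
        'url_ver': 'Z39.88-2004',
        'ctx_ver': 'Z39.88-2004',
        'rft_val_fmt': 'info:ofi/fmt:kev:mtx:journal',
        'rft.jtitle': journal,
        'rft.volume': volume,
        'rft.issue': issue,
        'rft.spage': spage,
        'rft.issn': issn,
    }
    return RESOLVER + '?' + urlencode(ctx)

link = article_openurl('Library Hi Tech', '21', '1', '70', '0737-8831')
print(link)
```

The branding lives entirely on the resolver's side of that URL: the database (or Google Scholar) only needs to know the base address and how to fill in the citation, which is why giving Google our holdings feels like doing the resolver's job for it.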

[1] Cummings, Joel, and Ryan Johnson. 2003. The use and usability of SFX: Context-sensitive reference linking. Library Hi Tech 21(1): 70-84.

Thursday, March 09, 2006

Stupid Catalogue Tricks

[Updated to fix link to 'principle of least astonishment']

Like most catalogues, our system provides a basic keyword search box, as well as an "advanced keyword" search form that allows the user to set various options and search limits (sort order and date limits are the two obvious ones). The problem with our system (I don't know if this is a general "Innovative" issue, or just limited to the way MPOW has the catalogue configured) is that it doesn't do what the user expects. Consider this scenario:
  • The user goes to the "advanced keyword search" screen, types in some search terms, changes the sort order to "relevance", and limits the results to "books"
  • After looking at the search results, the user uses the keyword search box on the results screen to adjust the search terms and reruns the search
  • The second set of search results is sorted by title (the default), and is no longer limited to just books.
This is a bit surprising to most users, and it's even been known to catch library staff out. If, instead of typing into the obvious, and very convenient, search box on the results screen, the user had pressed the "Modify Search" button and returned to the advanced search form, there would have been no problem. But who does that?

Library catalogues seem to regularly violate the principle of least astonishment in ways that would probably be easy to avoid. Why is that?

Sunday, February 26, 2006


Lorcan Dempsey's blog entry on Libraries Australia introduced the buzzword "discovery2delivery" to me (although he first talked about it last November). The "Discovery to Delivery" model is a great way to describe the perpetual mission of the library and is helping me crystallize my thinking about what I want the website at MPOW to achieve.

Let's compare the typical library web site to Chapters Canada's online presence (I'm comparing to Chapters rather than to Amazon because Chapters provides a brick interface as well as a click interface). One might argue that these two websites have similar goals: self-promotion and connecting users with "stuff". Interestingly enough, it's the public institution that seems to focus on "self promotion" and the commercial website that focuses on "making connections". This is most obvious when you look at these websites through the eyes of somebody who wants to find a book.

OCLC's Perceptions of Libraries report emphasizes that libraries are still primarily about books for our clients, but books are almost invisible on the Toronto Public Library's homepage. In fact, the only appearance of the word "book" on the home page is in the link "Find books & materials...." But it doesn't lead you to books: it leads to a page of information about the catalogue, on which there is a link that finally takes you to the catalogue. Unfortunately, the single search box on that page makes it very difficult to find a known item in the catalogue.

Not only does Chapters' home page have a search box, but the rest of the page is jammed full of information about new and hot stuff. And if I type the title of a book into the search box and press enter, without worrying too much about what I'm looking for, I'm likely to find it easily. (Try searching for Bernard Shaw's "Arms and the man", and see what you end up with.)

If we look beyond "moving product" (a motive that libraries and bookstores share), libraries' claims of being the center of community are also being undermined by the store's promotion of itself as a Meetup Venue Partner. Libraries' meeting spaces, like their books, are hidden away behind library specific processes, which are usually not accessible via the web. For example, although we have introduced a new web-based study room self-booking system at MPOW recently, and the students love it, it's hidden on a page that most of them wouldn't otherwise use: the branch's home page. Why does booking a room at the library require an instructional session with a staff member?

Commercial organizations well know that if they don't make it easy for the client, then the client will find someplace where it is easy. Unfortunately, the examples of finding books and making community connections both seem to demonstrate how libraries are displaying, via the home page, the relative importance of the library's internal processes and structures over the convenience of the client base.

The library's home page is the gateway to its services. It must be structured so that common tasks are simple to do, and the uncommon tasks are simple to find. The challenge is to ensure that commonality and simplicity are decided by the clients, and not by the librarians.

Thursday, February 23, 2006

Why do Users want Lists?

Why are users so adamant about being able to browse journal lists that library IT groups, and vendors like Serials Solutions and Ex Libris, spend time creating systems that produce the infamous "A-Z List"? MPOW catalogues all of our online resources, just like our print collection (except that online resources don't get call numbers). Everything we have is in the catalogue, and it's usually not too hard to find the journals in the catalogue (except for the usual suspects: "Cell", "Nature", "Science", etc.), but users keep asking for a list of our electronic resources. We (I) finally broke down a couple of years ago, and we now produce our own A-Z list from a catalogue review file. So the users can browse through a list of over twenty-eight thousand titles. If somebody decided to print it out, the "A" section alone would run to seventy pages.
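Once the titles come out of the review file, producing the list itself is a mechanical transformation; a minimal sketch of the grouping step (the title data here is obviously a toy sample):

```python
from itertools import groupby

def a_to_z(titles):
    """Sort journal titles case-insensitively and group them into
    A-Z sections keyed by first letter."""
    keyed = sorted(titles, key=str.upper)
    return {
        letter: list(group)
        for letter, group in groupby(keyed, key=lambda t: t[0].upper())
    }

sections = a_to_z(['Nature', 'Cell', 'Science', 'Neuron'])
print(sections)
```

Which is exactly the problem: the hard part was never generating the list, it's that an alphabetical pile of twenty-eight thousand titles is a questionable discovery tool in the first place.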

I hypothesize that at least part of the reason for this behaviour is that the users think "searching" is for when they are looking for information about a topic, and that a list provides faster access when they are looking for a known item (which Cutter identified as one of the prime purposes of the catalogue). There might also be a sense that, at some level, "browsing" is an effective discovery method. But again, how effective a technique is it when the list is twenty-eight thousand titles long, and alphabetical?

Tuesday, February 21, 2006

Protocol Design by Committee

3M's Standard Interchange Protocol (SIP, not to be confused with the "Session Initiation Protocol" used in internet telephony) is the protocol that library self-check units use to talk to the circulation module of a library's Integrated Library System. It's a small, focused protocol that was designed by people who: knew what they were trying to accomplish, had to implement it, and had to make it work on the low-end computers that get stuffed into the self-checkout units, and over the (potentially slow and crappy) communication lines that a small library system might have.

NCIP is the NISO Circulation Interchange Protocol. This protocol was designed as a "vendor-neutral" replacement for the 3M SIP. The standards committee allowed themselves to be seduced by both new technologies and mission creep that encouraged them to attempt to design a system that could handle all circulation activities, including not just patron self-check, but also inter-library borrowing and the accounting that goes with it. What they produced was... larger than one might expect, given the size of the documents.

For example, when a self-checkout unit wants to verify that a user is valid with SIP, it sends this message:

2300020060221 211723AOLochalsh Public Library|AAdjfiander|AC|AD|AY1AZEA14

When a self-checkout unit is using NCIP, it sends this XML:

<?xml version='1.0' encoding='UTF-8'?>
<NCIPMessage version="">
    [...]
           <Value>User ID</Value>
    [...]
</NCIPMessage>

Need I say more?
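The compactness of SIP is visible right in the message: fixed-position fields, pipe-delimited variable fields, and a four-hex-digit checksum on the end. A minimal sketch of the checksum step, following the convention commonly described in SIP2 documentation (sum the ASCII values of everything up to and including the "AZ" field identifier, then take the two's complement of the low 16 bits); the message body here is a toy, not a byte-accurate 3M example:

```python
def sip2_checksum(message):
    """Checksum in the usual SIP2 style: add the ASCII values of every
    character up to and including the 'AZ' field identifier, then take
    the two's complement of the low 16 bits, as four hex digits."""
    total = sum(ord(c) for c in message)
    return format((-total) & 0xFFFF, '04X')

# A toy patron-status-request body; real field contents vary by vendor.
body = '2300120060221    211723AOExample Library|AAuser123|AC|AD|AY1AZ'
framed = body + sip2_checksum(body)
print(framed)
```

That's the whole framing layer: something a low-end self-checkout computer can assemble with string concatenation and one pass of addition, which is rather the author's point about design-by-implementers versus design-by-committee.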

Thursday, February 16, 2006

Appropriate Linking, Like Love, is in the Air

A couple of weeks ago Dan Chudnov commented (in #code4lib) that he didn't like putting links on web pages when he referred to books, because he's a librarian, and it doesn't feel right to be pointing to a commercial entity when the book is probably available at the local library. Similarly, Karen Schneider just this weekend felt conflicted about her blog book reviews, for much the same reason. Then, Lorcan Dempsey noticed that the COPAC experimental library catalogue provides COinS in the full record display.

While one might take issue with Dan and Karen's (apparent) distaste for the marketplace, a deeper point is that, just like those bibliographic databases that unconditionally provide links to online fulltext that may or may not be available to a particular user (*cough* PubMed *cough*), the link that is preferred by the creator of the content may not be appropriate, for any number of reasons, for the reader of the content. Even if Dan had no qualms about providing a link to, or maybe even deriving a benefit from affiliating himself with, an online bookstore, and even if I wanted to buy the book, that doesn't mean that store is the best place for me to go. In fact, I just recently spent quite a bit of money at Amazon, but it was at Amazon's Canadian store. But providing Open WorldCat links, as Karen suggested, isn't the right thing either, because (a) my local public libraries aren't members of OCLC and (b) I might want to buy the book.

These incidents seem to emphasize yet again the importance of providing appropriate links for users. But how can Dan or Karen provide appropriate links for me, when they don't know which libraries, bookstores, or online retailers I might frequent? Lorcan's post points in the right direction, however: content providers (and not just databases and library catalogues, but also bloggers and other publishers of original content) need to begin providing metadata about the bibliographic units they are discussing, so that the reader can decide what to do with that metadata, and where to link to.

Openly Informatics (recently purchased by OCLC, coincidentally) provides a Mozilla Firefox extension that recognizes COinS and links to the user's own OpenURL resolver. Unfortunately, I don't think that's good enough for the majority of people. MPOW's OpenURL resolver doesn't, for example, provide links to amazon (of any nationality) or to the local public library. I need a personal link resolver; something that runs on the desktop that knows those places from whence I borrow or purchase books. My personal resolver should also provide a fallback, or "smarthost", OpenURL resolver, so that journal links that I run across can be passed along to a "full-service" link resolver, like the one that the university runs, so that I can find appropriate copies of articles.

In fact, I don't think I need a full link resolution server. I just need a shopping/library assistant: a Firefox extension, much like Openly's, in which I could configure the online bookstores that I frequent and the libraries that I use, and which presented an OpenURL-style menu of searches based on the contents of embedded COinS in web pages. Now I just need to find the time to write such a beast. And convince everybody else to start providing COinS.
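The extension's core job is small, because a COinS is nothing more than an empty HTML span with class "Z3988" whose title attribute carries an OpenURL ContextObject. A minimal sketch of the parsing half (the sample page and its book metadata are illustrative, not taken from any real blog post):

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs

class COinSFinder(HTMLParser):
    """Collect the ContextObject metadata from every COinS span on a page."""
    def __init__(self):
        super().__init__()
        self.contexts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'span' and 'Z3988' in attrs.get('class', '').split():
            # The title attribute is a URL-style query string of metadata.
            self.contexts.append(parse_qs(attrs.get('title', '')))

# An illustrative COinS for a book, of the sort a blogger could embed
# in a review. (HTMLParser decodes the &amp; entities for us.)
page = ('<p>Worth reading: <span class="Z3988" title="ctx_ver=Z39.88-2004'
        '&amp;rft_val_fmt=info:ofi/fmt:kev:mtx:book'
        '&amp;rft.btitle=Unlocking+the+Clubhouse'
        '&amp;rft.isbn=0262133989"></span></p>')

finder = COinSFinder()
finder.feed(page)
print(finder.contexts)
```

From there the assistant just has to map that metadata onto each configured bookstore's or library's search URL, which is configuration, not cleverness.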

Wednesday, February 15, 2006

The Librarians' Playlist

I've been working on a playlist for librarians for a while, mostly from my own collection. So far I've got
  • "What do you do with a BA in English" - Avenue Q - Original Broadway cast recording
  • "My baby loves a bunch of authors" - Bargainville - Moxy Fruvous
  • "If I only had a brain" - Wizard of Oz soundtrack (for the undergrads)
  • "Lobachevsky" - Remains of Tom Lehrer - Tom Lehrer (trust me on this one)
  • "Too much information" - Ghost in the machine - The Police
  • "Private investigations" - On the night (Live album) - Dire Straits
  • "Hats off to the stranger" - Sunny days again - Lighthouse
  • "The Internet is for porn" - Avenue Q - Original Broadway cast recording