Archive for the ‘usability’ Category
OCR Text Correction is a Good Project for Crowdsourcing
Correcting text created by OCR (optical character recognition) is a great project for crowdsourcing because it can be isolated and scaled. Essentially, it can be made into a small task and the overall need can benefit from loads of small contributions, made through the small task interface. A great deal of digital library work can’t be sliced/scaled/isolated like this, and with so much work to do, it’s always nice when something can involve others for the benefit of everyone.
The National Library of Finland recently came out with new games-as-tools for correcting OCR text, and their website explains:
We need your help. Most of the information in the library’s newspaper archives has already been copied into computer databases using computerized text recognition. The problem is that computers fail to recognize all the words. Especially when the quality of the source material is poor, the results need to be fixed by hand. This requires a lot of manual work.
At the moment, when you play games in Digitalkoot you help correct words. Later this year you will also be able to help structure the documents and tag images.
I’m interested if anyone has reports on the success of this method. Because it contextualizes the work within a game-as-tool interface, it requires a higher level of investment than many other approaches, and I don’t know whether this will lead to greater or lesser success. The National Library of Australia has been phenomenally successful by allowing people to simply contribute the corrected text through an easy, no-frills interface.
I’m partial to the National Library of Australia’s method because it requires less initial resource investment, it has proven to keep returning on the design investment over the long term, and it appeals to such a large and wide demographic that I would think it would be the most successful model. Of course, I’m most partial to whatever works, so I’d love to know if folks have reports on the success rates of Finland’s games or other methods for crowdsourcing OCR correction.
Simplified URLs in Beta
The URLs for the UF Digital Collections are now shorter and easier. The new programming, by Mark Sullivan, to support simplified URLs:
- Shortens the base URL for UFDC
- Prevents server names from showing (an ongoing issue during the major server move)
- Removes the need for the aggregation code type indicator (the a, c, g, or s to indicate the aggregation between the “?” and “=”)
- Removes the requirement for the aggregation query (which is “?=” and very confusing)
- Enables the ability to assign new, persistent names for aggregation codes
With these now in place, the persistent URLs displayed for the main collection pages have changed. The first four of these allow the prior URL form to change from this: http://ufdcweb1.uflib.ufl.edu/ufdc/?a=juv
To this: http://ufdc.ufl.edu/juv
The fifth and final item in the list above greatly enhances human readability by allowing for human-style aggregation names. Longer aggregation codes were not possible until recently. The total number of characters was limited, and programming in the past year has removed that strict limitation. Of course, this change was recent, so many aggregation codes were created before it and have very short names, optimized for the old system and not for people.
For instance, the Florida Aerials collection was “flap” for Florida Aerials Project. That’s not readable or memorable and “Aerials” is a much better name because it’s a word, and a single word so it balances readability and concision. Now, FLAP has changed to Aerials and has a much simpler URL.
Prior: http://ufdcweb1.uflib.ufl.edu/ufdc/?a=flap
Now: http://ufdc.ufl.edu/aerials
All of the new, simplified URLs persist during usage so that the complicated forms won’t be shown. And, old URLs continue to work and they forward users to the new, simplified version of the URL.
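The forwarding from old URLs to new ones can be pictured with a small sketch. This is not UFDC's actual code, only an illustration in Python of the mapping described above. The function name, the lookup table, and the assumption that every old URL carries its aggregation code in an a, c, g, or s query parameter are mine; only the base URL and the flap-to-aerials rename come from the post.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed values for illustration. Only the ufdc.ufl.edu base and the
# flap -> aerials rename are taken from the post.
NEW_BASE = "http://ufdc.ufl.edu"
TYPE_INDICATORS = ("a", "c", "g", "s")
RENAMED_CODES = {"flap": "aerials"}


def simplify_url(old_url):
    """Return the simplified URL for an old-style aggregation URL.

    Returns None when the URL has no aggregation query to rewrite.
    """
    query = parse_qs(urlsplit(old_url).query)
    for indicator in TYPE_INDICATORS:
        values = query.get(indicator)
        if values:
            code = values[0].lower()
            # Use the new persistent name if one was assigned.
            return f"{NEW_BASE}/{RENAMED_CODES.get(code, code)}"
    return None


print(simplify_url("http://ufdcweb1.uflib.ufl.edu/ufdc/?a=juv"))
print(simplify_url("http://ufdcweb1.uflib.ufl.edu/ufdc/?a=flap"))
```

A server would answer a request for the old form with a redirect to the returned URL, which is how old bookmarks could keep working while only the simplified form is shown.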
We’re always in perpetual beta. I did note this as “in beta” because we’re hoping to simplify the root a bit more in the near future, so this is more beta than the normal perpetual-beta alone.
**NOTE: On 11/13/2010, I updated only the links in the above post. For the new links, I removed the “.uflib” which is no longer present in the links. Normally I don’t update posts like this (I write a new post or add a comment), but these links will be automatically indexed and so I updated them to make sure all patrons had access to the simplest links. I won’t be updating all prior links in posts, but I did in this post and that was the only change.
Thumbnails for All Newspaper/Journal Covers
The UFDC Programmer, Mark Sullivan, put a browse by thumbnail cover in place a while ago, but so many other wonderful items have been loading and so many other improvements have been made that I’m just now catching up to mention the cover browse.
The cover browse allows anyone to see the thumbnail images for the covers or first pages of all issues/volumes in an item. This means users can browse all of the first pages of a newspaper or all of the covers of a journal.
The cover browse is an excellent example of elegant simplicity. It uses existing information and functionality (the thumbnails for all pages and the browse by thumbnail for an individual item like a single newspaper issue) and then adds functionality that’s incredibly helpful for users. For instance, if a researcher wanted to quickly get a sense of a newspaper and its evolution over time, what better way than to flip through all of the first pages? If a researcher wanted a journal article but couldn’t remember the volume or issue number and could only remember the cover, the researcher could use the full text search to find the exact article or could browse through the issues, creating an opportunity for serendipity.
We’d been planning to add the browse by cover as a part of a larger move to browse by shelf (a visual display to show the spines of books, or the typical arrangement of books by call number or size), but then we saw that Chronicling America had already added the browse by cover images. It was better than we’d imagined, so we bumped up the timeline and added it immediately. Thanks to Mark’s savvy programming and the elegant simplicity of this addition, he was able to add the browse by cover images in under a single workday. We also owe thanks to Chronicling America for showing us how useful the cover images browse is, and for giving us the nudge we needed to get it added.
I encourage everyone to browse the local Florida newspapers in the Florida Digital Newspaper Library by cover image to see how useful it really is!
