Digital Library Center Blog | UF

Chronicling work on the UF Digital Collections, SobekCM, & the Digital Humanities

Archive for the ‘MARC’ Category

The ‘Machine Readable’ part of the MARC acronym is a lie


The most recent Code4Lib Journal issue has an excellent article that should be mandatory reading for anyone working in or with a library.

The article is “Interpreting MARC: Where’s the Bibliographic Data?” by Jason Thomale. In it, he explains in extremely clear terms exactly what MARC is not. He begins by explaining that MARC pre-dated relational databases. That means everything we assume about computers, digital processing, data structures, and logic doesn’t apply to MARC.

The title of this blog post is from one of the article’s notes:

There is also the statement about working with MARC data purportedly made by Google engineer Leonid Taycher that “the first thing he had to learn was that the ‘Machine Readable’ part of the MARC acronym was a lie” (from http://go-to-hellman.blogspot.com/2010/01/google-exposes-book-metadata-privates.html).

This quote illustrates the core thrust and value of the entire article: MARC is presented as being functional for machines/computers, and it is not. I didn’t understand what MARC was for over a year after I started working in a digital library center and working with MARC. I couldn’t fathom that any method that treated or handled data the way MARC does could ever have been in operation, and certainly not that it still could be.

I still don’t understand how library catalogs actually work. I understand computers and I understand punch cards, but MARC isn’t either. I understand how it works at an individual record level, but I don’t see how it could work at a system level. Thomale’s article explains his process of unlearning basic world assumptions in order to deal with MARC. The comments on the article show that there are many people who have undergone the same process to learn “that the ‘Machine Readable’ part of the MARC acronym is a lie.”

Written by Laurie N. Taylor

September 25th, 2010 at 12:38 pm

Posted in MARC,standards

Library Catalog Records


I’d always assumed that catalog records were based on MARC, and that MARC was a guideline or standard like METS, MODS, or TEI, or even HTML or XML. After all, SGML is one heck of a powerful grandparent for modern record formats, right? And for printing, TeX, LaTeX, and BibTeX have been around for ages, so there’s no way that an archaic punch-card style technology could be in use at almost every library in the US, right? Sadly, no, I was wrong.

My assumptions on what MARC must be have kept me from helping to fix the problems that stem from what it actually is. I’m also now worried about what other dead technologies might be in widespread use that are directly related to library operations. Please note that I’m not in any way attacking the ideas that underlie MARC records. We need bibliographic records, and metadata and organizational systems are essential. MARC is just a mix of the transfer protocol, data definition, data structure, data display, and actual data content. It’s a thing optimized to print card catalog cards in a card catalog world.

Cards in card catalogs have defined data elements (author, title, subject, call number, etc.) and an organizational method, so extrapolating that to defined fields should be easy. Except, defined fields in MARC always live inside the record. The minimum unit of MARC is a single full MARC record, read from the beginning. You can’t skip ahead to a field on its own, because the directory at the front of the record notes where each field begins and how long it will be. I saw the weird number sequences and leaders for elements, and I assumed that those were either shorthand or habit-based preferences that people chose to use. After all, the catalog record has defined data components for bibliographic and authority records (named people, corporations, other entities), so it had to be a matter of preference to display an author like this:

ME:Pers Name     100 1# $a Brenner, Richard J.,
                        $d 1941-

The $a for author had to be a shorthand, and so must the 100 1#, because they had to be. It could not be the case that this shorthand was actually needed and that almost every library with an electronic catalog was still wedded to the technology made to optimize the printing of card catalog cards in the 1970s (or before? At least MARC21 has been updated to deal with Unicode, but this is punch card or telegraph style technology).
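To make the pieces of that display line concrete, here is a minimal sketch that splits it into a tag, two indicators, and a list of subfields. The function name parse_display_field is made up for this example, and it assumes the display convention used above ('#' for a blank indicator, '$' plus one character for a subfield code). It would trip over values that themselves contain a '$', such as a price.

```python
import re

def parse_display_field(text):
    """Split a display-style MARC field into tag, indicators, and subfields."""
    # Three-digit tag, then two indicator characters ('#' stands for blank here).
    match = re.match(r"\s*(\d{3})\s+(\S)(\S)\s+(.*)", text, re.DOTALL)
    if match is None:
        raise ValueError("not a display-style MARC field")
    tag, ind1, ind2, rest = match.groups()
    # Each subfield is '$', a one-character code, then text up to the next '$'.
    subfields = [(code, value.strip())
                 for code, value in re.findall(r"\$(\w)\s*([^$]*)", rest)]
    return {"tag": tag, "indicators": (ind1, ind2), "subfields": subfields}

# The author line from above, with its continuation line joined on.
field = parse_display_field("100 1# $a Brenner, Richard J., $d 1941-")
print(field["tag"], field["indicators"])
for code, value in field["subfields"]:
    print(code, value)
```

Read this way, 100 is the tag for the main entry personal name, and $a and $d are the name and dates subfields within it.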

Take a look at this MARC record:

01041cam  2200265 a 450000100200000000300040002000
50017000240080041000410100024000820200025001060200
04400131040001800175050002400193082001800217100003
20023524500870026724600360035425000120039026000370
04023000029004395000042004685200220005106500033007
30650001200763^###89048230#/AC/r91^DLC^19911106082
810.9^891101s1990####maua###j######000#0#eng##^##$
a###89048230#/AC/r91^##$a0316107514 :$c$12.95^##$a
0316107506 (pbk.) :$c$5.95 ($6.95 Can.)^##$aDLC$cD
LC$dDLC^00$aGV943.25$b.B74 1990^00$a796.334/2$220^
10$aBrenner, Richard J.,$d1941-^10$aMake the team.
$pSoccer :$ba heads up guide to super soccer! /$cR
ichard J. Brenner.^30$aHeads up guide to super soc
cer.^##$a1st ed.^##$aBoston :$bLittle, Brown,$cc19
90.^##$a127 p. :$bill. ;$c19 cm.^##$a"A Sports ill
ustrated for kids book."^##$aInstructions for impr
oving soccer skills. Discusses dribbling, heading,
 playmaking, defense, conditioning, mental attitud
e, how to handle problems with coaches, parents, a
nd other players, and the history of soccer.^#0$aS
occer$vJuvenile literature.^#1$aSoccer.^\

(source)
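The run of digits at the top of that record is not noise. The first 24 characters are the record leader, and what follows, up to the first '^', is a directory of 12-character entries: a 3-digit tag, a 4-digit field length, and a 5-digit starting offset. A rough sketch of reading it, assuming the display convention above where '^' stands in for the field terminator (the function name is hypothetical):

```python
# Leader plus directory of the record above, joined back from the wrapped lines.
RECORD_HEAD = (
    "01041cam  2200265 a 450000100200000000300040002000"
    "50017000240080041000410100024000820200025001060200"
    "04400131040001800175050002400193082001800217100003"
    "20023524500870026724600360035425000120039026000370"
    "04023000029004395000042004685200220005106500033007"
    "30650001200763^"
)

def parse_leader_and_directory(record, field_terminator="^"):
    """Return (record_length, base_address, [(tag, length, start), ...])."""
    leader = record[:24]
    record_length = int(leader[0:5])    # total length of the record
    base_address = int(leader[12:17])   # where the field data begins
    directory_end = record.index(field_terminator, 24)
    directory = record[24:directory_end]
    if len(directory) % 12 != 0:
        raise ValueError("directory is not a whole number of 12-character entries")
    entries = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        entries.append((entry[0:3], int(entry[3:7]), int(entry[7:12])))
    return record_length, base_address, entries

record_length, base_address, entries = parse_leader_and_directory(RECORD_HEAD)
print(record_length, base_address)
for tag, length, start in entries:
    # Offsets in the directory are relative to the base address.
    print(tag, length, base_address + start)
```

A reader has to walk this directory before it can find any single field, which is why a field can't be pulled out without first reading the front of the record.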

Sure, that can be formatted nicely, but imagine a modern system having to read all of this to be able to allow users to search by author, title, keyword, and have facets for years, material type, etc. Presumably a program reads all of the records in, indexes them, and then runs purely off of the index, except when forced to look at the MARC records because people are still doing something to/with them, or it somehow queries the records-as-blobs. I’m not even sure how older catalogs actually worked, because this format is impossible to square with my concepts of computerized search.
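The "read everything in, then run off the index" guess can be sketched in a few lines. This is only an illustration of the idea, not a description of how any real catalog system works. The record layout, the INDEX_SOURCES mapping, and the function names are all assumptions made for the example. The field values are taken from the record above.

```python
import re
from collections import defaultdict

# One record, already parsed into (tag, subfields) pairs and keyed by its
# control number. The dict/list layout is an assumption for this sketch.
records = {
    "89048230": [
        ("100", [("a", "Brenner, Richard J.,"), ("d", "1941-")]),
        ("245", [("a", "Make the team."), ("p", "Soccer :"),
                 ("b", "a heads up guide to super soccer! /"),
                 ("c", "Richard J. Brenner.")]),
        ("650", [("a", "Soccer"), ("v", "Juvenile literature.")]),
    ],
}

# Hypothetical mapping: search index name -> (tag, subfield codes) feeding it.
INDEX_SOURCES = {
    "author": [("100", "a")],
    "title": [("245", "abp")],
    "subject": [("650", "av")],
}

def build_index(records):
    """Build index[name][word] -> set of record ids."""
    index = defaultdict(lambda: defaultdict(set))
    for record_id, fields in records.items():
        for tag, subfields in fields:
            for name, sources in INDEX_SOURCES.items():
                for source_tag, codes in sources:
                    if tag != source_tag:
                        continue
                    for code, value in subfields:
                        if code in codes:
                            for word in re.findall(r"[a-z0-9]+", value.lower()):
                                index[name][word].add(record_id)
    return index

def search(index, name, word):
    """Look a single word up in one named index."""
    return sorted(index[name].get(word.lower(), set()))

index = build_index(records)
print(search(index, "author", "Brenner"))
print(search(index, "title", "soccer"))
```

Once the index exists, searches never touch the original record, which fits the guess that the MARC blob is only revisited when someone edits it.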

I don’t know how common it must be for people familiar with normal standards to unquestioningly assume that MARC must be a normal standard, but I had trouble even understanding that something as broken as the MARC record could still exist. Now, I understand why people would tell me “that’s not possible” or “that’s not the way the system works” when I’d ask questions about what should be simple tasks. I’d often reply “but it has to be because that’s the way computers work” and I’d keep asking, thinking MARC must be an elaborate way to define data, with ties to legacy systems that made it confusing. That’s true-ish, but the real problem is that MARC is an archaic legacy form, so much so that I couldn’t comprehend when people tried to explain it to me.

When explaining MARC records to those familiar with normal technology standards, Karen Coyle notes hearing “virtual sighs” from the programmers, who “were not familiar with the standard library metadata record, and the standards were not compatible with the general suite of tools that the programmers commonly work with, such as HTML, CSS, and a host of XML-based tools” (source). In my mind, a metadata standard, especially one for library materials, whether books or audio or maps or whatnot, cannot be incompatible with XML.
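For what it’s worth, the tag/indicator/subfield structure itself maps onto XML without much trouble. The sketch below writes the author field from the earlier example as XML using Python’s standard library. The element and attribute names (datafield, subfield, tag, ind1, ind2, code) follow the Library of Congress MARCXML schema, but the namespace is left out for brevity and the function name field_to_xml is made up for this example.

```python
import xml.etree.ElementTree as ET

def field_to_xml(tag, ind1, ind2, subfields):
    """Build a MARCXML-style <datafield> element from one parsed field."""
    datafield = ET.Element("datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    for code, value in subfields:
        # A list of pairs keeps subfield order and allows repeated codes.
        subfield = ET.SubElement(datafield, "subfield", {"code": code})
        subfield.text = value
    return datafield

element = field_to_xml(
    "100", "1", " ",
    [("a", "Brenner, Richard J.,"), ("d", "1941-")],
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The result is something XML tools can parse and query, even though the data inside is still MARC.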

It looks like the phenomenon of not knowing how to define MARC is fairly common for folks who work regularly with current computing. Hopefully we’ll all learn just enough about MARC to replace it quickly with RDA (or even something that seems like MARC to those who like it, but something that functions as a real data model). Once the archaic MARC-technology-underpinnings – whether or not other aspects of it remain – can be replaced, library data will be so much easier to access, use, and connect for everyone from catalogers to patrons. I feel awful that I didn’t understand how broken MARC was as it tried to act as protocol/structure/display/format/record, and I’m only now learning what MARC is, so I don’t yet know how many problems it’s created or how many innovations or aids it’s prevented.

Written by Laurie N. Taylor

September 6th, 2009 at 9:28 pm

Posted in catalog,Library,MARC

Spirit Authors


On the Open Library General Discussion List, Edward Betts recently posted that, while tidying author records in Open Library, he found 248 authors-as-spirits. Not unknown ghosts or muses, but the spirit of a particular person listed as a spirit. He included the examples below in the post and the full list on his website.

$a Abraham $c (Spirit)
$a Churchill, Winston $c Sir $d 1874-1965 $c (Spirit)
$a Doyle, Arthur Conan $c Sir $d 1859-1930 $c (Spirit)
$a Jesus Christ $c (Spirit)
$a Shakespeare, William $d 1564-1616 $c (Spirit)
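A tidy-up like the one Betts describes could be approached along these lines. This is only a sketch of one way to spot such headings, not his actual method. The function names are hypothetical, and it assumes headings written exactly like the examples above. Note that $c can repeat (Sir, then (Spirit)), so the subfields are kept as a list of pairs and not as a dictionary.

```python
import re

# The five example headings from the post.
headings = [
    "$a Abraham $c (Spirit)",
    "$a Churchill, Winston $c Sir $d 1874-1965 $c (Spirit)",
    "$a Doyle, Arthur Conan $c Sir $d 1859-1930 $c (Spirit)",
    "$a Jesus Christ $c (Spirit)",
    "$a Shakespeare, William $d 1564-1616 $c (Spirit)",
]

def parse_subfields(heading):
    """Return the heading's subfields as ordered (code, value) pairs."""
    return [(code, value.strip())
            for code, value in re.findall(r"\$(\w)\s*([^$]*)", heading)]

def is_spirit(heading):
    """True if any $c subfield is exactly '(Spirit)'."""
    return any(code == "c" and value == "(Spirit)"
               for code, value in parse_subfields(heading))

def without_spirit(heading):
    """Return the subfields with the '(Spirit)' qualifier dropped."""
    return [(code, value) for code, value in parse_subfields(heading)
            if not (code == "c" and value == "(Spirit)")]

for heading in headings:
    if is_spirit(heading):
        print(without_spirit(heading))
```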

For all those fascinated by dead (and undead) media, this is wonderfully rich. Not only do the dead speak through media (and issues of telepresence continue as we read, hear, and watch, from telegraph to home video; there’s a real reason horror movies and games are so populated by letters, diaries, books, histories, and the like), but their voices as the dead are even cataloged as such. When I learn bits of wonder like the spirit authors, I wish I had more experience in libraries and archives because there’s so much more to know and I’m so fascinated by the change of technologies, perceptions of the technologies, and the seemingly mundane ways in which people have dealt with the oddities.

I’m not interested in the large scale glam or beliefs in the paranormal or excitement over media revolutions because, while all of those are interesting, I’m so curious about how exceptional beliefs (as the outside-the-curve use cases, unintended uses and consequences, cultural beliefs, sheer oddities) are handled within day to day dealings and how those exceptions inform and shape the mundane and familiar. Noting “(Spirit)” inside a MARC record for ghostly authors–especially when including their birth and death dates–is wonderful. I’m sure there were meetings to discuss whether or not to include the first sighting of the spirit in a spirit birth date, but that was likely decided to be too cumbersome and too subject to interpretation. I’d absolutely love to read any archival files on how the inclusion of “(Spirit)” came about and what all was left out of the addition and why, and now I know to be on the lookout for it.

Written by Laurie N. Taylor

January 17th, 2009 at 7:34 pm

Posted in archives,MARC