Digital Library Center Blog | UF

Chronicling work on the UF Digital Collections, SobekCM, & the Digital Humanities

Archive for the ‘standards’ Category

Announcement: DataCite Summer Meeting – Data and the Scholarly Record: the Changing Landscape

DataCite will hold its second Summer Meeting on August 24th and 25th at the historic Shattuck Plaza Hotel in Berkeley, California. The Summer Meeting will be a 1.5-day event, and you can register at http://datacite2011.eventbrite.com/.

The Summer Meeting brings together people from research organisations, data centers, government, and information service providers to hear about the latest developments in data science, data citation, discovery, and reuse. It also provides opportunities to exchange experience and influence the next generation of data citation services.

This year’s program will include sessions on data citation, data publishing, and discussions on the new challenges that come with increased access to scientific data.

The 2010 DataCite summer meeting brought together a strong programme of speakers and participants (http://www.datacite.org/datacite_summer_meeting_2010). Highlights were published in D-Lib (http://dx.doi.org/doi:10.1045/january2011-contents).

DataCite helps researchers find, access, and reuse data. It is an international not-for-profit association founded in 2009 with members across the globe.

Written by Laurie N. Taylor

June 30th, 2011 at 9:28 pm

CFP: DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices

DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices
September 19-23, 2011
Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany

Goals

This symposium-style workshop will bring together representatives from major longitudinal data collection efforts to share expertise and to explore the use of the DDI metadata standard as a means of managing and structuring longitudinal study documentation. Participants will work collaboratively to create best practices for documenting longitudinal data in its various forms, including panel data and repeated cross-sections.

Description of the workshop

Longitudinal survey data carry special challenges related to documenting and managing data over time, over geography, and across multiple languages. This complexity is often a barrier to building efficient systems for data access and analysis. DDI (Data Documentation Initiative) Lifecycle, a metadata standard that addresses the full life cycle of social science research data (formerly referred to as DDI 3), is designed to provide an efficient structure for the documentation of complex longitudinal data. In this workshop, participants involved in longitudinal data projects around the world will work together on issues involved in documenting longitudinal data.

Intended audience: Individuals with expertise in longitudinal social science data; knowledge of DDI is desired but not required. The intent is to have a mix of participants with substantive and technical skills. Participants should provide access to materials describing their projects, which can serve as use cases in applying DDI. The workshop is in English. This is the second Dagstuhl workshop on the topic; the first took place in October 2010. The upcoming workshop will continue the in-depth discussion begun last year, expanding into additional topics.

Expected Results

Participants will write best practice papers, to be published in the DDI Working Paper Series. Last year’s workshop produced a series of best practice papers on longitudinal data.

Possible Topics

Documenting comparison, harmonization, and the relationship among concepts, questions, and variables over time, as well as the relationship of respondent types (person, household) are typical issues for longitudinal data. Other topics not specific to longitudinal data:
- Classifications (e.g., ISCO, ISCED)
- Data collection details
- Qualitative data, other types of data sources beyond surveys
- Quality of metadata and data
- Data management planning
- Relationship to the Open Archival Information System (OAIS)
- Extension of DDI for specific needs

These topics are often more salient for longitudinal data, making it even more critical to manage these metadata in a structured form over time and across countries. The current possibilities of DDI Lifecycle will be explored and areas for future extensions identified. Additionally, participants can suggest their area of interest.

Venue

The workshop will take place at the Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany. The non-profit center is a member of the Leibniz Association and is funded jointly by the German federal government and a number of state governments. The venue provides an intense working atmosphere in a nice remote region. Several seminar rooms and a cafeteria during the day, and leisure rooms such as a wine bar and a billiard room in the evening, promote intense discussion and communication. The accommodation cost at Dagstuhl, including full board, is 60 Euro/day/person (subsidized rate).

Sponsors

This workshop is sponsored by the DDI Alliance, GESIS – Leibniz Institute for the Social Sciences, Minnesota Population Center (MPC), and Open Data Foundation (ODaF).

Contact

The names of interested organizations and individuals should be sent to ddi-expert-workshop@icpsr.umich.edu. Please provide contact information, area of interest, and area of expertise for each individual, information regarding DDI Lifecycle implementation, and a statement of what each individual can contribute to the workshop. Direct questions to ddi-expert-workshop@icpsr.umich.edu. Twenty-one participants will be accepted.

Links

Related Web page: http://www.dagstuhl.de/11382
Best practice papers on longitudinal data: http://www.ddialliance.org/resources/publications/working/BestPractices/LongitudinalData
DDI Working Paper Series: http://www.ddialliance.org/resources/publications/working
Further information on “How to get to Dagstuhl”: http://www.dagstuhl.de/en/about-dagstuhl/arrival/
Pictures of Dagstuhl: http://www.dagstuhl.de/en/about-dagstuhl/press/downloads/
DDI Alliance: http://www.ddialliance.org/
GESIS – Leibniz Institute for the Social Sciences: http://www.gesis.org/
Minnesota Population Center (MPC): http://www.pop.umn.edu/
Open Data Foundation (ODaF): http://www.opendatafoundation.org/

The organizers would appreciate hearing soon from interested people.

Mary Vardigan, Director DDI Alliance
Wendy Thomas, Chair DDI Technical Implementation Committee
Joachim Wackerow, Vice Chair DDI Technical Implementation Committee
Arofan Gregory, Technical Consultant
(Organizers)

GESIS – Leibniz Institute for the Social Sciences
Department: Monitoring Society and Social Change
Unit: Social Science Metadata Standards
Visiting address: B2 1, 68159 Mannheim, Germany
Postal address: P.O. Box 122155, 68072 Mannheim, Germany
Phone: +49 (0)621 1246 262
Fax: +49 (0)621 1246 100
E-mail: joachim.wackerow@gesis.org
www.gesis.org/en/institute/

Written by Laurie N. Taylor

June 25th, 2011 at 2:03 pm

arXiv Sustainability Initiative Update

At the end of April, arXiv posted an update on their sustainability initiative. This and all arXiv sustainability work should be mandatory reading for all who are working on large, collaborative digital initiatives. Recent updates include the 2011 projected budget, and the full support documentation is also available.

Written by Laurie N. Taylor

May 15th, 2011 at 8:45 pm

Data Documentation Initiative 3 (DDI 3) Data Extraction Tools from Colectica Awarded an NIH Grant

The Data Documentation Initiative 3 (DDI 3) standard is a simply fabulous and full standard for metadata (data about data) as well as for the data contents, making it a full payload standard.

DDI 3 is such an exciting standard because it allows for the possibility of true and full computational support for data harmonization and for really working with longitudinal data. It’s the type of data standard I’d been waiting for because it gets it. Data standards need to be able to support documenting, containing, expressing, and computing (analysis, harmonization, limitations on disclosure, everything we now do with less than ideal systems and methods). DDI 3 does this and that’s why groups like ICPSR are already using it.  DDI 3 is already on its way to becoming ubiquitous, but more tools for it are needed.

News of others using and supporting DDI 3 is always good. Thus, it’s wonderful news that Colectica has been awarded an NIH Grant for DDI 3-based data extraction tools. From the Colectica website:

The award is a Phase I grant that provides supplemental support of Algenta’s research on an “Open Standards-Based Data Extraction Web Tool for Complex Longitudinal Datasets”. This Phase I feasibility study aims to analyze the data preparation and metadata creation workflow needed to prepare a study for online data extraction, to validate the use of the Data Documentation Initiative’s DDI 3 standard for the basis of such a tool, and to create prototype web-based data extraction software. While the focus is on longitudinal surveys, the proposed system would also handle cross-sectional, time-series, and non-repeated studies. The aim is to improve research methodologies through a simplification of the process used for discovering, retrieving, and analyzing data relevant to a researcher’s investigation and to improve data citations, aiding in reproducible research. The research includes consultation with researchers from ICPSR at the University of Michigan-Ann Arbor and the Mid-Life in the United States Longitudinal Study at the University of Wisconsin-Madison.

Written by Laurie N. Taylor

April 5th, 2011 at 5:18 pm

News: Prototype interface released for searching archival authority records

Awesome news from CDL, so reposting below. The original is here.

Prototype interface released for searching archival authority records

CDL’s Digital Special Collections program is pleased to announce the public release of a draft prototype historical access system for the Social Networks and Archival Context Project (SNAC).

SNAC is a two-year research project, funded by the National Endowment for the Humanities, that is creating a set of authority records by extracting information from archival finding aids and enhancing it with other sources.  The project uses the new standard Encoded Archival Context—Corporate bodies, Persons, and Families (EAC-CPF).  Data for the research is being provided by the Online Archive of California, among several other sources.  Learn more about SNAC.

CDL’s role in the SNAC project is to build a prototype interface that links the authority records in a “historical social network.”  Such a system has the potential to significantly expand access to a range of humanities resources, as well as our knowledge of the connections between people, families, and organizations over time.

The user prototype is being developed using an iterative approach.  This first release of the system provides the most basic functionality required for researchers to imagine how they might interact with archival authority records.  Development of further iterations of the prototype will continue through Spring 2012.

Tell us what you think!

We welcome your suggestions on both the design of the prototype interface and the processing of the data.  What features do you think would be most useful for researchers?

Direct access to the prototype system, a description of project work to date, and a link to the feedback forum can be found at http://socialarchive.iath.virginia.edu/prototype.html.

Written by Laurie N. Taylor

December 21st, 2010 at 2:18 am

Announcement: MADS/RDF for review

A MADS/RDF ontology developed at the Library of Congress is available for a public review period until Jan. 14, 2011.  The MADS/RDF (Metadata Authority Description Schema in RDF) vocabulary is a data model for authority and vocabulary data used within the library and information science (LIS) community, which is inclusive of museums, archives, and other cultural institutions. It is presented as an OWL ontology.

Documentation and the ontology are available at: http://www.loc.gov/standards/mads/rdf/

Based on the MADS/XML schema, MADS/RDF provides a means to record data from the Machine Readable Cataloging (MARC) Authorities format in RDF for use in semantic applications and Linked Data projects. MADS/RDF is a knowledge organization system designed for use with controlled values for names (personal, corporate, geographic, etc.), thesauri, taxonomies, subject heading systems, and other controlled value lists. It is closely related to SKOS, the Simple Knowledge Organization System and a widely supported and adopted RDF vocabulary. Unlike SKOS, however, which is very broad in its application, MADS/RDF is designed specifically to support authority data as used by and needed in the LIS community and its technology systems. Given the close relationship between the aim of MADS/RDF and the aim of SKOS, the MADS ontology has been fully mapped to SKOS.

Community feedback is encouraged and welcomed. The MODS listserv – MADS/XML is maintained as part of the community work on MODS (Metadata Object Description Schema) – is the preferred forum for feedback: http://listserv.loc.gov/listarch/mods.html (send mail to: mods@listserv.loc.gov).  Kevin Ford, the primary architect of the model, will be responding on that forum in order to have an open discussion.

Written by Laurie N. Taylor

November 21st, 2010 at 3:52 pm

Finding Guides in SobekCM

SobekCM – the system powering the UF Digital Collections, the Digital Library of the Caribbean, and many other rich collections – will soon have advanced support for finding guides in EAD. This has been in process as a complete solution for the full workflow and it’s nearing completion. Check out the EADs we’re testing with here.

The benefits of fully supporting EAD within the same digital library system that supports the digital objects are enormous:

  • Finding guides can be displayed, searched, and used within the same system as the digital objects they reference (increases usability from consistent navigation, ease of searching a single system, additional benefits from any applicable system enhancements)
  • Finding guides benefit from existing automation. For SobekCM this includes the automatic creation of MARC records from the EADs and the automatic record feed of MARC records into the library catalog.

SobekCM’s support for EADs has been enabled through programming by Mark Sullivan. The programming created an EAD reader for importing the data into the standard SobekCM digital resource object and then reading the description and container list and importing as much information as possible into the digital resource object. Sections in the EAD are autodetected to create the table of contents.
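As a rough illustration of the section autodetection described above, here is a minimal Python sketch. It is not SobekCM's actual code. It assumes a non-namespaced EAD 2002 file and treats every direct child of `<archdesc>` that carries a `<head>` element as a table-of-contents entry. The sample document and the function name `build_table_of_contents` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD used only for this illustration.
SAMPLE_EAD = """<ead>
  <eadheader><eadid>sample</eadid></eadheader>
  <archdesc level="collection">
    <did><head>Descriptive Summary</head><unittitle>Sample Papers</unittitle></did>
    <bioghist><head>Biographical Note</head><p>Placeholder text.</p></bioghist>
    <scopecontent><head>Scope and Content</head><p>Placeholder text.</p></scopecontent>
    <dsc><head>Container List</head></dsc>
  </archdesc>
</ead>"""


def build_table_of_contents(ead_xml):
    """Return (element name, heading text) pairs for each headed section."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    if archdesc is None:
        return []
    toc = []
    # Only direct children of <archdesc> are treated as top-level sections.
    for section in archdesc:
        head = section.find("head")
        if head is not None and head.text and head.text.strip():
            toc.append((section.tag, head.text.strip()))
    return toc


for tag, title in build_table_of_contents(SAMPLE_EAD):
    print(f"{tag}: {title}")
```

Schema-based EAD files declare an XML namespace, so a real reader would need namespace-aware lookups. The sketch leaves that out to keep the idea visible.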

With the support for importing, SobekCM supports the EADs as digital resources that can be searched for within the digital collections. When a user selects any digital resource to view in SobekCM, the METS file is read. This provides some basic information like wordmarks and the type of digital resource it is. If the digital resource is a finding guide (defined by being Archival/collection and having an EAD listed as one of the downloads), the EAD is then read into the SobekCM digital resource object. While the container list will be read identically, the top portion of the EAD is pulled into the display and stored as one large block of text/xml with the XSL transform applied to display the description.
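The finding guide check described above can be sketched as a small predicate. This is only an illustration of the rule, not SobekCM's code: the `DigitalResource` and `Download` classes, their field names, and the idea that an EAD download is recognized by an "EAD" label are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Download:
    label: str      # hypothetical: how the download is described, e.g. "EAD"
    filename: str


@dataclass
class DigitalResource:
    resource_type: str  # hypothetical: the type read from the METS file
    downloads: List[Download] = field(default_factory=list)


def is_finding_guide(resource: DigitalResource) -> bool:
    """Apply the rule: archival type AND an EAD among the downloads."""
    is_archival = resource.resource_type.strip().lower() == "archival"
    has_ead = any(d.label.strip().upper() == "EAD" for d in resource.downloads)
    return is_archival and has_ead


guide = DigitalResource("Archival", [Download("EAD", "sample.xml")])
book = DigitalResource("Book", [Download("PDF", "sample.pdf")])
print(is_finding_guide(guide), is_finding_guide(book))
```

Both conditions have to hold, so an archival resource without an EAD download is still handled as an ordinary digital resource.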

The auto-created table of contents is a bit different from any of the existing tables of contents because it floats to the left constantly (scrolling down, it floats down to stay onscreen at all times). This is needed for reading longer HTML-style documents that have a lot of scrolling, as opposed to our normal page-turner model.

When EAD results show after a search, the search terms are highlighted. This is still being refined, but it’s active in test already and will soon be fully active. After that, the final steps are for handling the container list.

To see it in test (which will only be active for a while, since this will soon be live):

Written by Laurie N. Taylor

October 31st, 2010 at 10:16 pm

The Federal Agencies Digitization Guidelines Initiative Announces the Release of BWF MetaEdit

BWF MetaEdit is a free, open source tool that supports embedding, validating, and exporting of metadata in Broadcast WAVE Format (BWF) files. BWF MetaEdit is available for download at SourceForge: http://sourceforge.net/projects/bwfmetaedit/. BWF MetaEdit was developed by the Federal Agencies Digitization Guidelines Initiative to support its guideline for embedded metadata in the bext and INFO chunks (http://www.digitizationguidelines.gov/audio-visual/documents/wave_metadata.html).  The application was developed by AudioVisual Preservation Solutions (http://www.avpreserve.com/).

Users of BWF MetaEdit can:

  • Import, edit, embed, and export specified metadata elements in WAVE audio files
  • Export technical metadata from Format Chunks and minimal metadata from bext and INFO chunks as comma-separated values and/or XML, across a set of files or from individual files
  • Evaluate, verify and embed MD5 checksums, as applied to the WAVE file’s data chunk (audio bitstream only)
  • Enforce the guideline (above) developed by the Federal Agencies Audio-Visual Working Group, as well as specifications from the European Broadcasting Union (EBU), Microsoft, and IBM
  • Generate reports that show errors in the construction of WAVE files
  • Choose from command line and GUI, for Windows/PC, Macintosh OS, Linux. See the list of options at SourceForge: http://sourceforge.net/projects/bwfmetaedit/files/
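To make the checksum item above concrete, here is a minimal Python sketch of hashing only the audio bitstream of a WAVE file. It is not how BWF MetaEdit itself is implemented. It simply walks the RIFF chunks, skips everything that is not the `data` chunk (fmt, bext, LIST/INFO, and so on), and computes the MD5 of the `data` payload. The function names and the in-memory test file are made up for the example.

```python
import hashlib
import io
import struct
import wave


def data_chunk_md5(stream):
    """Return the hex MD5 of the 'data' chunk payload of a RIFF/WAVE stream."""
    riff, _size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no data chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"data":
            return hashlib.md5(stream.read(chunk_size)).hexdigest()
        # Skip other chunks; RIFF chunks are padded to an even length.
        stream.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)


def make_test_wave(frames):
    """Build a tiny mono 8-bit WAVE file in memory for the demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(1)
        writer.setframerate(8000)
        writer.writeframes(frames)
    buffer.seek(0)
    return buffer


frames = bytes(range(200))
print(data_chunk_md5(make_test_wave(frames)))
```

Because only the audio payload is hashed, editing the embedded metadata leaves the checksum unchanged, which is the point of applying it to the data chunk alone.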

Written by Laurie N. Taylor

September 29th, 2010 at 3:18 pm

Posted in standards

The ‘Machine Readable’ part of the MARC acronym is a lie

The most recent Code4Lib Journal issue has an excellent article that should be mandatory reading for anyone working in or with a library.

The article is “Interpreting MARC: Where’s the Bibliographic Data?” by Jason Thomale. In it, he explains in extremely clear terms exactly what MARC is not. He begins by explaining that MARC pre-dated relational databases. That means everything we think about for computers, digital processing, data structures, and logic doesn’t apply for MARC.

The title of this blog post is from one of the article’s notes:

There is also the statement about working with MARC data purportedly made by Google engineer Leonid Taycher that “the first thing he had to learn was that the ‘Machine Readable’ part of the MARC acronym was a lie” (from http://go-to-hellman.blogspot.com/2010/01/google-exposes-book-metadata-privates.html).

This quote illustrates the core thrust and value of the entire article, which is: MARC is presented as being functional for machines/computers and it is not. I didn’t understand what MARC was for over a year after I started working in a digital library center and working with MARC. I couldn’t fathom that any method that treated or handled data the way MARC does could ever have been in operation, and certainly not that it still was.

I still don’t understand how library catalogs actually work. I understand computers and I understand punch cards, but MARC isn’t either. I understand how it works at an individual record level, but I don’t see how it could work at a system level. Thomale’s article explains his process of unlearning basic world assumptions in order to deal with MARC. The comments on the article show that there are many people who have undergone the same process to learn “that the ‘Machine Readable’ part of the MARC acronym is a lie.”

Written by Laurie N. Taylor

September 25th, 2010 at 12:38 pm

Posted in MARC,standards

Encoded Archival Context Project – Social Networks and Archival Context Project (SNAC)

From the SNAC website:

Leveraging the new standard Encoded Archival Context-Corporate Bodies, Persons, and Families (EAC-CPF), the SNAC Project will use digital technology to “unlock” descriptions of people from finding aids and link them together in exciting new ways. We will:

  • Create efficient open-source tools that allow archivists to separate the process of describing people from that of records.
  • Create a prototype integrated historical resource and access system that will link descriptions of people to one another and to descriptions of resources in archives, libraries and museums; online biographical and historical databases; and other diverse resources.

Written by Laurie N. Taylor

August 16th, 2010 at 1:58 pm