Archive for the ‘standards’ Category
Announcement: DataCite Summer Meeting – Data and the Scholarly Record: the Changing Landscape
DataCite will hold its second Summer Meeting on August 24th and 25th at the historic Shattuck Plaza Hotel in Berkeley, California. The Summer Meeting will be a 1.5-day event and you can register at: http://datacite2011.eventbrite.com/ .
The Summer Meeting brings together people from research organisations, data centers, government, and information service providers to hear about the latest developments in data science, data citation, discovery, and reuse. It also provides opportunities to exchange experience and influence the next generation of data citation services.
This year’s program will include sessions on data citation, data publishing, and discussions on the new challenges that come with increased access to scientific data.
The 2010 DataCite summer meeting brought together a strong programme of speakers and participants (http://www.datacite.org/datacite_summer_meeting_2010). Highlights were published in D-Lib (http://dx.doi.org/doi:10.1045/january2011-contents).
DataCite helps researchers find, access, and reuse data. It is an international not-for-profit association founded in 2009 with members across the globe.
CFP: DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices
DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices
September 19-23, 2011
Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany
Goals
This symposium-style workshop will bring together representatives from major longitudinal data collection efforts to share expertise and to explore the use of the DDI metadata standard as a means of managing and structuring longitudinal study documentation. Participants will work collaboratively to create best practices for documenting longitudinal data in its various forms, including panel data and repeated cross-sections.
Description of the workshop
Longitudinal survey data carry special challenges related to documenting and managing data over time, over geography, and across multiple languages. This complexity is often a barrier to building efficient systems for data access and analysis. DDI (Data Documentation Initiative) Lifecycle, a metadata standard that addresses the full life cycle of social science research data (formerly referred to as DDI 3), is designed to provide an efficient structure for the documentation of complex longitudinal data. In this workshop, participants involved in longitudinal data projects around the world will work together on issues involved in documenting longitudinal data.
Intended audience: Individuals with expertise in longitudinal social science data; knowledge of DDI is desired but not required. The intent is to have a mix of participants with substantive and technical skills. Participants should provide access to materials describing their projects, which can serve as use cases in applying DDI. The workshop is in English. This is the second Dagstuhl workshop on the topic; the first took place in October 2010. The upcoming workshop will continue the in-depth discussion begun last year, expanding into additional topics.
Expected Results
Participants will write best practice papers, to be published in the DDI Working Paper Series. Last year’s workshop produced a series of best practice papers on longitudinal data.
Possible Topics
Documenting comparison, harmonization, and the relationship among concepts, questions, and variables over time, as well as the relationship of respondent types (person, household) are typical issues for longitudinal data. Other topics not specific to longitudinal data:
- Classifications (e.g., ISCO, ISCED)
- Data collection details
- Qualitative data, other types of data sources beyond surveys
- Quality of metadata and data
- Data management planning
- Relationship to the Open Archival Information System (OAIS)
- Extension of DDI for specific needs
These topics are often more salient for longitudinal data, making it even more critical to manage these metadata in a structured form over time and across countries. The current possibilities of DDI Lifecycle will be explored and areas for future extensions identified. Additionally, participants can suggest their own areas of interest.
Venue
The workshop will take place at the Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany. The non-profit center is a member of the Leibniz Association and is funded jointly by the German federal government and a number of state governments. The venue provides an intense working atmosphere in a pleasant, remote region. Several seminar rooms and a cafeteria during the day, and leisure rooms such as a wine bar and a billiard room in the evening, promote intense discussion and communication. Accommodation at Dagstuhl, including full board, costs 60 Euro per person per day (subsidized rate).
Sponsors
This workshop is sponsored by the DDI Alliance, GESIS – Leibniz Institute for the Social Sciences, Minnesota Population Center (MPC), and Open Data Foundation (ODaF).
Contact
The names of interested organizations and individuals should be sent to ddi-expert-workshop@icpsr.umich.edu. Please provide contact information, area of interest, and area of expertise for each individual, information regarding DDI Lifecycle implementation, and a statement of what each individual can contribute to the workshop. Direct questions to ddi-expert-workshop@icpsr.umich.edu. Twenty-one participants will be accepted.
Links
Related Web page: http://www.dagstuhl.de/11382
Best practice papers on longitudinal data: http://www.ddialliance.org/resources/publications/working/BestPractices/LongitudinalData
DDI Working Paper Series: http://www.ddialliance.org/resources/publications/working
Further information on “How to get to Dagstuhl”: http://www.dagstuhl.de/en/about-dagstuhl/arrival/
Pictures of Dagstuhl: http://www.dagstuhl.de/en/about-dagstuhl/press/downloads/
DDI Alliance: http://www.ddialliance.org/
GESIS – Leibniz Institute for the Social Sciences: http://www.gesis.org/
Minnesota Population Center (MPC): http://www.pop.umn.edu/
Open Data Foundation (ODaF): http://www.opendatafoundation.org/
The organizers would appreciate hearing soon from interested people.
Mary Vardigan, Director DDI Alliance
Wendy Thomas, Chair DDI Technical Implementation Committee
Joachim Wackerow, Vice Chair DDI Technical Implementation Committee
Arofan Gregory, Technical Consultant
(Organizers)
GESIS – Leibniz Institute for the Social Sciences
Department: Monitoring Society and Social Change
Unit: Social Science Metadata Standards
Visiting address: B2 1, 68159 Mannheim, Germany
Postal address: P.O. Box 122155, 68072 Mannheim, Germany
Phone: +49 (0)621 1246 262
Fax: +49 (0)621 1246 100
E-mail: joachim.wackerow@gesis.org
www.gesis.org/en/institute/
arXiv Sustainability Initiative Update
At the end of April, arXiv posted an update on its sustainability initiative. This and all arXiv sustainability work should be mandatory reading for all who are working on large, collaborative digital initiatives. Recent updates include the 2011 projected budget; the full support documentation is also available.
Data Documentation Initiative 3 (DDI 3) Data Extraction Tools from Colectica Awarded an NIH Grant
The Data Documentation Initiative 3 (DDI 3) standard is a simply fabulous and full standard for metadata (data about data) as well as for the data contents, making it a full payload standard.
DDI 3 is such an exciting standard because it allows for the possibility of true and full computational support for data harmonization and for really working with longitudinal data. It’s the type of data standard I’d been waiting for because it gets it. Data standards need to be able to support documenting, containing, expressing, and computing (analysis, harmonization, limitations on disclosure, everything we now do with less than ideal systems and methods). DDI 3 does this and that’s why groups like ICPSR are already using it. DDI 3 is already on its way to becoming ubiquitous, but more tools for it are needed.
News of others using and supporting DDI 3 is always good. Thus, it’s wonderful news that Colectica has been awarded an NIH Grant for DDI 3-based data extraction tools. From the Colectica website:
The award is a Phase I grant that provides supplemental support of Algenta’s research on an “Open Standards-Based Data Extraction Web Tool for Complex Longitudinal Datasets”. This Phase I feasibility study aims to analyze the data preparation and metadata creation workflow needed to prepare a study for online data extraction, to validate the use of the Data Documentation Initiative’s DDI 3 standard for the basis of such a tool, and to create prototype web-based data extraction software. While the focus is on longitudinal surveys, the proposed system would also handle cross-sectional, time-series, and non-repeated studies. The aim is to improve research methodologies through a simplification of the process used for discovering, retrieving, and analyzing data relevant to a researcher’s investigation and to improve data citations, aiding in reproducible research. The research includes consultation with researchers from ICPSR at the University of Michigan-Ann Arbor and the Mid-Life in the United States Longitudinal Study at the University of Wisconsin-Madison.
News: Prototype interface released for searching archival authority records
Awesome news from CDL, so reposting below. The original is here.
Prototype interface released for searching archival authority records
CDL’s Digital Special Collections program is pleased to announce the public release of a draft prototype historical access system for the Social Networks and Archival Context Project (SNAC).
SNAC is a two-year research project, funded by the National Endowment for the Humanities, that is creating a set of authority records by extracting information from archival finding aids and enhancing it with other sources. The project uses the new standard Encoded Archival Context—Corporate bodies, Persons, and Families (EAC-CPF). Data for the research is being provided by the Online Archive of California, among several other sources. Learn more about SNAC.
CDL’s role in the SNAC project is to build a prototype interface that links the authority records in a “historical social network.” Such a system has the potential to significantly expand access to a range of humanities resources, as well as our knowledge of the connections between people, families, and organizations over time.
The user prototype is being developed using an iterative approach. This first release of the system provides the most basic functionality required for researchers to imagine how they might interact with archival authority records. Development of further iterations of the prototype will continue through Spring 2012.
Tell us what you think!
We welcome your suggestions on both the design of the prototype interface and the processing of the data. What features do you think would be most useful for researchers?
Direct access to the prototype system, a description of project work to date, and a link to the feedback forum can be found at http://socialarchive.iath.virginia.edu/prototype.html.
Announcement: MADS/RDF for review
A MADS/RDF ontology developed at the Library of Congress is available for a public review period until Jan. 14, 2011. The MADS/RDF (Metadata Authority Description Schema in RDF) vocabulary is a data model for authority and vocabulary data used within the library and information science (LIS) community, which is inclusive of museums, archives, and other cultural institutions. It is presented as an OWL ontology.
Documentation and the ontology are available at: http://www.loc.gov/standards/mads/rdf/
Based on the MADS/XML schema, MADS/RDF provides a means to record data from the Machine Readable Cataloging (MARC) Authorities format in RDF for use in semantic applications and Linked Data projects. MADS/RDF is a knowledge organization system designed for use with controlled values for names (personal, corporate, geographic, etc.), thesauri, taxonomies, subject heading systems, and other controlled value lists. It is closely related to SKOS, the Simple Knowledge Organization System and a widely supported and adopted RDF vocabulary. Unlike SKOS, however, which is very broad in its application, MADS/RDF is designed specifically to support authority data as used by and needed in the LIS community and its technology systems. Given the close relationship between the aim of MADS/RDF and the aim of SKOS, the MADS ontology has been fully mapped to SKOS.
Community feedback is encouraged and welcomed. The MODS listserv – MADS/XML is maintained as part of the community work on MODS (Metadata Object Description Schema) – is the preferred forum for feedback: http://listserv.loc.gov/listarch/mods.html (send mail to: mods@listserv.loc.gov). Kevin Ford, the primary architect of the model, will be responding on that forum in order to have an open discussion.
Finding Guides in SobekCM
SobekCM – the system powering the UF Digital Collections, the Digital Library of the Caribbean, and many other rich collections – will soon have advanced support for finding guides in EAD. This has been in process as a complete solution for the full workflow and it’s nearing completion. Check out the EADs we’re testing with here.
The benefits of fully supporting EAD within the same digital library system that supports digital objects are enormous:
- Finding guides can be displayed, searched, and used within the same system as the digital objects they reference (increases usability from consistent navigation, ease of searching a single system, additional benefits from any applicable system enhancements)
- Finding guides benefit from existing automation. For SobekCM this includes the automatic creation of MARC records from the EADs and the automatic record feed of MARC records into the library catalog.
SobekCM’s support for EADs has been enabled through programming by Mark Sullivan. The programming created an EAD reader for importing the data into the standard SobekCM digital resource object and then reading the description and container list and importing as much information as possible into the digital resource object. Sections in the EAD are autodetected to create the table of contents.
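The section autodetection described above might look roughly like the following. This is only a minimal sketch and not SobekCM's actual code: it assumes an EAD file without XML namespaces, treats every direct child of `<archdesc>` that carries a `<head>` element as a section, and uses a hypothetical function name `ead_table_of_contents`.

```python
import xml.etree.ElementTree as ET


def ead_table_of_contents(ead_xml):
    """Return (element name, heading text) for each headed section in <archdesc>.

    Assumption: the EAD has no XML namespace and section titles live in <head>.
    """
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    if archdesc is None:
        return []
    toc = []
    for section in archdesc:
        head = section.find("head")
        # Skip elements such as <did> that have no usable heading.
        if head is not None and head.text and head.text.strip():
            toc.append((section.tag, head.text.strip()))
    return toc


# A tiny made-up finding guide, for illustration only.
sample = """<ead><eadheader/><archdesc level="collection">
<did><unittitle>Sample Papers</unittitle></did>
<bioghist><head>Biographical Note</head><p>text</p></bioghist>
<scopecontent><head>Scope and Content</head><p>text</p></scopecontent>
<dsc><head>Container List</head></dsc>
</archdesc></ead>"""

for tag, title in ead_table_of_contents(sample):
    print(tag, "-", title)
```

A real EAD may declare a namespace or nest headed sections more deeply, so an actual reader would need to handle those cases as well.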
With importing supported, SobekCM treats EADs as digital resources that can be searched for within the digital collections. When a user selects any digital resource to view in SobekCM, the METS file is read. This provides some basic information like wordmarks and the type of digital resource it is. If the digital resource is a finding guide (defined by being Archival/collection and having an EAD listed as one of the downloads), the EAD is then read into the SobekCM digital resource object. While the container list will be read identically, the top portion of the EAD is pulled into the display and stored as one large block of text/xml with the XSL transform applied to display the description.
The auto-created table of contents differs from the existing tables of contents because it floats at the left at all times (as the user scrolls down, it moves down to stay onscreen). This is needed for reading longer HTML-style documents that require a lot of scrolling, as opposed to our normal page-turner model.
When EAD results show after a search, the search terms are highlighted. This is still being refined, but it’s active in test already and will soon be fully active. After that, the final steps are for handling the container list.
To see it in test (which will only be active for a while, since this will soon be live):
- Go here: http://ufdc.ufl.edu/testcol
- Search for RAWLINGS
- Click on the EAD
- The search terms are highlighted.
The Federal Agencies Digitization Guidelines Initiative Announces the Release of BWF MetaEdit
BWF MetaEdit is a free, open source tool that supports embedding, validating, and exporting of metadata in Broadcast WAVE Format (BWF) files. BWF MetaEdit is available for download at SourceForge: http://sourceforge.net/projects/bwfmetaedit/. BWF MetaEdit was developed by the Federal Agencies Digitization Guidelines Initiative to support its guideline for embedded metadata in the bext and INFO chunks (http://www.digitizationguidelines.gov/audio-visual/documents/wave_metadata.html). The application was developed by AudioVisual Preservation Solutions (http://www.avpreserve.com/).
Users of BWF MetaEdit can:
- Import, edit, embed, and export specified metadata elements in WAVE audio files
- Export technical metadata from Format Chunks and minimal metadata from bext and INFO chunks as comma-separated values and/or XML, across a set of files or from individual files
- Evaluate, verify and embed MD5 checksums, as applied to the WAVE file’s data chunk (audio bitstream only)
- Enforce the guideline (above) developed by the Federal Agencies Audio-Visual Working Group, as well as specifications from the European Broadcasting Union (EBU), Microsoft, and IBM
- Generate reports that show errors in the construction of WAVE files
- Choose from command line and GUI, for Windows/PC, Macintosh OS, Linux. See the list of options at SourceForge: http://sourceforge.net/projects/bwfmetaedit/files/
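To illustrate what a checksum "applied to the WAVE file’s data chunk (audio bitstream only)" means, here is a minimal sketch. It is not BWF MetaEdit's code: the function name `wave_data_md5` is hypothetical, and the sketch assumes a plain RIFF/WAVE file with 32-bit chunk sizes (no RF64).

```python
import hashlib
import struct


def wave_data_md5(path):
    """Return the hex MD5 of the 'data' chunk payload of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file too short to be RIFF/WAVE")
        riff, _size, wave = struct.unpack("<4sI4s", header)
        if riff != b"RIFF" or wave != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"data":
                # Hash only the audio payload, in blocks to limit memory use.
                digest = hashlib.md5()
                remaining = chunk_size
                while remaining > 0:
                    block = f.read(min(65536, remaining))
                    if not block:
                        break
                    digest.update(block)
                    remaining -= len(block)
                return digest.hexdigest()
            # Other chunks (fmt, bext, LIST/INFO, ...) are skipped;
            # RIFF chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Because only the data chunk is hashed, editing the embedded metadata in other chunks leaves the checksum unchanged, which is what makes it useful for verifying the audio itself.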
The ‘Machine Readable’ part of the MARC acronym is a lie
The most recent Code4Lib Journal issue has an excellent article that should be mandatory reading for anyone working in or with a library.
The article is “Interpreting MARC: Where’s the Bibliographic Data?” by Jason Thomale. In it, he explains in extremely clear terms exactly what MARC is not. He begins by explaining that MARC pre-dated relational databases. That means everything we think about for computers, digital processing, data structures, and logic doesn’t apply for MARC.
The title of this blog post is from one of the article’s notes:
There is also the statement about working with MARC data purportedly made by Google engineer Leonid Taycher that “the first thing he had to learn was that the ‘Machine Readable’ part of the MARC acronym was a lie” (from http://go-to-hellman.blogspot.com/2010/01/google-exposes-book-metadata-privates.html).
This quote illustrates the core thrust and value of the entire article, which is: MARC is presented as being functional for machines/computers and it is not. For over a year after I started working in a digital library center and working with MARC, I didn’t understand what MARC was. I couldn’t fathom that any method that treated or handled data the way MARC does could ever have been in operation, and certainly not that it could still be in operation.
I still don’t understand how library catalogs actually work. I understand computers and I understand punch cards, but MARC isn’t either. I understand how it works at an individual record level, but I don’t see how it could work at a system level. Thomale’s article explains his process of unlearning basic world assumptions in order to deal with MARC. The comments on the article show that there are many people who have undergone the same process to learn “that the ‘Machine Readable’ part of the MARC acronym is a lie.”
Encoded Archival Context Project – Social Networks and Archival Context Project (SNAC)
From the SNAC website:
Leveraging the new standard Encoded Archival Context-Corporate Bodies, Persons, and Families (EAC-CPF), the SNAC Project will use digital technology to “unlock” descriptions of people from finding aids and link them together in exciting new ways. We will:
- Create efficient open-source tools that allow archivists to separate the process of describing people from that of describing records.
- Create a prototype integrated historical resource and access system that will link descriptions of people to one another and to descriptions of resources in archives, libraries and museums; online biographical and historical databases; and other diverse resources.