Digital Library Center Blog | UF

Chronicling work on the UF Digital Collections, SobekCM, & the Digital Humanities

Archive for the ‘technologies’ Category

National Archives of Australia’s Vrroom and Enabling Access

without comments

The National Archives of Australia developed and maintains Vrroom, the Virtual Reading Room (http://vrroom.naa.gov.au/). Like many systems, Vrroom provides access to archival collection records and digitized materials. To those, Vrroom adds educational and contextual materials for a number of the items. Items are also presented together in groups, with further educational context for each group; thus, people can learn about specific things, people, and so on, as well as the larger context for those items in relation to one another.

From this description, Vrroom may seem like many educational websites. It is, but it is also an excellent example of how policy needs can dictate technology (and the opposite should never be true; technology should not dictate policy) to provide the supports that enable access. As a website, Vrroom enables access in expected ways. As a cultural heritage website, Vrroom enables access by supporting cultural heritage protections, specifically by blurring thumbnail images of people and providing a warning before showing the full image and text. The warning states: “Warning. Indigenous Australians are advised that this document includes images or names of people now deceased.” (example). Technologically, this is simple. It is also very important, because enabling access means more than simply putting materials online.
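To illustrate how little technology the policy requires, here is a minimal sketch of the blur-and-warn behavior described above. It is not Vrroom's actual code: the `Item` class, the `culturally_sensitive` flag, and both function names are hypothetical, and only the warning text comes from the site.

```python
from dataclasses import dataclass

# Warning text as quoted from Vrroom.
WARNING = ("Warning. Indigenous Australians are advised that this document "
           "includes images or names of people now deceased.")


@dataclass
class Item:
    title: str
    culturally_sensitive: bool  # hypothetical flag, assumed to be set by archivists


def thumbnail_mode(item: Item) -> str:
    """Return how the thumbnail should be drawn: 'blurred' or 'clear'."""
    return "blurred" if item.culturally_sensitive else "clear"


def open_item(item: Item, acknowledged: bool = False) -> str:
    """Return the warning first; show the full record only after acknowledgement."""
    if item.culturally_sensitive and not acknowledged:
        return WARNING
    return f"Full record: {item.title}"
```

The point of the sketch is that the policy decision (which items carry the flag) lives in the metadata, and the interface simply follows it.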

Enabling access means ensuring materials can be found (outreach, promotion, search engine optimization, etc) and that the materials are usable (usability studies, help documents, etc), as well as ensuring that the materials can be made sense of and used (contextual supports, educational guides, exhibits, cultural heritage supports, etc).  Vrroom is an excellent resource for archival research and teaching, as well as being an excellent example of how cultural heritage institutions support access and what supporting access really means.

 

Written by Laurie N. Taylor

December 10th, 2011 at 9:18 pm

Announcement: DataCite Summer Meeting – Data and the Scholarly Record: the Changing Landscape

without comments

DataCite will hold its second Summer Meeting on August 24th and 25th at the historic Shattuck Plaza Hotel in Berkeley, California. The Summer Meeting will be a 1.5-day event, and you can register at: http://datacite2011.eventbrite.com/ .

The Summer Meeting brings together people from research organisations, data centers, government, and information service providers to hear about the latest developments in data science, data citation, discovery, and reuse. It also provides opportunities to exchange experience and influence the next generation of data citation services.

This year’s program will include sessions on data citation, data publishing, and discussions on the new challenges that come with increased access to scientific data.

The 2010 DataCite summer meeting brought together a strong programme of speakers and participants (http://www.datacite.org/datacite_summer_meeting_2010). Highlights were published in D-Lib (http://dx.doi.org/doi:10.1045/january2011-contents).

DataCite helps researchers find, access, and reuse data. It is an international not-for-profit association founded in 2009 with members across the globe.

Written by Laurie N. Taylor

June 30th, 2011 at 9:28 pm

CFP: DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices

without comments

DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices
September 19-23, 2011
Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany

Goals

This symposium-style workshop will bring together representatives from major longitudinal data collection efforts to share expertise and to explore the use of the DDI metadata standard as a means of managing and structuring longitudinal study documentation. Participants will work collaboratively to create best practices for documenting longitudinal data in its various forms, including panel data and repeated cross-sections.

Description of the workshop

Longitudinal survey data carry special challenges related to documenting and managing data over time, over geography, and across multiple languages. This complexity is often a barrier to building efficient systems for data access and analysis. DDI (Data Documentation Initiative) Lifecycle, a metadata standard that addresses the full life cycle of social science research data (formerly referred to as DDI 3), is designed to provide an efficient structure for the documentation of complex longitudinal data. In this workshop, participants involved in longitudinal data projects around the world will work together on issues involved in documenting longitudinal data.

Intended audience: Individuals with expertise in longitudinal social science data; knowledge of DDI is desired but not required. The intent is to have a mix of participants with substantive and technical skills. Participants should provide access to materials describing their projects, which can serve as use cases in applying DDI. The workshop is in English. This is the second Dagstuhl workshop on the topic; the first took place in October 2010. The upcoming workshop will continue the in-depth discussion begun last year, expanding into additional topics.

Expected Results

Participants will write best practice papers, to be published in the DDI Working Paper Series. Last year’s workshop produced a series of best practice papers on longitudinal data.

Possible Topics

Documenting comparison, harmonization, and the relationship among concepts, questions, and variables over time, as well as the relationship of respondent types (person, household) are typical issues for longitudinal data. Other topics not specific to longitudinal data:
- Classifications (e.g., ISCO, ISCED)
- Data collection details
- Qualitative data, other types of data sources beyond surveys
- Quality of metadata and data
- Data management planning
- Relationship to the Open Archival Information System (OAIS)
- Extension of DDI for specific needs

These topics are often more salient for longitudinal data, making it even more critical to manage these metadata in a structured form across time and countries. The current possibilities of DDI Lifecycle will be explored and areas for future extensions identified. Additionally, participants can suggest their own areas of interest.

Venue

The workshop will take place at the Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany. The non-profit center is a member of the Leibniz Association and is funded jointly by the German federal government and a number of state governments. The venue provides an intense working atmosphere in a pleasant, remote region. Several seminar rooms and a cafeteria during the day, and leisure rooms such as a wine bar and a billiard room in the evening, promote intense discussion and communication. Accommodation at Dagstuhl, including full board, costs 60 Euro/day/person (subsidized rate).

Sponsors

This workshop is sponsored by the DDI Alliance, GESIS – Leibniz Institute for the Social Sciences, Minnesota Population Center (MPC), and Open Data Foundation (ODaF).

Contact

The names of interested organizations and individuals should be sent to ddi-expert-workshop@icpsr.umich.edu. Please provide contact information, area of interest, and area of expertise for each individual, information regarding DDI Lifecycle implementation, and a statement of what each individual can contribute to the workshop. Direct questions to ddi-expert-workshop@icpsr.umich.edu. Twenty-one participants will be accepted.

Links

Related Web page: http://www.dagstuhl.de/11382
Best practice papers on longitudinal data: http://www.ddialliance.org/resources/publications/working/BestPractices/LongitudinalData
DDI Working Paper Series: http://www.ddialliance.org/resources/publications/working
Further information on “How to get to Dagstuhl”: http://www.dagstuhl.de/en/about-dagstuhl/arrival/
Pictures of Dagstuhl: http://www.dagstuhl.de/en/about-dagstuhl/press/downloads/
DDI Alliance: http://www.ddialliance.org/
GESIS – Leibniz Institute for the Social Sciences: http://www.gesis.org/
Minnesota Population Center (MPC): http://www.pop.umn.edu/
Open Data Foundation (ODaF): http://www.opendatafoundation.org/

The organizers would appreciate hearing soon from interested people.

Mary Vardigan, Director DDI Alliance
Wendy Thomas, Chair DDI Technical Implementation Committee
Joachim Wackerow, Vice Chair DDI Technical Implementation Committee
Arofan Gregory, Technical Consultant
(Organizers)

GESIS – Leibniz Institute for the Social Sciences
Department: Monitoring Society and Social Change
Unit: Social Science Metadata Standards
Visiting address: B2 1, 68159 Mannheim, Germany
Postal address: P.O. Box 122155, 68072 Mannheim, Germany
Phone: +49 (0)621 1246 262
Fax: +49 (0)621 1246 100
E-mail: joachim.wackerow@gesis.org
www.gesis.org/en/institute/

Written by Laurie N. Taylor

June 25th, 2011 at 2:03 pm

California Weekly Newspapers to be Preserved Online

without comments

The University of California Riverside’s California Digital Newspaper Collection (CDNC) is expanding to include weekly papers in a searchable archive. The full news story from June 21, 2011 is “California Weekly Newspapers to be Preserved Online” and it’s online here. This is great news about the California Digital Newspaper Collection’s growth and success!

Also, there is a minor point in the news story that I wanted to clarify. The news story notes:

Libraries in Minnesota and Florida also are collecting PDFs of newspaper pages, but do not offer the ability to search text across titles, Geiger said. Software developed to process historical newspapers in the California Digital Newspaper Collection makes it possible to archive PDF pages in a way that permits text searches.

The Florida Digital Newspaper Library does allow users to search text across titles. The searching uses derivative files rather than the PDF versions, so the process is different. This is important for the Florida Digital Newspaper Library’s users, but the point the article is trying to make is also very important. The article is trying to explain that CDNC has implemented new technology that allows searching directly from the PDFs, which may be the optimal method for many other digital newspaper libraries, collections, and archives. Thus, CDNC is sharing great news both in terms of more content being preserved and accessible in an ever-improving interface and in terms of software that could be useful for others. I don’t know that the importance of both aspects comes through in the article (which may be my own misreading, or it may be that the article isn’t completely clear, which is a normal occurrence when technical news is fused with easier, more fun news about new content).
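For readers curious what searching text across titles from derivatives can look like, here is a minimal sketch. It assumes each page already has a plain-text derivative (for example, OCR output) and builds a simple inverted index over those texts. The function names and the sample data shape are hypothetical; this is not the software used by either the Florida Digital Newspaper Library or CDNC.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of (title, page_id) pairs containing it.

    pages: iterable of (title, page_id, text) tuples, where text is the
    plain-text derivative of a newspaper page.
    """
    index = defaultdict(set)
    for title, page_id, text in pages:
        for word in text.lower().split():
            # Strip common punctuation so "harvest." matches "harvest".
            index[word.strip(".,;:!?\"'()")].add((title, page_id))
    return index


def search(index, query):
    """Return pages, across all titles, that contain every word in the query."""
    words = [w.lower() for w in query.split()]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for w in words[1:]:
        results &= index.get(w, set())
    return results
```

Because the index is keyed by word rather than by title, a single query naturally returns pages from every newspaper that matches.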

Search the California Digital Newspaper Collection >>

 

Written by Laurie N. Taylor

June 22nd, 2011 at 3:43 am

News on News from DigitalPreservation.gov

without comments

News on News from DigitalPreservation.gov:

CRL Report Describes Digital Newspaper Production
May 5, 2011 — Preserving News in the Digital Environment: Mapping the Newspaper Industry in Transition (PDF) was produced for the National Digital Information Infrastructure and Preservation Program by a team from the Center for Research Libraries.

This report provides a vivid glimpse inside the workplaces that produce what – not long ago – we would have called newspapers.  As digital news-gathering and production methods proliferate, and as digital avenues for distribution emerge, these workplaces are being transformed in profound ways, with electronic facsimiles and websites (and probably more) overtaking the paper format.

The report is an outgrowth of the Preserving Digital News meeting held at the Library in September 2009, and it features illustrative examples from four American newspapers: The Arizona Republic, Seattle Post-Intelligencer (since 2008, seattlepi.com), Wisconsin State Journal, and The Chicago Tribune. There is additional information pertaining to the work of The New York Times, Investor’s Business Daily, and the Associated Press.  Altogether, the report makes it clear that the transition to the digital environment is not a neat, throw-the-switch change.

The CRL team of researchers, writers and illustrators included Jessica Alverson, Kalev Leetaru, Victoria McCargar, Kayla Ondracek, Bernard Reilly, James Simon and Eileen Wagner. Their narrative takes us through three major stages in the newspaper workflow: sourcing (gathering news information), editing and production, and distribution.  Each newspaper applies somewhat different practices in each stage, including the formatting of the content, the types of metadata employed, and the methods applied to manage the content in the information technology systems that support the workflow.

Here are a few highlights:

  • Most editorial systems are built around the traditional concept of a news article or story. Following longstanding newsroom practice, most text is maintained in the editorial system as part of a standard unit of content: it is a news item (for wire service reports) or a story, article or feature. (page 25)
  • After the articles and other components of a newspaper print edition are assembled and tagged in the editorial system they are usually exported to a pagination system where the page layouts for the print edition and e-facsimile edition are created. It is in the pagination system that most of the content for these editions of a newspaper is brought together for the first time. (page 28)
  • Once a locally produced news story is retired from the active or current pages of a newspaper’s website, it is often posted as the “archived” version or “version of record” in a separate part of the Web. Some newspapers outsource maintenance of these archived stories and features to archiving services like NewsBank, NewspaperArchives.com, and ProQuest. These services add value by formatting and indexing the stories and presenting them in searchable databases, which are normally hosted by the archiving service, but made to appear seamlessly connected to the newspaper site. (page 51)
  • A computer-assisted analysis of the Chicago Tribune Web site yielded a granular picture of the rate or “velocity” of updates on news web sites . . . [examining] the number of page URLs against minutes of persistence for a two-day period.  The analysis showed that in general business, entertainment, and sports news tended to be updated most frequently (sometimes several times within the half hour), while features, opinion, travel, and blog content changed less frequently. Hence the difference between print and electronic versions of newspaper content will vary considerably by type of content. (pages 55-56)
  • The newer model of the news Web, however, is exemplified by seattlepi.com, the Hearst Seattle Media’s “flagship site,” [which] focuses heavily on information of local interest, such as crime, regional politics and local sports teams.  But seattlepi.com is . . . fundamentally different from its now defunct predecessor, the Seattle Post-Intelligencer newspaper.  It features not only original staff reporting and breaking news, but blogs by staff and readers, links to other journalism and news web sites, community databases and photo galleries.  Through partnerships with other Seattle media (i.e., radio and television broadcasters), seattlepi.com also has access to video and audio produced by their local staff. (pages 52-53)
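The update "velocity" analysis mentioned in the highlights above can be approximated with a small sketch. This is not the report's method; it assumes a hypothetical list of crawl snapshots, each recording the minute of the crawl, the page URL, and a hash of the page content, and it measures how many minutes each observed version persisted.

```python
from itertools import groupby


def persistence_minutes(snapshots):
    """Return {url: [minutes each observed version persisted]}.

    snapshots: list of (minute, url, content_hash) tuples in any order.
    A new version starts whenever the content hash changes between
    consecutive crawls of the same URL.
    """
    result = {}
    ordered = sorted(snapshots, key=lambda s: (s[1], s[0]))
    for url, rows in groupby(ordered, key=lambda s: s[1]):
        rows = list(rows)
        spans = []
        start = rows[0][0]
        for prev, cur in zip(rows, rows[1:]):
            if cur[2] != prev[2]:
                spans.append(cur[0] - start)
                start = cur[0]
        # The last version is measured up to the final crawl only.
        spans.append(rows[-1][0] - start)
        result[url] = spans
    return result
```

Short persistence values would correspond to frequently updated sections such as sports, and long values to slower sections such as travel. The resolution is limited by how often the pages are crawled.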

Written by Laurie N. Taylor

May 8th, 2011 at 7:28 pm

Projects to Watch: RoSE

without comments

On Alan Liu’s website, he provides an overview of RoSE, a research-oriented social environment:

Created as an outcome of the Transliteracies Project, RoSE is a Web-based knowledge-exploration system that fuses a social-computing model to humanities bibliographical resources to allow users to explore the present and past of the human record as one “social network.” Stocked with initial information data-mined from YAGO and Project Gutenberg (with plans for data-mining the SNAC Project), RoSE provides profile pages about persons and documents, keywords and other data, and visualizations that help users see the relationships between people and documents. Uniquely, it also allows users (humanities students, scholars, and research groups) to add “thickly described” metadata on top of standard bibliographical data. This facilitates a social-network-like sense of active, dynamic interrelation with the objects of research. (cite)

This is a very exciting project because it promises to fuse archival and current researcher networks for tracking and studying relations between authors and documents. As such, it will allow users to explore and study the lives and social networks shared by and through both documents and authors. RoSE currently requires a login, so I’ll be anxiously awaiting its opening for general access and play.

Written by Laurie N. Taylor

April 21st, 2011 at 2:43 am

Spatial Humanities

without comments

The University of Virginia Libraries has announced the launch of “Spatial Humanities,” a community-driven resource for place-based digital scholarship:

http://spatial.scholarslab.org/

The site was developed in response to needs identified by faculty and includes:

  • an evolving, crowdsourced catalog of research resources, projects, and organizations
  • a set of framing essays on the spatial turn across the disciplines by Dr. Jo Guldi of the Harvard Society of Fellows
  • GIS-related feeds from Q&A sites and other forms of social media
  • a peer-reviewed, occasional publication for step-by-step tutorials in spatial tools and methods

UVa is inviting everyone to participate:

  • use Zotero to freely upload research citations, projects, and links to groups
  • contribute your own tutorials and helpsheets in “Step By Step” format for peer review and formal publication
  • adopt the #geoinst hashtag on Twitter and Delicious
  • ask related questions and offer help on DH Answers or the GIS Stack Exchange
  • post commentary on the essays

This looks like another great resource for all scholars.

Written by Laurie N. Taylor

April 15th, 2011 at 2:46 pm

Data Documentation Initiative 3 (DDI 3) Data Extraction Tools from Colectica Awarded an NIH Grant

without comments

The Data Documentation Initiative 3 (DDI 3) standard is a simply fabulous and full standard for metadata (data about data) as well as for the data contents, making it a full payload standard.

DDI 3 is such an exciting standard because it allows for the possibility of true and full computational support for data harmonization and for really working with longitudinal data. It’s the type of data standard I’d been waiting for because it gets it. Data standards need to be able to support documenting, containing, expressing, and computing (analysis, harmonization, limitations on disclosure, everything we now do with less than ideal systems and methods). DDI 3 does this and that’s why groups like ICPSR are already using it.  DDI 3 is already on its way to becoming ubiquitous, but more tools for it are needed.

News of others using and supporting DDI 3 is always good. Thus, it’s wonderful news that Colectica has been awarded an NIH Grant for DDI 3-based data extraction tools. From the Colectica website:

The award is a Phase I grant that provides supplemental support of Algenta’s research on an “Open Standards-Based Data Extraction Web Tool for Complex Longitudinal Datasets”. This Phase I feasibility study aims to analyze the data preparation and metadata creation workflow needed to prepare a study for online data extraction, to validate the use of the Data Documentation Initiative’s DDI 3 standard for the basis of such a tool, and to create prototype web-based data extraction software. While the focus is on longitudinal surveys, the proposed system would also handle cross-sectional, time-series, and non-repeated studies. The aim is to improve research methodologies through a simplification of the process used for discovering, retrieving, and analyzing data relevant to a researcher’s investigation and to improve data citations, aiding in reproducible research. The research includes consultation with researchers from ICPSR at the University of Michigan-Ann Arbor and the Mid-Life in the United States Longitudinal Study at the University of Wisconsin-Madison.

Written by Laurie N. Taylor

April 5th, 2011 at 5:18 pm

Alliance for Networking Visual Culture & Video Book Published by MIT Press

without comments

The Alliance for Networking Visual Culture:

seeks to enrich the intellectual potential of our fields to inform understandings of an expanding array of visual practices as they are reshaped within digital culture, while also creating scholarly contexts for the use of digital media in film, media and visual studies.  By working with humanities centers, scholarly societies, and key library, archive, and university press partners, we are investigating and developing sustainable platforms for publishing interactive and rich media scholarship.

The Alliance has strategic partnerships with four archives (the Shoah Foundation, Critical Commons, the Hemispheric Institute’s Digital Video Library, and the Internet Archive) and three university presses (MIT, California and Duke). These partners are providing the initial testing ground for the investigation of new publishing templates. By working with the partners and disseminating its research, experimental methods, and tools, the Alliance aims to better connect and integrate curated digital archives and scholarly publication. The goal is to make it easier for scholars to work with archival materials and to enable new forms of scholarship and new ways of doing scholarly work. “By creating an alliance between scholars, presses and archives, we will identify broad types of emerging scholarly communication and produce working demonstration projects with each partner press to illustrate these types.”

MIT Press has now published one of the Alliance projects, Learning from YouTube, which is available online. Read more about it on the Alliance blog, which is here.

Written by Laurie N. Taylor

February 25th, 2011 at 4:19 am

OCR Text Correction is a Good Project for Crowdsourcing

with one comment

Correcting text created by OCR (optical character recognition) is a great project for crowdsourcing because it can be isolated and scaled. Essentially, it can be made into a small task and the overall need can benefit from loads of small contributions, made through the small task interface. A great deal of digital library work can’t be sliced/scaled/isolated like this, and with so much work to do, it’s always nice when something can involve others for the benefit of everyone.
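As a rough illustration of why this work slices so well, here is a minimal sketch that turns OCR text into one small task per word and then merges volunteer submissions by agreement. The function names, the data shapes, and the two-vote threshold are all assumptions made for the example; this is not how any particular library's system works.

```python
from collections import Counter


def make_tasks(ocr_text):
    """Split OCR text into one small task per word, keyed by word position."""
    return {i: word for i, word in enumerate(ocr_text.split())}


def merge_corrections(tasks, submissions, min_votes=2):
    """Apply a correction only when at least min_votes volunteers agree on it.

    submissions: {position: [corrected word from each volunteer]}
    Words without enough agreement keep their original OCR reading.
    """
    corrected = dict(tasks)
    for position, votes in submissions.items():
        if not votes:
            continue
        word, count = Counter(votes).most_common(1)[0]
        if count >= min_votes:
            corrected[position] = word
    return " ".join(corrected[i] for i in sorted(corrected))
```

Each task is independent, so thousands of people can work at once, and requiring agreement between volunteers gives a simple quality check without staff review of every word.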

The National Library of Finland recently came out with new games-as-tools for correcting OCR text, and their website explains:

We need your help. Most of the information in the library’s newspaper archives has already been copied into computer databases using computerized text recognition. The problem is that computers fail to recognize all the words. Especially when the quality of the source material is poor, the results need to be fixed by hand. This requires a lot of manual work.

At the moment, when you play games in Digitalkoot you help correct words. Later this year you will also be able to help structure the documents and tag images.

I’d be interested to hear whether anyone has reports on the success of this method. It requires a higher level of investment than many approaches, given that it contextualizes the work within a game-as-tool interface, and I don’t know whether this leads to greater or lesser success. The National Library of Australia has been phenomenally successful by allowing people to simply contribute the corrected text through an easy, no-frills interface as seen here.

I’m partial to the National Library of Australia’s method because it requires less initial resource investment, it’s proven to continue to return on design investment for the long-term, and it appeals to such a large and wide demographic that I would think it would be the most successful model. Of course, I’m most partial to whatever works so I’d love to know if folks have reports on the success rates of Finland’s games or other methods for crowdsourcing OCR correction.

Written by Laurie N. Taylor

February 13th, 2011 at 3:32 pm