Digital Library Center Blog | UF

Chronicling work on the UF Digital Collections, SobekCM, & the Digital Humanities

Archive for the ‘tools’ Category

Humanities Grant Proposal Review Opportunity, Fall 2011; UF Center for the Humanities and the Public Sphere

without comments

Humanities Grant Proposal Review Opportunity, Fall 2011
UF Center for the Humanities and the Public Sphere

Grant Proposal Review Opportunity

Faculty members in the humanities are invited to submit complete draft proposals (minus reference letters) by December 16th for single-blind review by three UF referees with experience serving on grant review panels at the national level. Feedback will be returned by February 5th, to enable revision and submission of proposals for spring 2012 deadlines. This opportunity is limited to 15 faculty members; in the case of over-subscription, preference will be given to those who did not participate in the Spring/Summer 2011 opportunity.

To participate, please RSVP by Dec. 9th to Sophia Acord (skacord@ufl.edu)

Humanities Grant Support and Databases

The Center for the Humanities and the Public Sphere grants resource pages have been reorganized and revamped, with new information on UF grant-writing resources, digital humanities, public humanities, and many more listings for internal and external funding opportunities for graduate students and faculty.

These activities are made possible with support from the CLAS Dean’s Office and the UF Office of Research.

Written by Laurie N. Taylor

November 7th, 2011 at 11:44 am

Posted in cfp,grant,tools,tutorials

CFP: DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices

without comments

DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices
September 19-23, 2011
Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany

Goals

This symposium-style workshop will bring together representatives from major longitudinal data collection efforts to share expertise and to explore the use of the DDI metadata standard as a means of managing and structuring longitudinal study documentation. Participants will work collaboratively to create best practices for documenting longitudinal data in its various forms, including panel data and repeated cross-sections.

Description of the workshop

Longitudinal survey data carry special challenges related to documenting and managing data over time, over geography, and across multiple languages. This complexity is often a barrier to building efficient systems for data access and analysis. DDI (Data Documentation Initiative) Lifecycle, a metadata standard that addresses the full life cycle of social science research data (formerly referred to as DDI 3), is designed to provide an efficient structure for the documentation of complex longitudinal data. In this workshop, participants involved in longitudinal data projects around the world will work together on issues involved in documenting longitudinal data.

Intended audience: Individuals with expertise in longitudinal social science data; knowledge of DDI is desired but not required. The intent is to have a mix of participants with substantive and technical skills. Participants should provide access to materials describing their projects, which can serve as use cases in applying DDI. The workshop is in English. This is the second Dagstuhl workshop on the topic; the first took place in October 2010. The upcoming workshop will continue the in-depth discussion begun last year, expanding into additional topics.

Expected Results

Participants will write best practice papers, to be published in the DDI Working Paper Series. Last year’s workshop produced a series of best practice papers on longitudinal data.

Possible Topics

Documenting comparison, harmonization, and the relationship among concepts, questions, and variables over time, as well as the relationship of respondent types (person, household) are typical issues for longitudinal data. Other topics not specific to longitudinal data:
- Classifications (e.g., ISCO, ISCED)
- Data collection details
- Qualitative data, other types of data sources beyond surveys
- Quality of metadata and data
- Data management planning
- Relationship to the Open Archival Information System (OAIS)
- Extension of DDI for specific needs

These topics are often more salient for longitudinal data, making it even more critical to manage these metadata in a structured form over time and across countries. The current possibilities of DDI Lifecycle will be explored and areas for future extensions identified. Additionally, participants can suggest their own areas of interest.

Venue

The workshop will take place at the Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany. The non-profit center is a member of the Leibniz Association and is funded jointly by the German federal government and a number of state governments. The venue provides an intense working atmosphere in a pleasant, remote region. Several seminar rooms and a cafeteria during the day, and leisure rooms such as a wine bar and billiard room in the evening, promote intense discussion and communication. Accommodation at Dagstuhl, including full board, costs 60 Euro/day/person (subsidized rate).

Sponsors

This workshop is sponsored by the DDI Alliance, GESIS – Leibniz Institute for the Social Sciences, Minnesota Population Center (MPC), and Open Data Foundation (ODaF).

Contact

The names of interested organizations and individuals should be sent to ddi-expert-workshop@icpsr.umich.edu. Please provide contact information, area of interest, and area of expertise for each individual, information regarding DDI Lifecycle implementation, and a statement of what each individual can contribute to the workshop. Direct questions to ddi-expert-workshop@icpsr.umich.edu. Twenty-one participants will be accepted.

Links

Related Web page: http://www.dagstuhl.de/11382
Best practice papers on longitudinal data: http://www.ddialliance.org/resources/publications/working/BestPractices/LongitudinalData
DDI Working Paper Series: http://www.ddialliance.org/resources/publications/working
Further information on “How to get to Dagstuhl”: http://www.dagstuhl.de/en/about-dagstuhl/arrival/
Pictures of Dagstuhl: http://www.dagstuhl.de/en/about-dagstuhl/press/downloads/
DDI Alliance: http://www.ddialliance.org/
GESIS – Leibniz Institute for the Social Sciences: http://www.gesis.org/
Minnesota Population Center (MPC): http://www.pop.umn.edu/
Open Data Foundation (ODaF): http://www.opendatafoundation.org/

The organizers would appreciate hearing soon from interested people.

Mary Vardigan, Director DDI Alliance
Wendy Thomas, Chair DDI Technical Implementation Committee
Joachim Wackerow, Vice Chair DDI Technical Implementation Committee
Arofan Gregory, Technical Consultant
(Organizers)

GESIS – Leibniz Institute for the Social Sciences
Department: Monitoring Society and Social Change
Unit: Social Science Metadata Standards
Visiting address: B2 1, 68159 Mannheim, Germany
Postal address: P.O. Box 122155, 68072 Mannheim, Germany
Phone: +49 (0)621 1246 262
Fax: +49 (0)621 1246 100
E-mail: joachim.wackerow@gesis.org
www.gesis.org/en/institute/

Written by Laurie N. Taylor

June 25th, 2011 at 2:03 pm

Projects to Watch: RoSE

without comments

On Alan Liu’s website, he provides an overview of RoSE, a research-oriented social environment:

Created as an outcome of the Transliteracies Project, RoSE is a Web-based knowledge-exploration system that fuses a social-computing model to humanities bibliographical resources to allow users to explore the present and past of the human record as one “social network.” Stocked with initial information data-mined from YAGO and Project Gutenberg (with plans for data-mining the SNAC Project), RoSE provides profile pages about persons and documents, keywords and other data, and visualizations that help users see the relationships between people and documents. Uniquely, it also allows users (humanities students, scholars, and research groups) to add “thickly described” metadata on top of standard bibliographical data. This facilitates a social-network-like sense of active, dynamic interrelation with the objects of research. (cite)

This is a very exciting project because it promises to fuse archival and current researcher networks for tracking and studying relations between authors and documents. As such, it will allow users to explore and study the lives and social networks shared by and through both documents and authors. RoSE currently requires a login, so I’ll be anxiously awaiting its opening for general access and play.

Written by Laurie N. Taylor

April 21st, 2011 at 2:43 am

Spatial Humanities

without comments

The University of Virginia Libraries has announced the launch of “Spatial Humanities,” a community-driven resource for place-based digital scholarship:

http://spatial.scholarslab.org/

The site was developed in response to needs identified by faculty. It includes:

  • an evolving, crowdsourced catalog of research resources, projects, and organizations
  • a set of framing essays on the spatial turn across the disciplines by Dr. Jo Guldi of the Harvard Society of Fellows
  • GIS-related feeds from Q&A sites and other forms of social media
  • a peer-reviewed, occasional publication for step-by-step tutorials in spatial tools and methods

UVa is inviting everyone to participate:

  • use Zotero to freely upload research citations, projects, and links to groups
  • contribute your own tutorials and helpsheets in “Step By Step” format for peer review and formal publication
  • adopt the #geoinst hashtag on Twitter and Delicious
  • ask related questions and offer help on DH Answers or the GIS Stack Exchange
  • post commentary on the essays

This looks like another great resource for all scholars.

Written by Laurie N. Taylor

April 15th, 2011 at 2:46 pm

Data Documentation Initiative 3 (DDI 3) Data Extraction Tools from Colectica Awarded an NIH Grant

without comments

The Data Documentation Initiative 3 (DDI 3) standard is a simply fabulous and full standard for metadata (data about data) as well as for the data contents, making it a full payload standard.

DDI 3 is such an exciting standard because it allows for the possibility of true and full computational support for data harmonization and for really working with longitudinal data. It’s the type of data standard I’d been waiting for because it gets it. Data standards need to be able to support documenting, containing, expressing, and computing (analysis, harmonization, limitations on disclosure, everything we now do with less than ideal systems and methods). DDI 3 does this and that’s why groups like ICPSR are already using it. DDI 3 is already on its way to becoming ubiquitous, but more tools for it are needed.

News of others using and supporting DDI 3 is always good. Thus, it’s wonderful news that Colectica has been awarded an NIH Grant for DDI 3-based data extraction tools. From the Colectica website:

The award is a Phase I grant that provides supplemental support of Algenta’s research on an “Open Standards-Based Data Extraction Web Tool for Complex Longitudinal Datasets”. This Phase I feasibility study aims to analyze the data preparation and metadata creation workflow needed to prepare a study for online data extraction, to validate the use of the Data Documentation Initiative’s DDI 3 standard for the basis of such a tool, and to create prototype web-based data extraction software. While the focus is on longitudinal surveys, the proposed system would also handle cross-sectional, time-series, and non-repeated studies. The aim is to improve research methodologies through a simplification of the process used for discovering, retrieving, and analyzing data relevant to a researcher’s investigation and to improve data citations, aiding in reproducible research. The research includes consultation with researchers from ICPSR at the University of Michigan-Ann Arbor and the Mid-Life in the United States Longitudinal Study at the University of Wisconsin-Madison.

Written by Laurie N. Taylor

April 5th, 2011 at 5:18 pm

UFDC/SobekCM Tracking System

without comments

The UF Digital Collections System, SobekCM, is always being enhanced to better meet user and internal needs.

Normally the vast majority of time is spent on the user side because user support is the priority. With dozens of partners who use the online and locally installed tools to manage their digitization work and to contribute digitized items to the collaborative digital collections hosted on SobekCM, user support also includes many of the internal tools.

Most recently, however, the very-internal users received a major boost in support through the addition of a tracking system within SobekCM. Before, we had a legacy tracking system that was riddled with problems, couldn’t generate reports, and wouldn’t track the location of physical materials among other problems. Now, that legacy system is gone and it’s been replaced with tracking functionality within SobekCM. This tracking functionality includes tracking milestones, a work log for all work, reports, private/public flags, born digital/analog flags, internal notes, ticklers, internal fields on physical box location for item tracking during production, and more. It’s fabulous and there’s more on it here: http://ufdc.ufl.edu/sobekcm/tracking

Ideas, feedback, and suggestions are always welcome.

Written by Laurie N. Taylor

March 13th, 2011 at 2:53 am

Alliance for Networking Visual Culture & Video Book Published by MIT Press

without comments

The Alliance for Networking Visual Culture:

seeks to enrich the intellectual potential of our fields to inform understandings of an expanding array of visual practices as they are reshaped within digital culture, while also creating scholarly contexts for the use of digital media in film, media and visual studies.  By working with humanities centers, scholarly societies, and key library, archive, and university press partners, we are investigating and developing sustainable platforms for publishing interactive and rich media scholarship.

The Alliance has strategic partnerships with four archives (the Shoah Foundation, Critical Commons, the Hemispheric Institute’s Digital Video Library, and the Internet Archive) and three university presses (MIT, California and Duke). These partners are providing the initial testing ground for the investigation of new publishing templates. By working with the partners and disseminating its research, experimental methods, and tools, the Alliance aims to better connect curated digital archives with scholarly publication, making it easier for scholars to work with archival materials and enabling new forms of scholarship and new ways of doing scholarly work. “By creating an alliance between scholars, presses and archives, we will identify broad types of emerging scholarly communication and produce working demonstration projects with each partner press to illustrate these types.”

MIT Press has now published one of the Alliance projects, Learning from YouTube, which is available online. Read more about it on the Alliance blog, which is here.

Written by Laurie N. Taylor

February 25th, 2011 at 4:19 am

OCR Text Correction is a Good Project for Crowdsourcing

with one comment

Correcting text created by OCR (optical character recognition) is a great project for crowdsourcing because it can be isolated and scaled. Essentially, it can be made into a small task and the overall need can benefit from loads of small contributions, made through the small task interface. A great deal of digital library work can’t be sliced/scaled/isolated like this, and with so much work to do, it’s always nice when something can involve others for the benefit of everyone.
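To make the "small task" idea concrete, here is a minimal sketch of slicing one page of OCR output into line-level correction tasks and merging volunteer fixes back in. Everything in it (the `CorrectionTask` record, the function names, the line-per-task granularity) is my own assumption for illustration; it is not how any particular library's system works.

```python
from dataclasses import dataclass


@dataclass
class CorrectionTask:
    """One small, independent unit of work for a volunteer (hypothetical shape)."""
    page_id: str
    line_number: int  # 1-based position of the line within the page
    ocr_text: str


def slice_into_tasks(page_id, ocr_page_text):
    """Turn one page of OCR output into independent line-level tasks."""
    tasks = []
    for number, line in enumerate(ocr_page_text.splitlines(), start=1):
        if line.strip():  # blank lines have nothing to correct
            tasks.append(CorrectionTask(page_id, number, line.strip()))
    return tasks


def apply_corrections(ocr_page_text, corrections):
    """Merge corrected lines (keyed by 1-based line number) back into the page."""
    lines = ocr_page_text.splitlines()
    for number, corrected in corrections.items():
        lines[number - 1] = corrected
    return "\n".join(lines)
```

Because each task carries its own page and line position, contributions can arrive in any order and from any number of people, which is what lets this kind of work scale.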

The National Library of Finland recently came out with new games-as-tools for correcting OCR text, and their website explains:

We need your help. Most of the information in the library’s newspaper archives has already been copied into computer databases using computerized text recognition. The problem is that computers fail to recognize all the words. Especially when the quality of the source material is poor, the results need to be fixed by hand. This requires a lot of manual work.

At the moment, when you play games in Digitalkoot you help correct words. Later this year you will also be able to help structure the documents and tag images.

I’m interested if anyone has reports on the success of this method. It requires a higher level of investment than many approaches, given its contextualization of the work within a game-as-tool interface, and I don’t know whether this leads to greater or less success. The National Library of Australia has been phenomenally successful by allowing people to simply contribute the corrected text through an easy, no-frills interface as seen here.

I’m partial to the National Library of Australia’s method because it requires less initial resource investment, it’s proven to continue to return on design investment for the long-term, and it appeals to such a large and wide demographic that I would think it would be the most successful model. Of course, I’m most partial to whatever works so I’d love to know if folks have reports on the success rates of Finland’s games or other methods for crowdsourcing OCR correction.

Written by Laurie N. Taylor

February 13th, 2011 at 3:32 pm

New UFDC Features: Browse By, Admin Header, and Export to Excel

without comments

Browse By Metadata (e.g., list of all publishers in an item aggregation)

The UF Digital Collections (UFDC) have more new features. These are all in progress, as is the norm with the perpetual beta of growing and evolving systems, but the “Browse By” feature is already publicly viewable here for the Baldwin Library of Historical Children’s Literature Digital Collection.

This is still in process as we test how best to display so much rich metadata with significant distinctions, as when an author is also an editor and printer: should it all be collapsed into one; if so, should all types be listed at the end; should they all remain; and what about when there are multiple types and then another unclear note on the affiliation? We’ll be working through these questions and more, and we’ll be doing so with abundant feedback from users.
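One of the display options in question, collapsing each name into a single entry with all of its types listed at the end, might look roughly like this. It is only a sketch: the `collapse_roles` function and the sample names are made up for illustration, and it is not how the "Browse By" feature is actually implemented.

```python
from collections import defaultdict


def collapse_roles(entries):
    """Group (name, role) pairs so each name appears once, with its roles at the end."""
    roles_by_name = defaultdict(list)
    for name, role in entries:
        if role not in roles_by_name[name]:  # keep first-seen order, drop repeats
            roles_by_name[name].append(role)
    return [
        f"{name} ({', '.join(roles)})"
        for name, roles in sorted(roles_by_name.items())
    ]


# Made-up sample data: one person credited in more than one role.
sample = [
    ("Smith, J.", "author"),
    ("Smith, J.", "printer"),
    ("Brown, A.", "editor"),
    ("Smith, J.", "author"),
]
browse_list = collapse_roles(sample)
```

The trade-off is the one raised above: a collapsed list is shorter to browse, but it hides which role applies to which item.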

Further Simplification for Simplified URLs

UFDC’s already simplified URLs are even simpler, with the base now http://ufdc.ufl.edu instead of http://ufdc.uflib.ufl.edu. The longer version is still fully supported.

Features for Internal Users

Internal Header

UFDC now has an internal header (internal meaning it’s only for internal folks who are logged in). It allows internal users to easily search by BibID. Right now, this can also be done using the main search box, but the internal header will eventually allow for specialized internal searches and for searching the records for items in process that are not yet online. This is part of the merging of offline workflows into the SobekCM system.

Export to XLS or CSV

This is also internal-only and it allows internal users to export a list of items directly from SobekCM/UFDC. This complements an update to the UFDC_CM (currently an offline-only tool) which can now pull MARC records for items online. Both of these changes are part of the work to add reporting to SobekCM and the work to integrate existing tools into one system (for greater efficiency for supporting and using the tools).
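For readers curious what exporting a list of items to CSV involves, here is a small sketch using Python's standard `csv` module. The column names and the sample BibID value are assumptions for illustration only; SobekCM is a .NET system and its actual export code is not shown here.

```python
import csv
import io


def export_items_to_csv(items, columns):
    """Serialize a list of item dicts to CSV text, keeping only the given columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical item records; field names and values are illustrative only.
items = [
    {"BibID": "UF00000001", "Title": "Stories, Old and New", "Internal": "x"},
    {"BibID": "UF00000002"},
]
csv_text = export_items_to_csv(items, ["BibID", "Title"])
```

Letting the `csv` module handle quoting matters here because titles routinely contain commas.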

Related

Like these, other seemingly internal-only enhancements also benefit external users by increasing SobekCM’s capabilities as a system and the Digital Library Center’s ability to work more effectively in digitizing materials and adding them to the UF Digital Collections.

Written by Laurie N. Taylor

October 3rd, 2010 at 7:07 pm

SobekCM, weighing in at 113,643 lines of code (plus comments)

without comments

Mark Sullivan, the UFDC/DLC/dLOC programmer, recently shared this information. It’s exciting to see that SobekCM (our digital asset management system, digital library system, and digital production tool set) is such a streamlined solution with so much functionality. There are seven projects which make up the SobekCM solution. In those projects, there are:

113,643 lines of code ( not comments or empty lines )
23,452 lines of comments
420 files
60 folders
544 classes ( 55 abstract classes, 1 windows form, 5 ASPX pages )
14 interfaces
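The post doesn't say which tool produced these figures. As a rough illustration of how code, comment, and blank lines can be tallied separately for C#-style source, here is a simplified sketch; it is my own assumption, and it treats a line with a trailing comment as code and ignores comment markers inside string literals.

```python
def count_lines(source):
    """Tally code, comment, and blank lines in C#-style source text (simplified)."""
    code = comments = blank = 0
    in_block = False  # True while inside a /* ... */ block comment
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            comments += 1
            if "*/" in line:
                in_block = False
        elif not line:
            blank += 1
        elif line.startswith("//"):
            comments += 1
        elif line.startswith("/*"):
            comments += 1
            in_block = "*/" not in line
        else:
            code += 1  # includes code lines that end with a trailing comment
    return {"code": code, "comments": comments, "blank": blank}


sample = "// header\nint x = 1;\n\n/* block\n still */\nreturn x; // trailing\n"
totals = count_lines(sample)
```

Summing the per-file results across a project's files would give totals in the same shape as the lists in this post.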

The main two projects are:

1) SobekCM_Bib_Package which has all the code to represent digital objects, read metadata, write metadata, etc. This is used throughout all the DLC/UFDC/dLOC applications.

36,554 lines of code
5,796 lines of comments
121 files
19 folders
165 classes ( 3 abstract classes, 1 windows form )

2) SobekCM_Library which does all the rendering, navigation, authentication, etc., for the SobekCM library. This relies heavily upon the above library for reading and displaying of digital resources and is utilized by both the builder and the customization manager.

68,803 lines of code
13,825 lines of comments
251 files
30 folders
328 classes ( 52 abstract classes )
11 interfaces

This does not include the 22 separate JavaScript files, of which eight were written by me and include 3951 new lines of code and 702 lines of comments.

3) While the main SobekCM web project is not strictly a library, it is the third project in the SobekCM solution. It is the first project which a user interacts with when entering the library. This project is actually very small, containing only about 1300 .NET lines. It does house the five web forms used in the application, although these forms are quite small and are just basically skeletons into which the SobekCM_Library renders HTML or controls.

4) SobekCM_URL_Rewriter is a tiny library which is essentially just an HttpModule for rewriting and translating the URL to allow for cleaner URLs.

5) SobekCM_Tools is a small library ( about 4000 code lines ) which contains additional classes for logging, interacting with the tracking database, and interacting with the Florida Dark Archive (DAITSS). This is kept separate from the general library since it is not strictly involved in rendering the HTML but is used by some modules and is used with the SobekCM_Library and SobekCM_Builder libraries for building collection text indexes and loading new items through the Builder.

6) FileUploadLibrary ( written by Darren Johnstone ) is about 3000 .NET code lines and 5500 lines of JavaScript used for uploading data via HTTP with a real-time upload progress bar. Quite useful and cool library which was adopted with very few changes and worked quite simply. Highly recommended… ( http://darrenjohnstone.net/ )

7) SobekCM_Builder. In addition to these libraries/projects used by the digital library, this library is employed (along with the SobekCM_Bib_Package, SobekCM_Library, and SobekCM_Tools) for the builder software which runs constantly in the background on another server, loading new items which are deposited into network folders or FTP folders. It also updates and builds all static pages, OAI feeds, and RSS feeds, and builds the text indexes. Additionally, it reads and loads all of the FDA ingest reports from DAITSS.

6793 lines of code
858 lines of comments
26 files
3 folders
37 classes

Written by Laurie N. Taylor

August 16th, 2010 at 12:48 pm