Archive for the ‘tools’ Category
Humanities Grant Proposal Review Opportunity, Fall 2011; UF Center for the Humanities and the Public Sphere
Grant Proposal Review Opportunity
Faculty members in the humanities are invited to submit complete draft proposals (minus reference letters) by December 16th for single-blind review by three UF referees with experience serving on grant review panels at the national level. Feedback will be returned by February 5th, to enable revision and submission of proposals for spring 2012 deadlines. This opportunity is limited to 15 faculty members; in the case of over-subscription, preference will be given to those who did not participate in the Spring/Summer 2011 opportunity.
To participate, please RSVP to Sophia Acord (skacord@ufl.edu) by Dec. 9th.
Humanities Grant Support and Databases
The Center for the Humanities and the Public Sphere grants resource pages have been reorganized and revamped, with new information on UF grant-writing resources, digital humanities, public humanities, and many more listings for internal and external funding opportunities for graduate students and faculty.
These activities are made possible with support from the CLAS Dean’s Office and the UF Office of Research.
CFP: DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices
DDI Workshop: Managing Metadata for Longitudinal Data – Best Practices
September 19-23, 2011
Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany
Goals
This symposium-style workshop will bring together representatives from major longitudinal data collection efforts to share expertise and to explore the use of the DDI metadata standard as a means of managing and structuring longitudinal study documentation. Participants will work collaboratively to create best practices for documenting longitudinal data in its various forms, including panel data and repeated cross-sections.
Description of the workshop
Longitudinal survey data carry special challenges related to documenting and managing data over time, over geography, and across multiple languages. This complexity is often a barrier to building efficient systems for data access and analysis. DDI (Data Documentation Initiative) Lifecycle, a metadata standard that addresses the full life cycle of social science research data (formerly referred to as DDI 3), is designed to provide an efficient structure for the documentation of complex longitudinal data. In this workshop, participants involved in longitudinal data projects around the world will work together on issues involved in documenting longitudinal data.
Intended audience: Individuals with expertise in longitudinal social science data; knowledge of DDI is desired but not required. The intent is to have a mix of participants with substantive and technical skills. Participants should provide access to materials describing their projects, which can serve as use cases in applying DDI. The workshop is in English. This is the second Dagstuhl workshop on the topic; the first took place in October 2010. The upcoming workshop will continue the in-depth discussion begun last year, expanding into additional topics.
Expected Results
Participants will write best practice papers, to be published in the DDI Working Paper Series. Last year’s workshop produced a series of best practice papers on longitudinal data.
Possible Topics
Documenting comparison, harmonization, and the relationship among concepts, questions, and variables over time, as well as the relationship between respondent types (person, household), are typical issues for longitudinal data. Other topics, not specific to longitudinal data, include:
- Classifications (e.g., ISCO, ISCED)
- Data collection details
- Qualitative data, other types of data sources beyond surveys
- Quality of metadata and data
- Data management planning
- Relationship to the Open Archival Information System (OAIS)
- Extension of DDI for specific needs
These topics are often more salient for longitudinal data, making it even more critical to manage these metadata in a structured form over time and across countries. The current possibilities of DDI Lifecycle will be explored and areas for future extensions identified. Participants can also suggest their own areas of interest.
Venue
The workshop will take place at the Leibniz Center for Informatics, Schloss Dagstuhl, Wadern, Germany. The non-profit center is a member of the Leibniz Association and is funded jointly by the German federal government and a number of state governments. The venue provides an intense working atmosphere in a pleasant, remote region. Several seminar rooms and a cafeteria during the day, and leisure rooms such as a wine bar and a billiard room in the evening, promote intense discussion and communication. Accommodation at Dagstuhl, including full board, costs 60 euros per person per day (subsidized rate).
Sponsors
This workshop is sponsored by the DDI Alliance, GESIS – Leibniz Institute for the Social Sciences, Minnesota Population Center (MPC), and Open Data Foundation (ODaF).
Contact
The names of interested organizations and individuals should be sent to ddi-expert-workshop@icpsr.umich.edu. Please provide contact information, area of interest, and area of expertise for each individual, information regarding DDI Lifecycle implementation, and a statement of what each individual can contribute to the workshop. Direct questions to ddi-expert-workshop@icpsr.umich.edu. Twenty-one participants will be accepted.
Links
Related Web page: http://www.dagstuhl.de/11382
Best practice papers on longitudinal data: http://www.ddialliance.org/resources/publications/working/BestPractices/LongitudinalData
DDI Working Paper Series: http://www.ddialliance.org/resources/publications/working
Further information on “How to get to Dagstuhl”: http://www.dagstuhl.de/en/about-dagstuhl/arrival/
Pictures of Dagstuhl: http://www.dagstuhl.de/en/about-dagstuhl/press/downloads/
DDI Alliance: http://www.ddialliance.org/
GESIS – Leibniz Institute for the Social Sciences: http://www.gesis.org/
Minnesota Population Center (MPC): http://www.pop.umn.edu/
Open Data Foundation (ODaF): http://www.opendatafoundation.org/
The organizers would appreciate hearing soon from interested people.
Mary Vardigan, Director DDI Alliance
Wendy Thomas, Chair DDI Technical Implementation Committee
Joachim Wackerow, Vice Chair DDI Technical Implementation Committee
Arofan Gregory, Technical Consultant
(Organizers)
GESIS – Leibniz Institute for the Social Sciences
Department: Monitoring Society and Social Change
Unit: Social Science Metadata Standards
Visiting address: B2 1, 68159 Mannheim, Germany
Postal address: P.O. Box 122155, 68072 Mannheim, Germany
Phone: +49 (0)621 1246 262
Fax: +49 (0)621 1246 100
E-mail: joachim.wackerow@gesis.org
www.gesis.org/en/institute/
Projects to Watch: RoSE
On his website, Alan Liu provides an overview of RoSE, a research-oriented social environment:
Created as an outcome of the Transliteracies Project, RoSE is a Web-based knowledge-exploration system that fuses a social-computing model to humanities bibliographical resources to allow users to explore the present and past of the human record as one “social network.” Stocked with initial information data-mined from YAGO and Project Gutenberg (with plans for data-mining the SNAC Project), RoSE provides profile pages about persons and documents, keywords and other data, and visualizations that help users see the relationships between people and documents. Uniquely, it also allows users (humanities students, scholars, and research groups) to add “thickly described” metadata on top of standard bibliographical data. This facilitates a social-network-like sense of active, dynamic interrelation with the objects of research. (cite)
This is a very exciting project because it promises to fuse archival and current researcher networks for tracking and studying relations between authors and documents. As such, it will allow users to explore and study the lives and social networks shared by and through both documents and authors. RoSE currently requires a login, so I’ll be anxiously awaiting its opening for general access and play.
Spatial Humanities
The University of Virginia Libraries has announced the launch of “Spatial Humanities,” a community-driven resource for place-based digital scholarship:
http://spatial.scholarslab.org/
The site was developed in response to needs identified by faculty, and it includes:
- an evolving, crowdsourced catalog of research resources, projects, and organizations
- a set of framing essays on the spatial turn across the disciplines by Dr. Jo Guldi of the Harvard Society of Fellows
- GIS-related feeds from Q&A sites and other forms of social media
- a peer-reviewed, occasional publication for step-by-step tutorials in spatial tools and methods
UVa is inviting everyone to participate:
- use Zotero to freely upload research citations, projects, and links to groups
- contribute your own tutorials and helpsheets in “Step By Step” format for peer review and formal publication
- adopt the #geoinst hashtag on Twitter and Delicious
- ask related questions and offer help on DH Answers or the GIS Stack Exchange
- post commentary on the essays
This looks like another great resource for all scholars.
Data Documentation Initiative 3 (DDI 3) Data Extraction Tools from Colectica Awarded an NIH Grant
The Data Documentation Initiative 3 (DDI 3) standard is a simply fabulous and comprehensive standard for metadata (data about data) as well as for the data contents, making it a full payload standard.
DDI 3 is such an exciting standard because it allows for the possibility of true and full computational support for data harmonization and for really working with longitudinal data. It’s the type of data standard I’d been waiting for because it gets it. Data standards need to be able to support documenting, containing, expressing, and computing (analysis, harmonization, limitations on disclosure, everything we now do with less-than-ideal systems and methods). DDI 3 does this, and that’s why groups like ICPSR are already using it. DDI 3 is already on its way to becoming ubiquitous, but more tools for it are needed.
News of others using and supporting DDI 3 is always good. Thus, it’s wonderful news that Colectica has been awarded an NIH Grant for DDI 3-based data extraction tools. From the Colectica website:
The award is a Phase I grant that provides supplemental support of Algenta’s research on an “Open Standards-Based Data Extraction Web Tool for Complex Longitudinal Datasets”. This Phase I feasibility study aims to analyze the data preparation and metadata creation workflow needed to prepare a study for online data extraction, to validate the use of the Data Documentation Initiative’s DDI 3 standard for the basis of such a tool, and to create prototype web-based data extraction software. While the focus is on longitudinal surveys, the proposed system would also handle cross-sectional, time-series, and non-repeated studies. The aim is to improve research methodologies through a simplification of the process used for discovering, retrieving, and analyzing data relevant to a researcher’s investigation and to improve data citations, aiding in reproducible research. The research includes consultation with researchers from ICPSR at the University of Michigan-Ann Arbor and the Mid-Life in the United States Longitudinal Study at the University of Wisconsin-Madison.
UFDC/SobekCM Tracking System
The UF Digital Collections System, SobekCM, is always being enhanced to better meet user and internal needs.
Normally the vast majority of time is spent on the user side because user support is the priority. With dozens of partners who use the online and locally installed tools to manage their digitization work and to contribute digitized items to the collaborative digital collections hosted on SobekCM, user support also includes many of the internal tools.
Most recently, however, the very-internal users received a major boost in support through the addition of a tracking system within SobekCM. Before, we had a legacy tracking system that was riddled with problems: among other things, it couldn’t generate reports or track the location of physical materials. Now that legacy system is gone, and it’s been replaced with tracking functionality within SobekCM. This tracking functionality includes milestone tracking, a work log for all work, reports, private/public flags, born-digital/analog flags, internal notes, ticklers, internal fields for physical box location (for item tracking during production), and more. It’s fabulous, and there’s more on it here: http://ufdc.ufl.edu/sobekcm/tracking
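To make the idea of milestone tracking a little more concrete, here is a minimal Python sketch of how an item's tracking record might be modeled, along with a simple report that counts items by the furthest milestone reached. The milestone names, field names, and identifiers are hypothetical illustrations, not SobekCM's actual schema (SobekCM itself is a .NET system).

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical production milestones, in the order they are normally reached.
MILESTONES = ("scanned", "quality_control", "online")


@dataclass
class TrackedItem:
    bibid: str                            # hypothetical item identifier
    born_digital: bool = False            # born digital vs. digitized from analog
    public: bool = False                  # private/public flag
    box_location: Optional[str] = None    # physical box location during production
    notes: List[str] = field(default_factory=list)              # internal notes
    milestones: Dict[str, date] = field(default_factory=dict)   # milestone -> date reached
    work_log: List[Tuple[date, str, str]] = field(default_factory=list)  # (date, worker, action)

    def complete(self, milestone: str, when: date, worker: str) -> None:
        """Record a milestone and log the work that reached it."""
        if milestone not in MILESTONES:
            raise ValueError(f"unknown milestone: {milestone}")
        self.milestones[milestone] = when
        self.work_log.append((when, worker, f"completed {milestone}"))

    def current_stage(self) -> Optional[str]:
        """Return the furthest milestone reached, or None if none yet."""
        reached = [m for m in MILESTONES if m in self.milestones]
        return reached[-1] if reached else None


def stage_report(items: List[TrackedItem]) -> Counter:
    """Count items by their furthest milestone (a simple production report)."""
    return Counter(item.current_stage() or "not started" for item in items)
```

Keeping the work log alongside the milestones means a single record can answer both "where is this item?" and "who did what, when?", which is roughly the combination the post describes.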
Ideas, feedback, and suggestions are always welcome.
Alliance for Networking Visual Culture & Video Book Published by MIT Press
The Alliance for Networking Visual Culture:
seeks to enrich the intellectual potential of our fields to inform understandings of an expanding array of visual practices as they are reshaped within digital culture, while also creating scholarly contexts for the use of digital media in film, media and visual studies. By working with humanities centers, scholarly societies, and key library, archive, and university press partners, we are investigating and developing sustainable platforms for publishing interactive and rich media scholarship.
The Alliance has strategic partnerships with four archives (the Shoah Foundation, Critical Commons, the Hemispheric Institute’s Digital Video Library, and the Internet Archive) and three university presses (MIT, California, and Duke). These partners are providing the initial testing ground for the investigation of new publishing templates. By working with the partners and disseminating its research, experimental methods, and tools, the Alliance is working to better connect and integrate curated digital archives and scholarly publication, making it easier for scholars to work with archival materials and enabling new forms of scholarship and new ways of doing scholarly work. “By creating an alliance between scholars, presses and archives, we will identify broad types of emerging scholarly communication and produce working demonstration projects with each partner press to illustrate these types.”
MIT Press has now published one of the Alliance projects, Learning from YouTube, which is available online. Read more about it on the Alliance blog.
OCR Text Correction is a Good Project for Crowdsourcing
Correcting text created by OCR (optical character recognition) is a great project for crowdsourcing because it can be isolated and scaled. Essentially, it can be broken into small tasks, and the overall effort can benefit from loads of small contributions made through a simple task interface. A great deal of digital library work can’t be sliced/scaled/isolated like this, and with so much work to do, it’s always nice when something can involve others for the benefit of everyone.
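As a rough illustration of why OCR correction slices so well, here is a minimal Python sketch that splits an OCR'd page into independent one-line tasks and accepts a correction once enough volunteers agree on it. The agreement rule (a simple vote with a hypothetical threshold of two matching submissions) and the identifiers are assumptions for illustration, not a description of any particular library's system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineTask:
    page_id: str   # hypothetical page identifier
    line_no: int   # 1-based line number on the page
    ocr_text: str  # raw OCR output shown to the volunteer


def make_tasks(page_id: str, ocr_page: str) -> List[LineTask]:
    """Split one OCR'd page into independent, one-line correction tasks."""
    return [
        LineTask(page_id, n, line)
        for n, line in enumerate(ocr_page.splitlines(), start=1)
        if line.strip()  # skip blank lines; there is nothing to correct
    ]


def accept_correction(submissions: List[str], min_votes: int = 2) -> Optional[str]:
    """Accept a line's correction once enough volunteers agree on the same text.

    Returns None while no single version has min_votes matching submissions.
    """
    if not submissions:
        return None
    text, votes = Counter(s.strip() for s in submissions).most_common(1)[0]
    return text if votes >= min_votes else None
```

Because each task carries its own page and line reference, any number of volunteers can work on different lines at once, and the results can be reassembled into the page afterward.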
The National Library of Finland recently came out with new games-as-tools for correcting OCR text, and their website explains:
We need your help. Most of the information in the library’s newspaper archives has already been copied into computer databases using computerized text recognition. The problem is that computers fail to recognize all the words. Especially when the quality of the source material is poor, the results need to be fixed by hand. This requires a lot of manual work.
At the moment, when you play games in Digitalkoot you help correct words. Later this year you will also be able to help structure the documents and tag images.
I’d be interested to hear if anyone has reports on the success of this method. It requires a higher level of investment than many approaches, given that it places the work within a game-as-tool interface, and I don’t know whether that would lead to greater or lesser success. The National Library of Australia has been phenomenally successful by allowing people to simply contribute the corrected text through an easy, no-frills interface.
I’m partial to the National Library of Australia’s method because it requires less initial resource investment, it has proven to keep returning on the design investment over the long term, and it appeals to such a large and wide demographic that I would expect it to be the most successful model. Of course, I’m most partial to whatever works, so I’d love to know if folks have reports on the success rates of Finland’s games or other methods for crowdsourcing OCR correction.
New UFDC Features: Browse By, Admin Header, and Export to Excel
Browse By Metadata (e.g., a list of all publishers in an item aggregation)
The UF Digital Collections (UFDC) have more new features. These are all in progress, as is the norm with the perpetual beta of growing and evolving systems, but the “Browse By” feature is already publicly viewable for the Baldwin Library of Historical Children’s Literature Digital Collection.
This is still in process as we test how best to display so much rich metadata with significant distinctions, as when an author is also an editor and printer: should it all be collapsed into one entry, and if so, should all types be listed at the end? Should they all remain separate? What about when there are multiple types and then another, unclear note on the affiliation? We’ll be working through these questions and more, and we’ll be doing so with abundant feedback from users.
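One of the options raised above, collapsing a person into a single entry and listing all of their roles at the end, can be sketched in a few lines of Python. The names and roles in the example are hypothetical sample data, and this is only one possible display rule, not the one UFDC has settled on.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def browse_by_name(records: Iterable[Tuple[str, str]]) -> List[str]:
    """Collapse (name, role) pairs into one browse entry per name.

    Each name appears once, alphabetically, with its distinct roles listed
    afterward in the order they were first seen.
    """
    roles_by_name: Dict[str, List[str]] = defaultdict(list)
    for name, role in records:
        if role not in roles_by_name[name]:  # drop duplicate roles for the same name
            roles_by_name[name].append(role)
    return [f"{name} ({'; '.join(roles)})" for name, roles in sorted(roles_by_name.items())]


# Hypothetical sample: one person who is both author and editor.
print(browse_by_name([("Doe, Jane", "author"), ("Roe, Richard", "printer"),
                      ("Doe, Jane", "editor"), ("Doe, Jane", "author")]))
```

The alternative of keeping every role as its own entry would simply skip the grouping step; the harder case of an unclear affiliation note is not handled here at all.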
Further Simplification for Simplified URLs
UFDC’s already simplified URLs are even simpler, with the base now http://ufdc.ufl.edu instead of http://ufdc.uflib.ufl.edu. The longer version is still fully supported.
Features for Internal Users
Internal Header
UFDC now has an internal header (internal meaning it’s only for internal folks who are logged in). It allows internal users to easily search by BibID. Right now, this can also be done using the main search box, but the internal header will eventually allow for specialized internal searches and for searching the records for items in process that are not yet online. This is part of the merging of offline workflows into the SobekCM system.
Export to Excel
This is also internal-only and it allows internal users to export a list of items directly from SobekCM/UFDC. This complements an update to the UFDC_CM (currently an offline-only tool) which can now pull MARC records for items online. Both of these changes are part of the work to add reporting to SobekCM and the work to integrate existing tools into one system (for greater efficiency for supporting and using the tools).
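The export itself is conceptually simple. Here is a minimal Python sketch that writes a list of item records to CSV, a format Excel opens directly. The column names are hypothetical, and the post does not say which file format or fields SobekCM's export actually uses.

```python
import csv
import io
from typing import Dict, Iterable, Sequence


def export_items(items: Iterable[Dict[str, object]],
                 columns: Sequence[str] = ("bibid", "title", "status")) -> str:
    """Write item records as CSV text that Excel can open.

    Keys not listed in columns are ignored; missing values become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()
```

The csv module quotes any value that contains a comma, so titles such as "Sample, with comma" still land in a single spreadsheet cell.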
Related
Like these, other seemingly internal-only enhancements also benefit external users by increasing SobekCM’s capabilities as a system and the Digital Library Center’s ability to work more effectively in digitizing materials and adding them to the UF Digital Collections.
SobekCM, weighing in at 113,643 lines of code (plus comments)
Mark Sullivan, the UFDC/DLC/dLOC programmer, recently shared this information. It’s exciting to see that SobekCM (our digital asset management system, digital library system, and digital production tool set) is such a streamlined solution with so much functionality. There are seven projects which make up the SobekCM solution. In those projects, there are:
113,643 lines of code (not comments or empty lines)
23,452 lines of comments
420 files
60 folders
544 classes (55 abstract classes, 1 Windows form, 5 ASPX pages)
14 interfaces
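To put the solution-wide figures above in proportion, a little arithmetic helps. This short Python snippet uses only the numbers reported in the list; the averages are rough, since code is not spread evenly across files or classes.

```python
# Solution-wide figures as reported above.
CODE_LINES = 113_643
COMMENT_LINES = 23_452
FILES = 420
CLASSES = 544

comment_ratio = COMMENT_LINES / CODE_LINES  # comment lines per line of code
lines_per_file = CODE_LINES / FILES         # average lines of code per file
lines_per_class = CODE_LINES / CLASSES      # average lines of code per class

print(f"Comments per line of code: {comment_ratio:.1%}")
print(f"Average lines of code per file: {lines_per_file:.1f}")
print(f"Average lines of code per class: {lines_per_class:.1f}")
```

That works out to roughly one comment line for every five lines of code (20.6%), about 271 lines of code per file, and about 209 per class.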
The main two projects are:
1) SobekCM_Bib_Package, which has all the code to represent digital objects, read metadata, write metadata, etc. This is used throughout all the DLC/UFDC/dLOC applications.
36,554 lines of code
5,796 lines of comments
121 files
19 folders
165 classes (3 abstract classes, 1 Windows form)
2) SobekCM_Library, which does all the rendering, navigation, authentication, etc., for the SobekCM library. This relies heavily upon the above library for reading and displaying digital resources and is used by both the builder and the customization manager.
68,803 lines of code
13,825 lines of comments
251 files
30 folders
328 classes (52 abstract classes)
11 interfaces
This does not include the 22 separate JavaScript files, eight of which were written by me and include 3,951 new lines of code and 702 lines of comments.
3) While the main SobekCM web project is not strictly a library, it is the third project in the SobekCM solution. It is the first project a user interacts with when entering the library. This project is actually very small, containing only about 1,300 .NET lines. It does house the five web forms used in the application, although these forms are quite small and are basically just skeletons into which the SobekCM_Library renders HTML or controls.
4) SobekCM_URL_Rewriter is a tiny library which is essentially just an HttpModule for rewriting and translating URLs to allow for cleaner URLs.
5) SobekCM_Tools is a small library (about 4,000 lines of code) which contains additional classes for logging, interacting with the tracking database, and interacting with the Florida Dark Archive (DAITSS). This is kept separate from the general library since it is not strictly involved in rendering the HTML, but it is used by some modules and is used with the SobekCM_Library and SobekCM_Builder libraries for building collection text indexes and loading new items through the Builder.
6) FileUploadLibrary (written by Darren Johnstone) is about 3,000 .NET code lines and 5,500 lines of JavaScript used for uploading data via HTTP with a real-time upload progress bar. It is a quite useful and cool library, which was adopted with very few changes and worked quite simply. Highly recommended (http://darrenjohnstone.net/).
7) SobekCM_Builder. In addition to the libraries/projects used by the digital library, this library is employed (along with SobekCM_Bib_Package, SobekCM_Library, and SobekCM_Tools) for the builder software, which runs constantly in the background on another server, loading new items that are deposited into network folders or FTP folders. It also updates and builds all static pages, OAI feeds, and RSS feeds, and builds the text indexes. Additionally, it reads and loads all of the FDA ingest reports from DAITSS.
6,793 lines of code
858 lines of comments
26 files
3 folders
37 classes
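The builder's basic loop described in item 7, watching drop folders and loading whatever new packages appear, can be sketched as a simple polling service. This Python version is an illustration under stated assumptions (each deposited package is one subfolder, and "new" simply means a folder name not seen before); the real SobekCM_Builder is .NET code that also handles FTP folders, static pages, OAI and RSS feeds, text indexes, and DAITSS reports, none of which is shown here.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def find_new_packages(inbound: Path, seen: Set[str]) -> List[Path]:
    """Return package folders in the inbound folder not yet processed, sorted by name."""
    new = sorted(
        (p for p in inbound.iterdir() if p.is_dir() and p.name not in seen),
        key=lambda p: p.name,
    )
    seen.update(p.name for p in new)  # remember them so they load only once
    return new


def watch(inbound: Path, load: Callable[[Path], None],
          interval: float = 60.0, cycles: Optional[int] = None) -> None:
    """Poll the inbound folder and hand each new package to load().

    Runs forever when cycles is None, like a background service; a finite
    cycles value is handy for testing.
    """
    seen: Set[str] = set()
    done = 0
    while cycles is None or done < cycles:
        for package in find_new_packages(inbound, seen):
            load(package)
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval)
```

A production version would also need to avoid picking up a package while it is still being copied into the folder, for example by waiting for a completion marker; that check is omitted here.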

