Archive for the ‘newspapers’ Category
California Weekly Newspapers to be Preserved Online
The University of California Riverside’s California Digital Newspaper Collection (CDNC) is expanding to include weekly papers in its searchable archive. The full news story from June 21, 2011 is “California Weekly Newspapers to be Preserved Online” and it’s online here. This is great news about the California Digital Newspaper Collection’s growth and success!
Also, there is a minor point in the news story that I wanted to clarify. The news story notes:
Libraries in Minnesota and Florida also are collecting PDFs of newspaper pages, but do not offer the ability to search text across titles, Geiger said. Software developed to process historical newspapers in the California Digital Newspaper Collection makes it possible to archive PDF pages in a way that permits text searches.
The Florida Digital Newspaper Library does allow users to search text across titles. The searching uses derivative files rather than the PDF versions, so the process is different. This is important for the Florida Digital Newspaper Library’s users, but the point the article is trying to make is also very important. The article is trying to explain that CDNC has implemented new technology that allows full-text searching directly from the PDFs, which may be the optimal method for many other digital newspaper libraries/collections/archives. Thus, CDNC is sharing great news both in terms of more content being preserved and accessible in an ever-improving interface and in terms of software that could be useful for others. I don’t know that the importance of both aspects comes through in the article (which may be my own misreading, or it may be that the article isn’t completely clear, a normal occurrence when technical news is fused with easier/fun news on new content).
Search the California Digital Newspaper Collection >>
“Google abandons master-plan to archive the world’s newspapers”
According to a blog story from the Boston Phoenix, “Google abandons master-plan to archive the world’s newspapers”:
Google told partners in its News Archive project that it would cease accepting, scanning, and indexing microfilm and other archival material from newspapers, and was instead focusing its energies on “newer projects that help the industry, such as Google One Pass, a platform that enables publishers to sell content and subscriptions directly from their own sites.”
The deal Google struck with partner newspapers stipulated that, somewhere down the line, a paper could purchase Google’s digital scans of its content for a fee. That fee is now being waived, and Google is not only giving publishers free access to the scanned files, but also the rights to publish them with other partners. In essence, Google just scanned a huge chunk of the newspaper industry’s valuable long-tail content, and then handed it to the publishers.
This frees newspapers to partner with new institutions to develop new features for their historic archives and to ensure the long-term preservation of materials. For instance, the Library of Congress and NEH’s project, Chronicling America, started before the Google News Archive and is an ongoing program to digitize historical newspapers and ensure long-term free access and preservation for all of its contents. The work already done by Google is a great public benefit, made all the more so by allowing newspapers to partner and repurpose their content without restriction for even more impact.
News on News from DigitalPreservation.gov
News on News from DigitalPreservation.gov:
CRL Report Describes Digital Newspaper Production
May 5, 2011 — Preserving News in the Digital Environment: Mapping the Newspaper Industry in Transition (PDF) was produced for the National Digital Information Infrastructure and Preservation Program by a team from the Center for Research Libraries. This report provides a vivid glimpse inside the workplaces that produce what – not long ago – we would have called newspapers. As digital news-gathering and production methods proliferate, and as digital avenues for distribution emerge, these workplaces are being transformed in profound ways, with electronic facsimiles and websites (and probably more) overtaking the paper format.
The report is an outgrowth of the Preserving Digital News meeting held at the Library in September 2009, and it features illustrative examples from four American newspapers: The Arizona Republic, Seattle Post-Intelligencer (since 2008, seattlepi.com), Wisconsin State Journal, and The Chicago Tribune. There is additional information pertaining to the work of The New York Times, Investor’s Business Daily, and the Associated Press. Altogether, the report makes it clear that the transition to the digital environment is not a neat, throw-the-switch change.
The CRL team of researchers, writers and illustrators included Jessica Alverson, Kalev Leetaru, Victoria McCargar, Kayla Ondracek, Bernard Reilly, James Simon and Eileen Wagner. Their narrative takes us through three major stages in the newspaper workflow: sourcing (gathering news information), editing and production and distribution. Each newspaper applies somewhat different practices in each stage, ranging from the formatting of the content, the types of metadata employed, and the methods applied to manage the content in the information technology systems that support the workflow.
Here are a few highlights:
- Most editorial systems are built around the traditional concept of a news article or story. Following longstanding newsroom practice, most text is maintained in the editorial system as part of a standard unit of content: it is a news item (for wire service reports) or a story, article or feature. (page 25)
- After the articles and other components of a newspaper print edition are assembled and tagged in the editorial system they are usually exported to a pagination system where the page layouts for the print edition and e-facsimile edition are created. It is in the pagination system that most of the content for these editions of a newspaper is brought together for the first time. (page 28)
- Once a locally produced news story is retired from the active or current pages of a newspaper’s website, it is often posted as the “archived” version or “version of record” in a separate part of the Web. Some newspapers outsource maintenance of these archived stories and features to archiving services like NewsBank, NewspaperArchives.com, and ProQuest. These services add value by formatting and indexing the stories and presenting them in searchable databases, which are normally hosted by the archiving service, but made to appear seamlessly connected to the newspaper site. (page 51)
- A computer-assisted analysis of the Chicago Tribune Web site yielded a granular picture of the rate or “velocity” of updates on news web sites . . . [examining] the number of page URLs against minutes of persistence for a two-day period. The analysis showed that in general business, entertainment, and sports news tended to be updated most frequently (sometimes several times within the half hour), while features, opinion, travel, and blog content changed less frequently. Hence the difference between print and electronic versions of newspaper content will vary considerably by type of content. (pages 55-56)
- The newer model of the news Web, however, is exemplified by seattlepi.com, the Hearst Seattle Media’s “flagship site,” [which] focuses heavily on information of local interest, such as crime, regional politics and local sports teams. But seattlepi.com is . . . fundamentally different from its now defunct predecessor, the Seattle Post-Intelligencer newspaper. It features not only original staff reporting and breaking news, but blogs by staff and readers, links to other journalism and news web sites, community databases and photo galleries. Through partnerships with other Seattle media (i.e., radio and television broadcasters), seattlepi.com also has access to video and audio produced by their local staff. (pages 52-53)
News: JTA Archives Online
The news item below is from the newslib list-serve. I’m posting it because it connects to the work being done at the Price Library of Judaica at the University of Florida to build a Newspaper Digital Collection from the Price Library of Judaica. One of the projects is to build the Price Library of Judaica Anniversary Collection, which represents the first stage of a project to digitize a unique and important collection of over 200 anniversary editions of Jewish newspapers held in the Isser and Rae Price Library of Judaica. These jubilee issues have never been catalogued by the Library and until now have remained ‘hidden’ from Library users.
News from the newslib list-serve:
The remarkable collection of JTA news reports from 1923 to the present is now available for free at archive.jta.org. Formerly the Jewish Telegraphic Agency, now JTA: The Global News Service of the Jewish People, the organization is a not-for-profit media company similar to the Associated Press. The tag line is “Writing the first draft of Jewish history”. The archive of original reporting from around the world documents the Jewish experience of the 20th century, much of it not written about in the mainstream media.
There are more than 7,000 contemporaneous articles reported from Europe between 1937-1945 that document the Holocaust on a daily basis, at least that many documenting the experience of Russian Jews throughout entire reign of Communism, coverage of life in then-Palestine before the new state was inaugurated in 1948, and much more.
Cool YouTube video: http://www.youtube.com/watch?v=yB5I5wiL41A&feature=youtu.be
News Preservation Summit
More than 160 U.S. newspapers have either quit business or stopped publishing a print edition during the past three years. How can we make sure that a community’s history and cultural record does not cease to exist? How can we make sure that digital news products currently being created by online news organizations are preserved and accessible for citizens and scholars in the twenty-second century?
On April 10-12, 2011, a diverse group of stakeholders will meet here at the Reynolds Journalism Institute (University of Missouri, Columbia) to have a conversation about preserving news content. We’re calling it the Newspaper Archive Summit: Rescuing orphaned and digital content.
Conference website: http://www.rjionline.org/events/stories/newpaper-archive/index.php.
Registration is free.
Preserving newspaper collections involves many disciplines. If you’re among any of the following, we hope you’ll join us in this important conversation:
- Stewardship organizations (libraries, museums, digital archives)
- Print and Online News content publishers and organizations
- Experts in news copyright
- Academic and community scholars who depend on news content for their research
- Genealogy community
- Commercial vendors and content aggregators
Panel discussions on Day 1 will include:
- How it is in the public interest to preserve and provide access to news content.
- Copyright and third party vendor issues
- The need for preservation and access of this content from the perspectives of scholars and genealogists
- The needs and concerns of news content creators and publishers
- Successful commercial and non-commercial digitization projects.
Day 2 will bring together diverse groups in developing a plan for creating partnerships and incentives to preserve and provide access to analog and digital news content.
Among the significant recommendations of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (http://brtf.sdsc.edu/) are the creation of public/private partnerships and the definition of incentives for commercial entities to hand off public interest content to stewardship organizations for preservation. This conference is an important first step toward those goals. We look forward to seeing you in Columbia in April.
Contact
Dorothy Carner
carnerd@missouri.edu
Adjunct Professor, Missouri School of Journalism
Head, Journalism Libraries
University of Missouri-Columbia
102 Reynolds Journalism Institute
Columbia, MO 65211
Phone: 573-882-6591
Fax: 573-884-4963
Web: http://mulibraries.missouri.edu/journalism
The Longtail of News
“The Longtail of News” by the Toronto Star’s public editor Kathy English is an excellent report on the effects of networked, persistent access to information with respect to the public good and the accuracy of current and archived information.
The Florida Digital Newspaper Library in the News
The Wakulla News from October 22, 2009 has a news story on its archives in the Florida Digital Newspaper Library.
Upton Sinclair’s The Jungle, Syndicated from 1906
While I’ve read Upton Sinclair’s The Jungle, I’d only ever seen it in book format, until now. It’s always nice to find exciting new materials from the UF Digital Collections, but it’s equally wonderful to find amazingly interesting versions of familiar materials, especially with cover page political cartoons like this.


Chronicling America Adds Topics
Chronicling America, the amazing historical newspaper digital collection from the Library of Congress and NEH, has added “Topics”. With over a million pages of historical newspapers online, “Topics” are an essential need–helping users who aren’t sure what they’re looking for find a way into so much content and helping to showcase some of the highlights of so much great content for all users.
Some of the topics that include Florida content (as the Interim Director for the University of Florida Digital Library Center, which supports the Florida Digital Newspaper Library, the ones with Florida content are those of greatest interest to me):
Newspaper Archives
The American Historical Association has a recent blog post about the problems caused by the lack of access to certain newspapers during the transition from “Paper of Record” to Google’s news archives. The blog post notes:
Regrettably, this proves yet again Roy Rosenzweig’s warning to the profession six years ago about the “the fragility of evidence in the digital era.” While it may be beyond our capacity to adjust copyright laws and the behavior of large corporations (however well meaning), as a profession we can and perhaps should develop new habits for working with digital materials—by copying down information when we see it online, and not becoming overly dependent on any one data source or having illusions about its permanence.
Seeing the problems from the Paper of Record transitioning to Google as a call to “develop new habits for working with digital materials—by copying down information when we see it online, and not becoming overly dependent on any one data source or having illusions about its permanence,” is essentially a call to develop personal copies of existing archives and it’s a poor solution to the larger problem.*
In this particular instance, there are several concerns related to technology, trust, and the public good. For technology, the transition is a normal instance of downtime (which is still normal for any technology-related transition, and its normalcy is why so many of the tech folks were amazed at the speed and elegance of the most recent Whitehouse.gov transition that overcame the normal problems). However, the temporary technical issues parallel two very real risks: the loss of digital records if they are not supported, and the loss of access if supporting them is not treated as a need for the public good. One of the respondents to the blog post notes that perhaps newspapers should be moved into the public domain. Copyright is a concern because it is often an obstacle to access, but even papers in the public domain still need financial support to ensure access to them, whether in digital or physical form.
Even after covering the initial costs for requesting permissions, digitization, and hosting, new costs emerge. For instance, the University of Florida Digital Collections (UFDC) has grown by leaps and bounds in the past two years and now has over 664,269 pages of Florida newspapers alone. These newspapers include historic newspapers and current newspapers. The Digital Library Center has successfully requested and received permissions to digitize over 60 current newspapers, newspapers that in many cases were microfilmed and that are now being digitized for online access and long-term preservation (and we’re also slowly digitizing earlier years from the microfilm and will continue to do so until all of the microfilm holdings are digital).
All of the collections in UFDC, including the Florida Digital Newspaper Library, continue to grow, and that growth encourages a growth in usage that, in turn, requires that UFDC have more resources to support the higher usage rates. In March 2009, UFDC had 618,148 unique hits. That many hits, along with the knowledge that the hits are only going to increase, means that the UF Libraries have to implement additional programming to ensure the server memory usage can handle the increased load without problems for users. Other digital collections will have similar needs as they grow, and that will require support from users and the public.
Rather than attempting to copy existing resources (which would reduce the resource to a single item photocopy instead of a point within the full context and content of the database), the emphasis should be on building and supporting trusted digital archives to ensure access. The Florida Digital Newspaper Library presents one of many models, housing historic and current newspapers for open online access for all in perpetuity (and it was lucky enough to build the digital model from that same model for microfilm, allowing it to utilize the existing support infrastructure that was already available). Many archives already offer the same promises for access in perpetuity, albeit for physical access to items not yet digital, and those archives will need support to ensure they place the same importance on access and preservation for their digital collections.
Digital collections and archives need support for new and existing digital collections to build and sustain the infrastructure needed to ensure open access in perpetuity. As La Asociación Mexicana de Historia Económica (AMHE) explains in its protest against the lack of access to Mexican newspapers, the newspapers on Paper of Record are essential reference materials for research. The removal of access–even if only a delay for technical reasons–does harm. The public needs to have trust in their archival institutions, and ensuring access to physical and digital archives is a necessity to build and maintain that trust.
*{Copying single items or even attempting to copy masses of materials without infrastructure is still like photocopying. The materials would not be structured (or minimally so) and would not benefit from organization and identification. If a physical archive was in danger and photocopying was the only option, then photocopying the resource makes sense. This is not to say that photocopying is a bad solution in all cases–researchers regularly photocopy materials from archives and those photocopies are then copied and shared and, in some cases, those are the only available copies for access. Photocopying is a poor solution to the overall problem, but for researchers who need access to the materials right now and who cannot wait for a new trusted archive to be built over years of advocacy and funding, photocopying-style solutions are wise temporary options. Internet Archive’s Wayback Machine maintains copies of many web sites and pages for just this reason.}
