Friday, December 14, 2007

Digital Preservation Matters - 14 December 2007

CNI in DC: Integrated Digital Library on the Fedora Platform. David Kennedy. December 12, 2007.
This is one item in a blog report on the CNI conference and the Digital Curation Conference: National Perspectives conference; the other items are worth reading too. The University of Maryland uses Fedora not for its IR (for which it uses DSpace) but for its digital collections. They wanted to use it to build in sustainability and transitions. Some of their organizational issues were institutional support, development time, the choice between an off-the-shelf and a Fedora-type system, and others. Development took almost 18 months. They found working with Fedora similar to Java, and "programmer friendly." They use a hybrid metadata schema with METS wrappers. What have they learned?
  • metadata - use a complex schema, but don't force users to understand the underlying schema
  • authentication - not dealt with yet; more work is needed
  • archival storage - greater need for more space
  • need to have Quality Control standards when modifying objects and creating metadata

They have at least three or four developers working on the project, as well as a number of other team members. Since they use their own metadata schema, it may not be possible to offer their work to others; if they were to do it again, they might use a standard metadata schema.


New 1 day AIIM PDF/Archive Training Program. Atle Skjekkeland. AIIM Knowledge Center Blog. December 12, 2007.
AIIM intends to introduce a new PDF/A training program next year. It will focus on PDF/A and its use as a file format for archiving data. The concept of PDF/Archive began in an AIIM standards committee in 2002 and has since been accepted as an ISO standard.


Digital Preservation Pioneers: Margaret Hedstrom. Resource Shelf. December 13, 2007.
A brief bio about Margaret Hedstrom who has done a great deal for digital preservation. Her works include several articles that are definitely worth reading: Digital Preservation: A Time Bomb for Digital Libraries, It’s About Time, Invest to Save, and Incentives for Data Producers to Create Archive-Ready Data Sets.


Pooling Scholars’ Digital Resources. Andy Guess. Inside Higher Ed. December 12, 2007.
Access to documents and copyright issues have been two factors slowing the development of online scholarly repositories. George Mason University seeks to bypass libraries entirely and go directly to scholars by creating an open archive of scholarly resources in the public domain. They are creating a way for scholars to upload existing documents, make them text-searchable, and put them in a database available to the public. It will use the Zotero plug-in for Firefox, which stores web pages, collects citations and lets scholars annotate and organize online documents. It is funded by a two-year Mellon grant.


Manakin: A New Face for DSpace. Scott Phillips et al. D-Lib Magazine. November/December 2007.
The growth of online scholarly communication makes digital repositories more important for preserving and managing information. This article looks at Manakin, which was designed to help create individual, customized repository interfaces separate from the underlying repository, which is currently DSpace. It helps a library ‘brand’ its content, gives a better understanding of the metadata, and provides tools to create extensions of the repository. It uses schema, aspects and themes as the basic components. There is a movement to adopt Manakin as the default DSpace user interface.


SOA. IT Strategy Guide. Dave Linthicum. InfoWorld. December 10, 2007. [pdf]
The essence of an organization must be identified so all activities influencing that can be identified and improved. This is the first step in realizing the benefits of a service-oriented architecture (SOA). This requires not only technology, but also a shift in the way business and IT work together. Organizations need to adopt clearly defined roles within an organization, allowing the stakeholders to understand each other’s goals and tasks. This includes understanding both the human aspects and the lifecycle management of the services. Management support for the strategy is crucial. This requires an investment in people and technology to establish the appropriate context for the strategy. “the hardest part isn’t the technology; it’s redrawing the business processes that provide the basis for the architecture — and the often contentious reshuffling of roles and responsibilities that ensues. It is important to define the value, get investment and commitment from the top, and concentrate on the long term.”


Census of Institutional Repositories in the U.S. Soo Young Rieh, et al. D-Lib Magazine. November/December 2007.
There are great uncertainties underlying institutional repositories regarding practices, policies, content, systems, and other infrastructure issues. This article looks at IRs in five areas: leaders, funding, content, contributors, and systems, and how they are perceived. Some notes:

  • college and university libraries are the driving force behind most IRs
  • the vast majority of survey respondents have done no planning of IRs to date
  • only 10.8% of respondents have actually implemented an IR
    • 52.1% have been operational for less than one year
    • 27.1% have been operational for between one and two years
  • respondents agree that the funding comes or will come from the library, typically by absorbing costs into routine library operating expenses
  • the majority of existing IRs contain fewer than 1000 items
  • DSpace is the most prevalent system for pilot-testing and use. Fedora and CONTENTdm are regularly pilot-tested but rarely implemented.

“Once each academic institution has a clear vision and definition of what the IR will be for its own community, subsequent decisions such as content recruitment, software redesigning, file formats guaranteed in perpetuity, metadata, and policies can flow from that vision.”

Friday, December 7, 2007

Digital Preservation Matters - 07 December 2007

Ten years after. Priscilla Caplan. Library Hi Tech. Editorial. Vol. 25, No. 4, 2007.

This editorial from Priscilla Caplan reflects on the progress made in digital preservation in the past 10 years. Digital preservation is no longer a little-known concept, but a problem to be solved. It is part of the mainstream. Much has been accomplished, though there is still a lot of progress to be made. Europe has a different approach; it sees this as “part of a set of curation activities.” Their approach would “help reduce our apparent confusion between institutional repositories and preservation repositories.” Few institutions will have the resources to run a true preservation repository. “Digital curation may be departmental, and archiving institutional, but I believe preservation will have to be consortial.” The US approach has been to focus on short-term projects rather than long-term infrastructure. There are still some basic infrastructure needs: schema, conversion utilities, and registries. We also need to develop centers to promote and assist digital preservation. We need to provide more education for both data creators and data curators.


Standards Group Accepts PDF. Sumner Lemon. IDG News Service. December 05, 2007.

Adobe PDF 1.7 has been approved as an ISO standard. The ballot for approval of PDF 1.7 to become the ISO 32000 Standard was passed by a vote of 13-1. Specialized subsets of PDF (PDF/Archive etc.) had already been proposed or approved as standards by ISO. With this approval, PDF 1.7 becomes an "umbrella" standard to unify these different subsets. Adobe gives up some control over the development of future versions.


Project SPECTRa: JISC Final Report. March 2007.

The principal aim of the SPECTRa project (Submission, Preservation and Exposure of Chemistry Teaching and Research Data) was to provide the high-volume ingest and reuse of experimental data through institutional repositories. It used the DSpace platform because of existing infrastructure and previous experience. They developed Open Source software tools and customizations which could easily be incorporated within chemists' workflows. Metadata was based on Dublin Core. They felt that serious preservation work must be at the institutional, rather than departmental, level. The metadata, identifiers, and normalizing data in open formats would make long-term preservation more possible. Preservation of chemistry data file formats is a difficult area. Their approach was to capture essential metadata at submission or extract it automatically from the data files if possible. All files should be validated against specifications. Depositing files in an institutional repository should guarantee against the loss or corruption of the raw data, but this is insufficient to ensure future usability. A policy of format migration will be necessary for much of the data.

Other project findings included:

• it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organizational capability of digital repositories;

• scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials;

• the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;

• institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;

• IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.
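
The point that deposit should guard against loss or corruption of the raw data can be illustrated with a checksum-based fixity check. The following is only a minimal sketch in Python, not the SPECTRa tooling: the report summary above does not say how integrity is checked, and the function names (sha256_of, record_fixity, find_changed) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Build a manifest (hypothetical format) mapping file name to checksum at deposit time."""
    return {str(p): sha256_of(p) for p in paths}


def find_changed(manifest):
    """Return the names of files that are missing or whose checksum no longer matches."""
    changed = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.exists() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

A check like this only detects loss or corruption of the stored bytes. As the report notes, it does nothing for future usability, which still depends on format validation and migration.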


Google Plans Service to Store Users' Data. Kevin J. Delaney. Wall Street Journal. November 27, 2007.

Google is developing a service to let users store contents of their computers, such as word-processing documents, digital music, video clips and images. It would let users access their files via the Internet from different computers and share them online with friends. The service would face questions on issues such as data privacy, copyright, cost, and technical challenges of offering service without interruption.


Iron Mountain Acquires Xepa Digital, LLP. Press Release. November 19, 2007.

Iron Mountain acquired Xepa, a company that converts analog and out-of-date digital audio and video to high-resolution digital file formats. They will offer on-site digital conversion for the items being stored.


Saturday, December 1, 2007

IT Disasters

The top 10 IT disasters of all time. Colin Barker. ZDNet.co.uk. 22 Nov 2007.

A list of some of the worst IT-related disasters and failures caused by faulty hardware and software or human error.

  1. Faulty Soviet early warning system nearly causes WWIII (1983)
  2. The AT&T network collapse (1990)
  3. The explosion of the Ariane 5 (1996)
  4. Airbus A380 suffers from incompatible software issues (2006)
  5. Mars Climate Orbiter metric problem (1998)
  6. EDS and the Child Support Agency (2004)
  7. The two-digit year-2000 problem (1999/2000)
  8. When the laptops exploded (2006)
  9. Siemens and the passport system (1999)
  10. LA Airport flights grounded (2007)

Friday, November 30, 2007

Digital Preservation Matters - 30 November 2007

Council Conclusions on scientific information in the digital age: access, dissemination and preservation. The Council Of The European Union. November 2007.

The Council of the European Union presents some conclusions on digital preservation and recommendations for the next few years:

  • access to and dissemination of scientific information is crucial and can help accelerate innovation;
  • effective digital preservation of scientific information is fundamental for current and future development of research
  • it is important to ensure the long term preservation of scientific information, publications and data, and include scientific information in preservation strategies;
  • monitor good practices for open access to scientific information and develop new models
  • experiment with open access to scientific data and publications to understand contractual needs
  • encourage research and experiments into digital preservation on deploying scientific data as widely as possible for open access to and preservation of scientific information.


Shifting Gears: Gearing Up to Get Into the Flow. Ricky Erway. OCLC. October 2007.

Efforts to digitize special collections mean we need to take another look at what we are doing. Do we digitize for access or preservation, or both? How do our selection criteria affect the digitizing efforts? Access is important. We should preserve the unique items to the best of our ability, but it doesn’t mean we only have one chance to do it right. We may want to re-digitize when the technology improves. Scan items as part of the initial accessioning process; create a single unified process. Metadata can be improved as needed; it can be an iterative approach. Move to a program approach, not just special projects. It should be part of the regular budget. To do a better job we need to “integrate digitization into all workflows and user services”.


Digital library surpasses initial goal of 1 million books. International Herald Tribune. November 27, 2007.

The Universal Library project has surpassed its latest target, having scanned more than 1.5 million books. At least half the books are out of copyright or scanned with the permission of copyright holders. The library's mission is to make information freely available and to preserve rare and decaying texts. It is the largest university-based digital library of free books and its purpose is noncommercial. The library has books published in 20 languages, including 970,000 in Chinese, 360,000 in English, 50,000 in the southern Indian language of Telugu and 40,000 in Arabic.


Presentations from iPRES - 2007 International Conference on Preservation of Digital Objects. National Science Library. November 2007.

This site contains many PDF files of the presentations given at the October iPRES conference in China. These are interesting to review. Some that I found particularly useful include:

  • Exploring and Charting the Digital Preservation Research Landscape, Seamus Ross
  • Chinese Digital Archival Network of Foreign STM Material, Xiaolin Zhang
  • A Practical Approach to Digital Preservation: Update from PLANETS, Helen Hockx-Yu
  • Challenges of Digital Preservation: Early Lessons from the Portico Archive, Eileen Fenton
  • Developing a CAS E-Journal Archiving System, Zhixiong Zhang
  • Comparative Evaluation of Major IR Systems for Preservation, Ting Zeng
  • New Partnerships for Scientific Data Preservation and Publication Systems, Zhongming Zhu


Towards the Australian Data Commons: A proposal for an Australian National Data Service. The ANDS Technical Working Group. October 2007.

This paper, among other topics, discusses the reasons to focus on data management, the issues, and the programs to deliver the data. While the paper looks specifically at a national data service, there are aspects that are useful for local digital preservation. Here are some interesting notes from it.

  • Important activities include identifying and deploying policies and technologies to allow users to gain seamless access to data collected within multiple institutionally operated repositories.
  • The intent is to provide common services to support research to make it easier to discover, access, use, analyze, and combine digital resources as part of their activities. They should also support and advise researchers and research data managers about appropriate digital preservation strategies.
  • We are in a data deluge. It can only continue and grow in intensity as the number, frequency and resolution of data sources rise and as information becomes universally ‘born digital’.
  • Data is an increasingly important and expensive ingredient of research activities and needs increasing attention to be managed efficiently and effectively.
  • The sponsors of data capture and care should help determine the accessibility of the data
  • Not everyone can use the same solution, so there may need to be multiple responses.
  • There should be a registry of repositories with services offered
  • Provide assistance to others on adopting the plans and getting the service they need.
  • Collecting and managing the metadata is critical. Best to collect early and automatically.

The data service believes it can contribute most effectively by developing services and activities that enable stewardship within multiple federations of data management and data user communities.

In ten years' time, it will be successful if:

  • A data commons exists in a network of research repositories and the data is discoverable;
  • Researchers and data managers perform well with well-formed data management policies;
  • More research data is routinely deposited into stable, accessible and sustainable environments;
  • More people have relevant expertise in data management


Stewardship of digital resources involves both preservation and curation. Preservation entails standards-based, active management practices that guide data throughout the research life cycle, as well as ensure the long-term usability of these digital resources. Curation involves ways of organizing, displaying, and repurposing preserved data.



In ten years' time, the service will be successful if:

  • A data commons exists in a network of research repositories and the data is discoverable;
  • Researchers and data managers perform well with well formed data management policies;
  • More research data is routinely deposited into stable, accessible and sustainable environments;
  • More people have relevant expertise in data management


Stewardship of digital resources involves both preservation and curation. Preservation entails standards-based, active management practices that guide data throughout the research life cycle, as well as ensure the long-term usability of these digital resources. Curation involves ways of organizing, displaying, and repurposing preserved data.


Friday, November 16, 2007

Digital Preservation Matters - 16 November 2007

Electronic Records Management and Digital Preservation: Protecting the Knowledge Assets of the State Government Enterprise. Eric Sweden. NASCIO. October 2007. [pdf]

Electronic records management and digital preservation must be a shared responsibility, with understanding and support from the CIO. Everyone needs to be part of managing digital assets. These initiatives must be managed at the organizational level. The team needs enterprise architects, project managers, electronic records managers, librarians and archivists to ensure the knowledge assets are managed properly. Technology creates both opportunities and challenges. The goal of Digital Preservation systems is to make sure the information they contain remains accessible to users over a long period of time. A challenge is to keep bit streams intact and usable long term. You need to know what to preserve and how to preserve the records. The strategy must address preservation for the life of the record. There is not a single best way to preserve digital materials. Digital materials do not allow preservation procrastination. If a record needs to be maintained for over 10 years, the original technology will probably be obsolete. Digital Preservation must be a routine operation, not a special event.


RSA 2007: long-term data storage presents legal risks. Ian Grant. Computer Weekly. 23 Oct 2007.

Art Coviello, executive vice-president of EMC, stated at a conference that storing every piece of data long term may place organizations at risk of legal liability. The organization needs to know what data they have, who is looking at it and what they are doing with it. They should classify data and users before they store data. This is needed to protect the data and to reduce information clutter.


Keep 'Smoking Gun' E-Mails From Backfiring. H. Christopher Boehning, Daniel J. Toal. New York Law Journal. October 25, 2007

While this is written from a legal and not archival perspective, the article discusses the importance of validating / authenticating electronic documents. It lists the legal rules for authenticating emails and other electronic documents, including:

  • testimony by a witness with knowledge of the object;
  • circumstantial means ("appearance, contents, substance, internal patterns or other distinctive characteristics, taken in conjunction with circumstances"), such as the email address;
  • hash values that serve as a digital fingerprint; comparison to existing documents;
  • self authentication of items with labels, tags, or ownership marks.
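
The "digital fingerprint" idea in the list above can be illustrated with a short sketch. This is only an illustration and not something taken from the article: the choice of SHA-256, the function names, and the sample message are all assumptions made for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def file_fingerprint(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical sample data: two messages that differ by a single character.
original = b"Subject: Q3 figures\n\nPlease see the attached report."
altered = b"Subject: Q3 figures\n\nPlease see the attached report!"

# Identical content always gives an identical digest.
print(fingerprint(original) == fingerprint(original))
# Changing even one byte gives a different digest.
print(fingerprint(original) == fingerprint(altered))
```

A digest recorded when a document is collected can be compared with one computed later; a match suggests the content has not changed in between.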


The Aftermath: Examining the E-Discovery Landscape After the 2006 Rule Changes. Eric Sinrod. FindLaw. October 16, 2007.

Another article emphasizing the importance of records management plans for electronic data. It mentions that “Data can be located live on networks, servers, hard drives, laptops, PDAs and on backup tapes.” Purging according to retention policies is important. Data may be required in ‘native’ format with all metadata intact.


‘Digital curators’ lead cultural IT projects. Shane Schick. ComputerWorld Canada. 8 Nov 2007.

As cultural organizations try to reach new audiences online and integrate their collections into multimedia-friendly exhibits, they are starting to face the same challenges as others who have been moving away from paper-based processes. These challenges include not only figuring out how to digitize content but also deciding what gets preserved first, what can wait and what doesn’t need to be digitized at all. Institutions face the difficulty of trying to preserve something indefinitely, without knowing how formats might change over time. They must collect the right hardware and software along with the content itself. “Archives are now building in budgets for migration strategies for data.”


Friendly Advice Machine. John Cleese. Iron Mountain. October 2007.

On the lighter side: For those with an interest in digital archiving and secure storage, and a ‘British’ sense of humor, these clips may be of interest.



Friday, November 9, 2007

Weekly Readings - 9 November 2007

HD Photo to become JPEG XR. Stephen Shankland. CNet News. November 2, 2007.

The Joint Photographic Experts Group has approved Microsoft's HD Photo format as a standard called JPEG XR. This is an important step toward making the format vendor-neutral. It is designed for the next generation of digital cameras and was based on Microsoft’s Windows Media Format. Microsoft is committed to making the patents available without charge. The standardization process typically takes about a year. (See also http://www.jpeg.org/newsrel19.html).


PRONOM and DROID - new versions released. Neil Beagrie. National Archives UK. November 2, 2007.

The National Archives in the UK has released new versions of PRONOM and DROID. PRONOM is an online registry of file formats, software, and other technical information used for digital preservation purposes, available at http://www.nationalarchives.gov.uk/pronom. DROID (Digital Record Object Identification) is open source software at http://droid.sourceforge.net/ that is used to identify file formats in batch mode. They are freely available.
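
As a rough illustration of what batch format identification involves, here is a minimal sketch that matches the leading bytes of a file against a small table of well-known signatures. It is not DROID's implementation and does not use PRONOM data; the table, the function names, and the sample inputs are assumptions made for the example.

```python
from typing import Dict, Optional

# A tiny, hand-picked table of leading byte signatures ("magic numbers").
# Illustrative only: a real registry covers far more formats and richer patterns.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify(header: bytes) -> Optional[str]:
    """Return a format name if the bytes start with a known signature, else None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_batch(samples: Dict[str, bytes]) -> Dict[str, Optional[str]]:
    """Identify many byte strings at once, keyed by a caller-supplied label."""
    return {label: identify(data) for label, data in samples.items()}


# Hypothetical sample headers standing in for the first bytes of real files.
samples = {
    "report": b"%PDF-1.4\n...",
    "scan": b"II*\x00\x08\x00\x00\x00",
    "notes": b"plain text with no signature",
}
for label, fmt in identify_batch(samples).items():
    print(label, "->", fmt or "unidentified")
```

Identifying by content rather than by file extension matters for preservation because extensions can be missing or wrong.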


LOCKSS: Does a Library Good? An Investigation into the Implementation of LOCKSS. Caitlin Hoffman. Blog. November 8, 2007.

An overview of LOCKSS, how it works, and issues related to it. (LOCKSS, developed at Stanford University, stands for Lots of Copies Keeps Stuff Safe.) One of the main issues surrounding it is trust. “Trusting a single provider, a single institution, and a single archive represents the real risk”. LOCKSS is built on the principle of building confidence in the archive. LOCKSS was built to archive electronic journals but has been enhanced to also archive blogs on Google’s Blogger.
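
The "lots of copies" idea can be sketched as a simple comparison among copies held at different sites. This is a deliberately simplified majority check, not the actual LOCKSS protocol, which is considerably more elaborate; the site names, the use of SHA-256, and the function names are assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(copies: Dict[str, bytes]) -> List[str]:
    """Return the names of copies that disagree with the strict-majority digest."""
    if not copies:
        raise ValueError("no copies supplied")
    digests = {name: digest(data) for name, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority there is no basis for calling any copy "good".
    if votes <= len(copies) / 2:
        raise ValueError("no majority agreement among copies")
    return sorted(name for name, value in digests.items() if value != winner)


# Hypothetical copies of one article held at three sites; one has been corrupted.
copies = {
    "site_a": b"Volume 12, Issue 3: full text of the article",
    "site_b": b"Volume 12, Issue 3: full text of the article",
    "site_c": b"Volume 12, Issue 3: full text of the artic\x00e",
}
print(find_damaged(copies))
```

No single copy is treated as authoritative here; a damaged copy is detected because it disagrees with the others, and it could then be repaired from a copy that matches the majority.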


Looking Ahead. Lee J. Nelson. Advanced Imaging Magazine. November 9, 2007.

The article looks at some of the industry trends. Included is an announcement on an HD Photo Plug-in for Adobe Photoshop. “HD Photo is geared for end-to-end digital photography, offering better image quality, greater preservation of data and advanced features. Its still image codec for continuous-tone images is underpinned by lossy and lossless compression, multiple colorspaces, wide dynamic range and extensive metadata.”


Government Pledges £25m To Preserve UK's Film Archives. 24 Hour Museum. October 17, 2007.

The British government has taken steps to preserve the country’s film archives. They have given money to the UK Film Council to secure the films in the archives. “It’s absolutely right that they should be safe and accessible for future generations.” The £25million plus £3million are to be used to preserve, restore and increase access to the collections, some of which are deteriorating and in danger of being lost.


Library of Congress Collaborates with Xerox To Test Format for Digitally Preserving, Accessing Treasured Images. News Release. Library of Congress. October 25, 2007.

The Library and Xerox are studying the potential of using the JPEG 2000 format in large repositories of digital materials. The project is designed to help develop guidelines and best practices for digital content. The trial will include up to 1 million TIFF images to be converted to JPEG 2000. Xerox will build and test the system, with a specific aim of creating profiles for the objects. Xerox has already created a profile for using the JPEG 2000 format for newspapers.