For the updated blog, please visit
preservationmatters.blogspot.com
Tuesday, March 10, 2015
Investing in Curation. A Shared Path to Sustainability. Final RoadMap.
Investing in Curation. A Shared Path to Sustainability. Paul Stokes. The 4C project. March 9, 2015.
Digital curation involves managing, preserving and adding value to digital assets over their entire life cycle. Actively managing digital assets maximizes their value and reduces the risk of obsolescence. The costs of curation are a concern to stakeholders. The final version of the road map is now available; it starts with a focus on the costs of digital curation, but the ultimate goal is to change the way that all organizations manage their digital assets.
The vision: Cost modeling will be a part of the planning and management activities of all digital repositories.
- Identify the value of digital assets and make choices
- Value is an indirect economic determinant of the cost of curating an asset. The perception of value will affect the methods chosen and how much investment is required.
- Content owners should have clear policies regarding the scope of their collections, the types of assets sought, and the preferred file formats.
- Establish value criteria for assets as a component of curation, understanding that certain types of assets can be re-generated or re-captured relatively easily, thereby avoiding curation costs
- Demand and choose more efficient systems
- Requirements for curation services should be specified according to accepted standards and best practices.
- When more knowledgeable customers demand better-specified, standard functionality, products can mature more quickly.
- Develop scalable services and infrastructure
- Organizations should aim to work smarter and be able to demonstrate the impact of their investments.
- Design digital curation as a sustainable service
- Effective digital curation requires active management throughout the whole lifecycle of a digital object.
- Curation should be undertaken with a stated purpose.
- Making curation a service further embeds the activity into the organization's normal business function.
- Make funding dependent on costing digital assets across the whole lifecycle
- Digital curation activity requires a flow of sufficient resources for the activity to proceed.
- Some digital assets may need to be preserved in perpetuity but others will have a much more predictable and shorter life-span.
- All stakeholders involved at any point in the curation lifecycle will need to understand their fiscal responsibilities for managing and curating the asset until such time that the asset is transferred to another steward in the lifecycle chain.
- Be collaborative and transparent to drive down costs
- Each organization is looking to realize a return on its investment.
- If those who provide digital curation services can be descriptive about their products and transparent about their pricing structures, this will enhance possible comparisons, drive competitiveness and lead the market to maturity.
Labels:
costs,
curation,
digital preservation,
standards,
value of libraries
Ending the Invisible Library | Linked Data
Ending the Invisible Library | Linked Data. Matt Enis. Library Journal. February 24, 2015.
The World Wide Web began as a collection of web pages that were navigated with links. Now, and going forward, the web is increasingly about data and relationships among data objects. The use of MARC is "becoming an anachronism in an increasingly networked world". The site schema.org is a collection of structured data schemas that help web designers specify entities and relationships among entities, but these tools were not designed with libraries in mind. MARC lacks the ability to encode this information or make it accessible on the web. Libraries need to start formatting their data so it can be accessed from internet search tools.
The W3C Schema Bib Extend Community Group (librarians, vendors, and organizations) has been working to expand schema.org to better represent library bibliographic information for search engines. The Library of Congress has been working on the BIBFRAME project; “a major focus of the project is to translate the MARC 21 format to a Linked Data model while retaining as much as possible the robust and beneficial aspects of the historical format.” This will structure library records so that search engines can “extract meaningful information” and make it available. Ultimately, LC plans for BIBFRAME to replace MARC; there is a tool to convert MARC records to BIBFRAME.
The Libhub Initiative is a proof-of-concept project to build a network of libraries using BIBFRAME standards to link data between institutions and show how this can make library resources more visible on the internet.
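To make the structured-data idea behind schema.org concrete (this illustrates schema.org markup generally, not the BIBFRAME model itself), here is a minimal Python sketch that serializes one hypothetical bibliographic record as JSON-LD; the property names are real schema.org terms, while the title, author and identifier values are invented for the example.

```python
import json

# One hypothetical catalogue record expressed with schema.org types and
# properties, so that web crawlers can extract entities and relationships.
record = {
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "Example Title",                                  # invented values
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "1998",
    "inLanguage": "en",
    "isbn": "0000000000",
    "sameAs": "http://www.worldcat.org/oclc/00000000",        # placeholder identifier link
}

# Embedded in a <script type="application/ld+json"> element, this block makes
# the record visible to search engines alongside the human-readable page.
print(json.dumps(record, indent=2))
```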
Labels:
preservation tools,
semantic web,
standards
Friday, March 6, 2015
Infokit: Digital file formats
Infokit: Digital file formats. Matt Faber. JISC. March 6, 2015.
JISC has released a new infokit resource on digital file formats. The infokit presents an overview of the current state of digital file formats for still images, audio and moving images, and looks ahead to emerging formats and shifts away from previously popular ones.
Choosing the right file format is important to successfully creating, digitizing, delivering, and preserving the digital media objects:
- The format helps define the quality of a digital object.
- Using poorly supported formats that may restrict or block use will hinder file distribution
- Selecting a proprietary format with a short shelf life, or a compressed format that irreversibly loses data will hamper digital preservation
- Selecting the right format for a project should not be taken lightly
Maintenance of digital media files is an ongoing process. This kit is to:
- Provide a comprehensive understanding of what a file format is
- Explain the considerations in choosing the correct format for your project
- Provide quick and practical answers to ‘what file format should I use for…?’
- Help identify uncommon digital file formats.
- Provide in-depth technical information about the digital files and file format properties.
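As a small, hedged illustration of putting a format policy into practice (the preferred-extension list and folder name below are assumptions, not JISC recommendations), this Python sketch walks a folder and flags files whose formats fall outside the preferred set:

```python
import mimetypes
from pathlib import Path

# Hypothetical local policy: extensions considered preservation-friendly.
PREFERRED = {".tif", ".tiff", ".wav", ".pdf", ".xml", ".csv"}

def audit_formats(folder: str) -> None:
    """Flag files whose format falls outside the preferred list."""
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        mime, _ = mimetypes.guess_type(path.name)       # best-effort type guess
        if path.suffix.lower() not in PREFERRED:
            print(f"REVIEW: {path} ({mime or 'unknown type'})")

audit_formats("digitised_masters")                       # hypothetical folder
```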
Wednesday, March 4, 2015
Building Productive and Collaborative Relationships at the Speed of Trust
Building Productive and Collaborative Relationships at the Speed of Trust. Todd Kreuger. Educause Review. March 2, 2015.
To make projects successful, it is important to create trust and collaboration among IT, staff, and campus groups. To create that trust, the staff must establish highly productive relationships with the school's departments, faculty, and students. Collaboration, design thinking, and innovation go hand-in-hand. Many projects fall short of customer needs or fail outright, with plenty of finger pointing and wasted time, money, and opportunity. Some of the lessons learned:
- Get on the same page
- Build and establish trust
- Provide the tools and expectations for success
- Focus on both strategic and operational needs
- Clarify process ownership and the associated responsibilities
- Recognize the desired performance and celebrate success
Cycle of Productivity model. Processes and tasks must have a defined owner and be documented and published, and change must be managed to ensure that everyone is aware of the new expectations. The basic premise is that training, assessment of effectiveness, and feedback all must occur to ensure the process or task is completed as expected.
The end result "is one in which a culture of collaboration, coupled with a relentless focus on challenging the status quo, results in our encouraging, pushing, and helping each other innovate, transform, and differentiate."
Labels:
future of libraries,
project management,
trust
Tuesday, March 3, 2015
Significance 2.0: a guide to assessing the significance of collections
Significance 2.0: a guide to assessing the significance of collections. Roslyn Russell, Kylie Winkworth. Collections Council of Australia Ltd. 2009.
This guide is for defining an adaptable method for determining significance across all collections in Australia. The intention is that it will improve collection decision-making in areas such as, preservation, access, and funding support. Regarding significance:
- We cannot keep everything forever. It is vital we make the best use of our scarce resources for collecting, conserving, documenting and digitising our collection materials.
- Significance is not an absolute state; it can change over time.
- Collection custodians have a responsibility to consult communities and respect other views in constructing societal memory and identity.
- It is vital to understand, respect and document the contexts that shape collection materials.
Assessing significance helps organisations and communities to understand, access and enjoy collections. Artistic, scientific and social or spiritual values are the criteria or key values that help to express how and why an item or collection is significant. Comparative criteria include provenance, rarity or representativeness, condition or completeness, and interpretive capacity. Significance assessment involves five main steps:
- analysing an item or collection
- researching its history, provenance and context
- comparison with similar items
- understanding its values by reference to the criteria
- summarising its meanings and values in a statement of significance
A statement of significance summarises the values and meanings of an item or collection. It is an argument about how and why an item or collection is of value, and it should be reviewed as circumstances change. Significance assessment is:
- a process to help with good management of items and collections;
- a collaborative process in which consultation is essential;
- a way to substantiate and justify assessments objectively rather than subjectively.
The assessment process includes the following steps:
- Collate information about the history and development of the collection
- Research the history, scope and themes of the collection
- Consult knowledgeable people
- Explore the context of the collection
- Analyse and describe the condition of the collection
- Compare the collection with similar collections
- Identify related places and collections
- Assess significance against the criteria
- Write a statement of significance
- List recommendations and actions
Oops! Article preserved, references gone
Oops! Article preserved, references gone. Digital Preservation Seeds. February 16, 2015.
A blog post concerning the article Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. References in academic publications support the argument being made, so missing references are a significant problem for the scholarly record: arguments and conclusions cannot be verified. In addition, missing or incomplete resources and information will devalue national and academic collections. The Significance method can be used to determine the value of collections. There is currently no robust solution, but a robustify script can direct broken links to Memento. The missing-references problem emphasizes that, without proper context, preserved information is incomplete.
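The robustify idea can be sketched in a few lines of Python: check whether a cited URL still resolves and, if not, ask a Memento TimeGate for an archived snapshot. This is a rough illustration rather than the actual robustify script; the Time Travel TimeGate address, the example URL and the chosen datetime are assumptions, and the third-party requests library is required.

```python
import requests

# Assumed Memento TimeGate (RFC 7089 style): it should redirect to a snapshot
# near the datetime supplied in the Accept-Datetime header.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"

def resolve_reference(url: str, accept_datetime: str = "Fri, 06 Feb 2015 00:00:00 GMT") -> str:
    """Return the original URL if it still resolves, otherwise a snapshot URL."""
    try:
        live = requests.head(url, allow_redirects=True, timeout=10)
        if live.status_code < 400:
            return url                                   # reference still works
    except requests.RequestException:
        pass                                             # treat errors as a dead link
    snapshot = requests.get(TIMEGATE + url,
                            headers={"Accept-Datetime": accept_datetime},
                            allow_redirects=True, timeout=10)
    return snapshot.url                                  # memento we were redirected to

print(resolve_reference("http://example.org/cited-page"))   # hypothetical citation
```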
Saturday, February 28, 2015
OxGarage Conversion
OxGarage Conversion. Website. February 27, 2015.
An interesting web tool from the University of Oxford for converting documents to different formats. OxGarage is a web (and RESTful) service that transforms documents between a variety of formats, using the Text Encoding Initiative (TEI) format as a pivot format; an illustrative sketch of calling such a service follows the list below. The initial option is to select:
- Documents
- Presentations
- Spreadsheets
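A conversion call to a RESTful service like this boils down to a multipart POST; the sketch below is only a shape, since the endpoint URL and form field name are placeholders rather than OxGarage's documented API, and the requests library is required.

```python
import requests

# Placeholder endpoint and field name -- not the documented OxGarage API.
ENDPOINT = "https://example.org/oxgarage/conversions/docx-to-tei"

def convert_document(source_path: str, output_path: str) -> None:
    """POST a source document to a conversion resource and save the result."""
    with open(source_path, "rb") as src:
        response = requests.post(ENDPOINT, files={"document": src}, timeout=120)
    response.raise_for_status()
    with open(output_path, "wb") as out:
        out.write(response.content)          # converted output, e.g. TEI XML

convert_document("thesis.docx", "thesis.tei.xml")   # hypothetical file names
```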
Friday, February 27, 2015
Data on the Web Best Practices
Data on the Web Best Practices. W3C First Public Working Draft. 24 February 2015.
This document provides best practices related to the publication and usage of data on the Web. Data should be discoverable and understandable by humans and machines, and the efforts of the data publisher should be recognized. This will help the interaction between publishers and users.
Data on the Web allows for the existence of multiple ways to represent and to access data which is a challenge. Some of the other challenges include: metadata, formats, provenance, quality, access, versions, and preservation. The Best Practices proposed should help data publishers and data consumers overcome the different challenges faced during the data life cycle on the web. The draft proposes best practices for each one of the described challenges.
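One simple expression of "discoverable and understandable by humans and machines" is to publish descriptive, provenance and versioning metadata alongside the data itself; the Python sketch below writes such a record as JSON, with field names chosen for illustration rather than taken from the draft.

```python
import json

# Illustrative machine-readable metadata for a hypothetical dataset; the field
# names are informal, not a normative vocabulary from the W3C draft.
dataset_metadata = {
    "title": "Example survey results",
    "description": "Responses collected for an example survey.",
    "publisher": "Example Organisation",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "format": "text/csv",
    "version": "1.1",
    "issued": "2015-02-24",
    "provenance": "Derived from raw responses; cleaning steps documented in the README.",
}

with open("dataset-metadata.json", "w") as f:
    json.dump(dataset_metadata, f, indent=2)
```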
Thursday, February 26, 2015
Library of Congress Recommended Format Specifications. Comments Requested.
Library of Congress Recommended Format Specifications. Library of Congress website. February 26, 2015.
Comments and feedback requested by March 31, 2015.
Because of the dynamic, ever-changing nature and availability of formats, the Library plans to revisit the specifications annually. Reviewing the specifications annually will permit the Library to keep pace with developments in the creative world, so that changes to the Format Specifications, although made frequently, can be made in small increments. Input and feedback are greatly encouraged and welcomed.
Cloud Storage and Digital Preservation: New guidance from the National Archives
Cloud Storage and Digital Preservation: New guidance from the National Archives. Laura Molloy. Digital Curation Centre. 13 May, 2014.
The use of cloud storage in digital preservation is a rapidly evolving field, and this guidance explores how it is developing, emerging options and good practice, together with requirements and standards that archives should consider. Digital preservation is a significant issue for almost all public archives. There is an increasing demand for storage of both born-digital archives and digitised material, and an expectation that public access to this content will continue to expand. The guidance includes five detailed case studies of UK archives that have implemented cloud storage solutions.
Digital preservation can be defined as: “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary, beyond the limits of media failure or technological and organisational change”. The challenges are urgent but can be taken one step at a time; you can address current technology and needs while ensuring that the content can be passed on to the next generation. Cloud storage has many positives and negatives that must be considered, and the guidance reviews many of these. When establishing your needs, identify which requirements are ‘must haves’ and which are ‘wants’, and define your requirements in terms of required capabilities rather than a specific technology, implementation, or product.
- We should be concerned about the security of data, wherever it is stored, but it would be unrealistic to suggest that most cloud services are inherently less secure than most local data centres.
- Adoption of a digital preservation strategy utilising cloud computing inevitably brings with it a range of legal questions.
- Cloud storage services can achieve significant economies of scale.
- Cloud services are typically considered to be operational rather than capital expenditure
Why Digital Storage Formats Are So Risky
Why Digital Storage Formats Are So Risky. Matthew Woollard. Lifehacker. 25 February 2015.
While it may seem that digital files last forever, the growing digital sphere faces enormous losses. Even Google has been unable to ensure access for its archive of digital content. Technical solutions already exist, but they’re not well known and relatively expensive.
How much are we prepared to pay to ensure that digital content that exists today will be usable in the future? We need to think about the value of the content and decide if it is worth keeping. Determining the value can be difficult. However, "re-use is a significant benefit from preserving data and adds value." Besides economic value, there are also cultural and intellectual reasons for preserving data. A historical parallel can be seen in the Middle Ages, when scribes used wax tablets for temporary records and parchment for permanent records.
The chances of born-digital material being usable in 100 years will be considerably improved by actively taking steps now to ensure the preservation of the items. Effective digital preservation relies on the activities of the creator as well as the archivist. It is important to make decisions about providing context, the types of formats to use, how to organize the material, and resolving rights issues to avoid future problems.
Tuesday, February 24, 2015
Why we should all think about data preservation
Why we should all think about data preservation. Stephanie Taylor. School of Advanced Study. February 19, 2015.
The SHARD project, which ended in 2012, identified four basic principles of digital preservation for researchers:
- Start early: The sooner you start thinking about what to preserve, how to do it, and when, the greater the chance of avoiding problems. Early planning means involving everyone in a research project in the discussion to help identify additional issues.
- Explain it: Context provides meaning and is vital in digital preservation. There is little point in preserving material and data without context.
- Store it safely: Backups are not preservation. Preservation requires multiple copies in different locations. Use open file formats and be careful how you and others handle and access files. Select carefully which files are to be preserved (see the sketch after this list).
- Share it: Sharing your research material and data is beneficial. In one way or another, the main reason to carry out preservation at all, on any level, is to be able to share your work with others, now and in the future.
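As a minimal sketch of the "store it safely" principle (the source and offsite paths are hypothetical), the Python below copies files to a second location and uses SHA-256 checksums to confirm that the copies match the originals:

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Compute a SHA-256 checksum so copies can be verified now and later."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def replicate(source_dir: str, copy_dir: str) -> None:
    """Copy every file to a second location and confirm the copies match."""
    for src in Path(source_dir).rglob("*"):
        if not src.is_file():
            continue
        dest = Path(copy_dir) / src.relative_to(source_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        if sha256(src) != sha256(dest):
            print(f"MISMATCH: {src}")        # a copy that cannot be trusted

replicate("research_data", "/mnt/offsite_copy/research_data")   # hypothetical paths
```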
Labels:
digital preservation,
research libraries,
training
Monday, February 23, 2015
Threeding Uses Artec 3D Scanning Technology to Catalog 3D Models for Bulgaria’s National Museum of Military History
Threeding Uses Artec 3D Scanning Technology to Catalog 3D Models for Bulgaria’s National Museum of Military History. Bridget Butler Millsaps. 3D Printer & 3D Printing News. February 20, 2015.
The National Museum of Military History is collaborating with Threeding, using Artec 3D scanning technology to preserve physical pieces of history by creating 3D digital models. With the scans, the museum can create a virtual museum. It also plans to share the models online and allow the public to use the 3D images to print replicas of the artifacts.
Labels:
3D,
cultural preservation,
digital preservation
Saturday, February 21, 2015
OAI-PMH harvesting from SharePoint
SharePoint 2010 to Primo. Cillian Joy. Tech Blog. July 2014.
They have a system to manage the submission, storage, approval, and discovery of taught-thesis documents, which uses SharePoint 2010 as the document repository and Ex Libris Primo as the discovery tool. The solution uses PHP, XML, XSLT, cURL, and the SharePoint REST API with OData.
It uses the Atom and OAI-PMH standards.
SharePoint 2013 .NET Server, CSOM, JSOM, and REST API index
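The original solution was written in PHP; purely as a rough Python equivalent (the site URL and list name are placeholders, and the requests library is required), the sketch below pulls Atom entries from a SharePoint 2010 OData list feed, the kind of records an OAI-PMH provider would then map to Dublin Core.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder SharePoint 2010 OData/Atom list feed -- not a real endpoint.
FEED = "https://sharepoint.example.edu/_vti_bin/ListData.svc/Theses"
ATOM = "{http://www.w3.org/2005/Atom}"

def harvest_entries(feed_url: str):
    """Yield title/updated pairs from the Atom entries in the list feed."""
    response = requests.get(feed_url, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    for entry in root.findall(f"{ATOM}entry"):
        yield {
            "title": entry.findtext(f"{ATOM}title") or "",
            "updated": entry.findtext(f"{ATOM}updated") or "",
        }

for record in harvest_entries(FEED):
    print(record)                            # next step: map onto Dublin Core
```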
Friday, February 20, 2015
Enjoy your digital films and videos while you can... before they disappear
Enjoy your digital films and videos while you can... before they disappear. David Shapton. RedShark Publications. February 17, 2015.
An article about the fragility of digital objects, with examples of how digital files can fail even when there are multiple copies. Drives can fail and systems can become obsolete. Some statements:
- A paid-for cloud storage and synchronisation company may seem to be doing OK today but might not be here at some point in the future.
- Absolutely the most important thing to remember here is that this can happen right under your nose without you realising it. It's like the way you forget things.
- We can have backup strategies. But that's clearly not enough. There's no point at all in backing up all your files so that they're stored on accessible error-free media, only to find that you don't have any applications to play them.
- Cerf has said "that we have to not only preserve the files, but the means to decode them as well."
- we also have to preserve a working copy of the operating system that can play back the media files, and because machines go out of date, we have to preserve a working copy of the machine.
- You don't get a warning when something is about to become obsolete or unreadable. You just get an error message bringing you the bad news, or the device doesn't show up in your file system explorer.
- Data doesn't fade away gradually. It just becomes inaccessible. But when you step back and look at a mass of data from afar, the effect is that it gradually goes away.
Thursday, February 19, 2015
From Theory to Action: Good Enough Digital Preservation for Under-Resourced Cultural Heritage Institutions
From Theory to Action: Good Enough Digital Preservation for Under-Resourced Cultural Heritage Institutions. Jaime Schumacher, et al. Digital POWRR White Paper for the Institute of Museum and Library Services. 27 August 2014.
The Digital POWRR team comprises archivists, curators, librarians, and a digital humanist from small and mid-sized Illinois institutions who know that digital content is vulnerable, but who lack significant financial resources and have been unable to come up with programmatic and technical solutions to mitigate the risk. Each institution produced a case study and a gap analysis, with a plan to address the obstacles. Some institutions have created and implemented digital preservation programs; however, medium-sized and smaller organizations with fewer resources, like the POWRR institutions, are in a vulnerable position. Some statements of interest:
- "Common elements emerged from our gap analyses: a lack of available financial resources; limited or nonexistent dedicated staff time for digital preservation activities; and inadequate levels of appropriate technical expertise. Some of the case studies also mentioned a lack of institutional awareness of the fragility of digital content and a lack of cohesive policies and practices across departments as a contributing factor towards the absence of real progress."
- Digital preservation is best thought of as an incremental, ongoing, and ever-shifting set of actions, reactions, workflows, and policies.
- the notion that it is necessary to research all available tools and services exhaustively before taking any basic steps to secure digital content is yet another misconception that often prevents any progress from occurring.
- Fortunately, practitioners can get started with simple, freely available triage tools while researching which of the more robust solutions will best suit their needs.
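In that spirit, a first triage pass needs nothing beyond the Python standard library: walk a directory of unmanaged content and write an inventory of paths, sizes, checksums and guessed types to CSV. The folder name is hypothetical, and reading whole files into memory is only sensible for modestly sized files.

```python
import csv
import hashlib
import mimetypes
from pathlib import Path

def inventory(folder: str, manifest: str = "inventory.csv") -> None:
    """Record what exists, how big it is, its checksum, and a guessed type."""
    with open(manifest, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "bytes", "sha256", "guessed_type"])
        for path in Path(folder).rglob("*"):
            if not path.is_file():
                continue
            digest = hashlib.sha256(path.read_bytes()).hexdigest()   # small files only
            mime, _ = mimetypes.guess_type(path.name)
            writer.writerow([str(path), path.stat().st_size, digest, mime or "unknown"])

inventory("shared_drive_dump")               # hypothetical folder of unmanaged content
```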
ArchivesDirect hosted service
ArchivesDirect website. February 18, 2015.
ArchivesDirect is a web-based hosted Archivematica service offered by DuraSpace for creating OAIS-based digital preservation workflows, with content packages archived in DuraCloud and Amazon Glacier. It includes open source preservation tools and generates archival packages using microservices, PREMIS, and METS XML files. ArchivesDirect is intended for small to mid-sized institutions. DuraSpace is the organization behind DSpace, Fedora, and VIVO.
Pricing and subscription plans include:
ArchivesDirect Standard (System, training, 1 TB): $11,900
ArchivesDirect Digital Preservation Assessment: $4,500
Additional Storage in Amazon S3 and Glacier: $1,000/TB/year
Wednesday, February 18, 2015
Rosetta and Amazon Storage
Rosetta and Amazon Storage. Chris Erickson. February 2015.
In the search for more file storage, as well as more affordable file storage, we tried Amazon Simple Storage Service (Amazon S3). The plan was to connect the Rosetta Digital Preservation System to Amazon cloud storage and evaluate it as a possible storage solution. There is a free trial: the Free Tier includes 5 GB of storage, 20,000 Get Requests, and 2,000 Put Requests.
Setup:
I tried various configurations, but decided on a single bucket for the files. I set up buckets for the IEs and metadata, but after trying it, decided to keep only the files on Amazon and keep the metadata local. I had tried nested folders, but couldn't figure out how to designate that in the storage rules and definition, so I created the folders by time period.
In the Rosetta Admin interface I created a File storage group using the S3 storage plugin, then entered the Bucket name, Secret Access Key, and Access Key ID, and left the Maximum waiting time at the default. For the test, I set up a retention code for Amazon, and the storage rule used that code to determine what went to the Amazon storage. In a real storage instance, it would be better to use something that would not change, like the producer.
It took a few tests to get everything in sync. The result was that Rosetta stored the content in Amazon just fine. I also tried adding content with a one day retention period, and the content was removed from Amazon after the day. A fixity check task was also able to work without a problem.
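The same fixity idea can be illustrated outside Rosetta with a short boto3 sketch (this is not Rosetta's S3 plugin): upload a file and compare a locally computed MD5 against the object's ETag. The bucket and key names are placeholders, credentials are assumed to be configured for boto3, and the ETag-equals-MD5 shortcut only holds for single-part uploads.

```python
import hashlib
import boto3

s3 = boto3.client("s3")
BUCKET = "rosetta-permanent-storage"          # hypothetical bucket name

def upload_and_verify(local_path: str, key: str) -> bool:
    """Upload a file and check the stored object against a local MD5."""
    with open(local_path, "rb") as f:
        local_md5 = hashlib.md5(f.read()).hexdigest()
    s3.upload_file(local_path, BUCKET, key)
    remote_etag = s3.head_object(Bucket=BUCKET, Key=key)["ETag"].strip('"')
    return remote_etag == local_md5           # False would flag a fixity problem

print(upload_and_verify("ie12345/file.tif", "2015/ie12345/file.tif"))   # hypothetical paths
```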
This gives us another storage option, though we decided to not use it at present.
Pricing at the time of this comparison was:
| Digital Storage Costs | 1 TB: Annual Cost | 1 TB: 20-Year Projected | 50 TB: Yearly Charge | 50 TB: 10-Year Projected | 50 TB: 20-Year Projected | 50 TB: 50-Year Projected |
| --- | --- | --- | --- | --- | --- | --- |
| Cloud Storage | | | | | | |
| Amazon S3 - Regular | $360 | $7,200 | $17,706 | $177,060 | $354,120 | $885,300 |
| Amazon S3 - Copy / Glacier | $480 | $9,600 | $23,706 | $237,060 | $474,120 | $1,185,300 |
| Amazon S3 - Reduced & Glacier | $288 | $5,760 | $14,165 | $141,648 | $283,296 | $708,240 |
| DuraSpace - Preservation | $1,800 | $36,000 | $36,100 | $361,000 | $722,000 | $1,805,000 |
| DuraSpace - Dark copy/Glacier | $1,925 | $38,500 | $42,350 | $423,500 | $847,000 | $2,117,500 |
| DuraSpace - Enterprise Plus | $5,625 | $112,500 | $64,425 | $644,250 | $1,288,500 | $3,221,250 |
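For context on where the Amazon S3 "Regular" figures come from, the short calculation below reproduces them from the tiered early-2015 rates (roughly $0.0300 per GB-month for the first TB and $0.0295 for the next 49 TB); the multi-year projections simply multiply the annual figure, with no allowance for request charges or future price changes.

```python
def s3_regular_annual_cost(terabytes: int) -> float:
    """Annual S3 standard-storage cost at the early-2015 tiered rates."""
    first_tb_gb = min(terabytes, 1) * 1000
    remaining_gb = max(terabytes - 1, 0) * 1000
    monthly = first_tb_gb * 0.0300 + remaining_gb * 0.0295
    return monthly * 12

print(s3_regular_annual_cost(1))    # 360.0   -> $7,200 over 20 years
print(s3_regular_annual_cost(50))   # 17706.0 -> $354,120 over 20 years
```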
More storage options will be considered.
Save the Voices of Tolkien, Joyce And Tennyson
Save the Voices of Tolkien, Joyce And Tennyson. Laura Clark. Smithsonian.
The British Library issued a public call for help safeguarding the more than 6.5 million recordings in its archives through digital preservation. It will take around £40 million to fully fund the effort, and time is running short. The British Library’s sound archive includes audio from Tolkien, Joyce, Florence Nightingale, Tennyson, and WWI soldiers, as well as many nature sounds, oral histories and theatre performances. Thousands of recordings are at risk and will disappear soon if no action is taken.
Tuesday, February 17, 2015
AHRQ Public Access to Federally Funded Research
AHRQ Public Access to Federally Funded Research. Francis D. Chesley. Agency for Healthcare Research and Quality. February, 2015.
The Agency for Healthcare Research and Quality (AHRQ) has established a policy for public access to scientific publications and scientific data in digital format resulting from funding through the agency. Preservation is one of the Public Access Policy's primary objectives.
The Public Access Policy includes the following objectives:
- Ensure that the public can access the final published digital documents.
- Facilitate easy public search, analysis of and access to these publications
- Ensure that attribution to authors, journals, and original publishers is maintained.
- Ensure that publications and metadata are in an archival solution.
- Ensure that all researchers receiving grants develop data management plans, describing how they will provide for long-term preservation of and access to scientific data in digital format.
- A plan for protecting confidentiality and personal privacy.
- A description of how scientific data in digital format will be shared
- It must include a plan for long-term preservation and access to the data
Digital scientific data is defined as "the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens."
Monday, February 16, 2015
Internet future blackout: No way to preserve our data
Internet future blackout: No way to preserve our data. Lisa M. Krieger. San Jose Mercury News. February 12, 2015.
Vint Cerf spoke about the "digital vellum" needed to maintain the data that comprise text, video, software games, scientific data and other digital objects, and how to preserve their meaning.
Aiding preservation might involve "Information Centric Networking," which is based on two simple concepts -- addressing information by its name, rather than location, and adding computation and memory to the network. David Oran of Cisco Systems and Glenn Edens, research director at Xerox PARC in Palo Alto, are working on that new technology.
Google is not directly involved in the digital preservation effort "although we have worked really hard at preserving the digital information of the day. We aren't planning to become the archive of the future -- although I think it would be cool." Cerf envisions libraries and governments investing in the technology needed to carry today's information into the distant future.
The "digital vellum" project led has been created at Carnegie Mellon University. This is how it would work: There's a digital snapshot of a document, which is then built into one giant file. Using preserved and transmitted instructions, a virtual computer pretends to be a 2015-era Mac or IBM computer, and can find the document. "If I can substantiate those bits (of a document) in another computer, years in future, then I will have created the same document -- I can reproduce what you were doing. It is a digital copy of the state of the computer you were using when you created new documents."
Rather than the current paradigm -- names, addresses and routes -- it would start with an "object," with a name, as the thing to be stored and moved. "Our current system of 'domain names' is not a stable system. The current routing system could be replaced by the information-centric system, where we keep track of everything not by where it is hosted but the information itself, by name."
Just as we need preservation of today's bits and software, our encryption systems will need to be preserved as well -- because once they're lost, so is the material that they're protecting. "We need an encryption system in which the keys never wear out and are never broken, that represents keying for hundreds of thousands of years."
Saturday, February 14, 2015
Google VP leads calls for web content preservation
Google VP leads calls for web content preservation. Caroline Donnelly. IT Pro. 13 Feb, 2015.
Vint Cerf says action is needed to preserve the content of the internet for future generations to enjoy. Historians in the future may view the 21st century as an “information black hole” because the software and services used to access online content could become defunct over time. To protect against this, he wants to see efforts made to create a “digital vellum” that will preserve the hardware and software needed to access online content in the years to come. “If we want to preserve them, we need to make sure that the digital objects we create today can still be rendered far into the future.” All web users are at risk of throwing their data away into a “digital black hole” in the mistaken belief that uploading content to a site or service will preserve it. “We digitise things because we think we will preserve them, but what we don’t understand is that unless we take other steps, those digital versions may not be any better, and may even be worse, than the artefacts that we digitised.”
Labels:
digital preservation,
digitizing,
web archiving
Friday, February 13, 2015
Crystal clear digital preservation: a management issue
Crystal clear digital preservation: a management issue. Barbara Sierman. Digital Preservation Seeds. February 1, 2015.
The book Digital Preservation for Libraries, Archives and Museums by Edward Corrado and Heather Lea Moulaison does a great job of explaining digital preservation. "In crystal clear language, without beating about the bush and based on extensive up to date (until 2014) literature, digital preservation is explained and almost every aspect of it is touched upon." It also explains what digital preservation is not (backup, etc.). The point of the book is expressed by the statement:
“ensuring ongoing access to digital content over time requires careful reflection and planning. In terms of technology, digital preservation is possible today. It might be difficult and require extensive, institution-wide planning, but digital preservation is an achievable goal given the proper resources. In short, digital preservation is in many ways primarily a management issue”.
It uses the Digital Preservation Triad to symbolize the interrelated activities of
- Management-related activities,
- Technological activities and
- Content-centred activities.
Thursday, February 12, 2015
Save our Sounds
Save our Sounds. Luke McKernan. British Library. 12 January 2015.
The nation’s sound collections are under threat from physical degradation, and also as playback devices wear out and disappear. "Archival consensus internationally is that we have approximately 15 years in which to save our sound collections by digitising them before they become unreadable and are effectively lost." The British Library collection contains over 6.5 million recordings of speech, music, wildlife and the environment, from the 1880s to the present day. The Save our Sounds program has three major aims:
- Preserve as much as possible of the nation's rare and unique sound recordings from collections across the UK
- Establish a national radio archive to collect, protect and share with other partners
- Invest in new technology to enable us to receive music in digital formats, working with industry partners, to ensure their long-term preservation
Labels:
audio preservation,
digital preservation,
digitizing
Wednesday, February 11, 2015
New Expert Panel Report From Council of Canadian Academies Says Canada’s Memory Institutions “Falling Behind” in Preservation of Digital Materials
New Expert Panel Report From Council of Canadian Academies Says Canada’s Memory Institutions “Falling Behind” in Preservation of Digital Materials. Gary Price. Library Journal. February 4, 2015.
An expert panel report, Leading in the Digital World: Opportunities for Canada’s Memory Institutions (208 pages; PDF), addresses the challenges and opportunities that exist for libraries, archives, museums, and galleries as they adapt to the digital age. Vast amounts of digital information are at risk of being lost because many traditional tools are no longer adequate in the digital age. Memory institutions face the difficult task of preserving digital files in formats that will remain accessible over the long term. Institutions need to collaborate more strategically and develop interactive relationships with users. They must also be leaders within and among their respective organizations. Many of the challenges faced are rooted in technical issues associated with managing digital content, the sheer volume of digital information, and the struggle to remain relevant. Collaboration is essential for adaptation, enabling institutions to access the resources required to deliver the services that users now expect.
Tuesday, February 10, 2015
Reference rot in web-based scholarly communication and link decoration as a path to mitigation
Reference rot in web-based scholarly communication and link decoration as a path to mitigation. Martin Klein, Herbert Van de Sompel. LSE Impact of Social Sciences blog. February 6, 2015.
The failure of a web address to link to the appropriate online source is a significant problem facing scholarly material. The ability to reference sources is a fundamental part of scholarship. "Increasingly, we see references to software, ontologies, project websites, presentations, blogs, videos, tweets, etc. Such resources are usually referenced by means of their HTTP URI as they exist on the web at large. These HTTP URIs allow for immediate access on the web, but also introduce one of the detrimental characteristics of the web to scholarly communication: reference rot." Reference rot is a combination of two problems common for URI references:
- link rot: A URI ceases to exist; the page is not found
- content drift: The resource identified by its URI changes over time and is not what was originally referenced
The typical strategy to address the problem is to link to a snapshot of the web page (instead of the original web page) created at the time and stored in a web archive, such as the Internet Archive, archive.today, and perma.cc.
There are problems with this approach. The linked copy may not remain in place either, the original URI is lost, and so is any information about the page or how it changed. Link decoration can be used instead, recording the URI of the original, the URI of the snapshot, and the datetime of linking. Memento can provide this information, but discussions are needed to decide how best to convey it.
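To make link decoration concrete, here is a small Python sketch that renders a reference as an HTML anchor carrying the original URI, a snapshot URI, and the datetime of linking; the data-* attribute names and the example URIs are illustrative assumptions rather than a finalised convention.

```python
from datetime import datetime, timezone

def decorate_link(original_uri: str, snapshot_uri: str, linked_at: datetime) -> str:
    """Build an anchor that keeps the original URI, a snapshot, and a datetime."""
    stamp = linked_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f'<a href="{original_uri}" '
        f'data-versionurl="{snapshot_uri}" '      # attribute names are illustrative
        f'data-versiondate="{stamp}">referenced resource</a>'
    )

print(decorate_link(
    "http://example.org/project-page",
    "https://web.archive.org/web/20150206000000/http://example.org/project-page",
    datetime(2015, 2, 6, tzinfo=timezone.utc),
))
```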
Labels:
metadata,
preservation tools,
web archiving
Monday, February 9, 2015
All in the (Apple ProRes 422 Video Codec) Family
All in the (Apple ProRes 422 Video Codec) Family. Kate Murray. The Signal.
The Apple ProRes 422 family of video codecs has been added to the Sustainability of Digital Formats website. These codecs are proprietary, lossy-compressed, high-quality intermediate codecs for digital video, primarily supported by Final Cut Pro.
The Apple ProRes 422 Codec Family comprises four subtypes:
- ProRes 422 HQ: the highest data-rate version of the ProRes 422 codecs, applying the least compression for the best quality but the largest files.
- ProRes 422: the second-highest data-rate of the group, often used for multistream, real-time editing; it offers significant storage savings over uncompressed video.
- ProRes 422 LT: the third-highest data-rate version, considered an editing codec with smaller file sizes
- ProRes 422 Proxy: the lowest data-rate version often used in offline post-production work that requires low data rates but also a full screen picture.
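A rough sketch (my own, not from the post) of how these subtypes can be identified in practice: it assumes ffprobe (part of FFmpeg) is installed and reports "codec_name" and a "profile" string (e.g. "HQ", "LT", "Proxy") for ProRes streams; field values may vary by FFmpeg version.

```python
# Sketch: use ffprobe to check whether a file's first video stream is ProRes
# and, if so, which 422 subtype it claims to be.
import json
import subprocess
import sys

def probe_prores(path: str) -> str:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_streams", "-print_format", "json", path],
        capture_output=True, text=True, check=True
    ).stdout
    streams = json.loads(out).get("streams", [])
    if not streams:
        return "no video stream found"
    codec = streams[0].get("codec_name", "unknown")
    profile = streams[0].get("profile", "unknown")  # e.g. "HQ", "LT", "Proxy"
    return f"codec={codec}, profile={profile}"

if __name__ == "__main__":
    print(probe_prores(sys.argv[1]))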
Saturday, February 7, 2015
Digital Preservation Coalition publishes ‘OAIS Introductory Guide (2nd Edition)’ Technology Watch Report
Digital Preservation Coalition publishes ‘OAIS Introductory Guide (2nd Edition)’ Technology Watch Report. Brian Lavoie. Digital Preservation Coalition. Watch Report. October, 2014. [PDF]
The report describes the OAIS, its core principles and functional elements, as well as the information model that supports long-term preservation, access, and understandability of data. The OAIS reference model was approved in 2002 and revised and updated in 2012. Perhaps “the most important achievement of the OAIS is that it has become almost universally accepted as the lingua franca of digital preservation”.
The central concept in the reference model is that of an open archival information system. An OAIS-type archive must meet a set of six minimum responsibilities covering the ingest, preservation, and dissemination of archived materials. The model also defines six functional entities: Ingest, Archival Storage, Data Management, Preservation Planning, Access, and Administration, along with Common Services, which consist of basic computing and networking resources.
An OAIS-type archive interacts with three types of entities: Management, Producer, and Consumer. Consumers include the Designated Community, those consumers expected to independently understand the archived information in the form in which it is preserved and made available by the OAIS. The reference model also provides a framework to encourage dialogue and collaboration among participants in standards-building activities and to identify areas most likely to benefit from standards development.
An OAIS-type archive is expected to:
- Negotiate for and accept appropriate information from information producers;
- Obtain sufficient control of the information in order to meet long-term preservation objectives;
- Determine the scope of the archive’s user community;
- Ensure the preserved information is independently understandable to the user community;
- Follow documented policies and procedures to ensure the information is preserved against all reasonable contingencies;
- Make the preserved information available to the user community, and enable the dissemination of authenticated copies of the preserved information in its original form, or in a form traceable to the original.
The OAIS information model is built around the concept of an information package, of which there are three types: the Submission Information Package, the Archival Information Package, and the Dissemination Information Package. Preservation requires metadata to support and document the OAIS’s preservation processes, called Preservation Description Information, which ‘is specifically focused on describing the past and present states of the Content Information, ensuring that it is uniquely identifiable, and ensuring it has not been unknowingly altered’. This information consists of:
- Reference Information (identifiers)
- Context Information (describes relationships among information and objects)
- Provenance Information (history of the content over time)
- Fixity Information (verifying authenticity)
- Access Rights Information (conditions or restrictions)
The ‘OAIS reference model provides a solid theoretical basis for digital preservation efforts, though theory and practice can sometimes have an uneasy fit.’
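To make the information model concrete, here is a small illustrative sketch (my own, not from the report) of an Archival Information Package whose Preservation Description Information carries the five categories listed above; all class and field names are hypothetical.

```python
# Illustrative only: a toy Archival Information Package (AIP) whose Preservation
# Description Information (PDI) carries the five categories named in the report.
# Class and field names are hypothetical, not taken from OAIS or the DPC report.
from dataclasses import dataclass, field

@dataclass
class PreservationDescriptionInformation:
    reference: dict = field(default_factory=dict)      # identifiers
    context: dict = field(default_factory=dict)        # relationships among information and objects
    provenance: list = field(default_factory=list)     # history of the content over time
    fixity: dict = field(default_factory=dict)         # e.g. {"sha256": "..."}
    access_rights: dict = field(default_factory=dict)  # conditions or restrictions

@dataclass
class ArchivalInformationPackage:
    content_files: list            # the Content Information (bitstreams)
    representation_info: str       # how to interpret the content
    pdi: PreservationDescriptionInformation

aip = ArchivalInformationPackage(
    content_files=["report.pdf"],
    representation_info="PDF 1.7, rendered with any conformant PDF viewer",
    pdi=PreservationDescriptionInformation(
        reference={"local_id": "aip-0001"},
        provenance=[{"event": "ingest", "date": "2015-02-07"}],
        fixity={"sha256": "…"},
    ),
)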
Labels:
certification,
digital preservation,
metadata,
OAIS,
PREMIS,
repositories,
TDR,
TRAC,
xml
Digital Tools and Apps
Digital Tools and Apps. Chris Erickson. Presentation for ULA. 2014. [PDF]
This is a presentation I created for ULA to briefly outline a few tools that I find helpful. There are many useful tools, and more are being created all the time. Here are a few that I use.
- Copy & Transfer Tools: WinSCP; Teracopy;
- Rename Tools: Bulk Rename Utility
- Integrity & Fixity Tools: MD5Summer; MD5sums 1.2; Quick Hash; Hash Tool (see the checksum sketch after this list)
- File Editing Tools: Babelpad; Notepad++; XML Notepad;
- Metadata Tools: ExifTool; BWF MetaEdit; BWAV Reader;
- File Format Tools: DROID;
- File Conversion: Calibre; Adobe Portfolio;
- Others: A whole list of other tools that I use or suggest you look at.
- PDF/A tools
- Email tools
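The integrity and fixity tools above all come down to computing and re-checking checksums. As a rough illustration (my own, not part of the presentation), this sketch writes an MD5 manifest for a folder, similar in spirit to MD5Summer; re-running it later and comparing the output is a simple fixity check.

```python
# Sketch: compute an MD5 manifest for a folder; rerun and diff to verify fixity.
import hashlib
from pathlib import Path

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(folder: str, manifest: str = "manifest.md5") -> None:
    root = Path(folder)
    with open(manifest, "w", encoding="utf-8") as out:
        for p in sorted(root.rglob("*")):
            if p.is_file():
                out.write(f"{md5_of(p)}  {p.relative_to(root)}\n")

if __name__ == "__main__":
    write_manifest(".")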
Friday, February 6, 2015
Preserving progress for future generations
Preserving progress for future generations. Rebecca Pool. Research Information. February/March 2015.
Digital preservation remains one of the most critical challenges facing scholarly communities today. From e-journals and e-books to emails, blogs and more, electronic content is proliferating, and organizations worldwide are trying to preserve information before it is lost. These organizations include Portico (which preserves content on behalf of participating publishers; the number of open access journals it includes is rising) and CLOCKSS (still grappling with the cost models of providing a preservation service).
There is a rising demand for the preservation of dynamic content. No one is able to "capture dynamic content and [preserve] a day-to-day, or even, minute-to-minute feed of this content." There are only snapshots. CLOCKSS is developing the ‘how to’ process to preserve these ‘snapshots’ across multiple locations, validating each against the other, and is also exploring the best pricing structures to preserve such content.
Other organizations include LOCKSS, The Digital Preservation Network, HathiTrust, Preservica, Archivematica, and Rosetta, whose recent clients are the State Library of New South Wales and the State Library of Queensland.
Digital preservation is clearly gaining momentum, growing in both size and complexity. "Clearly progress is being made and you can measure that by the maturity of solutions on offer." But for most organizations, the urgency of digital preservation has yet to hit home.
"Trying to sell the idea of digital preservation on the basis of return on investment has been very hard. By its nature, it’s a long-term activity and you’re really hedging your bets against future risks. I think we are still in the very early days of genuinely understanding the value of digital assets... and transferring this understanding over to financial assets doesn’t yet work very well." The European consortium 4C (Collaboration to Clarify the Costs of Curation) has been investigating this problem. Its road map helps organisations appraise digital assets, adopt a strategy to grow preservation assets and develop costing processes. The consortium has also developed a model for curation costs. The only way to understand the costs of preservation is through sharing, openness and collaboration.
Thursday, February 5, 2015
Ex Libris plugins for Rosetta on Github
Ex Libris plugins for Rosetta on Github. January 2015.
The Github site for Rosetta plugins. Includes:
- Rosetta.Jpylyzer Jpylyzer technical metadata plugin
- rosetta.JWPlayer
- rosetta.IABookReader
- rosetta.Jpylyzer-MDExtractorPlugin
- rosetta.Drmlint-RiskExtractorPlugin
- rosetta.Split2JpgMigrationToolPlugin Migration Tool Plug-in to split multi-page files to multiple jpg files
- rosetta.IIPImageVPP Rosetta IIPImage Viewer Pre-Processor Plugin
Tuesday, February 3, 2015
The Cobweb. Can the Internet be archived?
The Cobweb. Can the Internet be archived? Jill Lepore. The New Yorker. January 26, 2015.
The average life of a Web page is about a hundred days. Pages can disappear through “link rot,” or a reader may see an updated page where the original has most likely been overwritten. Or the page may have been moved and something else now sits where it used to be, a problem known as “content drift.” This is worse than an error message, since it is impossible to tell that what you are seeing is not what you went to look for: the overwriting, erasure, or moving of the original is invisible.
Link rot and content drift, collectively known as “reference rot,” have been disastrous for the law and courts. In providing evidence, legal scholars, lawyers, and judges often cite Web pages in their footnotes; they expect that evidence to remain where they found it as their proof. But a 2013 survey of law- and policy-related publications found that after six years, nearly fifty per cent of the URLs cited in those publications no longer worked. A Harvard Law School study in 2014 showed “more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information.”
The overwriting, drifting, and rotting of the Web also affects engineers, scientists, and doctors. Recently, researchers at Los Alamos National Laboratory reported the results of a study of three and a half million scholarly articles published in science, technology, and medical journals between 1997 and 2012: one in five links provided in the notes suffers from reference rot.
The problem of disappearing links has been known since the start of the internet. Tim Berners-Lee proposed the HTTP protocol to link web pages and had also considered a time axis for the protocol, but “preservation was not a priority.” Other internet pioneers are also concerned. Vint Cerf has talked about the need for long-term storage, a “digital vellum”: “I worry that the twenty-first century will become an informational black hole.” Brewster Kahle started the Internet Archive, which has archived more than four hundred and thirty billion Web pages.
Herbert Van de Sompel has been working on Memento, which allows a user to view a page as it existed around the time it was cited.
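A hedged sketch (my own, not from the article) of how Memento works in practice: a client asks a TimeGate for a page "as of" a date by sending an Accept-Datetime header (RFC 7089). It assumes the Time Travel aggregator exposes a TimeGate at http://timetravel.mementoweb.org/timegate/<URI>; the endpoint and its behaviour may differ or change.

```python
# Sketch: request the memento (archived snapshot) nearest to a given date.
import urllib.request

def nearest_memento(original_uri: str, http_date: str) -> str:
    req = urllib.request.Request(
        "http://timetravel.mementoweb.org/timegate/" + original_uri,
        headers={"Accept-Datetime": http_date},  # e.g. "Mon, 26 Jan 2015 00:00:00 GMT"
    )
    with urllib.request.urlopen(req) as resp:    # redirects are followed automatically
        return resp.geturl()                     # URI of the selected snapshot

if __name__ == "__main__":
    print(nearest_memento("http://example.org/", "Mon, 26 Jan 2015 00:00:00 GMT"))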
Labels:
digital preservation,
persistent ID,
web archiving
Open Preservation Foundation to provide sustainable home for JHOVE
Open Preservation Foundation to provide sustainable home for JHOVE. Becky. Open Preservation Foundation Blog. 3 Feb 2015.
The Open Preservation Foundation is taking stewardship of the JHOVE preservation tool and providing a sustainable home. The tool will become part of the OPF software portfolio and follow their Software Maturity Model. Portico is contributing code improvements that they have made to the tool. Other tools in the portfolio include:
- Jpylyzer: JP2 image validator and properties extractor (see the usage sketch after this list)
- FIDO: command-line tool to identify the file formats of digital objects.
- Matchbox: duplicate image detection tool
- xcorrSound: four tools to improve Digital Audio Recordings
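A rough usage sketch (my own, not from the OPF announcement) for the Jpylyzer tool listed above: it assumes jpylyzer is installed as a command-line tool that writes an XML report to standard output; the exact element name for validity varies between versions, so the code matches on the tag prefix.

```python
# Sketch: run jpylyzer on a JP2 file and pull out its validity flag.
import subprocess
import xml.etree.ElementTree as ET

def jp2_is_valid(path: str) -> bool:
    xml_out = subprocess.run(["jpylyzer", path],
                             capture_output=True, text=True, check=True).stdout
    root = ET.fromstring(xml_out)
    for elem in root.iter():
        # namespace-agnostic match on "isValid" / "isValidJP2"
        if elem.tag.split("}")[-1].startswith("isValid"):
            return (elem.text or "").strip().lower() == "true"
    raise ValueError("no validity element found in jpylyzer output")

if __name__ == "__main__":
    print(jp2_is_valid("example.jp2"))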
Labels:
digital preservation,
JHOVE,
preservation tools
Office Opens up with OOXML
Office Opens up with OOXML. Carl Fleischhauer, Kate Murray. The Signal. February 3, 2015.
Nine new format descriptions have been added to the Library’s Format Sustainability Web site. These closely related formats relate to the Office Open XML (OOXML) family, which are the formats of the Microsoft family of “Office” desktop applications, including Word, PowerPoint and Excel. Formerly, these applications produced files in proprietary, binary formats with the extensions doc, ppt, and xls. The current versions employ an XML structure for the data and an x has been added to the extensions: docx, pptx, and xlsx.
"In addition to giving the formats an XML expression, Microsoft also decided to move the formats out of proprietary status and into a standardized form (now focus on the word Open in the name.) Three international organizations cooperated to standardize OOXML."
The list of the nine:
- OOXML_Family, OOXML Format Family, ISO/IEC 29500 and ECMA 376
- OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012
- DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
- DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
- PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
- PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
- XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
- XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
- MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2012, ECMA-376, Editions 1-4
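Because the OOXML formats follow the Open Packaging Conventions, a docx, pptx, or xlsx file is simply a ZIP container whose parts are declared in [Content_Types].xml. A short sketch (my own, not from the post) that lists those parts; the content-types namespace is the standard OPC one.

```python
# Sketch: open an OOXML file as an OPC (ZIP) package and list its declared parts.
import sys
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

def list_parts(path: str) -> None:
    with zipfile.ZipFile(path) as pkg:
        print("ZIP entries:", ", ".join(pkg.namelist()[:10]), "...")
        root = ET.fromstring(pkg.read("[Content_Types].xml"))
    for child in root:
        if child.tag == f"{CT_NS}Override":
            print(child.get("PartName"), "->", child.get("ContentType"))

if __name__ == "__main__":
    list_parts(sys.argv[1])  # e.g. a .docx, .pptx, or .xlsx file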
Labels:
digital preservation,
formats,
standards,
xml
Monday, February 2, 2015
Websites Change, Go Away and Get Taken Down
Websites Change, Go Away and Get Taken Down. Website. January 2015.
Perma.cc is a beta service that allows users to create citation links that will never break.
When a user creates a link, Perma.cc archives a copy of the referenced content, and generates a link to an unalterable hosted instance of the site. Regardless of what may happen to the original source, if the link is later published by a journal using the Perma.cc service, the archived version will always be available through the Perma.cc link.
When readers click on a Perma.cc link, they are directed to a page from which they can view either the original site (which may have changed since the link was created) or the archived copy of the site in its original state.
Perma.cc is an online preservation service developed by the Harvard Law School Library in conjunction with university law libraries across the country and other organizations in the “forever” business.
Saturday, January 31, 2015
Phase Two of POWRR: Extending the Reach of Digital Preservation Workshops
Phase Two of POWRR: Extending the Reach of Digital Preservation Workshops. Danielle Spalenka. January 27, 2015.
The Digital POWRR Project (Preserving digital Objects with Restricted Resources) will continue the POWRR workshops for two years.
Project team members realized that many information professionals feel overwhelmed by the scope of the digital preservation problem, which prevents them from implementing digital preservation activities. They found that digital preservation is best thought of as an incremental, ongoing, and ever-shifting set of actions, reactions, workflows, and policies. Digital preservation activities can be started by taking small steps to prioritize and triage digital collections, while working to build awareness and advocate for resources.
Some of the resources on the site include:
Friday, January 30, 2015
Memorial University of Newfoundland selects Ex Libris Solutions, including Rosetta
Memorial University of Newfoundland selects Ex Libris Solutions, including Rosetta. Press Release. Ex Libris. January 28, 2015.
Memorial University in Newfoundland and Labrador, Canada, has adopted a suite of Ex Libris solutions comprising the Alma library management solution, the Primo discovery and delivery solution, and the Rosetta digital asset management and preservation system. These solutions replace multiple disparate legacy systems used by the library.
Rosetta will enable Memorial University to manage and preserve its important collections of Newfoundland's history, including a huge collection of digitized newspapers. Using the Primo search interface for physical collections, digital and digitized assets, and electronic resources, Memorial will provide a seamless discovery experience to users, whatever their learning and teaching needs.
Thursday, January 29, 2015
University of Arizona selects Ex Libris Rosetta.
University of Arizona selects Ex Libris Rosetta. Press Release. Ex Libris. January 27, 2015.
The University of Arizona has adopted the Rosetta digital management and preservation solution. Rosetta will help the university provide sustained access to scholarly digital content and research to both university members and the broader academic community.
"After evaluating a number of commercial digital preservation systems, we found that Rosetta had the unique capabilities that Arizona requires. Our priorities for 2015 led us to seek a preservation solution that could be used collaboratively by a number of campuses. Rosetta's ability to provide end-to-end digital asset management and preservation for the vast array of assets and research data that the university possesses, its consortial architecture that allows participating institutions to maintain a degree of autonomy, and its ability to act as a transitional component between multiple display layers, made it the clear choice for Arizona."
Monday, January 26, 2015
ForgetIT
ForgetIT. Website. January 23, 2015.
While preservation of digital content is now well established in memory institutions such as national libraries and archives, it is still in its infancy in most other organizations, and even more so for personal content. ForgetIT combines three new concepts to ease the adoption of preservation in the personal and organizational context:
- Managed Forgetting: resource selection as a function of attention and significance dynamics. Focuses on characteristic signal reduction. It relies on information assessment and offers options such as full preservation, removing redundancy, resource condensation, and complete digital forgetting.
- Synergetic Preservation: making intelligent preservation processes a part of the content life cycle and by developing solutions for smooth transitions.
- Contextualized Remembering: keeping preserved content meaningful by combining context extraction and re-contextualization.
Digital Curation Foundations
Digital Curation Foundations. Stephen Abrams. California Digital Library. January 20, 2015. (PDF).
Digital curation is a complex of actors, policies, practices, and technologies that enables meaningful consumer engagement with authentic content of interest across space and time. The UC Curation Center defines its mission in terms of digital curation, rather than digital preservation, because that better expresses the need for coordinated activities of preservation of, and access to, managed assets. It also reflects the idea of ongoing enrichment of managed content rather than just maintaining the content over time. This should ideally start before the assets are created. The approach focuses on services rather than systems, and those services should be delivered where they are needed.
The laws of library science deal with use, service to users, and ongoing change. Every asset should be curated in order to be used, and assets should be usable when and where the user needs them and in accordance with the user's expectations. Digital curation activities must not only be sustainable but also capable of evolving to meet ever-changing needs and risks. This kind of service requires "administrative, financial, and professional support."
Curation decisions should be made with respect to an underlying theory or conceptual domain model based on first principles. The ultimate goal of digital curation is to deliver content. The digital curation field has reached a stage of maturity where it can usefully draw upon a rich body of theoretical research and practical experience.
- The curation imperative: providing highly available, responsive, comprehensive, and sustainable services for access to, and use and enhancement of, authentic digital assets over time.
- The primary unit of curation management is the digital object
- The true focus of curation is the underlying information meaning of the objects. "In other words, bits are the means, content is the ends."
- Creation / acquisition.
- Appraisal / selection.
- Preservation planning.
- Preservation intervention.
- Selection of appropriate curation service providers
- Appropriate micro services
Sunday, January 25, 2015
X-ray technique reads burnt Vesuvius scroll
X-ray technique reads burnt Vesuvius scroll. Jonathan Webb. BBC News. 20 January 2015.
Scientists are using a 3D X-ray imaging technique that can distinguish ink from papyrus to read rolled-up scrolls buried by Mount Vesuvius. The technique has identified a handful of Greek letters within a rolled-up scroll. [BYU has used multi-spectral imaging to read the blackened unrolled scroll fragments. More here.] The X-ray phase-contrast tomography technique detects the slight bumps the ink leaves on the surface rather than the chemistry of the ink itself: the ink never penetrated the fibres of the papyrus but sat on top of them, leaving the letters slightly raised. Curved letters that stand out from the papyrus fibres are easier to identify than square ones.
Saturday, January 24, 2015
Video Games and the Curse of Retro
Video Games and the Curse of Retro. Simon Parkin. New Yorker. January 11, 2015.
Almost two and a half thousand MS-DOS computer games have been added to the Internet Archive game collection (which says that "Through the use of the EM-DOSBOX in-browser emulator, these programs are bootable and playable.") The archive has rescued historical games which are unplayable unless you also have the original hardware.
Video games are more prone to obsolescence than other digital products. When hardware and software change, many games become unplayable. Unlike other digital media, video games rely on audiovisual reproduction and on a computer’s ability to execute the coded rules and instructions. Game publishers may not have an incentive to maintain older games, so they become obsolete.
Britain’s National Media Museum established the National Videogame Archive, which aims to “preserve, analyse and display the products of the global videogame industry by placing games in their historical, social, political and cultural contexts.” The Internet Archive, by contrast, makes games playable online. The games are part of our social, political, and cultural context. “We risk ending up in a ‘digital dark age’ because so much material that defines our current era is immaterial and ephemeral.” This is the motivation for many video-game preservationists: save everything before it’s lost, and let the future decide what matters in the long run.
Labels:
archives,
digital preservation,
game preservation
Friday, January 23, 2015
The Dataverse Network
The Dataverse Network. Harvard Dataverse Network. 2014.
The Dataverse Network is an open source application to publish, share, reference, extract and analyze research data. It makes data available to others and allows researchers to replicate one another's work. The network hosts multiple studies or collections of studies; each study contains cataloging information that describes the data, plus the actual data and complementary files.
The Dataverse Network project develops software, protocols, and community connections for creating research data repositories that automate professional archival practices, guarantee long term preservation, and enable researchers to share, retain control of, and receive web visibility and formal academic citations for their data contributions.
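As a hedged illustration (my own, not from the Dataverse site): most Dataverse installations expose a search API for discovering the hosted studies. The host name below is hypothetical, and the /api/search endpoint, its parameters, and the JSON response shape are assumptions that may differ by Dataverse version.

```python
# Sketch: query a (hypothetical) Dataverse installation's search API for datasets.
import json
import urllib.parse
import urllib.request

BASE = "https://dataverse.example.edu"   # hypothetical installation

def search_datasets(query: str, limit: int = 5) -> list:
    params = urllib.parse.urlencode({"q": query, "type": "dataset", "per_page": limit})
    with urllib.request.urlopen(f"{BASE}/api/search?{params}") as resp:
        data = json.load(resp)["data"]
    return [(item.get("name"), item.get("global_id")) for item in data.get("items", [])]

if __name__ == "__main__":
    for name, pid in search_datasets("digital preservation"):
        print(pid, name)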
Thursday, January 22, 2015
Fighting entropy and ISIL, one image at a time
Fighting entropy and ISIL, one image at a time. Whitney Blair Wyckoff. FedScoop. December 10, 2014.
United States security agencies are generating so much data that traditional disk media is being pushed to its limits, requiring new technologies to safely store all that information. Hitachi Data Systems has a new technology to preserve information on optical discs in an infinitely expandable array. The platform uses Blu-ray XL M-DISCs, which resist environmental conditions and can last for more than 1,000 years. The M-DISC optical solutions have proven survivability and durability. The system represents both "the highest reliability as well as the lowest overall cost of ownership representing superior savings in power, footprint and data reliability."
IT departments can supplement magnetic storage with optical media to create a preservation tier that enables IT managers to migrate data when they want to, not when the technology or media forces them to. This saves money and allows for more strategic long-term planning. Flash media, magnetic tape storage, and regular optical discs are all subject to deterioration and have short life spans. With additional storage servers, the amount of data that can be accessed is unlimited.
The system can preserve data for as long as necessary and provide access whenever needed. Benefits include lower operating costs through lower media-migration costs, wider environmental storage tolerances, migration-free technology upgrades, and high media longevity and durability.
"The cost savings is stark while the possibility of data loss is virtually eliminated."
Labels:
digital preservation,
Millenniata,
storage,
tape
Wednesday, January 21, 2015
How one of the world’s largest archives is managing the move from parchment to pixels
How one of the world’s largest archives is managing the move from parchment to pixels. David Clipsham. Blog. January 16 2015.
The role of the UK National Archives is to permanently preserve the records of the UK government that have been selected for their historic value. Because there was no authoritative source of information about file formats, they developed PRONOM, a registry of file formats and the applications required to open and read them, and DROID, a freely available open source tool to manage that data and information. Their approach to digital preservation, which they call parsimonious preservation, rests on essentially two principles:
- Understand what you have got
- Keep it safe
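"Understanding what you have got" usually starts with format identification against PRONOM. DROID itself is a Java application; as a lightweight stand-in that draws on the same PRONOM registry, the Open Preservation Foundation's FIDO tool can be scripted. A rough sketch (my own, not from the post), assuming fido is installed and on the PATH; its default output is a comma-separated line that includes the PRONOM PUID (e.g. fmt/40) and the format name, though the exact field layout varies by version.

```python
# Sketch: identify a file's format against PRONOM signatures using FIDO
# as a lightweight stand-in for DROID.
import subprocess
import sys

def identify(path: str) -> str:
    result = subprocess.run(["fido", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()  # inspect the line for the PUID and format name

if __name__ == "__main__":
    print(identify(sys.argv[1]))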
Labels:
archives,
digital preservation,
preservation tools
Creating and Archiving Born Digital Video
Creating and Archiving Born Digital Video. Library of Congress. December 2, 2014.
Four PDF documents from the Library of Congress / The FADGI Audio-Visual Working Group. They provide practical technical information for both file creators and file archivists to help them make informed decisions when creating or archiving born digital video files and to understand the long term consequences of those decisions.
- Part 1. Introduction. Explanatory document.
These recommended practices are intended to support informed decision-making and guide file creators and archivists as they seek out processes, file characteristics, and other practices that will yield files with the greatest preservation potential.
The documents and case histories show that there is no one answer to the question “what format should I use to ensure sustainable long term access for my born digital video files?” Instead, there is "a range of solutions based on the fitness for purpose concept where the workflows and deliverables achieve the specific goals set out for the project within the existing constraints and circumstances."
- Part 2. Eight Federal Case Histories. This report presents eight case histories documenting the current state of practice in six federal agencies working with born digital video, divided into three creating cases and three archiving cases. The goal of the three Creating case histories is to encourage a thoughtful approach from the very beginning of the video production project, one that takes sustainability and interoperability into account. The three Archiving case histories address the issues of moving files into repositories and explore long term retention and access. The report contains recommended practices, requirements, advice, examples of when following recommended practices is not practical, costs, and lessons learned. At the end are helpful File Characteristic Comparison Tables summarizing the specifications of the creating and archiving case histories, for both video and audio data.
- Part 3. High Level Recommended Practice. This document outlines a set of high level recommended practices for creating and archiving born digital video, with advice for file creators, advice for archivists, and advice for both that transcends life cycle points. Some important general points:
- Born digital video files should be the highest quality that the institution can afford to make and maintain over the long term.
- Project planning should include capabilities to create high quality digital video files and metadata from the outset
- One of the most important functions of archival repositories is to document their holdings.
- Identify the file characteristics at the most granular level possible, including the wrapper and video stream encoding (see the sketch after this list)
- It's essential in an archival environment to understand why changes to the technical characteristics of the file are needed and the impacts of these changes on the data.
- Equally important is to document all changes in order to establish provenance.
- Create metadata to support life cycle management
- Plan for access: high quality born digital video files may need additional processing to be made widely available
- Part 4. Resource Guide. This document includes links to resources including those referred to in the case histories and recommended practices. Contains an excellent resource list to websites, documents, white papers, tools; they cover the areas of storage; transcoding / editing and other technical tools; inventorying and processing; digitizing, capture, preservation & quality control; authenticity, fixity & integrity; file naming; metadata; formats; standards; video creation; equipment and capture devices.
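One of the recommended practices above is to record the wrapper and stream encodings at a granular level. As a rough sketch (my own, not from the FADGI documents), ffprobe can report both, assuming FFmpeg is installed; archives often use MediaInfo or similar tools for the same purpose.

```python
# Sketch: record a video file's wrapper (container) and per-stream encodings.
import json
import subprocess
import sys

def describe(path: str) -> dict:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-print_format", "json", path],
        capture_output=True, text=True, check=True
    ).stdout
    info = json.loads(out)
    return {
        "wrapper": info.get("format", {}).get("format_name"),  # e.g. "mov,mp4,m4a,3gp,3g2,mj2"
        "streams": [
            {"type": s.get("codec_type"), "codec": s.get("codec_name")}
            for s in info.get("streams", [])
        ],
    }

if __name__ == "__main__":
    print(json.dumps(describe(sys.argv[1]), indent=2))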