
Preservation (library and archival science) - Digital Preservation Strategies

Understand digitization processes and challenges, preservation issues for born‑digital materials, and the balance between digital copies and original artifacts.

Summary

Digitization and Digital Preservation

Introduction

Digitization, the conversion of analog materials into digital form, represents one of the most significant challenges facing modern libraries and archives. While digitization promises to make rare and fragile materials accessible to millions of people, it also raises profound questions about preservation, authenticity, and the role of original documents. This section explores both the benefits and the complex problems that arise when institutions attempt to transform their collections into digital formats.

What Is Digitization?

Digitization is the process of converting physical or analog materials (such as books, manuscripts, photographs, and documents) into digital formats, typically through scanning. The goal is to create a digital surrogate that preserves the intellectual and informational content of the original material.

However, digitization is not the same as preservation. Digitization produces a copy; it is not a preservation strategy in itself. A digital file is preserved only if institutions maintain it properly over time.

The Challenges of Digitization

Digitization sounds straightforward, but it faces several practical and technical obstacles:

Cost of Digital Storage

Digital files require significant storage infrastructure and ongoing maintenance costs. Creating millions of digital copies is expensive, and these files must be stored redundantly to prevent loss. The costs do not end after the initial scanning: institutions must continuously update storage systems and migrate files to new formats as technology evolves.

Format Obsolescence

Digital formats become outdated. A file created in a format that was standard 20 years ago may no longer be readable by modern software. This creates a fundamental problem: how can institutions guarantee that a digital file created today will still be accessible in 50 or 100 years?
Unlike a physical book, which can be read with no technology, a digital file depends entirely on having compatible software and hardware available.

Lack of Guaranteed Backward Compatibility

As technology advances, older formats are often abandoned. There is no guarantee that new systems will be able to read old digital formats. For example, proprietary formats developed by specific companies may disappear entirely if those companies cease to exist or stop supporting the format.

These challenges mean that digitization alone cannot be viewed as a complete preservation solution. Institutions must plan for ongoing management of digital materials, including periodic format migration and file verification.

The Quality Versus Access Trade-off

One of the most vexing decisions in digitization is balancing image quality against accessibility. This creates a genuine dilemma:

Higher Quality, Higher Cost

Scanning at higher resolutions produces images that capture more detail and are more valuable for future scholarly use. However, higher-resolution images take longer to scan, require more storage space, and cost more money. A high-resolution scan of a rare medieval manuscript might capture fine details that are crucial for future researchers, but an institution with limited resources might only be able to afford lower-resolution scans.

The Fragility Problem

Some materials are so fragile that the scanning process itself risks damage. Repeatedly opening a delicate 500-year-old book for high-quality scanning might cause tears, binding damage, or loss of ink. This creates an uncomfortable situation: do you risk damaging the original to create a high-quality digital copy, or do you make a lower-quality scan to protect the artifact?

Libraries often resolve this tension by retaining the original (see "The Practice of Retaining Originals" below), but the dilemma highlights a fundamental issue: digitization is not a risk-free process.
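
The link between scan resolution and storage cost described above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming uncompressed 24-bit color images and ignoring file headers and compression; the function names and the example page size are illustrative assumptions, not figures from any particular digitization program.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Approximate size in bytes of one uncompressed raster scan.

    Assumes a flat page of width_in x height_in inches scanned at `dpi`
    pixels per inch; headers and compression are ignored.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def collection_gigabytes(pages, width_in, height_in, dpi, copies=1):
    """Approximate storage in decimal gigabytes for a whole collection.

    `copies` models redundant storage (hypothetical default: one copy).
    """
    per_page = uncompressed_scan_bytes(width_in, height_in, dpi)
    return pages * copies * per_page / 1e9


if __name__ == "__main__":
    # Compare two resolutions for the same 8 x 10 inch page.
    for dpi in (300, 600):
        size_mb = uncompressed_scan_bytes(8, 10, dpi) / 1e6
        print(f"{dpi} dpi: {size_mb:.1f} MB per page")
```

Under these assumptions, doubling the resolution quadruples the pixel count: an 8 x 10 inch page comes to about 21.6 MB at 300 dpi and about 86.4 MB at 600 dpi, before any redundant copies are counted.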
Copyright Concerns

Digitization must navigate complex copyright law. When an institution creates a digital copy of a published work, it is making a reproduction, and reproduction is a right that copyright law typically reserves to the copyright holder. This is particularly challenging for:

- Published works still under copyright: Even if an institution owns a physical copy of a book, digitizing it may infringe on the copyright holder's exclusive right to make reproductions.
- Preservation purposes: Many copyright laws include "fair use" provisions or specific exceptions for preservation, but these exceptions are not universal or clearly defined. An institution digitizing materials for preservation may need to obtain explicit permission from copyright holders or rely on legal arguments that its use is permissible.
- Orphan works: Some materials have unknown copyright holders, making it legally risky to digitize them without permission.

These copyright concerns significantly slow down digitization projects and require institutions to balance their preservation mission with respect for intellectual property rights.

Born-Digital Materials: A New Preservation Problem

What Are Born-Digital Materials?

Born-digital materials are items that are created, stored, and used entirely in digital form from the outset. Unlike a digitized book (which is an analog work converted to digital), born-digital materials never existed in physical form. Examples include:

- Email archives
- Digital photographs taken with digital cameras
- Word processing documents
- Websites
- Audio and video files created digitally
- Software and databases

The distinction is simple: if a document was originally printed and then scanned, it is digitized material. If it was created as a Word document and never printed, it is born-digital. This distinction matters because born-digital materials present fundamentally different preservation challenges.
Unique Preservation Challenges for Born-Digital Materials

Born-digital materials face preservation challenges that don't apply to traditional print materials:

Digital Decay and Media Failure

Digital storage media (hard drives, solid-state drives, magnetic tape) are not permanent. Hard drives fail. Magnetic tape degrades over time. Bit rot, the gradual corruption of data, can occur even in well-maintained storage systems. Unlike a printed book, which can sit on a shelf indefinitely, digital files require active management and periodic testing to ensure they haven't degraded.

Format Obsolescence

The file formats used for born-digital materials can become obsolete. Someone creating a document in Microsoft Word 2000 format might find that modern Word versions no longer fully support that format, or that the file is unreadable in 30 years. This problem is especially acute for specialized formats (scientific data, CAD files, etc.) that may have limited software support.

Software Dependencies

Many born-digital materials can only be meaningfully accessed through specific software. A complicated Excel spreadsheet with macros, an interactive PDF, or a video file with proprietary codecs all depend on having the right software available. Preserving the intellectual content might require preserving not just the file, but also the software environment in which it was created.

Rapid Proliferation

Born-digital materials are being created at unprecedented rates. A single email account can contain thousands of messages. A digital photography project can generate millions of image files. Institutions cannot possibly preserve everything, forcing them to make difficult selection decisions.

What Institutions Must Do

Libraries and archives increasingly recognize that born-digital materials require:

Institutional Policies and Procedures

Institutions must develop explicit policies for which born-digital materials to preserve, how long to retain them, and what metadata to capture.
These policies must address format selection, storage redundancy, and periodic review schedules.

Staying Current With Standards

Digital preservation is a rapidly evolving field. Standards and best practices change frequently. Preservation staff must commit to ongoing professional development and must stay informed about emerging technologies and methodologies. What is considered best practice today may be obsolete in five years.

Dedicated Resources and Staff

Born-digital preservation requires specialized expertise that libraries historically did not need. Institutions must hire or train staff with skills in data management, format migration, and digital systems management.

Integrating Born-Digital Materials Into Overall Collection Management

Born-digital materials should not be treated as a separate problem but should be integrated into an institution's overall preservation strategy. This requires:

Consistent Access and Discovery

Born-digital materials should be discoverable through the same catalogs and finding aids that users consult for traditional materials. A researcher looking for correspondence should be able to find both paper letters and born-digital emails through a unified search interface.

Collaborative Workflows

Preserving born-digital materials requires collaboration across departments. Librarians who understand collections, information technology specialists who understand systems, and preservation experts must work together. A born-digital collection is not the responsibility of any single department; it requires institutional coordination.

Selection Within Collections

Just as librarians have always made decisions about which physical items to acquire and retain, they must now make decisions about which digital materials are worth long-term preservation. These decisions should follow the same intellectual principles that guide traditional collection development.
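
The periodic testing and review schedules discussed in this section are often carried out as fixity checks: a checksum is recorded for each file at ingest and recomputed later to detect silent corruption or loss. The following is a minimal sketch using SHA-256 from Python's standard library. The function names and the in-memory manifest format are assumptions made for illustration; they do not represent any particular institution's workflow or preservation system.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under `root` (by relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Recompute checksums and report files that are missing or changed.

    Returns a dict of relative path -> problem description; an empty
    dict means every file listed in the manifest still matches.
    """
    root = Path(root)
    problems = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(target) != expected:
            problems[rel_path] = "checksum mismatch"
    return problems
```

In this sketch, build_manifest would be run once when a collection is ingested and verify_manifest on each scheduled review. A mismatch only detects damage; repairing it depends on having a redundant copy whose checksum still matches.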
The Critical Tension: Digital Surrogates Versus Original Materials

Why Original Materials Still Matter

Even as institutions invest heavily in digitization, there are compelling reasons why original materials cannot be abandoned:

Scholarly Authentication

Scholars, particularly in fields like history, literature, and diplomacy, require access to authentic original documents for several reasons:

- Citation and Verification: Scholars cite specific editions and versions of works. A digital surrogate, no matter how high-quality, is not the same as the original object that was cited. If a researcher needs to verify that a particular quote appears in a specific document, they may need to consult the original.
- Paleographic Analysis: Experts who study historical handwriting, printing techniques, or document materials need to examine originals. A digital image, no matter how detailed, cannot convey the texture of paper, the impressions made by writing instruments, or other physical characteristics that experts use to authenticate documents.

Legal and Regulatory Requirements

Legal systems often require retention of original documents as evidence. Court cases may require original signatures, documents, or other physical evidence. Tax records and financial documents often must be retained in original form to satisfy regulatory requirements. Institutions cannot discard originals and replace them with digital copies if legal standards require the originals themselves.

Concerns About Digital Formats

Even as technology advances, preservationists harbor legitimate concerns about relying exclusively on digital formats:

Longevity Uncertainty

No digital format has been proven to last more than a few decades. Books and manuscripts have survived for centuries, and some for millennia; we have no evidence that digital files will be readable in 200 years, no matter how carefully they are maintained. The technology required to access digital files (software, hardware, operating systems) evolves constantly.
Can we guarantee that in the year 2200, systems will exist to read files we create today?

Quality Loss in Reformatting

A digital copy, no matter how high-resolution, is fundamentally different from the original. A photograph captures two-dimensional image data but cannot convey the texture of aged paper, the weight of a book, the smell of old ink, or other sensory characteristics that are part of the original artifact's identity. Some scholars and preservationists argue that these material qualities are inseparable from the content itself.

Incompleteness of Digital Reproductions

Digital surrogates often lack complete metadata. Pages may be omitted from a scanning project. Binding information might not be captured in the digital image. A digital file represents a selection of the original object's characteristics, a selection determined by what the institution decided was worth capturing. Future researchers might discover that something important was overlooked.

The Practice of Retaining Originals

Due to these concerns, the contemporary best practice in preservation is dual retention: keeping both the original material and high-quality digital surrogates. This approach:

- Provides redundancy: If digital formats fail or become inaccessible, originals remain available.
- Serves multiple purposes: Digital versions serve broad public access; originals serve scholarly authentication and legal requirements.
- Allows for future digitization: As scanning technology improves, originals can be re-digitized at higher quality.
- Preserves material culture: The physical objects themselves are retained for study of their material properties.

This dual-retention approach is now standard in major research libraries, though it is expensive and requires significant storage space.
The Philosophical Problem: Digitization and Material Culture

There is a deeper philosophical tension in digitization that is worth understanding:

The Content Versus Material Distinction

Digitization assumes that the intellectual content of a book can be separated from its physical form. The goal of digitization is to capture the information (the text, images, data) while accepting the loss of the material (paper, binding, typeface as physical artifact).

However, some scholars argue this distinction is false. For historical documents, literary manuscripts, and fine press books, the material form is part of the content. Consider:

- A medieval illuminated manuscript: The art, the specific inks used, the parchment texture, and the hand-written script cannot be fully understood from a digital image. The materiality of the object is central to its historical significance.
- First editions of important literary works: Scholars value the specific edition, with its particular typeface and binding, because that is the form readers encountered at a historically important moment.
- Personal documents: A handwritten letter, with its specific handwriting, paper, and ink, conveys emotional and forensic information that a digital transcription loses.

Critics of aggressive digitization argue that institutions risk losing important information about material culture by privileging the intellectual content and neglecting the physical artifact.

This debate remains unresolved in the preservation community. It explains why, despite digitization's many benefits, original materials remain valuable and institutions must maintain them.

Summary: Digitization is not a magic solution to preservation challenges. It is one tool among many, useful for providing access and creating backup copies, but limited by technical, legal, and philosophical constraints.
The most thoughtful preservation strategies treat digitization as a complement to—not a replacement for—maintaining original materials and developing institutional capacity for long-term digital stewardship.
Flashcards
What is the definition of digitization?
The process of converting analog materials into digital form, typically through scanning.
In the context of digitization, what is the trade-off regarding higher-quality images?
They take longer to scan but are more valuable for future use.
What legal area must be navigated when creating digital copies for preservation?
Copyright law.
How are born-digital materials defined?
Materials created, stored, and used entirely in digital form without an analog predecessor.
What are the specific risks associated with the preservation of born-digital content?
Digital decay, format obsolescence, and storage media failure.
Which groups must collaborate to ensure the preservation of born-digital items?
Librarians, information-technology (IT) staff, and preservation specialists.
What is the legal motivation for retaining original physical documents?
They are often mandated as evidence by legal regulations.
What is the purpose of keeping original materials after digitization in modern libraries?
To serve as a fail-safe copy.
What is the primary goal of digitization regarding a book's content?
To preserve the intellectual content while ignoring material characteristics.
What is the critical argument against separating a book's physical attributes from its text?
The physical attributes are inseparable from the textual content.

Key Concepts
Digital Preservation Concepts
Digital Preservation
Born‑Digital Materials
Format Obsolescence
Digital Decay (Bit Rot)
Metadata
Legal and Policy Considerations
Copyright in Digitization
Preservation Policy
Quality‑Access Trade‑off
Physical vs. Digital Preservation
Digitization Process
Digitization