Digital Library Study Guide
📖 Core Concepts
Digital Library – An online collection of digital objects (text, images, audio, video, etc.) organized for access, retrieval, and long‑term preservation.
Born‑Digital vs. Digitized – Born‑digital items are created in digital form; digitized items are digital copies of physical objects.
Interoperability – Ability of different library systems to exchange metadata and services via standards (e.g., OAI‑PMH, Z39.50).
Metadata – Structured data that describes a digital object (author, title, subject, format, rights). Essential for search and preservation.
Digital Preservation – Strategies (migration, emulation, bit‑stream preservation) that keep digital content usable over time.
Legal Deposit & Copyright – Legal deposit laws require publishers to submit copies of their works to national libraries; in the United States, fair use and the DMCA library exemptions govern what may be copied digitally.
Recommender Systems – Algorithms (content‑based, collaborative, citation‑based) that suggest relevant resources to users.
---
📌 Must Remember
DELOS functional components: acquisition, storage, description, organization, navigation, access.
5S Theory elements: Streams (information flow), Structures (organization), Spaces (user context), Scenarios (tasks), Societies (social relations).
Key protocols: OAI‑PMH (metadata harvesting), Z39.50 (distributed search).
Preservation goals: Keep bits intact (checksums), keep formats readable (migration), keep original environment (emulation).
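Fair‑use factors: purpose and character of the use, nature of the work, amount used, effect on the market.
DMCA exemption for nonprofit libraries and archives: up to three preservation copies of a work; copies may be digital, but digital copies may not be made available to the public outside the library premises.
Persistent identifiers: DOI, Handle – keep citations stable even when an object's URL changes, as long as the identifier record is maintained.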
---
🔄 Key Processes
Metadata Harvesting (OAI‑PMH)
Repository exposes metadata via HTTP.
Harvester sends ListRecords request → receives XML records → stores locally for indexing.
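The harvesting steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full harvester: the base URL and the sample response are made up, `oai_dc` (unqualified Dublin Core) is assumed as the metadata format, and resumption tokens for large result sets are ignored.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def build_list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the HTTP GET URL for a ListRecords request."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def parse_records(xml_text):
    """Return (identifier, title) pairs found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        records.append((identifier, title))
    return records

# Hypothetical response body; a real harvester would fetch it over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(build_list_records_url("https://example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

The parsed pairs are what a harvester would store locally and hand to its indexer.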
Distributed Searching (Z39.50)
User query → parallel requests to multiple servers → results merged → duplicates removed → final ranked list returned.
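The query → parallel requests → merge → deduplicate → rank flow can be sketched as follows. The two "servers" are hypothetical in-memory lists standing in for remote Z39.50 targets, and the sketch assumes every hit carries a shared identifier and a comparable score, which real servers often do not provide.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote servers and their holdings.
SERVERS = {
    "library_a": [
        {"id": "isbn:111", "title": "Digital Libraries", "score": 0.9},
        {"id": "isbn:222", "title": "Metadata Basics", "score": 0.4},
    ],
    "library_b": [
        {"id": "isbn:111", "title": "Digital Libraries", "score": 0.7},
        {"id": "isbn:333", "title": "Digital Preservation", "score": 0.8},
    ],
}

def search_server(name, query):
    """Return the hits on one server whose title contains any query term."""
    terms = query.lower().split()
    return [dict(hit, source=name) for hit in SERVERS[name]
            if any(term in hit["title"].lower() for term in terms)]

def federated_search(query):
    # Send the query to every server in parallel.
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_server(name, query), SERVERS))
    # Merge and deduplicate, keeping the best-scoring copy of each item.
    best = {}
    for hits in result_lists:
        for hit in hits:
            current = best.get(hit["id"])
            if current is None or hit["score"] > current["score"]:
                best[hit["id"]] = hit
    # Return one ranked list.
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)

for hit in federated_search("digital"):
    print(hit["id"], hit["source"], hit["score"])
```

Sorting by the servers' own scores only works here because the toy scores share a scale; in practice each server ranks differently, which is why merged rankings can be inconsistent.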
Digital Preservation Workflow
Ingest → generate fixity checksum → store raw bit‑stream.
Monitoring → periodic checksum verification.
Migration → convert to current formats when old formats become obsolete.
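The ingest and monitoring steps can be illustrated with a small fixity check. This is a sketch under assumptions: SHA-256 is chosen as the checksum algorithm, the manifest is a plain dictionary rather than a real preservation database, and the `ingest`, `verify`, and `demo` names are invented for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def ingest(path, manifest):
    """Record the fixity checksum of an object at ingest time."""
    manifest[str(path)] = sha256_of(path)

def verify(manifest):
    """Return the paths whose current checksum differs from the recorded one."""
    return [path for path, recorded in manifest.items()
            if sha256_of(path) != recorded]

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        obj = Path(tmp) / "object.bin"
        obj.write_bytes(b"original bit-stream")
        manifest = {}
        ingest(obj, manifest)
        clean = verify(manifest)        # nothing has changed yet
        obj.write_bytes(b"corrupted bit-stream")  # simulate silent corruption
        damaged = verify(manifest)      # the periodic check now flags the file
        return clean, damaged

print(demo())
```

A flagged path would normally trigger restoration from a replica; migration is a separate step that produces a new file with its own checksum.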
Recommender Generation (Content‑Based)
Extract item attributes (keywords, subject ontology).
Compute similarity (e.g., cosine similarity) between user‑profile vector and item vectors.
Rank and present top‑N items.
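The three steps above map onto a short sketch. It assumes items and the user profile are represented as simple keyword-count vectors; the item names and keywords are invented for illustration.

```python
import math
from collections import Counter

def vectorize(keywords):
    """Turn a list of keywords into a term-count vector."""
    return Counter(keywords)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(profile, items, n=2):
    """Rank items by similarity to the user profile and return the top n ids."""
    scored = [(cosine_similarity(profile, vec), item_id)
              for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:n]]

# Hypothetical catalogue and a profile built from the user's past reading.
ITEMS = {
    "doc_preservation": vectorize(["preservation", "migration", "emulation"]),
    "doc_metadata": vectorize(["metadata", "harvesting", "preservation"]),
    "doc_copyright": vectorize(["copyright", "fair-use"]),
}
PROFILE = vectorize(["preservation", "emulation", "migration", "preservation"])

print(recommend(PROFILE, ITEMS))
```

Real systems usually weight terms (for example with TF-IDF) rather than using raw counts, but the ranking logic is the same.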
Legal Deposit Submission
Publisher delivers copy → national library archives → metadata indexed → public access (subject to rights).
---
🔍 Key Comparisons
Digital Library vs. Virtual Library – The terms were once used interchangeably, but virtual library now usually denotes a portal aggregating distributed collections, while digital library emphasizes the underlying digital objects.
Keyword Search vs. Semantic Search – Keyword matches exact terms; semantic interprets concepts using ontologies (subject, community‑aware, bibliographic).
Migration vs. Emulation – Migration rewrites content into new formats; emulation recreates the original software/hardware environment.
Content‑Based vs. Collaborative Filtering – Content‑based uses item attributes; collaborative leverages similarity of user behavior.
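The keyword vs. semantic distinction in the list above can be made concrete with a toy example. The mini-ontology and documents are hypothetical, and expanding a query through a lookup table is only one simple way to approximate concept matching.

```python
# Hypothetical mini-ontology: a concept mapped to its related terms.
ONTOLOGY = {
    "preservation": {"preservation", "archiving", "migration", "emulation"},
    "metadata": {"metadata", "cataloging", "dublin core"},
}

DOCUMENTS = {
    1: "Format migration strategies for aging file types",
    2: "Digital preservation policy handbook",
    3: "Introduction to cataloging rules",
}

def keyword_search(query):
    """Match only documents that contain the literal query string."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if q in text.lower())

def semantic_search(query):
    """Expand the query to related ontology terms before matching."""
    terms = ONTOLOGY.get(query.lower(), {query.lower()})
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if any(term in text.lower() for term in terms))

print(keyword_search("preservation"))   # literal matches only
print(semantic_search("preservation"))  # also finds the migration document
```

The semantic version retrieves the migration document even though the word "preservation" never appears in it.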
---
⚠️ Common Misunderstandings
“Digitization equals preservation.” – Digitization creates access copies but does not solve long‑term format or bit‑stream decay.
“All born‑digital items need less metadata.” – They actually require richer metadata (creation tools, provenance) than simple scans.
“Open access removes all copyright concerns.” – OA works still require licensing (e.g., Creative Commons) and may be subject to publisher agreements.
“Distributed search always yields better results.” – It can suffer from inconsistent ranking across servers; harvested‑metadata search offers more control.
---
🧠 Mental Models / Intuition
Library as a “Pipeline”: acquisition → storage → description (metadata) → organization (indexes) → navigation → access. Think of each stage as a conveyor belt; a break in any stage stops the flow.
Preservation “Three‑Legged Stool”: Bit‑stream integrity, format readability, contextual information. All three must be maintained for true longevity.
Search as “Fishing”: Keyword search = casting a net for exact fish; semantic search = using sonar to locate fish based on shape (concept).
---
🚩 Exceptions & Edge Cases
Legal Deposit Exceptions: Some countries allow embargo periods; certain media (e.g., software) may be excluded.
DMCA Exemption Limits: At most three copies per work; digital copies are for preservation only and may not be distributed or made available to the public outside the library premises.
Metadata Quality: Born‑digital items may lack traditional bibliographic fields (e.g., edition); use technical metadata (file format, creation date) instead.
Preservation Costs: Small institutions may rely on cloud‑based trusted repositories rather than building in‑house infrastructure.
---
📍 When to Use Which
Choose OAI‑PMH when you need to harvest metadata for building a local search index or aggregating multiple repositories.
Use Z39.50 (distributed search) if real‑time querying of remote collections is required and you can tolerate heterogeneous ranking.
Apply Migration for static documents, converting them to widely used, stable formats (e.g., PDF/A) to future‑proof access.
Select Emulation for complex, interactive objects (software, games) where preserving behavior is essential.
Pick Content‑Based Filtering when item metadata is rich and you want recommendations without extensive user history.
Pick Collaborative Filtering when you have abundant user interaction data but sparse item metadata.
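To contrast with content-based filtering, here is a minimal user-based collaborative filtering sketch. It uses only interaction data (who borrowed what) and no item metadata; the loan records are invented, and Jaccard similarity is one assumption among several possible similarity measures.

```python
# Hypothetical loan history: user -> set of borrowed items.
LOANS = {
    "alice": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b", "book_d"},
    "carol": {"book_e"},
}

def jaccard(a, b):
    """Overlap between two sets as a fraction of their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_for(user, loans, n=3):
    """Suggest unseen items borrowed by users with similar loan histories."""
    seen = loans[user]
    scores = {}
    for other, items in loans.items():
        if other == user:
            continue
        sim = jaccard(seen, items)
        if sim == 0:
            continue  # users with no overlap contribute nothing
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]

print(recommend_for("alice", LOANS))
print(recommend_for("carol", LOANS))
```

Carol gets no suggestions because she shares no loans with anyone, a small illustration of the cold-start problem that makes content-based filtering the better choice when user history is thin.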
---
👀 Patterns to Recognize
“Access + Preservation = Metadata” – Whenever a new digital object is added, expect a metadata creation step followed by preservation (checksum, storage).
“Harvest → Index → Search” – Harvested‑metadata pipelines always include a local indexing stage before the user can search.
“Legal Deposit → National Repository → Public Portal” – Standard flow for government‑mandated collection of published works.
“Fair‑Use Defense = Transformative + Limited Market Impact” – Digitization projects that add searchability and do not replace the original market often succeed.
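The "Harvest → Index → Search" pattern in the list above can be sketched with a tiny inverted index. The harvested records are hypothetical, only titles are indexed, and tokenization is a plain whitespace split; production systems use a dedicated search engine instead.

```python
from collections import defaultdict

# Hypothetical records, as they might look after harvesting.
HARVESTED = [
    {"id": "oai:example.org:1", "title": "Digital Preservation Basics"},
    {"id": "oai:example.org:2", "title": "Metadata for Digital Libraries"},
]

def build_index(records):
    """Map each lowercase title token to the ids of records containing it."""
    index = defaultdict(set)
    for record in records:
        for token in record["title"].lower().split():
            index[token].add(record["id"])
    return index

def search(index, query):
    """Return ids of records whose titles contain every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(matches)

index = build_index(HARVESTED)
print(search(index, "digital"))
print(search(index, "digital metadata"))
```

Because the index is local, ranking and matching rules are fully under the aggregator's control, unlike in distributed search.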
---
🗂️ Exam Traps
Distractor: “Digital libraries eliminate all preservation costs.” – Wrong; they shift costs to digital storage, migration, and integrity monitoring.
Distractor: “Semantic search only needs keyword matching.” – Incorrect; it relies on ontologies and concept mapping beyond literal terms.
Distractor: “OAI‑PMH is a search protocol.” – It is a harvesting protocol, not a real‑time search engine.
Distractor: “DMCA completely blocks any copying for libraries.” – The DMCA includes specific exemptions for nonprofit libraries.
Distractor: “All born‑digital items are automatically open access.” – False; rights and licensing still apply.
---