Data File Study Guide
📖 Core Concepts
Data file – a computer file that stores data for an application; it contains data only, not executable code.
Text file – stores data as human‑readable characters in an encoding such as ASCII or UTF‑8; each line ends with an End‑of‑Line (EOL) marker of one or two characters (\n or \r\n).
Binary file – stores data as raw bytes, often mirroring the layout the computer uses in memory; the bytes are not interpreted as characters, so there are no EOL delimiters and no automatic translations.
Closed (proprietary) format – the internal structure and metadata are not published; used to deter tampering or easy import by competitors.
Open format – internal structure and metadata are published, allowing anyone to read/write the file.
Index file – separate file that holds pointers to locations in a data file, speeding up searches.
Indexed file – a data file that embeds its own index structure for rapid key‑based record retrieval.
Database – a collection of organized data files that support efficient storage, retrieval, and manipulation.
Serialisation – converting in‑memory structures into a storable representation (text or binary) for later reconstruction.
---
📌 Must Remember
Text files = human‑readable + line delimiters → possible EOL translations.
Binary files = machine‑readable + no delimiters or translation → usually faster I/O.
Closed formats = unpublished structure → deter tampering, but risk vendor lock‑in.
Open formats = publish metadata → promote interoperability.
An index file ≠ an indexed file (separate vs embedded index).
Serialisation is needed before writing complex structures to any file type.
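As a rough illustration of the last point, the sketch below serialises the same small object once to text (JSON) and once to binary (`struct`), then reconstructs it from each. The `Point` class and the `"<ii"` layout are assumptions made up for this example.

```python
import json
import struct
from dataclasses import dataclass, asdict


@dataclass
class Point:  # hypothetical in-memory structure
    x: int
    y: int


p = Point(3, -4)

# Text serialisation: the object becomes a JSON string.
text_form = json.dumps(asdict(p))

# Binary serialisation: two little-endian signed 32-bit integers.
binary_form = struct.pack("<ii", p.x, p.y)

# Reconstruction from each stored representation.
from_text = Point(**json.loads(text_form))
from_binary = Point(*struct.unpack("<ii", binary_form))
```

Either representation can be written to a file; the choice mainly affects readability and size.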
---
🔄 Key Processes
Writing a Text File
Convert data to character strings.
Append the appropriate EOL sequence for the target OS.
Perform any required EOL translation (e.g., \n → \r\n).
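The steps above can be sketched in Python, where the `newline` argument of `open` controls EOL translation on write. The sample records and file name are hypothetical; forcing `"\r\n"` here is just one possible choice of target convention.

```python
import os
import tempfile

records = [("alice", 30), ("bob", 25)]  # hypothetical sample data

# Step 1: convert each record to a character string.
lines = [f"{name},{age}" for name, age in records]

path = os.path.join(tempfile.mkdtemp(), "people.txt")

# Steps 2-3: write "\n" after each line and let Python translate it.
# newline="\r\n" forces Windows-style endings regardless of the host OS;
# newline=None (the default) would use the platform's own convention.
with open(path, "w", encoding="utf-8", newline="\r\n") as f:
    for line in lines:
        f.write(line + "\n")

# Reading the raw bytes back shows the translated line endings.
with open(path, "rb") as f:
    raw = f.read()
```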
Writing a Binary File
Serialise data into binary representation (e.g., using struct.pack).
Write raw bytes directly; no line delimiters or translations.
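A minimal sketch of these two steps, assuming a made-up record layout of one unsigned 32-bit integer followed by one 64-bit float. The `"<"` prefix fixes little-endian byte order and disables padding, so every record occupies exactly 12 bytes.

```python
import os
import struct
import tempfile

# Hypothetical record layout: unsigned 32-bit id, 64-bit float.
RECORD = struct.Struct("<Id")

readings = [(1, 20.5), (2, 21.25)]  # hypothetical sample data

path = os.path.join(tempfile.mkdtemp(), "readings.bin")

# Mode "wb" writes bytes exactly as given: no EOL handling, no encoding.
with open(path, "wb") as f:
    for sensor_id, value in readings:
        f.write(RECORD.pack(sensor_id, value))

with open(path, "rb") as f:
    data = f.read()

# Fixed-size records can be decoded by stepping through the bytes.
decoded = [RECORD.unpack_from(data, i) for i in range(0, len(data), RECORD.size)]
```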
Creating an Indexed File
Choose a key field for each record.
Build an internal index (e.g., B‑tree) mapping keys → file offsets.
Store index alongside records within the same file.
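The following sketch shows one possible single-file layout: a header, the records, then the index. It is a simplification, not a production design: a sorted array searched with `bisect` stands in for a B-tree, and the header and entry layouts are assumptions invented for this example.

```python
import io
import struct
from bisect import bisect_left

HEADER = struct.Struct("<QI")   # index offset, number of index entries
ENTRY = struct.Struct("<IQI")   # key, record offset, record length


def build_indexed_file(records):
    """Return a file image holding header + records + sorted index."""
    buf = io.BytesIO()
    buf.write(HEADER.pack(0, 0))            # placeholder, patched below
    entries = []
    for key, payload in records.items():
        data = payload.encode("utf-8")
        entries.append((key, buf.tell(), len(data)))
        buf.write(data)
    index_offset = buf.tell()
    for entry in sorted(entries):           # index is stored sorted by key
        buf.write(ENTRY.pack(*entry))
    buf.seek(0)
    buf.write(HEADER.pack(index_offset, len(entries)))
    return buf.getvalue()


def lookup(image, key):
    """Search the embedded index for key, then slice out the record."""
    index_offset, count = HEADER.unpack_from(image, 0)
    keys = [ENTRY.unpack_from(image, index_offset + i * ENTRY.size)[0]
            for i in range(count)]
    i = bisect_left(keys, key)
    if i == count or keys[i] != key:
        return None
    _, offset, length = ENTRY.unpack_from(image, index_offset + i * ENTRY.size)
    return image[offset:offset + length].decode("utf-8")


image = build_indexed_file({42: "widget", 7: "gadget", 19: "gizmo"})
```

Calling `lookup(image, 7)` returns the matching record without scanning the other records, because the index supplies its offset and length directly.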
Using an Index File
Keep the data file unchanged.
Build a separate file containing offset pointers for each searchable record.
On a query, read the index file to locate the exact byte position, then fetch the record.
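A small sketch of this process, assuming a comma-separated data file whose first field is the key. The file names and the choice of JSON for the index file are hypothetical; any format that maps keys to byte offsets would serve.

```python
import json
import os
import tempfile

workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "data.csv")    # hypothetical data file
index_path = os.path.join(workdir, "data.idx")   # hypothetical index file

# Stand-in for a data file that must not be modified.
with open(data_path, "wb") as f:
    f.write(b"7,gadget\n19,gizmo\n42,widget\n")


def build_index(data_path, index_path):
    """Scan the data file once and record the byte offset of each line."""
    offsets = {}
    with open(data_path, "rb") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            key = line.split(b",", 1)[0].decode("ascii")
            offsets[key] = pos
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(offsets, f)


def fetch(data_path, index_path, key):
    """Read the index file, seek to the stored offset, return one record."""
    with open(index_path, encoding="utf-8") as f:
        offsets = json.load(f)
    if key not in offsets:
        return None
    with open(data_path, "rb") as f:
        f.seek(offsets[key])
        return f.readline().decode("ascii").rstrip("\n")


build_index(data_path, index_path)
record = fetch(data_path, index_path, "19")
```

The data file is opened in binary mode so that the recorded offsets are exact byte positions.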
---
🔍 Key Comparisons
Text file vs Binary file
Readability: Text = human readable; Binary = not readable.
Performance: Binary = usually faster I/O (no parsing or conversion); Text = slower because values must be converted to and from characters and line endings may be translated.
Portability: Text = more portable across platforms; Binary = may require endianness handling.
Closed format vs Open format
Metadata: Closed = hidden; Open = published.
Control: Closed = vendor‑controlled; Open = publicly specified, often maintained by a standards body or community.
Index file vs Indexed file
Location: Index file = separate; Indexed file = internal.
Maintenance: A separate index can be rebuilt without touching the data file, but goes stale if the data changes without it; an embedded index is updated as part of each data change.
---
⚠️ Common Misunderstandings
“Binary files are always smaller.” – Size depends on data representation; the integer 7 takes one byte as text but eight bytes as a fixed‑width 64‑bit binary field.
“Text files never need serialisation.” – Complex structures (e.g., objects) still require serialisation to a textual format (JSON, XML).
“Closed formats are more secure.” – Obscurity is not security; they can still be reverse‑engineered.
“An index file automatically makes a file indexed.” – The data file must be accessed through the index; the file itself remains non‑indexed.
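The first misunderstanding is easy to check with a quick size comparison. The sketch below assumes a fixed-width signed 64-bit field (`"<q"`) as the binary representation; other binary encodings, such as variable-length integers, would give different numbers.

```python
import struct

small = 7
large = 1234567890123

sizes = {
    "small_text": len(str(small).encode("ascii")),    # decimal digits as text
    "small_binary": len(struct.pack("<q", small)),    # fixed 64-bit field
    "large_text": len(str(large).encode("ascii")),
    "large_binary": len(struct.pack("<q", large)),
}
```

With this layout the small value is larger in binary (8 bytes vs 1), while the large value is larger as text (13 bytes vs 8).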
---
🧠 Mental Models / Intuition
“File as a book” – Text file = pages with visible line breaks; Binary file = a stack of raw pages where you need the exact page number (byte offset).
“Lock vs key” – Closed format = a lock to which only the creator holds the key; Open format = a lock with a publicly posted key.
“Map vs GPS” – Index file = external map you consult before traveling; Indexed file = built‑in GPS that knows the route instantly.
---
🚩 Exceptions & Edge Cases
End‑of‑Line conventions differ by OS (Windows \r\n, Unix \n); a Windows text file read on Unix with translation disabled shows a stray \r at the end of each line.
Binary files and endianness – on little‑ vs big‑endian machines, raw binary numbers may need byte‑order conversion.
Open formats may still have versioning – new fields can be added; old parsers may ignore unknown metadata.
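The first two edge cases can be reproduced in memory, as a quick sketch. The byte strings are made-up sample inputs; `io.TextIOWrapper` is used only to apply the same `newline` handling that `open` applies to real files.

```python
import io
import struct

# EOL: the same Windows-style bytes read with and without translation.
raw = b"a\r\nb\r\n"
translated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None).read()
untranslated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="").read()

# Endianness: the same integer packed in both byte orders.
value = 1
little = struct.pack("<I", value)
big = struct.pack(">I", value)

# Reading big-endian bytes as if they were little-endian gives a wrong value.
misread = struct.unpack("<I", big)[0]
# Stating the byte order explicitly recovers the original.
correct = struct.unpack(">I", big)[0]
```

Writing the byte order into the format string (`"<"` or `">"`) rather than relying on the machine's native order is a common way to keep binary files portable.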
---
📍 When to Use Which
Use text files for configuration, logs, or data that humans must edit/read.
Choose binary files for large numeric datasets, images, or when I/O speed is critical.
Prefer open formats when data exchange with other applications or future portability is needed.
Opt for closed formats when protecting intellectual property or discouraging casual tampering is a priority.
Deploy an index file when you cannot modify the original data file (e.g., third‑party data).
Build an indexed file when you control the file format and need high‑speed key lookups.
---
👀 Patterns to Recognize
EOL‑related bugs → look for mismatched line endings in cross‑platform text file handling.
Performance slowdown → heavy parsing of delimited text (line splitting, string‑to‑number conversion) often signals a text‑file bottleneck.
No published specification or metadata → likely a closed format; expect proprietary APIs.
Separate index file listed → expect two‑file handling logic in the code base.
---
🗂️ Exam Traps
“Binary files are always safer because they’re not human readable.” – Safety depends on access controls, not readability.
“If a file has an index, it must be an indexed file.” – Could be a separate index file; the data file may still be plain.
“Open formats cannot be proprietary.” – Some “open” formats have optional proprietary extensions; the core remains open.
“Serialisation only applies to text files.” – Serialisation is required for both text (e.g., JSON) and binary (e.g., protobuf) outputs.
---