Data file Study Guide

📖 Core Concepts

- Data file – a computer file that stores data for an application; it contains data only, not executable code.
- Text file – stores data as human-readable characters (ASCII/Unicode); each line ends with an End-of-Line (EOL) marker, which is one or two characters depending on the platform.
- Binary file – stores data in the same binary format the computer uses in memory; no EOL delimiters, no automatic translations.
- Closed (proprietary) format – the metadata describing the file's structure is hidden or unpublished; used to prevent tampering or easy import by competitors.
- Open format – internal structure and metadata are published, allowing anyone to read/write the file.
- Index file – a separate file that holds pointers to locations in a data file, speeding up searches.
- Indexed file – a data file that embeds its own index structure for rapid key-based record retrieval.
- Database – a collection of organized data files that support efficient storage, retrieval, and manipulation.
- Serialisation – converting in-memory structures into a storable representation (text or binary) for later reconstruction.

---

📌 Must Remember

- Text files = human-readable + line delimiters → possible EOL translations.
- Binary files = machine-readable + no delimiters → typically faster I/O, since nothing has to be parsed or translated.
- Closed formats = hide metadata → help protect data integrity and reinforce vendor lock-in.
- Open formats = publish metadata → promote interoperability.
- An index file ≠ an indexed file (separate vs embedded index).
- Serialisation is needed before writing complex structures to any file type.

---

🔄 Key Processes

Writing a Text File
1. Convert data to character strings.
2. Terminate each line with an EOL marker.
3. Translate the EOL marker to the target OS convention where required (e.g., \n → \r\n on Windows); many I/O libraries do this automatically in text mode.

Writing a Binary File
1. Serialise data into a binary representation (e.g., using Python's struct.pack).
2. Write the raw bytes directly; no line delimiters or translations are applied.

Creating an Indexed File
1. Choose a key field for each record.
2. Build an internal index (e.g., a B-tree) mapping keys → file offsets.
3. Store the index alongside the records within the same file.
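The "Creating an Indexed File" steps can be sketched in Python. This is a minimal illustration under stated assumptions, not a standard format: the file layout, the `IDX1` magic value, and the names `write_indexed_file`, `lookup`, and `demo` are all hypothetical, and a key-sorted table searched with `bisect` stands in for the B-tree mentioned above.

```python
import bisect
import os
import struct
import tempfile

# Hypothetical layout (an assumption for illustration, not a standard format):
#   header : magic (4 bytes), index offset (uint64), entry count (uint32)
#   records: UTF-8 payloads written back to back
#   index  : key-sorted (key uint32, offset uint64, length uint32) entries
HEADER = struct.Struct("<4sQI")
ENTRY = struct.Struct("<IQI")
MAGIC = b"IDX1"


def write_indexed_file(path, records):
    """Write {key: text} records followed by an embedded, key-sorted index."""
    entries = []
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, 0, 0))  # placeholder header, patched below
        for key in sorted(records):
            payload = records[key].encode("utf-8")
            entries.append((key, f.tell(), len(payload)))  # key -> file offset
            f.write(payload)
        index_offset = f.tell()
        for entry in entries:
            f.write(ENTRY.pack(*entry))
        f.seek(0)  # go back and record where the index starts
        f.write(HEADER.pack(MAGIC, index_offset, len(entries)))


def lookup(path, key):
    """Return the record text for key, or None if the key is absent."""
    with open(path, "rb") as f:
        magic, index_offset, count = HEADER.unpack(f.read(HEADER.size))
        if magic != MAGIC:
            raise ValueError("file does not use this sketch's layout")
        f.seek(index_offset)
        raw = f.read(ENTRY.size * count)
        entries = [ENTRY.unpack_from(raw, i * ENTRY.size) for i in range(count)]
        keys = [entry[0] for entry in entries]
        pos = bisect.bisect_left(keys, key)  # binary search over sorted keys
        if pos == count or keys[pos] != key:
            return None
        _, offset, length = entries[pos]
        f.seek(offset)  # jump straight to the record
        return f.read(length).decode("utf-8")


def demo():
    """Build a small indexed file in a temporary directory and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "people.idx")
        write_indexed_file(path, {30: "Carol", 10: "Alice", 20: "Bob"})
        return lookup(path, 20), lookup(path, 99)
```

Calling `demo()` returns `("Bob", None)`: the first lookup seeks directly to the stored offset, and the second finds no matching key. Because the index lives inside the same file, any change to the records means rewriting the index as well.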
Using an Index File
1. Keep the data file unchanged.
2. Build a separate file containing an offset pointer for each searchable record.
3. On a query, read the index file to locate the exact byte position, then fetch the record from the data file.

---

🔍 Key Comparisons

Text file vs Binary file
- Readability: Text = human-readable; Binary = not readable without a program that knows the layout.
- Performance: Binary = typically faster I/O; Text = slower because values must be parsed and line endings may be translated.
- Portability: Text = more portable across platforms; Binary = may require endianness handling.

Closed format vs Open format
- Metadata: Closed = hidden; Open = published.
- Control: Closed = vendor-controlled; Open = publicly specified, often maintained by a standards body or community.

Index file vs Indexed file
- Location: Index file = separate; Indexed file = internal.
- Maintenance: A separate index can be rebuilt without touching the data, but it goes stale if the data file changes underneath it; an embedded index must be updated with each data change.

---

⚠️ Common Misunderstandings

- “Binary files are always smaller.” – Size depends on data representation; binary can be larger if not compacted.
- “Text files never need serialisation.” – Complex structures (e.g., objects) still require serialisation to a textual format (JSON, XML).
- “Closed formats are more secure.” – Obscurity is not security; closed formats can still be reverse-engineered.
- “An index file automatically makes a file indexed.” – The data file must be accessed through the index; the file itself remains non-indexed.

---

🧠 Mental Models / Intuition

- “File as a book” – Text file = pages with visible line breaks; Binary file = a stack of raw pages where you need the exact page number (byte offset).
- “Lock vs key” – Closed format = a lock to which only the creator holds the key; Open format = a lock with a publicly posted key.
- “Map vs GPS” – Index file = external map you consult before traveling; Indexed file = built-in GPS that knows the route instantly.

---

🚩 Exceptions & Edge Cases

- End-of-Line conventions differ by OS (Windows \r\n, Unix-like systems \n); reading a text file written on a different platform may produce extra characters (e.g., a stray \r at the end of each line) if translation is disabled.
- Binary files and endianness – on little- vs big-endian machines, raw binary numbers may need byte-order conversion.
- Open formats may still have versioning – new fields can be added; old parsers may ignore unknown metadata.

---

📍 When to Use Which

- Use text files for configuration, logs, or data that humans must edit/read.
- Choose binary files for large numeric datasets, images, or when I/O speed is critical.
- Prefer open formats when data exchange with other applications or future portability is needed.
- Opt for closed formats when protecting intellectual property or preventing accidental corruption is a priority.
- Deploy an index file when you cannot modify the original data file (e.g., third-party data).
- Build an indexed file when you control the file format and need high-speed key lookups.

---

👀 Patterns to Recognize

- EOL-related bugs → look for mismatched line endings in cross-platform text file handling.
- Performance slowdown → line-by-line parsing of delimited text often points to a text-file bottleneck; a binary format may help.
- No published metadata or structure → likely a closed format; expect proprietary APIs.
- Separate index file listed → expect two-file handling logic in the code base.

---

🗂️ Exam Traps

- “Binary files are always safer because they’re not human readable.” – Safety depends on access controls, not readability.
- “If a file has an index, it must be an indexed file.” – The index could be a separate index file; the data file may still be plain.
- “Open formats cannot be proprietary.” – Some “open” formats have optional proprietary extensions; the core remains open.
- “Serialisation only applies to text files.” – Serialisation is required for both text (e.g., JSON) and binary (e.g., protobuf) outputs.
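The last trap can be made concrete with a short Python sketch that serialises the same record twice: once to text with `json` and once to bytes with `struct`. The record contents, the binary layout, and the function names are assumptions made for illustration; `struct` stands in for a binary format such as protobuf, which is not used here. The `<` prefix fixes little-endian byte order, which is one way to handle the endianness issue noted under Exceptions & Edge Cases.

```python
import json
import struct

# Hypothetical record, used only for illustration.
record = {"id": 7, "temperature": 21.5, "name": "probe-A"}

# Assumed binary layout: uint32 id, float64 temperature, uint16 name length,
# then the UTF-8 name bytes. "<" pins little-endian order with no padding,
# so the bytes decode the same way on any machine.
FIXED = struct.Struct("<IdH")


def to_text(rec):
    """Serialise a record to a JSON string (textual representation)."""
    return json.dumps(rec, sort_keys=True)


def from_text(text):
    """Rebuild the record from its JSON string."""
    return json.loads(text)


def to_binary(rec):
    """Serialise a record to bytes using the assumed fixed layout."""
    name = rec["name"].encode("utf-8")
    return FIXED.pack(rec["id"], rec["temperature"], len(name)) + name


def from_binary(data):
    """Rebuild the record from bytes produced by to_binary."""
    rec_id, temperature, name_len = FIXED.unpack_from(data, 0)
    name = data[FIXED.size:FIXED.size + name_len].decode("utf-8")
    return {"id": rec_id, "temperature": temperature, "name": name}


# Both representations reconstruct the original in-memory structure.
assert from_text(to_text(record)) == record
assert from_binary(to_binary(record)) == record

# The same integer has different byte layouts under each byte order.
assert struct.pack("<I", 1) == b"\x01\x00\x00\x00"
assert struct.pack(">I", 1) == b"\x00\x00\x00\x01"
```

The JSON string can be read in any text editor. The binary form needs the layout to be known in advance, which is exactly the metadata an open format publishes and a closed format withholds.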
