RemNote Community

Study Guide

📖 Core Concepts

- Compiler – Translates source-language code into a target language (usually lower-level) that can be executed by a computer.
- Source language – The original programming language written by the developer.
- Target language – The language output by the compiler (machine code, assembly, bytecode, or another high-level language).
- Interpreter – Executes source code directly (or after a quick bytecode step) without producing a permanent executable.
- Virtual Machine (VM) – An abstract runtime that executes bytecode; the VM hides hardware details from the program.
- Cross-compiler – Generates code for a different CPU/OS than the one on which the compiler runs.
- Bootstrap compiler – A temporary compiler used to build a more optimized, permanent compiler for the same language.
- Transpiler / source-to-source compiler – Converts code from one high-level language to another high-level language.
- Just-In-Time (JIT) compilation – Compiles hot code paths to native machine code at run time, after an initial interpretation step.

📌 Must Remember

- Compilation direction: high-level → low-level (assembly, object code, or machine code).
- Major compiler phases: Preprocessing → Lexical analysis → Parsing → Semantic analysis → IR generation → Optimization → Code generation.
- Three-stage architecture: Front end (syntax & semantics) → Middle end (architecture-independent optimizations) → Back end (target-specific optimizations & code emission).
- One-pass vs. multi-pass: One-pass reads the source once (simpler, faster); multi-pass reads the source/IR multiple times (enables more analysis & optimization).
- Performance hierarchy: Native compiled code > JIT-compiled code > Bytecode-interpreted code > Pure interpretation.
- Cross-compiler usage: Essential for embedded or foreign-platform development where the target lacks a full toolchain.
- Compiler correctness: Proven by formal methods or by extensive validation testing.

🔄 Key Processes

- Preprocessing – Handles macros (#define) and conditional compilation (#if).
- Lexical analysis – Scans characters and groups them into tokens (identifiers, literals, operators).
- Parsing – Builds a concrete syntax tree, then often an abstract syntax tree (AST).
- Semantic analysis – Performs type checking and scope resolution; builds the symbol table.
- IR generation – Converts the AST to an intermediate representation (e.g., three-address code).
- Optimization (middle end):
  - Local (basic block): dead-code elimination, constant propagation.
  - Procedural: inline expansion, loop transformations.
  - Interprocedural: whole-program analysis.
- Back-end code generation – Maps optimized IR to target instructions; performs register allocation, instruction scheduling, and peephole rewrites.
- JIT workflow – Interpret bytecode → profile hot spots → compile hot sections → replace the interpreted version with native code.

🔍 Key Comparisons

- Compiler vs. Interpreter – Compiler: produces standalone target code before execution. Interpreter: runs source (or bytecode) directly; no permanent target file.
- Native vs. Cross Compiler – Native: target platform = host platform. Cross: target platform ≠ host platform.
- One-pass vs. Multi-pass – One-pass: single read, limited analysis, faster compile time. Multi-pass: multiple reads, richer analysis/optimizations; needed when code refers to declarations that appear later in the source (forward references).
- Source-to-Source vs. Bytecode Compiler – Source-to-source: outputs another high-level language (often for portability). Bytecode: outputs a VM-specific, platform-independent low-level representation.

⚠️ Common Misunderstandings

- "Compiled languages are always faster." – Runtime JITs or high-quality interpreters can narrow the gap.
- "Assemblers are compilers." – Assemblers translate assembly to machine code; they do not perform high-level analyses.
- "A language is either compiled or interpreted." – That is an implementation choice (e.g., Python can be interpreted or JIT-compiled).
- "Cross compilers only target embedded systems." – They are also used for cross-platform desktop or mobile builds.
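The first two front-end phases can be watched in action with CPython's standard library, which exposes its own scanner (`tokenize`) and parser (`ast`). A minimal sketch — the source string is an arbitrary example:

```python
# Peek at CPython's own front end: lexical analysis, then parsing to an AST.
import ast
import io
import tokenize

source = "total = price * qty + 1"

# Lexical analysis: the scanner groups raw characters into tokens.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
]
print(tokens)
# [('NAME', 'total'), ('OP', '='), ('NAME', 'price'), ('OP', '*'),
#  ('NAME', 'qty'), ('OP', '+'), ('NUMBER', '1')]

# Parsing: the token stream becomes an abstract syntax tree (AST).
tree = ast.parse(source)
print(type(tree.body[0]).__name__)  # Assign
```

Note that the interpreter exposing these phases is itself evidence for the "compiled vs. interpreted is an implementation choice" point: CPython runs a full front end before it executes anything.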
🧠 Mental Models / Intuition

- Pipeline model: Think of the compiler as an assembly line – each phase adds a layer of refinement, from raw text to optimized machine instructions.
- Hot-spot model: In a JIT, "hot" code is like a popular road; the system builds a faster highway (native code) only where traffic is heavy.
- Front–middle–back analogy: Front end = inspection (what the code says); middle end = redesign (make it better); back end = construction (build the final house on a specific lot).

🚩 Exceptions & Edge Cases

- Bootstrap compiler – Used temporarily; the final compiler may be built with its own output.
- Compilers that target a virtual machine – Not cleanly classified as native or cross, because the VM abstracts the hardware.
- Generated C code – Often not human-readable; formatting is ignored.

📍 When to Use Which

- Cross compiler – Needed when the target CPU/OS differs from the development machine (e.g., building firmware for ARM on an x86 host).
- JIT – Preferred for long-running applications with repeatable hot loops (e.g., web servers, scientific simulations).
- One-pass compiler – Useful for simple languages or rapid-compile scenarios where heavy optimization isn't required.
- Multi-pass compiler – Chosen when language semantics require forward references or when aggressive optimizations are desired.
- Source-to-source compiler – Chosen for portability or to leverage an existing mature backend (e.g., transpiling to C for wider compiler support).

👀 Patterns to Recognize

- Presence of #ifdef / #define → the preprocessing stage is in play.
- Repeated errors on unknown identifiers → semantic analysis is likely missing a symbol-table entry.
- Performance boost after warm-up → JIT compilation of hot code paths.
- Large generated .c files with unreadable formatting → source-to-source compilation targeting C.
- Error messages referencing "register allocation" → back-end code generation stage.

🗂️ Exam Traps

- "Interpreters never use bytecode." – Many interpreters first compile to bytecode (e.g., CPython).
- "Assemblers are a type of compiler." – Assemblers lack high-level analyses; they are separate tools.
- "Cross compilers are always slower than native compilers." – Compile speed depends on the compiler's implementation, not on the host/target relationship.
- "JIT always yields native-speed performance." – Warm-up cost and profiling overhead can make short-running programs slower than ahead-of-time compiled code.
- "If a language is high-level, it must be interpreted." – High-level languages can be compiled (e.g., C, Go) or interpreted.

---

Use this guide to quickly recall the most exam-relevant facts, workflows, and decision points about compilers and their relatives.
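As a closing check, the CPython exam trap ("interpreters never use bytecode") is easy to verify yourself with the standard `compile` and `dis` tools — a minimal sketch:

```python
# CPython is not a pure interpreter: source is first compiled to bytecode,
# which the CPython virtual machine then executes.
import dis

# compile() produces a code object; the "target language" is CPython bytecode.
code = compile("x = 1 + 2", "<demo>", "exec")
print(type(code.co_code))   # <class 'bytes'> -- raw bytecode

# Even this "interpreted" pipeline optimizes: the constant expression
# 1 + 2 is folded to 3 at compile time (a peephole-style optimization).
print(3 in code.co_consts)  # True

# dis renders the bytecode instructions the VM actually runs.
dis.dis(code)
```

The constant-folding step also illustrates why "compiled vs. interpreted" is a spectrum: even a bytecode interpreter performs some classic compiler optimizations before execution.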