Introduction to Computer Architecture
Understand the fundamentals of computer architecture, covering instruction set architecture, microarchitecture, memory hierarchy, and performance trade‑offs.
Summary
Computer Architecture Overview
Computer architecture is the study of how computers are organized internally and how their components work together to execute programs quickly and efficiently. Rather than just looking at what a computer does from the outside, architecture examines the internal design decisions that make execution fast, power-efficient, and cost-effective.
The main goal of understanding computer architecture is to grasp why certain design choices matter: Why do we care about clock speed? How do computers execute billions of instructions every second? What tradeoffs do engineers make when designing a computer? These questions are central to architecture.
What Is the Instruction Set Architecture?
The instruction set architecture (ISA) is a fundamental concept that acts as a contract between software and hardware. It specifies exactly which commands (instructions) a processor understands and how those instructions behave.
Think of an ISA like a language. When you write code in a compiled language such as C, the compiler translates it into machine code—a sequence of instructions. But here's the key: a program compiled for one ISA will not run directly on a machine with a different ISA. For example, code compiled for x86 processors (found in many laptops) will not run on ARM processors (found in most smartphones) without recompilation.
The ISA defines:
Instruction format: How instructions are encoded as bits
Register size: How many bits each register holds
Memory addressing: How addresses reference memory locations
Available operations: What arithmetic, logical, and data movement operations exist
Common ISAs you'll encounter include x86, ARM, and RISC-V. Each has different design philosophies but serves the same fundamental purpose: defining what instructions a processor can execute.
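To make "instructions encoded as bits" concrete, here is a small sketch that decodes the fields of one real RISC-V R-type instruction—0x002081B3, the encoding of `add x3, x1, x2`. The field positions follow the RISC-V base instruction format:

```python
# Sketch: extracting the fields of a 32-bit RISC-V R-type instruction.
# 0x002081B3 encodes "add x3, x1, x2" in the RISC-V base ISA.

def decode_rtype(word):
    return {
        "opcode": word & 0x7F,          # bits 6:0, identifies the instruction class
        "rd":     (word >> 7) & 0x1F,   # bits 11:7, destination register
        "funct3": (word >> 12) & 0x7,   # bits 14:12
        "rs1":    (word >> 15) & 0x1F,  # bits 19:15, first source register
        "rs2":    (word >> 20) & 0x1F,  # bits 24:20, second source register
        "funct7": (word >> 25) & 0x7F,  # bits 31:25
    }

fields = decode_rtype(0x002081B3)
print(fields)  # opcode 0x33 with funct3=0 and funct7=0 means ADD
```

Every RISC-V processor, whatever its internal design, must interpret these bit fields the same way—that is exactly the "contract" the ISA provides.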
Microarchitecture: Bringing Architecture to Life
While the ISA specifies what instructions a processor understands, the microarchitecture specifies how the processor actually implements those instructions. This is an important distinction.
Different microarchitectures can implement the exact same ISA while delivering different speeds, power consumption, and costs. For example, Intel and AMD both make x86 processors—they implement the same ISA—but their internal designs (microarchitectures) differ significantly, leading to performance differences.
The Datapath: Core Processing Components
The datapath is where actual computation happens. It consists of three main components:
Arithmetic Logic Unit (ALU): This is the "calculator" of the processor. For every instruction that performs arithmetic or logical operations (addition, subtraction, AND, OR, etc.), the ALU is where that work gets done.
Registers: These are tiny, extremely fast storage locations that hold temporary data during instruction execution. When an instruction needs to add two numbers, those numbers come from registers, the ALU computes the result, and the result goes back to a register. Registers are much faster to access than memory because they sit directly on the processor.
Buses: These are the wiring that moves data between the ALU, registers, and memory. When you think of information flowing through a computer, you're thinking about data moving across buses.
Within the CPU, these components work together: the Control Unit directs operations, the ALU performs computations, and Registers store temporary values. Data moves between the CPU and Main Memory through buses.
Control Logic: Orchestrating Execution
The control logic (or control unit) is like the conductor of an orchestra. While the datapath performs calculations, the control logic determines which operations happen, when they happen, and in what order.
For each instruction, the control logic must:
Fetch the instruction from memory
Decode what operation to perform
Send signals to the ALU and registers telling them what to do
Store the result back in a register or memory
Without the control logic, the datapath would just sit idle. The control logic brings everything together into a coordinated sequence.
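The four steps above can be sketched as a tiny interpreter loop. The three-instruction ISA here (LOAD, ADD, HALT) is invented purely for illustration:

```python
# Minimal sketch of a fetch-decode-execute loop over a toy, made-up ISA
# with three instructions: LOAD rd, imm / ADD rd, rs1, rs2 / HALT.

def run(program):
    registers = [0, 0, 0, 0]   # four general-purpose registers
    pc = 0                     # program counter

    while True:
        instr = program[pc]                # fetch the instruction
        op = instr[0]                      # decode the opcode
        if op == "LOAD":                   # rd = immediate value
            _, rd, imm = instr
            registers[rd] = imm
        elif op == "ADD":                  # rd = rs1 + rs2 (the "ALU" work)
            _, rd, rs1, rs2 = instr
            registers[rd] = registers[rs1] + registers[rs2]
        elif op == "HALT":
            return registers               # stop and report state
        pc += 1                            # advance to the next instruction

result = run([("LOAD", 0, 5), ("LOAD", 1, 7), ("ADD", 2, 0, 1), ("HALT",)])
print(result)  # [5, 7, 12, 0]
```

In real hardware the `if`/`elif` decisions are made by the control logic's decode circuitry, and the register reads and writes are the signals it sends to the datapath.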
Pipelines: Improving Performance Through Overlap
A key insight in microarchitecture is that instructions don't have to complete one at a time. Pipelining allows multiple instructions to overlap in execution.
Imagine an instruction execution process with stages: Fetch → Decode → Execute → Write Result. In a pipeline, while one instruction is in the Execute stage, another instruction can be in the Decode stage, and a third can be in the Fetch stage. Once the pipeline is full, the processor can finish close to one instruction every clock cycle, instead of one instruction every several cycles.
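The cycle savings from overlap can be counted directly. This sketch assumes an ideal four-stage pipeline with no stalls:

```python
# Sketch: cycle counts with and without pipelining, assuming an ideal
# k-stage pipeline that never stalls.

def sequential_cycles(n_instructions, n_stages):
    # Without pipelining, each instruction runs all stages to completion
    # before the next one begins.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # Fill the pipeline once, then retire one instruction per cycle.
    return n_stages + (n_instructions - 1)

print(sequential_cycles(100, 4))  # 400 cycles
print(pipelined_cycles(100, 4))   # 103 cycles -- nearly 4x faster
```

As the instruction count grows, the one-time fill cost becomes negligible and the speedup approaches the number of stages.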
However, pipelines can "stall" when dependencies exist—for example, if instruction 2 needs the result from instruction 1, instruction 2 must wait. This is where branch prediction becomes important. When a program reaches a conditional jump (a branch), the processor doesn't know which direction it will go. Branch prediction guesses whether the jump will be taken or not, allowing the processor to speculatively fetch and execute instructions ahead of time. If the prediction is correct, execution continues without delay. If it's wrong, the incorrectly executed instructions are discarded.
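One common textbook predictor is the 2-bit saturating counter, which must guess wrong twice in a row before flipping its prediction. A minimal sketch—the sequence of branch outcomes below is made up for the example:

```python
# Sketch of a 2-bit saturating-counter branch predictor (a classic
# textbook scheme). States 0-1 predict "not taken"; states 2-3 predict "taken".

class TwoBitPredictor:
    def __init__(self):
        self.state = 1  # start in "weakly not taken"

    def predict(self):
        return self.state >= 2  # True means the branch is predicted taken

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]  # actual branch directions
correct = 0
for taken in outcomes:
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct, "of", len(outcomes), "predicted correctly")  # 3 of 5
```

The two-step hysteresis is what makes this scheme good at loops: a single loop-exit mispredict does not derail predictions for the next run of the loop.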
Out-of-order execution is another technique that improves pipeline efficiency. Modern processors can reorder instructions to keep functional units busy. If instruction 2 depends on instruction 1, but instruction 3 is independent, the processor can execute instruction 3 before instruction 1 completes, as long as instruction 2 still waits for instruction 1. This keeps the hardware better utilized.
Memory Hierarchy
One of the most important concepts in computer architecture is the memory hierarchy. It exists because of a fundamental physical reality: there's a tradeoff between speed and size. We can build memory that's very fast but very small, or memory that's large but slow. A well-designed computer balances these extremes.
How the Hierarchy is Organized
The memory hierarchy has multiple levels, organized from fastest/smallest to slowest/largest:
Level 1 (Closest to CPU): Registers and Level 1 (L1) Cache - extremely fast (registers respond within a cycle, L1 within a few cycles), but can hold only kilobytes of data.
Level 2-3: Level 2 (L2) and Level 3 (L3) Caches - still fast, but slower than L1, with larger capacity. L1 might be 32 KB, while L3 might be 8 MB.
Level 4: Main Memory (RAM) - significantly slower than cache (on the order of a hundred cycles or more per access), but much larger (gigabytes).
Level 5 (Farthest from CPU): Secondary Storage (SSD, Hard Drive) - extremely slow (tens of thousands of cycles for an SSD, millions for a hard drive), but huge capacity (terabytes).
The key principle is: most data the CPU needs should be found at faster levels. The hierarchy accomplishes this through intelligent caching—frequently used data is automatically moved to faster levels.
How Caches Work
Data doesn't move between memory levels one byte at a time. Instead, it moves in fixed-size blocks called cache lines, typically 64 bytes. When the CPU needs one byte at a specific address, the entire cache line containing that byte is fetched and stored in the cache. This takes advantage of spatial locality: if a program accesses one memory location, it's likely to access nearby locations soon.
Associativity describes how flexible cache placement is. In a fully associative cache, a data block can go anywhere in the cache. In a direct-mapped cache, each memory address maps to exactly one cache location. Most real caches use set-associative organization, where each address maps to one of a small number of locations (e.g., 8-way associative means 8 possible locations).
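Placement can be computed from an address with simple arithmetic. This sketch assumes 64-byte lines and a 32 KB, 8-way set-associative cache—illustrative sizes, not any particular chip:

```python
# Sketch: where a byte address lands in a cache, assuming 64-byte cache
# lines and a 32 KB, 8-way set-associative cache (illustrative sizes).

LINE_SIZE = 64                                # bytes per cache line
CACHE_SIZE = 32 * 1024                        # 32 KB total capacity
WAYS = 8                                      # 8-way set associative
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # = 64 sets

def cache_placement(address):
    block = address // LINE_SIZE    # which 64-byte line the byte lives in
    set_index = block % NUM_SETS    # which set that line maps to
    tag = block // NUM_SETS         # tag distinguishes lines sharing a set
    return block, set_index, tag

# Two addresses inside the same 64-byte line map to the same block,
# so fetching one brings in the other (spatial locality at work):
print(cache_placement(0x1000))   # (64, 0, 1)
print(cache_placement(0x103F))   # (64, 0, 1) -- same line
print(cache_placement(0x1040))   # (65, 1, 1) -- next line, next set
```

Within a set, the line can occupy any of the 8 ways; that flexibility is exactly what "8-way associative" means.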
Measuring Cache Effectiveness
The hit rate is the fraction of memory accesses that are satisfied by a particular cache level without needing to go to a slower level. A 95% L1 cache hit rate means 95% of accesses find their data in L1, avoiding slower memory accesses. Higher hit rates are better—they mean the hierarchy is working effectively.
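Hit rate feeds directly into average memory access time (AMAT), a standard way to summarize how well the hierarchy is working. The latencies below are illustrative numbers, not measurements from a real chip:

```python
# Sketch: average memory access time (AMAT) from hit rate and latencies.
# The cycle counts here are illustrative, not from any real processor.

def amat(hit_rate, hit_time, miss_penalty):
    # Every access pays hit_time; the fraction that misses additionally
    # pays miss_penalty to fetch from the slower level.
    return hit_time + (1 - hit_rate) * miss_penalty

# 95% L1 hit rate, 4-cycle L1 access, 100-cycle penalty to reach memory:
print(amat(0.95, 4, 100))  # about 9 cycles per access on average

# Raising the hit rate to 99% cuts the average roughly in half:
print(amat(0.99, 4, 100))  # about 5 cycles per access on average
```

Note how sensitive the average is to the miss rate: shaving a few percentage points off misses matters far more than shaving a cycle off the hit time.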
Performance Evaluation
How do we measure whether a computer architecture is "good"? This section covers the key metrics engineers use and the tradeoffs they navigate.
Key Performance Metrics
Clock Speed (measured in GHz) tells you how many clock cycles the processor completes per second. A 3 GHz processor completes 3 billion cycles per second. A higher clock speed shortens each cycle, but it doesn't directly tell you how much work gets done—a slower clock paired with a higher IPC might accomplish more.
Instructions Per Cycle (IPC) measures how many instructions complete on average during each clock cycle. A wide (superscalar) design that issues several instructions at once might achieve IPC = 4, meaning 4 instructions complete per cycle on average. Out-of-order execution and branch prediction help increase IPC by keeping pipelines full.
Throughput combines clock speed and IPC to measure total work. It's often expressed as instructions per second (IPS):
$$\text{Throughput (IPS)} = \text{Clock Speed (cycles/sec)} \times \text{IPC (instructions/cycle)}$$
A 3 GHz processor with IPC = 2 delivers 6 billion instructions per second—twice as much as a 3 GHz processor with IPC = 1.
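The formula is easy to sanity-check in code, using the same numbers as the example above:

```python
# The throughput formula above, expressed as a small helper.

def throughput_ips(clock_hz, ipc):
    # instructions/second = (cycles/second) * (instructions/cycle)
    return clock_hz * ipc

print(throughput_ips(3e9, 2))  # 6e9 -- 6 billion instructions per second
print(throughput_ips(3e9, 1))  # 3e9 -- same clock, half the work done
```

This is why comparing processors by clock speed alone is misleading: the 3 GHz / IPC = 2 design above outperforms a hypothetical 4 GHz design stuck at IPC = 1.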
Power Efficiency evaluates how much computational work happens for each unit of energy consumed. As processors get faster, they consume more power, generating heat. Power efficiency is crucial for battery-powered devices and large data centers.
Design Tradeoffs
Architects constantly balance competing goals:
Clock speed vs. Power: Increasing clock speed requires more power and generates more heat. A slower design might be more power-efficient.
IPC vs. Complexity: Out-of-order execution and branch prediction increase IPC but add complexity, cost, and power consumption.
Cache Size vs. Latency: Larger caches hold more data but take longer to access. The hierarchy aims to keep frequently used data in smaller, faster caches.
These tradeoffs differ depending on the target application. A smartphone needs power efficiency. A gaming computer prioritizes throughput. A server needs both efficiency and throughput. Understanding these tradeoffs is central to good architecture design.
<extrainfo>
Modern System Design
Multi-Core and Beyond
Modern processors contain multiple cores—essentially multiple CPUs on a single chip. A quad-core processor has 4 independent cores, each capable of executing instructions simultaneously. This allows a computer to truly execute multiple instruction streams in parallel, rather than simulating parallelism through time-sharing.
However, multiple cores introduce coordination challenges. When two cores need to access the same memory location, or when they need to share data, careful synchronization is required.
Specialized Accelerators
General-purpose CPUs are versatile but not optimal for all workloads. Graphics Processing Units (GPUs) are specialized processors originally designed for rendering images and video. They excel at parallel computation—performing the same operation on thousands of data elements simultaneously.
Similarly, Tensor Processing Units (TPUs) and other AI accelerators are designed specifically for machine learning workloads, where they can outperform general-purpose CPUs by orders of magnitude.
Interconnect Networks
When a system contains multiple cores, multiple caches, multiple accelerators, and memory controllers, they must be connected efficiently. Interconnect networks (sometimes called on-chip networks) handle this, routing data and control signals between components. Well-designed interconnects ensure that adding more cores continues to improve performance.
</extrainfo>
Flashcards
What is the primary focus of computer architecture studies?
How a computer is organized and how its parts work together to execute programs efficiently.
What is the "contract" between software and hardware that specifies processor commands?
Instruction Set Architecture (ISA).
Why can't a program compiled for x86 run directly on an ARM machine?
They have different Instruction Set Architectures (ISAs).
How does microarchitecture relate to the Instruction Set Architecture?
It is the concrete implementation of the abstract features defined by the ISA.
Can the same Instruction Set Architecture be used by different microarchitectures?
Yes, allowing for variations in speed, power consumption, and cost.
What is the function of the Arithmetic Logic Unit (ALU) in a datapath?
It performs arithmetic and logical operations required by instructions.
What is the purpose of registers within the datapath?
To store temporary data used by the ALU during instruction execution.
What is the role of control logic in a computer system?
It orchestrates the sequence of operations the datapath performs for each instruction.
How do pipelines increase the overall throughput of a processor?
By allowing multiple instructions to overlap in execution.
What is the purpose of branch prediction in a CPU pipeline?
To guess the direction of conditional jumps and reduce pipeline stalls.
Which storage types are located closest to the CPU?
Registers
Level 1 (L1) cache
What is the largest and slowest level of the memory hierarchy?
Secondary storage (e.g., Solid State Drives).
What is the primary design goal of the memory hierarchy?
To ensure data needed by the CPU is found quickly, minimizing trips to slower memory.
What are the fixed-size blocks of memory transferred between cache levels called?
Cache lines.
What does cache associativity determine?
How many places a particular memory block can reside within a cache set.
In computer memory, what does the hit rate measure?
The fraction of memory accesses satisfied by a particular cache level.
What does the metric "clock speed" represent?
The number of clock cycles a CPU completes each second.
What does the metric "Instructions Per Cycle" (IPC) measure?
The average number of instructions a CPU completes in a single clock cycle.
How is throughput typically expressed in processor performance evaluation?
Instructions per second.
What does power efficiency evaluate in a computer system?
The amount of computational work performed per unit of energy consumed.
What three main factors do designers balance to meet application requirements?
Clock speed
Power consumption
Cost
What is the primary purpose of Tensor Processing Units (TPUs)?
To accelerate machine-learning computations.
What is the function of interconnect networks in modern systems?
To link cores, accelerators, and memory to share data efficiently.
Quiz
Introduction to Computer Architecture Quiz Question 1: Which components are included in the datapath?
- The arithmetic logic unit, registers, and buses (correct)
- The power supply, case fans, and heat sink
- The graphical user interface, driver software, and APIs
- The network router, switch, and firewall
Introduction to Computer Architecture Quiz Question 2: What must designers balance when creating a processor?
- Clock speed, power consumption, and cost (correct)
- The brand of the motherboard and the color of the case
- The number of Ethernet ports and Wi‑Fi standards
- The type of operating system and user interface theme
Introduction to Computer Architecture Quiz Question 3: What type of workload are graphics processing units (GPUs) specialized for?
- Accelerating image and video processing (correct)
- Managing network packet routing
- Controlling power supply regulation
- Encoding audio signals for telephony
Key Concepts
Computer Architecture Concepts
Computer architecture
Instruction set architecture (ISA)
Microarchitecture
Memory hierarchy
CPU Functionality and Performance
Datapath
CPU pipeline
Branch prediction
Cache memory
CPU performance metrics
Multi‑core processor
Definitions
Computer architecture
The study of how a computer’s components are organized and interact to execute programs efficiently.
Instruction set architecture (ISA)
The abstract contract defining a processor’s supported instructions, register sizes, and memory address formats.
Microarchitecture
The concrete hardware implementation of an ISA, determining a processor’s speed, power use, and cost.
Datapath
The collection of functional units, such as the ALU, registers, and buses, that move and process data within a CPU.
CPU pipeline
A technique that overlaps the execution stages of multiple instructions to increase instruction throughput.
Branch prediction
A hardware mechanism that guesses the outcome of conditional jumps to minimize pipeline stalls.
Memory hierarchy
A layered arrangement of storage from fast registers and caches to slower main memory and secondary storage.
Cache memory
Small, high‑speed memory that stores frequently accessed data to reduce access latency for the CPU.
CPU performance metrics
Quantitative measures like clock speed, instructions per cycle, and throughput that evaluate processor performance.
Multi‑core processor
A CPU design that integrates two or more independent cores to execute multiple instruction streams in parallel.