RemNote Community

Introduction to Computer Architecture

Understand the fundamentals of computer architecture, covering instruction set architecture, microarchitecture, memory hierarchy, and performance trade‑offs.


Summary

Computer Architecture Overview

Computer architecture is the study of how computers are organized internally and how their components work together to execute programs quickly and efficiently. Rather than just looking at what a computer does from the outside, architecture examines the internal design decisions that make execution fast, power-efficient, and cost-effective.

The main goal of understanding computer architecture is to grasp why certain design choices matter: Why do we care about clock speed? How do computers actually execute billions of instructions per second? What tradeoffs do engineers make when designing a computer? These questions are central to architecture.

What Is the Instruction Set Architecture?

The instruction set architecture (ISA) is a fundamental concept that acts as a contract between software and hardware. It specifies exactly which commands (instructions) a processor understands and how those instructions behave.

Think of an ISA like a language. When you write code in a high-level language, it is eventually translated into machine code—a sequence of instructions. But here's the key: a program compiled for one ISA will not run directly on a machine with a different ISA. For example, code compiled for x86 processors (found in many laptops) will not run on ARM processors (found in most smartphones) without recompilation.

The ISA defines:
- Instruction format: how instructions are encoded as bits
- Register size: how many bits each register holds
- Memory addressing: how addresses reference memory locations
- Available operations: what arithmetic, logical, and data movement operations exist

Common ISAs you'll encounter include x86, ARM, and RISC-V. Each has a different design philosophy, but all serve the same fundamental purpose: defining what instructions a processor can execute.
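To make the "contract" idea concrete, here is a minimal sketch in Python of an interpreter for an invented toy ISA (not any real instruction set): the ISA is the set of opcodes the `execute` function accepts and the behavior it promises for each.

```python
# A toy ISA: each instruction is a tuple (opcode, dest, src1, src2_or_imm).
# The ISA is the *contract*: which opcodes exist and exactly what they do.
def execute(program, num_regs=4):
    regs = [0] * num_regs  # the register file, all zeroed at start
    for op, d, a, b in program:
        if op == "ADDI":       # add immediate: regs[d] = regs[a] + b
            regs[d] = regs[a] + b
        elif op == "ADD":      # regs[d] = regs[a] + regs[b]
            regs[d] = regs[a] + regs[b]
        elif op == "SUB":      # regs[d] = regs[a] - regs[b]
            regs[d] = regs[a] - regs[b]
        else:                  # anything outside the contract is illegal
            raise ValueError(f"illegal instruction: {op}")
    return regs

prog = [("ADDI", 0, 0, 5),   # r0 = 5
        ("ADDI", 1, 1, 7),   # r1 = 7
        ("ADD",  2, 0, 1)]   # r2 = r0 + r1
```

A program written against a *different* toy ISA (different opcodes or encodings) would be rejected by this interpreter, which mirrors why x86 binaries cannot run on ARM hardware.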
Microarchitecture: Bringing Architecture to Life

While the ISA specifies what instructions a processor understands, the microarchitecture specifies how the processor actually implements those instructions. This is an important distinction: different microarchitectures can implement the exact same ISA while delivering different speeds, power consumption, and costs. For example, Intel and AMD both make x86 processors—they implement the same ISA—but their internal designs (microarchitectures) differ significantly, leading to performance differences.

The Datapath: Core Processing Components

The datapath is where actual computation happens. It consists of three main components:
- Arithmetic Logic Unit (ALU): the "calculator" of the processor. For every instruction that performs an arithmetic or logical operation (addition, subtraction, AND, OR, etc.), the ALU is where that work gets done.
- Registers: tiny, extremely fast storage locations that hold temporary data during instruction execution. When an instruction needs to add two numbers, those numbers come from registers, the ALU computes the result, and the result goes back to a register. Registers are much faster to access than memory because they sit directly on the processor.
- Buses: the wiring that moves data between the ALU, registers, and memory. When you think of information flowing through a computer, you're thinking about data moving across buses.

[Figure: CPU block diagram showing these components. Within the CPU, the Control Unit directs operations, the ALU performs computations, and Registers store temporary values; data moves between the CPU and Main Memory over buses.]

Control Logic: Orchestrating Execution

The control logic (or control unit) is like the conductor of an orchestra. While the datapath performs calculations, the control logic determines which operations happen, when they happen, and in what order.
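The division of labor between the datapath and the control logic can be sketched in a few lines of Python. This is a hypothetical model, not real hardware: `alu` plays the role of the ALU (pure computation), the list `regs` plays the role of the register file, and `step` plays the role of the control logic that decodes an instruction and steers data through the datapath.

```python
def alu(op, a, b):
    # The ALU is pure computation: it doesn't know which instruction is running.
    return {"ADD": a + b, "SUB": a - b, "AND": a & b, "OR": a | b}[op]

def step(regs, instr):
    # Control logic: decode the instruction, then steer the datapath.
    op, dest, src1, src2 = instr
    regs[dest] = alu(op, regs[src1], regs[src2])  # operands read from registers,
    return regs                                   # result written back to one

regs = [6, 3, 0, 0]
step(regs, ("SUB", 2, 0, 1))   # control decodes SUB; ALU computes 6 - 3 into r2
```

Note how `alu` alone does nothing useful: without `step` deciding which operands to fetch and where to write the result, the datapath would sit idle, exactly as the text describes.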
For each instruction, the control logic must:
1. Fetch the instruction from memory
2. Decode what operation to perform
3. Send signals to the ALU and registers telling them what to do
4. Store the result back in a register or memory

Without the control logic, the datapath would sit idle. The control logic brings everything together into a coordinated sequence.

Pipelines: Improving Performance Through Overlap

A key insight in microarchitecture is that instructions don't have to complete one at a time. Pipelining allows multiple instructions to overlap in execution. Imagine an instruction execution process with stages: Fetch → Decode → Execute → Write Result. In a pipeline, while one instruction is in the Execute stage, another can be in the Decode stage and a third in the Fetch stage. This way, the processor can complete roughly one instruction per clock cycle instead of one instruction every several cycles.

However, pipelines can "stall" when dependencies exist—for example, if instruction 2 needs the result of instruction 1, instruction 2 must wait. This is where branch prediction becomes important. When a program reaches a conditional jump (a branch), the processor doesn't know which direction it will go. Branch prediction guesses whether the jump will be taken, allowing the processor to speculatively fetch and execute instructions ahead of time. If the prediction is correct, execution continues without delay; if it's wrong, the incorrectly executed instructions are discarded.

Out-of-order execution is another technique that improves pipeline efficiency. Modern processors can reorder instructions to keep functional units busy. If instruction 2 depends on instruction 1, but instruction 3 is independent, the processor can execute instruction 3 before instruction 1 completes, as long as instruction 2 still waits for instruction 1. This keeps the hardware fully utilized.
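The benefit of overlap is easy to quantify with the standard cycle-count formulas for an idealized pipeline (ignoring real-world effects beyond an explicit stall count):

```python
def cycles_unpipelined(n_instructions, n_stages):
    # Each instruction runs through every stage before the next one starts.
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages, stalls=0):
    # The first instruction takes n_stages cycles to fill the pipeline;
    # after that, one instruction completes per cycle, plus any stall
    # cycles caused by dependencies or branch mispredictions.
    return n_stages + (n_instructions - 1) + stalls

# 100 instructions through a 4-stage pipeline:
#   unpipelined: 400 cycles; pipelined: 103 cycles (~3.9x speedup).
```

Stalls eat directly into this advantage (each stall adds a cycle), which is why branch prediction and out-of-order execution, both of which reduce or hide stalls, matter so much for real performance.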
Memory Hierarchy

One of the most important concepts in computer architecture is the memory hierarchy. It exists because of a fundamental physical reality: there is a tradeoff between speed and size. We can build memory that is very fast but very small, or memory that is large but slow. A well-designed computer balances these extremes.

How the Hierarchy Is Organized

The memory hierarchy has multiple levels, organized from fastest/smallest to slowest/largest:
- Closest to the CPU: registers and the Level 1 (L1) cache. Extremely fast, able to deliver data in a single cycle, but holding only kilobytes.
- Next: the Level 2 (L2) and Level 3 (L3) caches. Still fast, but slower than L1, with larger capacity. L1 might be 32 KB, while L3 might be 8 MB.
- Then: main memory (RAM). Significantly slower than cache (dozens of cycles per access), but much larger (gigabytes).
- Farthest from the CPU: secondary storage (SSD, hard drive). Extremely slow (millions of cycles), but huge in capacity (terabytes).

The key principle: most of the data the CPU needs should be found at the faster levels. The hierarchy accomplishes this through intelligent caching—frequently used data is automatically moved to faster levels.

How Caches Work

Data doesn't move between memory levels one byte at a time. Instead, it moves in fixed-size blocks called cache lines, typically 64 bytes. When the CPU needs one byte at a specific address, the entire cache line containing that byte is fetched and stored in the cache. This takes advantage of spatial locality: if a program accesses one memory location, it is likely to access nearby locations soon.

Associativity describes how flexible cache placement is. In a fully associative cache, a data block can go anywhere in the cache. In a direct-mapped cache, each memory address maps to exactly one cache location.
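A tiny direct-mapped cache simulator makes cache lines, spatial locality, and hit rates tangible. This is a deliberately simplified sketch (invented sizes, tags only, no data), not a model of any real cache:

```python
LINE_SIZE = 64   # bytes per cache line (a typical value)
NUM_LINES = 8    # a deliberately tiny direct-mapped cache

def simulate(addresses):
    tags = [None] * NUM_LINES       # which memory line occupies each slot
    hits = 0
    for addr in addresses:
        line = addr // LINE_SIZE    # which 64-byte block of memory
        slot = line % NUM_LINES     # direct-mapped: exactly one possible slot
        if tags[slot] == line:
            hits += 1
        else:
            tags[slot] = line       # miss: the whole line is brought in
    return hits / len(addresses)

# Sequential bytes: one miss per 64-byte line, then 63 hits in a row,
# so the hit rate approaches 63/64 (~98%). This is spatial locality at work.
sequential_rate = simulate(range(4096))

# Two addresses whose lines collide in the same slot: every access misses.
conflict_rate = simulate([0, 512] * 10)
```

The second experiment shows why direct mapping can be fragile: addresses 0 and 512 map to the same slot and evict each other forever. Set-associative caches, discussed next, give each address several candidate slots precisely to soften such conflicts.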
Most real caches use a set-associative organization, where each address maps to one of a small number of locations (e.g., 8-way set-associative means 8 possible locations).

Measuring Cache Effectiveness

The hit rate is the fraction of memory accesses satisfied by a particular cache level without needing to go to a slower level. A 95% L1 hit rate means 95% of accesses find their data in L1, avoiding slower memory accesses. Higher hit rates are better—they mean the hierarchy is working effectively.

Performance Evaluation

How do we measure whether a computer architecture is "good"? This section covers the key metrics engineers use and the tradeoffs they navigate.

Key Performance Metrics

Clock speed (measured in GHz) tells you how many clock cycles the processor completes per second. A 3 GHz processor completes 3 billion cycles per second. A higher clock speed means each cycle finishes faster, but it doesn't directly tell you how much work gets done—a slower clock with better pipelining might accomplish more.

Instructions per cycle (IPC) measures how many instructions complete on average during each clock cycle. A superscalar processor that issues several instructions at once might achieve IPC = 4, meaning 4 instructions complete per cycle on average. Out-of-order execution and branch prediction help increase IPC by keeping pipelines full.

Throughput combines clock speed and IPC to measure total work. It is often expressed as instructions per second (IPS):

$$\text{Throughput (IPS)} = \text{Clock Speed (cycles/sec)} \times \text{IPC (instructions/cycle)}$$

A 3 GHz processor with IPC = 2 delivers 6 billion instructions per second—twice as many as a 3 GHz processor with IPC = 1.

Power efficiency evaluates how much computational work happens per unit of energy consumed. As processors get faster, they consume more power and generate more heat. Power efficiency is crucial for battery-powered devices and large data centers.

Design Tradeoffs

Architects constantly balance competing goals:
- Clock speed vs. power: increasing clock speed requires more power and generates more heat. A slower design might be more power-efficient.
- IPC vs. complexity: out-of-order execution and branch prediction increase IPC but add complexity, cost, and power consumption.
- Cache size vs. latency: larger caches hold more data but take longer to access. The hierarchy aims to keep frequently used data in smaller, faster caches.

These tradeoffs differ by target application. A smartphone needs power efficiency; a gaming computer prioritizes throughput; a server needs both. Understanding these tradeoffs is central to good architecture design.

Modern System Design

Multi-Core and Beyond

Modern processors contain multiple cores—essentially multiple CPUs on a single chip. A quad-core processor has 4 independent cores, each capable of executing instructions simultaneously. This lets a computer truly execute multiple instruction streams in parallel rather than simulating parallelism through time-sharing. However, multiple cores introduce coordination challenges: when two cores access the same memory location, or need to share data, careful synchronization is required.

Specialized Accelerators

General-purpose CPUs are versatile but not optimal for all workloads. Graphics processing units (GPUs) are specialized processors designed for image and video processing; they excel at parallel computation, performing the same operation on thousands of data elements simultaneously. Similarly, tensor processing units (TPUs) and other AI accelerators are designed specifically for machine learning workloads, where they can outperform general-purpose CPUs by orders of magnitude.

Interconnect Networks

When a system contains multiple cores, caches, accelerators, and memory controllers, they must be connected efficiently. Interconnect networks (sometimes called on-chip networks) handle this, routing data and control signals between components. Well-designed interconnects ensure that adding more cores continues to improve performance.
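The throughput formula from the Performance Evaluation section can be checked with a few lines of Python, reproducing the 3 GHz comparison from the text:

```python
def throughput_ips(clock_hz, ipc):
    # Throughput (instructions/sec) = clock speed (cycles/sec) x IPC (instr/cycle)
    return clock_hz * ipc

fast_wide = throughput_ips(3e9, 2)    # 3 GHz, IPC = 2 -> 6e9 instructions/sec
fast_narrow = throughput_ips(4e9, 1)  # 4 GHz, IPC = 1 -> only 4e9
```

The second line illustrates the tradeoff the text describes: a higher clock does not guarantee more work done. A 4 GHz processor with IPC = 1 is slower overall than a 3 GHz processor with IPC = 2, which is why architects chase IPC (via pipelining, branch prediction, and out-of-order execution) and not clock speed alone.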
Flashcards
What is the primary focus of computer architecture studies?
How a computer is organized and how its parts work together to execute programs efficiently.
What is the "contract" between software and hardware that specifies processor commands?
Instruction Set Architecture (ISA).
Why can't a program compiled for x86 run directly on an ARM machine?
They have different Instruction Set Architectures (ISAs).
How does microarchitecture relate to the Instruction Set Architecture?
It is the concrete implementation of the abstract features defined by the ISA.
Can the same Instruction Set Architecture be used by different microarchitectures?
Yes, allowing for variations in speed, power consumption, and cost.
What is the function of the Arithmetic Logic Unit (ALU) in a datapath?
It performs arithmetic and logical operations required by instructions.
What is the purpose of registers within the datapath?
To store temporary data used by the ALU during instruction execution.
What is the role of control logic in a computer system?
It orchestrates the sequence of operations the datapath performs for each instruction.
How do pipelines increase the overall throughput of a processor?
By allowing multiple instructions to overlap in execution.
What is the purpose of branch prediction in a CPU pipeline?
To guess the direction of conditional jumps and reduce pipeline stalls.
Which storage types are located closest to the CPU?
Registers and the Level 1 (L1) cache.
What is the largest and slowest level of the memory hierarchy?
Secondary storage (e.g., Solid State Drives).
What is the primary design goal of the memory hierarchy?
To ensure data needed by the CPU is found quickly, minimizing trips to slower memory.
What are the fixed-size blocks of memory transferred between cache levels called?
Cache lines.
What does cache associativity determine?
How many places a particular memory block can reside within a cache set.
In computer memory, what does the hit rate measure?
The fraction of memory accesses satisfied by a particular cache level.
What does the metric "clock speed" represent?
The number of clock cycles a CPU completes each second.
What does the metric "Instructions Per Cycle" (IPC) measure?
The average number of instructions a CPU completes in a single clock cycle.
How is throughput typically expressed in processor performance evaluation?
Instructions per second.
What does power efficiency evaluate in a computer system?
The amount of computational work performed per unit of energy consumed.
What three main factors do designers balance to meet application requirements?
Clock speed, power consumption, and cost.
What is the primary purpose of Tensor Processing Units (TPUs)?
To accelerate machine-learning computations.
What is the function of interconnect networks in modern systems?
To link cores, accelerators, and memory to share data efficiently.

Quiz

Which components are included in the datapath?
Key Concepts
Computer Architecture Concepts
Computer architecture
Instruction set architecture (ISA)
Microarchitecture
Memory hierarchy
CPU Functionality and Performance
Datapath
CPU pipeline
Branch prediction
Cache memory
CPU performance metrics
Multi‑core processor