Understanding the Arithmetic Logic Unit

Core of the CPU’s Datapath

A detailed analysis of the CPU’s Arithmetic Logic Unit (ALU), covering its architecture, status flags, and operation, with specific examples of binary arithmetic.
Author

Chuck Nelson

Published

September 27, 2025

ALU

1 The Arithmetic Logic Unit (ALU): Core of the CPU’s Datapath

The Arithmetic Logic Unit (ALU) is the digital circuit within the Central Processing Unit (CPU) responsible for executing arithmetic and logical operations. While the Control Unit (CU) orchestrates instruction flow and timing, the ALU is the engine that actually manipulates data. It is a combinational circuit: its output is purely a function of its current inputs, and it has no internal memory or state. Its operands and control signals are held externally, in registers and by the Control Unit.

ALU complexity scales with the processor architecture (e.g., 8-bit, 32-bit, 64-bit). A modern 64-bit processor core typically contains multiple parallel execution units: integer ALUs, a dedicated Floating-Point Unit (FPU) for floating-point operations, and vector (SIMD) units.

1.1 ALU Architecture and Components

A standard \(N\)-bit ALU operates on two \(N\)-bit operands (\(A\) and \(B\)) and produces an \(N\)-bit result (\(R\)).

1.2 Key Inputs and Outputs

| Component | Function | Width |
| --- | --- | --- |
| Operand A | First data input (e.g., from an accumulator register) | \(N\)-bit |
| Operand B | Second data input (e.g., from a general-purpose register or memory) | \(N\)-bit |
| Function Code (\(F\)) | Control signals from the Control Unit specifying the operation (e.g., ADD, XOR, LSL) | \(k\)-bit (where \(2^k\) is the number of operations) |
| Status Input (\(C_{in}\)) | Carry-in bit, primarily for chained operations (e.g., multi-word addition) | 1-bit |
| Result (\(R\)) | The output of the executed operation | \(N\)-bit |
| Status Outputs (Flags) | A set of 1-bit flags indicating the outcome of the operation | Varies (e.g., 4-6 bits) |
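The interface in the table can be sketched as a single pure function. This is a minimal illustration, not a model of any specific processor: the names `alu` and `AluResult` and the 2-bit function-code values are assumptions made up for the example, and only four operations are included.

```python
from typing import NamedTuple


class AluResult(NamedTuple):
    r: int          # N-bit result R
    carry_out: int  # 1-bit carry out of the MSB


# Hypothetical 2-bit function codes (k = 2, so up to 2**2 = 4 operations).
F_ADD, F_AND, F_OR, F_XOR = 0b00, 0b01, 0b10, 0b11


def alu(a: int, b: int, f: int, c_in: int = 0, n: int = 8) -> AluResult:
    """Pure function of its inputs, like a combinational circuit."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if f == F_ADD:
        total = a + b + c_in
        return AluResult(total & mask, total >> n)
    if f == F_AND:
        return AluResult(a & b, 0)
    if f == F_OR:
        return AluResult(a | b, 0)
    if f == F_XOR:
        return AluResult(a ^ b, 0)
    raise ValueError(f"unknown function code: {f}")


# Multi-word addition: 0x01FF + 0x0001 using two chained 8-bit additions.
low = alu(0xFF, 0x01, F_ADD)                       # low bytes
high = alu(0x01, 0x00, F_ADD, c_in=low.carry_out)  # high bytes plus carry
print(hex((high.r << 8) | low.r))                  # prints 0x200
```

The carry out of the low-byte addition becomes the \(C_{in}\) of the high-byte addition, which is the chained use of the carry-in described in the table.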

1.3 Status Registers (Flags)

The output flags are critical components of the CPU’s Program Status Word (PSW) or Status Register. These flags enable conditional branching and multi-precision arithmetic. Typical flags include:

| Flag | Abbreviation | Purpose |
| --- | --- | --- |
| Zero Flag | Z | Set if the result \(R\) is \(0\). |
| Sign Flag | N (Negative) | Set if the Most Significant Bit (MSB) of \(R\) is \(1\) (indicating a negative two’s complement result). |
| Carry Flag | C | Set if an arithmetic operation generates a carry out of the MSB (used for unsigned overflow). |
| Overflow Flag | V | Set if a signed arithmetic operation results in an overflow (incorrect result in two’s complement). |
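As a rough sketch of how these four flags could be derived for an addition, the hypothetical helper below (`add_with_flags` is not a name from any real toolchain) computes them from the operands and the result. The overflow test uses the standard rule that signed overflow occurs when both operands share a sign and the result's sign differs.

```python
def add_with_flags(a: int, b: int, n: int = 8):
    """Add two n-bit values and derive the Z, N, C and V flags."""
    mask = (1 << n) - 1
    msb = 1 << (n - 1)
    a &= mask
    b &= mask
    total = a + b
    r = total & mask
    flags = {
        "Z": int(r == 0),          # result is zero
        "N": int(bool(r & msb)),   # MSB of the result
        "C": int(total > mask),    # carry out of the MSB
        # Operands have the same sign, result has the opposite sign.
        "V": int(bool(~(a ^ b) & (a ^ r) & msb)),
    }
    return r, flags


print(add_with_flags(0x7F, 0x01))  # 127 + 1: signed overflow, no carry
print(add_with_flags(0xFF, 0x01))  # 255 + 1: carry out, result zero
```

The two calls illustrate why C and V are separate flags: the first overflows only under a signed interpretation, the second only under an unsigned one.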

1.4 Internal Structure

The internal structure of a simple ALU is often implemented using a large multiplexer (MUX) that selects the output from various parallel functional blocks (adders, logic gates, shifters) based on the \(F\) (Function Code) input:

  1. Arithmetic Block: Implemented using binary Full Adders for addition, subtraction (using two’s complement), and increment/decrement. High-performance ALUs employ Carry Lookahead Adders (CLA) or Carry Select Adders to minimize ripple delay.

  2. Logic Block: Implemented using combinations of basic gates (AND, OR, XOR, NOT) applied bit-wise across the \(N\)-bit operands.

  3. Shifter Block: Performs bit-shifting operations (Logical Shift Left/Right, Arithmetic Shift Right, Rotate Left/Right).
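The structure above can be approximated in software. The sketch below builds a ripple-carry adder from one-bit full adders and then selects one block output with a list index standing in for the multiplexer. The function names and the three function-code values are illustrative assumptions, and a real ALU evaluates the blocks in parallel hardware rather than sequentially.

```python
def full_adder(a_bit: int, b_bit: int, c_in: int):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a_bit ^ b_bit ^ c_in
    c_out = (a_bit & b_bit) | (c_in & (a_bit ^ b_bit))
    return s, c_out


def ripple_add(a: int, b: int, n: int = 4, c_in: int = 0):
    """Chain n full adders; the carry ripples from the LSB to the MSB."""
    result, carry = 0, c_in
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def simple_alu(a: int, b: int, f: int, n: int = 4) -> int:
    """Every block computes its output; f selects one, like a MUX."""
    mask = (1 << n) - 1
    block_outputs = [
        ripple_add(a, b, n)[0],  # f = 0: arithmetic block (ADD)
        a & b & mask,            # f = 1: logic block (AND)
        (a << 1) & mask,         # f = 2: shifter block (LSL by one)
    ]
    return block_outputs[f]


print(ripple_add(0b0101, 0b0011))  # (8, 0): result 1000, no carry out
```

The loop in `ripple_add` makes the ripple delay visible: bit \(i\) cannot be finalized until the carry from bit \(i-1\) is known, which is the dependency that Carry Lookahead Adders shorten.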

2 Demonstration of a Binary Arithmetic Operation

Let’s examine the operation of a 4-bit ALU performing the addition \(A + B\) where \(A = 0101_2 (5_{10})\) and \(B = 0011_2 (3_{10})\).
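The addition itself can be worked by hand in the same column notation. The flag values discussed afterwards follow from the flag definitions in Section 1.3 and are a sketch, not output captured from a particular processor:

\[ \begin{aligned} & \quad 0101 \quad (A = 5_{10}) \\ + & \quad 0011 \quad (B = 3_{10}) \\ \hline & \quad 1000 \quad (R) \end{aligned} \]

Read as an unsigned value, \(R = 1000_2 = 8_{10}\) is correct and there is no carry out of the MSB, so \(C = 0\). Read as a 4-bit two’s complement value, the representable range is \(-8\) to \(+7\), so \(+8\) does not fit: \(1000_2\) is interpreted as \(-8_{10}\), giving \(V = 1\) and \(N = 1\), with \(Z = 0\).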

2.1 Subtraction via Two’s Complement

A common requirement is that the ALU must perform subtraction using only its addition circuitry.

Operation: \(A - B\) (e.g., \(5_{10} - 3_{10}\))

  1. Control Unit Setup: The Control Unit sets the \(F\) code to ‘ADD’ and activates the two’s complement logic on Operand B.

  2. Inversion (One’s Complement): The bits of \(B\) are inverted: \(0011 \rightarrow 1100\).

  3. Addition of 1: A \(C_{in}\) of \(1\) is injected into the least significant bit (LSB) of the Full Adder chain. This achieves \(A + (\sim B) + 1\), which equals \(A + (-B)\).

\[ \begin{aligned} & \quad 0101 \quad (A) \\ + & \quad 1101 \quad (\text{Two's Complement of } B=0011) \\ \hline \text{Carry Out} & \quad \mathbf{1}0010 \quad (R) \end{aligned} \]

Result Interpretation:

  • Result (\(R\)): The 4-bit result is \(0010_2\), which is \(2_{10}\).

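  • Carry Flag (C): Set to 1 (a carry out of the MSB, which is discarded from the 4-bit result but essential for multi-word operations). With this add-the-complement method, a carry out of 1 means no borrow occurred, i.e. \(A \ge B\) as unsigned values; some architectures invert this bit and report it as a borrow flag.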

  • Zero Flag (Z): Reset to 0 (since \(R \ne 0\)).

  • Sign Flag (N): Reset to 0 (MSB is 0).
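The three steps can be checked with a short sketch. `subtract_via_adder` is a hypothetical helper name; it inverts \(B\), injects a carry-in of 1, and uses only addition, mirroring the procedure above.

```python
def subtract_via_adder(a: int, b: int, n: int = 4):
    """Compute a - b as a + (~b) + 1 on an n-bit adder."""
    mask = (1 << n) - 1
    b_inverted = ~b & mask                 # step 2: one's complement of B
    total = (a & mask) + b_inverted + 1    # step 3: add with C_in = 1
    r = total & mask                       # keep the low n bits
    carry_out = total >> n                 # carry out of the MSB
    return r, carry_out


r, c = subtract_via_adder(0b0101, 0b0011)
print(f"{r:04b}", c)  # prints: 0010 1
```

Swapping the operands, `subtract_via_adder(0b0011, 0b0101)`, yields `1110` with a carry out of 0: the two’s complement encoding of \(-2\), with the cleared carry indicating that a borrow was needed.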

3 ALU Integration within the Fetch-Execute Cycle

The ALU is positioned within the CPU’s datapath, the network of registers and buses that move data. Its operation is critically dependent on the Control Unit (CU) and memory access.

3.1 The Datapath Context

During an instruction’s execution phase, the ALU’s operands are sourced from:

  1. General-Purpose Registers (GPRs): The most common source, providing fast access to operands \(A\) and \(B\).

  2. Immediate Data: Instruction fields containing constant values (e.g., ADD R1, R2, #5).

  3. Memory Data: Data fetched from the Data Cache/main memory and temporarily held in a Data Register.

3.2 The Fetch-Execute Cycle and the ALU

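The ALU is largely idle during the Fetch and Decode phases (apart from a possible Program Counter increment) and does its main work during Execute, with its result and flags committed during Writeback.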

| Phase | ALU Activity | Data Flow Example (Instruction: ADD R1, R2, R3) |
| --- | --- | --- |
| 1. Fetch | None (or PC Increment): The ALU may be used to increment the Program Counter (PC) by the instruction length (e.g., 4 bytes) to point to the next instruction. | \(PC + 4 \rightarrow PC\) |
| 2. Decode | None: The Control Unit determines the operation code (ADD) and operand registers (R2, R3, R1). | CU generates \(F\)-code for ADD. |
| 3. Execute | Core Operation: The Control Unit gates the contents of R2 and R3 onto the \(A\) and \(B\) input buses of the ALU, sets the \(F\)-code to ADD, and enables the ALU output. | \(R2 \rightarrow A_{in}\), \(R3 \rightarrow B_{in}\), ALU calculates \(R2 + R3\). |
| 4. Writeback | Result Transfer: The result \(R\) from the ALU output is written back to the destination register (R1), and the Status Flags are updated in the Status Register. | \(R \rightarrow R1\), ALU Flags \(\rightarrow\) Status Register. |
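The four phases in the table can be traced with a toy model. Everything here is an illustrative assumption: the `run_add` helper, the dictionary used as a register file, the 32-bit width, and the decision to model only Z, N and C.

```python
MASK32 = 0xFFFFFFFF


def run_add(regs: dict, pc: int, rd: str, rs1: str, rs2: str):
    """Trace ADD rd, rs1, rs2 through fetch, decode, execute, writeback."""
    # 1. Fetch: PC + 4 -> PC
    pc = (pc + 4) & MASK32
    # 2. Decode: the CU would select the ADD function code here;
    #    this sketch supports ADD only, so nothing is computed.
    # 3. Execute: operands are gated onto the A and B inputs.
    a_in, b_in = regs[rs1], regs[rs2]
    total = a_in + b_in
    r = total & MASK32
    flags = {"Z": int(r == 0), "N": r >> 31, "C": total >> 32}
    # 4. Writeback: R -> rd, flags -> status register.
    new_regs = dict(regs)
    new_regs[rd] = r
    return new_regs, pc, flags


regs = {"R1": 0, "R2": 7, "R3": 5}
new_regs, pc, flags = run_add(regs, 0x100, "R1", "R2", "R3")
print(new_regs["R1"], hex(pc), flags)
```

With R2 = 7 and R3 = 5, R1 receives 12, the PC advances from `0x100` to `0x104`, and all three modeled flags are 0.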

3.3 Assembly Instruction to ALU Control Signal Mapping

The link between high-level computation and the ALU is the Instruction Set Architecture (ISA) and the Control Unit (CU). An assembly instruction, typically a mnemonic, directly maps to a specific operation code (opcode) and dictates how the CU sets the ALU’s \(F\) (Function Code) input.

Example: Logical AND Operation

Consider the assembly instruction: AND R1, R2, R3. This instruction performs a bitwise logical AND operation, setting the destination register \(R1\) equal to the contents of \(R2\) AND \(R3\).

| Step | Component | Action | Resulting ALU Control |
| --- | --- | --- | --- |
| 1. Decode | Control Unit | Interprets the opcode AND. | CU generates a function code \(F\) corresponding to the ALU’s Logic Block (e.g., \(F = 0010\)). |
| 2. Execute | Register File | \(R2\) and \(R3\) are placed onto the ALU input buses (\(A\) and \(B\)). | \(A \leftarrow (R2)\), \(B \leftarrow (R3)\). |
| 3. Execute | ALU Logic Block | The ALU performs the bitwise \(A \text{ AND } B\) operation in parallel across all \(N\) bits. | \(R = A \land B\) |
| 4. Writeback | Status Register | The Status Flags are updated based on the result \(R\). For a logical operation, Z and N reflect \(R\); the treatment of C and V is architecture-specific. | If \(R\) is zero, \(Z=1\); otherwise \(Z=0\). |
| 5. Writeback | Register File | The result \(R\) is stored in the destination register \(R1\). | \(R1 \leftarrow R\). |

Numerical Example (32-bit ALU):

If we assume the register values are:

  • \(R2 = 0\text{x}FFFF0000\)

  • \(R3 = 0\text{x}0000FFFF\)

The ALU performs the bitwise AND:

\[ \begin{aligned} & 1111\;1111\;1111\;1111\;0000\;0000\;0000\;0000 \quad (R2) \\ \text{AND } & 0000\;0000\;0000\;0000\;1111\;1111\;1111\;1111 \quad (R3) \\ \hline & 0000\;0000\;0000\;0000\;0000\;0000\;0000\;0000 \quad (R) \end{aligned} \]

The result \(R\) is \(0\text{x}00000000\). Consequently, the Zero Flag (\(Z\)) would be set to \(1\). This flag setting is crucial for subsequent instructions, such as conditional branches (e.g., JNZ - Jump if Not Zero).
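The numerical example is easy to reproduce. The snippet below is only a check of the arithmetic: the variable names are chosen for the example, and the JNZ decision is modeled as a plain boolean rather than as real branch hardware.

```python
MASK32 = 0xFFFFFFFF

r2 = 0xFFFF0000
r3 = 0x0000FFFF

r = r2 & r3 & MASK32      # bitwise AND across all 32 bits
z = int(r == 0)           # Zero Flag
branch_taken = (z == 0)   # JNZ branches only when Z is clear

print(f"R = 0x{r:08X}, Z = {z}, JNZ taken: {branch_taken}")
# prints: R = 0x00000000, Z = 1, JNZ taken: False
```

Because the two masks share no set bits, the result is zero, Z is set, and a following JNZ falls through instead of branching.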

3.4 ALU and Dynamic Memory Address Calculation

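The ALU is fundamental to the interface between the CPU’s internal datapath and the external Random Access Memory (RAM). It is tasked with calculating the Effective Address (EA), the address of the memory location to be accessed. On systems with virtual memory, the EA is a virtual address that the memory management unit then translates to a physical one. In simple designs the main ALU performs this arithmetic; many high-performance cores use dedicated address-generation adders for the same calculation.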

The Effective Address Calculation Pipeline

For any memory access instruction (like LOAD or STORE), the CPU must perform a calculation before the address can be placed onto the system’s address bus:

\[\text{EA} = \text{Base Register Value} + \text{Offset Value}\]

This calculation is achieved by treating the Base Register (e.g., \(R2\)) and the Offset (e.g., \(8\)) as two \(N\)-bit operands (\(A\) and \(B\)) and having the ALU execute a simple addition (\(A+B\)).

Addressing Modes Dependent on ALU Arithmetic

Several critical memory addressing modes rely entirely on the ALU’s addition or subtraction capabilities:

  • Indexed or Base-Displacement Addressing: This is the most common mode for accessing arrays and data structures.

    • Instruction Example: LOAD R1, 8(R2) (Load R1 with the contents of the memory address found at \(R2 + 8\)).

    • ALU Operation: The CU feeds the contents of \(R2\) (the base address) and the immediate value \(8\) (the offset) into the ALU. The ALU calculates the sum.

  • PC-Relative Addressing (Control Flow): Used for short, position-independent jumps and branches.

    • Instruction Example: BNE Label (Branch if Not Equal). The assembler encodes Label as an offset relative to the current Program Counter (PC).

    • ALU Operation: The CU feeds \(PC\) (Base) and the immediate branch offset (Displacement) to the ALU. The ALU calculates \(PC + \text{Offset}\), generating the target address for the jump.

  • Register Indirect with Post-Increment/Decrement: Common in architectures such as ARM and the Motorola 68000 for iterating through memory (e.g., pop/push operations).

    • Instruction Example: LOAD R1, (R2)+ (Load R1 from the address in R2, then increment R2 by the data size, e.g., 4 bytes).

    • ALU Operation (Phase 1: Address Calculation): ALU calculates \(\text{EA} = (R2) + 0\) (no offset initially).

    • ALU Operation (Phase 2: Update): ALU calculates \((R2) + 4\) (Increment) and writes this new address back to \(R2\).
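The three addressing modes above reduce to the same ALU addition with different operands. The sketch below assumes a 32-bit address space, a dictionary standing in for memory, and hypothetical helper names; it is meant to show the arithmetic, not any particular ISA's semantics.

```python
MASK32 = 0xFFFFFFFF


def alu_add(a: int, b: int) -> int:
    """32-bit addition; negative offsets wrap via the mask."""
    return (a + b) & MASK32


def load_base_displacement(regs, memory, rd, base, offset):
    """LOAD rd, offset(base): EA = base register + offset."""
    ea = alu_add(regs[base], offset)
    regs[rd] = memory[ea]
    return ea


def branch_target(pc: int, offset: int) -> int:
    """PC-relative target: PC + signed offset."""
    return alu_add(pc, offset)


def load_post_increment(regs, memory, rd, base, size=4):
    """LOAD rd, (base)+: use the address, then bump the base register."""
    ea = alu_add(regs[base], 0)              # phase 1: EA = (base) + 0
    regs[rd] = memory[ea]
    regs[base] = alu_add(regs[base], size)   # phase 2: base += size
    return ea


regs = {"R1": 0, "R2": 0x1000}
memory = {0x1000: 11, 0x1008: 22}
print(hex(load_base_displacement(regs, memory, "R1", "R2", 8)))  # 0x1008
print(hex(branch_target(0x2000, -16)))                           # 0x1ff0
```

Masking after the addition is what lets a negative branch offset work without a separate subtraction path: adding \(-16\) and truncating to 32 bits is equivalent to adding its two’s complement encoding.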

Flow of Address Data

Once the ALU produces the calculated Effective Address (\(R\)), the Control Unit directs this address through a dedicated internal bus to the Memory Address Register (MAR).

\[\text{ALU Result } R \rightarrow \text{MAR} \rightarrow \text{Address Bus}\]

Loading the address into the MAR begins the access: the memory system (caches, the memory controller, and ultimately the RAM modules) services a read or write at that location. Keeping the address addition fast, typically a single ALU cycle, prevents address generation from adding latency to memory accesses.

4 Advanced ALU Concepts: Pipelining and Parallelism

To maintain the high throughput required by modern processors (which execute multiple instructions per clock cycle), CPUs employ superscalar architectures that utilize multiple specialized execution units, often containing dedicated ALUs.

  • Integer ALUs (I-ALUs): These handle standard arithmetic, logic, and addressing calculations. A modern CPU core may have 3-4 I-ALUs operating in parallel.

  • Floating-Point Units (FPU): Specialized circuits optimized for floating-point arithmetic (IEEE 754 standard).

  • Vector/SIMD Units: Large ALUs capable of performing the same operation on multiple data elements simultaneously (Single Instruction, Multiple Data), critical for multimedia and graphics processing.

This parallelism is managed by the Control Unit’s scheduler, which identifies independent instructions in the pipeline and dispatches them concurrently to available ALUs, substantially increasing the CPU’s Instructions Per Cycle (IPC) rate.
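The idea of dispatching independent instructions together can be illustrated with a deliberately simplified, in-order model. The `schedule` function and the `(dest, src1, src2)` tuple format are assumptions made for this sketch; real schedulers also handle operation latency, register renaming, and out-of-order issue, none of which is modeled here.

```python
def schedule(instructions, num_alus: int = 2):
    """Pack consecutive independent instructions into issue cycles.

    Each instruction is a (dest, src1, src2) tuple of register names.
    An instruction starts a new cycle if it reads or rewrites a register
    written earlier in the current cycle, or if all ALUs are busy.
    """
    cycles, current, written = [], [], set()
    for dest, src1, src2 in instructions:
        depends = src1 in written or src2 in written or dest in written
        if depends or len(current) == num_alus:
            cycles.append(current)
            current, written = [], set()
        current.append((dest, src1, src2))
        written.add(dest)
    if current:
        cycles.append(current)
    return cycles


program = [
    ("R1", "R2", "R3"),  # independent
    ("R4", "R5", "R6"),  # independent of the first
    ("R7", "R1", "R4"),  # needs R1 and R4, so it waits a cycle
    ("R8", "R2", "R5"),  # independent of the third
]
cycles = schedule(program)
print(len(program) / len(cycles))  # prints 2.0 (instructions per cycle)
```

With two ALUs the four instructions issue in two cycles; with `num_alus=1` the same program needs four cycles, which is the IPC gain the paragraph above describes.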

In summary, the ALU is not just a calculator; it is a tightly controlled, high-speed combinational engine that provides the CPU’s arithmetic and logical backbone, turning a decoded instruction into a result, often within a single clock cycle for simple integer operations.
