RemNote Community

Introduction to Queueing Theory

Learn the fundamentals of queueing theory, key performance measures such as utilization and Little’s Law, and common models with their real‑world applications.

Summary

What is Queueing Theory?

Queueing theory is the mathematical study of waiting lines. Whenever customers, jobs, or data packets arrive at a system with limited capacity, they often must wait for service. Queueing theory helps us answer fundamental questions: How long will a typical customer wait? How many customers will be in the system at any given moment? And how should we allocate resources (hiring more staff, building faster servers, or adding more capacity) to keep delays acceptable?

The theory is remarkably practical. Engineers use queueing models to schedule call-center staff, design routers and network switches, and optimize manufacturing assembly lines. At its heart, queueing theory builds mathematical models around two key random processes: the arrival process (how often customers show up) and the service process (how long it takes to serve each customer).

The Arrival Process

In a queueing system, customers arrive over time according to some pattern. Rather than assuming arrivals happen at fixed, predictable times, queueing theory typically models them as random events. The most common assumption is that arrivals follow a Poisson process, meaning the number of arrivals in any fixed time interval follows a Poisson distribution. Poisson arrivals are widely used in practice because they capture three realistic features:

- Independence: whether a customer arrives in one time period has no effect on whether another arrives in a different period
- Constant rate: arrivals occur at a constant average rate, denoted $\lambda$ (the Greek letter "lambda")
- Rarity of simultaneous arrivals: the probability of multiple customers arriving at exactly the same instant is negligible

When arrivals form a Poisson process with rate $\lambda$, there is an important consequence: the time between successive arrivals follows an exponential distribution.
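This relationship can be checked numerically. The sketch below simulates exponential inter-arrival times using the $\lambda = 5$ per hour example from the text, then confirms that both the mean gap ($1/\lambda$ hours) and the average number of arrivals per one-hour window ($\lambda$) come out as expected:

```python
import random

random.seed(42)
lam = 5.0          # arrival rate: 5 customers per hour (from the example)
n = 100_000        # number of arrivals to simulate

# Exponential inter-arrival times with rate lambda; their mean
# should be 1/lambda hours.
gaps = [random.expovariate(lam) for _ in range(n)]
print(f"mean gap: {sum(gaps)/n:.3f} hours (theory: {1/lam:.3f})")

# Cumulative arrival times, then a count of arrivals per 1-hour window.
t = 0.0
arrivals = []
for g in gaps:
    t += g
    arrivals.append(t)
hours = int(t)
counts = [0] * (hours + 1)
for a in arrivals:
    counts[int(a)] += 1
mean_count = sum(counts[:hours]) / hours
print(f"mean arrivals per hour: {mean_count:.2f} (theory: {lam:.2f})")
```

The per-hour counts are (approximately) Poisson-distributed with mean $\lambda$, which is exactly the Poisson/exponential duality described above.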
If a customer just arrived, the time until the next customer arrives is exponentially distributed with the same rate parameter $\lambda$. This exponential inter-arrival time is one of the defining features of Poisson arrivals. The arrival rate $\lambda$ has units of customers per unit time: if $\lambda = 5$, then on average 5 customers arrive per hour.

The Service Process

Once a customer arrives, they must be served by one of the servers in the system. The service process describes how long this takes. In the simplest classical model, service times are exponentially distributed with rate $\mu$ (the Greek letter "mu"). The service rate $\mu$ has the same units as $\lambda$: customers per unit time. If $\mu = 10$, the server completes service for an average of 10 customers per hour.

An important property of exponential service times is the memoryless property: the probability that a customer's service will be completed in the next instant is constant, regardless of how long they have already been in service. This sounds strange, but it simplifies the mathematics considerably and is a reasonable approximation in many real systems.

More advanced queueing models relax this assumption. Some systems have deterministic service times, where every customer takes exactly the same amount of time (like an automated assembly line). Others use general service-time distributions, which can be any distribution at all. These more complex models are more realistic in some cases but also more difficult to analyze.

Key Performance Measures

Now that we understand arrivals and service, we need metrics to evaluate system performance. Queueing theory focuses on several key quantities.

Utilization

The utilization, denoted $\rho$ (the Greek letter "rho"), is the fraction of time the server is busy. It is defined as:

$$\rho = \frac{\lambda}{\mu}$$

Utilization tells us how heavily the system is loaded. If $\rho = 0.5$, the server is busy half the time.
If $\rho = 0.9$, it is busy 90% of the time.

A crucial observation is that for the system to remain stable (that is, for the queue not to grow without bound) we need $\rho < 1$. If $\lambda \geq \mu$, customers arrive faster on average than they can be served, and the queue grows indefinitely. This is why $\rho < 1$ is called the stability condition.

Queue Length and Waiting Time

Two central performance measures are:

- $L$: the average number of customers in the system (both waiting and being served)
- $W$: the average time a customer spends in the system (waiting plus service)

These two quantities are linked by a remarkable result called Little's Law:

$$L = \lambda W$$

Little's Law is one of the most important results in queueing theory because it holds for almost any steady-state queue, regardless of the specific arrival and service distributions. It says that the average number of customers in the system equals the arrival rate times the average time each customer spends in the system. Intuitively, if customers arrive at rate $\lambda$ and each one spends an average time $W$ in the system, the average occupancy must be $L = \lambda W$.

Probability of Waiting

In the classical single-server exponential model, there is a beautiful result: the probability that an arriving customer must wait (rather than being served immediately) equals the utilization $\rho$. This provides a direct way to translate utilization into a customer-facing metric. If $\rho = 0.7$, then 70% of arriving customers will have to wait.

Classical Queueing Models

To make practical use of queueing theory, we study specific models that combine different assumptions about arrivals and service. Each model yields closed-form formulas for $L$, $W$, and other measures.

The M/M/1 Queue

The simplest and most widely studied model is the M/M/1 queue: Markovian (Poisson) arrivals, Markovian (exponential) service times, and a single server.
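Before looking at the M/M/1 formulas, a small simulation sketch makes both the model and Little's Law concrete. The rates $\lambda = 4$ and $\mu = 5$ are illustrative; the Lindley recursion used here is a standard way to compute waiting times in a FIFO single-server queue:

```python
import random

random.seed(1)
lam, mu = 4.0, 5.0        # illustrative rates; rho = 0.8
n = 200_000               # customers to simulate

# Lindley recursion: the wait of the next customer is the current
# customer's wait plus their service time, minus the gap until the
# next arrival (floored at zero, since a queue cannot be negative).
wait, total_sojourn = 0.0, 0.0
for _ in range(n):
    service = random.expovariate(mu)
    total_sojourn += wait + service      # time in system = wait + service
    interarrival = random.expovariate(lam)
    wait = max(0.0, wait + service - interarrival)

W = total_sojourn / n                    # average time in system
rho = lam / mu
print(f"simulated W = {W:.3f}  (theory 1/(mu-lam) = {1/(mu - lam):.3f})")
print(f"lam * W     = {lam*W:.3f}  (theory rho/(1-rho) = {rho/(1-rho):.3f})")
```

The simulated $\lambda W$ matches the M/M/1 prediction for $L$, illustrating Little's Law in action.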
For the M/M/1 queue, the steady-state probability that the system contains exactly $n$ customers is:

$$P(n) = (1 - \rho) \rho^n$$

From this distribution, we can derive:

Average number in system:

$$L = \frac{\rho}{1 - \rho}$$

Average waiting time in queue:

$$Wq = \frac{\rho}{\mu(1 - \rho)}$$

Notice that as $\rho$ approaches 1, both $L$ and $Wq$ increase dramatically. When the server is nearly fully utilized, queues grow long and waits become very substantial. This is why maintaining $\rho$ well below 1 is important in practice.

The M/M/c Queue

When a system has multiple servers, we use the M/M/c model: Poisson arrivals, exponential service times, and $c$ identical servers. In an M/M/c queue, each server has its own utilization. If the total arrival rate is $\lambda$ and each server serves at rate $\mu$, the utilization per server is:

$$\rho = \frac{\lambda}{c\mu}$$

For stability, we still require $\rho < 1$, which means $\lambda < c\mu$: the total arrival rate must be less than the combined capacity of all servers.

The probability that an arriving customer has to wait (rather than being served immediately by an idle server) is given by the Erlang-C formula. This formula is more complex than the M/M/1 result, but it is fundamental to telecom and call-center design, where it is used to set staffing levels to meet service-level targets. When $\rho < 1$, the system reaches a steady state with a finite average queue length and waiting time. Adding more servers substantially reduces waiting times compared to the single-server case with the same overall utilization.

The M/D/1 Queue

Some systems have deterministic service times, where each customer's service duration is fixed. The M/D/1 model assumes Poisson arrivals, deterministic (constant) service times, and one server. For the M/D/1 queue:

Average number in system:

$$L = \rho + \frac{\rho^2}{2(1 - \rho)}$$

Average waiting time in queue:

$$Wq = \frac{\rho}{2\mu(1 - \rho)}$$

Compare the $Wq$ formula to the M/M/1 formula.
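A quick numeric comparison makes the difference visible (the rates $\lambda = 8$ and $\mu = 10$ are illustrative):

```python
lam, mu = 8.0, 10.0                   # illustrative rates
rho = lam / mu                        # utilization 0.8

wq_mm1 = rho / (mu * (1 - rho))       # exponential service times
wq_md1 = rho / (2 * mu * (1 - rho))   # deterministic service times

print(f"rho      = {rho:.1f}")
print(f"M/M/1 Wq = {wq_mm1:.3f}")     # 0.400
print(f"M/D/1 Wq = {wq_md1:.3f}")     # 0.200, exactly half
```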
For the same utilization $\rho$, the M/D/1 queue has exactly half the waiting time of the M/M/1 queue. This illustrates an important principle: reducing variability in service times reduces waiting times. When service is deterministic (no variability), the queue is shorter than when service is exponentially distributed (high variability).

The G/G/1 Queue

Real systems rarely follow perfectly Poisson arrivals or perfectly exponential service times. The G/G/1 model (General arrivals, General service, one server) allows arbitrary inter-arrival time distributions and arbitrary service-time distributions. Unfortunately, there are no exact closed-form formulas for the G/G/1 queue. Instead, engineers use approximations. One widely used approximation is Kingman's formula, which generalizes the Pollaczek-Khinchine result:

$$Wq \approx \frac{\rho}{1 - \rho} \cdot \frac{Ca^2 + Cs^2}{2} \cdot \frac{1}{\mu}$$

where:

- $Ca^2$ is the squared coefficient of variation of inter-arrival times
- $Cs^2$ is the squared coefficient of variation of service times

The squared coefficient of variation (the variance divided by the squared mean) is a normalized measure of variability: $C^2 = 1$ corresponds to exponential (highly variable), and $C^2 = 0$ corresponds to deterministic (no variability). Setting $Ca^2 = Cs^2 = 1$ recovers the M/M/1 waiting time, and $Ca^2 = 1$, $Cs^2 = 0$ recovers the M/D/1 waiting time, so the approximation is consistent with the exact models above.

This formula reveals that waiting time increases with variability in both arrivals and service. If either process is bursty or unpredictable, queues grow longer. This approximation works well across many practical systems, even when arrivals and service are not Poisson or exponential.

Applications of Queueing Theory

Telecommunications

Modern data networks carry packets through routers and switches. Each router has buffers (queues) to hold packets while they wait for transmission on an outgoing link. Queueing theory helps network engineers predict buffer occupancy, average packet delay, and the probability that a packet will be lost due to buffer overflow. Packet arrivals are often modeled as Poisson, and link transmission rates determine the service rate $\mu$.
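As a sketch of this kind of calculation (all numbers are illustrative assumptions: a 10 Mbit/s link carrying 1250-byte packets, offered 800 packets per second):

```python
# Illustrative link parameters (not from any specific network).
link_bps = 10_000_000               # 10 Mbit/s link
packet_bits = 1250 * 8              # 10,000 bits per packet
mu = link_bps / packet_bits         # service rate: 1000 packets/s
lam = 800.0                         # offered load: 800 packets/s

rho = lam / mu                      # utilization 0.8
L = rho / (1 - rho)                 # average packets in the router (M/M/1)
W = 1 / (mu - lam)                  # average delay per packet, in seconds

print(f"utilization      : {rho:.2f}")
print(f"avg packets held : {L:.1f}")
print(f"avg packet delay : {W * 1000:.1f} ms")   # 5.0 ms
```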
By applying M/M/1 or M/M/c models, engineers can size buffers and provision link capacities to meet delay and loss targets.

Service Industry Applications

In call centers, retail stores, and other service settings, queueing theory informs staffing decisions. The probability of delay (which equals $\rho$ in an M/M/1 system) is a key metric. If management sets a target such as "90% of customers are served without waiting," this translates to a delay probability of 0.1, and hence a utilization target of $\rho = 0.1$ in a single-server system; with multiple servers, the Erlang-C formula shows that the same delay target can be met at much higher per-server utilization. Using the M/M/c formulas, managers can then determine how many servers (staff) are needed to achieve the target given the expected arrival rate $\lambda$.
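A minimal staffing sketch using the Erlang-C formula (the call volume, handling rate, and the 10% waiting target are illustrative assumptions):

```python
from math import factorial

def erlang_c(lam, mu, c):
    """Probability that an arrival must wait in an M/M/c queue (Erlang-C)."""
    a = lam / mu                  # offered load in Erlangs
    rho = a / c                   # per-server utilization
    if rho >= 1:
        return 1.0                # unstable: queue grows without bound
    tail = (a ** c / factorial(c)) / (1 - rho)
    base = sum(a ** k / factorial(k) for k in range(c))
    return tail / (base + tail)

# Illustrative scenario: 30 calls/hour, each agent handles 4 calls/hour,
# target: at most 10% of callers should have to wait.
lam, mu, target = 30.0, 4.0, 0.10
c = 1
while erlang_c(lam, mu, c) > target:
    c += 1
print(f"agents needed: {c}  (P(wait) = {erlang_c(lam, mu, c):.3f})")
```

As a sanity check, with $c = 1$ the function reduces to the M/M/1 result $P(\text{wait}) = \rho$, matching the single-server discussion above.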
Flashcards
What is the mathematical study of lines that form when a limited number of servers handle incoming customers or data?
Queueing theory
What are the three primary questions addressed by queueing theory?
How long a typical customer will wait; how many customers will be in the system at any time; and how to allocate resources to minimize delays
Queueing models are built on which two random processes?
The arrival process (customer arrivals) and the service process (service times)
If arrivals follow a Poisson distribution, how is the time between successive arrivals distributed?
Exponentially distributed
In a Poisson arrival process, what does the parameter $\lambda$ represent?
The constant average arrival rate (customers per unit time)
In a classic single-server exponential model, what does $\mu$ represent?
The average service rate (customers per unit time)
What is the implication of exponential service times regarding the probability of completing service?
The probability of completion in the next instant is constant over time
What is the formula for utilization $\rho$ in a single-server system?
$\rho = \lambda / \mu$ (where $\lambda$ is arrival rate and $\mu$ is service rate)
What condition must the utilization $\rho$ satisfy for a queueing system to be stable?
$\rho < 1$
What is the mathematical formula for Little’s Law?
$L = \lambda W$
What do the variables $L$, $\lambda$, and $W$ represent in Little’s Law?
$L$: average number of customers in the system; $\lambda$: average arrival rate; $W$: average time a customer spends in the system
In a single-server exponential model, what is the probability that an arriving customer must wait?
Exactly $\rho$ (utilization)
What is the formula for the average number of customers $L$ in a steady-state M/M/1 queue?
$L = \rho / (1 - \rho)$
What is the formula for the average waiting time in the queue $Wq$ for an M/M/1 model?
$Wq = \rho / (\mu (1 - \rho))$ (where $\mu$ is the service rate)
What is the formula for utilization per server $\rho$ in a system with $c$ identical servers?
$\rho = \lambda / (c \mu)$
Which formula provides the probability that an arriving customer must wait in a multiple-server queue?
The Erlang-C formula
How does deterministic (fixed) service compare to exponential service regarding queue length?
It reduces variability, leading to shorter queues for the same utilization
What two factors related to variation influence the average waiting time in a general model?
The squared coefficient of variation of inter-arrival times ($Ca^2$) and the squared coefficient of variation of service times ($Cs^2$)
In data network modeling, what physical component corresponds to the service rate?
Link capacity (measured in bits per second)

Key Concepts
Queueing Models
M/M/1 queue
M/M/c queue
M/D/1 queue
G/G/1 queue
Queueing Theory Concepts
Queueing theory
Poisson process
Exponential distribution
Little's Law
Erlang‑C formula
Pollaczek–Khinchine formula