RemNote Community

Introduction to Queueing Theory

Learn the fundamentals of queueing theory, key performance measures such as utilization and Little’s Law, and common models with their real‑world applications.

Summary

What is Queueing Theory?

Queueing theory is the mathematical study of waiting lines. Whenever customers, jobs, or data packets arrive at a system with limited capacity, they often must wait for service. Queueing theory helps us answer fundamental questions: How long will a typical customer wait? How many customers will be in the system at any given moment? And how should we allocate resources (hiring more staff, building faster servers, or adding more capacity) to keep delays acceptable?

The theory is remarkably practical. Engineers use queueing models to schedule call-center staff, design routers and network switches, and optimize manufacturing assembly lines. At its heart, queueing theory builds mathematical models around two key random processes: the arrival process (how often customers show up) and the service process (how long it takes to serve each customer).

The Arrival Process

In a queueing system, customers arrive over time according to some pattern. Rather than assuming arrivals happen at fixed, predictable times, queueing theory typically models them as random events. The most common assumption is that arrivals follow a Poisson process, meaning the number of arrivals in any fixed time interval follows a Poisson distribution. Poisson arrivals are widely used in practice because they capture three realistic features:

- Independence: whether a customer arrives in one time period has no effect on whether another arrives in a different period
- Constant rate: arrivals occur at a constant average rate, denoted $\lambda$ (the Greek letter "lambda")
- Rarity of simultaneous arrivals: the probability of multiple customers arriving at exactly the same instant is negligible

When arrivals form a Poisson process with rate $\lambda$, there is an important consequence: the time between successive arrivals follows an exponential distribution.
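This relationship can be checked numerically. The sketch below simulates exponential inter-arrival times using the $\lambda = 5$ per hour example from the text, then confirms that both the mean gap ($1/\lambda$ hours) and the average number of arrivals per one-hour window ($\lambda$) come out as expected:

```python
import random

random.seed(42)
lam = 5.0          # arrival rate: 5 customers per hour (from the example)
n = 100_000        # number of arrivals to simulate

# Exponential inter-arrival times with rate lambda; their mean
# should be 1/lambda hours.
gaps = [random.expovariate(lam) for _ in range(n)]
print(f"mean gap: {sum(gaps)/n:.3f} hours (theory: {1/lam:.3f})")

# Cumulative arrival times, then a count of arrivals per 1-hour window.
t = 0.0
arrivals = []
for g in gaps:
    t += g
    arrivals.append(t)
hours = int(t)
counts = [0] * (hours + 1)
for a in arrivals:
    counts[int(a)] += 1
mean_count = sum(counts[:hours]) / hours
print(f"mean arrivals per hour: {mean_count:.2f} (theory: {lam:.2f})")
```

The per-hour counts are (approximately) Poisson-distributed with mean $\lambda$, which is exactly the Poisson/exponential duality described above.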
If a customer just arrived, the time until the next customer arrives is exponentially distributed with the same rate parameter $\lambda$. This exponential inter-arrival time is one of the defining features of Poisson arrivals. The arrival rate $\lambda$ has units of customers per unit time: if $\lambda = 5$, then on average 5 customers arrive per hour.

The Service Process

Once a customer arrives, they must be served by one of the servers in the system. The service process describes how long this takes. In the simplest classical model, service times are exponentially distributed with rate $\mu$ (the Greek letter "mu"). The service rate $\mu$ has the same units as $\lambda$: customers per unit time. If $\mu = 10$, the server completes service for an average of 10 customers per hour.

An important property of exponential service times is the memoryless property: the probability that a customer's service will be completed in the next instant is constant, regardless of how long they have already been in service. This sounds strange, but it simplifies the mathematics considerably and is a reasonable approximation in many real systems.

More advanced queueing models relax this assumption. Some systems have deterministic service times, where every customer takes exactly the same amount of time (like an automated assembly line). Others use general service-time distributions, which can be any distribution at all. These more complex models are more realistic in some cases but also more difficult to analyze.

Key Performance Measures

Now that we understand arrivals and service, we need metrics to evaluate system performance. Queueing theory focuses on several key quantities.

Utilization

The utilization, denoted $\rho$ (the Greek letter "rho"), is the fraction of time the server is busy. It is defined as:

$$\rho = \frac{\lambda}{\mu}$$

Utilization tells us how heavily the system is loaded. If $\rho = 0.5$, the server is busy half the time.
If $\rho = 0.9$, it is busy 90% of the time.

A crucial observation is that for the system to remain stable (that is, for the queue not to grow without bound) we need $\rho < 1$. If $\lambda \geq \mu$, customers arrive faster on average than they can be served, and the queue grows indefinitely. This is why $\rho < 1$ is called the stability condition.

Queue Length and Waiting Time

Two central performance measures are:

- $L$: the average number of customers in the system (both waiting and being served)
- $W$: the average time a customer spends in the system (waiting plus service)

These two quantities are linked by a remarkable result called Little's Law:

$$L = \lambda W$$

Little's Law is one of the most important results in queueing theory because it holds for almost any steady-state queue, regardless of the specific arrival and service distributions. It says that the average number of customers in the system equals the arrival rate times the average time each customer spends in the system. Intuitively, if customers arrive at rate $\lambda$ and each one spends an average time $W$ in the system, the average occupancy must be $L = \lambda W$.

Probability of Waiting

In the classical single-server exponential model, there is a beautiful result: the probability that an arriving customer must wait (rather than being served immediately) equals the utilization $\rho$. This provides a direct way to translate utilization into a customer-facing metric. If $\rho = 0.7$, then 70% of arriving customers will have to wait.

Classical Queueing Models

To make practical use of queueing theory, we study specific models that combine different assumptions about arrivals and service. Each model yields closed-form formulas for $L$, $W$, and other measures.

The M/M/1 Queue

The simplest and most widely studied model is the M/M/1 queue: Markovian (Poisson) arrivals, Markovian (exponential) service times, and a single server.
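Before looking at the M/M/1 formulas, a small simulation sketch makes both the model and Little's Law concrete. The rates $\lambda = 4$ and $\mu = 5$ are illustrative; the Lindley recursion used here is a standard way to compute waiting times in a FIFO single-server queue:

```python
import random

random.seed(1)
lam, mu = 4.0, 5.0        # illustrative rates; rho = 0.8
n = 200_000               # customers to simulate

# Lindley recursion: the wait of the next customer is the current
# customer's wait plus their service time, minus the gap until the
# next arrival (floored at zero, since a queue cannot be negative).
wait, total_sojourn = 0.0, 0.0
for _ in range(n):
    service = random.expovariate(mu)
    total_sojourn += wait + service      # time in system = wait + service
    interarrival = random.expovariate(lam)
    wait = max(0.0, wait + service - interarrival)

W = total_sojourn / n                    # average time in system
rho = lam / mu
print(f"simulated W = {W:.3f}  (theory 1/(mu-lam) = {1/(mu - lam):.3f})")
print(f"lam * W     = {lam*W:.3f}  (theory rho/(1-rho) = {rho/(1-rho):.3f})")
```

The simulated $\lambda W$ matches the M/M/1 prediction for $L$, illustrating Little's Law in action.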
For the M/M/1 queue, the steady-state probability that the system contains exactly $n$ customers is:

$$P(n) = (1 - \rho) \rho^n$$

From this distribution, we can derive:

Average number in system:

$$L = \frac{\rho}{1 - \rho}$$

Average waiting time in queue:

$$Wq = \frac{\rho}{\mu(1 - \rho)}$$

Notice that as $\rho$ approaches 1, both $L$ and $Wq$ increase dramatically. When the server is nearly fully utilized, queues grow long and waits become very substantial. This is why maintaining $\rho$ well below 1 is important in practice.

The M/M/c Queue

When a system has multiple servers, we use the M/M/c model: Poisson arrivals, exponential service times, and $c$ identical servers. In an M/M/c queue, each server has its own utilization. If the total arrival rate is $\lambda$ and each server serves at rate $\mu$, the utilization per server is:

$$\rho = \frac{\lambda}{c\mu}$$

For stability, we still require $\rho < 1$, which means $\lambda < c\mu$: the total arrival rate must be less than the combined capacity of all servers.

The probability that an arriving customer has to wait (rather than being served immediately by an idle server) is given by the Erlang-C formula. This formula is more complex than the M/M/1 result, but it is fundamental to telecom and call-center design, where it is used to set staffing levels to meet service-level targets. When $\rho < 1$, the system reaches a steady state with a finite average queue length and waiting time. Adding more servers substantially reduces waiting times compared to the single-server case with the same overall utilization.

The M/D/1 Queue

Some systems have deterministic service times, where each customer's service duration is fixed. The M/D/1 model assumes Poisson arrivals, deterministic (constant) service times, and one server. For the M/D/1 queue:

Average number in system:

$$L = \rho + \frac{\rho^2}{2(1 - \rho)}$$

Average waiting time in queue:

$$Wq = \frac{\rho}{2\mu(1 - \rho)}$$

Compare the $Wq$ formula to the M/M/1 formula.
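A quick numeric comparison makes the difference visible (the rates $\lambda = 8$ and $\mu = 10$ are illustrative):

```python
lam, mu = 8.0, 10.0                   # illustrative rates
rho = lam / mu                        # utilization 0.8

wq_mm1 = rho / (mu * (1 - rho))       # exponential service times
wq_md1 = rho / (2 * mu * (1 - rho))   # deterministic service times

print(f"rho      = {rho:.1f}")
print(f"M/M/1 Wq = {wq_mm1:.3f}")     # 0.400
print(f"M/D/1 Wq = {wq_md1:.3f}")     # 0.200, exactly half
```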
For the same utilization $\rho$, the M/D/1 queue has exactly half the waiting time of the M/M/1 queue. This illustrates an important principle: reducing variability in service times reduces waiting times. When service is deterministic (no variability), the queue is shorter than when service is exponentially distributed (high variability).

The G/G/1 Queue

Real systems rarely follow perfectly Poisson arrivals or perfectly exponential service times. The G/G/1 model (General arrivals, General service, one server) allows arbitrary inter-arrival time distributions and arbitrary service-time distributions. Unfortunately, there are no exact closed-form formulas for the G/G/1 queue. Instead, engineers use approximations. One widely used approximation is Kingman's formula, which generalizes the Pollaczek-Khinchine result:

$$Wq \approx \frac{\rho}{1 - \rho} \cdot \frac{Ca^2 + Cs^2}{2} \cdot \frac{1}{\mu}$$

where:

- $Ca^2$ is the squared coefficient of variation of inter-arrival times
- $Cs^2$ is the squared coefficient of variation of service times

The squared coefficient of variation (the variance divided by the squared mean) is a normalized measure of variability: $C^2 = 1$ corresponds to exponential (highly variable), and $C^2 = 0$ corresponds to deterministic (no variability). Setting $Ca^2 = Cs^2 = 1$ recovers the M/M/1 waiting time, and $Ca^2 = 1$, $Cs^2 = 0$ recovers the M/D/1 waiting time, so the approximation is consistent with the exact models above.

This formula reveals that waiting time increases with variability in both arrivals and service. If either process is bursty or unpredictable, queues grow longer. This approximation works well across many practical systems, even when arrivals and service are not Poisson or exponential.

Applications of Queueing Theory

Telecommunications

Modern data networks carry packets through routers and switches. Each router has buffers (queues) to hold packets while they wait for transmission on an outgoing link. Queueing theory helps network engineers predict buffer occupancy, average packet delay, and the probability that a packet will be lost due to buffer overflow. Packet arrivals are often modeled as Poisson, and link transmission rates determine the service rate $\mu$.
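As a sketch of this kind of calculation (all numbers are illustrative assumptions: a 10 Mbit/s link carrying 1250-byte packets, offered 800 packets per second):

```python
# Illustrative link parameters (not from any specific network).
link_bps = 10_000_000               # 10 Mbit/s link
packet_bits = 1250 * 8              # 10,000 bits per packet
mu = link_bps / packet_bits         # service rate: 1000 packets/s
lam = 800.0                         # offered load: 800 packets/s

rho = lam / mu                      # utilization 0.8
L = rho / (1 - rho)                 # average packets in the router (M/M/1)
W = 1 / (mu - lam)                  # average delay per packet, in seconds

print(f"utilization      : {rho:.2f}")
print(f"avg packets held : {L:.1f}")
print(f"avg packet delay : {W * 1000:.1f} ms")   # 5.0 ms
```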
By applying M/M/1 or M/M/c models, engineers can size buffers and provision link capacities to meet delay and loss targets.

Service Industry Applications

In call centers, retail stores, and other service settings, queueing theory informs staffing decisions. The probability of delay (which equals $\rho$ in an M/M/1 system) is a key metric. If management sets a target such as "90% of customers are served without waiting," this translates to a delay probability of 0.1, and hence a utilization target of $\rho = 0.1$ in a single-server system; with multiple servers, the Erlang-C formula shows that the same delay target can be met at much higher per-server utilization. Using the M/M/c formulas, managers can then determine how many servers (staff) are needed to achieve the target given the expected arrival rate $\lambda$.
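A minimal staffing sketch using the Erlang-C formula (the call volume, handling rate, and the 10% waiting target are illustrative assumptions):

```python
from math import factorial

def erlang_c(lam, mu, c):
    """Probability that an arrival must wait in an M/M/c queue (Erlang-C)."""
    a = lam / mu                  # offered load in Erlangs
    rho = a / c                   # per-server utilization
    if rho >= 1:
        return 1.0                # unstable: queue grows without bound
    tail = (a ** c / factorial(c)) / (1 - rho)
    base = sum(a ** k / factorial(k) for k in range(c))
    return tail / (base + tail)

# Illustrative scenario: 30 calls/hour, each agent handles 4 calls/hour,
# target: at most 10% of callers should have to wait.
lam, mu, target = 30.0, 4.0, 0.10
c = 1
while erlang_c(lam, mu, c) > target:
    c += 1
print(f"agents needed: {c}  (P(wait) = {erlang_c(lam, mu, c):.3f})")
```

As a sanity check, with $c = 1$ the function reduces to the M/M/1 result $P(\text{wait}) = \rho$, matching the single-server discussion above.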
Flashcards
What is the mathematical study of lines that form when a limited number of servers handle incoming customers or data?
Queueing theory
What are the three primary questions addressed by queueing theory?
How long a typical customer will wait; how many customers will be in the system at any time; and how to allocate resources to minimize delays
Queueing models are built on which two random processes?
The arrival process (customer arrivals) and the service process (service times)
If arrivals follow a Poisson distribution, how is the time between successive arrivals distributed?
Exponentially distributed
In a Poisson arrival process, what does the parameter $\lambda$ represent?
The constant average arrival rate (customers per unit time)
In a classic single-server exponential model, what does $\mu$ represent?
The average service rate (customers per unit time)
What is the implication of exponential service times regarding the probability of completing service?
The probability of completion in the next instant is constant over time
What is the formula for utilization $\rho$ in a single-server system?
$\rho = \lambda / \mu$ (where $\lambda$ is arrival rate and $\mu$ is service rate)
What condition must the utilization $\rho$ satisfy for a queueing system to be stable?
$\rho < 1$
What is the mathematical formula for Little’s Law?
$L = \lambda W$
What do the variables $L$, $\lambda$, and $W$ represent in Little’s Law?
$L$: average number of customers in the system; $\lambda$: average arrival rate; $W$: average time a customer spends in the system
In a single-server exponential model, what is the probability that an arriving customer must wait?
Exactly $\rho$ (utilization)
What is the formula for the average number of customers $L$ in a steady-state M/M/1 queue?
$L = \rho / (1 - \rho)$
What is the formula for the average waiting time in the queue $Wq$ for an M/M/1 model?
$Wq = \rho / (\mu (1 - \rho))$ (where $\mu$ is the service rate)
What is the formula for utilization per server $\rho$ in a system with $c$ identical servers?
$\rho = \lambda / (c \mu)$
Which formula provides the probability that an arriving customer must wait in a multiple-server queue?
The Erlang-C formula
How does deterministic (fixed) service compare to exponential service regarding queue length?
It reduces variability, leading to shorter queues for the same utilization
What two factors related to variation influence the average waiting time in a general model?
The squared coefficient of variation of inter-arrival times ($Ca^2$) and the squared coefficient of variation of service times ($Cs^2$)
In data network modeling, what physical component corresponds to the service rate?
Link capacity (measured in bits per second)

Key Concepts
Queueing Models
M/M/1 queue
M/M/c queue
M/D/1 queue
G/G/1 queue
Queueing Theory Concepts
Queueing theory
Poisson process
Exponential distribution
Little's Law
Erlang‑C formula
Pollaczek–Khinchine formula