Concepts and Planning
A server consists of hardware, including one or more CPUs of a particular architecture and processing speed, some amount of physical memory (RAM), and one or more disk drives of certain speeds and sizes, together with their controllers. Of this hardware, the CPUs, RAM, and I/O subsystem are the critical server hardware resources.
When a server responds to a user action or performs background actions, it uses each of these three resources to some degree. For example, responding to an open message request from a client may require several milliseconds of CPU processing time, one or more disk accesses, and enough memory to perform the operation.
When actions don't overlap in time, all of the server hardware is dedicated to each action in turn, and the server is idle between actions. Each action completes as quickly as possible and never waits for hardware resources to become available. In this case, the server is essentially unloaded.
When more users connect to the server or when many background actions are occurring, actions begin to overlap, and there is competition for the server hardware resources. Bottlenecks occur because the server must wait for hardware to become available so that it can complete its tasks. When this happens, a server is under load.
When a server is under load, actions may take longer to complete than if the server were unloaded. For user actions, this can result in increased response time for clients. If the server is under an excessive load, users may perceive the server as slow or unresponsive. This relationship between server load and client response time defines the number of users that a server can support.
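The relationship between load and response time can be sketched with a simple single-queue model. This is an illustrative assumption, not a model given in the source: it treats the server as an M/M/1 queue, where mean response time is R = S / (1 - rho), with S the bare service time of an action and rho the fraction of time the server's hardware is busy.

```python
def response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time of a single-queue server (M/M/1 model):
    R = S / (1 - rho), where S is the service time of one action
    and rho is the server's utilization (fraction of time busy)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)

# An unloaded server (utilization near zero) responds in roughly the
# bare service time; response time climbs sharply as load nears 100%.
for rho in (0.0, 0.5, 0.8, 0.95):
    print(f"utilization {rho:4.0%}: {response_time(10.0, rho):6.1f} ms")
```

The exact formula matters less than its shape: response time degrades gradually at first, then very steeply as the server approaches saturation, which is why a loaded server can suddenly feel unresponsive.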
Imagine a server whose users all perform the same actions, evenly distributed over time. With only one user connected to the server, each user-initiated action completes before the next one starts. The response times that the user experiences will be near the theoretical minimum possible for the client's hardware, the server, and the network.
At some point, as the average load on the server increases, response times cross from acceptable to unacceptable. This crossover point defines the number of theoretical "average" users that the server can support.