What the heck is this 95th Percentile number?

The 95th percentile is a widely used mathematical calculation to evaluate the regular and sustained utilization of a network pipe. For example, it is the same calculation BBN/GTE uses to bill the Coop for its metered use of a T3. Many ISPs use it for capacity planning and/or calculating metered use.

Basically, the 95th percentile says that 95% of the time, the usage is below this amount. Conversely, of course, 5% of the time usage is above that amount. The 95th percentile is a good number to use for planning so you can ensure you have the needed bandwidth at least 95% of the time.

There are three important factors to a percentile calculation:

Percentile number: A percentile basically says that for that percentage of the time, the data points are below the resulting value. So if we calculate a 50th percentile, 50% of the time the data points are below that resulting value and 50% of the time they are above that value. A 50th percentile is the same as a "median." An average, or "mean," is similar, but it is weighted by the size of each value, so a few very large values pull it upward. A 95th percentile says that 95% of the time data points are below that value and 5% of the time they are above that value. 95 is a magic number used in networking because you have to plan for the most-of-the-time case. If networks were planned for median or mean use, they could be unusable (saturated) roughly half the time. On the other hand, planning for the 100th percentile is a theoretically impossible goal because, given no bottlenecks, the data will use whatever throughput is available.
Data points used: A percentile is calculated on some set of data points. What those data points represent is significant to understanding the meaning of the percentile result. For example, percentile rankings of SAT scores indicate one's relative standing with others who took the test. Network percentiles are based on sampled throughput utilization. The sample interval determines how accurate or forgiving the percentile is: the shorter the interval, the more accurate and less forgiving the percentile will be, because short bursts are averaged over less time. Coop MRTG data samples are collected every 10 minutes. As a count of bits over a 10 minute period, each data sample represents a 10 minute averaged bits per second value. It's averaged because we don't know the highs and lows within that 10 minute period. BBN/GTE uses a 15 minute sample interval. Some providers use a 5 minute sample interval.
Data set size: The data set size indicates the range of the values. Again using the SAT example, a percentile result has a different meaning if the data set is nationwide or just statewide. In network percentiles, the data set is a period of time over which samples are collected. Usually for any solid planning and trend determination, we need a reasonably large data set to cover the peaks and valleys of utilization. A month of samples is the typical data set.
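
To make the difference between mean, median and 95th percentile concrete, here is a minimal sketch on a made-up data set. The sample values and the sub names (percentile, mean) are illustrations only and are not taken from the Coop program; the percentile is found with the simple sort-and-index approach.

```perl
use strict;
use warnings;

# Return the element at the given percentile of a list of numbers
# by sorting the list and indexing into it.
sub percentile {
    my ($per, @data) = @_;
    my @sorted = sort { $a <=> $b } @data;
    my $idx = int(@sorted * $per / 100);
    $idx = $#sorted if $idx > $#sorted;    # guard for $per == 100
    return $sorted[$idx];
}

# Arithmetic mean of a list of numbers.
sub mean {
    my @data = @_;
    my $sum = 0;
    $sum += $_ for @data;
    return $sum / @data;
}

# Made-up utilization samples in kbit/s: 36 quiet samples and 4 bursts.
my @kbps = ((100) x 36, 600, 700, 900, 1000);

printf "mean   = %d\n", mean(@kbps);
printf "median = %d\n", percentile(50, @kbps);
printf "95th   = %d\n", percentile(95, @kbps);
```

On this toy data the median is 100, the mean is 170 and the 95th percentile is 900. A pipe sized for the mean or median would be saturated during every burst, while one sized for the 95th percentile would be exceeded only by the single largest sample.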

The Coop percentile calculation uses a 95th percentile on 10 minute averages (more on this below) over a period of 30 days. The calculation is made on the most recent 30 day period, so the result is a floating-window result, not one fixed to a calendar month.

So the percentile figure shown on the Coop MRTG graphs tells us that 95% of the time in the most recent 30 days, the bits per second utilization in the 10 minute interval is below the reported 95th percentile value.
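
As a rough worked example of what that means in hours, the sketch below counts the samples in a 30 day window of 10 minute samples and how many of them may sit at or above the reported value. The sub name window_stats is hypothetical and only serves this illustration.

```perl
use strict;
use warnings;

# Returns (total samples in the window, samples at or above the
# percentile element, hours those samples represent).
sub window_stats {
    my ($days, $interval_min, $per) = @_;
    my $samples     = $days * 24 * 60 / $interval_min;
    my $idx         = int($samples * $per / 100);   # index of the percentile element
    my $at_or_above = $samples - $idx;
    my $hours       = $at_or_above * $interval_min / 60;
    return ($samples, $at_or_above, $hours);
}

my ($samples, $at_or_above, $hours) = window_stats(30, 10, 95);
printf "%d samples, %d at or above the 95th percentile (%g hours)\n",
    $samples, $at_or_above, $hours;
```

With the Coop's settings this gives 4320 samples, of which 216 (about 36 hours out of the 30 days) can be at or above the reported figure.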

How is the number actually calculated?

MRTG is a great program. Its data are automatically reduced over time to larger intervals to keep log files from growing without bound. This means that the 10 minute average numbers get reduced to 30 minute averages and then 2 hour averages after a while in the log file.

In an MRTG data file, the first 600 values are at the run interval (10 minutes at the Coop), the next 600 are reduced to 30 minute intervals, the next 600 are reduced to 2 hour intervals, and the rest are reduced to 24 hour intervals. We use 1360 MRTG data points which are:

600 at 10 minute intervals (6000 minutes)
600 at 30 minute intervals (18000 minutes)
160 at 2 hour intervals (19200 minutes)
Total of 43200 minutes or 30 days exactly
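
The totals above are easy to check. This small sketch just encodes the three tiers as (entries used, minutes per entry) pairs; the names @tiers and tier_totals are made up for the example.

```perl
use strict;
use warnings;

# [number of log entries used, minutes covered by each entry]
my @tiers = ([600, 10], [600, 30], [160, 120]);

# Sum the entry count and the minutes covered across all tiers.
sub tier_totals {
    my @t = @_;
    my ($entries, $minutes) = (0, 0);
    for my $tier (@t) {
        my ($count, $span) = @$tier;
        $entries += $count;
        $minutes += $count * $span;
    }
    return ($entries, $minutes);
}

my ($entries, $minutes) = tier_totals(@tiers);
printf "%d entries cover %d minutes (%g days)\n",
    $entries, $minutes, $minutes / 1440;
```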

In order to preserve the data set as 10 minute samples and not skew the significance of the data to the most recent side of the period, the Coop percentile program repeats the reduced data as necessary to get the correct number of samples. For example, a 30 minute sample is repeated three times to be three equal 10 minute samples. Note that this correction was added 9/26/1998.
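
The repetition can be sketched as a small helper. This is an illustration under stated assumptions, not the Coop code itself: expand_samples and its arguments are hypothetical names, and the toy log uses 2 entries per tier instead of 600 so the result is easy to inspect.

```perl
use strict;
use warnings;

# Repeat each value so that every log entry contributes one sample per
# 10 minutes it covers.  $per_tier is the number of entries in each
# tier and @$repeats is the repeat count for each tier in order.
sub expand_samples {
    my ($values, $per_tier, $repeats) = @_;
    my @expanded;
    for my $i (0 .. $#$values) {
        my $tier = int($i / $per_tier);
        last if $tier > $#$repeats;    # ignore entries beyond the known tiers
        push @expanded, ($values->[$i]) x $repeats->[$tier];
    }
    return @expanded;
}

# Toy log with made-up values: 2 entries each at 10 minute, 30 minute
# and 2 hour intervals.
my @log     = (10, 20, 30, 40, 50, 60);
my @samples = expand_samples(\@log, 2, [1, 3, 12]);
print scalar(@samples), " samples\n";
```

The toy log expands to 2 + 6 + 24 = 32 equal-weight samples. With 600, 600 and 160 entries the same rule yields 600 + 1800 + 1920 = 4320 samples, one for every 10 minutes in 30 days.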

The traditional mathematical method for calculating a percentile assumes that your data set is so large that you can't store it all in memory and sort it. It uses "buckets" and calculates an "ogive" and then approximates the result through reverse interpolation. Since our data set is finite and small (relative to memory), we just do it the straightforward way:

collect the data set (two actually: inbound, outbound samples)
sort each data set
find the index of the 95th percentile element
print the larger of the inbound or outbound 95th percentile data element
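
For contrast with the steps above, here is a minimal sketch of the bucket-and-ogive approach that the Coop program avoids. It is a generic grouped-data interpolation, not something taken from the Coop code; the sub name bucket_percentile, the bucket width and the data are all assumptions made for illustration, and it assumes non-negative values.

```perl
use strict;
use warnings;

# Approximate a percentile from fixed-width buckets: count the samples
# in each bucket, accumulate the counts (the "ogive"), then interpolate
# linearly inside the bucket where the running count reaches the target.
sub bucket_percentile {
    my ($per, $width, @data) = @_;
    my @count;
    $count[ int($_ / $width) ]++ for @data;

    my $target = @data * $per / 100;    # number of samples that should lie below
    my $seen   = 0;
    for my $bin (0 .. $#count) {
        my $n = $count[$bin] || 0;
        if ($n && $seen + $n >= $target) {
            my $frac = ($target - $seen) / $n;
            return ($bin + $frac) * $width;
        }
        $seen += $n;
    }
    return;    # empty data set
}

# Made-up data: the integers 1 through 100, grouped in buckets of 10.
printf "approximate 95th percentile: %.1f\n",
    bucket_percentile(95, 10, 1 .. 100);
```

Only the bucket counts need to be kept in memory, which is the point of the method. On this evenly spread data it gives 96.0, the same element the sort-and-index method picks; on lumpier data the two can differ, because the interpolation assumes values are spread evenly inside each bucket.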

Below is enough of the actual program for you to recreate the 95th percentile calculation on your own MRTG data sets.

#!/usr/local/bin/perl5
#
# Generate a percentile calculation from the most recent $samples
# in an MRTG log file.  This isn't the most accurate percentile because
# the sample interval changes twice in the data set.  Once we've got
# it, produce a GIF that represents (as an odometer-style number) the
# larger of the input or output values.

# Copyright 1997,1998: Labyrinth Computer Services, All Rights Reserved

require 5.003;

# Percentile to calculate
$PER=95;

# Program to generate output odometer (gif number)
$odometer	= 'path-to-odometer';

# MRTG data file (usually you get this from the CGI environment)
$file		= 'data.log';

open(FILE, $file) || die "Couldn't open file: $file: $!\n";

$last=<FILE>;		# throw away header line
$samples=1360;		# log lines to use: 600 + 600 + 160 covers 30 days
$bits=8;		# convert bytes to bits
@redux=(600,3,12);      # lines per interval tier, then repeat counts for 30 minute and 2 hour lines

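# Read the most recent $samples log lines.  Lines 0..599 are 10 minute
# samples, 600..1199 are 30 minute samples, and the rest are 2 hour samples.
for ( $i = 0; $i < $samples && defined($_ = <FILE>); $i++ ) {
	($in, $out) = (split)[1,2];
	$factor = int($i/$redux[0]);
	# Repeat each reduced sample so it counts as that many 10 minute
	# samples: 1 for 10 minute data, 3 for 30 minute, 12 for 2 hour.
	$copies = $factor ? $redux[$factor] : 1;
	foreach (1..$copies) {
		push(@In,$in);
		push(@Out,$out);
	}
}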

close(FILE);

foreach $set ("In", "Out")
{
	@sorted = sort { $a <=> $b } (@$set);

	$idx = int(@$set * $PER / 100);
	$$set = sprintf("%05d", ($sorted[$idx] * $bits)/1000);
}

if ( $In > $Out )
{
	exec $odometer, $In;
} else
{
	exec $odometer, $Out;
}

Acknowledgements

The mathematical part of the program and this explanation were written by Barb Dijker. The integration with the output GIF generator was written by Dworkin Muller. Barb did the integration of the program into MRTG. The percentile appears as a GIF so that it is calculated only when you view the MRTG page. The calculation is too expensive (CPU and I/O) to run for each MRTG monitored port at each interval.