What the heck is this 95th Percentile number?

The 95th percentile is a widely used mathematical calculation for evaluating the regular, sustained utilization of a network pipe. For example, it is the same calculation BBN/GTE uses to bill the Coop for its metered use of a T3. Many ISPs use it for capacity planning and/or calculating metered use. Basically, the 95th percentile says that 95% of the time, usage is below this amount. Conversely, of course, 5% of the time usage is above that amount. The 95th percentile is a good number to use for planning, so you can ensure you have the needed bandwidth at least 95% of the time.

There are three important factors to a percentile calculation: the percentile itself, the sample interval, and the period the samples cover.
The Coop percentile calculation uses a 95th percentile on 10 minute averages (more on this below) over a period of 30 days. The calculation is made on the most recent 30 day period, so the result is a floating window, not fixed to a calendar month. So the percentile figure shown on the Coop MRTG graphs tells us that, for 95% of the 10 minute intervals in the most recent 30 days, the bits per second utilization was below the reported 95th percentile value.

How is the number actually calculated?

MRTG is a great program. Its data are automatically reduced over time to larger intervals to keep log files from growing without bound. This means that, after a while, the 10 minute averages in the log file get reduced to 30 minute averages and then to 2 hour averages. In an MRTG data file, the first 600 values are at the run interval (10 minutes at the Coop), the next 600 are reduced to 30 minute intervals, the next 600 are reduced to 2 hour intervals, and the rest are reduced to 24 hour intervals.

We use 1360 MRTG data points, which are: the 600 values at 10 minute intervals, the 600 values at 30 minute intervals, and the first 160 values at 2 hour intervals. Together they cover 720 hours, or 30 days.
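The 1360 figure follows from the log layout described above. As a minimal sketch only (the helper name `points_for_window` and the tier table are illustrative and not taken from the Coop program), the following walks the tiers from newest to oldest until 30 days are covered:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tier layout as described in the text: [number of values, minutes per value].
# The 24 hour tier is left out because a 30 day window never reaches it.
my @tiers = ( [ 600, 10 ], [ 600, 30 ], [ 600, 120 ] );

# Hypothetical helper: how many values, newest first, are needed from each
# tier to cover a window of $days days.
sub points_for_window {
    my ($days) = @_;
    my $remaining = $days * 24 * 60;    # minutes still to cover
    my @taken;
    for my $tier (@tiers) {
        my ( $count, $minutes ) = @$tier;
        # Values needed from this tier, rounded up to whole values.
        my $need = int( ( $remaining + $minutes - 1 ) / $minutes );
        my $use  = $need < $count ? $need : $count;
        push @taken, $use;
        $remaining -= $use * $minutes;
        last if $remaining <= 0;
    }
    return @taken;
}

my @taken = points_for_window(30);
my $total = 0;
$total += $_ for @taken;
print "values per tier: @taken\n";    # prints "values per tier: 600 600 160"
print "total values: $total\n";       # prints "total values: 1360"
```

The arithmetic is 600 x 10 + 600 x 30 + 160 x 120 = 43200 minutes, which is exactly 30 days.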
In order to preserve the data set as 10 minute samples and not skew the significance of the data toward the most recent side of the period, the Coop percentile program repeats the reduced data as necessary to get the correct number of samples. For example, a 30 minute sample is repeated three times to give three equal 10 minute samples. Note that this correction was added 9/26/1998.

The traditional mathematical method for calculating a percentile assumes that your data set is so large that you can't store it all in memory and sort it. It uses "buckets" and calculates an "ogive" and then approximates the result through reverse interpolation. Since our data set is finite and small (relative to memory), we just do it the straightforward way: read all the samples into memory, sort them, and take the value at the 95% position.
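To illustrate the two ideas above (repeating reduced samples so every sample stands for 10 minutes, then sorting in memory), here is a minimal Perl sketch. It is not the Coop program: the helper names `expand_to_10min` and `percentile` are hypothetical, the sample values are made up, and the nearest-rank rounding is an assumption, since the original may pick the position differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn [value, minutes] pairs into equal 10 minute
# samples by repeating each reduced value (a 30 minute value three times,
# a 2 hour value twelve times).
sub expand_to_10min {
    my @pairs = @_;
    my @samples;
    for my $pair (@pairs) {
        my ( $value, $minutes ) = @$pair;
        push @samples, ($value) x ( $minutes / 10 );
    }
    return @samples;
}

# Hypothetical helper: sort the samples and return the value at the
# nearest-rank position for an integer percentile $per (an assumed
# convention; the original program may round differently).
sub percentile {
    my ( $per, @samples ) = @_;
    return unless @samples;
    my @sorted = sort { $a <=> $b } @samples;
    # 1-based rank, rounded up using integer arithmetic.
    my $rank = int( ( $per * scalar(@sorted) + 99 ) / 100 );
    $rank = 1 if $rank < 1;
    return $sorted[ $rank - 1 ];
}

# Made-up log values for illustration: [bits per second, interval in minutes].
my @log     = ( [ 100, 10 ], [ 200, 10 ], [ 900, 30 ], [ 300, 120 ] );
my @samples = expand_to_10min(@log);
printf "%d samples, 95th percentile = %d\n",
    scalar(@samples), percentile( 95, @samples );
# prints "17 samples, 95th percentile = 900"
```

Applied to the full window, the expansion turns 1360 log values into 600 + 1800 + 1920 = 4320 ten minute samples, which is 144 samples per day for 30 days.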
Below is enough of the actual program for you to recreate the 95th percentile calculation on your own MRTG data sets.

    #!/usr/local/bin/perl5
    #
    # Generate a percentile calculation from the most recent $samples
    # in an MRTG log file. This isn't the most accurate percentile because
    # the sample interval changes twice in the data set. Once we've got
    # it, produce a GIF that represents (as an odometer-style number) the
    # larger of the input or output values.
    # Copyright 1997,1998: Labyrinth Computer Services, All Rights Reserved
    require 5.003;
    # Percentile to calculate
    $PER=95;
    # Program to generate output odometer (gif number)
    $odometer = 'path-to-odometer';
    # MRTG data file (usually you get this from the CGI environment)
    $file = 'data.log';
    open(FILE, "$file") || &Fatal("Couldn't open file: $file \n");
    $last=<FILE>

Acknowledgements

The mathematical part of the program and this explanation were written by Barb Dijker. The integration with the output gif generator was written by Dworkin Muller. Barb did the integration of the program into MRTG. The percentile appears as a gif so that it is calculated only when you view the MRTG page. The calculation is too expensive (cpu and i/o) to calculate for each MRTG monitored port at each interval.
Copyright © 1998-2002, Colorado Internet Cooperative Association - All Rights Reserved