Python | Statistics Module | Spread Functions

What is the Statistics module in Python?

Python provides a built-in statistics module with functions for calculating mathematical statistics of real-valued (numeric) data. The module was added in Python 3.4. The input values passed to its functions do not need to be sorted. Its functions include the following groups -
Averages and Measures of the Central Location
Measures of Spread

Today's Agenda

In this post, we will learn about the Measures of Spread functions and their various types in Python. We will cover the following functions -
variance()
pvariance()
stdev()
pstdev()
These functions measure how far the data of an entire population, or of a sample, tends to deviate from its typical or average value. You can also read about the Mean Functions, Median Functions and Mode Functions.

Prerequisite

This post is intended for readers who:
Have access to a Linux-based or Windows-based system.
Have Python 3 (version 3.4 or later) installed on their system to run the code. Check the Python version using: python --version
Are eager to learn and try out these useful functions.

Let's get started

1. variance() -->
Calculates the variance (a measure of the deviation) of data. A higher variance means that the data is more spread out or dispersed, and a lower variance means that the data is less spread out (closer to the mean of the input values).
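To make the idea of "spread" concrete, here is a minimal sketch that computes the sample variance by hand for two small made-up datasets with the same mean. The helper name sample_variance and both datasets are illustrative only and are not part of the statistics module.

```python
import statistics

tight = [4.0, 5.0, 6.0]   # values close to the mean 5.0
wide = [1.0, 5.0, 9.0]    # same mean, values further apart


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


tight_var = sample_variance(tight)
wide_var = sample_variance(wide)
print(tight_var, wide_var)  # 1.0 16.0

# The library function agrees with the hand calculation
print(statistics.variance(tight), statistics.variance(wide))  # 1.0 16.0
```

The dataset whose values lie further from the mean has the larger variance.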

This function is used to calculate the variance of a sample of the population. To calculate the variance of the entire population, use the pvariance() function. The input must contain at least two real-valued numbers. If a dataset with fewer than two values is passed, StatisticsError is raised.
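The minimum-size rule above can be checked with a short sketch. The variable names are illustrative, and the exact wording of the error message may differ between Python versions.

```python
import statistics

# Two values are enough for a sample variance
two_values = [2.5, 4.5]
sample_var = statistics.variance(two_values)
print("Variance of two values:", sample_var)  # 2.0

# A single value is not enough, so StatisticsError is raised
try:
    statistics.variance([2.5])
    raised = False
except statistics.StatisticsError as err:
    raised = True
    print("StatisticsError:", err)
```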

The variance is the second moment around the mean. variance() computes the sample version of it, dividing the sum of squared deviations from the mean by n - 1. The variance() function has an optional argument "xbar". The value of this argument is None by default. If the value is missing (or is None), the mean is calculated automatically. If a value is given for xbar, it should be the mean of the data.

The xbar value is passed as an argument to avoid recalculating a mean that is already known. However, variance() does not verify that the value passed is the actual mean, so an incorrect xbar can lead to an invalid or impossible variance value.
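The behaviour of xbar described above can be sketched as follows. The value 0 used as a "wrong mean" is purely illustrative. What variance() returns for it is not meaningful and may differ between Python versions, so no expected value is shown for that case.

```python
import statistics

data = [1.5, 5.75, 1.25, 2.25, 0.75, 4.25]

correct_mean = statistics.mean(data)
var_auto = statistics.variance(data)                 # mean computed internally
var_given = statistics.variance(data, correct_mean)  # precomputed mean reused
print(var_auto, var_given)  # 3.84375 3.84375

# Deliberately wrong "mean": variance() accepts it without checking
var_wrong = statistics.variance(data, 0)
print("Result with an incorrect xbar:", var_wrong)
```

Passing xbar only saves work when the mean is already available; it should never be used to pass an arbitrary value.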

# Importing the statistics module
import statistics
from fractions import Fraction as F
from decimal import Decimal as D
  
# list of float numbers
data1 = [1.5, 5.75, 1.25, 2.25, 0.75, 4.25]
a = statistics.variance(data1)

# list of float numbers and mean
m = statistics.mean(data1)
b = statistics.variance(data1, m)

# list of decimal numbers
data2 = [D("10.5"), D("33.75"),D("0.625"),D("18.375")]
c = statistics.variance(data2)

# list of fractional numbers
data3 = [F(5,2), F(7,1),F(6,5),F(37,8),F(3,9),F(8,4)]
d = statistics.variance(data3)
  
# Printing the variance
print("Variance of data1 is :", a)
print("Variance of data1 with mean is :", b)
print("Variance of data2 is :", c)
print("Variance of data3 is :", d)

OUTPUT

Variance of data1 is : 3.84375
Variance of data1 with mean is : 3.84375
Variance of data2 is : 195.734375
Variance of data3 is : 522241/86400

2. pvariance() -->
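Calculates the variance of an entire population. To calculate the variance of a sample of the population, use the variance() function. This function works like variance(), with two differences: the optional second argument is named "mu" (the population mean) instead of "xbar", and the data needs at least one value rather than two. If mu is given and is not the mean, the function calculates the second moment around that point instead.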

The syntax will be the same as the variance() function, only the function pvariance() is used in the place of variance(). Considering the datasets for the variance() function shown above, the pvariance() output values are -

OUTPUT

PVariance of data1 is : 3.203125
PVariance of data1 with mean is : 3.203125
PVariance of data2 is : 146.80078125
PVariance of data3 is : 522241/103680
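Only the output is listed for pvariance(). A sketch of code that would produce these values, assuming the same datasets as the variance() example and passing the precomputed mean as the second argument, could look like this:

```python
import statistics
from fractions import Fraction as F
from decimal import Decimal as D

# Same datasets as in the variance() example
data1 = [1.5, 5.75, 1.25, 2.25, 0.75, 4.25]
data2 = [D("10.5"), D("33.75"), D("0.625"), D("18.375")]
data3 = [F(5, 2), F(7, 1), F(6, 5), F(37, 8), F(3, 9), F(8, 4)]

a = statistics.pvariance(data1)

# Population variance with a precomputed mean
mu = statistics.mean(data1)
b = statistics.pvariance(data1, mu)

c = statistics.pvariance(data2)
d = statistics.pvariance(data3)

print("PVariance of data1 is :", a)
print("PVariance of data1 with mean is :", b)
print("PVariance of data2 is :", c)
print("PVariance of data3 is :", d)
```

The population variance divides by n instead of n - 1, which is why each value is smaller than the corresponding variance() result.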

3. stdev() -->
Calculates the sample standard deviation (i.e. the square root of the sample variance). This function has the same workings, arguments and other details as the variance() function. The syntax is also the same, with stdev() used in place of variance(). The example below uses a new group of datasets -

# Importing the statistics module
import statistics
from fractions import Fraction as F
from decimal import Decimal as D
  
# list of float numbers
data1 = [0.25, 6.5, 1.25, 12.25, 9.75, 18.25, 14.25]
a = statistics.stdev(data1)

# list of float numbers and mean
m = statistics.mean(data1)
b = statistics.stdev(data1, m)

# list of decimal numbers
data2 = [D("5.25"), D("13.75"),D("0.625"),D("28.465")]
c = statistics.stdev(data2) 

# list of fractional numbers
data3 = [F(3,6), F(9,1),F(17,5),F(22,8),F(4,9),F(8,1)]
d = statistics.stdev(data3)

# Printing the stdev
print("stdev of data1 is :", a)
print("stdev of data1 with mean is :", b)
print("stdev of data2 is :", c)
print("stdev of data3 is :", d)

OUTPUT

stdev of data1 is : 6.671947313369684
stdev of data1 with mean is : 6.671947313369684
stdev of data2 is : 12.23532896983158356795582080
stdev of data3 is : 3.682743235229075
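The relationship between stdev() and variance() can be checked directly. This is a small sketch using the float dataset from the stdev() example; the comparison uses math.isclose() because the two floating-point results may differ in the last digits.

```python
import math
import statistics

data1 = [0.25, 6.5, 1.25, 12.25, 9.75, 18.25, 14.25]

sample_sd = statistics.stdev(data1)
sample_var = statistics.variance(data1)

# The standard deviation is the square root of the variance
same = math.isclose(sample_sd, math.sqrt(sample_var))
print(sample_sd, math.sqrt(sample_var), same)
```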

4. pstdev() -->
Calculates the population standard deviation (i.e. the square root of the population variance). This function has the same workings, arguments and other details as the pvariance() function. The syntax is also the same, with pstdev() used in place of pvariance(). Considering the datasets for the stdev() function shown above, the pstdev() output values are -

OUTPUT

pstdev of data1 is : 6.177022927341128
pstdev of data1 with mean is : 6.177022927341128
pstdev of data2 is : 10.59610571153383690137078410
pstdev of data3 is : 3.3618692390575307
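As with pvariance(), only the output is listed here. A sketch of code that would produce it, assuming the datasets from the stdev() example, follows. The last printed digits may vary slightly between Python versions.

```python
import statistics
from fractions import Fraction as F
from decimal import Decimal as D

# Same datasets as in the stdev() example
data1 = [0.25, 6.5, 1.25, 12.25, 9.75, 18.25, 14.25]
data2 = [D("5.25"), D("13.75"), D("0.625"), D("28.465")]
data3 = [F(3, 6), F(9, 1), F(17, 5), F(22, 8), F(4, 9), F(8, 1)]

a = statistics.pstdev(data1)

# Population standard deviation with a precomputed mean
mu = statistics.mean(data1)
b = statistics.pstdev(data1, mu)

c = statistics.pstdev(data2)
d = statistics.pstdev(data3)

print("pstdev of data1 is :", a)
print("pstdev of data1 with mean is :", b)
print("pstdev of data2 is :", c)
print("pstdev of data3 is :", d)
```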

For more details, you can visit the official documentation of the statistics module.