What is the Statistics module in Python?
Python provides a built-in statistics module with functions for calculating mathematical statistics of real-valued (numeric) data. The module was added in Python 3.4. The input values passed to its functions do not need to be sorted. The functions in the module fall into two groups -
- Averages and Measures of the Central Location
- Measures of Spread
Today's Agenda
In this post, we will learn about the median functions from the Averages and Measures of the Central Location group in Python. We will cover the following functions -
- median()
- median_low()
- median_high()
- median_grouped()
These functions calculate the median (middle value) of a population or a sample. You can also read about the Mean Functions, Mode Functions and Spread Functions.
Prerequisite
This post is written for readers who:
- Have access to a Linux-based or Windows-based system.
- Have Python 3.4 or later installed on their system to run the code. Check the Python version using: python --version
- Are eager to learn and try out these useful functions.
Let's get started
1. median() -->
Calculates the median (middle value) of the data. When the number of data values is odd, the middle value is returned as the median. When the number of data values is even, the median is computed as the average of the two middle values.
The median is a good choice when the data is discrete and it is acceptable for the result not to be an actual data point. A convenience of the median() function is that the data values passed in need not be sorted. When the data is ordinal (i.e. supports order operations) but not numeric (i.e. does not support addition), consider using median_low() or median_high() instead. If an empty dataset is passed, StatisticsError is raised.
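import statistics
from decimal import Decimal as D
from fractions import Fraction as F
# list of positive integer numbers with an odd count
dataset1 = [1, 3, 6, 5, 7, 8, 2, 3, 5]
a = statistics.median(dataset1)
# list of negative integer numbers with an even count
dataset2 = [-8, -4, -9, -5, -2, -7]
b = statistics.median(dataset2)
# tuple of mixed positive and negative integers
dataset3 = (-6, -1, -9, 2, 8)
c = statistics.median(dataset3)
# list of Decimal values, then a list of Fraction values
dataset4 = [D("0.5"), D("0.75"), D("0.625"), D("0.375")]
d = statistics.median(dataset4)
dataset5 = [F(5, 2), F(7, 1), F(6, 5), F(37, 8), F(3, 9), F(8, 4)]
e = statistics.median(dataset5)
# Printing the median
print("Median of dataset1 is :", a)
print("Median of dataset2 is :", b)
print("Median of dataset3 is :", c)
print("Median of dataset4 is :", d)
print("Median of dataset5 is :", e)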
Median of dataset1 is : 5
Median of dataset2 is : -6.0
Median of dataset3 is : -1
Median of dataset4 is : 0.5625
Median of dataset5 is : 9/4
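To make the odd/even rule and the empty-dataset behaviour concrete, here is a minimal sketch. The helper manual_median() is hypothetical (it is not part of the statistics module) and only writes the rule out by hand so it can be compared with statistics.median().

```python
import statistics


def manual_median(values):
    """Hypothetical helper: the odd/even middle-value rule written out by hand."""
    ordered = sorted(values)  # the caller does not need to sort the input
    n = len(ordered)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [1, 3, 6, 5, 7, 8, 2, 3, 5]
even_data = [-8, -4, -9, -5, -2, -7]

# Each line prints the hand-written result next to the library result
print(manual_median(odd_data), statistics.median(odd_data))
print(manual_median(even_data), statistics.median(even_data))

# An empty dataset raises StatisticsError
try:
    statistics.median([])
except statistics.StatisticsError as exc:
    print("StatisticsError:", exc)
```

Both lines should show matching values, which suggests that the library function follows the same middle-value rule described above.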
2. median_low() -->
Calculates the low median of the data. The low median is always one of the data values. When the number of data values is odd, the middle value is returned. When the number of data values is even, the smaller of the two middle values is returned.
If an empty dataset is passed, StatisticsError is raised. The syntax is the same as for the median() function, with median_low() used in place of median(). For the datasets used in the median() example above, the median_low() output values are -
Median_low of dataset1 is : 5
Median_low of dataset2 is : -7
Median_low of dataset3 is : -1
Median_low of dataset4 is : 0.5
Median_low of dataset5 is : 2
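Since the post describes the median_low() call without showing it, the following sketch spells it out. It assumes the same five datasets as the median() example; collecting them in a dictionary named datasets is just a convenience for this illustration.

```python
import statistics
from decimal import Decimal as D
from fractions import Fraction as F

# Same five datasets as in the median() example, keyed by name
datasets = {
    "dataset1": [1, 3, 6, 5, 7, 8, 2, 3, 5],
    "dataset2": [-8, -4, -9, -5, -2, -7],
    "dataset3": (-6, -1, -9, 2, 8),
    "dataset4": [D("0.5"), D("0.75"), D("0.625"), D("0.375")],
    "dataset5": [F(5, 2), F(7, 1), F(6, 5), F(37, 8), F(3, 9), F(8, 4)],
}

for name, data in datasets.items():
    low = statistics.median_low(data)
    # The low median is always one of the original data values
    print(f"Median_low of {name} is :", low, "| in data:", low in data)
```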
3. median_high() -->
Calculates the high median of the data. The high median is always one of the data values. When the number of data values is odd, the middle value is returned. When the number of data values is even, the larger of the two middle values is returned.
If an empty dataset is passed, StatisticsError is raised. The syntax is the same as for the median() function, with median_high() used in place of median(). For the datasets used in the median() example above, the median_high() output values are -
Median_high of dataset1 is : 5
Median_high of dataset2 is : -5
Median_high of dataset3 is : -1
Median_high of dataset4 is : 0.625
Median_high of dataset5 is : 5/2
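The short sketch below compares the three functions on an even-count dataset and then on ordinal, non-numeric data. The grades list is a made-up example for illustration only.

```python
import statistics

# Even count: the low and high medians bracket the interpolated median
even_data = [-8, -4, -9, -5, -2, -7]
low = statistics.median_low(even_data)
mid = statistics.median(even_data)
high = statistics.median_high(even_data)
print(low, mid, high)

# Ordinal but non-numeric data (hypothetical letter grades):
# the values can be ordered, but they cannot be averaged.
grades = ["C", "A", "B", "D"]
print(statistics.median_low(grades), statistics.median_high(grades))

# median() needs to average the two middle values, which fails for strings
try:
    statistics.median(grades)
except TypeError as exc:
    print("TypeError:", exc)
```

For data like this, median_low() and median_high() are the safer choice because they only rely on ordering and never on arithmetic.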
4. median_grouped() -->
Calculates the median of grouped continuous data. Each data value is treated as the midpoint of a class interval, and the function returns the 50th percentile, estimated by interpolation within the class that contains the median. If an empty dataset is passed, StatisticsError is raised.
The syntax is the same as for the median() function, with median_grouped() used in place of median(). For the datasets used in the median() example above, the median_grouped() output values are -
Median_grouped of dataset1 is : 4.75
Median_grouped of dataset2 is : -5.5
Median_grouped of dataset3 is : -1.0
Median_grouped of dataset4 is : 0.125
Median_grouped of dataset5 is : 2.0
The median_grouped() function takes an optional argument, interval, which represents the width of the class interval and defaults to 1. Changing the interval changes the interpolation and therefore the result.
import statistics
# list of positive integer numbers
dataset6 = [1, 2, 4, 4, 5, 7, 11]
f = statistics.median_grouped(dataset6, interval=1)
g = statistics.median_grouped(dataset6, interval=2)
h = statistics.median_grouped(dataset6, interval=3)
i = statistics.median_grouped(dataset6, interval=4)
# Printing the median_grouped
print("Median_grouped with interval 1 :", f)
print("Median_grouped with interval 2 :", g)
print("Median_grouped with interval 3 :", h)
print("Median_grouped with interval 4 :", i)

Median_grouped with interval 1 : 4.25
Median_grouped with interval 2 : 4.5
Median_grouped with interval 3 : 4.75
Median_grouped with interval 4 : 5.0
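To see why the result shifts with the interval, the interpolation can be written out by hand. This is a sketch that assumes the usual textbook formula for a grouped median, L + interval * (n/2 - cf) / f, where L is the lower boundary of the median class, cf is the number of values below that class and f is the number of values inside it. The helper manual_median_grouped() is hypothetical and is not part of the statistics module.

```python
import statistics


def manual_median_grouped(values, interval=1):
    """Hypothetical helper: grouped-median interpolation written out by hand."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    x = ordered[n // 2]        # midpoint of the class that holds the median
    lower = x - interval / 2   # L: lower boundary of that class
    cf = ordered.index(x)      # cf: number of values below the class
    f = ordered.count(x)       # f: number of values inside the class
    return lower + interval * (n / 2 - cf) / f


dataset6 = [1, 2, 4, 4, 5, 7, 11]
for width in (1, 2, 3, 4):
    by_hand = manual_median_grouped(dataset6, interval=width)
    by_library = statistics.median_grouped(dataset6, interval=width)
    print("interval", width, ":", by_hand, by_library)
```

For dataset6 the median class is the value 4, with cf = 2 and f = 2, so a wider interval lowers the class boundary L but stretches the interpolated step by more, which is why the result grows from 4.25 to 5.0.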
For more details, see the official documentation. You can also read about the Mean Functions, Mode Functions and Spread Functions.