What is Pandas?
The Pandas package is an open-source Python library and one of the most widely used tools among data scientists and analysts working today. Pandas makes importing, analyzing, and visualizing data a lot simpler. It builds on packages like NumPy and Matplotlib to give you a single, convenient place to do most of your data analysis and visualization work.
The name Pandas is derived from "panel data" (an econometric term) and is also read as Python Data Analysis Library. It is a fast, high-performance library that helps its users stay productive. Python with Pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics.
Today's Agenda
Prerequisites
- Access to a Linux-based system or a Windows-based system.
- Python installed on your system. Check the Python version using: python --version
- NumPy, pytz, python-dateutil, and setuptools installed beforehand.
- And finally, an eagerness to learn and try out such a useful module.
Let's get started
Pandas comes with the Anaconda distribution.
If it is missing from a conda environment, it can be installed using the following command -
conda install pandas
Pandas can also be installed via pip from PyPI with the following command -
pip install pandas
In order to import Pandas, all you have to do is write the following code -
import pandas as pd
Step 2: Getting to Know the Data Structures
The two primary data structures of Pandas are Series and DataFrame.
A Series is basically a single column, and a DataFrame is a table with multiple columns, made up of a collection of Series.
These data structures are built on top of the NumPy array, which means they are fast.
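To make that relationship concrete, here is a minimal sketch. The column names and values are made up for illustration. It pulls one column out of a DataFrame as a Series and then exposes that column's values as a NumPy array.

```python
import numpy as np
import pandas as pd

# Illustrative data; the column names are arbitrary.
df = pd.DataFrame({"height": [1.2, 3.4, 5.6], "width": [10, 20, 30]})

# Selecting a single column returns a Series.
height = df["height"]
print(type(height).__name__)  # Series

# to_numpy() returns the column's values as a NumPy array.
values = height.to_numpy()
print(type(values).__name__)  # ndarray
```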
Step 3: Creating a Series
A Series can be created from various inputs, including lists, arrays, dictionaries, or scalar values.
- Creating a Pandas Series by passing a list of values. In this case, we are letting Pandas create a default integer index starting from 0.
import pandas as pd
import numpy as np
# np.nan marks a missing value
series1 = pd.Series([2, 6, 10, np.nan, 3, 41])
print(series1)
Its output will be:
0     2.0
1     6.0
2    10.0
3     NaN
4     3.0
5    41.0
dtype: float64
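The float64 dtype in this output comes from the np.nan entry: NaN is a floating-point value, so the integers are stored as floats alongside it. A small sketch comparing the same list with and without the missing value (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

ints_only = pd.Series([2, 6, 10, 3, 41])
with_nan = pd.Series([2, 6, 10, np.nan, 3, 41])

print(ints_only.dtype)  # int64
print(with_nan.dtype)   # float64

# isna() flags the missing entry; count() skips it.
print(with_nan.isna().sum())  # 1
print(with_nan.count())       # 5
```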
- Creating a Pandas Series using an array and adding a customized index. The index must have the same length as the data. Pandas does not strictly require the labels to be unique, but unique labels are much easier to work with.
import pandas as pd
import numpy as np
data = np.array(['a', 'n', 's', 'h', 'i'])
# Use the integers 10 to 14 as row labels instead of the default 0 to 4
series2 = pd.Series(data, index=[10, 11, 12, 13, 14])
print(series2)
Its output will be:
10    a
11    n
12    s
13    h
14    i
dtype: object
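With a custom index, it helps to separate lookups by label from lookups by position. The sketch below reuses the same Series and uses .loc for labels and .iloc for positions:

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'n', 's', 'h', 'i'])
series2 = pd.Series(data, index=[10, 11, 12, 13, 14])

# .loc looks up by index label.
print(series2.loc[12])   # s

# .iloc looks up by integer position, regardless of the labels.
print(series2.iloc[0])   # a
```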
- Creating a Pandas Series using a dictionary. If we do not specify any index values, the dictionary keys become the index (recent Pandas versions keep them in insertion order rather than sorting them). If we do specify index values, the value for each label is looked up in the dictionary.
import pandas as pd
import numpy as np
data = {'a': 7, 'b': 11, 'c': 3}
# 'd' is not a key in the dictionary, so it has no value to pull
series3 = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(series3)
Its output will be:
b    11.0
c     3.0
d     NaN
a     7.0
dtype: float64
Note: The index order is preserved, and the missing element is filled with NaN.
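Once a label ends up with NaN, two common follow-ups are to drop it or to fill it with a default. A short sketch, where the fill value 0 is an arbitrary choice for illustration:

```python
import pandas as pd

data = {'a': 7, 'b': 11, 'c': 3}
series3 = pd.Series(data, index=['b', 'c', 'd', 'a'])

# Drop the labels that had no matching dictionary key.
print(series3.dropna().index.tolist())  # ['b', 'c', 'a']

# Or replace the missing value with a default (0 is arbitrary here).
filled = series3.fillna(0)
print(filled['d'])  # 0.0
```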
- Creating a Pandas Series using a scalar value. When the data is a single scalar value, pass an index as well: the value is repeated to match the length of the index.
import pandas as pd
import numpy as np
# The scalar 7 is repeated once per index label
series4 = pd.Series(7, index=[0, 1, 2])
print(series4)
Its output will be:
0    7
1    7
2    7
dtype: int64
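A typical use for a scalar-filled Series is to initialize every label to the same starting value and update entries later. A minimal sketch, with hypothetical labels chosen only for illustration:

```python
import pandas as pd

# Hypothetical labels; every entry starts at 0.
counts = pd.Series(0, index=['red', 'green', 'blue'])

# Update a single entry by its label.
counts['green'] += 5
print(counts.tolist())  # [0, 5, 0]
```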
Step 4: Creating a DataFrame
Features of DataFrame
- Different columns can hold different data types.
- The size of a dataframe is mutable.
- It consists of labeled axes (rows and columns).
- We can perform various arithmetic operations on rows and columns.
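The features listed above can be seen in a few lines. This is only a sketch: the column names and values are invented for illustration.

```python
import pandas as pd

# Columns of different types: text and integers.
df = pd.DataFrame({'item': ['pen', 'ink'], 'price': [10, 25]})
print(df['price'].dtype)  # int64

# The size is mutable: a new column can be added after creation.
df['qty'] = [3, 2]

# Both axes are labeled.
print(list(df.columns))  # ['item', 'price', 'qty']
print(list(df.index))    # [0, 1]

# Arithmetic between columns is applied element by element.
df['total'] = df['price'] * df['qty']
print(df['total'].tolist())  # [30, 50]
```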
A DataFrame can be created from various inputs, including lists, dictionaries, Series, NumPy arrays, or another DataFrame.
- Creating a Pandas DataFrame using a list of lists.
import pandas as pd
data = [['Ball', 50], ['Notebook', 120], ['Chips', 30]]
# Cast only the numeric column; recent Pandas versions raise an error if dtype=float is applied to the text column too
df1 = pd.DataFrame(data, columns=['Product', 'Price']).astype({'Price': float})
print(df1)
Its output will be:
    Product  Price
0      Ball   50.0
1  Notebook  120.0
2     Chips   30.0
Note - Calling astype on the "Price" column changes its datatype to floating point.
- Creating an indexed Pandas DataFrame using a dictionary of lists and a custom index.
import pandas as pd
data = {'Name': ['Rahul', 'Oscar', 'Stephen', 'Amar'], 'Age': [28, 34, 21, 25]}
# One row label per entry in the lists
df2 = pd.DataFrame(data, index=['Rank1', 'Rank2', 'Rank3', 'Rank4'])
print(df2)
Its output will be:
          Name  Age
Rank1    Rahul   28
Rank2    Oscar   34
Rank3  Stephen   21
Rank4     Amar   25
Note - The index parameter assigns a label (here "Rank1" to "Rank4") to each row.
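Row labels like these can then be used to fetch data directly. A short sketch on the same DataFrame, using .loc for label-based access and .iloc for position-based access:

```python
import pandas as pd

data = {'Name': ['Rahul', 'Oscar', 'Stephen', 'Amar'], 'Age': [28, 34, 21, 25]}
df2 = pd.DataFrame(data, index=['Rank1', 'Rank2', 'Rank3', 'Rank4'])

# A single row selected by label comes back as a Series.
row = df2.loc['Rank2']
print(row['Name'])  # Oscar

# A single cell selected by row label and column name.
print(df2.loc['Rank3', 'Age'])  # 21

# The first row selected by position.
print(df2.iloc[0]['Name'])  # Rahul
```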
- Creating a Pandas DataFrame using a dictionary.
import pandas as pd
data = {'Fruits': ['Apples', 'Mangoes', 'Oranges', 'Guavas'],
        'Price/kg': [70, 120, 30, 65]}
# Each dictionary key becomes a column name
df3 = pd.DataFrame(data)
print(df3)
Its output will be:
    Fruits  Price/kg
0   Apples        70
1  Mangoes       120
2  Oranges        30
3   Guavas        65
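Once the data is in a DataFrame, rows can be selected with a boolean condition on a column. A minimal sketch on the same fruit data, where the threshold of 65 is an arbitrary value chosen for illustration:

```python
import pandas as pd

data = {'Fruits': ['Apples', 'Mangoes', 'Oranges', 'Guavas'],
        'Price/kg': [70, 120, 30, 65]}
df3 = pd.DataFrame(data)

# Each dictionary key became a column.
print(df3.columns.tolist())  # ['Fruits', 'Price/kg']

# Keep only the rows whose price is at most 65 (arbitrary threshold).
cheap = df3[df3['Price/kg'] <= 65]
print(cheap['Fruits'].tolist())  # ['Oranges', 'Guavas']
```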