Intro to Pandas Series¶
A Series is a one-dimensional array-like object containing a typed sequence of values and an associated array of data labels, called its index.
Hands on!¶
import numpy as np
import pandas as pd
Series creation¶
The pd.Series constructor accepts the following parameters:
- data: (required) the data to store in the Series. It can be a scalar value, a Python sequence, a dictionary, or a one-dimensional NumPy ndarray.
- index: (optional) the labels to assign to the data values. It can be a Python sequence or a one-dimensional NumPy ndarray. Default value: the integers 0 to len(data) - 1, equivalent to np.arange(0, len(data)).
- dtype: (optional) any NumPy data type.
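To make the three parameters concrete, here is a minimal sketch. The variable names (scalar_series, default_index, as_float) are illustrative only and are not used elsewhere in this lesson.

```python
import numpy as np
import pandas as pd

# A scalar is repeated once per label, so an index is needed to fix the length.
scalar_series = pd.Series(7, index=['a', 'b', 'c'])

# Without an explicit index, labels default to 0 .. len(data) - 1.
default_index = pd.Series([10, 20, 30])

# dtype sets the element type of the stored values.
as_float = pd.Series([1, 2, 3], dtype=np.float64)

print(scalar_series.tolist())
print(list(default_index.index))
print(as_float.dtype)
```

The scalar case is the only one where the index determines how many elements the Series has.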
series = pd.Series([1, 2, 3, 4, 5])
series
Series have an associated type:
# Show first values of our Series
series.head()
series.dtype
series = pd.Series([1, 2, 3, 4, 5], dtype=np.float64)
series
series.dtype
series = pd.Series(['a', 'b', 'c', 'd', 'e'])
series
# Using a NumPy ndarray
array = np.array([2, 4, 6, 8, 10])
series = pd.Series(array)
series
# With predefined index
series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
series
# Using a dictionary (index will be defined using keys)
series = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}, dtype=np.float64)
series
Series attributes¶
These are the most common attributes for getting information about a Series:
series = pd.Series(data=[1, 2, 3, 4, 5],
                   index=['a', 'b', 'c', 'd', 'e'],
                   dtype=np.float64)
series
# Type of our Series
series.dtype
# Values of a series
series.values
type(series.values)
# Index of a series
series.index
# Dimension of the Series
series.ndim
# Shape of the Series
series.shape
# Number of Series elements
series.size
The Group of Seven¶
We'll start by analyzing "The Group of Seven" (G7), a political forum formed by Canada, France, Germany, Italy, Japan, the United Kingdom and the United States. We'll begin with population, and for that we'll use a pandas.Series object.
# In millions
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])
g7_pop
Someone might not know we're representing population in millions of inhabitants. A Series can have a name, to better document its purpose:
g7_pop.name = 'G7 Population in millions'
g7_pop
Series are pretty similar to NumPy arrays:
g7_pop.dtype
type(g7_pop.values)
g7_pop.ndim
g7_pop.shape
g7_pop.size
And they look like simple Python lists or NumPy arrays. But they're actually more similar to Python dicts.
g7_pop
g7_pop.index
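The dict comparison can be sketched as follows. This is a small illustration that assumes a two-element Series called pop (a hypothetical name, built with string labels) rather than the g7_pop object above.

```python
import pandas as pd

# Hypothetical two-element Series with string labels.
pop = pd.Series([35.467, 63.951], index=['Canada', 'France'])

canada = pop['Canada']         # look up a value by its label, like a dict key
has_france = 'France' in pop   # membership tests the labels, not the values
labels = list(pop.keys())      # keys() returns the index

print(canada, has_france, labels)
```

Unlike a list, the in operator here checks the labels, which matches how dict membership works.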
Assigning Series indexes¶
In contrast to lists, we can explicitly define the index:
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]
g7_pop
Compare it with the following table:
Removing indexes¶
We can also remove the current index from our Series, going back to the default integer index. To do that we use the reset_index() method with the drop=True parameter:
g7_pop
g7_pop.reset_index(drop=True)
g7_pop
Note that reset_index() returns a new Series, so to keep the result we need to assign it to a variable, or pass inplace=True to modify the original Series.
g7_pop.reset_index(drop=True, inplace=True)
g7_pop
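The difference between the returned copy and the original can be checked with a short sketch. The names labelled, renumbered and as_frame are illustrative and not part of the lesson's data.

```python
import pandas as pd

# Hypothetical Series with string labels.
labelled = pd.Series([1.0, 2.0], index=['a', 'b'])

# reset_index(drop=True) returns a new Series; the original keeps its labels.
renumbered = labelled.reset_index(drop=True)

# Without drop=True the old labels are kept as a column, so the result is a DataFrame.
as_frame = labelled.reset_index()

print(list(labelled.index))
print(list(renumbered.index))
print(type(as_frame).__name__)
```

Assigning the result to a new name, as done here, leaves the original untouched, which is usually easier to reason about than modifying it in place.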
Creating a Series with indexes already¶
We can create a new Series with its index labels in a single step:
values = [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523]
indexes = ['Canada', 'France', 'Germany', 'Italy',
           'Japan', 'United Kingdom', 'United States']
pd.Series(values,
          index=indexes,
          name='G7 Population in millions')
Creating a Series from a data dictionary¶
We can say that Series look like "ordered dictionaries". We can actually create Series out of dictionaries:
data_dic = {
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}
g7_pop = pd.Series(data_dic,
                   name='G7 Population in millions')
g7_pop
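As a small illustration of the "ordered dictionary" comparison, the sketch below uses a shortened, hypothetical dictionary (three of the G7 entries, deliberately not in alphabetical order) to show that the Series keeps the dictionary's insertion order and supports both positional and label access.

```python
import pandas as pd

# Shortened stand-in for the G7 dictionary, in non-alphabetical order.
pop = pd.Series({'Japan': 127.061, 'Canada': 35.467, 'Italy': 60.665})

labels_in_order = list(pop.index)  # the dict's insertion order is preserved
first_value = pop.iloc[0]          # positional access: the first element
by_label = pop['Canada']           # label access, as with a dict

print(labels_in_order, first_value, by_label)
```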
Creating a Series out of other Series¶
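You can also create a Series out of another Series, specifying the index labels to keep. Labels that are missing from the original Series, such as 'Spain' here, get NaN: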
pd.Series(g7_pop,
          index=['France', 'Germany', 'Italy', 'Spain'])
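The label matching in this last step can be verified with a minimal sketch. It assumes a three-country Series named pop and a result named subset, both hypothetical stand-ins for the full G7 data.

```python
import pandas as pd

# Hypothetical three-country stand-in for g7_pop.
pop = pd.Series({'France': 63.951, 'Germany': 80.94, 'Italy': 60.665})

# Values are matched by label; 'Spain' is not in pop, so it becomes NaN.
subset = pd.Series(pop, index=['France', 'Italy', 'Spain'])

print(subset)
print(subset.isna().tolist())
```

Because values are aligned by label rather than by position, the order of the requested labels does not need to match the order in the source Series.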