Pandas Series - Selection and Indexing
A pandas Series object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary. Keeping these two overlapping analogies in mind makes the patterns of data indexing and selection in this data structure easier to understand.
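As a rough illustration of the two analogies, the sketch below uses a small made-up Series (the labels 'a', 'b', 'c' and their values are assumptions, not part of the lesson data) and touches both its dictionary-like and its array-like side:

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the two analogies.
s = pd.Series({'a': 1.0, 'b': 2.0, 'c': 3.0})

# Dictionary-like behaviour: index labels act as keys.
keys = list(s.keys())      # the index labels
has_b = 'b' in s           # membership is tested against the labels
value_b = s['b']           # lookup by key

# Array-like behaviour: the values form a one-dimensional NumPy array.
values = s.to_numpy()
doubled = s * 2            # vectorized arithmetic, labels are preserved
```

The rest of the lesson leans on both views: selection by label is the dictionary side, selection by position is the array side.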
Hands on!¶
import pandas as pd
import numpy as np
The first thing we'll do is recreate the Series from our previous lecture:
data_dic = {
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}
g7_pop = pd.Series(data_dic,
                   name='G7 Population in millions')
g7_pop
g7_pop['Canada']
g7_pop['Japan']
g7_pop['United Kingdom']
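Attribute-style access also works, but it's NOT recommended: it fails for labels that contain spaces (such as 'United Kingdom') and for labels that clash with existing Series attributes: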
g7_pop.Japan
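To make the risk concrete, here is a minimal sketch with a small stand-in Series. The label 'count' is a made-up entry, added only because it collides with the existing Series.count method:

```python
import pandas as pd

# Stand-in data; 'count' is a hypothetical label chosen to clash with a method.
pop = pd.Series({'Japan': 127.061, 'United Kingdom': 64.511, 'count': 1.0})

# Attribute access works for a simple label ...
japan = pop.Japan

# ... but a label with a space cannot be written as an attribute,
# so bracket indexing is the only option.
uk = pop['United Kingdom']

# A label that collides with an existing attribute is shadowed by it:
# pop.count is the bound method Series.count, not the stored value.
shadowed = callable(pop.count)
stored = pop['count']
```

Bracket indexing behaves the same way for every label, which is why it is the safer habit.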
Slicing and multi-selection¶
Slicing by label also works. Importantly, when slicing by label in pandas, the upper limit is included:
g7_pop['Germany': 'Japan']
Selecting several elements at once with a list of labels also works (similar to NumPy fancy indexing):
g7_pop[['Italy', 'France', 'United States']]
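A short sketch of two properties of list selection, using a reduced copy of the data: the result follows the order of the list rather than the order of the Series, and (in recent pandas versions, as far as I know) a label that is missing from the index raises a KeyError:

```python
import pandas as pd

pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Italy': 60.665})

# The result follows the order of the requested labels.
subset = pop[['Italy', 'Canada']]

# 'Brazil' is not in the index, so the whole selection fails.
try:
    pop[['Italy', 'Brazil']]
    missing_raises = False
except KeyError:
    missing_raises = True
```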
Indexing by sequential position¶
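Elements can also be selected by their sequential position, using iloc. Older pandas versions let plain square brackets fall back to the sequential position when an integer key was not found in a non-integer index, but that fallback is deprecated in recent releases, so iloc is the explicit way to do it.
With sequential position, the upper limit of a slice is not included.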
g7_pop
g7_pop.iloc[0] # First element
g7_pop.iloc[-1] # Last element
Other examples:
g7_pop.iloc[2]
g7_pop.iloc[4]
g7_pop.iloc[2:4]
g7_pop.iloc[[3, 1, 6]]
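The difference between the two slicing rules is easy to miss, so here is a small side-by-side sketch on a five-country copy of the data:

```python
import pandas as pd

pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
})

# Label-based slice: both ends are included (Germany, Italy, Japan).
by_label = pop['Germany':'Japan']

# Position-based slice: the stop position is excluded (Germany, Italy).
by_position = pop.iloc[2:4]
```

Here 'Germany' sits at position 2 and 'Japan' at position 4, so the two slices start at the same element but the positional one stops one element earlier.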
Adding new elements to a Series¶
In many cases we'll want to add new values to our Series. To do that, we simply index the Series with the new label and assign a value to it. Let's add two new records:
g7_pop['Brazil'] = 20.124
g7_pop['India'] = 32.235
g7_pop
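Assignment by label changes the Series in place. As a hedged aside, one alternative that leaves the original untouched is pd.concat; the sketch below shows both on a two-country copy of the data, and the variable names extra and combined are just illustrative:

```python
import pandas as pd

pop = pd.Series({'Canada': 35.467, 'France': 63.951},
                name='Population in millions')

# Assigning to a label that does not exist yet appends it in place.
pop['Brazil'] = 20.124

# pd.concat builds a new Series and does not modify its inputs.
extra = pd.Series({'India': 32.235})
combined = pd.concat([pop, extra])
```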
Modifying Series elements¶
g7_pop['Canada'] = 40.5
g7_pop
g7_pop['France'] = np.nan
g7_pop
Removing elements from a Series¶
del g7_pop['Brazil']
g7_pop
del g7_pop['India']
g7_pop
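The del statement removes the element in place. If the original should stay intact, Series.drop is one option, since it returns a new Series. A minimal sketch on a three-element copy of the data:

```python
import pandas as pd

pop = pd.Series({'Canada': 35.467, 'Brazil': 20.124, 'India': 32.235})

# drop returns a new Series; pop itself still contains 'Brazil'.
without_brazil = pop.drop('Brazil')

# del mutates pop directly.
del pop['India']
```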
Checking existence of a key (membership)¶
'France' in g7_pop
'Brazil' in g7_pop
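As with a dictionary, the in operator looks at the index labels, not at the stored values. The sketch below, on a two-country copy of the data, shows the difference and one possible way to test the values instead:

```python
import pandas as pd

pop = pd.Series({'Canada': 35.467, 'France': 63.951})

# Membership is tested against the labels.
label_found = 'France' in pop

# 63.951 is a value, not a label, so this is False.
value_as_key = 63.951 in pop

# To test the values, look inside the underlying array.
value_found = 63.951 in pop.to_numpy()
```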
Introducing loc & iloc¶
What's the problem with the plain square-bracket indexing we've seen? It's not explicit: pandas receives a key and has to figure out whether we meant a label or a sequential position. Check out the following example:
s = pd.Series(
['a', 'b', 'c'],
index=[1, 2, 3])
s
What happens if we index s[1]? Should it return a or b?
s[1]
In this case, the element is looked up by its index label (1), not by its sequential position. But again, it's not intuitive or explicit.
Enter loc and iloc:
loc is the preferred way to select elements in Series (and DataFrames) by their index label.
iloc is the preferred way to select elements by sequential position.
s.loc[1]
s.iloc[1]
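The same integer gives different answers depending on the accessor, and the slicing rules differ too. A self-contained sketch that rebuilds the small integer-indexed Series from above:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])

by_label = s.loc[1]        # label 1    -> 'a'
by_position = s.iloc[1]    # position 1 -> 'b'

# Slices follow the same split: loc includes the stop label,
# iloc excludes the stop position.
label_slice = s.loc[1:2]       # labels 1 and 2
position_slice = s.iloc[1:2]   # position 1 only
```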
g7_pop
g7_pop.iloc[-1]
g7_pop.iloc[[0, 1]]
Using our previous series:
g7_pop
g7_pop.loc['Japan']
g7_pop.iloc[-1]
g7_pop
g7_pop.loc['Canada']
g7_pop.iloc[0]
g7_pop.iloc[-1]
g7_pop.loc[['Japan', 'Canada']]
g7_pop.iloc[[0, -1]]
loc & iloc to modify Series¶
g7_pop.loc['United States'] = 1000
g7_pop
g7_pop.iloc[-1] = 500
g7_pop
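One difference worth knowing when assigning: loc with an unknown label grows the Series, while iloc with a position past the end raises an IndexError. A minimal sketch on a two-country copy of the data (position 10 is an arbitrary out-of-range value):

```python
import pandas as pd

pop = pd.Series({'Canada': 35.467, 'United States': 318.523})

pop.loc['United States'] = 1000   # existing label: value is overwritten
pop.iloc[-1] = 500                # last position: the same element again

pop.loc['Brazil'] = 20.124        # unknown label: the Series grows

# iloc cannot enlarge a Series, so an out-of-range position fails.
try:
    pop.iloc[10] = 1.0
    out_of_range_raises = False
except IndexError:
    out_of_range_raises = True
```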
Introduction to Boolean arrays¶
Another way to select certain values within a Series is using boolean arrays, also known as conditional selection.
We can index our Series using a list of boolean values:
g7_pop[[False, False, True, False, True, False, True]]
Or we can index our Series using another Series with boolean values:
condition = pd.Series([
False, False, True, False, True, False, True
], index=[
'Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom', 'United States'
])
condition
g7_pop[condition]
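In practice the boolean Series is rarely typed by hand; it usually comes from a comparison. The sketch below rebuilds the original population data and uses a threshold of 70 million, which is an assumption chosen so that the mask matches the hand-written pattern above:

```python
import pandas as pd

pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523,
}, name='G7 Population in millions')

# Comparing a Series with a scalar yields a boolean Series
# that shares the same index.
mask = pop > 70

# Indexing with the mask keeps only the elements where it is True.
selected = pop[mask]
```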
In upcoming lectures we'll see how to use more complex conditional selections.