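pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
-> creates a DataFrame from a dictionary: each key becomes a column name and each list holds that column's values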
A Series is a single column of a DataFrame; it is essentially a list of values with an index.
pd.Series([1, 2, 3, 4, 5])
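A small sketch of the two constructors, assuming pandas is installed. The row labels 'Product A' and 'Product B' and the Series name 'values' are made up for illustration. It also shows that picking one column out of a DataFrame gives back a Series.

```python
import pandas as pd

# A DataFrame built from a dict: keys become column names, lists become column values.
# The row labels passed to `index` are made up for illustration.
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]},
                  index=['Product A', 'Product B'])
print(df)

# A Series is one labeled column of values; `name` plays the role of a column name.
s = pd.Series([1, 2, 3, 4, 5], name='values')
print(s)

# Selecting a single column of a DataFrame returns a Series.
col = df['Yes']
print(type(col).__name__)  # Series
```

Without an explicit index, pandas labels the rows 0, 1, 2, ..., which is why the Series above has the labels 0 to 4.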
The output (129971, 14) of df.shape means the DataFrame has 129,971 (about 130,000) records split across 14 columns, which is almost 2 million entries in total.
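A quick sketch of reading shape, using a tiny made-up frame in place of the real reviews data. DataFrame.size gives the total number of entries directly, which is the "almost 2 million" figure for the shape quoted above.

```python
import pandas as pd

# A tiny, made-up stand-in for the reviews data (3 rows, 2 columns).
df = pd.DataFrame({'country': ['Italy', 'France', 'US'],
                   'points': [87, 90, 85]})

rows, cols = df.shape   # shape is a (rows, columns) tuple
print(rows, cols)       # 3 2
print(df.size)          # 6: total entries, i.e. rows * cols

# The same arithmetic for the shape quoted above:
print(129971 * 14)      # 1819594
```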
iloc
-> index-based (positional) selection, in the form [row, column]
reviews.iloc[:, 0]
to retrieve all rows of the first column
reviews.iloc[:3, 0]
to retrieve rows from the first (0) to the third (2); the fourth (3) is not included
reviews.iloc[1:3, 0]
to retrieve the 2nd (1) and 3rd (2) rows
reviews.iloc[-5:]
to retrieve the last 5 rows

loc
-> label-based selection
reviews.loc[0, 'country']
to retrieve the value in the row labeled 0 (the first row, with the default index) of the country column

iloc is conceptually simpler than loc because it uses index positions instead of labels. But loc makes many tasks easier since it lets you use labels such as column names.
reviews.loc[reviews.country == 'Italy']
-> use loc for conditional selection
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]
for more than one condition, wrap each condition in parentheses and combine them with the ampersand (&) symbol, which plays the role of AND in SQL; note it is a single &, not &&.
For OR, use the single pipe (|) symbol instead.
reviews.loc[reviews.country.isin(['Italy', 'France'])]
to select the rows whose country is either Italy or France

iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So 0:10 will select entries 0,...,9. loc, meanwhile, indexes inclusively. So 0:10 will select entries 0,...,10.
reviews['critic'] = 'everyone'
-> creates a new critic column with the same constant value in every row
reviews['index_backwards'] = range(len(reviews), 0, -1)
-> creates a new column from an iterable, here counting down from len(reviews) to 1
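The selection and assignment notes above can be tried end to end on a small frame. This is a sketch only: reviews here is a made-up stand-in with two columns and invented values, not the real dataset, and the variable names (first_col, italy_90, and so on) are just illustrative.

```python
import pandas as pd

# Made-up stand-in for the reviews DataFrame; values are illustrative only.
reviews = pd.DataFrame({
    'country': ['Italy', 'France', 'Italy', 'US', 'France', 'Spain'],
    'points':  [87, 91, 93, 85, 88, 90],
})

# iloc: position-based, [row, column], end of a slice excluded.
first_col = reviews.iloc[:, 0]          # every row of column 0 ('country')
top_three = reviews.iloc[:3, 0]         # positions 0, 1, 2
iloc_slice = reviews.iloc[0:2, 0]       # positions 0 and 1 only
last_five = reviews.iloc[-5:]           # last 5 rows, all columns

# loc: label-based, end of a slice included.
first_country = reviews.loc[0, 'country']   # 'Italy'
loc_slice = reviews.loc[0:2, 'country']     # labels 0, 1 and 2

# Conditional selection with boolean masks (each condition in parentheses).
italy = reviews.loc[reviews.country == 'Italy']
italy_90 = reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]
italy_or_90 = reviews.loc[(reviews.country == 'Italy') | (reviews.points >= 90)]
it_fr = reviews.loc[reviews.country.isin(['Italy', 'France'])]

# Assigning new columns: a constant, or an iterable with one value per row.
reviews['critic'] = 'everyone'
reviews['index_backwards'] = range(len(reviews), 0, -1)
print(reviews)
```

Comparing iloc_slice (2 rows) with loc_slice (3 rows) is the quickest way to see the exclusive versus inclusive slicing difference.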
Map function -> takes each single value from a Series and returns a transformed version of that value, for example reviews.points.map(lambda p: p - 10)
It means each value p of the points column is replaced by p minus 10, for every row. map returns a new Series; the points column in reviews is left unchanged.
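A minimal sketch of map on its own, assuming a made-up points Series in place of reviews.points. It shows that the lambda runs once per value and that the original Series is not modified.

```python
import pandas as pd

# Made-up points column standing in for reviews.points.
points = pd.Series([87, 90, 85], name='points')

shifted = points.map(lambda p: p - 10)   # the lambda is called once per value p
print(shifted.tolist())   # [77, 80, 75]
print(points.tolist())    # [87, 90, 85]: the original is unchanged
```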
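Apply function -> similar to map, but apply calls a custom function on each row and transforms the whole DataFrame, for example reviews.apply(some_method_name, axis='columns'). Here some_method_name receives one row at a time; axis='columns' is an argument of DataFrame.apply, so apply is called on reviews itself rather than on the reviews.points Series. Like map, apply returns a new object and does not modify reviews.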
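A sketch of the row-wise form, assuming a made-up two-row reviews frame. The row function subtract_ten is hypothetical, standing in for some_method_name; it mirrors the map example by taking 10 off the points of each row.

```python
import pandas as pd

# Made-up stand-in for the reviews DataFrame.
reviews = pd.DataFrame({'country': ['Italy', 'France'],
                        'points': [87, 90]})

def subtract_ten(row):
    # Hypothetical row function: receives one row as a Series
    # and returns a modified copy of that row.
    row = row.copy()
    row['points'] = row['points'] - 10
    return row

# axis='columns' makes DataFrame.apply call the function once per row.
result = reviews.apply(subtract_ten, axis='columns')
print(result)
print(reviews['points'].tolist())  # [87, 90]: the original is untouched
```

Because subtract_ten returns a whole row, the result is a DataFrame with the same columns; a function returning a single value per row would instead give a Series.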