How to Apply Lambda & Apply Function in a Pandas DataFrame
Three strategies for creating a new Pandas DataFrame column from a calculation, and a comparison of performance
As always, documentation for myself made public for you…
The core lesson here is to use cell magic (for example, %%time in a Jupyter notebook) to measure execution time. It’s a terrific way to evaluate your code’s performance.
Table of Contents:
Strategy 1: Write a function, and apply that function.
Strategy 2: Write a Lambda, and apply that Lambda.
Strategy 1.2: Write a better function, and apply that function.
Strategy 1: Write a function, and apply that function.
I’ve been working with a lot of canine health data lately, and one thing that has been fascinating to watch develop is the change “before COVID-19” vs. “after COVID-19”.
One of the ways to cut the data, therefore, is to tag each row (I’m dealing with a huge trove of daily data) with “before” or “after” what I see as the inflection date: 3/6/2020 (March 6, 2020).
So, like the Python/Pandas newb that I am, I wrote this function and then applied it to my multi-million row DF. Here’s how that goes:
First, write the function:
import pandas as pd

def covid_before_age(start_date):
    # Tag a date as "before" or "after" the 3/6/2020 inflection date
    if start_date < pd.to_datetime('3/6/2020'):
        covid_status = "before"
    else:
        covid_status = "after"
    return covid_status
Second, apply the function:
df['covid_status'] = df.start_date.apply(covid_before_age)
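To see what that apply call actually produces, here is a minimal sketch on a toy DataFrame. The three sample dates are made up for illustration and stand in for the real data; the function is the same one as above. Note that the comparison is strict, so a row dated exactly 3/6/2020 gets tagged "after".

```python
import pandas as pd

def covid_before_age(start_date):
    # Same logic as the function above: strict "<" against the inflection date
    if start_date < pd.to_datetime('3/6/2020'):
        covid_status = "before"
    else:
        covid_status = "after"
    return covid_status

# Hypothetical toy data: one day before, the day of, and one day after the cutoff
df = pd.DataFrame(
    {"start_date": pd.to_datetime(["2020-03-05", "2020-03-06", "2020-03-07"])}
)

# apply calls the function once per row value and collects the results
df["covid_status"] = df.start_date.apply(covid_before_age)
print(df)
```

Because apply calls the Python function once for every row, the cost grows with the row count, which is worth keeping in mind for a multi-million row DF.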
The shape of my data frame:
df.shape
Cell magic says it takes this long to calculate:
28 minutes, 53 seconds. That’s a long time…
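If you want to reproduce this kind of measurement, one option in a Jupyter notebook is to put IPython's %%time cell magic on the first line of the cell that runs the apply. Outside a notebook, a rough equivalent is to wrap the call with time.perf_counter, as in the sketch below. The 10,000-row sample is an assumption for illustration, not my real data, so expect a far smaller number than 28 minutes.

```python
import time

import pandas as pd

def covid_before_age(start_date):
    # Same row-by-row function as in Strategy 1
    if start_date < pd.to_datetime('3/6/2020'):
        return "before"
    return "after"

# Hypothetical sample: 10,000 consecutive daily dates starting 2019-01-01
df = pd.DataFrame(
    {"start_date": pd.date_range("2019-01-01", periods=10_000, freq="D")}
)

# Time only the apply step, which is the part being evaluated
start = time.perf_counter()
df["covid_status"] = df.start_date.apply(covid_before_age)
elapsed = time.perf_counter() - start

print(f"{len(df):,} rows tagged in {elapsed:.3f} seconds")
```

Timing a small sample like this first gives a quick feel for how long the full DataFrame might take before you commit to the real run.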