# Time series Analysis 101 - Part 1

data science

- Author: Ndamulelo Nemakhavhani (@ndamulelonemakh)

Compiled by endeesa. Last update 23/April/2022

## 1. Pandas refresher

The following notes revisit a few pandas concepts that are important for time series analysis in Python. Suppose you created a pandas dataframe like the one shown in figure 1, and named it *df*.

| Date | Sales |
|---|---|
| 22/04/2022 | 70 |
| 23/04/2022 | 50 |
| 24/04/2022 | 77 |
| 24/04/2022 | 90 |
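If you want to follow along, here is a minimal sketch of how a dataframe like the one in figure 1 could be built by hand. It assumes the dates start out as plain strings and that Date is the index; the values are copied from the table as shown, including the repeated date.

```python
import pandas as pd

# Values copied from figure 1; the dates are plain strings at this point,
# and the repeated 24/04/2022 entry is kept exactly as it appears
df = pd.DataFrame(
    {"Sales": [70, 50, 77, 90]},
    index=pd.Index(
        ["22/04/2022", "23/04/2022", "24/04/2022", "24/04/2022"],
        name="Date",
    ),
)
print(df)
```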

These are some common operations that you might want to perform:

### i. Convert index values of type string into datetime objects using pd.to_datetime()

```
# The sample dates are day-first (dd/mm/yyyy), so state the format explicitly
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
```
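Day-first dates deserve some care, because pandas reads an ambiguous string month-first by default. The sketch below uses a made-up date string (an assumption for illustration, not taken from the data above) to show the difference between the default, `dayfirst=True` and an explicit `format`.

```python
import pandas as pd

# Hypothetical ambiguous date: it could mean 1 May or 5 January
ambiguous = "01/05/2022"

month_first = pd.to_datetime(ambiguous)                  # default: read as month/day/year
day_first = pd.to_datetime(ambiguous, dayfirst=True)     # read as day/month/year
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")  # explicit format, no guessing

print(month_first.date(), day_first.date(), explicit.date())
```

Passing an explicit format is usually the safest option when you know how the dates were written.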

### ii. Plot a time series on a line graph

```
# Produces a matplotlib line plot with one line per column
df.plot(grid=True)
```

### iii. Index slicing

- Recall that slicing is used to select a subset of the data based on position
- Similarly, you can slice a pandas datetime index to filter data by year, month, day, etc.

```
# Pandas datetime indexing examples (these need a DatetimeIndex)
# Filter by year 2022
time_series_2022 = df.loc['2022']
# Filter values from April 2022
time_series_2022_april = df.loc['2022-04']
```

- For more examples, visit: pandas datetime indexing tutorial
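Partial-string indexing also works for date ranges. Here is a small sketch using made-up daily values (the `sales` frame is an assumption for illustration); note that, unlike positional slicing, both endpoints of a date slice are included.

```python
import pandas as pd

# Hypothetical daily sales spanning the end of March and start of April 2022
idx = pd.date_range("2022-03-30", periods=6, freq="D")
sales = pd.DataFrame({"Sales": [10, 20, 30, 40, 50, 60]}, index=idx)

april = sales.loc["2022-04"]                     # every row in April 2022
window = sales.loc["2022-03-31":"2022-04-02"]    # both endpoints are included

print(len(april), len(window))
```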

### iv. Frequency conversion

- Sometimes we may wish to downsample or upsample readings to a monthly, quarterly or yearly frequency
- This is easily done with the built-in pandas dataframe method df.resample(), followed by an aggregation such as the mean or median

```
# Example: Convert daily readings to MONTHLY readings using the median
df.resample('M').median()
```

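- Other popular frequencies: Q (quarter), D (day), W (week), A (year), T (minute), etc. Note that pandas 2.2 and later prefer the aliases ME, QE, YE and min in place of M, Q, A and T.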
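To make the difference between downsampling and upsampling concrete, here is a rough sketch on made-up readings taken every two days (the `readings` frame and its values are assumptions for illustration). Downsampling needs an aggregation, while upsampling creates new rows that must be filled somehow; forward-filling is just one option.

```python
import pandas as pd

# Hypothetical readings taken every two days, starting Friday 1 April 2022
idx = pd.date_range("2022-04-01", periods=4, freq="2D")
readings = pd.DataFrame({"Sales": [10.0, 20.0, 30.0, 40.0]}, index=idx)

# Downsample: one row per week (weeks end on Sunday), aggregated with the median
weekly = readings.resample("W").median()

# Upsample: one row per day; the new rows are filled by carrying
# the last known value forward
daily = readings.resample("D").ffill()

print(weekly)
print(daily)
```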

### v. Merge multiple dataframes

```
# Assume we have another dataframe df2 similar to df
# We can merge the columns of these 2 dataframes as follows
# (if both share a column name, also pass lsuffix= / rsuffix= to tell them apart)
merged = df.join(other=df2, how='inner', on=None)
```

Note that if we don't specify a value for the `on` argument, the two dataframes will be matched by the index. Read the docs for more info.

- If we instead wanted to combine the rows, we would use pd.concat(), for example pd.concat([df, df2])
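The sketch below contrasts the two operations on tiny made-up frames (`sales`, `costs` and `later` are hypothetical names and values): joining places columns side by side for matching index values, while concatenating stacks rows.

```python
import pandas as pd

idx = pd.to_datetime(["2022-04-22", "2022-04-23"])

# Hypothetical frames: same dates, different columns
sales = pd.DataFrame({"Sales": [70, 50]}, index=idx)
costs = pd.DataFrame({"Costs": [40, 35]}, index=idx)

# join: match rows on the index and place the columns side by side
side_by_side = sales.join(costs, how="inner")

# concat: stack the rows of a frame that has the same column but a later date
later = pd.DataFrame({"Sales": [77]}, index=pd.to_datetime(["2022-04-24"]))
stacked = pd.concat([sales, later])

print(side_by_side.shape, stacked.shape)
```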

### vi. Calculating correlation and autocorrelation

- Correlation is a simple measure that tells us whether the values between two columns vary together or not

```
# Assume you have a dataframe named 'stocks' with stock prices for microsoft and google
# The columns are named 'MSFT' and 'GOOGL' respectively
correlation = stocks['MSFT'].corr(stocks['GOOGL'])
```

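When dealing with time series data, we typically do not calculate the correlation on the actual prices, but on the percentage changes instead. Two price series that both trend upward can look strongly correlated even when their day-to-day moves are unrelated. Use the pct_change() function to convert the values before computing the correlation.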
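As a rough sketch of that workflow, the example below uses made-up prices (not real market data) in a `stocks` dataframe with the same column names as above, converts them to percentage changes, and only then computes the correlation.

```python
import pandas as pd

# Hypothetical prices, for illustration only (not real market data)
stocks = pd.DataFrame({
    "MSFT": [100.0, 102.0, 101.0, 105.0, 107.0],
    "GOOGL": [200.0, 203.0, 202.0, 208.0, 211.0],
})

# pct_change() computes (price_t - price_{t-1}) / price_{t-1};
# the first row has no previous value, so it is NaN
returns = stocks.pct_change()

# corr() ignores the NaN row automatically
returns_correlation = returns["MSFT"].corr(returns["GOOGL"])
print(returns_correlation)
```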

- If we are interested in the correlation of a time series with a delayed (lagged) version of itself, we can calculate the autocorrelation as follows:

```
# First convert the actual prices to returns
msft_returns = stocks['MSFT'].pct_change()
# Then compute the autocorrelation (lag 1 by default)
msft_returns.autocorr()
```
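The autocorr() method also accepts a `lag` argument. Here is a small sketch on a made-up return series (the values are assumptions for illustration) showing that the lag-k autocorrelation is the ordinary correlation between the series and a copy of itself shifted by k steps.

```python
import pandas as pd

# Hypothetical return series, for illustration only
returns = pd.Series([0.01, -0.02, 0.015, 0.03, -0.01, 0.005])

lag1 = returns.autocorr()        # default is lag=1
lag2 = returns.autocorr(lag=2)   # compare each value with the one two steps earlier

# Same calculation done by hand: correlate the series with a shifted copy
manual_lag2 = returns.corr(returns.shift(2))

print(lag1, lag2, manual_lag2)
```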

## Knowledge check

Try putting the concepts covered above into practice with the following short exercise:

- Download the oil prices dataset from here
- Read the data into a pandas dataframe
- Set the index of the dataframe to be the date column
- Plot the oil prices from 2000 to 2020
- Create a new dataframe with 2019 data only and change the frequency to quarters
- Calculate the lag 2 autocorrelation of the oil prices in 2019
- Plot the autocorrelation function of the oil prices for 2020 (optional)

Once completed, move on to *Part 2* of this series, where we will cover EDA (exploratory data analysis) methods applicable to time series data.