In one of the ML books from Packt Publishing that I was working through (I can’t recall which one; it has been a while), I saw a cool example of using dynamic time warping (DTW) to detect similarities between music tracks. Audio data is a type of time series. DTW compares two time series point by point while allowing one of them to be stretched or compressed along the time axis, and the resulting distances give a sense of how similar the two tracks are: the smaller the distance, the more alike they are.
I wondered what would happen if I applied DTW to some basic financial time series and came up with some interesting visualization results.
The Jupyter notebook and full source code can be accessed on GitHub:
We’ll use a couple of Python packages that you may need to install: pandas_datareader and dtw.
Assuming you have the Python package installer configured for your environment, you can easily install the extra packages:
pip install pandas_datareader
pip install dtw
Import the packages
In a .py file or a Jupyter notebook, import the following:
The following function loops through a list of ticker symbols and retrieves the time series data for each one from Yahoo! Finance. It takes three inputs:
- A Python list of ticker symbols.
- Start Date in Python datetime format.
- End Date in Python datetime format.
The function returns a Python dictionary containing time series data for each symbol. The keys to the dictionary are the symbols.
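Here is a minimal sketch of what such a function could look like. The names get_series and _yahoo_reader are hypothetical, and the sketch assumes pandas_datareader’s DataReader with the "yahoo" source; that endpoint has changed over time, so the call may need adjusting. The reader is passed in as a parameter so that another data source can be swapped in.

```python
from datetime import datetime


def _yahoo_reader(symbol, start, end):
    # Assumption: pandas_datareader's DataReader with the "yahoo" source.
    # Imported here so the rest of the sketch works without the package.
    from pandas_datareader import data as pdr
    return pdr.DataReader(symbol, "yahoo", start, end)


def get_series(symbols, start, end, reader=_yahoo_reader):
    """Return a dict mapping each ticker symbol to its time series."""
    series = {}
    for symbol in symbols:
        series[symbol] = reader(symbol, start, end)
    return series


# Example call (the dates are illustrative, not taken from the post):
# data = get_series(["MSFT", "AAPL"], datetime(2016, 1, 1), datetime(2016, 12, 31))
# data["MSFT"] would then hold the data frame for Microsoft.
```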
List of Ticker Symbols
Here is a list of some random symbols from different industries. Depending on your connection speed, it could take a while to download one year’s data for a given symbol. On my home network one symbol’s series only takes a couple of seconds while on a shared network the download can take as much as 10 seconds per symbol.
Get the Data Ready
For starters, let’s pick a couple of tickers from a similar industry: MSFT (Microsoft) and AAPL (Apple).
After we set the x and y symbols, convert the Pandas data frames into Numpy arrays. We will use the Numpy arrays for doing the number crunching and plotting.
Call the DTW Function
Here is where the number crunching occurs.
The dtw( ) function takes the two time series (x and y in this case) along with a function for calculating the distance between a pair of points, one from each series. In this case, the distance function is simply Numpy’s norm( ) function, but it is possible to use more sophisticated distance functions. (I’m not sure what’s out there in terms of alternatives, but I mention it because you will probably get significantly different outputs depending on the distance function.)
The first two input parameters are straightforward. Let’s look at the distance function, which is the third input parameter. ‘dist=’ is the name of the keyword argument. lambda is Python’s way of defining a small anonymous function inline: here it takes two values, one from each series, and uses Numpy’s norm( ) function to compute the distance between them. lambda functions can be a bit confusing depending on your language background, but if you think of a lambda as a one-line function that you can pass around without naming it, you should be in good shape.
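To make the lambda concrete: a lambda is just a function without a name, so the two definitions below behave identically. This is a small illustration with a scalar absolute difference standing in for Numpy’s norm( ); the names dist_lambda and dist_named are made up for the example.

```python
# A distance function written as a lambda (scalar stand-in for norm()).
dist_lambda = lambda a, b: abs(a - b)


# The same function written with def.
def dist_named(a, b):
    return abs(a - b)


print(dist_lambda(3.0, 5.5))  # 2.5
print(dist_named(3.0, 5.5))   # 2.5
```

Either form could be passed as the distance argument; the lambda simply saves a separate definition.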
The dtw( ) function (at least, the particular implementation we are using) returns the following:
- minimum distance
- the cost matrix
- the accumulated cost matrix
- the warp path
We are only interested in the accumulated cost matrix, which is built up from the distance function we passed into dtw( ). The minimum distance, which is a scalar rather than a matrix, could be interesting when comparing many time series.
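To show where the accumulated cost matrix comes from, here is a rough pure-Python sketch of the textbook DTW recurrence. It is not the dtw package’s code (the function name dtw_sketch is made up, the warp path is left out, and the package may normalize the minimum distance differently), but the idea is the same: each cell adds its own pairwise cost to the cheapest of its three neighbouring predecessors.

```python
def dtw_sketch(x, y, dist):
    """Return (minimum distance, cost matrix, accumulated cost matrix)."""
    n, m = len(x), len(y)
    # Pairwise cost matrix: cost[i][j] = dist(x[i], y[j]).
    cost = [[dist(a, b) for b in y] for a in x]

    # Pad with a row and column of infinities so border cells need no special cases.
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cheapest = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + cheapest

    # Drop the padding; the bottom-right cell is the total cost of the best alignment.
    acc = [row[1:] for row in acc[1:]]
    return acc[-1][-1], cost, acc


# Toy data: y is x with one value repeated, so the series align perfectly.
d, cost, acc = dtw_sketch([1, 2, 3], [1, 2, 2, 3], lambda a, b: abs(a - b))
print(d)  # 0.0
```

The double loop makes this far too slow for long series, but it is enough to see how the matrix fills in.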
We’ll use the matplotlib package. Seaborn is another visualization package that rides on top of matplotlib and provides a rich set of convenience functions for enriched visualizations.
Looking at the graph, it becomes clear there is some sort of divergence that happens over time between the two time series. Let’s look at a regular line plot. (Since our time series are in a pandas DataFrame, it is super-easy!)
Comparing Time Series from Different Industry Sectors
Let’s compare the closing prices of McDonald’s and Apple. There should be some more interesting visualization results, assuming there is not a tight correlation between the two series.
The line plot shows that while one time series is experiencing market support, the other experiences market resistance, and over time there is crossover. This explains the wild, yet oddly rhythmic, visual patterns in the DTW plot.
It is also interesting to see what happens when a single time series is compared against itself. This first plot is for Deutsche Bank’s closing prices, no time shifting:
Although the visuals are a bit psychedelic, there is an obvious symmetry shared between the x and y series.
Applying the Time Shift
Let’s shift the time series and see what happens. Numpy’s array slicing makes it easy to do.
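One way to build a shifted pair with slicing is sketched below. The prices and the shift size are made up, and plain Python lists are used to keep the example dependency-free; one-dimensional Numpy arrays accept the same slice syntax.

```python
prices = [10.0, 10.5, 11.0, 10.8, 11.2, 11.5]  # made-up closing prices
shift = 2  # hypothetical number of time steps to shift by

x = prices[:-shift]  # the series with its last `shift` points trimmed off
y = prices[shift:]   # the same series, starting `shift` steps later

print(x)  # [10.0, 10.5, 11.0, 10.8]
print(y)  # [11.0, 10.8, 11.2, 11.5]
```

Trimming both ends this way keeps x and y the same length, so the resulting plot stays square.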
Perturbing a symmetrical set of time series results might also provide interesting insights depending on the use case.