For the clustering problem, only the final cost for each pairwise comparison is required; the actual warping path (or mapping of each point in one time series to the other) is superfluous for clustering. The memory complexity of the cost matrix \(C\) is \(O(nm)\), so as the length of the time series increases, the memory required increases greatly. Therefore, significant reductions in memory can be made by not storing the entire \(C\) matrix. When the warping path is not required, only a vector containing the previous row for the current step of the dynamic programming sub-problem is required (i.e., the previous three values \(c_{i-1,j-1}\), \(c_{i-1,j}\), \(c_{i,j-1}\)).
In DTW-C++, the DTW distance \(C_{x,y}\) is found for each pairwise comparison. Pairwise distances are then stored in a separate symmetric matrix, \(D^{p\times p}\), where (\(p\)) is the total number of time series in the clustering exercise. In other words, the element \(d_{i,j}\) gives the distance between time series (\(i\)) and (\(j\)).
For longer time series it is possible to speed up the calculation by using a ‘warping window'. This works by restricting which data elements on one seies can be mapped to another based on their proximity. For example, if one has two data series of length 100, and a warping window of 10, only elements with a maximum time shift of 10 between the series can be mapped to each other. So, \(x_{1}\) can only by mapped to \(y_{1}-y_{11}\). Using a warping window of 1 results in clustering with Euclidean distances, forcing one-to-one mapping with no shifting allowed. The stricter the wapring window, the greater the increase in speed. However, the data being used must be carefully considered to assertain if this will negatively impact the results. Readers are referred Sakoe et al., 1978 for detailed information on the warping window.