{"id":901,"date":"2020-04-12T20:41:24","date_gmt":"2019-05-27T19:24:18","guid":{"rendered":"https:\/\/www.aihello.com\/resources\/?p=901"},"modified":"2025-09-05T07:00:30","modified_gmt":"2025-09-05T07:00:30","slug":"multivariate-time-series-forecasting-using-deep-neural-networks","status":"publish","type":"post","link":"https:\/\/www.aihello.com\/resources\/blog\/multivariate-time-series-forecasting-using-deep-neural-networks\/","title":{"rendered":"Multivariate Time Series Forecasting using Deep Neural Networks"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Multivariate Time Series Forecasting is an important problem in many domains. Be it forecasting the demand for a product, or finding weather patterns, using time series parameters from the present to predict the future is vital to many organisations. In this article we are going to see <a href=\"https:\/\/www.aihello.com\/resources\/blog\/everything-about-optimizing-amazon-backend-keywords-guide-2022\/\">how to<\/a> use recurrent neural networks with convolutional neural networks to predict time series forecast for grocery store <a href=\"https:\/\/www.aihello.com\/resources\/blog\/boost-amazon-sales-using-google-ads\/\">sales<\/a>. 
The data can be found <\/span><strong><a href=\"https:\/\/www.kaggle.com\/c\/rossmann-store-sales\/data\">here<\/a><\/strong><span style=\"font-weight: 400;\">, and the code can be found <\/span><a href=\"https:\/\/gist.github.com\/wasifhameed\/1f104a21f680eaa55bce2abf9250fde5\"><strong>here<\/strong><\/a><span style=\"font-weight: 400;\">. This implementation was developed as a prototype for AI Hello, a <\/span><span style=\"font-weight: 400;\">company<\/span><span style=\"font-weight: 400;\"> based in Toronto that provides e-commerce solutions including sales prediction for sellers on e-commerce platforms such as Amazon, <\/span><span style=\"font-weight: 400;\">WooCommerce<\/span><span style=\"font-weight: 400;\">, and Shopify.<\/span><\/p><p><span style=\"font-weight: 400;\">We are using an implementation called LSTNet to perform this prediction. LSTNet uses convolutional networks and recurrent networks in conjunction. Additionally, it provides the capability to detect long-term or short-term patterns according to the nature of the data.<\/span><\/p><h3><b>Data Preparation and Exploration<\/b><\/h3><p><span style=\"font-weight: 400;\">For this article we\u2019ll be looking at the data-set \u2014<\/span><a href=\"https:\/\/www.kaggle.com\/c\/rossmann-store-sales\/data\"><span style=\"font-weight: 400;\"> Rossmann Store Sales<\/span><\/a><span style=\"font-weight: 400;\"> \u2014 available on Kaggle.<\/span><\/p><p><span style=\"font-weight: 400;\">We are cleaning the data to get overall sales across all stores based on the day of the week and promotions. 
The code to extract this is given below:<\/span><\/p><p><span style=\"font-weight: 400;\">The data now has aggregated values:<\/span><\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"size-large aligncenter\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0_UOd7aEq-341p6Sq3.png\" width=\"402\" height=\"221\"><\/p><p><span style=\"font-weight: 400;\">Now let\u2019s gain a better understanding of the cleaned data.<\/span><\/p><ol><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The graph below shows the average sales by day of the week\u200a\u2014\u200aSundays have the lowest, and Mondays enjoy the highest sales.<\/span><\/li><\/ol><p><img loading=\"lazy\" decoding=\"async\" class=\"size-large aligncenter\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0_ijU45Ghqa5SPTuSG.png\" width=\"406\" height=\"246\"><\/p><ol start=\"2\"><li><span style=\"font-weight: 400;\"> The relation between the number of customers and the total sales follows a linear pattern. This is to be expected, but it is helpful while modelling our prediction algorithm.<\/span><\/li><\/ol><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0_bhw7ck6Np-PjFUBY.png\" width=\"458\" height=\"287\"><\/p><ol start=\"3\"><li><span style=\"font-weight: 400;\"> Promotions have a big impact. The mean sales almost double when a promotion is available:<\/span><\/li><\/ol><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0_S1zqemJHampK5B-t.png\" width=\"377\" height=\"259\"><\/p><p><span style=\"font-weight: 400;\">We will split our data into test and train. 
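The cleaning and aggregation step described above can be sketched roughly as follows. This is a minimal pandas sketch, not the exact gist code; the column names (`Date`, `DayOfWeek`, `Sales`, `Customers`, `Open`, `Promo`) follow the Kaggle Rossmann `train.csv`, and the exact filtering choices are assumptions.

```python
import pandas as pd

def aggregate_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Sum sales and customers across all stores for each date,
    keeping day-of-week and promotion indicators as features."""
    df = df[df["Open"] == 1]  # keep only days when the store was open
    agg = (
        df.groupby("Date")
          .agg(Sales=("Sales", "sum"),
               Customers=("Customers", "sum"),
               DayOfWeek=("DayOfWeek", "first"),
               Promo=("Promo", "mean"))
          .reset_index()
          .sort_values("Date")
    )
    return agg
```

The resulting frame has one row per date with overall sales, overall customers, the day of the week, and the share of stores running a promotion.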
We are using 20 percent of our data for testing, 20 percent for validation, and we will train on the remaining 60 percent of our data.<\/span><\/p><p><span style=\"font-weight: 400;\">Before we go into the model, however, there is one important question to tackle. What are our features and what are our labels? Or in other words, what input are we fitting to what desired output?<\/span><\/p><p><span style=\"font-weight: 400;\">In time series problems, our input (X) will be all the records from timestep t(n-k) to t(n-1). The value we try to fit to this input will be our record at timestep t(n). To model the problem in this fashion we <a href=\"https:\/\/www.aihello.com\/resources\/blog\/key-terms-you-need-to-know-when-you-start-out-as-a-seller\/\">need to do<\/a> further processing on the data. The code below demonstrates how to transform time-series data this way.<\/span><\/p><h3><b>Model Architecture<\/b><\/h3><p><span style=\"font-weight: 400;\">Before we train the model, let\u2019s take a look at the model architecture. LSTNet has a novel architecture: it uses both convolutional and recurrent components. It also has a recurrent-skip component to better <a href=\"https:\/\/www.aihello.com\/resources\/blog\/what-to-take-into-account-to-win-over-mobile-consumers\/\">take into account<\/a> the long-term patterns in the data. Below, we have a detailed analysis of each component.<\/span><\/p><ol><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Convolutional Component<\/span><\/li><\/ol><p><span style=\"font-weight: 400;\">The first layer of LSTNet is a convolutional <a href=\"https:\/\/www.aihello.com\/resources\/blog\/creating-a-knowledge-graph-without-llms\/\">network<\/a> without pooling, which aims to extract short-term patterns in the time dimension as well as local dependencies between variables. 
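The sliding-window transform just described (the records from t(n-k) to t(n-1) become the input, the record at t(n) becomes the target) can be sketched as a generic NumPy helper. The function name and signature are illustrative, not the gist's.

```python
import numpy as np

def make_windows(series: np.ndarray, window: int):
    """Turn a multivariate series of shape (timesteps, features) into
    supervised pairs: X has shape (samples, window, features) and
    y has shape (samples, features)."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])  # records t(n-k) .. t(n-1)
        y.append(series[i])             # record t(n)
    return np.array(X), np.array(y)
```

With a window of 90 (as chosen below), each training sample is a quarter's worth of history paired with the following day's record.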
The convolutional layer consists of multiple filters of width \u03c9 and height n (the height is set to be the same as the number of variables).<\/span><\/p><ol start=\"2\"><li><span style=\"font-weight: 400;\"> Recurrent Component<\/span><\/li><\/ol><p><span style=\"font-weight: 400;\">The output of the convolutional layer is simultaneously fed into the Recurrent component and the Recurrent-skip component. The Recurrent component is a recurrent layer with the Gated Recurrent Unit (GRU) and uses the ReLU function as the hidden update activation function.<\/span><\/p><ol start=\"3\"><li><span style=\"font-weight: 400;\"> Recurrent-skip Component<\/span><\/li><\/ol><p><span style=\"font-weight: 400;\">Recurrent layers with GRU and LSTM units are carefully designed to memorise historical information and hence be aware of relatively long-term dependencies. Due to gradient vanishing, however, GRU and LSTM usually fail to capture very long-term correlations in practice. A recurrent structure with temporal skip-connections is therefore used to extend the temporal span of the information flow and hence ease the <a href=\"https:\/\/www.aihello.com\/resources\/blog\/amazon-seo-strategy-tips-and-tricks-for-beginners\/\">optimisation<\/a> process. Specifically, skip-links are added between the current hidden cell and the hidden cells in the same phase in adjacent periods.<\/span><\/p><ol start=\"4\"><li><span style=\"font-weight: 400;\"> Temporal Attention Layer<\/span><\/li><\/ol><p><span style=\"font-weight: 400;\">The Recurrent-skip layer requires a predefined hyper-parameter p, which is unfavourable for non-seasonal time series or for series whose period length is dynamic over time. 
To alleviate this issue, we consider an alternative approach, an attention mechanism, which learns a weighted combination of the hidden representations at each window position of the input matrix.<\/span><\/p><p>&nbsp;<\/p><h3><b>Hyper-parameters for the model<\/b><\/h3><p><span style=\"font-weight: 400;\">Following are the parameters to decide for the model:<\/span><\/p><ol><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Epochs \u2014 We found the loss function plateauing after 40 epochs for this data-set. Hence we decided that running 50 epochs will provide <a href=\"https:\/\/www.aihello.com\/resources\/blog\/how-to-launch-an-amazon-mexico-business\/\">us<\/a> with good accuracy.<\/span><\/li><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Window \u2014 This specifies how many time-steps from the past we should take into account for the prediction of sales in the next time-step. A key consideration is to make the window small enough to detect short-term patterns, but large enough to remain aware of the larger trend. We have decided that the sales information from the past quarter would provide enough information to the predictor. Hence, our window size is 90.<\/span><\/li><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Batch size \u2014 We are passing 128 values in one batch.<\/span><\/li><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">CNN Filters \u2014 Number of output filters in the CNN layer.<\/span><\/li><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">CNN Kernel \u2014 The filter size of the CNN layer. We are using a value of 6.<\/span><\/li><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">GRU Units \u2014 Number of hidden states in the GRU layer. 
We are using 100 hidden states.<\/span><\/li><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Skip \u2014 Number of values to skip when looking for patterns. We are assuming a weekly pattern and will use a value of 7.<\/span><\/li><li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Highway \u2014 Number of time-series values to consider for the linear layer, i.e. the autoregression layer; we will keep this at 90 too.<\/span><\/li><\/ol><p><span style=\"font-weight: 400;\">And lastly, we are using RMSE as our error function and the Adam optimizer for our model.<\/span><\/p><p><span style=\"font-weight: 400;\">Now we will see how to set up the neural network:<\/span><\/p><p><span style=\"font-weight: 400;\">With our model defined, we can delve into the code for actually training it on our data.<\/span><\/p><p>&nbsp;<\/p><h3><b>Evaluate Model<\/b><\/h3><p><span style=\"font-weight: 400;\">The mean squared error value and the correlation value for our model during training and validation are given below. We see a considerable decrease in the error and a rising trend in the correlation, both of which are good indicators.<\/span><\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0_zjoB7oKVuW7YEAuE.png\" width=\"600\" height=\"327\"><\/p><p><span style=\"font-weight: 400;\">Given below is the prediction for the whole time period. 
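Putting the architecture and hyper-parameters above together, a simplified Keras sketch of the network setup might look like the following. This is an assumption-laden approximation, not the gist code: the recurrent-skip and attention components are omitted for brevity, `N_FEATURES` is illustrative, and MSE is used as the training loss (the article reports RMSE values).

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hyper-parameters from the list above; N_FEATURES is an assumption
# (e.g. sales, day-of-week, promotion flag).
WINDOW, N_FEATURES = 90, 3

inp = layers.Input(shape=(WINDOW, N_FEATURES))

# Convolutional component: short-term patterns in time and across variables.
conv = layers.Conv1D(filters=100, kernel_size=6, activation="relu")(inp)
conv = layers.Dropout(0.2)(conv)

# Recurrent component: GRU with ReLU as the hidden update activation.
rec = layers.GRU(100, activation="relu")(conv)
out = layers.Dense(1)(rec)

# Highway (autoregressive) component: a linear model on the raw inputs,
# added to the neural output.
ar = layers.Flatten()(inp)
ar = layers.Dense(1)(ar)
out = layers.Add()([out, ar])

model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
```

Training then reduces to `model.fit(X_train, y_train, batch_size=128, epochs=50, validation_data=(X_val, y_val))` on the windowed arrays.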
Let\u2019s concentrate on the predictions for the test values.<\/span><\/p><p><span style=\"font-weight: 400;\">The blue values indicate the actual sales.<\/span><\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0__2h1ArJNvDp2vGVz.png\" width=\"600\" height=\"331\"><\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0_xiJDKcoUj7Hn-e6p.png\" width=\"600\" height=\"341\"><\/p><p><span style=\"font-weight: 400;\">We have not considered certain factors such as holidays and individual store location holidays, which would help us predict the outliers, but the general trend and dips, as well as the weeks in which the sales were generally low or high, have been predicted accurately by the algorithm. Taking a closer look reveals this:<\/span><\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0_zbbT0sEBbRDwChoU.png\" width=\"600\" height=\"337\"><\/p><h3><b>Comparison<\/b><\/h3><p><span style=\"font-weight: 400;\">To compare the results, here is the prediction pattern we are getting with FB Prophet (another time series prediction tool):<\/span><\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2019\/05\/0_NPaFYLDLTuKScCSe.png\" width=\"600\" height=\"295\"><\/p><h6><span style=\"font-weight: 400;\">FB Prophet Results<\/span><\/h6><p><span style=\"font-weight: 400;\">We see that while FB Prophet is good at accounting for the periodicity and general trend of the data, it does not do as good a job when it comes to predicting the actual sales value for each day. Even while sales tend to rise and fall each week, FB Prophet predicts the same values for all weeks. 
Our algorithm, however, is able to predict how each week is going to be different from the last, which is a huge factor when it comes to sales strategy. <\/span><\/p><p>&nbsp;<\/p>","protected":false},"excerpt":{"rendered":"<p>Predict grocery sales using Multivariate Time Series Forecasting. This article explores LSTNet, combining RNN and CNN for accurate e-commerce sales prediction.<\/p>\n","protected":false},"author":1,"featured_media":931,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[89,40,103],"tags":[177,445],"class_list":["post-901","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-amazon","category-demand-forecasting","category-machine-learning","tag-demand-forecasting","tag-ecommerce"],"_links":{"self":[{"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/posts\/901","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/comments?post=901"}],"version-history":[{"count":1,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/posts\/901\/revisions"}],"predecessor-version":[{"id":12513,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/posts\/901\/revisions\/12513"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/media\/931"}],"wp:attachment":[{"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/media?parent=901"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/categories?post=901"},{"taxonomy":"post_tag","embeddable":true,
"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/tags?post=901"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}