To stay competitive and grow, companies can take advantage of Artificial Intelligence and Machine Learning to build predictive models that forecast future sales.
Predictive models attempt to forecast these sales based on historical data, while taking into account the effects of seasonality, demand, holidays, promotions, and competition.
For this case study, the company requesting the prediction model has provided sales department data for 1,115 stores.
Sales-related dataset:
- Id: transaction ID (combination of store and date)
- Store: unique identifier of the store
- Sales: daily sales, this is the target variable
- Customers: number of customers on a given day
- Open: Boolean to indicate if the store was open or closed (0 = closed, 1 = open)
- Promo: indicates whether the store was running a promotion that day
- StateHoliday: indicates whether the day was a holiday (a = public holiday, b = Easter holiday, c = Christmas, 0 = not a holiday)
- SchoolHoliday: indicates whether the (Store, Date) was affected by public school closures
Dataset related to stores:
- StoreType: category indicating the type of store (a, b, c, d)
- Assortment: a = basic, b = extra, c = extended
- CompetitionDistance (in meters): distance to the nearest competitor store
- CompetitionOpenSince [Month/Year]: month and year in which the nearest competitor opened
- Promo2: a continuing, consecutive promotion run by some stores (0 = the store does not participate, 1 = the store participates)
- Promo2Since [Year/Week]: year and calendar week in which the store started participating in Promo2
- PromoInterval: the months in which each new round of Promo2 starts. For example, “Feb,May,Aug,Nov” means that each new round of the promotion starts in February, May, August, and November of any given year for that store
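As a minimal sketch of how a PromoInterval value could be read, the helpers below turn the comma-separated string into month numbers and check whether a new Promo2 round starts in a given month. The function names are hypothetical, and the sketch assumes English month abbreviations; accepting "Sept" alongside "Sep" is an assumption about how the raw values are spelled.

```python
# Hypothetical helpers for reading PromoInterval values such as "Feb,May,Aug,Nov".
MONTH_NUMBERS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Sept": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def promo_start_months(promo_interval):
    """Return the month numbers in which a new Promo2 round starts."""
    # Missing values (None, NaN, empty string) mean no interval was recorded.
    if not isinstance(promo_interval, str) or not promo_interval:
        return []
    return [MONTH_NUMBERS[name.strip()] for name in promo_interval.split(",")]


def promo_round_starts(promo_interval, month):
    """Return True if a new Promo2 round starts in the given month (1-12)."""
    return month in promo_start_months(promo_interval)


example_months = promo_start_months("Feb,May,Aug,Nov")
```

A store that does not participate in Promo2 would simply yield an empty list here, so `promo_round_starts` returns `False` for every month.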
Original data source: https://www.kaggle.com/c/rossmann-store-sales/data
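Since both datasets carry the Store identifier, they can be combined into a single table before the analysis. The sketch below assumes pandas is available and uses two tiny hand-made frames in place of the real files; every value in them is invented for illustration.

```python
import pandas as pd

# Tiny made-up stand-ins for the sales and store tables (values are invented).
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["2015-07-30", "2015-07-31", "2015-07-31"],
    "Sales": [5020, 5263, 6064],
    "Open": [1, 1, 1],
})
stores = pd.DataFrame({
    "Store": [1, 2],
    "StoreType": ["c", "a"],
    "Assortment": ["a", "a"],
    "CompetitionDistance": [1270.0, 570.0],
})

# A left join keeps every sales row and attaches the store attributes to it.
# validate="many_to_one" raises if a store appears twice in the store table.
merged = sales.merge(stores, on="Store", how="left", validate="many_to_one")

# Parse the date column so it can be used for time-based analysis later.
merged["Date"] = pd.to_datetime(merged["Date"])
```

A left join is used so that no sales rows are lost if a store is missing from the store table; its store attributes would then show up as nulls.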
We will work with this data to predict future sales. The company wants to use Machine Learning to inform decisions about its stores.
They want to understand how the stores have behaved throughout the year and identify trends, so that they are prepared to react to any change in the market.
The project was carried out as follows:
- It began with a quick analysis of the data, covering some basic statistics and the types of the variables involved.
- Data cleaning is important to avoid bias and erroneous analysis of the data, so we dealt with the null values.
- An in-depth exploratory analysis is essential for finding trends in sales. For this, we broke the data set down to see how sales behave with respect to the year, the month, and the week.
- We analyzed the most relevant variables, such as the number of customers, the reported sales, and how promotions affect the amount sold, among others.
- Once the data had been analyzed and understood, the model was trained. For this case study we chose the Facebook Prophet library, which helps forecast time series. We built two models: a general one that uses the data without specifying holidays, and another that specifies these days, which can be important when predicting.
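The steps above can be sketched roughly as follows, assuming pandas and a table shaped like the datasets described earlier. The toy rows, the median fill for CompetitionDistance, the filter on open days, and the helper name `forecast_sales` are illustrative assumptions, not necessarily the choices made in the notebook. The Prophet import sits inside the function, so the data preparation runs even where the library is not installed.

```python
import pandas as pd

# Invented rows for a single store; the real data covers many more days and stores.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-12-24", "2014-12-25", "2014-12-26", "2014-12-27"]),
    "Sales": [7000, 0, 0, 6500],
    "Open": [1, 0, 0, 1],
    "StateHoliday": ["0", "c", "c", "0"],
    "CompetitionDistance": [1270.0, None, 1270.0, 1270.0],
})

# Null handling: one possible choice is to fill gaps with the column median.
df["CompetitionDistance"] = df["CompetitionDistance"].fillna(
    df["CompetitionDistance"].median()
)

# Break the date down so sales can be grouped by year, month, and week.
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Week"] = df["Date"].dt.isocalendar().week.astype(int)
monthly_sales = df.groupby(["Year", "Month"])["Sales"].mean()

# Prophet expects a frame with columns "ds" (date) and "y" (value to forecast).
# Keeping only the days the store was open is an assumption made here.
history = df.loc[df["Open"] == 1, ["Date", "Sales"]].rename(
    columns={"Date": "ds", "Sales": "y"}
)

# Holidays go in a separate frame with columns "holiday" and "ds".
holidays = (
    df.loc[df["StateHoliday"] != "0", ["Date"]]
    .rename(columns={"Date": "ds"})
    .assign(holiday="state_holiday")
    .drop_duplicates()
)


def forecast_sales(history, periods, holidays=None):
    """Fit Prophet on `history` and forecast `periods` days past its last date."""
    from prophet import Prophet  # older releases were published as fbprophet

    model = Prophet(holidays=holidays)
    model.fit(history)
    future = model.make_future_dataframe(periods=periods)
    return model.predict(future)
```

On a full sales history, `forecast_sales(history, periods=60)` would correspond to the general model and `forecast_sales(history, periods=60, holidays=holidays)` to the one that specifies holidays; the 60-day horizon is an arbitrary example, and the four toy rows above are far too few for a meaningful fit.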
If you want to see what we found in the analysis, click on the PDF download or go directly to the repository, where you will find the corresponding notebook in the Sales folder.