One surprisingly common yet boring task I run into on data analysis and marketing mix modeling projects is turning monthly or weekly data into daily. For example your affiliate report might only be compiled monthly, or your SEO analytics only exports data broken down by week.
Daily data is the most ideal format, because it gives you 7x more data points than weekly, and ~30x more data points than monthly. It’s also the most flexible, because you can always roll daily data up to weekly or monthly later: it’s not as easy to go the other way.
In these cases what do you do? The answer is Interpolation, or the practice of filling in gaps in your data. If you imagine you have just two dots of data, one for each week: interpolation works by drawing a line in between those two dots, which gives you realistic values for each day.
This is a little confusing to do in Python, but luckily I’ve open-sourced my code, to make things easier for everyone. Let’s see what interpolation from weekly and monthly to daily looks like.
The above is a realistic dataset for searches on your brand term. You can see it follows a clear weekly trend, as well as having a general movement up and to the right, with big spikes on some of the days. If we take that same daily data and group it weekly, this is what it looks like:
Now of course in our case we have the real daily data to compare, but let’s pretend for a second that we had only been given weekly data. How much definition are we losing here? Well we’ve gone from 882 days to 127 weeks, but you can see the general shape is still there. Let’s use our interpolation function to draw lines between those dots. So we’re going to scale back up from 127 points to 882.
You can see how the exact same shape has been maintained from chart to chart – we can’t possibly ‘know’ anything about the inter-week trend if we just have weekly data, so the best we can do is maintain the same shape but fill in the gaps in between. Let’s see how much more definition we lose on monthly.
Now we’re down to just 30 rows, from almost 2 ½ years worth of data. You can hopefully see that building a model based on monthly data would be pretty inaccurate unless we had a decent amount of history. Wherever possible we want to get that monthly data converted to daily, so it can at least support the other (daily) variables in the model. What does the monthly data look like converted to daily with Interpolation?
You can see here that the same general shape shows up, but we have lost a lot of definition. We’re not really seeing any of the spikes we saw in the weekly and daily data. We also have an issue at the end of the last month, where it’s (incorrectly) dragging the average down due to lack of definition in the data. Ok finally let’s bring this all together, so we can see it in one place:
This lays it all out pretty clearly. Sure we do lose a lot of granularity here, but if weekly or monthly is all you need, Interpolation does a pretty good job of capturing the basic trends. So if the rest of your variables are daily, and you need to resample your monthly or weekly variables down to match, Interpolation is a pretty good bet.
Note: this won't do anything for you if ALL of your data is weekly or monthly, but if most of your main variables are daily and you just have to convert a handful of monthly or weekly variables to fit the model, go right ahead!
*The code I used here is all in a Jupyter Notebook and Open Source library, which you can access here. Feel free to use it and improve it!*