This analytics problem is a rite of passage – we all eventually get asked it. And it usually goes to a tune like this:
“I have a list of projects with start and end dates. I want to know at any given time, how many projects are active.”
(Btw the TL;DR for this is actually this great video by my TIL colleague Andre)
In this case, we looking for how many orders are being prepared for shipment at any given date?
Here’s my viz.
Using Superstore, the data you’re given looks something like this.
And visually represented, we know that these projects all overlap on March 16 2016.
But how can we express this as a line chart? i.e. not at a given point in time, but how many over time?
For such a simple question, it actually has a rather complex (but easy to implement!) solution. It requires a product join.
The difficulty with this easy to ask question is that …how can Tableau plot data for a datapoint that effectively doesn’t exist? That is, when we look at the data, March 16 only occurs 3 times.
So as smart as Tableau is, it does not know that all the orders in the list are in preparation mode, just like the other 3. (Well Tableau CAN know, but lets put densification aside for another day).
Step 1: Prepare your Data
Given this train of thought, that means we need to give each entry (order ID/employee/project/whatever your flavor of granularity) every date we want to compare against.
And since we want to be able to compare against any date, we are going to multiply each entry 365 times for 365 dates. This will pad out the underlying dates and give the data structure we need to make this work!
That means you’ll need a dataset with dates you want to see in 1 column and a dummy variable in another. I’ve very creatively called mine “One”.
Then you’ll have to add the dummy variable to your original dataset. In this case, I added a “1” running a straight column down the Superstore data.
Set it as an inner join in Tableau
Step 2: Create Your Date Filter
.. then put this filter on columns.
Step 3: Start to create your view!
Put date from your dummy dataset on columns and add the count distinct of your chosen dimension on rows.
In mine, I get something like this.
Other resources I’ve used to get to this method include this great video by my colleague Andre and of course, the great Bethany Lyon’s talk on Cross Database Joins at TC16.