
Weighting Survey Data

I learnt this interesting technique while teaching a session for a management consultancy.

The question: how do we investigate the viability of a market? Survey data can tell us! But for our survey results to be useful, the respondents should match the general "profile" of the country, typically along demographics such as age and gender. Ideally your sample is already representative, i.e. if 20% of your target population are women aged 18-25, then roughly 20% of your survey respondents should be too.
But this isn’t always so easy to achieve! 

The next best thing is to weight your responses. This only works if you already have some population stats on your audience.

Say, for instance, we want to understand the market for a new poutine product. Given it's a regional speciality, the majority of our customers will probably be Canadian.

True population

Country   % of population
Canada    0.7
Italy     0.1
Ireland   0.1
UK        0.1

Here's a profile of the folks who responded to our survey:

Our Sample

Country   Name    % of sample
Canada    Emily   0.2
Canada    Nai     0.2
Italy     Bene    0.2
Ireland   Kevin   0.2
UK        Ben     0.2

If we profile our data, we see the profiles don't match at all! But what can we do? Getting survey samples to exactly match the population is time-consuming and expensive.

Country   % of sample   % of population
Canada    0.4           0.7   << In our sample, 40% of respondents are Canadian, but Canadians will make up 70% of our true market!
Italy     0.2           0.1   << Meanwhile the Italian, Irish and British respondents over-represent their share of the actual market.
Ireland   0.2           0.1
UK        0.2           0.1

But even if our sample is not representative of our population, we can create a new column that redistributes the weighting and lets us extrapolate trends to the country level.

  1. Prepare lookup tables with the population proportions calculated.
  2. Create a weight index.
    1. The calculation is [% of population] / [% of sample]. This scales up the demographics that are under-represented in our sample (and scales down those that are over-represented).
  3. The data comes out like so:

The Weight Index and Person Weight columns are the key parts of this technique. Both columns sum to 5 (as we have 5 respondents), but the weight index reapportions our results so they reflect the population demographics we matched against originally.
I've attached a sample here if you're interested in taking a look.
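
If you'd like to prototype the weighting outside of a BI tool, here's a minimal pandas sketch of the same idea. Only the country proportions come from the tables above; the respondents' answers (the "response" column) and all column names are made up for illustration.

```python
import pandas as pd

# Respondents in our survey sample (one row per person).
# "response" is invented for illustration, e.g. 1 = "would buy the poutine product".
sample = pd.DataFrame({
    "name": ["Emily", "Nai", "Bene", "Kevin", "Ben"],
    "country": ["Canada", "Canada", "Italy", "Ireland", "UK"],
    "response": [1, 0, 1, 1, 0],
})

# Known population proportions (the lookup table)
population = pd.DataFrame({
    "country": ["Canada", "Italy", "Ireland", "UK"],
    "pct_of_population": [0.7, 0.1, 0.1, 0.1],
})

# % of sample per country, then weight index = [% of population] / [% of sample]
pct_of_sample = sample["country"].value_counts(normalize=True).rename("pct_of_sample")
weights = population.join(pct_of_sample, on="country")
weights["weight_index"] = weights["pct_of_population"] / weights["pct_of_sample"]

# Attach the person-level weight to each respondent
sample = sample.merge(weights[["country", "weight_index"]], on="country")

print(sample)
print("Sum of weights:", sample["weight_index"].sum())   # 5.0 - one per respondent
print("Unweighted mean:", sample["response"].mean())
weighted_mean = (sample["response"] * sample["weight_index"]).sum() / sample["weight_index"].sum()
print("Weighted mean:  ", weighted_mean)
```

Canadian respondents each get a weight of 0.7 / 0.4 = 1.75, while everyone else gets 0.1 / 0.2 = 0.5, so the weighted result leans towards the Canadian answers just as the true market would.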


I hope you’ll get to use this technique with your next survey data use case!

Tableau 2017 – Conference Tips!

If you're headed to the conference, here are a couple of pointers to help you prepare and look out for. I'd highly recommend being organised so you can be open to the random opportunities that arise.

General

  • If you have food allergies, pack accordingly. There's food everywhere, but it's all pickup food stations.
  • Comfy walking shoes. The conference is MASSIVE and you'll be on the go from breakfast until 3AM for a week.
  • Buy conference swag ASAP. The store was cleared out by Thursday. You can always order online, but it's just not the same!

Learning at Conference

  • Plan your sessions ahead of time (and create backup plans). I did my schedule in Excel so I could keep track of it all. Take into account the travel time between venues if you can.
  • Make your life easy and just let go of the FOMO. There are so many amazing events and sessions; missing some is going to happen, so just be happy with the plan you've committed to. I remember feeling so much stress on my first day, and Emma from TIL also shared this sentiment. Her advice? You gotta let it go!
  • Feel free to leave a session if it's really terrible or too full for you to absorb information. It's about making the most of it for you.
  • Check out the Vendor Expo. I think of it as a professional development session in itself! Ask lots of questions– they’ll be happy you did.
  • Take notes during sessions. I carried my laptop around.
  • Pick good speakers. It's also tough for those preparing the sessions – it's a lot of work! Some might not have had enough time, some aren't great public speakers, etc., and that's going to show in the end product.
  • Get good seats. Last year, people in the back probably had a hard time because many of the rooms were long and skinny with no screens for the back end of the crowd.

FUN!

  • Create a WhatsApp group with your travel mates
    • With 15,000 people, you'll need it to help navigate where the best sessions and events are.
  • Go out and socialize. Tableau folks are some of the friendliest, quirkiest data analysts around. What a world of difference it makes to meet them in person!

That's all I can think of for now. Any others I missed? Let me know in the comments!

See you in Vegas!

Emily

Information is Beautiful – G Research Lecture Series Review

David McCandless is a titan in data visualisation – no doubt about it. And with The Science Museum's immersive IMAX screen, we saw the vastness of a billion dollars, examined the medical veracity of "superfoods" and settled the most controversial question of all – what is the best dog breed? (All backed with data!)

With beautiful colours, composition and storytelling, McCandless' approach to data visualisation invites all audiences to engage with the world around us. Like a bakery window with glossy cakes, he draws people in with data, design and a story.

But how does this message fare in a room full of quants? He demonstrated data visualisation's power to engage with his final audience game, asking a series of questions about what is more popular on Google: Beer vs Wine, Cornflakes vs Toast, YouTube vs Sex (the answer may surprise you). 400 people immediately took to the 101 of data analysis: think, take a guess, and revise your answer. By showing how data visualisation can spark conversation, McCandless taught an audience of all disciplines how to be inquisitive about data – to move from information into knowledge.

Unsurprisingly, the topic of fake news came up during the Q&A: how can we spot it? Journalist, ad man, designer and developer, McCandless drew on all of those perspectives for the same answer: it's tough to spot, and trusting your source is key. Data visualisation is powerful – use it wisely.

Here are some of my pictures from the event:

How to Analyze Year over Year Performance at the Weekday level

When I worked in industry, every day I reported sales performance at the day level and compared the results against the same weekday the previous year.

“If we had a bad YoY comparison, did we have a big promotion the same day last year? Or if not, which categories did better or worse?”

To add to the complexity, the business had a regular promotional week where new pricing would update every Friday. So comparing Jan 6, 2017 (a Friday with new prices and promotional tactics) against Jan 6, 2016 (a Wednesday) was not helpful!

It's an excellent way to look at the business tactically and get some actionable insights quickly. And luckily it's really easy to do in Tableau!

Step 1: Find the same weekday last year
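
The screenshot for this step shows the calculated field; a common way to build it is to go back exactly 52 weeks (364 days), which always lands on the same weekday. Here's a tiny Python sketch of that idea, assuming the 364-day shift is the rule you want to use:

```python
from datetime import date, timedelta

def same_weekday_last_year(d: date) -> date:
    """Shift back exactly 52 weeks (364 days), which preserves the weekday."""
    return d - timedelta(weeks=52)

# Jan 6, 2017 was a Friday; 52 weeks earlier is Jan 8, 2016 - also a Friday
d = date(2017, 1, 6)
print(d.strftime("%a %Y-%m-%d"), "->", same_weekday_last_year(d).strftime("%a %Y-%m-%d"))
```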

 

Step 2: Add the same data source again

Here, I've added Superstore again through the other menu since it was giving me grief through the "Saved data sources" pane. This way, I can simply navigate to the dataset sitting under "My Tableau Repository".

 

Step 3: Configure the blend through Edit Relationships

Here we specify that we want the "Day/Month/Year" of [Order Date] to match [Order Date LY] (the order date in the previous year), as well as the product sub-category. This is how we'll be able to see which sub-categories had gains or declines in sales between the years.
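
Conceptually, the blend behaves like a self-join of the daily sales onto themselves, matching this year's rows to last year's shifted date and sub-category. Here's a rough pandas equivalent, with made-up column names and numbers, just to show the shape of the join (it isn't the Tableau blend itself):

```python
import pandas as pd

# Invented daily sales by sub-category, just to illustrate the matching logic
daily = pd.DataFrame({
    "order_date": pd.to_datetime(["2017-01-06", "2017-01-06", "2016-01-08"]),
    "sub_category": ["Chairs", "Phones", "Chairs"],
    "sales": [500.0, 120.0, 430.0],
})

# [Order Date LY]: the same weekday 52 weeks earlier
daily["order_date_ly"] = daily["order_date"] - pd.Timedelta(weeks=52)

# The blend acts like a left self-join on (previous-year date, sub-category)
blended = daily.merge(
    daily[["order_date", "sub_category", "sales"]].rename(
        columns={"order_date": "order_date_ly", "sales": "sales_ly"}
    ),
    on=["order_date_ly", "sub_category"],
    how="left",
)
print(blended)
# The 2017-01-06 Chairs row now carries the Chairs sales from 2016-01-08 as sales_ly
```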

 

 

Step 4: Validate your data post Edit Relationships

Put [Order Date] and [Order Date LY] from the primary data source (the first one we pulled in) on Rows, and [Sales] from that same data source on Text.

Then put [Sales] from your secondary data source on the Marks card to create the following table.

In the view below, I've already renamed it to [Sales LY] in the secondary data source. But remember, it's only because I've got Edit Relationships set up and I'm using this as a secondary data source that it will render sales this way. Otherwise, it will not retrieve sales from last year.

 

Want to check that it's pulling in the right value? Just scroll up to the matching [Order Date] and check that it's referencing the right sales amount. Here, the $732 in sales match!

 

Step 5: Create your Year Over Year Calculation
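
The calculated field is in the screenshot; the standard year-over-year formula is (this year minus last year) divided by last year, which is likely close to what it contains. As a quick sanity check of the arithmetic, with made-up numbers:

```python
def yoy_growth(sales: float, sales_ly: float) -> float:
    """Year-over-year growth: positive means we beat the same weekday last year."""
    return (sales - sales_ly) / sales_ly

print(f"{yoy_growth(120.0, 100.0):.1%}")   # 20.0%
```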

 

Then see what it looks like over time…

 

I find this view (while computationally correct) too messy and variable to garner any insight.

I find it much more useful when I filter the view to the last 10 days. Even better when I conditionally colour it by whether it's done better or worse than last year!
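
If you were prototyping the same view in pandas rather than Tableau, the last-10-days filter and the better/worse flag used for the colouring might look roughly like this (column names and figures are again invented for the sketch):

```python
import pandas as pd

# Invented results table in the shape produced by the blend sketch above
blended = pd.DataFrame({
    "order_date": pd.to_datetime(["2017-01-04", "2017-01-05", "2017-01-06"]),
    "sales": [500.0, 610.0, 850.0],
    "sales_ly": [520.0, 580.0, 790.0],
})

# Keep only the last 10 days of data
recent = blended[blended["order_date"] >= blended["order_date"].max() - pd.Timedelta(days=10)].copy()

# Flag for the conditional colouring: did we beat the same weekday last year?
recent["beat_last_year"] = recent["sales"] > recent["sales_ly"]
print(recent)
```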

 

 

Step 6: Build a dual-axis bar chart with [Sales] on one axis and [Sales LY] on the other.

 

Final Thoughts

In the last view, I've already got it filtering to a particular day, but I can easily see which categories I made gains in and which ones didn't produce better results compared to last year. Maybe the promotional tactic wasn't as good, maybe there were extenuating factors. But it's great to know, isn't it?

 

Hope this will be helpful for you!

Makeover Monday – Washington Metro

Here's my Makeover Monday contribution. For those that don't follow the project, it's where our community of data viz folks try to reinterpret and improve on an existing infographic or chart.

 

Here’s the original

 

Here’s my contribution: