The question is to investigate the viability of a market, and this can be learnt through survey data! But for our survey results to be useful, the sample should match the general “profile” of the country. This can be done through demographic profiling, such as age and gender. Ideally your sample is already representative, i.e. if 20% of your population are women aged 18 to 25, then 20% of your survey respondents should be too.
But this isn’t always so easy to achieve!
The next best thing is to weight your responses. This only works if you already have some population stats on your audience.
Say, for instance, we want to understand the market for a new Poutine product. Given it’s a regional speciality, the majority of our customers will probably be Canadian.
True population
Country  % of population 
Canada  0.7 
Italy  0.1 
Ireland  0.1 
UK  0.1 
Here’s a profile of the folks who have responded to our survey:
Our Sample
Country  Name  % of sample 
Canada  Emily  0.2 
Canada  Nai  0.2 
Italy  Bene  0.2 
Ireland  Kevin  0.2 
UK  Ben  0.2 
If we profile our data, we see the profiles don’t match at all! But what can we do? Getting survey samples to exactly match our population is time consuming and expensive.
Country  % of sample  % of population  
Canada  0.4  0.7  << In our sample, 40% of respondents are Canadian, but in our true market population it is 70%! 
Italy  0.2  0.1  << Italian, Irish and British respondents are overrepresented: each is 20% of the sample but only 10% of the market 
Ireland  0.2  0.1  
UK  0.2  0.1 
But even if our sample is not representative of our population, we can create a new column that redistributes the weighting and helps us extrapolate correct trends to the country level.
The Weight Index and Person Weight columns are the key parts of this technique. Both columns sum to 5 (as we have 5 respondents), but the weight index lets us apportion more of our results to the groups that are underrepresented, so that the totals reflect the population profile we matched against originally.
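As a rough illustration of the weighting idea, here is a minimal Python sketch. It assumes the weight index is simply the population share divided by the sample share for each country, and that every respondent starts with a person weight of 1; the attached sample may calculate these columns differently. The function and variable names are made up for this example.

```python
from collections import Counter

# Known population shares, taken from the "True population" table.
population_share = {"Canada": 0.7, "Italy": 0.1, "Ireland": 0.1, "UK": 0.1}

# Survey respondents as (name, country) pairs, taken from "Our Sample".
respondents = [
    ("Emily", "Canada"),
    ("Nai", "Canada"),
    ("Bene", "Italy"),
    ("Kevin", "Ireland"),
    ("Ben", "UK"),
]


def weight_index(respondents, population_share):
    """Assumed formula: population share / sample share, per country."""
    counts = Counter(country for _, country in respondents)
    total = len(respondents)
    sample_share = {country: n / total for country, n in counts.items()}
    return {
        country: population_share[country] / share
        for country, share in sample_share.items()
    }


weights = weight_index(respondents, population_share)

# Each respondent has a person weight of 1 and picks up their country's index.
person_weights = [1 for _ in respondents]
respondent_weight_index = [weights[country] for _, country in respondents]

for (name, country), w in zip(respondents, respondent_weight_index):
    print(f"{name:6s} {country:8s} weight index = {w:.2f}")
```

With these numbers each Canadian respondent counts for roughly 1.75 people and everyone else for 0.5, so the weight index still adds up to 5 while shifting the balance towards Canada.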
I’ve attached a sample here if you’re interested in taking a look.
I hope you’ll get to use this technique with your next survey data use case!
Point  X2  Y2
A  1  1
B  2  1
C  4  3
D  5  4

Centroid 1
Point  X2  Y2  X1  Y1
A  1  1  1  1
B  2  1  1  1
C  4  3  1  1
D  5  4  1  1

Centroid 2
Point  X2  Y2  X1  Y1
A  1  1  2  1
B  2  1  2  1
C  4  3  2  1
D  5  4  2  1

But this format allows us to really clearly see the following mathematical operations:
1) Finding the difference from the centroids 2) Squaring these differences 3) Adding the squares 4) Finding the square root.
Step 1: Find the difference. Step 2: Square the difference. Step 3: Add the squares. Step 4: Find the square root.
Point  Centroid X  Centroid Y  Centroid  Step 1: Difference X  Step 1: Difference Y  Step 2: Square X  Step 2: Square Y  Step 3: Add the squares  Step 4: Find the square root
A  1  1  Centre 1  0  0  0  0  0  0
B  1  1  Centre 1  1  0  1  0  1  1
C  1  1  Centre 1  3  2  9  4  13  3.605551275
D  1  1  Centre 1  4  3  16  9  25  5
A  2  1  Centre 2  -1  0  1  0  1  1
B  2  1  Centre 2  0  0  0  0  0  0
C  2  1  Centre 2  2  2  4  4  8  2.828427125
D  2  1  Centre 2  3  3  9  9  18  4.242640687
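The four operations can also be written out in a few lines of Python. This is only a sketch of the arithmetic using the points and starting centroids from the tables; the function name `euclidean_distance` and the dictionary layout are choices made for this example.

```python
import math

# Points and starting centroids from the tables.
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = {"Centroid 1": (1, 1), "Centroid 2": (2, 1)}


def euclidean_distance(point, centroid):
    # Step 1: find the difference from the centroid on each axis.
    differences = [p - c for p, c in zip(point, centroid)]
    # Step 2: square these differences.
    squares = [d * d for d in differences]
    # Step 3: add the squares.
    total = sum(squares)
    # Step 4: find the square root.
    return math.sqrt(total)


# Distance from every point to every centroid.
distances = {
    (point_name, centroid_name): euclidean_distance(point, centroid)
    for centroid_name, centroid in centroids.items()
    for point_name, point in points.items()
}

for (point_name, centroid_name), d in distances.items():
    print(f"{point_name} -> {centroid_name}: {d:.9f}")
```

Each point is then assigned to whichever centroid gives the smaller distance.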

Step 2
Compute new centroids (only centroid 2 here)
Point  X2  Y2
B  2  1
C  4  3
D  5  4
New centroid is the average  3.6667  2.6667

Point  X2  Y2  Centroid  X1  Y1  Find the square root
B  2  1  Centroid 2  3.666667  2.666667  2.357022604
C  4  3  Centroid 2  3.666667  2.666667  0.471404521
D  5  4  Centroid 2  3.666667  2.666667  1.885618083
B  2  1  Centroid 1  1  1  1
C  4  3  Centroid 1  1  1  3.605551275
D  5  4  Centroid 1  1  1  5
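The update step can be sketched in Python as well: average the points currently assigned to centroid 2, then compare each of those points against both centroids again. This assumes, as in the tables, that B, C and D belong to centroid 2 and that centroid 1 stays at (1, 1); the helper name `mean_point` is made up for this example.

```python
import math

points = {"B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroid_1 = (1, 1)  # unchanged in this step


def mean_point(pts):
    """Average the x values and the y values separately."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# The new centroid 2 is the average of the points assigned to it.
new_centroid_2 = mean_point(list(points.values()))

# Re-check which centroid each point is now closest to.
assignments = {}
for name, point in points.items():
    d1 = math.dist(point, centroid_1)       # requires Python 3.8+
    d2 = math.dist(point, new_centroid_2)
    assignments[name] = "Centroid 1" if d1 < d2 else "Centroid 2"
    print(f"{name}: to centroid 1 = {d1:.4f}, to centroid 2 = {d2:.4f}")

print(new_centroid_2, assignments)
```

With the distances in the table, B now sits closer to centroid 1 (1 versus about 2.357), while C and D stay with centroid 2.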

Let’s show you how it can be done with Superstore data!
Step 1: Identify which orders contain the product, category, etc. that fits your criteria.
Step 2: Use this as an identifier against the whole order.
We can do this with an easy level of detail calculation! We are essentially isolating each Order ID (since the calc is fixed to it), then taking the max value within that order. Since any string value is inherently larger than nothing (the null value), it will return the string and populate it against every row of that order ID.
Step 3: Place this new calculation on filters.
This will bring back all orders that contain bookcases. But it will also keep the other products in those orders!
From here it’s just about rearranging the view to find what other product categories are being sold with bookcases. Place a count distinct of Order ID on columns, with sub-category on rows.
The resulting view tells us that of the 224 orders that contain a bookcase, 56 also included at least one binder.
But when I place profit onto color, I can see that 42 of the past bookcase orders also had a phone in the same basket, and that I’ve made a nice profit of $2,687 from customers buying these items together! This is probably important, as we are not making a profit from Bookcases!
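If you want to sanity-check the logic outside Tableau, the same three steps can be sketched in plain Python. The order lines below are hypothetical toy rows, not real Superstore data, and the function name `co_purchased_orders` is made up for this example; it only mirrors the idea of flagging whole orders and then counting distinct orders per sub-category.

```python
from collections import defaultdict

# Hypothetical order lines as (order_id, sub_category); not real Superstore rows.
order_lines = [
    ("O-1", "Bookcases"),
    ("O-1", "Binders"),
    ("O-2", "Phones"),
    ("O-3", "Bookcases"),
    ("O-3", "Phones"),
    ("O-3", "Binders"),
    ("O-4", "Binders"),
]


def co_purchased_orders(order_lines, target):
    # Steps 1 and 2: find every order that contains the target sub-category,
    # so the flag applies to the whole order (the role of the fixed calc).
    target_orders = {oid for oid, sub in order_lines if sub == target}

    # Step 3 and the view: keep all lines from those orders, then count
    # distinct order IDs per sub-category.
    orders_by_sub = defaultdict(set)
    for oid, sub in order_lines:
        if oid in target_orders:
            orders_by_sub[sub].add(oid)
    return {sub: len(oids) for sub, oids in orders_by_sub.items()}


basket = co_purchased_orders(order_lines, "Bookcases")
print(basket)
```

In this toy data, two orders contain a bookcase; both of them also contain binders and one also contains a phone, while orders O-2 and O-4 are filtered out entirely.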
Interested in the workbook? Check it out on Tableau Public!
Thanks for reading!
If you’d like to build along, you can access the data here: Final Datasets
That’s all I can think about for now. Any others I missed? Let me know in the comments!
See you in Vegas!
Emily