Viz Club – May 2016

For this month’s Viz Club, we analyzed the Panama Papers from the ICIJ Database.

Kudos to the Viz Club members:

  • Sophie Sparks for choosing a winning dataset
  • David Pires for making the Pub reso (we managed to get by without a pocket projector, but if you come with one, I’ll buy you a drink AND let you choose the music.)
  • John MacDonald for sourcing the dataset despite being away for work

I would say the night was split into 3 parts:

  • Trying to download journalism’s largest data hack in a pub. It was exactly like that scene in The Big Short except there was no Brad Pitt and no internet. While nowhere near as painful as our first Viz Club, we got there in the end.

<fast forward to the second hour>

  • Trying out the dataset on Tableau 10.2 beta with excellent driving by Waseem. While our immediate question was whether we could put charts in tooltips (answer: not yet) we got a chance to play around with the beautiful new UI and potentially find a bug with the mapping function.

Viz Stretches

We also found some interesting insights about the addresses in this ICIJ datasource. Mind you, they are pretty upfront that the extracts are not the entire database and require professional judgement before determining if there was wrongful action.

Vizzing the “addresses” table paints a completely different picture from the news: a significant number of the companies with trusts are from Asia, including China, Taiwan and Hong Kong. By contrast, the focus of the Panama Papers coverage has been on the trust we place in, and the privilege we grant to, our politicians. I think this is an excellent example of why you should have a goal before diving into data because, especially at an aggregate level, the data can tell a very different story.

Disclaimer as you enter the site

The last part of the night kicked off once Chris found that the entities table outlined where companies had set up their trusts. Once we figured out it was a simple delimit to get the entity’s country and its respective trust’s registered location, we hit our second wind.
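For anyone who wants to try the “simple delimit” outside Tableau, here is a minimal Python sketch of the idea. Everything in it is made up for illustration: the column name `route`, the `;` delimiter and the sample rows are assumptions, not the real ICIJ schema or data.

```python
# Hypothetical rows: the "route" column and the ";" delimiter are
# assumptions for illustration, not the real ICIJ entities schema.
rows = [
    {"entity": "Example Holdings Ltd", "route": "Hong Kong;British Virgin Islands"},
    {"entity": "Sample Trading Inc", "route": "Taiwan;Samoa"},
]


def split_route(value, delimiter=";"):
    """Split one delimited field into (origin_country, registered_location)."""
    origin, _, registered = value.partition(delimiter)
    return origin.strip(), registered.strip()


# One (entity, origin, registered location) tuple per row.
pairs = [(row["entity"], *split_route(row["route"])) for row in rows]

for entity, origin, registered in pairs:
    print(f"{entity}: {origin} -> {registered}")
```

Each tuple then gives you the two ends of a line on the map: where the company comes from and where its trust is registered.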

Yes, we immediately wanted to build a map with curvy lines!

Exact reaction right here.

Sophie was all over this. She pulled up this gem from Alan Eldridge where we wouldn’t even have to do the data prep after we delimited because in Tableau 9.0, we could pivot the data right in Tableau!

Sophie in the zone

Alas no lines! No map! No VIZ!? After looking at the blog post in more depth at home, it turns out the solution was a tad more complicated than we expected (and that we couldn’t use the generated lat/lons in calcs).

:'(

If you have any reservations about attending Viz Club, note that we have a reasonable 9.30 hard stop since it is a work night. We viz party like sensible folk.

<fast forward to today>

Of course, we can’t come away from Viz Club with nothing to show! Here’s the entity table in the Panama Papers visualized from the originating company’s country to its trust in a different country. Data preparation was done in Alteryx and I ended up using Chris DeMartini’s excellent blog post on Great Arcs in Tableau.
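If you are curious what sits behind a “great arc”, the rough idea is to generate a series of intermediate points along the great circle between two locations and then draw a line through them. The Python sketch below shows that general technique using the standard great-circle interpolation formula. It is not the Tableau calculation from Chris DeMartini’s post, and the function name `great_arc_points` and the sample coordinates are my own assumptions.

```python
import math


def great_arc_points(lat1, lon1, lat2, lon2, steps=20):
    """Return steps + 1 (lat, lon) points in degrees along the great circle.

    Assumes the two endpoints are not antipodal (the path is undefined then).
    """
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Angular distance between the endpoints (haversine formula).
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(h)))
    if d == 0:
        # Same start and end point: nothing to interpolate.
        return [(lat1, lon1)] * (steps + 1)
    points = []
    for i in range(steps + 1):
        f = i / steps  # fraction of the way along the arc
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        points.append((math.degrees(math.atan2(z, math.hypot(x, y))),
                       math.degrees(math.atan2(y, x))))
    return points


# Hypothetical endpoints, roughly Hong Kong and the British Virgin Islands.
arc = great_arc_points(22.3, 114.2, 18.4, -64.6, steps=10)
```

Each point would become one row in the data (with a path order column), which is why the data prep is heavier than a simple pivot.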

Click on the image to go into the desktop version of this viz. For this version, please set your resolution to 1920×1080 so the Title doesn’t look off. Not good UI practice I know, but look at how great that title looks overlaying the map!

And the curvy lines? Here you go!

Click on the image to go into the desktop version of this viz.

Overall, it was a really fun night of catching up, learning about Tableau’s newest features and getting hands on with one of the biggest stories of the year. Can’t wait to see what the next one will bring!

Quick Notes – What is “Scaling” in Tableau Server?

We often talk about Tableau Server being a highly scalable product that can host data sources and deploy reports to tens, hundreds or thousands of people.

Scalable often refers to flexibility in 2 ways:

  1. “Scaling Up”: making your current server more “beefy” with greater processing power and memory on the same server
  2. “Scaling Out”: adding more servers to your existing infrastructure.

Scaling Up: Hiring a More Experienced Barista (see slide above for analogy)

Tableau has many different components which make up its functions. The most recent visualization of its breakdown is below from the 9.0 whitepaper.

Source: https://www.tableau.com/sites/default/files/media/whitepaper_tableau_server9.0scalability_eng_2.pdf

Tableau Server will allow you to store ALL of these different components … on 1 machine. This configuration is optimal for a particular type of customer: one that expects to need more infrastructure/power to crunch more data faster, but doesn’t expect its number of users to change.

Perhaps its analyst teams are small and they don’t expect to have many users and reports, but there is LOTS of data with a likelihood to increase. In this case, it might be inefficient to add servers (“scaling out”, a concept we’ll get to below) because if processes are spread across multiple machines (and they have to talk to each other), performance and speed might suffer.

If you expect your data processing needs will grow, but your user base will remain the same, consider scaling up.

Scaling Out: Hiring More Baristas at the same level (see analogy below)

On the other hand, you may choose to scale out. Some companies choose to do this because buying more hardware is sometimes more cost efficient than scaling up. Also, Tableau Server is unique in that it has a built-in Gateway to help spread the load of requests across servers.

Below are 2 examples of how to architect Tableau Server if there are 3 or 5 servers. Figure 4 with 5 servers is configured with a specific use case in mind. It would be worth briefing your Tableau Server consultant on how your data is currently architected to see if you can integrate some key features of Tableau Server (such as connecting to shared data sources).

This is a popular enterprise deployment structure as Tableau is a data exploration tool, and not limited to only 1 part of the business. Departments in multiple functions will have individual reports, but IT administration of permissioning and deployment is in 1 centralized location. If you expect to onboard many users in the future, consider scaling out your Tableau Server.

That’s all I’ve got on Tableau Server for now. Hopefully it’s given you a quick overview of the product and some key deployment scenarios. Let me know if you have any comments or questions 😀

Quick Notes – What Do I Need to Consider for my Tableau Server Deployment?

This blog post is meant to provide a general overview of Tableau Server, so that you are familiar with common lingo and first line questions when you meet with your Tableau Server consultant.

Once your team has decided that Tableau Server is the way to go, there are 2 key parts to how you will architect this product to match your organization’s needs:

  1. Internal Assessment: Understanding how to configure your Tableau Server based on your users’ requirements
  2. External Assessment: Understanding your organization’s technical infrastructure

Internal to Tableau Server – How to Set Up Tableau Server

Your Tableau consultant will likely ask you questions around how many users will be using the system at the same time (concurrent users). Makes sense since it will help determine how much your server has to work to “serve up” information and graphics to your users.

Also, how complex the Tableau workbook is will impact performance on Server. Take a look at this article for some quick wins in this area.

Of course, how much data is being processed in the datasources will be important as well, as more data = more processing = slower performance.

External to Tableau Server – Your Organization’s Network

Tableau Server will sit in your organization’s ecosystem of enterprise tools. You’ll be choosing whether to host the server in a physical environment, a virtual one (VMware, Hyper-V) or in the cloud (Amazon Web Services).

Physical servers are recommended where the data is highly sensitive and needs to stay within a specific boundary/server (reach out to your privacy and security team for guidance). Virtual Machines are the most common implementation practice.

Also, the complexity of your network can be broken down into 3 components: the location of your users, your data and your server. For example, if your server is located in the US, your European offices use Tableau Server and your data repository is in Canada, your data will have to travel from Canada to the US (where your server is located) to be processed, and then travel across the Atlantic to render on your user’s computer in the UK (and you think you hate long haul flights!).

Ideally, we’d like for these 3 components to be as close together as possible to optimize performance.
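To put some rough numbers on that long haul flight, here is a small Python sketch that adds up the straight-line distance each hop covers. The three locations (Toronto for the data, northern Virginia for the server, London for the users) are hypothetical stand-ins I picked for illustration, and distance is only a crude proxy: real network latency also depends on routing, bandwidth and how much data is moved.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical locations, chosen only to mirror the example in the text.
data_repo = (43.65, -79.38)  # Toronto, Canada
server = (39.04, -77.49)     # northern Virginia, US
users = (51.51, -0.13)       # London, UK

# Data travels repository -> server -> users.
legs = [haversine_km(data_repo, server), haversine_km(server, users)]
total_km = sum(legs)
direct_km = haversine_km(data_repo, users)

print(f"data -> server: {legs[0]:.0f} km")
print(f"server -> users: {legs[1]:.0f} km")
print(f"total: {total_km:.0f} km (direct data -> users: {direct_km:.0f} km)")
```

Moving any of the three locations closer to the others shortens the total trip, which is the point of keeping users, data and server near each other.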

Stay tuned for Part 2, “What is Scaling”. We often refer to Tableau Server as a highly scalable product. What does this mean? I’ll cover this in my next blog post.

data+women May 2016

After a month and a half, we had our first data+women meetup on Thursday (YAY!). Thank you to those in the Tableau community who came in to share their support (and Andy for helping us coordinate the room and pizza). Sophie kicked off the panel discussion and hopefully the conversations will be a conduit for us to build a community of shared experiences.

In the future, I’d like members to share their hobby work in data analysis. For the next session, I’m hoping Damiana and I will be able to do a demo on how to work with unstructured machine data with Tableau. It’s mostly because this is REALLY cool and I think there are people who might be interested!

During the panel discussion, one of the shared experiences we mentioned was how women suffer greatly from Imposter Syndrome. It came up when Andy noticed that the women he’s interviewed for the Data School have a tendency to ask fewer questions during this process. Emma Whyte told us about how she felt she “almost talked her way out of a job” with The Information Lab. My first blog post with the Data School was exactly this in action. The great thing is that Andy has also noticed that while there might be this confidence gap in the beginning, they leave the school with the greatest confidence.

This comment was also brought up …

“Why are we talking about this anyways? Men also have moments of uncertainty in how our work will be received”

It’s a great point as no one can be confident 100% of the time. Also, this dialogue about women and confidence can sometimes come across as a bit overplayed; everyone’s a bit bored of this story.

I’ve given this a lot of thought and, to me, this might be less about how unique the confidence gap is to women and more about how it seems to persist cross culturally (the Data School has recruited folks from all over the EU and 2 from Canada) and even at multiple levels of leadership and talent. From that, surely a network of encouragement could be helpful and data+women could be our small contribution.

I highly recommend reading this article about a coder at Pinterest who was almost dissuaded from a career in software engineering. Her story is striking because of her incredible CV (“WTH? How could she feel this way?”). Despite all her advantages, she recalls her first semester at Stanford and how her confidence was shaken upon hearing classmates groan at how easy the assignments were, the same assignments that took her 15 hours to complete (what jerks were her classmates, right?). I’m hoping that by having an active community, we can ALL tap into that support.

Andy also did an excellent job showcasing how Makeover Monday has analyzed various women’s issues such as How Much Do Women Work Globally (this week’s Makeover Monday), Literacy Rates between Genders, Women Underrepresented on Tech Boards, and Roles in Government.

The one below is probably one of my favorites from Andy. Not only is it beautiful to look at, but the highlight table on the bottom still provides a quick and easy way to analyze all the states equally and then for us to understand how these measures are distributed geographically.

Overall, I’m really happy that we were able to hold the event and start to build momentum. We all gotta start somewhere 🙂

Favorite Formatting Tips and Tricks

Formatting has to be one of the most dreadful parts of dashboard design. All the analysis fun of dashboarding is done but now the clean-up (sigh). However, here are a couple of quick tips that will save you time and hair-pulling effort in getting everything to look JUST right.

Format all your worksheet titles in a dashboard in 3 clicks

Did you know you could format all your worksheet titles in a dashboard with 3 clicks? Go to Dashboard -> Format and look for a section called “Dashboard Subtitles”. This will format all sheet titles in your dashboard the exact same way.

Format tooltips from the Dashboard

Just found this last week and it blew my mind. Did you know that you can adjust your tooltips from the dashboard? Instead of going into each sheet, you can adjust and preview in the dashboard. Such a timesaver!

Copy and Paste Formatting from the Worksheet

Have your formatting JUST right for 1 worksheet… and have 10 others to go? Right click on the tab, and you’ll see “Copy Formatting”. Select that, then right click on your end sheet and choose “Paste Formatting”. Since it takes the formatting from the sheet (rather than the Marks card), it doesn’t always cover 100% of the formatting, but it’ll get you 80% of the way there.

Right-click sheet and format the WHOLE worksheet

Hate what you’ve done? Clear your entire worksheet in 1 click

Right click on the bolded header and select Clear.

BONUS – get rid of sheet borders in row divider and column divider.

I often use this when I’m creating the Top Line numbers (so pretty much all of my dashboards!)

Happy Vizzing!