The #1 Mistake Customers Make with Tableau

AIM Consulting

Tableau Development - Data & Analytics - AIM Consulting

05/18/2016 | Data and Analytics

Tableau is very clear and passionate about the company’s mission — to help people see and understand data. Their data visualization technology is second to none, a wonderful front-end tool that makes data come alive.

But when it comes to implementing Tableau, customers often make the mistake of relying on Tableau as a full data manipulation tool without an upstream data management platform or a plan to build one. Tableau has some very handy data manipulation features, but its focus is on front-end visualization, not data management, so these features have a few major constraints.

This issue becomes particularly problematic when customers want to manipulate data across more than one source. As an example, let’s consider a scenario where an e-commerce company has all of its transaction data stored on Server 1, split across multiple tables, but its website data stored in Server 2.

Server 1 – Transaction Data

transaction data

Server 2 – Configuration Settings

configuration settings

As you import data into Tableau, you can write a custom SQL query to join data across multiple tables and transform it as needed; however, this SQL query is limited to one and only one location (i.e., server). In the simplified example above, the transaction data can be summarized across all the tables on Server 1 and brought into Tableau to analyze data related to customers against the products they ordered, when they ordered them, what segment they belong to, and so on.  However, if you are trying to combine data in Server 1 to data on a different server or data source (Server 2), Tableau has limitations.

For example, let’s say the e-commerce company releases multiple versions of their UI. They store information related to each customer’s configuration settings on a production server (Server 2).  The data on Server 2 stores information about which UI experience each customer is set to have, along with the date on which they were configured for that particular UI.

The tables on Server 1 include all customer transactions. In order for the company to identify which customers were launched on what UI and when, the company needs info from Server 2.  In the customer transaction data, they need to add a dimension for tracking the Launch Date of the new UI for each customer, and then compare the transaction date to the UI launch date to determine whether the customer viewed the new vs. legacy UI.  Although this seems reasonable, it is unfortunately impossible to do in Tableau if the data resides in separate servers.

Although Tableau supports connecting to a wide variety of data stored in a variety of places, you cannot query across multiple servers in the same connection. Data from each server must be brought in with a separate data connection. Once data sources are separately brought into Tableau, you cannot query across them using a SQL Join feature.

Tableau has another function called Blending that will combine data from multiple data sources, but this capability has restraints. First, you designate a data set as Primary, and you can access all of that data. A second data source would be considered secondary. However, you can only pull measurable data from the secondary data source. In other words, if the data in Server 2 was purely numerical, you could use the Blend feature to combine it with the data in Server 1. However, dimensional data such as date stamps and other parameters that provide mapping or contextual information are not measurable so no relationship between dimensions in secondary tables can be made to the primary table.

Let’s look at another example:

Server 3 – Hadoop

hadoop

Let’s say that our e-commerce company wants to analyze the browsing behaviors that led to customer transactions in order to evaluate the effect of enhancements to the new UI on specific customer segments. The browsing behavior, such as a newly added filter for example, is stored on a Hadoop server along with information like CustomerID, BrowseDate, and other fields. However, not located on the Hadoop server are CustomerName, Segment, and other descriptive elements. In this case, it would again be impossible to look at the impact browsing behavior had on a customer segment without comparing this data to the data in Server 1.  Instead, you will need to maintain meta data (Ex: customer segmentation mappings) on Hadoop and Server 1.  Tableau cannot join the data because it exists on different servers and it cannot blend the data because information needed from a secondary source is dimensional, not measurable.

The Solution: Upstream Data Manipulation

The surefire way to avoid these difficulties is to manipulate your data upstream from Tableau to ensure you have what you need before creating your visualizations. Investing in a data solution, which might be as simple as dedicated SQL views or as complex as a data warehouse, helps to ensure your data is combined sufficiently before it reaches Tableau.

A workaround that many companies use is to maintain an overall master file with all the descriptive information, for example a very large Excel file, and then pull in measures from dynamic SQL tables. You can quickly see that this can work for smaller companies for a short time, but as they grow, and for larger companies in the first place, this workaround becomes impractical as the Excel file would be too large and unwieldy.

Whatever method you choose, the point is to make sure you are manipulating and combining data before importing into Tableau. This has the added benefit of preventing slow load times in Tableau, as Tableau won’t perform costly data manipulations and instead create beautiful data visualizations.

The Importance of Data Governance

The above scenario makes a strong case for the role of data governance in an organization. Some organizations might be nervous at the idea of dedicated SQL views, warehouses or sandboxes, but if consistent processes have been established and standards followed for governing data, these concerns are alleviated. For example, organizations with strong data governance would find it easier to enable a “safe sandbox” environment for managing data upstream, embedding formulas at the analyst level, training analysts to do more self-service, and other measures for proper handling of data.

A Final Word

Tableau is a wonderful tool but it is not intended to be a panacea for all data-related challenges. Carefully managing your data upstream before it ever reaches Tableau will better allow you to leverage this amazing technology to visually understand your data.


Are you interested in strategic consulting, a project-based solution or managed services in the Business Intelligence, Data Governance, and Data and Analytics space? Tell us about your challenge to learn about our flexible engagement model and the approach we would take with your organization.

Discover the right solution for your business.

Request a Meeting

Whether you need help with technology strategy and implementation, an in-flight project in need of additional resources, or individual contributors to bridge critical gaps in your teams, AIM is here to help.

 


Ready for a career you love?

VIEW OPENINGS

  • This field is for validation purposes and should be left unchanged.

Request a Solution

Whether you need help with technology strategy and implementation, an in-flight project in need of additional resources, or individual contributors to bridge critical gaps in your teams, AIM is here to help.

Ready for a career you love?

View openings

  • This field is for validation purposes and should be left unchanged.