Outdated Version

You are viewing an older version of this section. View current production version.

Connecting to Tableau

You can use Tableau to connect to a cluster for exploring your data. This guide provides guidelines and best practices for developing business Tableau dashboards using SingleStore DB data and covers the following:

Installing Tableau Desktop

To develop business Tableau dashboards using SingleStore DB data, you first need to install Tableau Desktop.

If you do not have Tableau Desktop already installed on your PC or Mac, you can download a 14-day trial version from the Tableau Download site.

image

(Optional) Install Tableau Reader

You can use Tableau Reader to package a dashboard, along with the supporting data, and share it with the recipients, even if they do not have Tableau Desktop (trial or licensed version) installed.

Tableau Reader is available as a free download.

Connecting Tableau to SingleStore DB

After installing Tableau Desktop, connect it to SingleStore DB through the following steps:

  1. Download and install MySQL ODBC connector.

  2. Configure MySQL ODBC connector.

  3. Connect Tableau to SingleStore DB.

Download and Install ODBC Connector for MySQL

  1. Navigate to the MYSQL ODBC Connector site.

  2. Select Product Version as 5.3.12 and then choose your Operating System and OS Version from the drop-down lists. Newer 8.x versions of the ODBC connector may require additional configuration changes when connecting to a cluster. See the SingleStore DB documentation for more information.

    image

  3. Pick the appropriate installer option for your system and then click Download. After the download finishes, install the driver.

Configure MySQL ODBC Connector

You need an existing ODBC connection before connecting Tableau with SingleStore DB.

  1. If you are using Windows, open the search bar and type ODBC.

  2. Click Set up data sources (ODBC).

  3. Configure a new ODBC data source for SingleStore DB.

    a. Specify TCP/IP Server and Port as 3306.

    b. Specify User, Password, and Database.

    image

  4. Click Test to test the connectivity and click OK.

Connect Tableau to SingleStore DB

  1. Open Tableau Desktop and navigate to Connect > To a Server > More… > MemSQL.

    image

  2. In the Sign In window, specify the Server, Username, and Password.

  3. Select the Require SSL check box if the cluster has been configured for secure client connections.

    image

Setting Up SingleStore DB Data Source in Tableau

On the data source page, perform the following steps:

  1. The data source name defaults to the server value you entered previously. Enter a unique data source name to be used in Tableau. For example, use a data source naming convention that helps other users of the data source identify which data source to connect to.

  2. On the left pane, navigate to the Database drop-down list and select a database or search by database name.

  3. Under Table, select a table or search by table name.

  4. Drag a table to the canvas, and then select the sheet tab to start your analysis.

    image

You can also perform different joins, table appends, filters in the data source along with setting up live connections and extracts. More information is available on the Tableau Online Help.

After the data source set up, you are ready to visualize and analyze data.

Tableau Best Practices

Follow these strategies to build efficient Tableau dashboards.

Data Strategies

  1. Keep simple data sources: The performance of visualization depends on the underlying data sources. To improve performance, extract only the data that is needed for the worksheet to perform its analysis.

  2. Extract only the required data: Minimize joined tables. If the analysis requires data from joined tables, then edit the connection to remove unused data.

    Execute data source filters before executing traditional filters for context filtering so that the extract is smaller and takes less time to refresh. Although context filtering creates a flat table with initial performance issues, performance improves for subsequent views and filters.

    Use extracts filtered by context filters so that it contains only the data that is needed. Extracts are stored in an internal structure, which is easier for Tableau to query and access. Also, calculated fields are saved as actual data, saving further computation time. One of the drawback of using extracts is that the data is not real-time and a scheduled task is required to refresh the data.

  3. Hide unused columns: Hiding unused columns (dimensions/measures) minimizes extract refresh time or custom SQL query time. You can hide fields in the data window or data source, or allow Tableau to hide all unused fields before creating the extract in the Extract Data box.

  4. Use Extracts: Tableau uses different techniques to optimize the extract. You can also improve visualization by aggregating the data for a visible dimension, known as an aggregated extract. When users interact with an aggregated extract, all calculations and summations have already been compiled.

    Extracts can be filtered with data source filters, which can help to control the size of the extract in two different ways, depending on when the filter is applied:

    • If a data source filter is in place prior to extract creation, the extract will contain filtered records.

    • If a data source filter is put in place after extract creation, the filter will be applied against the full extracted data set. So, your extract will contain all the data but will only show what the data source filter is allowing.

Filtering

  1. Minimize quick filters: Quick filters require Tableau to run a query against the database to determine the values to display for the selected dimension. Therefore, the quick filters that do not require querying the database for values are custom value list, wildcard match, relative date filters, and the browse period date filter.

    It is recommended that you avoid quick filters that require knowledge of the values in the database such as multiple value list, single value list, compact list, slider, measure filters, and ranged date filters.

  2. Use ‘All values in database’ option in quick filters: Use the default option ‘All values in database’ for a quick filter and avoid using ‘Only relevant values’ option. The default option makes all values in the database for that particular field available for user selection. In contrast, ‘Only relevant values’ compares the values returned from the database with those in other quick filters to show only the values that apply, given the choice made on the other filters. This behavior can bring performance issues, especially if the dashboard contains more than two quick filters.

  3. Avoid quick filters or actions that generate context filters: Context filters create a context TEMP table with the values that go through the filter. All other filters access this TEMP table to draw their values from the limited set of data. This improves the performance of the dashboard; however, if the context filter does not trim down the data to a more manageable set in the new context table then it may cause a performance issue with the visualization.

    If you use a context filter, make the TEMP table as small as possible. For example, eliminate columns that are not needed for that particular visualization to reduce the size of the data set at least to one tenth of the original size. Also, the context filter should be used against slow-changing values or dimensions only.

  4. Keep range quick filters simple: To display results for across large separated periods of time, use a visualization rather than a quick filter. This is applicable for the following date filters:

  • Relative date filters, which are used for a date range that is relative to a specific date.

  • Range of date filters, which are used for a defined range of discrete dates.

  • Discrete date filters, which are used for the individual dates selected from a list. It is recommended to avoid discrete date filters.

  1. Replace quick filters with action filters: Instead of using multiple quick filters, ‘Only Relevant Values’ option, or quick filters with too many values, use actions filters as they do not require Tableau to run additional queries. These filters work on the users actions, such as clicking on a mark. Action filters can also operate as cascading filters in a filter hierarchy, where values are filtered out as they traverse through the hierarchy.

  2. Avoid using action filters from several sheets for a single dashboard layout as quick filters: Creating visualizations in different worksheets to use them later as action filters in a single dashboard generates extra load every time the dashboard is loaded, as the visualizations are refreshed with every action from the user. When this situation arises, quick filters may be a better solution as they are only loaded once when the dashboard is loaded, and then the filter is applied across all the sheets simultaneously.

SQL Code

  1. Limit the use of customer SQL code in live connections: SQL connections are issued to the database inside a subquery, which can include other clauses from Tableau like GROUP BY, ORDER BY, WHERE and more. Even with efficient SQL code, the extra clauses issued to the database can slow down performance. Use custom SQL only if Tableau cannot generate the desired outcome.

    If a SQL command is necessary, then create a view inside the database and connect to it from Tableau. If it is not possible to create a new view, build a data extract with the SQL code. It will run only once when the extract is built or refreshed, minimizing the effect on visualization performance.

  2. Remove extra clauses: Effective SQL code provides Tableau the required data to produce the desired outcome. Remove extra clauses in the SQL code for Tableau to organize and visualize data effectively.

Calculation

  1. Consider data types for faster calculation: Tableau provides a massive list of functions, divided into different categories that assist in creating calculated fields. The performance impact from different data types may be unnoticeable on smaller data sets, but these differences are more pronounced as the number of records increase. In general, the fastest calculations involve Boolean or number data types, followed by dates, and finally string calculations. It is important to consider ways to achieve same calculated results using faster data types.

  2. Avoid blended calculations: Blended calculations occur when you have to query different data sources to obtain a single calculated field in your visualization. In this case, Tableau needs to query each data source separately to retrieve the values. This can affect performance, especially in large data set. An alternative is to prepare a new view on the data layer on the database server to keep data processing outside Tableau.

  3. Avoid row-level calculations involving parameters: Row-level or record-level calculations operate on every record in the underlying data. Every row calculation consumes time, but when the parameter contains a significant variety of values, for example a table as parameter, it increases time processing exponentially.

Rendering

  1. Avoid high mark counts: Marks are the points, plots, or symbols on the visualizations. Each mark must be created and positioned before the report can be rendered. Use the capabilities of Tableau to drill down or interact with visualizations to obtain better results.

  2. Minimize the file size of images or custom shapes: Big images or shapes result in slow loading and exporting process. Keep images below 50Kb, 32x32 pixels in dimensions, and use efficient image compression formats to reduce load time.

Implementing TDC

Tableau Datasource Customization (TDC) helps optimize the interaction of the Tableau dashboard with a cluster. Since Tableau is designed to create TEMP tables, it can sometimes result in performance issues. TDC can stop creation of these tables, along with other configuration settings.

Setting up TDC File

Before you create a TDC file, ensure that you have the Tableau version and the driver name. The following are few sample codes that can be used to set up TDC files for SingleStore DB.

Sample 1

\<?xml version=\'1.0\' encoding=\'utf-8\' ?\>

\<connection-customization class=\'memsql\' enabled=\'true\'
version=\'10.0\'\>

\<vendor name=\'memsql\'/\>

\<driver name=\'memsql\'/\>

\<customizations\>

\<customization name=\'CAP\_ODBC\_BIND\_SUPPRESS\_WIDE\_CHAR\'
value=\'yes\' /\>

\<customization name=\'CAP\_CREATE\_TEMP\_TABLES\' value=\'no\' /\>

\<customization name=\'CAP\_QUERY\_SUBQUERY\_QUERY\_CONTEXT\'
value=\'yes\' /\>

\<customization name=\'CAP\_SELECT\_INTO\' value=\'no\' /\>

\<customization name=\'CAP\_ODBC\_BIND\_FORCE\_MEDIUM\_STRING\_BUFFERS\'
value=\'yes\' /\>

\</customizations\>

\</connection-customization\>

Sample 2


\<?xml version=\'1.0\' encoding=\'utf-8\' ?\>

\<connection-customization class=\'mysql\' version=\'2019.1\'
enabled=\'true\'\>

\<vendor name=\'mysql\' /\>

\<driver name=\'mysql\' /\>

\<customizations\>

\<customization name=\'CAP\_ODBC\_BIND\_SUPPRESS\_WIDE\_CHAR\'
value=\'yes\' /\>

\<customization name=\'CAP\_CREATE\_TEMP\_TABLES\' value=\'no\' /\>

\<customization name=\'CAP\_QUERY\_SUBQUERY\_QUERY\_CONTEXT\'
value=\'yes\' /\>

\<customization name=\'CAP\_SELECT\_INTO\' value=\'no\' /\>

\<customization name=\'CAP\_ODBC\_BIND\_FORCE\_MEDIUM\_STRING\_BUFFERS\'
value=\'yes\' /\>

\</customizations\>

\</connection-customization\>

Copy the code into a text editor and save the file with a .tdc file extension. On a Windows machine, place the .tdc file in the folder C:\Users\<user name>\Documents\My Tableau Repository\Datasources. Make sure there is only one .tdc file in this location.

Info

Tableau will not create or customize a connector to work with a specific ODBC driver.

For more information on TDC and customization options, see the documentation on ODBC capabilities and customization.

TDC Configuration Properties

As per the need and capability of the data source, configure the following properties for your TDC file:

Customization Property Recommended Setting
CAP_CREATE_TEMP_TABLES Set to yes if Tableau can create temporary tables to be used for complex or optimized queries. Also refer CAP_SELECT_INTO config property.
CAP_CONNECT_STORED_PROCEDURE Set to yes to allow support for connecting to a stored procedure.
CAP_FAST_METADATA Set to yes for small to moderate size schemas and to enhance performance when creating new connections. This property controls whether Tableau should enumerate all objects immediately when you connect. Disable this property to allow search for specific schemas or tables instead of retrieving all objects. You can search for all objects by using an empty string. This property is available for version 9.0 and later.
CAP_MULTIPLE_CONNECTIONS_FROM_SAME_IP Set to no to prevent Tableau from creating more than one active connection to the database. This is conventional setting that may increase stability but lower the performance.
CAP_ODBC_BIND_BOOL_AS_WCHAR_01LITERAL Set to yes to bind a Boolean data type as WCHAR containing values 0 or 1.
CAP_ODBC_BIND_BOOL_AS_WCHAR_TFLITERAL Set to yes to bind a Boolean data type as WCHAR containing values t or f.
CAP_ODBC_BIND_FORCE_DATE_AS_CHAR Set to yes to force the Tableau native ODBC protocol to bind date values as CHAR.
CAP_ODBC_BIND_FORCE_DATETIME_AS_CHAR Set to yes to force the Tableau native ODBC protocol to bind datetime values as CHAR.
CAP_ODBC_BIND_FORCE_MAX_STRING_BUFFERS Set to yes to force the Tableau native ODBC protocol to use maximum-size buffers (1MB) for strings instead of the size described by metadata.
CAP_ODBC_BIND_FORCE_MEDIUM_STRING_BUFFERS Set to yes to force the Tableau native ODBC protocol to use medium-sized buffer (1K) for strings instead of the size described by metadata.
CAP_ODBC_BIND_FORCE_SMALL_STRING_BUFFERS Set to yes to force the Tableau native ODBC protocol to use small buffers for strings instead of the size described by metadata.
CAP_ODBC_BIND_FORCE_SIGNED Set to yes to force binding integers as signed.
CAP_ODBC_BIND_SUPPRESS_COERCE_TO_STRING Set to yes to prevent the Tableau native ODBC protocol from binding non-string data as strings (requesting driver conversion).
CAP_ODBC_BIND_SUPPRESS_INT64 Set to yes to prevent the Tableau native ODBC protocol from using 64-bit integers for large numeric data.
CAP_ODBC_BIND_SUPPRESS_PREFERRED_CHAR Set to yes to prevent the Tableau native ODBC protocol from preferring a character type that differs from the driver default.
CAP_ODBC_BIND_SUPPRESS_PREFERRED_TYPES Set to yes to prevent the Tableau native ODBC protocol from binding any data according to its preferred wire types. Hence, Tableau will only bind according to the data types described by the ODBC driver via metadata.
CAP_ODBC_BIND_SUPPRESS_WIDE_CHAR Set to yes to prevent the Tableau native ODBC protocol from binding strings a WCHAR. Instead, they will be bound as single-byte CHAR arrays, and processed locally for any UTF-8 characters contained within.
CAP_ODBC_EXPORT_BUFFERS_RESIZABLE Set to yes to allow export buffers to be reallocated after the first batch to improve performance.
CAP_ODBC_EXPORT_BUFFERS_SIZE_MASSIVE Set to yes to force the use of large buffers for insert. If CAP_ODBC_EXPORT_BUFFERS_RESIZABLE is not set or disabled, a fixed row count is used.
CAP_ODBC_EXPORT_BUFFERS_SIZE_MEDIUM Set to yes to force the use of medium-sized buffers for insert. If CAP_ODBC_EXPORT_BUFFERS_RESIZABLE is not set or disabled, a fixed row count is used.
CAP_ODBC_EXPORT_BUFFERS_SIZE_SMALL Set to yes to force the use of small buffers for insert. If CAP_ODBC_EXPORT_BUFFERS_RESIZABLE is not set or disabled, a fixed row count is used.
CAP_ODBC_EXPORT_CONTINUE_ON_ERROR Set to yes to continue data insert despite errors. Some data sources report warnings as errors.
CAP_ODBC_FETCH_BUFFERS_RESIZABLE Set to yes to allow buffers to be reallocated after fetch to improve performance or handle data truncation.
CAP_ODBC_FETCH_BUFFERS_SIZE_FIXED Set to yes to ignore the width of a single row when computing the total rows to fetch.
CAP_ODBC_FETCH_BUFFERS_SIZE_MASSIVE Set to yes to force the use of large buffers. If CAP_ODBC_FETCH_BUFFERS_SIZE_FIXED is enabled, a fixed row count is used.
CAP_ODBC_FETCH_BUFFERS_SIZE_MEDIUM Set to yes to force the use of medium-sized buffers. If CAP_ODBC_FETCH_BUFFERS_SIZE_FIXED is enabled, a fixed row count is used.
CAP_ODBC_FETCH_BUFFERS_SIZE_SMALL Set to yes to force the use of small buffers. If CAP_ODBC_FETCH_BUFFERS_SIZE_FIXED is enabled, a fixed row count is used.
CAP_ODBC_FETCH_CONTINUE_ON_ERROR Set to yes to allow the Tableau native ODBC protocol to continue fetching resultset despite errors (some data sources report warnings as errors).
CAP_ODBC_FETCH_RESIZE_BUFFERS Set to yes to allow the Tableau native ODBC protocol to automatically resize buffers and fetch again, if data truncation occurred.
CAP_ODBC_METADATA_SUPPRESS_EXECUTED_QUERY Set to yes to prevent Tableau from executing a query as a means of reading metadata. While Tableau typically includes a row-limiting clause in such metadata queries (for example, LIMIT, or WHERE 1=0), this may not help when used with a Custom SQL connection for database systems with poor query optimizers. Note that this property may prevent Tableau from determining the connection metadata properly.
CAP_ODBC_METADATA_SUPPRESS_PREPARED_QUERY Set to yes to prevent Tableau from using a prepared query as a means of reading metadata. A prepared query is often the fastest way to accurately read metadata. However, not all database systems are capable of reporting metadata for a prepared query without actually executing the query. Note that certain metadata, for example from connections using Custom SQL, cannot be retrieved if this property and CAP_ODBC_METADATA_SUPPRESS_EXECUTED_QUERY are both set.
CAP_ODBC_METADATA_SUPPRESS_SELECT_STAR Set to yes to prevent reading metadata using a SELECT * query.
CAP_ODBC_SUPPRESS_PREPARED_QUERY_FOR_ALL_COMMAND_QUERIES Set to yes to execute all commands directly (no prepared statement).
CAP_ODBC_SUPPRESS_PREPARED_QUERY_FOR_DDL_COMMAND_QUERIES Set to yes to execute DDL commands for example, CREATE TABLE directly (no prepared statement).
CAP_ODBC_SUPPRESS_PREPARED_QUERY_FOR_DML_COMMAND_QUERIES Set to yes to execute DML commands for example INSERT INTO directly (no prepared statement).
CAP_ODBC_SUPPRESS_SYS_SCHEMA_STORED_PROCS Set to yes to explicitly add the “SYS” schema to the schema exclusions when enumerating stored procedures.
CAP_ODBC_TRANSACTIONS_COMMIT_INVALIDATES_PREPARED_QUERY Set to yes to indicate that a transaction will invalidate all prepared statements and close any open cursors.
CAP_ODBC_TRANSACTIONS_SUPPRESS_AUTO_COMMIT Set to yes to prevent the Native ODBC protocol from using default auto-committing transaction behavior in ODBC. This property cannot be used with CAP_ODBC_TRANSACTIONS_SUPPRESS_EXPLICIT_COMMIT property.
CAP_ODBC_TRANSACTIONS_SUPPRESS_EXPLICIT_COMMIT Set to yes to prevent the Native ODBC Protocol from explicitly managing transactions. This property cannot be used with CAP_ODBC_TRANSACTIONS_SUPPRESS_AUTO_COMMIT property.
CAP_ODBC_TRIM_CHAR_LEAVE_PADDING Set to yes to leave whitespace padding at the end of a character or text data type. Most data sources will trim this whitespace automatically, but the behavior depends on the driver.
CAP_ODBC_TRIM_VARCHAR_PADDING Set to yesto force the Tableau native ODBC protocol to trim trailing whitespace from VARCHAR columns, which the driver has erroneously padded.
CAP_QUERY_BOOLEXPR_TO_INTEXPR Set to yes if Tableau must force any Boolean expressions to an integer value in order to include in a resultset.
CAP_QUERY_FROM_REQUIRES_ALIAS Set to yes if the FROM clause must provide an alias for the given table.
CAP_QUERY_GROUP_BY_ALIAS Set to yes if SQL queries with aggregations can reference the grouping columns by their corresponding alias in the SELECT list, e.g. GROUP BY "none_ShipCountry_nk".
CAP_QUERY_GROUP_BY_DEGREE Set to yes if SQL queries with aggregations can reference the grouping columns by the ordinal position of each column, e.g. GROUP BY 2, 5. Also refer CAP_QUERY_SORT_BY_DEGREE property.
CAP_QUERY_HAVING_REQUIRES_GROUP_BY Set to yes if Tableau must use an artificial grouping field for any query which has a HAVING clause but no grouping columns.
CAP_QUERY_HAVING_UNSUPPORTED Set to yes if the SQL syntax for HAVING is unsupported. Instead, Tableau may use subqueries. Also refer CAP_QUERY_SUBQUERIES property.
CAP_QUERY_INCLUDE_GROUP_BY_COLUMNS_IN_SELECT Set to yes to require all GROUP BY expressions to appear in the SELECT expression list.
CAP_QUERY_JOIN_ACROSS_SCHEMAS Set to yes if SQL queries can express joins between tables located in different schemas.
CAP_QUERY_JOIN_ASSUME_CONSTRAINED Set to yes to cull inner joins even if the database tables do not have FK-PK relationships.
CAP_QUERY_JOIN_PUSH_DOWN_CONDITION_EXPRESSIONS Set to yes to rewrite joins to simplify the ON clause conditions to simple identifier comparisons.
CAP_QUERY_JOIN_REQUIRES_SCOPE Set to yes if SQL queries must scope each join clause within parentheses to ensure a proper order of evaluation.
CAP_QUERY_JOIN_REQUIRES_SUBQUERY Set to yes to force join expressions involving more than two tables to be composed with subqueries.
CAP_QUERY_NULL_REQUIRES_CAST Set to yes if the data source requires NULL literals to be cast to an explicit data type.
CAP_QUERY_SELECT_ALIASES_SORTED Set to yes if Tableau must impose a deterministic order on the SELECT expressions (sorted by alias) to ensure that query results can be properly matched with each field in the Tableau visualization. This is only required for data sources which do not preserve the aliases of the SELECT expressions when returning metadata with the query results.
CAP_QUERY_SORT_BY_DEGREE Set to yes if SQL queries can reference the sorting columns by the ordinal position of each column, e.g. ORDER BY 2, 5. Also refer CAP_QUERY_GROUP_BY_DEGREE property.
CAP_QUERY_SUBQUERIES Set to yes if the data source supports subqueries.
CAP_QUERY_SUBQUERIES_WITH_TOP Set to yes if the data source supports a TOP or LIMIT row-limiting clause within a subquery.
CAP_QUERY_SUBQUERY_DATASOURCE_CONTEXT Set to yes to use subquery filtered query context to implement data source filters. This property is available in Tableau 8.0 through Tableau 9.3 only.
CAP_QUERY_SUBQUERY_QUERY_CONTEXT Set to yes to force Tableau to use a subquery for context filters instead of a temporary table or locally cached results.
CAP_QUERY_TOP_N Set to yes if the data source supports any form of row-limiting clause.
CAP_QUERY_TOPSTYLE_LIMIT Set to yes if the data source uses LIMIT as the row-limiting clause.
CAP_QUERY_TOPSTYLE_ROWNUM Set to yes if the data source supports an Oracle-style filter on ROWNUM as the row-limiting clause.
CAP_QUERY_TOPSTYLE_TOP Set to yes if the data source uses TOP as the row-limiting clause.
CAP_QUERY_USE_QUERY_FUSION Set to no to prevent Tableau from combining multiple individual queries into a single combined query. Turn off this property for performance tuning or if the database is unable to process large queries. This property is enabled by default and is available in Tableau 9.0 and later for all data sources except Tableau data extracts. Support for this property in Tableau data extracts is available in Tableau 9.0.6.
CAP_SELECT_INTO Set to yes if Tableau can create a table from the resultset of another query. Also refer CAP_CREATE_TEMP_TABLES property.
CAP_SELECT_TOP_INTO Set to yes if Tableau can use a TOP or LIMIT row-limiting clause when creating a table from a query resultset.
CAP_SUPPRESS_DISCOVERY_QUERIES Set to yes to prevent Tableau from detecting the supported SQL syntax for a variety of clauses.
CAP_SUPPRESS_DISPLAY_LIMITATIONS Set to yes to suppress displaying any warnings about limitations for this data source.