Outdated Version

You are viewing an older version of this section. View current production version.

Connecting to Talend Open Studio

Talend Open Studio (TOS) for data integration is one of the most powerful data integration ETL tools available in the market. TOS lets you to easily manage all the steps involved in the ETL process, beginning from the initial ETL design through the execution of ETL data load.

The purpose of this document is to provide guidelines when integrating Talend Open Studio with SingleStore DB database using SingleStore (MemSQL) Connector.

The certification matrix below shows the latest versions tested for SingleStore DB and Talend Open Studio:

Certification Matrix Versions
Talend Open Studio 7.2
MemSQL 6.7 and newer
MariaDB JDBC Driver 2.3.0

Prerequisites

The following prerequisites need to be satisfied before making the connection:

  1. Download Talend Open Studio. Place the executable in a directory of your choosing.

  2. Make sure the SingleStore DB Cluster is running on the server.

  3. Download MariaDB JDBC jar from here.

Note: Talend Studio requires specific third-party Java libraries or database drivers (.jar files) to be installed to connect to sources and targets. These libraries or database drivers, known as external modules, can be installed during job creation and execution.

Creating and Designing a Talend Job

Follow the steps below to create a new project:

  1. Double click the Talend Executable to launch Talend Open Studio.

  2. The Talend Studio window will be displayed, select Create a new project. Provide a name for the project.

  3. Click Finish. The project will open in Studio. This can take some time for synchronizing the repository and other routines.

  4. On the right panel under the Repository tab, right-click the Job Designs node and select Create Job from the menu.

  5. In the New Job wizard, enter name of the job and other information. Click Finish.

  6. A design workspace is displayed. The Job created is listed under the Job Designs node in the Repository tree view.

Adding Components to the Job

Now that the Job is created, components have to be added to the design workspace to connect to SingleStore DB Database.

There are three SingleStore DB components available:

  1. MemSQLQueryInput
  2. MemSQLTableNameInput
  3. MemSQLOutput

On the Designer Tab, enter the name of the component and double click on the highlighted option. The component be get added to the design workspace.

Note: Details and properties of the above components are provided at the end of this document.

Configuring Components and Running the Job

  1. Add the MemSQLTableNameInput to the workspace and double click it.

  2. Fill in the required details under the Component tab.

    image

  3. Check and verify that the connection is successful.

  4. Click on the tableName and in the pop up, select the required table from the list of tables available in the database. Click OK.

  5. Click on Guess Schema, verify the inputs in the schema pop up, and click OK.

  6. Add the MemSQLQueryInput to the workspace and double click it.

  7. Fill in the required details under the Component tab.

    image

  8. Check verify that the connection is successful.

  9. In the Query box, enter the Select query for the tables to be imported.

  10. Click on Guess Schema, verify the inputs in the schema pop up, and click OK.

  11. In the workspace area, right click on the MemSQLQueryInput component, select Row->Main, and connect it with the MemSQLOutput component.

  12. Double click on MemSQLOutput and provide the details in the Component tab as shown.

    image

  13. Click on the tableName box, select the table name where the data is to be inserted. For a new table, click Custom and provide the name of the new table. Click OK.

  14. In Action, check the action that is to be performed on the target table.

  15. Check the box Create table if not exists if the target table is not created.

  16. Click on the Sync columns.

  17. Go to the Run Job Tab and Click on Run.

  18. After the Job is completed, verify the data loaded in the database table from the MemSQL prompt. This can also be verified on the Talend Job window.

MemSQL Job Components - Details and Properties

MemSQLTableNameInput

Executes a DB query with a strictly defined order which must correspond to the schema definition.

This component connects to a given database and extracts fields based on a user-defined query. Then it passes the selected data to the next component via a Main row link.

Standard Properties

These properties are used to configure MemSQLTableNameInput running in the Standard Job framework.

The Standard MemSQLTableNameInput component belongs to the Databases families.

The component in this framework is available in all Talend products with Big Data.

Basic Settings

Property Type Either Built-in or Repository.
Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.

Note: When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:

1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.

For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
jdbcUrl Enter the jdbc compliant URL to SingleStore DB along with additional jdbc parameters.

This component utilizes the MariaDB driver.
Username and Password Enter the authentication data used to connect to the SingleStore DB database to be used.
Schema and Edit Schema A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see Talend Studio User Guide.

This dynamic schema feature is designed for the purpose of retrieving unknown columns of a table and is recommended to be used for this purpose only; it is not recommended for the use of creating tables.
Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.
Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
View schema: choose this option to view the schema only.
Change to built-in property: choose this option to change the schema to Built-in for local changes.
Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.
Table Name Name of the table to be read.

MemSQLQueryInput

Executes a DB query with a strictly defined order which must correspond to the schema definition.

This component connects to a given database and extracts fields based on a user-defined query. Then it passes the selected data to the next component via a Main row link.

Standard Properties

These properties are used to configure MemSQLQueryInput running in the Standard Job framework.

The Standard MemSQLQueryInput component belongs to the Databases families.

The component in this framework is available in all Talend products with Big Data.

Basic Settings

Property type Either Built-in or Repository.
Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.

Note: When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:

1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.

2. In the child level, use a dedicated connection component to read that registered database connection.

For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
jdbcUrl Enter the jdbc compliant URL to SingleStore DB along with additional jdbc parameters.

This component utilizes the MariaDB driver.
Username and Password Enter the authentication data used to connect to the SingleStore DB database to be used.
Schema and Edit Schema A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see Talend Studio User Guide.

This dynamic schema feature is designed for the purpose of retrieving unknown columns of a table and is recommended to be used for this purpose only; it is not recommended for the use of creating tables.
Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.
Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

View schema: choose this option to view the schema only.
Change to built-in property: choose this option to change the schema to Built-in for local changes.
Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.
Query Must be a SELECT only query

MemSQLOutput

Reads data incoming from the preceding component in the Job and executes the action defined on a given SingleStore DB table and/or on the data contained in the table.

MemSQLOutput connects to a given SingleStore DB database and inserts, updates, upserts, deletes, and bulk loads in that database.

Standard Properties

These properties are used to configure MemSQLOutput running in the Standard Job framework.

The Standard MemSQLOutput component belongs to the Big Data and the Databases families.

The component in this framework is available in all Talend products with Big Data.

Basic Settings

Property type Either Built-in or Repository.
Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.

Note: When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:

1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.

2. In the child level, use a dedicated connection component to read that registered database connection.

For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
jdbcUrl Enter the jdbc compliant URL to SingleStore DB along with additional jdbc parameters.

This component utilizes the MariaDB driver.
Username and Password Enter the authentication data used to connect to the SingleStore DB database to be used.
Table Enter the name of the table to be written. Note that only one table can be written at a time
If table is not present, then under Custom you can enter the name of the table. The ‘Create If Not Exists’ property must also be selected as this will then create the table in the SingleStore DB instance.
Create table if not exists The table is created if it does not exist.
varcharLength If ‘Create table if not exists’ is checked then you can enter the max size of varchar fields.

Default: -1 This means typically 255
keys If ‘Create table if not exists’ is checked OR Action on Data is Update / Upsert. Then user can enter which field(s) should be marked as keys
ignoreUpdate If Action on Data is Update / Upsert then user can enter which field(s) should be ignored when component performs Update statement.
rewriteBatchStatements If checked will create a single batch operation to be executed on the SingleStore DB instance
truncateTable If checked will truncate data in the selected table before performing the given action selected.
Action on data On the data of the table defined, you can perform:

Insert: Add new entries to the table. If duplicates are found, the job stops.

Update: Make changes to existing entries.

Upsert: Insert a new record. If the record with the given reference already exists, an update would be made.

Delete: Remove entries corresponding to the input flow.

Bulk Load: Uses the SingleStore DB bulk load command

Warning:
You must specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the update and delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column name on which you want to base the update operation. Do the same in the Key in delete column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see Talend Studio User Guide.

This dynamic schema feature is designed for the purpose of retrieving unknown columns of a table and is recommended to be used for this purpose only; it is not recommended for the use of creating tables.
Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

When the schema to be reused has default values that are integers or functions, ensure that these default values are not enclosed within quotation marks. If they are, you must remove the quotation marks manually.

For more details, see Verifying default values in a retrieved schema.
Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

View schema: choose this option to view the schema only.

Change to built-in property: choose this option to change the schema to Built-in for local changes.

Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion.

If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

Advanced Settings

Use Batch Size Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed.