This guide covers various ways of loading data into the system. We recommend trying option 1, and if that is not sufficient, trying option 2 then option 3.
- Option 1: Loading from a file
- Option 2: Streaming data using MemSQL Pipelines.
- Option 3: Loading from MySQL
Option 1: Loading Data stored in a file
MemSQL supports standard SQL loading constructs. For instance, after creating tbl_name, run:
LOAD DATA INFILE 'file_name.tsv' INTO TABLE tbl_name FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
Option 2: Streaming Data Using MemSQL Pipelines
Data can be streamed from a variety of sources directly to MemSQL using MemSQL Pipelines. Pipelines currently supports streaming data from the following sources:
- Amazon S3
- Azure Blob
- A Filesystem
You can set up and start running a MemSQL Pipeline using two simple commands:
CREATE PIPELINE my_pipline AS LOAD DATA...
The required arguments for
CREATE PIPELINE are specific to each source. For more information on getting started with the source of your choosing, please see the following Quickstart guides:
- Kafka Pipelines Quickstart
- S3 Pipelines Quickstart
- Azure Blob Pipelines Quickstart
- Filesystem Pipelines Quickstart
- HDFS Pipelines Overview
Option 3: Loading from MySQL
The popular open source tool
mysqldump is a straightforward way to load data from MySQL into MemSQL. For example, log into a machine running a MySQL database (named
foo in the following examples), and issue the following commands:
$ mysqldump -h 127.0.0.1 -u root --databases foo > foodump.sql
You can also dump specific tables by using the
$ mysqldump -h 127.0.0.1 -u root --databases foo --tables tbl_name > foodump.sql
Since MemSQL is a distributed database, you need to tell MemSQL how to shard the data across the cluster. If all your tables have primary keys, MemSQL will automatically use the primary keys to shard them, so you can skip this next step. Otherwise, you need to open up foodump.sql in your favorite editor and either add shard keys or declare the tables as reference tables (for more details, see How to Port Your Applications to MemSQL.
For instance, suppose foodump.sql contains
CREATE TABLE `users` ( `id` bigint(20) NOT NULL, `username` varchar(11) DEFAULT NULL, `address` varchar(10) );
To make this a valid MemSQL table, consider changing it to
CREATE TABLE `users` ( `id` bigint(20) NOT NULL, `username` int(11) DEFAULT NULL, `address` varchar(10), SHARD KEY(`id`) );
For more information on choosing a shard key, see Primary Key as the Shard Key. Once all your tables have either shard key or primary key declarations, you can run the following on the master aggregator:
mysql -h 127.0.0.1 -u root < foodump.sql
All tables and data from users will now be loaded into MemSQL.