Filesystem Pipelines Quickstart

Alert
Filesystem Pipelines require MemSQL 5.8.5 or above.

To create and interact with a Filesystem Pipeline quickly, follow the instructions in this section.

Prerequisites

To complete this Quickstart, your environment must meet the following prerequisites:

  • Operating System: Mac OS X or Linux
  • Docker: Version 1.12 or newer. If using Mac OS X, these instructions are written for Docker for Mac. Docker Toolbox is compatible as well, but no instructions are provided. While Docker is required for this Quickstart, Pipelines and MemSQL itself have no dependency on Docker.

Part 1: Creating a MemSQL Database and Filesystem Pipeline in Docker

In this part of the Quickstart, you will create a Docker container to run MemSQL, create a directory that contains a data file, and then create a new Filesystem pipeline to ingest the file's contents.

In a new terminal window, execute the following command:

docker run --name memsql -p 3306:3306 -p 9000:9000 memsql/quickstart:5.8.7

This command automatically downloads the memsql/quickstart Docker image from Docker Hub, creates a new container using the image, assigns the container a user-friendly name (memsql), and finally starts the container.

You will see a number of lines of output in the terminal as the container initializes and MemSQL starts. Once the initialization process is complete, open a new terminal window and execute the following command:

docker exec -it memsql /bin/bash

This opens a command line inside the Docker container. The next set of commands creates the test_directory directory inside the container and creates a data file that contains the data to load into the database.

mkdir /test_directory    # Create a directory named 'test_directory' in the filesystem root

cd /test_directory       # Navigate to the new directory

cat > books.txt          # Create a new file named 'books.txt' from terminal input

Paste the following text into the terminal:

The Catcher in the Rye, J.D. Salinger, 1945
Pride and Prejudice, Jane Austen, 1813
Of Mice and Men, John Steinbeck, 1937
Frankenstein, Mary Shelley, 1818

Then press Ctrl-D to end input and write the file.

For the purposes of this example, we have created the books.txt file in the test_directory in the root of the filesystem. In a typical real-world scenario, however, the filesystem extractor would be used with an NFS-mounted drive, with the source directory located on a different server.
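
If you prefer not to type the file contents interactively, the same file can be written with a heredoc. The following is a minimal sketch, not part of MemSQL: write_books is a hypothetical helper name, and the directory is passed as an argument so the helper can be tried anywhere.

```shell
# Hypothetical helper (not a MemSQL command): writes the sample data file
# into the directory given as the first argument.
write_books() {
  target_dir="$1"
  mkdir -p "$target_dir"
  # The quoted 'EOF' delimiter keeps the shell from expanding anything
  # inside the heredoc, so the text is written exactly as shown.
  cat > "$target_dir/books.txt" <<'EOF'
The Catcher in the Rye, J.D. Salinger, 1945
Pride and Prejudice, Jane Austen, 1813
Of Mice and Men, John Steinbeck, 1937
Frankenstein, Mary Shelley, 1818
EOF
}
```

Inside the container, running write_books /test_directory should produce the same books.txt file as the interactive cat command.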

Next, start the MemSQL client by running memsql, and then create the database and table that will receive the data.

memsql

CREATE DATABASE books;

USE books;

CREATE TABLE classic_books
(
title VARCHAR(255),
author VARCHAR(255),
date VARCHAR(255)
);

These statements create a new database named books and a new table named classic_books, which has three columns: title, author, and date.

Now that the destination database and table have been created, you can create a Filesystem pipeline. Earlier in this Quickstart, you created the books.txt file in test_directory. To create the pipeline, you will need the following information:

  • The path to the directory that contains the data files, such as: /test_directory

Execute the following statement to create the pipeline:

CREATE PIPELINE library
AS LOAD DATA FS '/test_directory/*'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ',';

This statement creates a new pipeline named library, but the pipeline has not yet been started. To start it, execute the following statement:

START PIPELINE library;

This statement starts the pipeline. To see whether the pipeline is running, execute:

SHOW PIPELINES;

If the pipeline is successfully running, you will see the following result:

memsql> SHOW PIPELINES;
+----------------------+---------+
| Pipelines_in_books   | State   |
+----------------------+---------+
| library              | Running |
+----------------------+---------+
1 row in set (0.00 sec)

At this point, the pipeline is running and the contents of the books.txt file should be present in the classic_books table. Execute the following statement:

SELECT * FROM classic_books;

This statement returns the following result:

memsql> SELECT * FROM classic_books;
+------------------------+-----------------+-------+
| title                  | author          | date  |
+------------------------+-----------------+-------+
| The Catcher in the Rye |  J.D. Salinger  |  1945 |
| Pride and Prejudice    |  Jane Austen    |  1813 |
| Of Mice and Men        |  John Steinbeck |  1937 |
| Frankenstein           |  Mary Shelley   |  1818 |
+------------------------+-----------------+-------+
4 rows in set (0.13 sec)
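
The author and date values in this result begin with a space because books.txt separates fields with a comma followed by a space, while the pipeline splits fields on ',' alone. One way to avoid the padding is to normalize a file before placing it in the source directory. The following is a minimal sketch, assuming no field contains a comma of its own; strip_field_padding is a hypothetical helper name, not a MemSQL feature.

```shell
# Hypothetical helper: reads comma-separated lines on standard input and
# removes any whitespace that directly follows each comma.
strip_field_padding() {
  sed 's/,[[:space:]]*/,/g'
}
```

For example, strip_field_padding < books_raw.txt > /test_directory/books.txt would write a normalized copy into the source directory, where books_raw.txt is an illustrative file name.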

Next Steps

Now that you have a running pipeline, any new files you add to the test_directory directory will be automatically ingested. To understand how a Filesystem pipeline ingests large numbers of files in a directory, see the Parallelized Data Loading section in the Extractors topic. You can also learn more about how to transform the ingested data by reading the Transforms topic.
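
To try the automatic ingestion yourself, you can add a second file to the source directory. The following is a minimal sketch; add_more_books is a hypothetical helper name, and the file name and the two rows are illustrative sample data, not part of the original Quickstart.

```shell
# Hypothetical helper: adds a second data file to the directory given as
# the first argument, using the same "title, author, date" layout as books.txt.
add_more_books() {
  target_dir="$1"
  cat > "$target_dir/more_books.txt" <<'EOF'
Moby-Dick, Herman Melville, 1851
Dracula, Bram Stoker, 1897
EOF
}
```

After running add_more_books /test_directory inside the container, and assuming the library pipeline is still running, executing SELECT * FROM classic_books; again should show the two additional rows once the pipeline has picked up the new file.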