EXTRACT PIPELINE ... INTO OUTFILE

Info

MemSQL Helios does not support this command.

This command takes a sample of the data streaming into your pipeline and copies it into a file on disk. After your file has been created, you can pipe it back into your transform for iterative testing and debugging of the transform.

Syntax

EXTRACT PIPELINE pipe_line
[FROM 'source_partition'
   [OFFSETS start_offset TO end_offset]
]
INTO OUTFILE 'file_name'

Remarks

pipe_line is the configured pipeline.
file_name is the output file that will contain your sample data.
source_partition is a source partition ID.
start_offset and end_offset specify the exact range of offsets to extract from source_partition.

Info

You cannot run EXTRACT PIPELINE when the pipeline is in a Running or Error state.

Return Type

A file containing sample data that can be used when debugging a transform. For example, the following command pipes the output file into a transform script. This can reveal mistakes in how your transform code handles the data streaming into your pipeline.

$ cat sample_output | python transform.py
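
To make the command above concrete, here is a minimal sketch of what a transform.py might look like. It is illustrative only, not MemSQL's reference transform: it assumes the extracted file holds newline-delimited text records, which may not match how your MemSQL version and data source frame records. The names transform_record and run are hypothetical, and the upper-casing step stands in for your real transform logic.

```python
import sys


def transform_record(line):
    """Hypothetical per-record logic: strip the line ending and upper-case the text."""
    return line.rstrip("\r\n").upper()


def run(in_stream, out_stream):
    """Apply transform_record to each input line and return the number of records.

    Assumption: records are newline-delimited text. Adjust the reading logic
    if your extracted file frames records differently.
    """
    count = 0
    for line in in_stream:
        out_stream.write(transform_record(line) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Reads the extracted sample from stdin, as in:
    #   cat sample_output | python transform.py
    processed = run(sys.stdin, sys.stdout)
    # Report on stderr so the count does not mix with the transformed output.
    print(f"processed {processed} records", file=sys.stderr)
```

Keeping the per-record logic in its own function means the same code can be exercised against an extracted file from the command line and against small in-memory samples while you iterate.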

Examples

The following saves random sample data.

EXTRACT PIPELINE p INTO OUTFILE 'transform_output';

The following is useful if there is a specific partition or file with a known problem.

EXTRACT PIPELINE p FROM '6' INTO OUTFILE 'transform_output';

The following extracts an exact range of data, which is useful if the problematic data lies in a known range of Kafka offsets.

EXTRACT PIPELINE p FROM '10' OFFSETS 0 TO 6 INTO OUTFILE 'transform_output';
