
EXTRACT PIPELINE ... INTO OUTFILE

This command takes a sample of the data streaming into your pipeline and writes it to a file on disk. Once the file exists, you can pipe it into your transform as many times as needed to test and debug the transform iteratively.

Syntax

EXTRACT PIPELINE pipe_line
[FROM 'source_partition'
   [OFFSETS start_offset TO end_offset]
]
INTO OUTFILE 'file_name'

Remarks

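pipe_line is the name of the configured pipeline.
file_name is the output file that will contain your sample data.
source_partition is the ID of a source partition, such as a Kafka partition or a file.
start_offset and end_offset specify the exact range of offsets to extract from source_partition. OFFSETS can only be used together with FROM.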
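Because OFFSETS is nested inside the optional FROM clause in the syntax above, an offset range is only valid once a source partition is named. The following Python sketch uses a hypothetical helper, build_extract_statement (not part of any SingleStore client library), to assemble the statement from its parts and enforce that rule. It is an illustration under stated assumptions: rather than guessing at string-escaping rules, it simply rejects quotes and backslashes in string values.

```python
# Hypothetical helper (not part of any SingleStore client library) that
# assembles an EXTRACT PIPELINE statement following the syntax above.
from typing import Optional, Tuple


def _quote(value: str) -> str:
    # Conservatively reject characters that would need escaping instead of
    # assuming a particular escaping rule.
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in {value!r}")
    return f"'{value}'"


def build_extract_statement(
    pipeline: str,
    outfile: str,
    partition: Optional[str] = None,
    offsets: Optional[Tuple[int, int]] = None,
) -> str:
    if not pipeline.isidentifier():
        raise ValueError(f"unexpected pipeline name {pipeline!r}")
    # OFFSETS is nested inside the optional FROM clause, so it needs a partition.
    if offsets is not None and partition is None:
        raise ValueError("OFFSETS requires FROM 'source_partition'")
    parts = [f"EXTRACT PIPELINE {pipeline}"]
    if partition is not None:
        parts.append(f"FROM {_quote(partition)}")
        if offsets is not None:
            start, end = offsets
            parts.append(f"OFFSETS {int(start)} TO {int(end)}")
    parts.append(f"INTO OUTFILE {_quote(outfile)}")
    return " ".join(parts) + ";"
```

Called with pipeline p, partition '10', and offsets (0, 6), it produces the same statement as the last example on this page.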

Info

You cannot run EXTRACT PIPELINE when the pipeline is in a Running or Error state.

Return Type

A file containing sample data from the pipeline's source, which you can feed into your transform while debugging. For example, the following pipes the output file into a transform script. This can reveal mistakes in how your transform code handles the data streaming into your pipeline.

$ cat sample_output | python transform.py
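The pipe above implies that the transform reads the sample from standard input and writes its results to standard output. Below is a minimal sketch of such a transform.py, assuming the extracted file is newline-delimited CSV text. The field-trimming rule and the tab-separated output are hypothetical placeholders for your own logic, and records from some sources (for example, Kafka) may be framed differently, so check the format your pipeline actually passes to its transform before relying on this.

```python
#!/usr/bin/env python3
# transform.py: minimal sketch of a line-oriented transform (hypothetical).
# Assumes the sample file is newline-delimited CSV text; adjust the parsing
# if your pipeline frames records differently.
import csv
import sys
from typing import Iterable, List, TextIO


def transform_row(row: List[str]) -> List[str]:
    # Placeholder rule (hypothetical): trim surrounding whitespace from each field.
    return [field.strip() for field in row]


def transform_stream(instream: Iterable[str], outstream: TextIO) -> None:
    # Parse CSV input and emit tab-separated output, one row per input line.
    reader = csv.reader(instream)
    writer = csv.writer(outstream, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        writer.writerow(transform_row(row))


if __name__ == "__main__":
    # Read the piped sample from stdin and write results to stdout.
    transform_stream(sys.stdin, sys.stdout)
```

Keeping the logic in transform_stream, separate from the stdin/stdout wiring, lets you call it directly on a few lines from the sample file while iterating on the transform.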

Examples

The following saves random sample data.

EXTRACT PIPELINE p INTO OUTFILE 'transform_output';

The following extracts data from a single source partition (here, partition 6), which is useful when a specific partition or file is known to contain problematic data.

EXTRACT PIPELINE p FROM '6' INTO OUTFILE 'transform_output';

The following extracts an exact range of offsets from a single partition, which is useful when the problematic data is known to lie within a specific range of Kafka offsets.

EXTRACT PIPELINE p FROM '10' OFFSETS 0 TO 6 INTO OUTFILE 'transform_output';
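Once you have extracted a narrow range, running your transform logic record by record can show exactly which lines fail. The following Python sketch assumes the sample file is newline-delimited text and that your transform can be expressed as a per-line function; find_failing_lines and parse_amount are hypothetical names, and parse_amount merely stands in for your own transform step.

```python
# Hypothetical debugging helper: run a per-line transform function over a
# sample file and report which lines raise an error.
from typing import Callable, Iterable, List, Tuple


def find_failing_lines(
    lines: Iterable[str], transform_line: Callable[[str], str]
) -> List[Tuple[int, str, str]]:
    """Return (line_number, line, error) for every line the transform rejects."""
    failures: List[Tuple[int, str, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        try:
            transform_line(line)
        except Exception as exc:  # catch everything so one bad record doesn't stop the scan
            failures.append((lineno, line, repr(exc)))
    return failures


def parse_amount(line: str) -> str:
    # Example transform step (hypothetical): expects "name,amount" with an integer amount.
    name, amount = line.split(",")
    return f"{name}\t{int(amount)}"
```

For example, opening the extracted file with `open('transform_output', encoding='utf-8')` and passing the file object to find_failing_lines together with your own per-line function lists the line number, content, and error for each record that fails.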
