EXTRACT PIPELINE ... INTO OUTFILE
This command takes a sample of the data streaming into your pipeline and copies it into a file on disk. Once the file has been created, you can pipe it into your transform to test and debug the transform iteratively.
EXTRACT PIPELINE pipe_line [FROM 'source_partition' [OFFSETS start_offset TO end_offset] ] INTO OUTFILE 'file_name'
pipe_line is the configured pipeline.
file_name is the output file containing your sample data.
source_partition is a source partition ID.
start_offset and end_offset can be used to extract an exact range of sample data.
You cannot run EXTRACT PIPELINE when the pipeline is in a
EXTRACT PIPELINE produces a file containing sample data that can be used during debugging. For example, the following takes the output file and pipes it into a transform script. This can reveal mistakes in how your transform code handles the data streaming into your pipeline.
$ cat sample_output | python transform.py
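For reference, a transform script that can be tested this way might look like the following minimal sketch. It is illustrative only: it assumes each sample record is a line of comma-separated text and that the transform should emit tab-separated rows. The function name transform_lines is hypothetical, and the actual record framing your pipeline delivers on stdin depends on your source and version, so adapt the parsing accordingly.

```python
import sys


def transform_lines(lines):
    """Yield one tab-separated output row per non-empty input line.

    Assumption (not from the product documentation): each record is a
    single line of comma-separated text.
    """
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue  # skip blank lines rather than emitting empty rows
        fields = [field.strip() for field in record.split(",")]
        yield "\t".join(fields) + "\n"


def main():
    # Read sample records from stdin and write transformed rows to stdout.
    for row in transform_lines(sys.stdin):
        sys.stdout.write(row)


if __name__ == "__main__":
    main()
```

Keeping the per-record logic in its own function makes it possible to call it directly on a handful of suspect lines from the extracted file, without re-running the whole pipe.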
The following saves random sample data.
EXTRACT PIPELINE p INTO OUTFILE 'transform_output';
The following is useful if there is a specific partition or file with a known problem.
EXTRACT PIPELINE p FROM '6' INTO OUTFILE 'transform_output';
The following extracts an exact range of data, which is useful if the problematic data is in a specific, known Kafka region.
EXTRACT PIPELINE p FROM '10' OFFSETS 0 TO 6 INTO OUTFILE 'transform_output';