EXTRACT PIPELINE ... INTO OUTFILE
This command takes a sample of the data streaming into your pipeline and copies it into a file on disk. After your file has been created, you can pipe it back into your transform for iterative testing and debugging of the transform.
Syntax
EXTRACT PIPELINE pipe_line
[FROM 'source_partition'
[OFFSETS start_offset TO end_offset]
]
INTO OUTFILE 'file_name'
Remarks
pipe_line is the configured pipeline.
file_name is the output file containing your sample data.
source_partition is a source partition ID.
start_offset and end_offset can be used to extract the exact range of sample data.
You cannot run EXTRACT PIPELINE when the pipeline is in a Running or Error state.
Return Type
A file containing sample data that you can use when debugging your transform. For example, the following pipes the output file into a transform script. This can reveal mistakes in how your transform code handles the data streaming into your pipeline.
$ cat sample_output | python transform.py
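For readers who do not yet have a transform to test, the following is a minimal sketch of what a transform.py could look like. It is an illustration only, not the transform interface defined by the product: it assumes the extracted sample contains newline-delimited, comma-separated records and that the transform should emit tab-separated rows on standard output. The function name transform_line and the record layout are hypothetical, so check the record framing your pipeline version actually uses before relying on it.

```python
import sys


def transform_line(line: str) -> str:
    """Convert one comma-separated record into a tab-separated row.

    Assumption: each record is a single line of comma-separated fields.
    """
    fields = [field.strip() for field in line.rstrip("\r\n").split(",")]
    return "\t".join(fields) + "\n"


def main() -> None:
    # Read the extracted sample from stdin and write transformed rows to stdout.
    for line_number, line in enumerate(sys.stdin, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            sys.stdout.write(transform_line(line))
        except Exception as exc:
            # Report the offending record on stderr so it is easy to locate.
            sys.stderr.write(f"record {line_number} failed: {exc!r}\n")


if __name__ == "__main__":
    main()
```

Because the script only reads standard input and writes standard output, you can rerun it against the same extracted file after every change, and any record that raises an error is reported with its position in the sample.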
Examples
The following saves random sample data.
EXTRACT PIPELINE p INTO OUTFILE 'transform_output';
The following is useful if there is a specific partition or file with a known problem.
EXTRACT PIPELINE p FROM '6' INTO OUTFILE 'transform_output';
The following extracts an exact range of data, which is useful if the problematic data lies in a known range of Kafka offsets.
EXTRACT PIPELINE p FROM '10' OFFSETS 0 TO 6 INTO OUTFILE 'transform_output';