Outdated Version

You are viewing an older version of this section.

EXTRACT PIPELINE ... INTO OUTFILE

This command takes a sample of the data streaming into your pipeline and copies it into a file on disk. Once the file has been created, you can pipe it into your transform to test and debug the transform iteratively.

Syntax

EXTRACT PIPELINE pipe_line
[FROM 'source_partition'
   [OFFSETS start_offset TO end_offset]
]
INTO OUTFILE 'file_name'

Remarks

  • pipe_line is the name of the configured pipeline.
  • file_name is the output file that will contain your sample data.
  • source_partition is a source partition ID.
  • start_offset and end_offset specify an exact range of sample data to extract.

Info

You cannot run EXTRACT PIPELINE when the pipeline is in a Running or Error state.

Return Type

A file containing sample data for your transform that can be used during debugging operations. For example, the following command pipes the output file into a transform script, which can reveal mistakes in how your transform code handles the data streaming into your pipeline.

$ cat sample_output | python transform.py
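
For illustration, a transform.py that works with the command above might look like the following. This is a minimal sketch, not the product's reference transform: it assumes the extracted file is newline-delimited text in the form id,name, and the names transform_record and run are hypothetical. The actual record framing depends on your pipeline source and version, so adjust the reading loop to match your data.

```python
import sys


def transform_record(line):
    """Hypothetical transform: expects 'id,name' and emits 'id<TAB>NAME'."""
    record_id, name = line.split(",", 1)
    return "%d\t%s" % (int(record_id), name.strip().upper())


def run(infile, outfile, errfile):
    """Transform each input line; report failures by line number and continue."""
    failures = 0
    for number, raw in enumerate(infile, start=1):
        line = raw.rstrip("\n")
        try:
            outfile.write(transform_record(line) + "\n")
        except Exception as exc:
            # Assumption: for debugging it is more useful to see every bad
            # record than to stop at the first one.
            failures += 1
            errfile.write("line %d: %r failed: %s\n" % (number, line, exc))
    return failures


if __name__ == "__main__":
    run(sys.stdin, sys.stdout, sys.stderr)
```

Because failures are written to stderr with a line number, a malformed record in the sample file can be located without stepping through the whole file by hand.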

Examples

The following saves a random sample of the pipeline's data.

EXTRACT PIPELINE p INTO OUTFILE 'transform_output';

The following is useful if there is a specific partition or file with a known problem.

EXTRACT PIPELINE p FROM '6' INTO OUTFILE 'transform_output';

The following extracts an exact range of data, which is useful if the problematic data is in a known range of Kafka offsets.

EXTRACT PIPELINE p FROM '10' OFFSETS 0 TO 6 INTO OUTFILE 'transform_output';
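
Once a file such as transform_output exists, replaying it through the transform can be scripted so that the exit code and error output are captured on every iteration. The following is a rough sketch that uses only the Python standard library. The helper name replay is hypothetical, and the script name transform.py in the usage comment is an assumption about your setup.

```python
import subprocess
import sys


def replay(sample_path, transform_cmd):
    """Feed the extracted sample file to the transform command on stdin.

    Returns (exit_code, stdout_bytes, stderr_bytes) so the caller can
    inspect what the transform produced and whether it failed.
    """
    with open(sample_path, "rb") as sample:
        proc = subprocess.run(
            transform_cmd,
            stdin=sample,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    return proc.returncode, proc.stdout, proc.stderr


# Example usage (assumes transform.py is in the current directory):
#   code, out, err = replay("transform_output", [sys.executable, "transform.py"])
#   if code != 0:
#       sys.stderr.write(err.decode(errors="replace"))
```

Capturing stderr separately from stdout keeps the transform's diagnostic messages out of the data it emits, which makes it easier to compare the output of successive runs.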