Stream and batch transformations in Jitterbit Design Studio
Jitterbit supports several methods for processing a transformation.
Streaming and batch transformations are the preferred methods when the amount of memory a Jitterbit transformation uses needs to be limited. In cases where you cannot use either a streaming or a batch transformation, chunking may be applicable, as described below.
Note
From the perspective of a transformation, a SOAP web service response/request corresponds to an XML source/target when considering these limitations on processing.
Streaming transformation
A streaming transformation loads one record at a time into memory, performs the transformation of the record, and writes the target to disk. This minimizes the amount of memory that is used during the transformation to what is needed to transform one record.
Streaming is automatically applied to transformations where the source and target are both flat structures (for example, a single database table or a single CSV file) and these requirements are fulfilled:
- Streaming has not been explicitly disabled by setting `AutoStreaming=0` in the `jitterbit.conf` file.
- Streaming has not been explicitly disabled by setting the Jitterbit variable `$jitterbit.transformation.auto_streaming` to `0` or `false` (see the script example below).
- No instance-resolving functions are used, such as `FindByPos`, `FindValue`, or `Sum`.
- These dictionary and array functions are not present: `GetSourceInstanceMap`, `GetSourceAttrNames`, `GetSourceElementNames`, `GetSourceInstanceElementMap`, `GetSourceInstanceArray`, `GetSourceInstanceElementArray`.
- The XML function `GetXMLString` is not present.
- There are no multiple mappings in the target.
- The transformation does not have a condition defined in the target.
No other action is needed; streaming will automatically be used by the transformation when it is processed.
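If you do need to disable automatic streaming, for example to compare memory use while troubleshooting a mapping, the Jitterbit variable listed in the requirements above can be set in a script that runs before the transformation. This is a minimal sketch, not a prescribed setup; where you place the script in your operation chain is up to you:

```
<trans>
// Disable automatic streaming for transformations that run after this script;
// equivalent in effect to AutoStreaming=0 in jitterbit.conf, but set at runtime
// rather than in the agent configuration file.
$jitterbit.transformation.auto_streaming = false;
</trans>
```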
Example
Transformations between data structures such as these automatically use streaming:
- CSV to CSV
- Single table to single table
- Single table to CSV
- CSV to single table
Batch transformation
For transformations that do not meet the criteria for streaming, the entire source is read into memory and the transformation is performed in memory. This method is usually the most efficient in terms of time, but it can lead to out-of-memory errors if the source is very large. In those cases, either a batch transformation or chunking must be used.
A batch transformation is similar to a streaming transformation, but it processes several source records at a time (the batches) and has fewer limitations than streaming. It can be used in cases where streaming is not automatically applied and is suitable for hierarchical sources (for example, multiple database tables with one or more parent-child relationships, or hierarchical file formats).
To enable a batch transformation, right-click the source node that is to become the batch source node and, from the displayed menu, select Define batch transformation.
In the dialog that appears, enter the maximum batch size, which is the number of records to read into memory in each batch. Choosing a batch size that is too small slows the transformation; however, all of the source and target data for a batch must fit into the available memory, so the batch size cannot be arbitrarily large either.
Example
Batch transformations can be used in these cases where the source is very large:
- Hierarchical database to either CSV or a single table
- Hierarchical database to hierarchical text/database
- Hierarchical text to either CSV or a single table
- Hierarchical text to hierarchical text/database
Chunking
In a situation where a streaming or batch transformation is neither applicable nor possible, you may be able to use chunking to reduce the memory the transformation requires. For very large XML sources and targets, chunking may be the only option. (If memory use is not an issue, a streaming or batch transformation is always the preferred choice.) For more information, see Chunking.