In this pattern, a source record does not contain all the data required by the target. For example, in the case of a parent-child relationship, if a child record is to be updated in a target, typically a parent record reference is required by the target. Or, if working with a product record, the source record may only have a SKU number, but the target also wants a group or class reference.
One way to handle this is to perform an in-line (dynamic) lookup by inserting a function call in the transformation, so that as each record is read, a lookup is made to a different object and the returned value is populated in the field. Functions that can perform this are:
- DBLookup is useful for getting a single field in a database source.
- DBLookupAll and DBExecute are helpful if more than one field needs to be fetched.
- If SFDC holds the data needed, then SFLookup, SFLookupAll and SFCacheLookup could be used. SFCacheLookup has the added efficiency of caching such that the login process can be skipped if the SFDC session is still active.
However, there is a cost to using these functions. Take SFLookup in a transformation as an example, where it is used to get an ID from an object. When a source data set is passed through a transformation, each record is evaluated and all script functions are executed individually. If there are 10 records to be processed, there will be 10 calls to SFDC. If the integration processes a small number of records, then in-line lookups are a great solution. But what if the universe of valid lookup values is 1,000, and 10,000 records are to be processed? Performing an additional 10,000 calls in a transformation, where each call takes at least a second, is very inefficient and slow. The far better way is to pay the small time penalty to load the 1,000 lookup records into a dictionary once and use that for lookups.
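The difference in call counts can be sketched outside Jitterbit. The Python below is only an illustration, not Jitterbit Script: `remote_lookup` and `bulk_fetch` are hypothetical stand-ins for a per-record call (such as SFLookup) and a single bulk query, and the SKU data is invented.

```python
# Illustrative model only: remote_lookup stands in for a per-record call
# such as SFLookup; the reference data and record set are made up.
REFERENCE = {f"SKU{i:04d}": f"ID{i:04d}" for i in range(1000)}  # 1,000 valid lookup values
calls = {"inline": 0, "bulk": 0}

def remote_lookup(sku):
    """One round trip to the remote system per call."""
    calls["inline"] += 1
    return REFERENCE.get(sku)

def bulk_fetch():
    """One round trip that returns every lookup value at once."""
    calls["bulk"] += 1
    return dict(REFERENCE)

records = [f"SKU{i % 1000:04d}" for i in range(10000)]  # 10,000 source records

# In-line approach: one remote call per record.
inline_ids = [remote_lookup(sku) for sku in records]

# Dictionary approach: load once, then look up locally.
lookup = bulk_fetch()
dict_ids = [lookup.get(sku) for sku in records]

print(calls)  # {'inline': 10000, 'bulk': 1}
```

Both approaches produce the same IDs. If each remote call took one second, the in-line version would spend about 10,000 seconds (roughly 2 hours 47 minutes) waiting on the remote system, while the dictionary version makes a single bulk call.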
What is a dictionary?
In Jitterbit, a dictionary is a special type of global variable array that holds key-value pairs. The steps and functions are:
- Initialize the dictionary using the Dict function.
- Load data as key-value pairs, like '4011', 'Banana', '4061', 'Lettuce', '4063', 'Tomato'. Use the AddToDict function.
- Before looking up a key, check that it exists using the HasKey function.
- Look up data by passing the key (4011) and getting back the value (Banana). The script uses '[ ]' after the dictionary name to address an entry by its key, like:
$d["key"] = "value"
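The four steps above can be modeled in a few lines. This is a Python stand-in rather than Jitterbit Script: a plain `dict` plays the role of the Jitterbit dictionary, and the comments name the Jitterbit function each step corresponds to. Returning an empty string for a missing key is an assumption made for the sketch.

```python
# Python stand-in for the steps above; a plain dict models the Jitterbit dictionary.
produce = {}  # step 1: initialize (Dict)

# Step 2: load key-value pairs (AddToDict)
for code, name in [("4011", "Banana"), ("4061", "Lettuce"), ("4063", "Tomato")]:
    produce[code] = name

def lookup(code):
    """Return the name for a code, or an empty string when the key is absent."""
    if code in produce:        # step 3: existence check (HasKey)
        return produce[code]   # step 4: bracket lookup by key
    return ""                  # assumed fallback for unknown keys

print(lookup("4011"))        # Banana
print(repr(lookup("9999")))  # ''
```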
A dictionary fits our scenario because one operation can get data from the source and load the dictionary, making it available to other operations for lookups. An initial operation can load the dictionary with 10,000 records and a later operation can quickly pass a key to the dictionary and get a value.
A number of things to keep in mind with dictionaries:
- The scope of a dictionary is limited to the instance of the chain of operations. For example, if operation A loads $MyDict with 10,000 records, only those operations that are linked using Success or Failure paths, or with RunOperation(), will have access to that dictionary. However, if an operation uses chunking and threading and has a transformation that populates a dictionary, the dictionary will be inconsistent. This is because Jitterbit does not take the values assigned to variables by multiple operation threads and combine them into a single value set. This is true for all global variables and arrays. Use the default chunking/threading values when building an operation that populates dictionaries.
- Dictionaries, because they use a binary search, are very fast at finding keys and returning values. A binary search halves the candidate range with each comparison, so a key among 10,000 entries is found in at most 14 comparisons. In contrast, looping through a 10,000-record array to find a key takes about 5,000 comparisons on average.
- Dictionaries are not written to memory, so they will not materially impact available server memory for processing.
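To make the speed claim concrete, the sketch below counts the probes needed by a binary search and by a linear scan over 10,000 sorted keys. It is a generic Python illustration that assumes, as the text states, a binary search over sorted keys; it does not reproduce Jitterbit's internal implementation, and the key values are invented.

```python
def binary_search_probes(sorted_keys, target):
    """Return how many midpoints are inspected before target is found."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes  # target absent

# 10,000 zero-padded keys, already in sorted order
keys = [f"{i:05d}" for i in range(10000)]

# Worst case for binary search across every key in the set
worst_binary = max(binary_search_probes(keys, k) for k in keys)

# A linear scan must inspect every element to reach the last key
linear_probes_last = keys.index(keys[-1]) + 1

print(worst_binary)        # 14
print(linear_probes_last)  # 10000
```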
The customer has two tables, one with product information and the other with product categories, both of which are needed to update a third table. The source is a view on a data warehouse, which is optimized to provide data in bulk, but not for rapid lookups. Using the DBLookup function for thousands of records would be quite slow. The customer also has a CSV file that contains information used to filter out data from the source.
PWE.01 Get LR List
- This reads an external file into temporary storage.
PWE.02 Set Product Dict
- A script initializes the dictionary.
- The operation reads from a temporary source file.
- The transformation loads the values into a dictionary.
- Note here that the value is actually 2 values, separated by a '|'. The alternative would be to create 2 dictionaries, one for Flag and another for Launch_Release_Date, which would be unnecessary complexity.
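The packing of two values into one dictionary entry can be sketched as follows. This is a Python stand-in for the Jitterbit transformation: the field names Flag and Launch_Release_Date come from the text, while the `Product` key field and the sample rows are assumptions.

```python
# Hypothetical rows standing in for the temporary file read by the operation.
rows = [
    {"Product": "P100", "Flag": "Y", "Launch_Release_Date": "2019-03-01"},
    {"Product": "P200", "Flag": "N", "Launch_Release_Date": "2019-04-15"},
]

product_dict = {}  # initialized once, before the transformation runs
for row in rows:
    # Pack both values into a single string, separated by '|'
    product_dict[row["Product"]] = row["Flag"] + "|" + row["Launch_Release_Date"]

print(product_dict["P100"])  # Y|2019-03-01
```

This packing only works if neither value can itself contain the '|' character; otherwise a different separator is needed.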
PWE.03 Query Product from Teradata
- The transformation has a condition that filters out products not in the CSV file and also assigns values to variables that are used in the transformation.
- Note that we are splitting out (with a '|') what was loaded in the previous operation and loading into $Product.Flag and $Product.ReleaseDate.
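The filter-and-split step can be modeled like this. It is a Python sketch, not the Jitterbit condition script: the dictionary contents and source rows are invented, and the local names `flag` and `release_date` stand in for $Product.Flag and $Product.ReleaseDate.

```python
# Dictionary as it might look after the previous operation (sample values).
product_dict = {"P100": "Y|2019-03-01", "P200": "N|2019-04-15"}

# Hypothetical rows returned by the product query.
source_rows = [{"Product": "P100"}, {"Product": "P999"}, {"Product": "P200"}]

output = []
for row in source_rows:
    key = row["Product"]
    if key not in product_dict:
        # Condition: skip products that were not in the CSV-derived dictionary
        continue
    # Split the packed value back into its two parts
    flag, release_date = product_dict[key].split("|")
    output.append({"Product": key, "Flag": flag, "ReleaseDate": release_date})

print(len(output))  # 2
print(output[0])    # {'Product': 'P100', 'Flag': 'Y', 'ReleaseDate': '2019-03-01'}
```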
PWE.04 Product Categories Query
- This is a bulk SFDC query (not SOAP), which does not impact the customer's API limits. It loads data into a local temporary file.
PWE.05 Product Category Dict
- This loads a dictionary with the Code as the key and the SFDC data as the value.
PWE.06 Process Style
- The dictionary is initialized.
- The transformation looks up the ID in the Product_Category field.
- Note the use of HasKey to check if the key exists, instead of just ...
- If a key is passed that does not exist in the dictionary, an error is thrown, since we are trying to look up a non-existent value.
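The reason for the existence check can be shown with a small model. Python's `dict` behaves analogously here: reading a missing key raises an error, while checking first avoids it. The category codes and IDs are invented, and returning an empty string for an unknown code is an assumption, since the source does not say what the script returns in that case.

```python
category_ids = {"SHOES": "a0X001", "HATS": "a0X002"}  # sample codes and IDs

def category_id(code):
    """Guarded lookup: check for the key first (the HasKey step)."""
    if code in category_ids:
        return category_ids[code]
    return ""  # assumed fallback for unknown codes

# Unguarded lookup of a missing key raises an error
try:
    category_ids["GLOVES"]
    raised = False
except KeyError:
    raised = True

print(category_id("SHOES"))         # a0X001
print(repr(category_id("GLOVES")))  # ''
print(raised)                       # True
```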
PWE.07 Bulk Upsert Styles
- Finally, the file is bulk loaded to SFDC.
In this example, we process an XML response, extracting the SKUs and values into a dictionary for use in a later operation.
- This script is the second-to-last post-operation script:
- This script does a number of things, so these comments refer to the actions taken relative to dictionaries.
- The dictionary DCL.LineItem is initialized.
- Local variables 'mysku' and 'myquantity' are populated from the XML source.
- DCL.LineItem is filled in using 'mysku' as the key, and 'myquantity' as the value.
- The dictionary is used in a later operation, F.5.2.4.
- The transformation displays the mapping to Shipped Qty.
- ShippedQty script uses the source field as a key to retrieve the shipped quantity and fill in the value.
- Note that we check (using HasKey) if the key exists or not.
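The overall flow of this example (fill a dictionary from an XML response, then read it in a later step) can be sketched in Python. The XML layout, element names, and sample values are assumptions, since the original response is not shown. The names `mysku`, `myquantity`, and the dictionary's role follow the text, and the empty-string fallback is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; the real XML structure is not shown in the source.
RESPONSE = """
<Shipment>
  <LineItem><SKU>A-100</SKU><Quantity>3</Quantity></LineItem>
  <LineItem><SKU>B-200</SKU><Quantity>7</Quantity></LineItem>
</Shipment>
"""

line_item = {}  # models the DCL.LineItem dictionary
for item in ET.fromstring(RESPONSE.strip()).findall("LineItem"):
    mysku = item.findtext("SKU")            # key taken from the XML source
    myquantity = item.findtext("Quantity")  # value taken from the XML source
    line_item[mysku] = myquantity

def shipped_qty(source_sku):
    """Later operation: use the source field as the key, checking it exists first."""
    if source_sku in line_item:
        return line_item[source_sku]
    return ""  # assumed fallback when the SKU was not in the response

print(shipped_qty("A-100"))        # 3
print(repr(shipped_qty("Z-999")))  # ''
```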
Last updated: Apr 19, 2019