Filter Duplicate Records in a Source File
If your source records may contain duplicates and you want to process only the first occurrence of each record, you can filter out the later duplicates using scripts and a condition. This process is also referred to as deduplication.
Once you have your Transformation in place, identify the field you want to use to check for duplicates.
The field Email was selected for this example.
Double-click this field on the target side to display the Formula Builder box:
In the upper left (Script) section, input these lines, as in the image above:
This script creates a dynamic variable whose name is the value in the field Email, sets the value of that variable to "1", and then returns the value of the field Email.
Click OK to save.
Once you have completed the step above, right-click on _flat_ below the target.
Select Add Condition. This adds another item at the very top of your target fields called Condition.
Double-click this field to display the Formula Builder box.
Input this formula into the script section:
If(Get(Email)==1, False, True)
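The Get function uses the value in the field Email as a variable name to retrieve the dynamic variable created in the previous steps, and the formula checks whether that variable has a value of 1. If it does, an earlier record with the same Email has already set it, so the condition returns False and the record is skipped; otherwise, the condition returns True and the insertion continues.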
Click OK to save.
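For readers who want to see the first-occurrence-wins logic outside Jitterbit, the sketch below models it in Python. It is an illustration only, not Jitterbit Script: the dictionary `variables` stands in for the dynamic variables, the function name `filter_first_occurrences` and the sample records are hypothetical, and it assumes the condition is evaluated for each record before the Email mapping sets the variable.

```python
from typing import Dict, Iterable, Iterator


def filter_first_occurrences(records: Iterable[Dict[str, str]],
                             key_field: str = "Email") -> Iterator[Dict[str, str]]:
    """Yield only the first record seen for each value of key_field."""
    # Stand-in for the dynamic variables: variable name -> value.
    variables: Dict[str, int] = {}
    for record in records:
        key = record[key_field]
        # Condition step, modeled on If(Get(Email)==1, False, True):
        # a variable already set to 1 means this value was seen before.
        if variables.get(key) == 1:
            continue  # duplicate: skip the record
        # Mapping step: create a variable named after the field value,
        # set it to 1, and pass the record through unchanged.
        variables[key] = 1
        yield record


# Hypothetical sample data: the third record repeats the first Email.
rows = [
    {"Email": "a@example.com", "Name": "Ann"},
    {"Email": "b@example.com", "Name": "Bob"},
    {"Email": "a@example.com", "Name": "Ann (duplicate)"},
]
kept = list(filter_first_occurrences(rows))
print([r["Name"] for r in kept])  # ['Ann', 'Bob']
```

In this sketch the comparison is an exact string match, so values that differ only in case or surrounding whitespace count as different records.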
Our completed sample transformation should look like this:
Once you have completed the steps outlined above, you will be able to run your transformation, and the duplicates will not be processed.