When you define a schema, some data may be converted during processing. This page describes those conversions.
Fields with Spaces
Spaces included as part of a field name are replaced with an underscore _.
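As a minimal illustration of this rule, the Python sketch below replaces each space with an underscore. The function name replace_spaces is hypothetical, and the sketch assumes every individual space is replaced (rather than runs of spaces being collapsed), which this page does not specify.

```python
def replace_spaces(field_name: str) -> str:
    """Replace every space in a field name with an underscore.

    Assumption: each space maps to exactly one underscore.
    """
    return field_name.replace(" ", "_")


print(replace_spaces("First Name"))  # First_Name
```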
CSV with Headers
If you provide a CSV file with a header row, the following rules are applied to generate column names:
Special characters are replaced with a question mark ?.
Spaces are replaced with an underscore _.
If a column name is blank, it is replaced with f1, f2, f3, and so on.
If a column name starts with a number, it is prefixed with an underscore _.
If a column name is repeated, each repeated occurrence has 2, 3, 4, and so on appended to it.
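The rules above can be sketched in Python as follows. This is an illustration only, not the product's implementation: the function name normalize_headers is hypothetical, "special character" is assumed to mean anything other than ASCII letters, digits, underscores, and spaces, blank names are assumed to be numbered in the order they are encountered, and the order in which the rules are applied is also an assumption.

```python
import re


def normalize_headers(headers):
    """Apply the CSV header rules to a list of raw header strings."""
    seen = {}        # generated name -> number of times it has appeared so far
    blank_count = 0  # how many blank headers have been encountered
    result = []
    for raw in headers:
        name = raw.strip()
        if not name:
            # Assumption: blanks are numbered in order of appearance (f1, f2, ...).
            blank_count += 1
            name = f"f{blank_count}"
        else:
            # Assumption: "special" means not an ASCII letter, digit, underscore, or space.
            name = re.sub(r"[^A-Za-z0-9_ ]", "?", name)
            name = name.replace(" ", "_")
            if name[0].isdigit():
                name = "_" + name
        # The first occurrence keeps its name; later ones get 2, 3, 4, ...
        count = seen.get(name, 0) + 1
        seen[name] = count
        result.append(name if count == 1 else f"{name}{count}")
    return result


print(normalize_headers(["Order ID", "", "2nd Line", "Price($)", "Order ID"]))
# ['Order_ID', 'f1', '_2nd_Line', 'Price???', 'Order_ID2']
```

The sketch does not handle the case where a generated name such as Order_ID2 collides with a header that is literally named Order_ID2; how the product resolves that case is not stated here.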
Unlimited-Precision Data Types
Unlimited-precision data types, such as XML decimal, are converted to the double data type. Because a double has limited precision, this conversion can truncate data.
The precision limit is the range between the minimum and maximum values of a signed long, –2,147,483,648 to 2,147,483,647 (the range of a 32-bit signed integer). If your values fall outside this range, consider using a string data type instead.
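To illustrate why a string may be the safer choice, the Python sketch below converts an arbitrary-precision decimal to a double and checks it against the range stated above. It relies on Python's float being an IEEE 754 double; the helper names are hypothetical, and the product's actual conversion may behave differently.

```python
from decimal import Decimal

# Range stated on this page for the precision limit.
RANGE_MIN = -2_147_483_648
RANGE_MAX = 2_147_483_647


def within_documented_range(value: Decimal) -> bool:
    """Return True if the value lies inside the documented range."""
    return RANGE_MIN <= value <= RANGE_MAX


def survives_double(value: Decimal) -> bool:
    """Return True if converting to a double and back loses nothing."""
    return Decimal(float(value)) == value


big = Decimal("12345678901234567890.123456789")
print(within_documented_range(big))  # False
print(survives_double(big))          # False: digits are lost in the double

small = Decimal("42.5")
print(within_documented_range(small))  # True
print(survives_double(small))          # True
```

A value that fails either check is a candidate for the string data type, which preserves the original digits verbatim.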
Non-Primitive Data Types
The primitive data types (string, integer, long, date, float, double, and boolean) are fully supported. When you create a new custom flat or hierarchical schema, or edit any schema, these data types are available in the Type dropdown by default. For new custom schemas, non-primitive data types such as datetime are not supported.
However, non-primitive data types are supported in schemas that were automatically generated from a server-based connector or mirrored from such a schema. After they are generated, these schemas can also be edited manually in the custom schema editor. If the schema contains any non-primitive data types, those types are also listed in the Type dropdown when you edit it.
Fields that have a null value are included in the resulting data schema despite having no data. As they also have no defined data type, these fields are treated as having a string data type.
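The null-value behavior can be pictured with the small Python sketch below, which infers a type name for each field of a sample record and falls back to string when the value is null. The functions infer_field_type and infer_schema and the mapping from Python types to schema type names are hypothetical and purely illustrative.

```python
def infer_field_type(value):
    """Return an illustrative schema type name for a sample value."""
    if value is None:
        # A null carries no type information, so the field defaults to string.
        return "string"
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(record):
    """Keep every field, including null ones, and assign each a type."""
    return {field: infer_field_type(value) for field, value in record.items()}


print(infer_schema({"id": 7, "nickname": None, "active": True}))
# {'id': 'integer', 'nickname': 'string', 'active': 'boolean'}
```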