Getting Started
These notes are designed for users new to YAML mappings. Where appropriate, links will be added to more specific YAML documentation.
These mappings configure the Extract, Transform, Load (ETL) mapper within NdrImport. In its simplest form, the purpose of a mapping is to take the raw incoming data and identify it with the correct canonical name in the rawtext and where applicable map the value to a field, whilst also performing any cleaning/transformation of the data on import.
It’s important to note here that the data that goes into the rawtext remains untouched and is exactly as the data was loaded, any transformation/cleaning of data only takes place when it is mapped into a field. The raw source data remains unchanged in the rawtext so that it can be referenced when going back to data provider with any queries/QA issues. Furthermore, the rawtext maintains the transparency of any incoming data. This allows for internal audits to assess if any errors are the artifact of internal system processing or if the data was submitted containing these errors.
Example Data: Note: Real data typically contains many more fields than shown in these examples, however the rules described here are transferable where applicable.
Prov_code | Surname | Forename | dob | addr1 | addr2 | addr3 | postcode | adm_method |
---|---|---|---|---|---|---|---|---|
RGT01 | Mouse | Mickey | 31/12/1950 | 20 London Road | Disney World | London | AB12 3CD | 01 |
RGT01 | Simpson | Homer | 01/01/1970 | 742 Evergreen Terrace | Springfield | London | 01 |
The first step is to create a list of lower case column headings: Note: It is advised to seed this from the header row of the data to avoid any typing errors.
- column: prov_code
- column: surname
- column: forename
- column: dob
- column: addr1
- column: addr2
- column: addr3
- column: postcode
- column: adm_method
We will now take each of these columns in turn and label them with the appropriate canonical name for the rawtext and where applicable map the value to a field, whilst doing any cleaning or joining of fields. It is advised that you maintain a list of approved canonical names for each source type. NDR canonical names are maintained within their private ticketing system wiki.
Where the incoming column name differs from the of canonical rawtext_name, the rawtext_name needs to be defined in the mapping so that the data is labelled correctly in the rawtext. Looking at the first column prov_code, let’s assume that this differs from the canonical name providercode. Therefore we’ll define the rawtext_name:
- column: prov_code
rawtext_name: providercode
In this example, let’s assume that prov_code needs to be mapped to the field providercode:
- column: prov_code
rawtext_name: providercode
mappings:
- field: providercode
Moving onto the second column, surname, let’s assume that the incoming column name is the same as the canonical name therefore we do not need to define the rawtext_name. However we do need to map the column to the field surname:
- column: surname
mappings:
- field: surname
clean: :name
You’ll notice I have also added ‘clean: :name’ to the surname mapping. There is a list of Inbuilt Cleaning Methods which are used to clean data when it is mapped to a field.
For a selection of data items, this is very common behavior and so the Standard YAML mappings syntax was created. In this particular instance, the surname mapping can be simply defined as:
- standard_mapping: surname
Let’s assume that the forename column differs from the canonical name forenames, so we need to define the rawtext_name in the mapping. However, the forenames field benefits from the functionality of the Standard YAML mappings. So instead of having to use the below mapping for the column forename:
- column: forename
rawtext_name: forenames
mappings:
- field: forenames
clean: :name
We can overide the standard mapping’s column definition, using:
- standard_mapping: forenames
column: forename
Again, let’s assume that the dob column also differs from the canonical name dateofbirth. So the rawtext_name will be defined in the mapping, as well as the field name. If date is loaded as a string (ie not a Date formatted Excel cell), the format of the incoming date also needs to be defined:
- column: dob
rawtext_name: dateofbirth
mappings:
- field: dateofbirth
format: %d/%m/%Y
For further information regarding date formats, please visit Date Formats.
In the example data, the address information is sent in a structured format over 3 separate columns, addr1, addr2 and addr3. In order to map this data to the address field, we will need to join the 3 columns.
Firstly, let’s assume that the incoming column names do not match the canonical name. We use the multiples convention shown below. We then need to define the field the data is being mapped to, the order in which each column appears in that field and then how we would like to join the data.
For example:
- column: addr1
rawtext_name: address1
mappings:
- field: address
order: 1
join: ', '
- column: addr2
rawtext_name: address2
mappings:
- field: address
order: 2
- column: addr3
rawtext_name: address3
mappings:
- field: address
order: 3
Here the we have used the rawtext_name multiples convention for rawtext_name: address (address1..N), specified the order in which we would like each of the columns to appear in the field and then joined the data using ‘, ‘. Note: join: only needs to be defined in the first of the columns being joined. The above mapping would result in the address below for the first example record:
20 London Road, Disney World, London
The penultimate column to map is the postcode column. Let’s assume that the incoming column name is the same as the canonical name and that there is a mapped field postcode which benefits from the Standard YAML mappings functionality.
Therefore, the mapping is:
- standard_mapping: postcode
Finally, the last column to map is adm_method, again let’s assume that the incoming column name is different to the canonical name admissionmethod, therefore we need to define the rawtext_name. There is no mapped field for this data item, so the mapping will simply be:
- column: adm_method
rawtext_name: admissionmethod
This is the complete mapping for the example data:
- column: prov_code
rawtext_name: providercode
mappings:
- field: providercode
- standard_mapping: surname
- standard_mapping: forenames
column: forename
- column: dob
rawtext_name: dateofbirth
mappings:
- field: dateofbirth
format: %d/%m/%Y
- column: addr1
rawtext_name: address1
mappings:
- field: address
order: 1
join: ', '
- column: addr2
rawtext_name: address2
mappings:
- field: address
order: 2
- column: addr3
rawtext_name: address3
mappings:
- field: address
order: 3
- standard_mapping: postcode
- column: adm_method
rawtext_name: admissionmethod
Common Gotchas
- Not labelling the correct rawtext_name. Please ensure that the canonical names are used exactly, if you receive a column that isn’t listed in the canonical names, please get in touch with your canonical name list maintainer.
- Maintain the indentation of the mapping