XML mappings
Introduction
Extensible Markup Language (XML) is a markup language for encoding structured data. ndr_import allows XML data to be mapped in the same way as tabular data.
XML can contain repeating data items and sections. Because XML allows unlimited repetition, a mapping that defined a column for every occurrence would be verbose, and it would be very hard to account for every column in advance.
NdrImport::Xml::Table only requires each column to appear in the mapping once. It identifies any repeating data item or section xpaths that the mapping has not accounted for and creates the appropriate column mappings in memory.
The logic covers all current use cases; additional features may be needed if more use cases are identified.
NdrImport::Xml::Table
NdrImport::Xml::Table requires some additional configuration so that the “records” are correctly identified.
- `format`: this should always be `xml_table` so that ndr_import knows which handler to use
- `xml_record_xpath`: the xpath, relative to the root, that indicates the start of a new record
- `pattern_match_record_xpath`: setting this to `true` treats `xml_record_xpath` as a regular expression; the default is to treat it as a string
- `slurp`: setting this to `true` ensures the data is slurped (read into memory in full); the default is to stream the XML
- `yield_xml_record`: setting this to `true` yields all “klasses” created from a single XML record (identified by `xml_record_xpath`) together; the default is to yield per klass
- `xml_file_metadata`: see XML file metadata
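The sketch below is plain Ruby, not ndr_import code, and roughly illustrates the two `yield_xml_record` modes. The `each_output` method and the shape of `records` are hypothetical and chosen only to show the grouping. Here `records` is an array of XML records, and each record is an array of klass hashes.

```ruby
# Hypothetical illustration of the two yield_xml_record modes; not ndr_import code.
# `records` is assumed to be an array of XML records, each holding the klass
# hashes built from one match of xml_record_xpath.
def each_output(records, yield_xml_record: false)
  records.each do |klasses|
    if yield_xml_record
      yield klasses                        # one yield per XML record
    else
      klasses.each { |klass| yield klass } # default: one yield per klass
    end
  end
end

records = [
  [{ 'klass' => 'SomeTestKlass' }, { 'klass' => 'AnotherKlass' }], # first XML record
  [{ 'klass' => 'SomeTestKlass' }]                                 # second XML record
]

per_klass = []
each_output(records) { |item| per_klass << item }

per_record = []
each_output(records, yield_xml_record: true) { |item| per_record << item }

puts per_klass.size  # => 3
puts per_record.size # => 2
```

Grouping per record lets a consumer see every klass built from one XML record at once. The default hands over each klass on its own.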
NdrImport::Xml::Table example:
Given the below example data:
```xml
<root>
  <records>
    <record_1>
      <data_item>value</data_item>
      <another_data_item>Another value</another_data_item>
    </record_1>
    <record_2>
      <data_item>value</data_item>
      <another_data_item>Another value</another_data_item>
    </record_2>
  </records>
</root>
```
The NdrImport::Xml::Table mapping might look like:
```yaml
- !ruby/object:NdrImport::Xml::Table
  filename_pattern: !ruby/regexp //
  format: xml_table
  xml_record_xpath: 'records\/record_\d+'
  pattern_match_record_xpath: true
  slurp: false
  yield_xml_record: false
  columns:
    ...
```
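The following sketch shows how the regular expression in this mapping picks out records. It walks the example data with Ruby's bundled REXML library and keeps every element whose path from the root matches `records\/record_\d+`. This is an illustration, not ndr_import's implementation. The helpers `path_from_root` and `each_descendant` are hypothetical, and anchoring the pattern with `\A` and `\z` is an assumption of the sketch.

```ruby
require 'rexml/document'

# The example data from above
xml = <<~XML
  <root>
    <records>
      <record_1>
        <data_item>value</data_item>
        <another_data_item>Another value</another_data_item>
      </record_1>
      <record_2>
        <data_item>value</data_item>
        <another_data_item>Another value</another_data_item>
      </record_2>
    </records>
  </root>
XML

# Hypothetical helper: the element's path from the root element, e.g. "records/record_1"
def path_from_root(element, root)
  parts = []
  node = element
  until node.equal?(root)
    parts.unshift(node.name)
    node = node.parent
  end
  parts.join('/')
end

# Hypothetical helper: visit every descendant element in document order
def each_descendant(element, &block)
  element.elements.each do |child|
    block.call(child)
    each_descendant(child, &block)
  end
end

# xml_record_xpath from the mapping, treated as a regular expression because
# pattern_match_record_xpath is true. The \A ... \z anchors are an assumption.
record_pattern = Regexp.new('\A' + 'records\/record_\d+' + '\z')

root = REXML::Document.new(xml).root
record_paths = []
each_descendant(root) do |element|
  path = path_from_root(element, root)
  record_paths << path if record_pattern.match?(path)
end

p record_paths # => ["records/record_1", "records/record_2"]
```

If `pattern_match_record_xpath` were left at its default, the xpath would be compared as a plain string. In that case a literal `records\/record_\d+` would match neither record.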
Column mappings:
The `column` value should be the data item's node name.
In addition to the normal column mapping keys (`rawtext_name` etc.), each column also defines `xml_cell`. This is a hash of configuration that ndr_import uses to find the data, identify whether the data item repeats, and act accordingly.
xml_cell contains:
- `relative_path`: the path from the `xml_record_xpath` to the data item
- `attribute`: the attribute that holds the value, if there is one, e.g. `extension`, `code`
- `multiple`: does this data item appear more than once within a klass? If set to `true`, additional column mappings will be added with a `_1`, `_2` etc. suffix on the `rawtext_name`
- `increment_field_name`: similar to `multiple` above, but adds a suffix to each mapped field when set to `true`. This should only be set to `true` if the column has mapped fields
- `build_new_record`: this is only needed where column-level klasses are defined. Set it to `false` if you do not want an additional klass added to the masked mappings. This might be where you have a repeating item within a klass but expect only one instance of that klass within a “record”, as defined by `xml_record_xpath`
- `klass_section`: the relative path from `xml_record_xpath` to the section that would trigger a new klass. If a data item is flagged as `multiple`, the number of times `klass_section` appears determines whether the klass should have a `#1`, `#2` etc. suffix
Column examples:
```xml
<root>
  <records>
    <record_1>
      <section_1>
        <part_1>
          <repeating_item code="value" />
          <repeating_item code="value" />
          <another_data_item>Another value</another_data_item>
        </part_1>
      </section_1>
    </record_1>
  </records>
</root>
```
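The lookup described by `relative_path` and `attribute` can be sketched with REXML's XPath support. `cell_values` is a hypothetical helper and is not part of ndr_import. It finds every node named after the column under `relative_path`, then reads either the given attribute or the node's text. When a column returns more than one value, that is the repetition `multiple` is intended to handle.

```ruby
require 'rexml/document'

# The column example record from above
record = REXML::Document.new(<<~XML).root
  <record_1>
    <section_1>
      <part_1>
        <repeating_item code="value" />
        <repeating_item code="value" />
        <another_data_item>Another value</another_data_item>
      </part_1>
    </section_1>
  </record_1>
XML

# Hypothetical helper, not ndr_import's API: collect the value of every node
# named `column` under `relative_path`, reading the attribute if one is given
# and the node's text otherwise.
def cell_values(record, column, relative_path:, attribute: nil)
  nodes = REXML::XPath.match(record, "#{relative_path}/#{column}")
  nodes.map { |node| attribute ? node.attributes[attribute] : node.text }
end

single = cell_values(record, 'another_data_item', relative_path: 'section_1/part_1')
repeating = cell_values(record, 'repeating_item',
                        relative_path: 'section_1/part_1', attribute: 'code')

p single    # => ["Another value"]
p repeating # => ["value", "value"]
```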
The examples below assume the NdrImport::Xml::Table mapping above.
Example mapping for another_data_item, which is a single, non-repeating data item:
```yaml
- column: another_data_item
  klass: SomeTestKlass
  rawtext_name: blah
  xml_cell:
    id:
    relative_path: section_1/part_1
    attribute:
```
Example mapping for repeating_item:
```yaml
- column: repeating_item
  klass: SomeTestKlass
  rawtext_name: blah
  xml_cell:
    id:
    relative_path: section_1/part_1
    attribute: code
    multiple: true
    increment_field_name: true
    build_new_record: false
```
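The sketch below shows roughly what `multiple` and `increment_field_name` do. It expands the mapping above into one in-memory column mapping per occurrence. This is a hypothetical illustration, not ndr_import's implementation, and it makes several assumptions:

- every occurrence, including the first, is numbered from `_1`
- a single occurrence is left unsuffixed
- mapped fields use a `mappings`/`field` shape, with an invented field name `code_field`

The sketch does not model `build_new_record: false`; here it simply means that all expanded columns stay in the one `SomeTestKlass`.

```ruby
# Hypothetical sketch of expanding a `multiple: true` column mapping into one
# mapping per occurrence. Not ndr_import code; the suffix numbering is assumed.
def expand_column(column, occurrences)
  cell = column.fetch('xml_cell', {})
  return [column] unless cell['multiple'] && occurrences > 1

  (1..occurrences).map do |i|
    # Suffix the rawtext_name of every generated column
    expanded = column.merge('rawtext_name' => "#{column['rawtext_name']}_#{i}")
    if cell['increment_field_name']
      # Suffix each mapped field too, so the occurrences do not overwrite each other
      expanded['mappings'] = column.fetch('mappings', []).map do |mapping|
        mapping.merge('field' => "#{mapping['field']}_#{i}")
      end
    end
    expanded
  end
end

column = {
  'column' => 'repeating_item',
  'klass' => 'SomeTestKlass',
  'rawtext_name' => 'blah',
  'mappings' => [{ 'field' => 'code_field' }], # hypothetical mapped field
  'xml_cell' => {
    'relative_path' => 'section_1/part_1',
    'attribute' => 'code',
    'multiple' => true,
    'increment_field_name' => true,
    'build_new_record' => false
  }
}

# repeating_item appears twice in the example record
expanded = expand_column(column, 2)
p expanded.map { |c| c['rawtext_name'] }            # => ["blah_1", "blah_2"]
p expanded.map { |c| c['mappings'].first['field'] } # => ["code_field_1", "code_field_2"]
```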
If `repeating_item` were expected in many klasses, with each `section_1` section triggering a new klass, the mapping would look like:
```yaml
- column: repeating_item
  klass: SomeTestKlass
  rawtext_name: blah
  xml_cell:
    id:
    relative_path: section_1/part_1
    attribute: code
    multiple: true
    increment_field_name: true
    klass_section: section_1
```
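The following sketch suggests how `klass_section` might drive klass numbering. The record below is invented so that `section_1` appears twice, whereas the example data above has only one. `klass_names` is a hypothetical helper. Leaving the klass name unsuffixed when the section appears only once is an assumption drawn from the `klass_section` description, not confirmed ndr_import logic.

```ruby
require 'rexml/document'

# Invented record in which the klass_section (section_1) appears twice
record = REXML::Document.new(<<~XML).root
  <record_1>
    <section_1>
      <part_1><repeating_item code="a" /></part_1>
    </section_1>
    <section_1>
      <part_1><repeating_item code="b" /></part_1>
    </section_1>
  </record_1>
XML

# Hypothetical helper: count how often klass_section occurs in the record and
# number the klass once per occurrence; one (or zero) occurrences keep the plain name.
def klass_names(record, klass, klass_section)
  count = REXML::XPath.match(record, klass_section).size
  return [klass] if count <= 1

  (1..count).map { |i| "#{klass}##{i}" }
end

p klass_names(record, 'SomeTestKlass', 'section_1') # => ["SomeTestKlass#1", "SomeTestKlass#2"]
```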
Cheat sheet
| Scenario | multiple | increment_field_name | build_new_record | klass_section |
|---|---|---|---|---|
| Single data item, non repeating | false or omit key | false or omit key | false or omit key | omit key |
| Repeating data item, single klass expected | true | true (if mapped fields present) | false | omit key |
| Repeating data item, one or more klasses expected | true | true (if mapped fields present) | omit key | relative path to section |
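The cheat sheet can also be read as a small decision procedure. The hypothetical `xml_cell_for` helper below builds an `xml_cell` hash for each row, leaving out any key the table marks as omitted. It then prints the hash as YAML using Ruby's standard `yaml` library. The scenario symbols and keyword arguments are invented for this sketch and are not part of ndr_import.

```ruby
require 'yaml'

# Hypothetical helper: build an xml_cell hash following the cheat sheet.
# Keys the table says to omit are simply left out of the hash.
def xml_cell_for(scenario, relative_path:, attribute: nil, mapped_fields: false, klass_section: nil)
  cell = { 'relative_path' => relative_path, 'attribute' => attribute }
  case scenario
  when :single
    # Single, non-repeating item: no multiple, increment_field_name,
    # build_new_record or klass_section keys
  when :repeating_single_klass
    cell['multiple'] = true
    cell['increment_field_name'] = true if mapped_fields
    cell['build_new_record'] = false
  when :repeating_many_klasses
    raise ArgumentError, 'klass_section is required' if klass_section.nil?

    cell['multiple'] = true
    cell['increment_field_name'] = true if mapped_fields
    cell['klass_section'] = klass_section
  else
    raise ArgumentError, "unknown scenario: #{scenario}"
  end
  cell
end

cell = xml_cell_for(:repeating_many_klasses,
                    relative_path: 'section_1/part_1',
                    attribute: 'code',
                    mapped_fields: true,
                    klass_section: 'section_1')

# Prints an xml_cell block with the same keys as the last column example
# (apart from the id: key, which the cheat sheet does not cover)
puts({ 'xml_cell' => cell }.to_yaml)
```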