XML mappings
Introduction
Extensible Markup Language (XML) is a markup language that provides rules to define any data. ndr_import
allows that data to be mapped in the same way as tabular data.
XML can contain repeating data items/sections, which the column mappings would need to verbosely define. XML allows for unlimited repetition, so it would be very hard to have all columns accounted for.
NdrImport::Xml::Table
only requires each column to appear in the mapping once. It will identify any repeating data item/section xpaths that haven’t been accounted for and create appropriate column mappings in memory.
The logic covers all current use cases; additional features may be needed if more use cases are identified.
NdrImport::Xml::Table
NdrImport::Xml::Table
requires some additional configuration so that the “records” are correctly identified.
format
- this should always bexml_table
so that ndr_import knows which handler to usexml_record_xpath
- this is the xpath - relative to the root - that indicates the start of a new recordpattern_match_record_xpath
- setting thistrue
treats thexml_record_xpath
as a regular expression; the default is to treat it as a stringslurp
- setting this totrue
will ensure the data is slurped; the default is to stream the XMLyield_xml_record
- setting this to true will yield all “klasses” created from a single XML record (identified byxml_record_xpath
); the default is to yield per klassxml_file_metadata
- See xml file metadata
NdrImport::Xml::Table
example:
Given the below example data:
<root>
<records>
<record_1>
<data_item>value</data_item>
<another_data_item>Another value</another_data_item>
</record_1>
<record_2>
<data_item>value</data_item>
<another_data_item>Another value</another_data_item>
</record_2>
</records>
</root>
The NdrImport::Xml::Table
mapping might look like:
- !ruby/object:NdrImport::Xml::Table
filename_pattern: !ruby/regexp //
format: xml_table
xml_record_xpath: 'records\/record_\d+'
pattern_match_record_xpath: true
slurp: false
yield_xml_record: false
columns:
...
Column mappings:
The column
should be the data item node name.
Outside of the normal column mappings (rawtext_name etc), columns will also define xml_cell
. This is a hash containing configuration that ndr_import uses to find the data, identify if the data item is repeating, and then act accordingly.
xml_cell
contains:
relative_path
- this is the relative path from thexml_record_xpath
to the data itemattribute
- this is the attribute (if present) e.g. extension, codemultiple
- does this data item appear more than once within a klass? If set to true, additional column mappings will be added with a_1
,_2
etc suffix on the rawtext_nameincrement_field_name
- similar tomultiple
above, but adds a suffix to each mapped field when set to true. This should only be set to true if the column has mapped fieldsbuild_new_record
- this is only needed where column level klasses are defined. This should be set to false if you do not want an additional klass added to the masked mappings. This might be where you have a repeating item within a klass, but you only expect one instance of that klass within a “record”, as defined byxml_record_xpath
klass_section
- this is the relative path fromxml_record_xpath
to the section that would trigger a new klass. If there is a data item flagged asmultiple
, the number of timesklass_section
appears determines if the klass should have a#1
,#2
etc suffix.
Column examples:
<root>
<records>
<record_1>
<section_1>
<part_1>
<repeating_item code=value />
<repeating_item code=value />
<another_data_item>Another value</another_data_item>
</part_1>
</section_1>
</record_1>
</records>
</root>
The below examples assume we’re using the above NdrImport::XmlTable
mapping
Example mapping for another_data_item
which is a single, non-repeating data item:
- column: repeating_item
klass: SomeTestKlass
rawtext_name: blah
xml_cell:
id:
relative_path: section_1/part_1
attribute:
Example mapping for repeating_item
:
- column: repeating_item
klass: SomeTestKlass
rawtext_name: blah
xml_cell:
id:
relative_path: section_1/part_1
attribute: code
multiple: true
increment_field_name: true
build_new_record: false
If the repeating_item
was expected to be in many klasses - where the section_1
section triggered a new klass - the mapping would look like:
- column: repeating_item
klass: SomeTestKlass
rawtext_name: blah
xml_cell:
id:
relative_path: section_1/part_1
attribute: code
multiple: true
increment_field_name: true
klass_section: section_1
Cheat sheet
Scenario | multiple | increment_field_name | build_new_record | klass_section |
---|---|---|---|---|
Single data item, non repeating | false or omit key | false or omit key | false or omit key | omit key |
Repeating data item, single klass expected | true | true (if mapped fields present) | false | omit key |
Repeating data item, one or more klasses expected | true | true (if mapped fields present) | omit key | relative path to section |