VCF File Metadata
Introduction
VCF files contain a header that stores metadata. NdrImport::Vcf::Table
now supports retrieval and storage of that data.
vcf_file_metadata
- NdrImport::Vcf::Table can optionally store vcf_file_metadata. This is a hash of { attribute name => regular expression }.
- The NdrImport::File::Vcf handler uses vcf_file_metadata to locate the metadata from within the file, then sets the file_metadata attribute as a hash of { attribute name => regular expression first captured group }.
- The UniversalImporterHelper then assigns the handler's file_metadata to the NdrImport::Table attribute table_metadata, which can then be accessed downstream.
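The lookup described above can be sketched in plain Ruby. This is a minimal illustration, not the NdrImport implementation: extract_file_metadata is a hypothetical helper, and the file_date and source patterns are made up for the sketch. It takes a hash of { attribute name => regular expression } and returns a hash of { attribute name => first captured group }.

```ruby
# Hypothetical helper (not part of ndr_import): for each attribute, find the
# first line matching its pattern and keep the first captured group.
def extract_file_metadata(lines, patterns)
  patterns.each_with_object({}) do |(attribute, pattern), metadata|
    lines.each do |line|
      match = pattern.match(line.chomp)
      next unless match

      metadata[attribute] = match[1] # first captured group
      break # stop scanning lines once this attribute is found
    end
  end
end

# Header lines in the style of a VCF meta-information section.
header = [
  '##fileDate=2023-03-29',
  '##reference=file:///data/humanGenome/hs37d5.fa',
  '##source=Platypus_Version_0.8.1'
]

# Illustrative patterns; the attribute names are assumptions for this sketch.
patterns = {
  'file_date' => /\A##fileDate=(.+)\z/,
  'source' => /\A##source=(.+)\z/
}

file_metadata = extract_file_metadata(header, patterns)
```

In this sketch an attribute whose pattern matches no line is simply left out of the result; how the real handler treats unmatched patterns is not stated here.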
Example:
Given the following example data:
##contig=<ID=GL000194.1,length=191469>
##contig=<ID=GL000225.1,length=211173>
##contig=<ID=GL000192.1,length=547496>
##contig=<ID=NC_007605,length=171823>
##contig=<ID=hs37d5,length=35477943>
##fileDate=2023-03-29
##reference=file:///data/humanGenome/hs37d5.fa
##source=Platypus_Version_0.8.1
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
1 26387783 . G A 847.77 PASS AC=1;AF=0.500;AN=2;DP=85;set=Intersection GT:AD:DP:GQ:PL:SAC 0/1:52,32:84:99:876,0,1277:21,31,14,18
The NdrImport::Vcf::Table mapping might look like:
- !ruby/object:NdrImport::Vcf::Table
  filename_pattern: !ruby/regexp //
  vcf_file_metadata:
    genome_build: /##reference=file:///data/humanGenome\/(.+)\z/
  columns:
  ...
This would result in a table_metadata value of:
{ genome_build: 'hs37d5.fa' }
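To check a pattern like this one in isolation, it can be run against the ##reference line directly. The sketch below writes the same expression with %r{} so the forward slashes need no escaping. It assumes the line is matched without its trailing newline: \z anchors at the very end of the string, and "." does not match a newline by default, so an unchomped line would not match.

```ruby
# Same expression as in the mapping, written with %r{} to avoid escaping "/".
pattern = %r{##reference=file:///data/humanGenome/(.+)\z}

# The header line as it might be read from a file, newline included.
line = "##reference=file:///data/humanGenome/hs37d5.fa\n"

# \z requires the end of the string, so the trailing newline blocks the match.
with_newline = pattern.match(line)

# After chomp, the first captured group holds the reference file name.
chomped = pattern.match(line.chomp)
genome_build = chomped && chomped[1]
```

If lines may arrive with their newline intact, \Z (which also matches just before a final newline) is the more forgiving anchor.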