mygenomics.cloud - Genomics & ADAM Format — Digging Deeper











Posted on May 23, 2016 (updated December 13, 2016) by pdangi. Posted in ADAM.

ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing is the original reference from UC Berkeley about the ADAM implementation. ADAM files are up to 25% smaller on disk than compressed BAM files. ADAM is implemented on top of Avro and Parquet.

In 2013, Cloudera and Twitter engineers teamed up to open-source a new columnar storage format for Hadoop, called Parquet. Since then, Parquet has gotten very good adoption due to its space and query efficiency.

What's up with the name, Parquet?

par·quet (pärˈkā), noun: 1. flooring composed of wooden blocks arranged in a geometric pattern. 2. (North American) the ground floor of a theater or auditorium.

Parquet is a self-describing data storage format. In self-describing storage formats, the data about the data (the schema) is embedded in the data itself. The schema comprises metadata such as element names, their types, the compression/encoding scheme used (if any), statistics, and a lot more.

Parquet is designed from the ground up with complex nested data structures in mind. For more information on the repetition/definition level approach to encoding such data structures, please refer to Dremel: Interactive Analysis of Web-Scale Datasets.

Parquet is designed to support very efficient compression and encoding schemes, and it allows compression schemes to be specified on a per-column level.
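To make the repetition/definition level idea a little more concrete, here is a minimal Python sketch. It is not part of ADAM or Parquet; the function name max_levels and the hand-written paths are illustrative assumptions. It applies the Dremel rule that every optional or repeated field on a column's path adds one to the maximum definition level, and every repeated field adds one to the maximum repetition level. The paths are transcribed by hand from the Genotype schema shown later in this post, so the printed values can be compared with the R: and D: columns that parquet-tools reports.

```python
from typing import Dict, List, Tuple

OPTIONAL, REQUIRED, REPEATED = "optional", "required", "repeated"


def max_levels(path: List[str]) -> Tuple[int, int]:
    """Return (max repetition level, max definition level) for one column.

    `path` lists the repetition type of every field from the schema root
    down to the leaf column.
    """
    # Only repeated fields can start a new list, so only they count here.
    repetition = sum(1 for kind in path if kind == REPEATED)
    # Optional and repeated fields may be absent, so each one needs a level.
    definition = sum(1 for kind in path if kind in (OPTIONAL, REPEATED))
    return repetition, definition


# Column paths copied by hand from the Genotype schema (an assumption of this
# sketch: a real reader would derive them from the file footer instead).
examples: Dict[str, List[str]] = {
    "variant.svAllele.type": [OPTIONAL, OPTIONAL, OPTIONAL],
    "variantCallingAnnotations.variantFilters.array": [OPTIONAL, REQUIRED, REPEATED],
    "variantCallingAnnotations.attributes.map.key": [OPTIONAL, REQUIRED, REPEATED, REQUIRED],
    "alleles.array": [REQUIRED, REPEATED],
    "sampleId": [OPTIONAL],
}

for column, path in examples.items():
    r, d = max_levels(path)
    print(f"{column}: R:{r} D:{d}")
```

For example, variant.svAllele.type sits under three optional fields and no repeated ones, which gives R:0 D:3, the same values parquet-tools prints for that column.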
The initial code defines the file format, provides Java building blocks for processing columnar data, and implements Hadoop Input/Output Formats, Pig Storers/Loaders, and an example of a complex integration: Input/Output formats that can convert Parquet-stored data directly to and from Thrift objects. For more information, please check the Apache site for Parquet.

I inspected the files generated by vcf2adam using parquet-tools. Here are the outputs:

1) Schema

message org.bdgenomics.formats.avro.Genotype {
  optional group variant {
    optional int32 variantErrorProbability;
    optional binary contigName (UTF8);
    optional int64 start;
    optional int64 end;
    optional binary referenceAllele (UTF8);
    optional binary alternateAllele (UTF8);
    optional group svAllele {
      optional binary type (ENUM);
      optional binary assembly (UTF8);
      optional boolean precise;
      optional int32 startWindow;
      optional int32 endWindow;
    }
    optional boolean isSomatic;
  }
  optional binary contigName (UTF8);
  optional int64 start;
  optional int64 end;
  optional group variantCallingAnnotations {
    optional boolean variantIsPassing;
    required group variantFilters (LIST) {
      repeated binary array (UTF8);
    }
    optional boolean downsampled;
    optional float baseQRankSum;
    optional float fisherStrandBiasPValue;
    optional float rmsMapQ;
    optional int32 mapq0Reads;
    optional float mqRankSum;
    optional float readPositionRankSum;
    required group genotypePriors (LIST) {
      repeated float array;
    }
    required group genotypePosteriors (LIST) {
      repeated float array;
    }
    optional float vqslod;
    optional binary culprit (UTF8);
    required group attributes (MAP) {
      repeated group map (MAP_KEY_VALUE) {
        required binary key (UTF8);
        required binary value (UTF8);
      }
    }
  }
  optional binary sampleId (UTF8);
  optional binary sampleDescription (UTF8);
  optional binary processingDescription (UTF8);
  required group alleles (LIST) {
    repeated binary array (ENUM);
  }
  optional float expectedAlleleDosage;
  optional int32 referenceReadDepth;
  optional int32 alternateReadDepth;
  optional int32 readDepth;
  optional int32 minReadDepth;
  optional int32 genotypeQuality;
  required group genotypeLikelihoods (LIST) {
    repeated float array;
  }
  required group nonReferenceLikelihoods (LIST) {
    repeated float array;
  }
  required group strandBiasComponents (LIST) {
    repeated int32 array;
  }
  optional boolean splitFromMultiAllelic;
  optional boolean isPhased;
  optional int32 phaseSetId;
  optional int32 phaseQuality;
}

META Information

creator: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
extra: parquet.avro.schema = {"type":"record","name":"Genotype","namespace":"org.bdgenomics.formats.avro","fields":[{"name":"variant","type":["null",{"type":"record","name":"Variant","fields":[{"name":"variantErrorProbability","type":["null","int"],"doc":"The Phred scaled error probability of a variant, given the probabilities of\n the variant in a population.","default":null},{"name":"contigName","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"The reference contig that this variant exists on.","default":null},{"name":"start","type":["null","long"],"doc":"The 0-based start position of this variant on the reference contig.","default":null},{"name":"end","type":["null","long"],"doc":"The 0-based, exclusive end position of this variant on the reference contig.","default":null},{"name":"referenceAllele","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"A string describing the reference allele at this site.","default":null},{"name":"alternateAllele","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"A string describing the variant allele at this site. Should be left null if\n the site is a structural variant.","default":null},{"name":"svAllele","type":["null",{"type":"record","name":"StructuralVariant","fields":[{"name":"type","type":["null",{"type":"enum","name":"StructuralVariantType","doc":"Descriptors for the type of a structural variant. 
The most specific descriptor\n should be used, if possible. E.g., duplication should be used instead of\n insertion if the inserted sequence is not novel. Tandem duplication should\n be used instead of duplication if the duplication is known to follow the\n duplicated sequence.","symbols":["DELETION","INSERTION","INVERSION","MOBILE_INSERTION","MOBILE_DELETION","DUPLICATION","TANDEM_DUPLICATION"]}],"doc":"The type of this structural variant.","default":null},{"name":"assembly","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"The URL of the FASTA/NucleotideContig assembly for this structural variant,\n if one is available.","default":null},{"name":"precise","type":["boolean","null"],"doc":"Whether this structural variant call has precise breakpoints or not. Default\n value is true. If the call is imprecise, confidence intervals should be provided.","default":true},{"name":"startWindow","type":["null","int"],"doc":"The size of the confidence window around the start of the structural variant.","default":null},{"name":"endWindow","type":["null","int"],"doc":"The size of the confidence window around the end of the structural variant.","default":null}]}],"doc":"The structural variant at this site, if the alternate allele is a structural\n variant. 
If the site is not a structural variant, this field should be left\n null.","default":null},{"name":"isSomatic","type":["boolean","null"],"doc":"A boolean describing whether this variant call is somatic; in this case, the\n `referenceAllele` will have been observed in another sample.","default":false}]}],"doc":"The variant called at this site.","default":null},{"name":"contigName","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"The reference contig that this genotype's variant exists on.","default":null},{"name":"start","type":["null","long"],"doc":"The 0-based start position of this genotype's variant on the reference contig.","default":null},{"name":"end","type":["null","long"],"doc":"The 0-based, exclusive end position of this genotype's variant on the reference contig.","default":null},{"name":"variantCallingAnnotations","type":["null",{"type":"record","name":"VariantCallingAnnotations","fields":[{"name":"variantIsPassing","type":["null","boolean"],"default":null},{"name":"variantFilters","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}},"default":[]},{"name":"downsampled","type":["null","boolean"],"doc":"True if the reads covering this site were randomly downsampled to reduce coverage.","default":null},{"name":"baseQRankSum","type":["null","float"],"doc":"The Wilcoxon rank-sum test statistic of the base quality scores. The base quality\n scores are separated by whether or not the base supports the reference or the\n alternate allele.","default":null},{"name":"fisherStrandBiasPValue","type":["null","float"],"doc":"The Fisher's exact test score for the strand bias of the reference and alternate\n alleles. Stored as a phred scaled probability. 
Thus, if:\n\n * a = The number of positive strand reads covering the reference allele\n * b = The number of positive strand reads covering the alternate allele\n * c = The number of negative strand reads covering the reference allele\n * d = The number of negative strand reads covering the alternate allele\n\n This value takes the score:\n \n -10 log((a + b)! * (c + d)! * (a + c)! * (b + d)! / (a! b! c! d! n!)\n\n Where n = a + b + c + d.","default":null},{"name":"rmsMapQ","type":["null","float"],"doc":"The root mean square of the mapping qualities of reads covering this site.","default":null},{"name":"mapq0Reads","type":["null","int"],"doc":"The number of reads at this site with mapping quality equal to 0.","default":null},{"name":"mqRankSum","type":["null","float"],"doc":"The Wilcoxon rank-sum test statistic of the mapping quality scores. The mapping\n quality scores are separated by whether or not the read supported the reference or the\n alternate allele.","default":null},{"name":"readPositionRankSum","type":["null","float"],"doc":"The Wilcoxon rank-sum test statistic of the position of the base in the read at this site.\n The positions are separated by whether or not the base supports the reference or the\n alternate allele.","default":null},{"name":"genotypePriors","type":{"type":"array","items":"float"},"doc":"The log scale prior probabilities of the various genotype states at this site.\n The number of elements in this array should be equal to the ploidy at this\n site, plus 1.","default":[]},{"name":"genotypePosteriors","type":{"type":"array","items":"float"},"doc":"The log scaled posterior probabilities of the various genotype states at this site,\n in this sample. The number of elements in this array should be equal to the ploidy at\n this site, plus 1.","default":[]},{"name":"vqslod","type":["null","float"],"doc":"The log-odds ratio of being a true vs. 
false variant under a trained statistical model.\n This model can be a multivariate Gaussian mixture, support vector machine, etc.","default":null},{"name":"culprit","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"If known, the feature that contributed the most to this variant being classified as\n a false variant.","default":null},{"name":"attributes","type":{"type":"map","values":{"type":"string","avro.java.string":"String"},"avro.java.string":"String"},"doc":"Additional feature info that doesn't fit into the standard fields above.\n\n They are all encoded as (string, string) key-value pairs.","default":{}}]}],"doc":"Statistics collected at this site, if available.","default":null},{"name":"sampleId","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"The unique identifier for this sample.","default":null},{"name":"sampleDescription","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"A description of this sample.","default":null},{"name":"processingDescription","type":["null",{"type":"string","avro.java.string":"String"}],"doc":"A string describing the provenance of this sample and the processing applied\n in genotyping this sample.","default":null},{"name":"alleles","type":{"type":"array","items":{"type":"enum","name":"GenotypeAllele","doc":"An enumeration that describes the allele that corresponds to a genotype. Can take\n the following values:\n\n * Ref: The genotype is the reference allele\n * Alt: The genotype is the alternate allele\n * OtherAlt: The genotype is an unspecified other alternate allele. This occurs\n in our schema when we have split a multi-allelic genotype into two genotype\n records.\n * NoCall: The genotype could not be called.","symbols":["Ref","Alt","OtherAlt","NoCall"]}},"doc":"An array describing the genotype called at this site. The length of this\n array is equal to the ploidy of the sample at this site. 
This array may\n reference OtherAlt alleles if this site is multi-allelic in this sample.","default":[]},{"name":"expectedAlleleDosage","type":["null","float"],"doc":"The expected dosage of the alternate allele in this sample.","default":null},{"name":"referenceReadDepth","type":["null","int"],"doc":"The number of reads that show evidence for the reference at this site.\n\n @see alternateReadDepth\n @see readDepth","default":null},{"name":"alternateReadDepth","type":["null","int"],"doc":"The number of reads that show evidence for this alternate allele at this site.\n\n @see referenceReadDepth\n @see readDepth","default":null},{"name":"readDepth","type":["null","int"],"doc":"The total number of reads at this site. May not equal (alternateReadDepth +\n referenceReadDepth) if this site shows evidence of multiple alternate alleles.\n\n @see referenceReadDepth\n @see alternateReadDepth\n\n @note Analogous to VCF's DP.","default":null},{"name":"minReadDepth","type":["null","int"],"doc":"The minimum number of reads seen at this site across samples when joint\n calling variants.\n\n @note Analogous to VCF's MIN_DP.","default":null},{"name":"genotypeQuality","type":["null","int"],"doc":"The phred-scaled probability that we're correct for this genotype call.\n\n @note Analogous to VCF's GQ.","default":null},{"name":"genotypeLikelihoods","type":{"type":"array","items":"float"},"doc":"Log scaled likelihoods that we have n copies of this alternate allele.\n The number of elements in this array should be equal to the ploidy at this\n site, plus 1.\n\n @note Analogous to VCF's PL.","default":[]},{"name":"nonReferenceLikelihoods","type":{"type":"array","items":"float"},"doc":"Log scaled likelihoods that we have n non-reference alleles at this site.\n The number of elements in this array should be equal to the ploidy at this\n site, plus 1.","default":[]},{"name":"strandBiasComponents","type":{"type":"array","items":"int"},"doc":"Component statistics which comprise 
the Fisher's Exact Test to detect strand bias.\n If populated, this element should have length 4.","default":[]},{"name":"splitFromMultiAllelic","type":["boolean","null"],"doc":"We split multi-allelic VCF lines into multiple\n single-alternate records. This bit is set if that happened for this\n record.","default":false},{"name":"isPhased","type":["boolean","null"],"doc":"True if this genotype is phased.\n\n @see phaseSetId\n @see phaseQuality","default":false},{"name":"phaseSetId","type":["null","int"],"doc":"The ID of this phase set, if this genotype is phased. Should only be populated\n if isPhased == true; else should be null.\n\n @see isPhased","default":null},{"name":"phaseQuality","type":["null","int"],"doc":"Phred scaled quality score for the phasing of this genotype, if this genotype\n is phased. Should only be populated if isPhased == true; else should be null.\n\n @see isPhased","default":null}]} file schema: org.bdgenomics.formats.avro.Genotype ---------------------------------------------------------------------------------------------------- variant: OPTIONAL F:8 .variantErrorProbability: OPTIONAL INT32 R:0 D:2 .contigName: OPTIONAL BINARY O:UTF8 R:0 D:2 .start: OPTIONAL INT64 R:0 D:2 .end: OPTIONAL INT64 R:0 D:2 .referenceAllele: OPTIONAL BINARY O:UTF8 R:0 D:2 .alternateAllele: OPTIONAL BINARY O:UTF8 R:0 D:2 .svAllele: OPTIONAL F:5 ..type: OPTIONAL BINARY O:ENUM R:0 D:3 ..assembly: OPTIONAL BINARY O:UTF8 R:0 D:3 ..precise: OPTIONAL BOOLEAN R:0 D:3 ..startWindow: OPTIONAL INT32 R:0 D:3 ..endWindow: OPTIONAL INT32 R:0 D:3 .isSomatic: OPTIONAL BOOLEAN R:0 D:2 contigName: OPTIONAL BINARY O:UTF8 R:0 D:1 start: OPTIONAL INT64 R:0 D:1 end: OPTIONAL INT64 R:0 D:1 variantCallingAnnotations: OPTIONAL F:14 .variantIsPassing: OPTIONAL BOOLEAN R:0 D:2 .variantFilters: REQUIRED F:1 ..array: REPEATED BINARY O:UTF8 R:1 D:2 .downsampled: OPTIONAL BOOLEAN R:0 D:2 .baseQRankSum: OPTIONAL FLOAT R:0 D:2 .fisherStrandBiasPValue: OPTIONAL FLOAT R:0 D:2 .rmsMapQ: 
OPTIONAL FLOAT R:0 D:2 .mapq0Reads: OPTIONAL INT32 R:0 D:2 .mqRankSum: OPTIONAL FLOAT R:0 D:2 .readPositionRankSum: OPTIONAL FLOAT R:0 D:2 .genotypePriors: REQUIRED F:1 ..array: REPEATED FLOAT R:1 D:2 .genotypePosteriors: REQUIRED F:1 ..array: REPEATED FLOAT R:1 D:2 .vqslod: OPTIONAL FLOAT R:0 D:2 .culprit: OPTIONAL BINARY O:UTF8 R:0 D:2 .attributes: REQUIRED F:1 ..map: REPEATED F:2 ...key: REQUIRED BINARY O:UTF8 R:1 D:2 ...value: REQUIRED BINARY O:UTF8 R:1 D:2 sampleId: OPTIONAL BINARY O:UTF8 R:0 D:1 sampleDescription: OPTIONAL BINARY O:UTF8 R:0 D:1 processingDescription: OPTIONAL BINARY O:UTF8 R:0 D:1 alleles: REQUIRED F:1 .array: REPEATED BINARY O:ENUM R:1 D:1 expectedAlleleDosage: OPTIONAL FLOAT R:0 D:1 referenceReadDepth: OPTIONAL INT32 R:0 D:1 alternateReadDepth: OPTIONAL INT32 R:0 D:1 readDepth: OPTIONAL INT32 R:0 D:1 minReadDepth: OPTIONAL INT32 R:0 D:1 genotypeQuality: OPTIONAL INT32 R:0 D:1 genotypeLikelihoods: REQUIRED F:1 .array: REPEATED FLOAT R:1 D:1 nonReferenceLikelihoods: REQUIRED F:1 .array: REPEATED FLOAT R:1 D:1 strandBiasComponents: REQUIRED F:1 .array: REPEATED INT32 R:1 D:1 splitFromMultiAllelic: OPTIONAL BOOLEAN R:0 D:1 isPhased: OPTIONAL BOOLEAN R:0 D:1 phaseSetId: OPTIONAL INT32 R:0 D:1 phaseQuality: OPTIONAL INT32 R:0 D:1 row group 1: RC:33272572 TS:85910781 ---------------------------------------------------------------------------------------------------- variant: .variantErrorProbability: INT32 GZIP DO:0 FPO:4 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .contigName: BINARY GZIP DO:0 FPO:60 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .start: INT64 GZIP DO:0 FPO:116 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .end: INT64 GZIP DO:0 FPO:172 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .referenceAllele: BINARY GZIP DO:0 FPO:228 SZ:23647/38902/1.65 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY .alternateAllele: BINARY GZIP DO:0 FPO:23875 SZ:23628/39106/1.66 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY 
.svAllele: ..type: BINARY GZIP DO:0 FPO:47503 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN ..assembly: BINARY GZIP DO:0 FPO:47559 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN ..precise: BOOLEAN GZIP DO:0 FPO:47615 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN ..startWindow: INT32 GZIP DO:0 FPO:47671 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN ..endWindow: INT32 GZIP DO:0 FPO:47727 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .isSomatic: BOOLEAN GZIP DO:0 FPO:47783 SZ:4343/4159246/957.69 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN contigName: BINARY GZIP DO:0 FPO:52126 SZ:9418/6537/0.69 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY start: INT64 GZIP DO:0 FPO:61544 SZ:82613/186755/2.26 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY end: INT64 GZIP DO:0 FPO:144157 SZ:82610/186755/2.26 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY variantCallingAnnotations: .variantIsPassing: BOOLEAN GZIP DO:0 FPO:226767 SZ:4351/4159246/955.93 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .variantFilters: ..array: BINARY GZIP DO:0 FPO:231118 SZ:60/45/0.75 VC:33272572 ENC:RLE,PLAIN .downsampled: BOOLEAN GZIP DO:0 FPO:231178 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .baseQRankSum: FLOAT GZIP DO:0 FPO:231234 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .fisherStrandBiasPValue: FLOAT GZIP DO:0 FPO:231290 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .rmsMapQ: FLOAT GZIP DO:0 FPO:231346 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .mapq0Reads: INT32 GZIP DO:0 FPO:231402 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .mqRankSum: FLOAT GZIP DO:0 FPO:231458 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .readPositionRankSum: FLOAT GZIP DO:0 FPO:231514 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .genotypePriors: ..array: FLOAT GZIP DO:0 FPO:231570 SZ:60/45/0.75 VC:33272572 ENC:RLE,PLAIN .genotypePosteriors: ..array: FLOAT GZIP DO:0 FPO:231630 SZ:60/45/0.75 VC:33272572 ENC:RLE,PLAIN .vqslod: FLOAT GZIP DO:0 FPO:231690 SZ:56/36/0.64 
VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .culprit: BINARY GZIP DO:0 FPO:231746 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN .attributes: ..map: ...key: BINARY GZIP DO:0 FPO:231802 SZ:60/45/0.75 VC:33272572 ENC:RLE,PLAIN ...value: BINARY GZIP DO:0 FPO:231862 SZ:60/45/0.75 VC:33272572 ENC:RLE,PLAIN sampleId: BINARY GZIP DO:0 FPO:231922 SZ:1861645/50020288/26.87 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY sampleDescription: BINARY GZIP DO:0 FPO:2093567 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN processingDescription: BINARY GZIP DO:0 FPO:2093623 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN alleles: .array: BINARY GZIP DO:0 FPO:2093679 SZ:4776200/18794086/3.93 VC:66545144 ENC:RLE,PLAIN_DICTIONARY expectedAlleleDosage: FLOAT GZIP DO:0 FPO:6869879 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN referenceReadDepth: INT32 GZIP DO:0 FPO:6869935 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN alternateReadDepth: INT32 GZIP DO:0 FPO:6869991 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN readDepth: INT32 GZIP DO:0 FPO:6870047 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN minReadDepth: INT32 GZIP DO:0 FPO:6870103 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN genotypeQuality: INT32 GZIP DO:0 FPO:6870159 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN genotypeLikelihoods: .array: FLOAT GZIP DO:0 FPO:6870215 SZ:59/45/0.76 VC:33272572 ENC:RLE,PLAIN nonReferenceLikelihoods: .array: FLOAT GZIP DO:0 FPO:6870274 SZ:59/45/0.76 VC:33272572 ENC:RLE,PLAIN strandBiasComponents: .array: INT32 GZIP DO:0 FPO:6870333 SZ:59/45/0.76 VC:33272572 ENC:RLE,PLAIN splitFromMultiAllelic: BOOLEAN GZIP DO:0 FPO:6870392 SZ:4343/4159246/957.69 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN isPhased: BOOLEAN GZIP DO:0 FPO:6874735 SZ:4343/4159246/957.69 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN phaseSetId: INT32 GZIP DO:0 FPO:6879078 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN phaseQuality: INT32 GZIP DO:0 FPO:6879134 SZ:56/36/0.64 VC:33272572 ENC:RLE,BIT_PACKED,PLAIN ADAM, 
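Each column line in the row group listing packs several numbers together. As a rough reading aid, the sketch below parses one such line with a regular expression. The field meanings are my reading of the parquet-tools output (DO: dictionary page offset, FPO: first data page offset, SZ: compressed bytes / uncompressed bytes / ratio, VC: value count, ENC: encodings). The pattern and the helper name parse_column are assumptions made for illustration and are not part of parquet-tools; the two sample lines are copied from the output above.

```python
import re
from typing import Any, Dict

# Pattern for one column chunk line of the row group listing (an assumption
# based on the lines shown in this post, not an official grammar).
COLUMN_RE = re.compile(
    r"(?P<name>[\w.]+):\s+(?P<type>\w+)\s+(?P<codec>\w+)\s+"
    r"DO:(?P<dict_offset>\d+)\s+FPO:(?P<first_page_offset>\d+)\s+"
    r"SZ:(?P<compressed>\d+)/(?P<uncompressed>\d+)/(?P<ratio>[\d.]+)\s+"
    r"VC:(?P<value_count>\d+)\s+ENC:(?P<encodings>[\w,]+)"
)


def parse_column(line: str) -> Dict[str, Any]:
    """Split one column chunk line into named, typed fields."""
    match = COLUMN_RE.search(line)
    if match is None:
        raise ValueError(f"not a column chunk line: {line!r}")
    raw = match.groupdict()
    return {
        "name": raw["name"].lstrip("."),  # leading dots only mark nesting depth
        "type": raw["type"],
        "codec": raw["codec"],
        "compressed": int(raw["compressed"]),
        "uncompressed": int(raw["uncompressed"]),
        "reported_ratio": float(raw["ratio"]),
        "value_count": int(raw["value_count"]),
        "encodings": raw["encodings"].split(","),
    }


SAMPLE_ID = ("sampleId: BINARY GZIP DO:0 FPO:231922 SZ:1861645/50020288/26.87 "
             "VC:33272572 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY")
IS_SOMATIC = (".isSomatic: BOOLEAN GZIP DO:0 FPO:47783 SZ:4343/4159246/957.69 "
              "VC:33272572 ENC:RLE,BIT_PACKED,PLAIN")

for line in (SAMPLE_ID, IS_SOMATIC):
    col = parse_column(line)
    # Recompute the ratio as uncompressed / compressed to check the reading of SZ.
    ratio = col["uncompressed"] / col["compressed"]
    print(f"{col['name']}: {col['compressed']} bytes on disk, ratio {ratio:.2f}")
```

Read this way, the third SZ number is the uncompressed size divided by the compressed size. The listing also shows the alleles array column with VC:66545144, exactly twice the 33272572 rows in the row group, which is consistent with two allele entries per genotype record.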