When is VNTR used?

Such loci are termed VNTRs (variable number tandem repeats). The total length of the locus in base pairs therefore varies with the number of repeat copies. In this diagram, only three different variants (alleles) are illustrated for the VNTR locus, but 50 or more different alleles are often found at human VNTR loci. Analysis of a VNTR locus by Southern hybridization most commonly results in a two-band pattern, composed of one band inherited from each parent.
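Because each person carries two alleles, the number of possible profiles grows quickly with the number of alleles: n homozygous genotypes plus n(n-1)/2 heterozygous ones, or n(n+1)/2 in total. A minimal Python sketch of that arithmetic, assuming every allele pair yields a distinguishable profile (in practice, bands of nearly equal size can blur this on a Southern blot):

```python
from itertools import combinations_with_replacement


def count_genotypes(n_alleles: int) -> int:
    """Number of unordered diploid genotypes from n alleles: n(n+1)/2."""
    return n_alleles * (n_alleles + 1) // 2


def list_genotypes(alleles):
    """Enumerate unordered allele pairs; (A, A) is homozygous, (A, B) is heterozygous."""
    return list(combinations_with_replacement(alleles, 2))


if __name__ == "__main__":
    print(list_genotypes(["A", "B", "C"]))  # AA, AB, AC, BB, BC, CC
    print(count_genotypes(3))   # 6
    print(count_genotypes(50))  # 1275
```

Three alleles give six profiles, while a locus with 50 alleles already allows 1,275.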

A one-band pattern can occur if the two parental bands are the same or nearly the same size. For our simple example of three different alleles, designated A, B, and C and illustrated above, six unique DNA profiles are possible (AA, BB, CC, AB, AC, and BC).

Coloring of symbols shows that population also had a strong effect, reflecting distance from the reference genome, which is primarily European.

(C) Alleles detected per locus. Each bar represents a specific number of alleles detected across all datasets. Coloring shows the proportion of loci where the reference allele was or was not observed.

(D) Copies gained or lost. Each bar represents a specific number of copies gained or lost in non-reference VNTR alleles relative to the reference allele. Loss was always encountered more frequently than gain.

(E) VNTR locus sample support. Data shown are the common loci from the NYGC dataset. Each bar represents the number of samples calling a locus as a VNTR, and bar height is the number of loci with that sample support.
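Panel D expresses each non-reference allele as a whole number of pattern copies gained or lost relative to the reference allele. A minimal sketch of that conversion, assuming array lengths in base pairs and a known pattern (repeat unit) length; the function names and the rounding rule are illustrative assumptions, not VNTRseek's code:

```python
from collections import Counter


def copies_gained_or_lost(allele_len_bp: int, ref_len_bp: int, pattern_len_bp: int) -> int:
    """Signed copy-number change of an allele relative to the reference array.

    Positive values are copies gained, negative values copies lost.
    Assumption: the length difference is rounded to the nearest whole pattern copy
    (Python's round() breaks exact .5 ties toward the even integer).
    """
    if pattern_len_bp <= 0:
        raise ValueError("pattern length must be positive")
    return round((allele_len_bp - ref_len_bp) / pattern_len_bp)


def tally_gain_loss(changes):
    """Count non-reference alleles that gained versus lost copies; returns (gains, losses)."""
    counts = Counter("gain" if c > 0 else "loss" for c in changes if c != 0)
    return counts["gain"], counts["loss"]
```

For a 300 bp reference array with a 30 bp pattern, a 240 bp allele is a loss of two copies and a 330 bp allele a gain of one.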

However, detection also varied with population, which seemed likely due to the evolutionary distance of each population from the reference genome, which is primarily European. Notably, within each trio, the VNTR counts were similar. This was because in these genomes, which consist of two copies of an underlying haploid genome, the single allele represented at any VNTR locus would frequently be a reference allele, and so the locus would not be called as a VNTR. Overall, VNTRseek found approximately 1.

Although there were allele lengths that VNTRseek could not detect, this bias persisted even when restricting the loci to only those where both gain and loss could be observed (Supplementary Figure S5). The overabundance of VNTR copy loss may actually be an underestimate. By contrast, the reference locus needed to have a minimum of 2. Higher observed copy loss could be explained by a bias in the reference genome toward including higher-copy-number repeats, or by an overall mutational preference for copy loss.
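The restriction to loci where both gain and loss are observable can be expressed as a simple filter. A minimal sketch, assuming (1) an allele is detectable only if its whole array plus some flanking sequence fits inside one read, and (2) an array must retain a minimum number of copies to be recognized as a tandem repeat. The `flank_bp` and `min_copies` values are hypothetical placeholders, not VNTRseek's actual parameters:

```python
def gain_and_loss_detectable(ref_copies: float, pattern_len_bp: int, read_len_bp: int,
                             flank_bp: int = 20, min_copies: float = 2.0) -> bool:
    """True if both a one-copy gain and a one-copy loss would be observable at this locus.

    Hypothetical parameters: flank_bp is the flanking sequence required on each side
    of the array; min_copies is the smallest array still recognized as a tandem repeat.
    """
    gained_len = (ref_copies + 1) * pattern_len_bp
    gain_ok = gained_len + 2 * flank_bp <= read_len_bp   # the longer allele must still fit in one read
    loss_ok = ref_copies - 1 >= min_copies                # the shorter allele must remain a repeat
    return gain_ok and loss_ok


def restrict_to_bidirectional(loci, read_len_bp: int):
    """Keep loci (dicts with 'ref_copies' and 'pattern_len') where gain and loss are both detectable."""
    return [loc for loc in loci
            if gain_and_loss_detectable(loc["ref_copies"], loc["pattern_len"], read_len_bp)]
```

With 150 bp reads and a 20 bp pattern, a 4-copy reference array passes, while a 6-copy array fails because a gained allele would no longer fit in a read.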

High heterozygosity in human populations suggests higher genetic variability and may have beneficial effects on a range of traits associated with human health and disease. Because calculating heterozygosity for VNTRs is not straightforward, owing to limitations on discovering alleles (especially with shorter reads), we used the percentage of detected, per-sample heterozygous VNTRs as an estimate of heterozygosity. Despite this limitation, within genomes that were comparable in read length and coverage, the fraction of heterozygous loci clustered within populations (Supplementary Figures S2–S4), with African genomes generally having more heterozygous calls and East Asian genomes fewer.
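The per-sample estimate is a simple proportion: of the VNTR loci detected in one sample, the percentage called heterozygous. A minimal sketch, assuming each call is a pair of allele labels (here, copies gained or lost relative to the reference); the data layout is hypothetical:

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]


def percent_heterozygous(calls: Dict[str, Genotype]) -> float:
    """Percentage of detected VNTR calls in one sample whose two alleles differ.

    Only detected loci are counted, so this is a proxy for heterozygosity,
    not a population-genetic estimate over all loci.
    """
    if not calls:
        return 0.0
    het = sum(1 for a, b in calls.values() if a != b)
    return 100.0 * het / len(calls)


calls = {"locus1": (0, -1), "locus2": (-2, -2), "locus3": (1, 0), "locus4": (-1, -1)}
print(percent_heterozygous(calls))  # 50.0
```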

This result is consistent with previous findings of population differences in SNP heterozygosity among Yoruban and Ashkenazi Jewish individuals relative to European individuals, and suggests higher genomic diversity among African genomes, as has been previously noted.

Extreme loss of heterozygosity in small variants has previously been reported in these samples by Illumina BaseSpace, with the number of heterozygous small variants in HC being four times lower in the tumor tissue than in the normal tissue.

Additionally, in both tumors a large number of loci exhibited loss of both alleles in comparison to the normal tissue (Supplementary Table S2). Given that coverage for the tumor samples was significantly higher than for the normal tissue, it is unlikely that these losses were artifacts of insufficient coverage.

Also, the tumor samples did not show a higher percentage of VNTRs filtered as multi (loci with too many alleles) than the normal samples 1.
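Comparing a tumor with its matched normal sample reduces, at each locus, to comparing two allele sets. A minimal sketch, assuming genotypes are given as sets of allele labels and that an empty tumor set means both alleles were lost; the category names are illustrative, not the paper's terminology:

```python
from collections import Counter


def classify_tumor_locus(normal: set, tumor: set) -> str:
    """Compare the allele sets called at one locus in matched normal and tumor samples."""
    if normal and not tumor:
        return "loss_of_both_alleles"
    if len(normal) == 2 and len(tumor) == 1 and tumor <= normal:
        return "loss_of_heterozygosity"
    if tumor - normal:
        return "tumor_only_allele"  # candidate somatic VNTR allele
    return "unchanged"


def summarize(pairs):
    """Tally categories over an iterable of (normal, tumor) allele sets."""
    return Counter(classify_tumor_locus(n, t) for n, t in pairs)
```

Counting these categories per tumor gives the kind of loss-of-heterozygosity and loss-of-both-alleles tallies discussed above.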

Knowledge of gene associations with somatic tumor mutations (VNTR alleles present in a tumor but not in normal tissue) could be useful for cancer prognosis and for guiding therapy. TRIM24 has been associated with prognosis in breast cancer, and over-expression of DUSP4 has been shown to improve the outcome of chemotherapy and overall survival.

Common VNTRs overlapped with protein-coding genes, including exons.

Interestingly, increasing the threshold for common VNTRs did not reduce their number dramatically (Supplementary Figure S6), suggesting that these VNTRs have not occurred randomly but rather have undergone natural selection. Our reference TR set comprised only 0. These results suggest that VNTR alleles may affect gene regulation in multiple tissues. To detect associations between VNTR genotypes and expression of nearby genes, we paired each VNTR with any gene within 10 kb and, after removing genes with low expression and controlling for confounders, applied a one-way ANOVA to determine whether average gene expression levels differed significantly among VNTR genotypes.
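The association test has two steps: pairing each VNTR with nearby genes, then comparing expression across genotype groups with a one-way ANOVA. A minimal, dependency-free sketch of both steps follows. It assumes "within 10 kb" is measured from the gene body on the same chromosome, and it omits the expression filtering and confounder correction; all names are hypothetical. A p-value can be obtained by passing the F statistic to an F-distribution survival function (for example `scipy.stats.f.sf(F, k - 1, N - k)`), or by using `scipy.stats.f_oneway` directly.

```python
from statistics import mean

WINDOW_BP = 10_000  # "within 10 kb"; assumed to be measured from the gene body


def pair_vntrs_with_genes(vntrs, genes, window=WINDOW_BP):
    """Pair each VNTR (id, chrom, start, end) with every gene (id, chrom, start, end) within `window` bp."""
    pairs = []
    for v_id, v_chrom, v_start, v_end in vntrs:
        for g_id, g_chrom, g_start, g_end in genes:
            # Same chromosome, and the VNTR overlaps the gene extended by `window` on each side.
            if v_chrom == g_chrom and v_start <= g_end + window and v_end >= g_start - window:
                pairs.append((v_id, g_id))
    return pairs


def one_way_anova_f(expression_by_genotype):
    """F statistic of a one-way ANOVA; input maps genotype -> list of expression values."""
    groups = [list(vals) for vals in expression_by_genotype.values() if vals]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two non-empty groups and more samples than groups")
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf")  # no within-group variation at all
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

For two genotype groups with expression values [1, 2, 3] and [4, 5, 6], the sketch returns F = 13.5.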

Three of the top genes are shown in Figure 2. METTL23 is known to function as a regulator in the transcriptional pathway for human cognition and has been associated with mental retardation and intellectual disability. JMJD6 is associated with pancreatitis and tumorigenesis. However, some individuals had genotypes outside our detection range, which likely represented longer expansions, and these individuals showed higher expression of this gene.

More examples are given in the Supplementary Figures beginning with S26.

Figure 2. Gene expression differences and VNTR genotype. Shown are violin plots of gene expression values (log2-normalized TPM) for three genes that displayed significant differential expression when samples were partitioned by VNTR genotype. Additional examples are shown in the Supplementary Figures beginning with S26. Genotype is indicated in the X-axis labels, and numbers refer to copies gained or lost relative to the reference allele. The number of samples in each partition is shown in parentheses. In these examples, the effect size for at least one genotype class was significant.

We further investigated whether VNTR alleles are population-specific and whether they can be used to predict ancestry. Understanding the occurrence of population-specific VNTR alleles will be useful when controlling for population effects in GWAS and, more generally, in interpreting gene expression differences among people of different ancestry.

We found that the first, second, fourth, and fifth principal components (PCs) separated the super-populations, as shown in Figure 3. Each PC captured only a small fraction of the variation in the dataset, suggesting that there was substantial variation between individuals from the same population.
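The PCA step can be sketched with NumPy's singular value decomposition. This assumes a samples × features matrix in which each column records presence (1) or absence (0) of one common VNTR allele; that encoding is an illustrative assumption, and coverage-driven components would still have to be identified and set aside by inspection.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 5):
    """Project samples onto the top principal components.

    X: samples x features matrix (e.g., 0/1 presence of each common VNTR allele).
    Returns (scores, explained_variance_ratio) for the first n_components PCs.
    """
    Xc = X - X.mean(axis=0)                            # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)   # Xc = U @ diag(S) @ Vt
    scores = U[:, :n_components] * S[:n_components]    # sample coordinates on each PC
    variance = S ** 2
    return scores, variance[:n_components] / variance.sum()
```

Plotting pairs of score columns (for example PC1 against PC2) is how population clusters such as those in Figure 3 would be inspected.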

PCA was performed to reduce the dimensions of the data. PC3 (not shown) captured batch effects due to differences in coverage. Some American sub-populations proved hardest to separate, likely due to ancestry mixing. The first PC separated Africans, suggesting the greatest evolutionary distance, and the second PC separated East Asians. The American population included a sub-population of Puerto Ricans that clustered with the Iberian Spanish population, suggesting mixed ancestry.

These loci overlapped with genes and 51 coding exons.

Africans had the highest number of population-specific alleles, followed by East Asians (65), while Americans had the lowest (13), suggesting more mixed ancestry. We observed 63 loci that had a population-specific allele in each population.

Each dot represents an allele in one sample. Samples are separated vertically by super-population. Dots are jittered in a rectangular area to reduce overlap. Population-specific alleles show up as bands over-represented in one population. Note that the allele bias toward pattern copy loss relative to the reference allele is apparent, and that at one locus (second from left) the reference allele was the population-specific allele, since almost no reference alleles were observed in the four other populations.

The details of these seven loci are given in Supplementary Table S.

To show the reliability of our results, we experimentally validated VNTR predictions at 13 loci in the three related Ashkenazi Jewish (AJ) genomes and also compared VNTRseek predictions to alleles experimentally validated in the literature. We additionally used accurate long reads for one genome, HG, to find evidence of the predicted alleles.

All but one of the 66 bands predicted by VNTRseek were validated. In the remaining case, two predicted alleles were separated by only 15 nucleotides and could not be distinguished. At two loci, other bands were also observed. In one, all three family members carried an allele outside the detectable range of VNTRseek (longer than the reads). In the other, one detectable allele was missed in two family members (see Table 3 for a summary of results, Supplementary Table S6 for details of the experiment, and Supplementary Figures S13–S17 for gel images).

Out of the original 17 VNTR loci experimentally validated in the previously published study, four were not included in our reference set and, for one, the matching TR could not be determined. In total, 11 out of 16 detectable alleles were correctly predicted, four were not found in the NA sample with sufficient read length, and one was incorrectly predicted in the HG sample and not found in the other two (Supplementary Table S7).

Separately, we showed the consistency of our predictions in two ways: first, we looked at inheritance consistency among four trios (mother, father, child), and second, we compared results for genomes sequenced on two different platforms.
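Inheritance consistency in a trio can be checked locus by locus: a child's genotype is consistent if one of its alleles can come from the mother and the other from the father. A minimal sketch, assuming diploid genotypes given as allele pairs (for example, copies gained or lost relative to the reference) and exact allele matching; it ignores detection-range and coverage effects that can make real calls look inconsistent, and the function names are hypothetical:

```python
def mendelian_consistent(child, mother, father) -> bool:
    """True if the child's two alleles can be split into one maternal and one paternal allele."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def inconsistent_loci(trio_calls):
    """trio_calls maps locus -> (child, mother, father); return the loci violating inheritance."""
    return sorted(locus for locus, (c, m, f) in trio_calls.items()
                  if not mendelian_consistent(c, m, f))


calls = {
    "L1": ((0, -1), (0, 0), (-1, 2)),  # 0 from the mother, -1 from the father
    "L2": ((3, 3), (0, 0), (3, 1)),    # the mother cannot supply allele 3
}
print(inconsistent_loci(calls))  # ['L2']
```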

In all cases, only a handful of loci were inconsistent (Supplementary Table S9). The 1000 Genomes Phase 3 sequenced 30 genomes using Illumina HiSeq. Note, however, that read coverage was not the same for both datasets, causing variation in statistical power.

The current study represents, to our knowledge, the largest analysis of human whole genome sequencing data to detect copy-number-variable tandem repeats (VNTRs) and greatly expands the growing information on this class of genetic variation.

The TRs genotyped consisted of minisatellites occupying the mid-range of pattern sizes, from seven to bp. When considering the largest dataset in our study individuals, we found that, on average, each genome exhibited non-reference alleles at VNTR loci and among those, were common VNTRs. In addition to their widespread occurrence, further evidence of minisatellite VNTR importance can be seen in the enrichment of these loci in genes and gene regulatory regions (promoters, transcription factor binding sites, DNase hypersensitive sites, and CpG islands).

Our entire set of VNTRs overlapped with protein-coding genes and exons. The common VNTRs occurred within or were proximal to protein-coding genes, in some cases overlapping exons. Functions enriched among these genes included neuron development and differentiation, and behavior. These observations are consistent with the finding that VNTR expansions in humans compared to other primates are associated with gain of cognitive abilities (24), and with the possible involvement of VNTRs in many neurodegenerative diseases and behavioral disorders. The overabundance of VNTRs near genes suggests that variability at these loci could affect gene expression, and indeed, we observed that the expression levels of genes were significantly correlated with the presence of specific VNTR alleles in lymphoblastoid cell lines of individuals.

In reference 25, genes were tested for expression in 46 tissues, and eQTLs were found. Similarly, reference 26 found 21 VNTRs associated with expression of 38 genes. However, their VNTR alleles were larger and did not overlap the ones we tested in this study.

These findings are suggestive, but more study is required, both to determine whether there is more evidence of tissue-specific gene expression variation associated with VNTR genotypes (25, 26, 87) and whether such correlational differences can be definitively tied to mechanisms associated with specific VNTR alleles, such as changes in regulator binding affinity in regulatory regions.

For more elaborate studies such as these, it will be essential that the raw whole genome sequencing data be available for each sample used to measure gene expression, so that specialized software such as VNTRseek can be used to determine VNTR genotypes.

It is well known that hidden differences among study populations can lead to misinterpretation of GWAS results, and care is particularly important when those differences are tied to human ancestry.

Relevant to this, we have determined that some of the common VNTR loci contain alleles showing significant population specificity and that these loci intersect with genes. Population-specific alleles also have the potential to be used in tracing early human migration.
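The statistical criterion used for population specificity is not described in this section. As a purely illustrative stand-in, an allele could be flagged as specific to a super-population when its carrier frequency is high there and low everywhere else. The thresholds `min_in` and `max_out` below are arbitrary assumptions, and a real analysis would use a formal significance test:

```python
from collections import defaultdict


def carrier_frequencies(carriers, population_of):
    """Fraction of samples in each population that carry the allele.

    carriers: set of sample IDs carrying the allele.
    population_of: dict mapping every sample ID to its super-population.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for sample, pop in population_of.items():
        totals[pop] += 1
        if sample in carriers:
            hits[pop] += 1
    return {pop: hits[pop] / totals[pop] for pop in totals}


def specific_population(carriers, population_of, min_in=0.2, max_out=0.02):
    """Return the one population where the allele is common and rare elsewhere, else None.

    min_in and max_out are hypothetical thresholds chosen for illustration only.
    """
    freqs = carrier_frequencies(carriers, population_of)
    for pop, f in freqs.items():
        others = [g for p, g in freqs.items() if p != pop]
        if f >= min_in and all(g <= max_out for g in others):
            return pop
    return None
```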

We have shown through principal component analysis with common VNTR alleles that super-populations are easily separated. Further, we have constructed a decision tree based on common VNTR alleles that achieves nearly perfect classification of individuals at the super-population level.
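How the decision tree was built is not specified here. A minimal sketch of the idea, assuming samples encoded as 0/1 presence of common VNTR alleles and using a small CART-style tree that splits on Gini impurity; this is a generic illustration, not the authors' classifier, and a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally be used instead:

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature) for the best 0/1 split, or None if no feature splits the data."""
    best = None
    for f in range(len(X[0])):
        absent = [lab for row, lab in zip(X, y) if row[f] == 0]
        present = [lab for row, lab in zip(X, y) if row[f] == 1]
        if not absent or not present:
            continue
        score = (len(absent) * gini(absent) + len(present) * gini(present)) / len(y)
        if best is None or score < best[0]:
            best = (score, f)
    return best


def build_tree(X, y, depth=0, max_depth=5):
    """Grow a tree whose internal nodes test presence of one allele; leaves hold a population label."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if gini(y) > 0 and depth < max_depth else None
    if split is None or split[0] >= gini(y):
        return majority
    f = split[1]
    branches = {}
    for value, name in ((0, "absent"), (1, "present")):
        idx = [i for i, row in enumerate(X) if row[f] == value]
        branches[name] = build_tree([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth)
    return {"feature": f, **branches}


def predict(tree, row):
    """Follow allele-presence tests down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["present"] if row[tree["feature"]] == 1 else tree["absent"]
    return tree
```

Training accuracy alone overstates performance; classification accuracy should be measured on held-out samples.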

It will be interesting to see whether, with more information, classification can be refined further to encompass specific sub-populations, whether a minimal minisatellite VNTR set can be established for high accuracy population classification, and whether VNTR alleles can be used to estimate mixed ancestry as is done now with SNP haplotyping.

This is true because VNTRseek requires that the tandem array fit within a read. Longer reads will help, but the abundance of high-coverage. Alternative methods exist (87, 95), but these have not reported an ability to handle macrosatellite VNTRs, where the arrays and patterns are hundreds to thousands of base pairs long.

For this range of the tandem repeat spectrum, new tools must be developed. Another limitation comes from the use of Tandem Repeats Finder, which requires that the array contain at least 1. Previous studies on VNTR prevalence in the human genome have been limited to a subset of minisatellites inside the transcriptome and a limited number of genomes. Future research can be expected to further enhance our understanding of this important class of genomic variation.

The reference TR set files, output VCF files, and pre-processed data files, along with the code to create figures and tables, are published at: DOI

Funding for open access charge: University funds.

References

Treangen T. Repetitive DNA and next-generation sequencing: computational challenges and solutions.
Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet.
Lim K. Review of tandem repeat search tools: a systematic approach to evaluating algorithmic performance.
Richard G. Comparative genomics and molecular dynamics of DNA repeats in eukaryotes.
Taylor J. Slipped-strand mispairing at noncontiguous repeats in Poecilia reticulata: a model for minisatellite birth.
Levinson G. Slipped-strand mispairing: a major mechanism for DNA sequence evolution.
Madsen C. In vivo and in vitro evidence for slipped mispairing in mammalian mitochondria.
Jeffreys A. Repeat instability at human minisatellites arising from meiotic recombination. EMBO J.
Expansions and contractions in bp minisatellites by gene conversion in yeast.
Bustamante A. Dynamic of mutational events in variable number tandem repeats of Escherichia coli O H7. BioMed Res.
Vogler A. Effect of repeat copy number on variable-number tandem repeat mutations in Escherichia coli O H7.
Evolution of variable number tandem repeats and its relationship with genomic diversity in Salmonella typhimurium.
Verstrepen K. Intragenic tandem repeats generate functional variability.
Legendre M. Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Res.
Panigrahi I. Genetic fingerprinting for human diseases: applications and implications.
Sinha M. Molecular basis of identification through DNA fingerprinting in humans.
Imam J. DNA fingerprinting: discovery, advancements, and milestones.
Denoeud F. Predicting human minisatellite polymorphism.
Deka R. A population genetic study of six VNTR loci in three ethnically defined populations. Human Genet.
Hancock J. Trinucleotide expansion diseases in the context of micro- and minisatellite evolution, Hammersmith Hospital, April 1–3.
Duitama J. Large-scale analysis of tandem repeat variability in the human genome. Nucleic Acids Res.
Sonay T. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence.
Bakhtiari M. Variable Number Tandem Repeats mediate the expression of proximal genes.
Human Genome Structural Variation Consortium et al. Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs.
Trepicchio W.
Krontiris T. An association between the risk of cancer and mutations in the HRAS1 minisatellite locus.
Wang S.
Zukic B.
Vasiliou S.
Vafiadis P.
Greenwood T. Promoter and intronic variants affect the transcriptional regulation of the human dopamine transporter gene.
Lovejoy E. The serotonin transporter intronic VNTR enhancer correlated with a predisposition to affective disorders has distinct regulatory elements within the domain based on the primary DNA sequence of the repeat unit.
Klenova E.
De Roeck A. Acta Neuropathol.
Pacheco A. A VNTR regulates miR expression through novel alternative splicing and contributes to risk for schizophrenia.
Schoots O. The human dopamine D4 receptor repeat sequences modulate expression. Pharmacogenomics J.
Xiao X. A carboxyl ester lipase (CEL) mutant causes chronic pancreatitis by forming intracellular aggregates that activate apoptosis.
Willems T. Yaniv and Genomes Project Consortium et al. The landscape of human STR variation.
Genome-wide profiling of heritable and de novo STR variations.
Mallick S. The Simons genome diversity project: genomes from diverse populations.
Gettings K. Unleashing novel STRs via characterization of genome in a bottle reference samples. Forensic Sci.
Krishnan V. Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays. BMC Bioinformatics.
Brouwer J. Microsatellite repeat instability and neurological disease.
Rohilla K. RNA biology of disease-associated microsatellite repeat expansions.
Hannan A. Tandem repeats mediating genetic plasticity in health and disease.
Rodriguez C. New pathologic mechanisms in nucleotide repeat expansion disorders.
Beck R. Cancer Genetics.
Boulay G. Epigenome editing of microsatellite repeats defines tumor-specific enhancer functions and dependencies. Genes Dev.
Nacev B. The epigenomics of sarcoma.
Antwi-Boasiako C.
Ksiazek K. Oral Dis.
Cong L. Bone Joint Res.
Katsumata Y. Alzheimer disease pathology-associated polymorphism in a complex variable number of tandem repeat region within the MUC6 gene, near the AP2A2 gene.
Chang H.
Scott H.
Hoxha B. Psychiatria Danubina.
Brain Funct.
Van Assche E. Depressive symptoms in adolescence: The role of perceived parental support, psychological control, and proactive control in interaction with 5-HTTLPR.
Stolf A. Korean Med.
Vairaktaris E. AntiCancer Res.
Sousa H.
Safarinejad M. Urologic Oncology: Seminars and Original Investigations.
Ibrahimi M. Positive correlation between interleukin-1 receptor antagonist gene 86bp VNTR polymorphism and colorectal cancer susceptibility: a case-control study.
Cui J. Differences of variable number tandem repeats in XRCC5 promoter are associated with increased or decreased risk of breast cancer in BRCA gene mutation carriers.
Al-Eitan L. Pharmacogenomics Personalized Med.
Ahn E. Variants of MUC5B minisatellites and the susceptibility of bladder cancer. DNA Cell Biol.
Kwon J. Short rare MUC6 minisatellites-5 alleles influence susceptibility to gastric carcinoma by regulating gene.
Weitzel J. The HRAS1 minisatellite locus and risk of ovarian cancer. Cancer Res.
Wang L. Association of a functional tandem repeats in the downstream of human telomerase gene and lung cancer.
Calvo R. JNCI: J. Cancer Inst.
Yoon S. Genes Genomics.
Batra S. Prognostic implications of chromosome 17p deletions in human medulloblastomas.
Andersson U. MNS16A minisatellite genotypes in relation to risk of glioma and meningioma and to glioblastoma outcome.
Lim S. High-frequency minisatellite instability of the mitochondrial genome in colorectal cancer tissue associated with clinicopathological values.
Xia X. MNS16A tandem repeats minisatellite of human telomerase gene and cancer risk: a meta-analysis. PLoS One.
Leem S. Diagnosis kits and method for detecting cancer using polymorphic minisatellite.
Singh R. MUC1: a target molecule for cancer therapy. Cancer Biol.
A polymorphic minisatellite region of BORIS regulates gene expression and its rare variants correlate with lung cancer susceptibility.
Rose A. Therapeutics and diagnostics based on minisatellite repeat element 1 (msr1).
Fondon J. Molecular origins of rapid and continuous morphological evolution.
Laidlaw J. Elevated basal slippage mutation rates among the Canidae.
Sulovari A. Human-specific tandem repeat expansion and differential gene expression during primate evolution.
Gymrek M. A genomic view of short tandem repeats.
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.
Mousavi N. Profiling the genome-wide landscape of tandem repeat expansions.
Dolzhenko E. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions.
Targeted genotyping of variable number tandem repeats with adVNTR.
Gelfand Y.