Hoffmann, R. A wiki for the life sciences where authorship matters. Nature Genetics (2008)

Principles for the post-GWAS functional characterisation of risk loci


Authors of initial version

Alvaro N.A. Monteiro1, Gerhard A. Coetzee2, Matthew L. Freedman3, Mariella De Biasi4, Graham Casey5, Dave Duggan6, Angela Risch7, Christoph Plass7, Pengyuan Liu8, Michael James8, Haris G. Vikis8, Jay W. Tichelaar8, Ming You8, Simon A. Gayther5, Ian G. Mills9 on behalf of functional cancer genomics supported by the NIH Post-Genome Wide Association Initiative

  1. Risk Assessment, Detection, and Intervention Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida, 33612
  2. Department of Preventive Medicine and Urology, Norris Cancer Center, University of Southern California, Los Angeles, CA 90033
  3. The Eli and Edythe L. Broad Institute of MIT and Harvard, Cambridge MA 02142 and Department of Medical Oncology, Dana-Farber Cancer Institute, Boston MA 02115
  4. Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030
  5. Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
  6. Translational Genomics Research Institute (TGen), Phoenix
  7. German Cancer Research Center, Division of Epigenomics and Cancer Risk Factors, Heidelberg, Germany
  8. Washington University, St. Louis, Missouri
  9. Centre for Molecular Medicine (Norway), Nordic EMBL Partnership, University of Oslo, Blindern N-0317 Oslo, Norway





The last few years have seen an explosion of activity in the identification of common, low-penetrance susceptibility alleles for a range of complex diseases and other traits using genome-wide association studies (GWAS) [1]. As of June 2010, 904 genome-wide associations (at p < 5 × 10⁻⁸) for 165 traits had been published [2]. Of these, 193 loci were associated with modified risk of 11 different cancer types [2]. Even though some risk-associated loci have been successfully identified [3][4][5], these results are rarely followed by studies on the molecular basis of risk etiology or on the identification of the causal variant [6]. A recent Nature Genetics editorial began to put this problem in sharper focus and suggested that there should be a more significant investment in functional characterization of identified risk loci [7]. Several factors might discourage the investment of such resources. Firstly, the effects are so weak that many GWA SNP discoveries (e.g. in type 2 diabetes) increase the sibling recurrence risk ratio by less than 1%, and there is little connection between the abundance of earlier family-based genetic linkage results and the current GWA results. Systematic comparison between the genome-wide linkage and association results has shown that the extent of genomic convergence between these two lines of hypothesis-free evidence is not beyond what is expected by chance for the majority of complex traits examined to date [8]. For example, the strong cluster of positive genetic linkage results around 1q23 for type 2 diabetes has no corresponding signal in multiple GWA studies [9], whereas the replicated association between Crohn's disease and the IL23R gene was not supported by previous linkage data [8]. In the present paper, we introduce the term post-GWAS to refer to the analysis that follows the identification of a risk locus.
Importantly, the development of reliable biomarkers and effective preventive and therapeutic agents made possible by these discoveries is founded on a detailed understanding of biological function. The editorial made a distinction between two types of functional analysis. The first comprises a preliminary investigation that includes re-sequencing, association analysis of all variants within the region of linkage disequilibrium (LD), investigation of risk-associated SNPs as modifiers of monogenic traits, genomic analysis of gene expression in human tissues, screens for somatic mutations on risk haplotypes, and epigenetic analysis of human tissues in the regions of the GWAS SNPs. Similar guidelines have been proposed elsewhere, in the context of loci that underlie complex traits [10]. There is a second stage of functional analysis geared towards understanding the biological mechanism of risk enhancement and causality. Whereas in-depth analysis cannot be the subject of systematic evaluation due to the functional diversity across loci, the identification of general hypotheses that can be tested in a systematic way will improve and accelerate the analysis. Hence, in this paper we propose principles for the initial functional characterization of cancer risk loci to bridge this information gap. Several challenges lie ahead in assigning functionality to susceptibility SNPs. For example, most effect sizes are small relative to those seen in monogenic diseases, with per-allele odds ratios usually ranging from 1.15 to 1.3, despite an occasional outlier such as KITLG in testicular germ cell cancer (OR = 3.08) [11][12]. Thus, the functional effects of SNPs are likely to be subtle. It is unclear whether current molecular biology methods have enough resolution to differentiate such moderate effects.
In addition, we anticipate that it will be difficult to address function for biological effects that are non-cell autonomous, are specific to certain developmental stages, or act at a site distant from the tissue-of-origin of the cancer. The study of such effects might benefit from in vivo models. An ultimate goal for the comprehension of a specific disease is to link genetic variation to causation: this is currently beyond the capability of modern molecular and logistical approaches when it comes to analyzing complex syndromic studies. Our aim here is, therefore, to provide a set of recommendations to optimize the allocation of efforts and resources in order to maximize the chances of elucidating the functional contribution of specific loci to a phenotype associated with a specific disease. Since it has been estimated that 88% of currently identified disease- and trait-associated SNPs are intronic or intergenic [4], in this paper we will focus on the analysis of non-coding variants and outline a hierarchical approach for post-GWAS functional studies. Connecting a risk allele to its target gene(s) is a particularly tractable intermediate point in this work. The unifying hypotheses that are applicable across these studies are:

  • There is a transcript, coding or not, that has not yet been annotated and that is associated with the risk locus.
  • The risk locus is a regulatory element affecting the expression of one or more annotated transcribed regions of the genome.

Defining regional boundaries and the assembly of layered genomic data

The general approach underlying current GWAS is to identify disease association through surrogate SNP markers that capture linkage disequilibrium (LD) relationships across the genome. Taking this approach means that any GWAS data generated using commercial SNP arrays are limited by the depth of the genomic coverage and representation of LD structure on those arrays. Unfortunately, recent data suggest that, when compared with the 1000 Genomes Project [13], only 60% or less of common SNPs (frequency >5%) and less than 50% of variants with a frequency higher than 2.5% are found on the SNP arrays used in the majority of GWAS conducted to date. While a new generation of SNP arrays has been developed that addresses this issue of genome coverage (at least for European populations), the degree, or lack thereof, of common SNP content in GWAS studies has important implications not only for missing “dark matter” signals, but also for subsequent post-GWAS functional characterization studies of known associations. Moreover, in the vast majority of cases any association identified through GWAS would be predicted to be between a surrogate marker (e.g. a tagSNP) and the disease trait, rather than between the causal variant itself and the disease trait, as the SNPs on these arrays were chosen as surrogates to capture LD structure rather than for any functional reasons. Therefore, evaluation of all common SNPs across associated regions will be desirable to fully characterize the biological implications of disease associations. The clear implication is that it is not the SNP with the low p-value that is of interest but the genomic region surrounding it. Unfortunately, current HapMap information is incomplete with respect to identifying all common variant information and may not capture the genetic diversity of the populations being used in association studies.
The ongoing 1000 Genomes Project seeks to address this missing data and will ultimately capture the majority of SNPs with an allele frequency of >1% across the genome [13], including intergenic regions and gene deserts where the majority of associations have been mapped. It is important to note that the recently published pilot data from the 1000 Genomes Project [13] capture not only the majority of common variants (>5% allele frequency) but also many less common or rare variants, and this resolution will improve considerably with the next data releases. Owing to its resequencing approach and the higher number of individuals per population, the 1000 Genomes Project will provide information on genetic variants with much better resolution than the widely used HapMap II project [14]. There is a growing interest in the role of multiple rare variants in disease risk, which will require targeted or whole genome sequencing of thousands to tens of thousands of samples. An examination of the large number of disease-causing allelic variants in well-known genes (e.g. BRCA1 or CFTR) makes it clear that complex diseases will not only be due to the multiple genes now under scrutiny, but also to multiple variants within those genes, each variant with its own genealogy and genetic history. Increasing the sample size to 4000 and subjecting the samples to genome-wide sequencing, as is being envisaged in the UK10K project, will maximize the chances of detecting variation. Moreover, falling costs, faster CPU processing and increasing sequencing speed will allow variants with allele frequencies as low as 0.01 to be detected.

Genome-wide resequencing of thousands of cases and controls is currently not feasible for the vast majority of GWA studies; here again, datasets from the 1000 Genomes Project will likely prove useful in alleviating this disadvantage. Interestingly, the information from cancer GWA studies can be augmented by genotype imputation based on reference haplotypes [15]. The striking advantages of this bioinformatic approach are (i) the potential identification of additional or more strongly cancer-associated SNPs and (ii) the identification of essentially all potential functional variants that are in strong LD with the originally genotyped SNPs. By using information from GWA studies in order to prioritize SNPs and target genes, such imputation methods are in a good position to respond to some of the challenges in post-GWA studies. Furthermore, these methods allow for re-analysis of existing GWA datasets and meta-analysis across different genotyping platforms [16].

Apart from the imperfect interrogation of rare variants, the first wave of GWAS largely discarded other important sources of variation, such as structural variants, noncoding RNAs or epigenetic changes [17]. The most common types of structural variants are gains and losses of DNA, called copy number variants (CNVs), which likely exert important phenotypic effects on gene expression and function. Recently developed genotyping chips allow the simultaneous interrogation of SNPs and CNVs across the genome with improved coverage [18]. The genome-wide detection of rare CNVs, as well as the discrimination capacity for variants with a wide range of variability in copy number, represent important challenges for genotyping technology research. Since the global inventory of CNVs remains incomplete, and with limited empirical data currently available, the extent to which current arrays capture the structural variome and enable a more comprehensive elucidation of the spectrum of genomic variability remains unclear [19]. Even under optimal design, genome-wide SNP screens will still not be informative about the role of other components of the regulatory architecture of the genome, such as noncoding RNAs or epigenetic modifications (somatically acquired or trans-generationally inherited); complementary techniques, such as high-throughput methylome analysis, will ultimately be needed [17].

Understanding LD structure in the region across a risk locus will be critical to delimit the size of the target region and define the costs of the undertaking. It is still unclear which r² threshold should be set in defining LD structure, as the causal SNP potentially could be in LD with the associated SNP at an r² of 0.2. Ideally, the LD structure should be defined using genotyping results from the GWAS population in which the original association was identified. For studies involving populations not included in the HapMap or pilot 1000 Genomes Project, defining LD relationships will potentially require variant discovery or additional genotyping. The latter is likely to be undertaken prior to defining LD relationships, because it is not known how well the available SNP arrays capture the genomic diversity outside of these HapMap populations. The structure of LD blocks and the history of recent recombinations can also be predicted [20]. Alternatively, LD structure does not necessarily need to be considered, and sequencing boundaries could be defined by an arbitrary size (such as 1 Mb across the associated SNP region) or by taking the most distal and proximal SNPs with r² > 0.1. Although somewhat arbitrary, these approaches represent a workable starting point in the absence of data precisely defining LD structure. If planning to make use of next generation sequencing data to define regional boundaries, a second consideration is depth of sequencing coverage. For example, the pilot phase of the 1000 Genomes Project has completed low-coverage (4x) whole genome sequencing on 179 individuals (~60 from each of three populations of West African, East Asian, and Northern European ancestry) and deep-coverage (>30x) whole genome sequencing on 2 trios (with West African and Northern European ancestry; also re-sequenced at low coverage).
In addition, deep-coverage (>30x) exome re-sequencing data will soon be available on hundreds of samples, and eventually on all 1000. Remarkably, even at low coverage it is expected that the majority of common variants, intra- as well as inter-genic, with a minor allele frequency (MAF) >5% will be identified. Complete coverage is expected for those common variants, intra- and inter-genic once again, with MAF >10%. These data offer a substantially improved value proposition for the discovery of genetic variation when compared to genome-wide arrays while, at the same time, pushing deeper into the MAF spectrum. That said, considerably deeper coverage will be needed, and in more populations, if rare variants are to be identified, let alone novel hypotheses tested. Pooling of subjects for sequencing can be used to reduce costs but could limit coverage for individual samples in the pool, and another genotyping step may need to be introduced to obtain individual-level data. Moreover, the larger the target region, the fewer individual DNA samples that can be pooled and the less depth of coverage that can be afforded. Additional biological data could also be used to define the region for analysis. For example, boundaries could be reduced due to a compelling candidate gene or transcript mapping to a certain region. However, if any of these a priori hypotheses based on known biological data are used to reduce the extent of sequencing, it should be recognized that this decision is generally made for cost reduction purposes only. Relying on biological assumptions undermines the functionally agnostic approach, one of the main advantages of GWAS. Finally, the majority of published GWAS data comes from European populations, which introduces an ascertainment bias toward the variants identified in these populations (for a review on the topic see [21]).
Incorporating GWAS information from other ethnic groups such as African-Americans could potentially reduce the target region if a similar association is found in this population, as the African-American population might have a smaller LD block structure than the European population, as exemplified by a recent study on cocaine dependence [22]. Therefore, the extent of targeted sequencing will need to be a compromise between the size of the region to be sequenced, informed by data such as LD structure, and the overall sequencing cost.
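As a rough illustration of the r² thresholds discussed in this section, the sketch below computes r² between two biallelic SNPs from haplotype and allele frequencies, and then takes the most distal and proximal SNPs at or above a chosen threshold as the boundaries of the target region. The function names, the input layout and the example values are illustrative assumptions, not part of any published pipeline.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (hypothetical inputs)
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


def region_bounds(index_pos, snps, threshold=0.2):
    """Return (start, end) spanning the index SNP and every SNP whose
    r^2 with the index SNP is at or above the threshold.

    snps: iterable of (position, r2_with_index_snp) pairs (assumed layout)
    """
    positions = [index_pos] + [pos for pos, r2 in snps if r2 >= threshold]
    return min(positions), max(positions)


# Hypothetical example: index SNP at position 1000 and four neighbours.
neighbours = [(500, 0.3), (200, 0.1), (1800, 0.25), (2500, 0.05)]
strict = region_bounds(1000, neighbours, threshold=0.2)
permissive = region_bounds(1000, neighbours, threshold=0.1)
```

Lowering the threshold can only widen the region, which is the cost trade-off described above: a more permissive r² cut-off is less likely to exclude the causal variant but increases the amount of sequence to be covered.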

Computational methods for SNP prioritization, databases and approaches to predict function


Different methods are available to identify candidates for the causal variant, to predict its function and to infer the biological mechanism behind the association. Computational approaches have the advantage of being quicker and cheaper to run, making them widely available; moreover, a computational pipeline can be automated and applied to several identified variants, which makes it easier to compare results from different studies. A successful computational prediction can save time and simplify the design of the subsequent experimental steps, but should always be accompanied by experimental validation of the result. Samples of the types of methods and tools that may be used to accomplish these investigations are provided in Table 2.

As described in the previous section, the first step after having identified a tagSNP associated with cancer or another disease is to identify the real causal SNP, or at least to predict its position or the gene involved in the association. Two computational techniques developed for this task are generally known as SNP prioritization and the digital candidate gene approach [23]. SNP prioritization methods aim at predicting which known SNP variants are the most likely to be involved with the phenotype; generally, these approaches rely on comparison with microarray data [24] or on the incorporation of information from different databases [25]. Candidate gene approaches also rely on the integration of different sources, such as expression, known association with other phenotypes, biological pathways, etc. [26][27][28][29]

After having identified the association of a SNP or a variant with a disease, the first approach to determine its function is to consult the existing databases to determine whether other information is known about the SNP, or whether the SNP is known to be associated with other diseases or phenotypes. Useful resources for general information on SNPs and variants are listed in Table 2. General genome browsing tools can be employed to explore the context in which a variant is found, which can be used as a first step to delimit the potential functional effect of the SNP (Figure 1). Layers of annotation including known and predicted genes, promoters and other regulatory aspects, and comparative genomics can all be investigated to elucidate features. Identification and comparative analysis of patients suffering from the same type of drug reaction could also be a useful tool in identifying SNPs [30]. Cancer-specific resources may offer insights on the presence or absence of variations in normal and/or tumor genomes. Finally, an innovative approach to the analysis of the literature on a region of a SNP is provided by GRAIL [31].


After having collected the information already annotated about the SNP identified, further analysis and predictions can be made. Most non-synonymous SNPs (nsSNPs) have little or no effect on the function of the proteins for which they code, and appear to be selectively neutral. However, a few SNPs are regulatory in nature and may affect function directly by altering protein-binding sites, or indirectly by altering protein stability, aggregation or post-translational modifications. The sheer volume of SNP data generated by GWAS means that computational methods are useful for prioritizing SNPs for experimental characterization. In the case of variants falling within coding regions, it is possible to predict their effect on the protein structure [32]; a good example of an automated pipeline is described in Burke et al. [33], who presented a framework to predict the effects of known nsSNPs. They analysed over 6000 genes carrying almost 24,000 nsSNPs (2,249 disease-associated nsSNPs from OMIM and 21,471 nsSNPs from dbSNP). In order to increase the coverage of structural information, the authors implemented an automated, large-scale structure prediction strategy using homology modeling, building models for almost 90% of the genes. These models were then used to generate predictions for 40% of the nsSNPs, using methods that depend on analyzing amino acid substitutions in orthologous families in terms of their local structural environments in the protein structure [34]. The observed nsSNPs were analysed for effects on protein stability by several methods including SDM [35], and for effects on protein interactions by Crescendo [36].
Structural databases that identify protein-protein (PICCOLO), protein-nucleic acid (BIPA [37]) and protein-ligand (CREDO [38]) interactions, along with a newer analysis tool [39] and a web interface (SAMUL) in which users can browse structural and functional annotations of proteins at the residue level, have since been used to augment these predictions. Further work by the group demonstrated that combining the predictions from different structure-based methods increases the sensitivity of predictions whilst maintaining good specificity [40]. This methodology has been used to generate new hypotheses regarding the molecular etiology of renal cell carcinoma and pheochromocytoma in the cancer syndrome von Hippel-Lindau disease [41]. Software that is currently available for testing the effect of nsSNPs in silico is described in Table 2.

Regarding the variants falling within intergenic regions, many approaches are available, depending on the position of the variant identified and the hypothesis of the researcher. A gene prediction tool can be employed to determine whether the region involved is an un-annotated gene or a pseudogene [42], and other functional genomic elements, such as splicing regulators, transcription factor binding sites and other regulatory elements, can be predicted as well [43]. Thus, several bioinformatics methods and tools have been developed for the analysis of putative promoters (Eponine [44], McPromoter [45], PromoterScan [46]), transcription factor binding sites (Match/Transfac [47], TFSEARCH), and both exonic splicing enhancers (ESEFinder [48], RESCUE-ESE [49]) and silencers [50]. Additionally, suites that include different prediction tools for both coding and regulatory SNPs have been developed [51][52], providing the user with an integrated interface that allows an easier and more comprehensive analysis. However, although the prediction of functional effects can help to prioritize SNPs for downstream analysis, it is often found that associations from individual markers show a weak effect because of the complex interactions among genes. It is likely that the combined use of experimental genotyping results and biological information on gene interactions (protein-protein interactions, transcriptional interactions, etc.) will improve the power to detect associations of markers with complex traits. Hence, pathway-based approaches constitute a powerful tool to further analyze the results of a GWAS, not only to characterize the function of the variants for which an association is identified but also to possibly discover the effect of other variants with minor effects [53][54][55][56][57].
In any case, the annotation of biologically relevant pathways in scientific databases like KEGG/Pathways [58], MetaCyc [59], Reactome [60] and the Pathway Interaction Database [61] makes it possible to understand the function of the genes in the vicinity of the variant identified and to formulate hypotheses on the etiology of the disease. A recent elegant report demonstrates that network analysis using risk variant and gene expression data is proving to be a powerful and fruitful tool in the dissection of the pathways driving the pathogenesis of the disease. Approaches range from transcriptomic analysis to predict the regulatory influence of transcription factors over gene networks, using tools such as ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) [62], to Bayesian network approaches that identify predictive relationships between genes from a combination of expression and eQTL data [63]. Whilst these tools are elegant, the ability to translate their outputs into biological significance is heavily dependent on the availability of easy-to-handle and relevant model systems to test the predicted connectivity. These approaches clearly pose validation challenges for many diseases.
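One simple form of pathway-based analysis is an over-representation test, which asks whether the genes near associated variants fall into an annotated pathway more often than expected by chance. The sketch below uses a one-sided hypergeometric test; it is a minimal illustration under stated assumptions and is not the method of any of the tools or studies cited above. The function name and the example counts are hypothetical.

```python
from math import comb


def pathway_enrichment_p(total_genes, pathway_genes, candidate_genes, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    total_genes:     size of the background gene set
    pathway_genes:   number of background genes annotated to the pathway
    candidate_genes: number of genes selected from the GWAS regions
    overlap:         number of candidate genes that are in the pathway
    """
    max_overlap = min(pathway_genes, candidate_genes)
    denominator = comb(total_genes, candidate_genes)
    numerator = sum(
        comb(pathway_genes, k)
        * comb(total_genes - pathway_genes, candidate_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


# Hypothetical counts: 20,000 background genes, 150 in the pathway,
# 40 candidate genes near associated SNPs, 4 of them in the pathway.
p_value = pathway_enrichment_p(20000, 150, 40, 4)
```

In practice the p-values from many pathways would need correction for multiple testing, and the choice of background gene set can change the result substantially.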


Genetics of gene expression

Having defined a target region and identified all common variants (frequency >1-5%, depending on sequencing depth) within a risk locus by re-sequencing and/or in silico analysis, the next step towards functional characterization is to explore associations between statistically significant SNPs and gene expression. Such associations offer a thread from which to build functional validation, although they do not necessarily mean that the genes cause the clinical trait. Gene expression is a heritable trait. Transcript abundance varies in the human population (similar to height and blood pressure) and thus can be considered a trait that is amenable to genetic mapping. A number of landmark studies have unequivocally demonstrated that a substantial fraction of transcripts in the human genome are influenced by inherited variation [66][67][68][69][70][71]. Genetic variants affecting transcript levels are often referred to as expression quantitative trait loci (eQTLs). eQTLs can be located near the gene they regulate or far away, and thus have a local or a more distant effect. The distinction between the two is often arbitrary; however, for most studies local eQTLs have usually been defined as lying within 1 megabase of the gene they regulate. Distant effects can involve interactions between an eQTL and a gene located on a different, non-homologous chromosome. As has been pointed out previously, we prefer the terminology of local and distant rather than cis- and trans-, which implies a mechanism [72].
Certain principles have emerged from these studies: (i) eQTLs tend to explain a greater proportion of trait variance than is typically seen for risk alleles and clinical traits; this means that eQTLs and gene expression traits can be discovered with smaller sample sizes than association studies between inherited variation and clinical traits (such as disease risk); (ii) local eQTLs tend to have larger effects on gene expression than distant eQTLs and are, therefore, easier to discover; (iii) expression phenotypes have been reported to be primarily regulated by distant eQTLs [73], although other work points to predominantly local regulation [74]; (iv) the majority of local eQTLs in humans are clustered closely around the transcription start site or just upstream of the transcription end site [75].
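To make the eQTL concepts above concrete, the following minimal sketch classifies a SNP-gene pair as local or distant using the 1-megabase convention and estimates the per-allele effect on expression by ordinary least-squares regression of expression on genotype dosage (0, 1 or 2 copies of the risk allele). The function names, the use of a single gene coordinate and the toy data are assumptions made for illustration; real analyses typically add covariates and correct for multiple testing.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_pos, window=1_000_000):
    """Label a SNP-gene pair 'local' if both lie on the same chromosome
    within `window` base pairs of each other, otherwise 'distant'."""
    if snp_chrom == gene_chrom and abs(snp_pos - gene_pos) <= window:
        return "local"
    return "distant"


def dosage_regression(dosages, expression):
    """Least-squares fit of expression on allele dosage.

    Returns (slope, r_squared): the change in expression per copy of the
    allele and the fraction of expression variance explained.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary across samples")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Toy data: six individuals, expression rising with each risk-allele copy.
toy_dosages = [0, 1, 2, 0, 1, 2]
toy_expression = [5.1, 5.9, 7.2, 4.8, 6.1, 6.9]
slope, r_squared = dosage_regression(toy_dosages, toy_expression)
```

The r_squared value corresponds to the "proportion of trait variance explained" referred to in principle (i).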

Genetic variation on the short arm of chromosome 3 is a good example of a local eQTL which affects the expression of the CCR5 gene significantly. This genetic variation, known as CCR5-∆32, consists of a 32-bp deletion in the single coding exon of the gene and causes production of a nonfunctional protein. Consequently, this gives rise to a phenotype in the form of resistance to HIV infection in individuals who are homozygous and slow progression to AIDS in individuals who are heterozygous for this variation [76][77]. However, closing the gap between genotype and phenotype in complex diseases is not always straightforward. Unlike in Mendelian disorders, a large fraction of the associated loci are located outside known protein-coding regions. Both empirical and computational data support the notion that a considerable proportion of these loci will be eQTLs [78][79][80][81] (and Pomerantz et al., PLoS Genetics, in press). The strategy of applying the genetics of gene expression approach offers an appealing and straightforward way to initiate the complicated task of connecting risk variants to their target genes. Importantly, this strategy does not require knowledge of the actual causal allele. Many of the initial successful eQTL studies relied on patient-derived lymphoblastoid cell lines, largely due to their easier availability [82][79]. Recently, these analyses have been performed in primary human tissues, and it has been demonstrated that some of the associations are tissue-specific [83][80] (and Pomerantz et al., PLoS Genetics, in press). Alternative splicing also adds to the complexity of eQTL associations, allowing for cases where only a particular RNA isoform is affected by the genotype (Burd CE et al.).


A complementary and powerful approach to defining local eQTLs is to measure allelic imbalance in individuals that are heterozygous for a risk allele. Any transcript demonstrating a deviation from a 1:1 ratio (as typically measured by a transcribed heterozygous marker) becomes a strong candidate gene [84][85][86]. What if the risk allele is not associated with the expression trait? False negatives can occur because gene expression varies in time and space; therefore, the developmental time point and/or the cellular population(s) being examined may not be entirely appropriate. As a rule of thumb, associations between the risk allele and expression should initially be tested in the cell lineage that is believed to give rise to the tumor type (and subtypes) under study, and contrasted with the expression patterns and associations in other tissues. Effects on transcript abundance may be subtle and therefore below the sensitivity threshold of a particular platform, and/or the sample size may not be adequate. Transcript abundance is usually evaluated under steady-state conditions. Lastly, the impact may only be revealed in certain contexts, such as the activation or the inhibition of a particular pathway. In these cases, alternative assays will be required to implicate these genes. Future questions to be addressed in this field include: i) What are the appropriate target tissues to examine? Risk alleles may act in a non-cell-autonomous or non-tissue-autonomous fashion and therefore may exert their effect through other cell types that act upon the target tissue under consideration. ii) Should the diseased tissue or the normal tissue or both be evaluated? We will touch on some of these points later in the article. Finally, iii) RNA-sequencing (RNA-seq) using next generation sequencing platforms has the potential to reveal the transcriptome in its entirety [87].
RNA-seq platforms also have desirable characteristics, such as increased sensitivity for low abundance transcripts and a wider dynamic range than microarrays. A challenge shared by RNA-seq and microarrays is that the data are usually gathered at only a limited number of time points. As cancer development is a protracted, multi-step process, dynamic monitoring of the expression associated with a risk allele is necessary to validate its role. It would be particularly informative to detect differences in the expression associated with a given risk allele during the shift between two phases of the disease (e.g. the transition between hyperplasia or dysplasia and carcinoma in situ). Critically, it is now becoming evident that functional studies rely heavily on sound statistical inference, so how an experiment is designed and statistically interpreted is important for the soundness of the results. Furthermore, the biological interpretation often extrapolates beyond the data to a whole metabolic pathway and even to the phenotype (cancer or disease per se) involved. Yet RNA-seq projects are frequently based on samples from only one or a few time points, so a critical concern is how reliably causal conclusions about cancer or other disease characteristics can be drawn from the results. From a physiological point of view, genomic functional data may show different activities of the alleles at a polymorphic site at different time points. These differences pose a challenge for obtaining reproducible and statistically convincing functional results for the candidate gene(s) being analyzed. Failure to observe a functional difference between alleles is also not a reliable indicator, because very limited data are available on the development of the phenotype. These concerns make it necessary to monitor the expression of these selected alleles dynamically and systematically.
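The allelic imbalance measurement described above (a deviation from the expected 1:1 ratio at a transcribed heterozygous marker) can be illustrated with an exact binomial test on allele-specific read counts. This is a minimal sketch under stated assumptions: the function name and the read counts are hypothetical, and the test ignores technical effects such as reference mapping bias or overdispersion, which real pipelines must handle.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial p-value against a 1:1 allelic ratio.

    ref_reads / alt_reads: reads carrying each allele of a transcribed
    heterozygous marker in a single heterozygous individual.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover the marker")
    k = min(ref_reads, alt_reads)
    # Lower tail P(X <= k) for X ~ Binomial(n, 0.5). The distribution is
    # symmetric, so the two-sided p-value is twice the tail, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical read counts (illustration only).
balanced = allelic_imbalance_pvalue(50, 50)
skewed = allelic_imbalance_pvalue(70, 30)
print(f"50:50 reads -> p = {balanced:.3g}; 70:30 reads -> p = {skewed:.3g}")
```

Because each heterozygous individual serves as its own control, this kind of test is insensitive to trans-acting and environmental differences between individuals, but it still depends on sampling the appropriate tissue and time point.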


Genetic variation and epigenetic control of gene expression

Promoter methylation, histone tail modifications and altered expression of non-coding RNAs, such as the large intergenic non-coding RNAs (lincRNAs) [88][89] that associate with chromatin modifying complexes, contribute to gene regulation in normal development as well as to aberrant gene expression in tumorigenesis [90]. Furthermore, epigenetic mechanisms play an important role in mediating the influence of the environment on gene expression [91]. Epigenetic silencing has been shown to be the predominant mechanism of gene inactivation for a subset of genes [92], while for other genes genetic, as well as epigenetic, mechanisms have been shown to jointly contribute to tumor suppressor gene inactivation [93]. An important research direction in the field is to understand the interplay between environmental, genetic and epigenetic factors, both in healthy tissues and in the initiation and progression of malignancies. Here we focus specifically on two questions: (1) Do genetic variants alter the epigenetic landscape and in this way increase the susceptibility to develop cancer? (2) Do genetic variants increase the risk of a locus becoming epigenetically silenced in the tumor? Technologies, in particular for the assessment of DNA methylation, are now sufficiently advanced and affordable to screen large tissue collections at single-CpG resolution [94]. Methylation profiling within and around haplotype blocks associated with cancer risk thus represents one appropriate strategy. Mechanisms by which genetic variants can affect epigenetic marks are known from studies in hereditary non-polyposis colorectal cancer. Here, constitutional epimutations exist that predispose to the early onset of colorectal cancer [95][96]: epimutations of MSH2 and MLH1, present in somatic cells including peripheral blood lymphocytes, are examples of highly penetrant epigenetic predisposing factors. 
For example, mutation of TACSTD1 leads to transcriptional inactivation through promoter methylation of a single MSH2 allele in normal somatic tissues. It has been proposed that many epimutations are a consequence of cis- or trans-acting genetic variants (reviewed in [97]). In an elegant experiment, Kerkel et al. [98] showed a sequence dependence of allele-specific methylation and demonstrated that cis-regulatory polymorphisms control gene expression and affect chromatin states. Recent ChIP-Seq data provide further compelling evidence that SNPs and structural variants frequently coincide with allele-specific differences in transcription factor binding sites and chromatin structure. This can now be investigated using allele-specific sequence analysis approaches [99], although the depth of sequencing required for this approach is still an issue [84]. Genome-wide maps of allelic asymmetries are expected to identify functional regulatory polymorphisms [100]. The fact that SNPs can affect allelic imbalance in a tissue-specific manner, as shown for UGT2B15 [101], is an important issue in experimental design. Further epigenetic mechanisms modulating gene expression at the transcriptional and post-transcriptional levels involve long non-coding RNA and small regulatory RNA [102] [103]. For example, piRNAs have been implicated in de novo DNA methylation and transposon silencing in mice [104], and several oncogenic miRNAs and their binding sites were shown to be directly affected by SNPs and CNVs [105]. Tandem repeats can also impact gene expression, for instance by altering transcription factor binding sites and by affecting chromatin structure (reviewed in [106]). Despite the focus above on enhancers and insulators as genotype-specific, long-distance regulators of gene expression, we are also well aware that chromatin fibres dynamically explore the nuclear space to establish meta-stable, long-range interactions with other chromatin fibres [107]. 
Although the functional outcomes of such interactions remain largely uncharted, it has been demonstrated that they are capable of transferring epigenetic marks to modulate transcriptional processes both in cis [108] and in trans [109] [110]. Hence, chromosome crosstalk sets the stage for the spreading and propagation of pleiotropic epigenetic effects, in all likelihood in a manner reflecting the topology of the network involved [107]. As sequence polymorphisms can influence communication between different parts of the genome [111], it is probable that some SNPs can generate genotype-specific chromatin networks. This scenario includes the possibility that the genotype promotes or antagonizes the formation of chromatin networks in combinations that, for some regions, are individual-specific and predispose to various human diseases. It is thus conceivable that the functional annotation of loci identified in GWAS screens will benefit from the incorporation of the chromatin/chromosomal network perspective. Once the approaches described above have been used to generate a focused list of polymorphisms for functional follow-up, the subsequent challenge is to examine their impact in appropriate in vivo model systems. The principal criterion for taking a polymorphism forward is the association of the eQTL with a disease. In summary, epigenetic modifications play important roles in controlling gene expression. However, epigenetics is much more diverse than canonical genetics, and epigenetic studies lag far behind genetic ones. In the future, certain epigenetic modifications might be considered as another type of genetic variation and could be independently investigated in GWAS.


Models for testing function in vitro and in vivo

Gaining a better understanding of the biological mechanisms of cancer development often relies on the analysis of models that reflect the human disease and the application of technologies that facilitate the analysis of these models (Table 1). It is likely that establishing a functional rationale for the significance of allelic variation, and identifying candidate genes at common low penetrance susceptibility loci in biologically relevant disease models, will become a major component of following up the data emerging from GWAS. Disease models can be based on either the in vitro characterization of human tissues (primary tissues or cells in culture) or in vivo models of disease development. However, models are generally limited to studying one variant/gene at a time. It remains technically challenging to study co-founder effects, that is, the possibility that multiple genes in a locus cooperate in the development of a particular disease. This mirrors the challenges of assessing individual risk based on a statistical model in which SNPs interact; indeed, the statistical significance of associations between SNPs and disease is based on the idea of independent contributions from each SNP, rather than a complex interplay. Genome-wide SNP genotyping could be exploited as a 'fingerprinting' tool for finding disease-causing genes in both congenital [112][113] and sporadic diseases, such as cancer [114] [115]. Copy number variation (CNV), in turn, provides gene dosage information that can be used to measure global genomic instability and to identify novel disease biomarkers during disease progression [114][115][116]. The basic hypothesis underpinning these studies is that genetic variation at susceptibility loci influences the initiation of the disease phenotype. Although the expression of several highly penetrant disease genes (e.g. 
BRCA1, BRCA2, APC) is ubiquitous, the functional effects of genetic variation are reflected in a tissue-specific manner. Therefore, it will be necessary to evaluate the functional impact of disease-associated SNPs in the precursor tissue. Another issue is the size of the functional effects expected for common genetic variants. Whereas the functional effects of several sequence alterations in genes such as BRCA1 and APC are clearly detrimental (they are usually coding sequence changes leading to defective protein products or abrogated expression), the functional effects of SNPs are likely to be subtle, and therefore likely to require large sample sizes in order to achieve sufficient power to detect associations. Thus, evaluating functional differences between the different alleles of a SNP may require establishing large biobanks of normal tissues. Establishing such biobanks will be a significant part of the challenge; whereas extensive efforts within the cancer research community have established tumor tissue bio-repositories, it has been less common to do so for normal tissues from the cells of origin of cancers. This issue is particularly problematic for tumor subtypes in which the cell of origin is still debated. Studies of normal tissues will need to be complemented by in vitro analyses using cell culture models. The location of SNPs with respect to candidate genes will provide multiple testable hypotheses about the functional consequence of genetic variants - for example, whether SNPs lie in coding sequences, intronic regions, well-defined gene promoters or stretches of chromatin enhancer sequences that may suggest a role in transcriptional regulation. Progress in establishing good models of normal tissues has been hampered by difficulties in accessing specimens and the challenges of culturing primary cells. 
For example, prostate epithelial cells are dependent on the presence of a co-cultured stromal component for establishing the secretory cell phenotype and functional differentiation. For the normal colon, most commercially available normal epithelial cell lines are fetal in origin, and differences between fetal and adult cell biology limit the translational potential of work using fetal cells to model adult epithelial carcinogenesis. There are exceptions - in breast, well-characterized commercially available cell lines exist that represent good models of normal breast tissue, e.g. MCF10A cells and immortalized HMECs. Three-dimensional (3D) cultures of MCF10As form polarized cystic structures that closely reflect the architecture and molecular features of breast acini in vivo. Using this system, a link between loss of BRCA1 function and impaired luminal differentiation of mammary epithelia was established [117]. The fidelity of this association was later confirmed through analysis of epithelial subpopulations in BRCA1 mutation carriers [118]. Since BRCA1 basal-like breast cancers may originate from luminal epithelial progenitors [119], impaired luminal differentiation may be central to tumorigenesis associated with BRCA1 mutations. If so, quantitation of multi-cellular phenotypes resulting from BRCA1 depletion in the MCF10A 3D system may serve as an in vitro tool through which genetic associations, such as those identified through GWAS [120], can be validated. For instance, a locus on 19p13 modifies risk of breast cancer in BRCA1 mutation carriers [120]. A smaller 13 kb region, defined by the most strongly associated SNPs, contains three genes: ABHD8, ANKLE1 and C19orf62 (aka MERIT40) [120]. MERIT40 is a component of the BRCA1 A complex and may be the most plausible genetic modifier in this cluster [120]; however, variation in other genes may contribute to risk. 
To test these putative associations, matrix format screens could be used to deplete each candidate gene concurrently with BRCA1 and to measure the attenuation or augmentation of impaired luminal differentiation within the MCF10A 3D system. By using such 3D models, it is possible to dissect subtle phenotypes, such as changes associated with gene dosage. For example, a recent study using primary human oral epithelial mucosal keratinocytes in 3D cultures has unraveled an early mechanism by which aberrant expression of the FOXM1 transcription factor hijacks adult stem cells to induce cancer initiation [121].
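To give a sense of scale for the statement above that subtle SNP effects require large sample sizes, the number of tissue samples needed to detect a given genotype-expression correlation can be approximated with the Fisher z-transformation. This is a rough sketch under stated assumptions: the function name, the effect sizes and the 5% significance / 80% power settings are illustrative choices, not values from the studies discussed here, and no correction for multiple testing is applied.

```python
from math import atanh, ceil
from statistics import NormalDist


def samples_needed(r, alpha=0.05, power=0.8):
    """Approximate sample size to detect a correlation r (two-sided test).

    Uses the Fisher z approximation: n = ((z_alpha/2 + z_power) / atanh(r))^2 + 3.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)


# Hypothetical effect sizes: a strong effect versus a subtle one.
for effect in (0.5, 0.3, 0.1):
    print(f"r = {effect}: about {samples_needed(effect)} samples")
```

The required number of samples grows roughly with the inverse square of the effect size, which is why biobanks of normal tissue become a limiting resource for common variants with subtle effects.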

The development of in vitro models of human tumors, which represent the quickest and most accessible way to test the function of candidate genes located at susceptibility loci, is more advanced, but functional effects may be masked by an aberrant genetic background and limited diversity of cell types. Importantly, most SNPs are reported to confer a predisposition to a particular disease and, consequently, we must expect that the greatest functional impact will be achieved in an essentially healthy, non-aberrant tissue/background. Such a context is perhaps the hardest to replicate and maintain in a laboratory setting, meaning that there must be a continuous drive for improvements in the models used. Evaluating the functional effects of SNPs using in vivo models represents an even greater challenge. Several significant limitations remain in the biological validation of candidate genes using genetically engineered animal models, including:

  • Relatively short duration of experimental models compared to human tumorigenesis that typically develops over several decades.
  • Important differences in human vs. animal physiology. For example, animal models of HIV, psychiatric illnesses and leprosy hardly exist, and there is no standardized phenotype ontology for mapping complex human traits onto animal model phenotypes [122].
  • Important differences in the structure and sequence of non-coding regions.
  • Limited modelling of gene-environment interactions.
  • Limited sensitivity of animal modelling for confirming the function of low penetrance alleles when studied in isolation.
  • High frequency (35%) of perinatal lethality in animal models when the GWAS-implicated loci are knocked out, thus preventing further elucidation of the responsible molecular mechanisms [122].
  • Differences in the environment in which the animal grows, as the cage where the animals live can influence their behavior and health. Toys or social interactions can reduce these difficulties [123].
  • Possible bias toward males, and intrinsic difficulty in designing experiments with females, due to the estrous or menstrual cycle [124].
  • Differences in N-glycosylation and protein modification.
  • Differences in the regulation of alternative splicing.
  • Epigenetic differences.
  • Differences in the karyotype and chromatin interactions.  
Despite these limitations, animal models remain a vital tool for post-GWAS validation, particularly when considering quantitative phenotypes. The multi-tumour APC-mutant mice, for example, provide power to detect quite subtle effects of variants. Another interesting approach would be to harness the regenerative power of planarians (flatworms), which have orthologs of human genes. These worms regenerate into separate organisms when cut longitudinally or transversely. By manipulating the SNPs in one of the resulting pair (the test set), we could compare the effects with the other of the pair (the control set) to gain insight into the biochemical pathways governing the changes.

Exploring the functions of SNPs in regulatory sequences/regions: drawing the themes together.

Functional SNPs found within large, non-coding intergenic intervals highlight the challenges of defining LD structure and pursuing functional validation. One hypothesis to explain such associations is that the risk locus contains regulatory region(s) with element(s) affecting the expression of one or more distantly transcribed region(s) of the genome. Here the starting point is to explore whether the SNPs are components of regulatory elements and what their distal effects may be. Although the most abundant of these regulatory sequences are enhancers, they may also include other regulators such as insulators and silencers. Unlike promoters (at transcription start sites of genes), distal regulatory sequences such as enhancers are often cell-type specific [125] and thus may be targets for tissue-specific risk-SNP effects. Annotating such regulatory sequences using chromatin marks or DNase sensitivity has proven to be a powerful method [126][127], and is more informative than the alternative approach of using evolutionary conservation, since regulatory elements tend to be unconstrained across mammalian evolution [128][129][130]. Specifically, identifying regulatory sequences containing functional SNPs within response elements by using chromatin annotations has been proposed recently [131]. Additionally, such regulatory regions can be demarcated even more precisely by assessing the association of candidate transcription factors with response elements. Both histone modifications and transcription factor occupied regions are currently identified using ChIP-seq methodologies, and the signals yield short DNA stretches (~1 kb) amenable to detailed analyses. Enhancer activity in such regulatory regions can be assayed using reporter genes in vitro [5] and/or in vivo [126]. 
Once activity is demonstrated, identified SNPs within the regions, especially ones within known transcription factor response elements and in LD with a tag-SNP, may be analyzed using biochemical methods for differential transcription factor binding and activity. Resequencing targeted at the regulatory regions should then be prioritized to capture all the variation within them. Regulatory sequences containing functional SNPs determined in this way can then be matched with their physiological target genes to probe functional significance. It is possible that multiple SNPs function coordinately at a particular locus, and each should be taken forward in subsequent mechanistic analyses. Three approaches are conceptually available to identify targets of regulatory sequences: (i) target genes may be identified in mouse models in which the regulatory sequence is knocked out (using cre-lox for tissue-specific and timing purposes) by genome-wide gene expression analyses after the knockout; (ii) target genes may also be identified by using the regulatory sequences as baits in chromatin conformation capture (3C) based studies [132][133], including genome-wide 3C-seq, to identify pleiotropic chromatin crosstalk; and (iii) targeted editing using somatic cell knock-in technology, although technically demanding, is another approach. Allelic series in isogenic settings may be created and gene expression differences measured - either in naturally growing cells or in cells that are perturbed in some manner (e.g., radiation, hormones, etc.). Finally, the results of these screening approaches should be matched against one another, creating a priority list of robust potential targets that can each be studied further. 
Thus, target genes under control of functional SNP-containing regulatory regions may have important roles in the cancer phenotype, such as proliferation, migration and apoptosis. Endpoints of the cancer phenotype, such as cell division, migration and apoptosis rates and protease secretion, may be measured in cultured cancer cells and mouse xenografts after the overexpression of the genes of interest, or their selected siRNA/shRNA knockdown. It will also be important to develop ways to make the fine functional annotation data generated in these analyses freely accessible, because in many cases, while they may not constitute publishable data, they could be useful for other post-GWAS efforts. In summary, the order of investigation is: identify chromatin architecture at tag-SNPs (+LD blocks) -> assay for regulatory activity, such as enhancers -> determine how SNPs in LD with the tag-SNP affect such activity -> identify regulated target genes -> understand biology/cancer risk -> understand causal predisposition.
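The first steps of this order of investigation can be sketched as a simple filter: keep SNPs that are in LD with the tag-SNP and that fall inside an annotated regulatory region. The example below is a minimal sketch under stated assumptions. The `Snp` class, the `prioritise` function, the r2 threshold of 0.8, the coordinates and the region labels are all hypothetical illustrations, not part of any published pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rsid: str            # hypothetical identifier
    position: int        # base-pair coordinate on a single chromosome
    r2_with_tag: float   # LD (r2) with the GWAS tag-SNP


def prioritise(snps, regulatory_regions, min_r2=0.8):
    """Return (rsid, region label) for SNPs in LD with the tag-SNP
    that lie inside an annotated region (start, end, label).

    min_r2 = 0.8 is an assumed cut-off, chosen only for illustration.
    """
    hits = []
    for snp in snps:
        if snp.r2_with_tag < min_r2:
            continue  # not sufficiently correlated with the tag-SNP
        for start, end, label in regulatory_regions:
            if start <= snp.position <= end:
                hits.append((snp.rsid, label))
                break  # report the first overlapping annotation only
    return hits


# Hypothetical locus (illustration only).
candidates = [
    Snp("snp_A", 1050, 0.95),
    Snp("snp_B", 1500, 0.90),
    Snp("snp_C", 2100, 0.40),
    Snp("snp_D", 5200, 0.85),
]
regions = [
    (1000, 1200, "enhancer_mark"),
    (2000, 2300, "TF_peak"),
    (5000, 5500, "DNase_site"),
]
shortlist = prioritise(candidates, regions)
print(shortlist)
```

SNPs that pass such a filter are only candidates; the reporter assays, transcription factor binding studies and target gene identification described above are still needed to establish function.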




The GWAS community has arrived at an important crossroads. As resources are limited, the debate revolves around whether to continue identifying the SNPs that are likely to contribute most to disease causation, or whether enough progress has been made to invest in functional post-GWAS analysis. A consensus is emerging that a single gene is not the only causative factor in complex diseases like cancer. Similarly, it is not surprising that functional SNPs affecting a single gene or allele may have a subtle effect on their own. As sequencing technologies become cheaper and more accessible, we argue that the picture will evolve rapidly as datasets expand, affording greater confidence in defining both SNPs and LD structure within the regions where they lie. This will require detailed mapping and annotation of epigenetic and transcriptomic landscapes, for which a major limitation may prove to be the sample collections themselves. While this progresses, hopefully through consortia assembled from academia and industry, it is vital that proof-of-principle studies take forward the strongest candidate SNPs available so far, not in this case necessarily to test their causative association with disease, but rather to understand their functional impact. What makes for strong candidates are significant associations with transcript expression (overlap between eQTL and GWA signal; chromosome conformation capture), tissue specificity and the phenotypic impacts of these transcript associations on model systems in downstream experiments. Successfully making the experimental transitions through this process will require collective working at a consortium/multi-group level and a clear decision tree. It is essential for the field that this overrides the temptation to publish fragmentary work capturing only sub-steps in this sequence. 
Over time, integration of the re-sequencing, epigenetic and molecular-epidemiological data within different ethnic groups (and thus within different linkage disequilibrium structures) will help to localize causal variants. If we begin to consider how to explore the functional impact of SNPs now, we will, as a community, be in good shape to rise to the challenge of testing causation in the future. The field is still making the first forays into the functional characterization of SNPs and is many steps away from proving causality. Nonetheless, our view is that causality can only be inferred if the eQTL is associated with disease and a SNP leads to expression differences in reliable in vitro and perhaps in vivo assays. Naturally, our ability to get close to this goal must be assessed in the context of current technologies and knowledge on a disease-by-disease basis, but we hope that this article will help to frame the developing debate and the emerging research that seeks to rise to this great challenge.



We would like to thank Dr. Fred Bunz and all the members of the NIH Post-Genome Wide Association Initiative for helpful discussions and in particular Dr. Ian Tomlinson (Wellcome Trust Centre for Human Genetics, Oxford). The contributing groups are supported by funding made available through the NIH Post-Genome Wide Association Initiative in response to Call  . This Call sustains research across five cancer organ sites (prostate: 1U19CA148537-01, breast: 1U19CA148065-01, ovarian: 1U19CA148112-01, colorectal: 1U19CA148107-01 and lung: 1U19CA148127-01). For further information on this Initiative please refer to the website:  



3C, 4C, 5C, Hi-C, 3C-seq: chromosome conformation capture (3C) is a technique used to identify interactions between genes and long-range regulatory domains made possible by chromosome loops that bring the two regions into physical proximity. Developments of this method include circularization (4C) of the genomic fragments with the use of inverse PCR primers, carbon copy (5C) technology for multiplexed ligation-mediated amplification, and high throughput analysis by massively parallel sequencing between many baits and targets (Hi-C), or many targets from a single locus (3C-seq).

Supporting references: [134] [135][136][137]

Causal variant: in the context of GWAS, the SNP that is mechanistically linked to increased risk. This is distinct from SNPs that do not have any functional impact but are statistically associated with the disease phenotype due to their linkage disequilibrium with the causal variant.

Supporting evidence: [138]

ChIP-Seq: Chromatin Immunoprecipitation (ChIP) is a method to study protein-DNA interactions. It identifies genomic regions that are binding sites for a known protein. Analysis of these regions is typically performed by PCR, when there is a hypothesized known binding site, or through the use of genomic microarrays (ChIP-chip). Alternatively, analysis can be done using next-generation sequencing (Seq) technology to analyze DNA fragments.

Supporting references: [139][140]

CNV: Copy Number Variation is a type of structural variation in which a particular segment of the genome, typically larger than 1kb, is found to have a variable copy number from a reference genome.

Supporting references: [141][142][143]

Deep sequencing: a sequencing strategy used to reveal variations present at extremely low levels in a sample. For example, to identify rare somatic mutations found in a small number of cells in a tumor, or low abundance transcripts in transcriptome analysis.

Supporting references: [87]

DNA Methylation: A modification of the DNA that involves predominantly the addition of a methyl group to the C5 position of the pyrimidine ring of a cytosine found in a CpG dinucleotide sequence.

Supporting references: [144]

eQTL: expression Quantitative Trait Locus. A genetic variant underlying the individual differences in quantitative levels of gene (or protein) expression.

Supporting references: [82]

Epigenetic markers: an array of modifications to DNA and histones that do not change the nucleotide sequence, namely the addition of a methyl group to cytosine and a series of post-translational modifications of histones, including methylation, acetylation and phosphorylation.

Supporting references: [145]

Fine mapping: a strategy to identify other lower frequency variants in a disease-associated region (typically spanning a haplotype block) not represented in the initial genotyping platform with the goal of uncovering candidate causal variants. It can include data mining of publicly available sequencing efforts, such as the 1000 Genomes Project and targeted re-sequencing.

Supporting references: [138][146]

Functional variant: a variant that confers a detectable functional impact on the locus. It can represent a change in coding region but also changes in regulatory regions that have an impact on function.

Supporting references: [5]

GWAS: Genome-Wide Association Study is a case-control study design in which most loci in the genome are interrogated for association with a trait (disease) through the use of SNPs, by comparing allele frequencies in cases and controls.

Supporting references: [1]
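As an illustration of the allele frequency comparison in this definition, a single-SNP case-control test can be sketched as a chi-square test on a 2x2 table of allele counts. This is a minimal sketch: the function name and the counts are hypothetical, and a real GWAS would apply genome-wide significance thresholds and adjust for population structure.

```python
from math import erfc, sqrt


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio, Pearson chi-square (1 d.f.) and p-value for allele counts.

    Each argument is a count of alleles (two per individual), not of people.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(rows)
    if min(rows + cols) == 0:
        raise ValueError("every row and column needs at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    if case_ref == 0 or ctrl_alt == 0:
        odds_ratio = float("inf")
    else:
        odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1000 cases and 1000 controls (2000 alleles each).
odds_ratio, chi2, p_value = allelic_association(600, 1400, 500, 1500)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.2g}")
```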

Haplotype block: a linear segment of the genome comprising co-inherited alleles on the same chromatid.

Supporting references: [147][148]

Homologous recombination: an error-free recombination mechanism that exchanges genetic sequences between homologous loci during meiosis, and utilizes homologous sequences such as the sister-chromatid to promote DNA repair during mitosis.

Supporting references: [148]

Linkage disequilibrium: a nonrandom association between two markers (e.g. SNPs).

Supporting references: [149][150][147][151].

MicroRNAs: endogenous short (~23 nt) RNAs involved in gene regulation by pairing to the mRNAs of protein-coding genes.

Supporting references: [152]

Next generation sequencing: a technology to sequence DNA in a massively parallel fashion, so that sequencing is achieved at a much faster speed and lower cost than with traditional methods.

Supporting references: [145]

Non-coding variant: a variant that is located outside of the coding region of a certain locus.

Tagging variant: a variant (SNP) that defines most of the haplotype diversity of a haplotype block.

Supporting references: [147]

Transcriptome: The complete set of transcripts in a cell. In some cases it can also include quantitative data about the amount of individual transcripts.

Supporting references: [87]

r²: square of the correlation coefficient (r) between loci/markers, which is a normalized value for the deviation of the observed frequency of a haplotype from the expected frequency.
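For two biallelic markers, this definition can be written out as a short calculation: D is the deviation of the observed haplotype frequency from the frequency expected under independence, and the squared correlation is D squared divided by the product of the four allele frequencies. The sketch below is illustrative only. The function name and the frequencies are hypothetical, and haplotype frequencies are assumed to be known (in practice they are estimated from genotype data).

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: observed frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    for freq in (p_a, p_b):
        if not 0 < freq < 1:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies (illustration only).
d_perfect, r2_perfect = ld_stats(0.3, 0.3, 0.3)   # alleles always co-occur
d_none, r2_none = ld_stats(0.15, 0.3, 0.5)        # alleles are independent
print(f"co-inherited: D = {d_perfect:.3f}, r2 = {r2_perfect:.3f}")
print(f"independent:  D = {d_none:.3f}, r2 = {r2_none:.3f}")
```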

RNA-Seq: a method to obtain a genome-wide transcription map using deep sequencing technologies to generate short sequence reads (30-400 bp). It reveals a transcriptional profile and levels of expression for each gene.

Supporting references: [87]

SNP: single nucleotide polymorphism


Figure 1

The genomic context in which a variant is found can be used as a preliminary functional analysis. The figure shows the genomic context distribution of disease- and trait-associated SNPs annotated in the Catalog of Genome-Wide Association Studies [2] as of December 9th, 2010. Most of the SNPs are located in intergenic and intronic positions, but a small percentage are located upstream and downstream of genes, as well as in regulatory regions and splice sites. SNPs in these locations can be analyzed in more detail using more specific bioinformatics tools.

Table 1. Methods for functional validation/enabling technologies



Host/ Applications

Advantages/ disadvantages

Technical challenges, References, and Resources

Homologous recombination

Modification of a cell’s or organism’s genotype to assess allele-specific effects using targeted recombination vectors

Cell lines: Manipulation of a cell’s genotype by introducing or changing specific SNPs in cell lines (e.g. colorectal cancer HCT116 cell line).

Cell lines can be used for biochemistry studies, drug response screens, short and long term cell biological assays (e.g. apoptosis, survival fraction assays, soft agar growth assays).

Limitations include the inability to assess effects from the in vivo milieu or developmentally-restricted effects.

Relatively low frequency of integration, necessitating the screening of a large number of clones. Efficiency of integration may be gene specific, as it depends on the structure of the homology fragments flanking the target locus. While a targeting vector with a G418r (or Hygr) cassette driven by the phosphoglycerate kinase promoter can be effectively used to target regulatory regions, it is expected to increase the number of random events.

References and resources: [153][154][155][156][157][158][159][160][161]



Mouse models: Generation of transgenic mice for tissue-specific and/or doxycycline regulated expression of target genes. Gene replacement and knockouts in animals (Cre recombinase-mediated).

Allows the study of the effects of associated SNPs or associated genes in development and in the context of the whole organism.

Ability to assess effects of the microenvironment and paracrine effects.

Mouse physiology is not always similar to human physiology. In particular, some tumor types and precursor lesions may be different in humans. Up to 35% of knocked-out GWAS-implicated loci display perinatal lethality, thus preventing evaluation of the perturbed pathophysiological mechanism.

Recombinant techniques for the production of BAC transgenes and gene targeting constructs allow faster production of mouse models for functional genomic studies. Tissue-specific expression depends on the availability of tissue-specific promoters to drive Cre expression. Standardized, agnostic phenotypic screens in knockout models are lacking; thus, reported effects in mouse models do not follow the "hypothesis-free" approach introduced by GWAS.

References and resources: [122] 


Gene targeting by zinc finger nucleases

DNA-binding nucleases that can be used for genomic editing. They create double-strand breaks (DSBs) at specified sites, which are normally repaired by error-prone non-homologous end joining (NHEJ). A transfected template including a mutation favors homology-directed recombination, leading to a knock-in of the desired mutation.

Cell lines: Manipulation of a cell’s genotype by introducing or changing specific SNPs in cell lines (e.g. myelogenous leukaemia K562 cell line).

Same as described for cell lines (see above)

The generation of a DSB leads to a high frequency of modified clones. Challenges include limited availability of off-the-shelf cell lines and very high cost.

References and resources: [162][163]





Rat models: Rapid and targeted gene knockouts in ES cells enable the development of transgenic rats.

Advantages include several listed for mouse models. In addition, the anatomy and physiology of certain organs (e.g. prostate) are closer to those of humans than are the mouse's.

Animal models are expensive to make and to maintain, even though the Zinc Finger technology is usually less expensive than traditional knock-out.

References and resources: [164]



Other organisms

Zinc Finger technology can be used to introduce knockouts in organisms where traditional knock-out techniques are difficult or impossible to apply, as in zebrafish [165].

Same as described for Rat models (see above)

RNA interference

Silencing of gene expression using plasmid-encoded short hairpin RNAs (shRNAs) or double-stranded RNAs (dsRNAs)

Cell lines: Manipulation of expression levels of associated genes to monitor their effects.

Easier and faster than homologous recombination. Lentiviral/adenoviral delivery generates high transduction efficiencies and can be done transiently or stably. Results may depend on the effectiveness of the shRNA. While there are clear advantages to using stable cell lines, they may adapt to the knockdown by acquiring additional mutations, or the stable knockdown might be incompatible with viability. Thus, both transient and stable approaches should be exploited.

There are now several commercially available sources of validated shRNAs, developed by the RNAi Consortium, cloned in lentiviral vectors that can be packaged by 293FT into replication-defective lentivirus. Thoughtful controls should be applied to rule out off-target effects.

References and resources: [166]



Morpholinos

Artificially synthesized molecule with a redesigned natural nucleic acid structure that is used to modify gene expression.

Manipulation of embryos, organs or eggs from zebrafish, mouse and other organisms

Very convenient technique for developmental studies at the embryonic stage or in in vitro organs. Much more stable than conventional siRNA techniques.

Morpholino oligomers block small (~25 base) regions of the base-pairing surfaces of ribonucleic acid (RNA).

References and resources:



Ectopic gene over-expression

Transient or stable transfection mediated by plasmid or lentiviral expression vectors.

Cell lines: Modulation of expression levels achieved often using constructs with strong promoters (e.g. CMV) but tissue-specific and inducible promoters can also be used to modulate gene expression.

Fast, easy, economical and reproducible. However, it is often difficult to obtain an expression comparable to physiologic levels or in a cell cycle-specific pattern. There may be artifactual effects resulting from over-expression.

Analysis should be performed in pools as well as in multiple clonal isolates to control for clonal variation or adaptation.

References and resources: [168]



Microarrays

Multiplex miniaturized technology consisting of an arrayed series of thousands of microscopic spots (probes); these can be small regions of a gene used to hybridize a DNA or RNA sample, quantified by the detection of fluorescence.

Sources: DNA or RNA from any biological source (cell lines, biopsies, blood or other body fluids). Applications: Gene expression profiles, comparative genomic hybridisation, epigenetic modification, SNP analysis and alternative splicing detection at the genome scale.

Fast and genome-wide in application, but requires different controls to avoid false-positive or false-negative results. In-depth bioinformatic analyses are also needed.

Depending on the type of array, the quality and composition of the slides must be carefully designed. Some additional specific experiments are often required to validate the results obtained. References and resources: [169]


RNA-Seq

High-throughput next-generation sequencing of cDNA libraries generated either from the entire set of cellular steady-state RNA or its subsets.

Applications: (i) Discovery of novel RNA species including long non-coding RNA and small regulatory RNA, as well as new RNA isoforms originating from known loci; (ii) whole-transcriptome expression profiling including eQTL discovery; (iii) targeted transcription profiling within GWAS regions.

Advantages: Eliminates dependence on probes used in Microarray applications; enables quantitative profiling of the entire transcriptome and its subsets. Higher dynamic range and sensitivity. Higher power to distinguish smaller changes in gene expression. Capacity for detecting allele-specific expression.

Enables direct detection of RNA splice isoforms. Genetic variations including SNPs, CNVs and translocations can be detected in the same experiment.

Disadvantages: High costs prohibit large-scale eQTL projects.

Although RNA-seq allows for potentially limitless dynamic range, there are strongly diminishing returns for the detection of low-abundance RNA, as a very large portion of sequence reads is taken up by the most abundant RNA transcripts. Transcriptome partitioning utilizing either enrichment or depletion techniques (e.g. rRNA depletion or small RNA enrichment) largely alleviates this problem.

Data analysis methodologies are rapidly maturing but still represent a significant challenge. [170] [87] [171] [172]

Transcription factors


c-Myc, KLF4, Oct4, Sox2 and possibly others can reprogram cells, via retroviral, adenoviral or plasmid delivery, into ES-like induced pluripotent stem cells (iPSCs), enabling experimentation in these cell lines.

Cell lines: Manipulation of cells' genotype by introducing specific SNPs and observing the consequences.

Can be studied for apoptosis, tumor formation etc.


Of uncertain yield.

Pharmacologic intervention of epigenetic control

Uses compounds that alter epigenetic control, such as histone deacetylase inhibitors (HDAC inhibitors), all-trans retinoic acid (ATRA) and DNA methyltransferase inhibitors.

Cell lines and animal models. Retinoic acid induces epigenetic modification.




Table 2. Computational methods to characterize risk loci function


Method or list of resources

Description and References



SNP prioritization


- FitSNP







MaCH [173] and others

Imputation of genotypes using a reference panel of known haplotypes




Improves power in GWA studies; enables meta-analysis of GWA studies; also useful for fine-mapping of functional variants.

Reference dataset and follow-up by genotyping needed

databases for general SNP/variant annotation [174]

- the Single Nucleotide Polymorphism database (dbSNP; [175])

- the Database of Genomic Variants Archive (DGVa; [176])

- the Database of genomic structural variation (dbVar)

- SNPedia, Wiki-based community annotation on SNPs


Consulting the general information known about the SNP or the variant involved makes it possible to identify any known associations with other phenotypes and the distribution of alleles in other populations.

This should be the first step before proceeding with further analysis. Spending more time consulting the known information about the variant can save time later and avoid unnecessary analysis.

Limited by the quality of the annotations available. Errors in the annotations are possible and difficult to spot.

databases or tools for known associations between SNP/variant and phenotypes

- the European Genome-phenome Archive (EGA)

- the Database of Genotype and Phenotype (dbGaP)

- Human Genome Variation Database

- Varietas [177]


A multiple indexed database (HGVRD) in Japan

- PharmacoGenomics Knowledge Base (PharmGKB)

- HuGE Navigator

Many public archives or tools make it possible to verify whether the variant to be characterized has already been implicated in other genome-wide association studies.

Can provide a hint on the function of the variant and the region involved. Since most GWAS are made with commercial arrays, consulting previous literature can provide good controls for further analysis.

Limited by the quality of the data available.


Genome Graphs   in the UCSC Genome Browser  

Visualization tool enabling exploration of genomic region of variants, in a general genome browsing framework. [178]

Provides extensive context including genes, expression and regulatory data.

Requires that data sets have been mapped to UCSC Genome Browser; varies by assembly.

cancer-specific resources

BioMart   for the International Cancer Genome Consortium  

The Data Coordination Center (DCC) for the ICGC project [179] distributes the data on the web with a BioMart query interface. [180]

Access to normal and tumor genome data is provided for numerous cancers.

A fairly new project; data is currently flowing in but is not completely available for all sets at this time.


The Cancer Genome Workbench (CGWB)  

A web-based tool that provides an integrated display of many informative data types such as mutation, copy-number variation, expression, microRNA and methylation.

Provides a range of data types with emphasis on cancer project data sets.

Exploration of the underlying data sets at the original source sites may provide additional details.

analysis of the literature


Gene Relationships Among Implicated Loci (GRAIL) combines the regional assessment of genes at a SNP location with text-mining of research papers about those genes to highlight possible significant terms. [31]

Offers a method to explore the literature by genomic region.

Relies on literature completeness, and may be biased towards better-characterized genes. Can be fooled by genes with similar names.

eQTL resources

Genome-wide identification and analysis of eQTLs

eQTL browser from Jonathan Pritchard's lab   [75]


Starting either from a SNP or a gene, these resources make it possible to identify associations between SNPs and gene expression based on extensive, genome-wide datasets.

Limited transferability of eQTLs between different cell lines and tissues.


resources Epigenomics   database at NCBI

The Epigenomics resource can be used to explore genomic regions of interest for DNA methylation, histone modification and chromatin structure data.

Various epigenomics data types have been integrated from various sample types and can be browsed and searched.

Relies on the curation and availability of data in the collection.


microRNA resources microRNA tools

TargetScan   and  -

Web-based resources containing information about miRNA targets that can be used to assess the function of non-coding SNPs located within 3'UTRs.
Easy-to-use pre-compiled and pre-computed sets of data cataloging miRNAs, their expression, and target sites.

miRNA target prediction algorithms that rely on evolutionary sequence conservation cannot be applied to recently evolved miRNAs (e.g. human- or primate-specific).

prediction of SNP function







Identifies functional sites in proteins by comparing observed substitution patterns in a multiple sequence alignment with the pattern expected due purely to the structural environment of the protein. [36]

Does not require prior knowledge of functional site of protein.


predicting the effects of non-synonymous SNPs [32] [33]


A Graph theoretic measure for estimation of structural and pathological impacts of non-synonymous SNP [39].  





A statistical potential energy function developed to predict the effect that mutations have on the stability of proteins.   [181].






An algorithm taking a query sequence and using multiple alignment information to predict tolerant and deleterious substitutions for every position of the query sequence [182] [183] [184]





A tool which predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. [185]






A statistical potential approach for the computer-aided design of mutant proteins with controlled stability properties. It evaluates the changes in stability of a given protein or peptide under single-site mutations, on the basis of the protein's structure. [186]






A neural network method or SVM method that can be used to predict whether a mutation is stabilizing or destabilizing. [187]






A decision tree with the SVM-based classifier coupled to the SVM-Profile trained on sequence profile information. [189]  





A machine-learning approach based on support vector machines (SVMs) to predict the stability changes for single site mutations.  









  1. Genomewide association studies and assessment of the risk of disease. Manolio, T.A. N. Engl. J. Med (2010)
  2. A Catalog of Published Genome-Wide Association Studies. Hindorff LA, J.H., Hall PN, Mehta JP, and Manolio TA.  
  3. Genome-wide association study identifies novel breast cancer susceptibility loci. Easton, D.F., Pooley, K.A., Dunning, A.M., Pharoah, P.D., Thompson, D., Ballinger, D.G., Struewing, J.P., Morrison, J., Field, H., Luben, R., Wareham, N., Ahmed, S., Healey, C.S., Bowman, R., Meyer, K.B., Haiman, C.A., Kolonel, L.K., Henderson, B.E., Le, M.L., Brennan, P., Sangrajrang, S., Gaborieau, V., Odefrey, F., Shen, C.Y., Wu, P.E., Wang, H.C., Eccles, D., Evans, D.G., Peto, J., Fletcher, O., Johnson, N., Seal, S., Stratton, M.R., Rahman, N., Chenevix-Trench, G., Bojesen, S.E., Nordestgaard, B.G., Axelsson, C.K., Garcia-Closas, M., Brinton, L., Chanock, S., Lissowska, J., Peplonska, B., Nevanlinna, H., Fagerholm, R., Eerola, H., Kang, D., Yoo, K.Y., Noh, D.Y., Ahn, S.H., Hunter, D.J., Hankinson, S.E., Cox, D.G., Hall, P., Wedren, S., Liu, J., Low, Y.L., Bogdanova, N., Schurmann, P., Dork, T., Tollenaar, R.A., Jacobi, C.E., Devilee, P., Klijn, J.G., Sigurdson, A.J., Doody, M.M., Alexander, B.H., Zhang, J., Cox, A., Brock, I.W., MacPherson, G., Reed, M.W., Couch, F.J., Goode, E.L., Olson, J.E., Meijers-Heijboer, H., van den Ouweland, A., Uitterlinden, A., Rivadeneira, F., Milne, R.L., Ribas, G., Gonzalez-Neira, A., Benitez, J., Hopper, J.L., McCredie, M., Southey, M., Giles, G.G., Schroen, C., Justenhoven, C., Brauch, H., Hamann, U., Ko, Y.D., Spurdle, A.B., Beesley, J., Chen, X., Mannermaa, A., Kosma, V.M., Kataja, V., Hartikainen, J., Day, N.E., Cox, D.R., Ponder, B.A. Nature (2007)
  4. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., Collins, F.S., Manolio, T.A. Proc. Natl. Acad. Sci. U. S. A (2009)
  5. Functional enhancers at the gene-poor 8q24 cancer-linked locus. Jia, L., Landan, G., Pomerantz, M., Jaschek, R., Herman, P., Reich, D., Yan, C., Khalid, O., Kantoff, P., Oh, W., Manak, J.R., Berman, B.P., Henderson, B.E., Frenkel, B., Haiman, C.A., Freedman, M., Tanay, A., Coetzee, G.A. PLoS. Genet (2009)
  6. Genome-wide association studies: potential next steps on a genetic journey. McCarthy, M.I., Hirschhorn, J.N. Hum. Mol. Genet. (2008) [Pubmed]
  7. On beyond GWAS.  Nat. Genet (2010)
  8. Genomic convergence of genome-wide investigations for complex traits. Kitsios, G.D., Zintzaras, E. Ann. Hum. Genet. (2009) [Pubmed]
  9. Agreement among type 2 diabetes linkage studies but a poor correlation with results from genome-wide association studies. Lillioja, S., Wilton, A. Diabetologia. (2009) [Pubmed]
  10. Finding genes that underlie complex traits. Glazier, A.M., Nadeau, J.H., Aitman, T.J. Science (2002)
  11. Genome-wide association studies in cancer. Easton, D.F., Eeles, R.A. Human. Molecular. Genetics (2008)
  12. Common variation in KITLG and at 5q31.3 predisposes to testicular germ cell cancer. Kanetsky, P.A., Mitra, N., Vardhanabhuti, S., Li, M., Vaughn, D.J., Letrero, R., Ciosek, S.L., Doody, D.R., Smith, L.M., Weaver, J., Albano, A., Chen, C., Starr, J.R., Rader, D.J., Godwin, A.K., Reilly, M.P., Hakonarson, H., Schwartz, S.M., Nathanson, K.L. Nat. Genet (2009)
  13. A map of human genome variation from population-scale sequencing. Durbin, R.M., Abecasis, G.R., Altshuler, D.L., Auton, A., Brooks, L.D., Durbin, R.M., Gibbs, R.A., Hurles, M.E., McVean, G.A. Nature. (2010) [Pubmed]
  14. A second generation human haplotype map of over 3.1 million SNPs. WikiGenes. Article
  15. Genotype imputation for genome-wide association studies. Marchini, J., Howie, B. Nat. Rev. Genet. (2010) [Pubmed]
  16. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Liu, J.Z., Tozzi, F., Waterworth, D.M., Pillai, S.G., Muglia, P., Middleton, L., Berrettini, W., Knouff, C.W., Yuan, X., Waeber, G., Vollenweider, P., Preisig, M., Wareham, N.J., Zhao, J.H., Loos, R.J., Barroso, I., Khaw, K.T., Grundy, S., Barter, P., Mahley, R., Kesaniemi, A., McPherson, R., Vincent, J.B., Strauss, J., Kennedy, J.L., Farmer, A., McGuffin, P., Day, R., Matthews, K., Bakke, P., Gulsvik, A., Lucae, S., Ising, M., Brueckl, T., Horstmann, S., Wichmann, H.E., Rawal, R., Dahmen, N., Lamina, C., Polasek, O., Zgaga, L., Huffman, J., Campbell, S., Kooner, J., Chambers, J.C., Burnett, M.S., Devaney, J.M., Pichard, A.D., Kent, K.M., Satler, L., Lindsay, J.M., Waksman, R., Epstein, S., Wilson, J.F., Wild, S.H., Campbell, H., Vitart, V., Reilly, M.P., Li, M., Qu, L., Wilensky, R., Matthai, W., Hakonarson, H.H., Rader, D.J., Franke, A., Wittig, M., Schäfer, A., Uda, M., Terracciano, A., Xiao, X., Busonero, F., Scheet, P., Schlessinger, D., St Clair, D., Rujescu, D., Abecasis, G.R., Grabe, H.J., Teumer, A., Völzke, H., Petersmann, A., John, U., Rudan, I., Hayward, C., Wright, A.F., Kolcic, I., Wright, B.J., Thompson, J.R., Balmforth, A.J., Hall, A.S., Samani, N.J., Anderson, C.A., Ahmad, T., Mathew, C.G., Parkes, M., Satsangi, J., Caulfield, M., Munroe, P.B., Farrall, M., Dominiczak, A., Worthington, J., Thomson, W., Eyre, S., Barton, A., Mooser, V., Francks, C., Marchini, J. Nat. Genet. (2010) [Pubmed]
  17. Genome-wide association studies: hypothesis-"free" or "engaged"?. Kitsios, G.D., Zintzaras, E. Transl. Res. (2009) [Pubmed]
  18. Integrated detection and population-genetic analysis of SNPs and copy number variation. McCarroll, S.A., Kuruvilla, F.G., Korn, J.M., Cawley, S., Nemesh, J., Wysoker, A., Shapero, M.H., de Bakker, P.I., Maller, J.B., Kirby, A., Elliott, A.L., Parkin, M., Hubbell, E., Webster, T., Mei, R., Veitch, J., Collins, P.J., Handsaker, R., Lincoln, S., Nizzari, M., Blume, J., Jones, K.W., Rava, R., Daly, M.J., Gabriel, S.B., Altshuler, D. Nat. Genet. (2008) [Pubmed]
  19. Copy number variants and common disorders: filling the gaps and exploring complexity in genome-wide association studies. Estivill, X., Armengol, L. PLoS. Genet. (2007) [Pubmed]
  20. A New Method to Reconstruct Recombination Events at a Genomic Scale. Melé, M., Javed, A., Pybus, M., Calafell, F., Parida, L., Bertranpetit, J. PLoS. Comput. Biol. (2010) [Pubmed]
  21. Genome-wide association studies in diverse populations. Rosenberg, N.A., Huang, L., Jewett, E.M., Szpiech, Z.A., Jankovic, I., Boehnke, M. Nat. Rev. Genet. (2010) [Pubmed]
  22. In search of causal variants: refining disease association signals using cross-population contrasts. Saccone, N.L., Saccone, S.F., Goate, A.M., Grucza, R.A., Hinrichs, A.L., Rice, J.P., Bierut, L.J. BMC. Genet (2008) [Pubmed]
  23. Bioinformatic tools for identifying disease gene and SNP candidates. Mooney, S.D., Krishnan, V.G., Evani, U.S. Methods. Mol. Biol. (2010) [Pubmed]
  24. FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease. Chen, R., Morgan, A.A., Dudley, J., Deshpande, T., Li, L., Kodama, K., Chiang, A.P., Butte, A.J. Genome. Biol. (2008) [Pubmed]
  25. SPOT: a web-based tool for using biological databases to prioritize SNPs after a genome-wide association study. Saccone, S.F., Bolze, R., Thomas, P., Quan, J., Mehta, G., Deelman, E., Tischfield, J.A., Rice, J.P. Nucleic. Acids. Res. (2010) [Pubmed]
  26. The power of protein interaction networks for associating genes with diseases. Navlakha, S., Kingsford, C. Bioinformatics. (2010) [Pubmed]
  27. Candidate-gene approaches for studying complex genetic traits: practical considerations. Tabor, H.K., Risch, N.J., Myers, R.M. Nat. Rev. Genet. (2002) [Pubmed]
  28. Candidate gene identification approach: progress and challenges. Zhu, M., Zhao, S. Int. J. Biol. Sci. (2007) [Pubmed]
  29. Protein interactions and disease: computational approaches to uncover the etiology of diseases. Kann, M.G. Brief. Bioinform. (2007) [Pubmed]
  30. CYP1A2 genetic polymorphisms are associated with treatment response to the antidepressant paroxetine. Lin, K.M., Tsou, H.H., Tsai, I.J., Hsiao, M.C., Hsiao, C.F., Liu, C.Y., Shen, W.W., Tang, H.S., Fang, C.K., Wu, C.S., Lu, S.C., Kuo, H.W., Liu, S.C., Chan, H.W., Hsu, Y.T., Tian, J.N., Liu, Y.L. Pharmacogenomics. (2010) [Pubmed]
  31. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. Raychaudhuri, S., Plenge, R.M., Rossin, E.J., Ng, A.C., Purcell, S.M., Sklar, P., Scolnick, E.M., Xavier, R.J., Altshuler, D., Daly, M.J. PLoS. Genet. (2009) [Pubmed]
  32. Predicting the effects of amino acid substitutions on protein function. Ng, P.C., Henikoff, S. Annu. Rev. Genomics. Hum. Genet. (2006) [Pubmed]
  33. Genome bioinformatic analysis of nonsynonymous SNPs. Burke, D.F., Worth, C.L., Priego, E.M., Cheng, T., Smink, L.J., Todd, J.A., Blundell, T.L. BMC. Bioinformatics. (2007) [Pubmed]
  34. Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Overington, J., Donnelly, D., Johnson, M.S., Sali, A., Blundell, T.L. Protein. Sci. (1992) [Pubmed]
  35. Prediction of the stability of protein mutants based on structural environment-dependent amino acid substitution and propensity tables. Topham, C.M., Srinivasan, N., Blundell, T.L. Protein. Eng. (1997) [Pubmed]
  36. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. Chelliah, V., Chen, L., Blundell, T.L., Lovell, S.C. J. Mol. Biol. (2004) [Pubmed]
  37. BIPA: a database for protein-nucleic acid interaction in 3D structures. Lee, S., Blundell, T.L. Bioinformatics. (2009) [Pubmed]
  38. CREDO: a protein-ligand interaction database for drug discovery. Schreyer, A., Blundell, T. Chem. Biol. Drug. Des. (2009) [Pubmed]
  39. Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms. Cheng, T.M., Lu, Y.E., Vendruscolo, M., Lio', P., Blundell, T.L. PLoS. Comput. Biol. (2008) [Pubmed]
  40. A structural bioinformatics approach to the analysis of nonsynonymous single nucleotide polymorphisms (nsSNPs) and their relation to disease. Worth, C.L., Bickerton, G.R., Schreyer, A., Forman, J.R., Cheng, T.M., Lee, S., Gong, S., Burke, D.F., Blundell, T.L. J. Bioinform. Comput. Biol. (2007) [Pubmed]
  41. Structural bioinformatics mutation analysis reveals genotype-phenotype correlations in von Hippel-Lindau disease and suggests molecular mechanisms of tumorigenesis. Forman, J.R., Worth, C.L., Bickerton, G.R., Eisen, T.G., Blundell, T.L. Proteins. (2009) [Pubmed]
  42. Computational methods for ab initio and comparative gene finding. Picardi, E., Pesole, G. Methods. Mol. Biol. (2010) [Pubmed]
  43. Prediction of genomic functional elements. Jones, S.J. Annu. Rev. Genomics. Hum. Genet. (2006) [Pubmed]
  44. Computational detection and location of transcription start sites in mammalian genomic DNA. Down, T.A., Hubbard, T.J. Genome. Res. (2002) [Pubmed]
  45. Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Ohler, U., Niemann, H., Liao, G.c., Rubin, G.M. Bioinformatics. (2001) [Pubmed]
  46. Predicting Pol II promoter sequences using transcription factor binding sites. Prestridge, D.S. J. Mol. Biol. (1995) [Pubmed]
  47. The TRANSFAC system on gene expression regulation. Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhäuser, R., Prüss, M., Schacherer, F., Thiele, S., Urbach, S. Nucleic. Acids. Res. (2001) [Pubmed]
  48. An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Smith, P.J., Zhang, C., Wang, J., Chew, S.L., Zhang, M.Q., Krainer, A.R. Hum. Mol. Genet. (2006) [Pubmed]
  49. Predictive identification of exonic splicing enhancers in human genes. Fairbrother, W.G., Yeh, R.F., Sharp, P.A., Burge, C.B. Science. (2002) [Pubmed]
  50. Systematic identification and analysis of exonic splicing silencers. Wang, Z., Rolish, M.E., Yeo, G., Tung, V., Mawson, M., Burge, C.B. Cell. (2004) [Pubmed]
  51. PupaSuite: finding functional single nucleotide polymorphisms for large-scale genotyping purposes. Conde, L., Vaquerizas, J.M., Dopazo, H., Arbiza, L., Reumers, J., Rousseau, F., Schymkowitz, J., Dopazo, J. Nucleic. Acids. Res. (2006) [Pubmed]
  52. SNP Function Portal: a web database for exploring the function implication of SNP alleles. Wang, P., Dai, M., Xuan, W., McEachin, R.C., Jackson, A.U., Scott, L.J., Athey, B., Watson, S.J., Meng, F. Bioinformatics. (2006) [Pubmed]
  53. Pathway analysis by adaptive combination of P-values. Yu, K., Li, Q., Bergen, A.W., Pfeiffer, R.M., Rosenberg, P.S., Caporaso, N., Kraft, P., Chatterjee, N. Genet. Epidemiol. (2009) [Pubmed]
  54. Analysing biological pathways in genome-wide association studies. Wang, K., Li, M., Hakonarson, H. Nat. Rev. Genet. (2010) [Pubmed]
  55. Pathway-Based Approaches for Analysis of Genomewide Association Studies. Wang, K., Li, M., Bucan, M. Am. J. Hum. Genet. (2007) [Pubmed]
  56. Biological pathway-based genome-wide association analysis identified the vasoactive intestinal peptide (VIP) pathway important for obesity. Liu, Y.J., Guo, Y.F., Zhang, L.S., Pei, Y.F., Yu, N., Yu, P., Papasian, C.J., Deng, H.W. Obesity. (Silver. Spring). (2010) [Pubmed]
  57. Gene, region and pathway level analyses in whole-genome studies. De la Cruz, O., Wen, X., Ke, B., Song, M., Nicolae, D.L. Genet. Epidemiol. (2010) [Pubmed]
  58. KEGG for representation and analysis of molecular networks involving diseases and drugs. Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M., Hirakawa, M. Nucleic. Acids. Res. (2010) [Pubmed]
  59. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Caspi, R., Altman, T., Dale, J.M., Dreher, K., Fulcher, C.A., Gilham, F., Kaipa, P., Karthikeyan, A.S., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L.A., Paley, S., Popescu, L., Pujar, A., Shearer, A.G., Zhang, P., Karp, P.D. Nucleic. Acids. Res. (2010) [Pubmed]
  60. Reactome: a database of reactions, pathways and biological processes. Croft, D., O'Kelly, G., Wu, G., Haw, R., Gillespie, M., Matthews, L., Caudy, M., Garapati, P., Gopinath, G., Jassal, B., Jupe, S., Kataskaya, I., Mahajan, S., May, B., Ndegwa, N., Schmidt, E., Shamovsky, V., Yung, C., Birney, E., Hermjakob, H., D'Eustachio, P., Stein, L. Nucleic. Acids. Res. (2010) [Pubmed]
  61. PID: the Pathway Interaction Database. Schaefer, C.F., Anthony, K., Krupa, S., Buchoff, J., Day, M., Hannay, T., Buetow, K.H. Nucleic. Acids. Res. (2009) [Pubmed]
  62. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. Margolin, A.A., Nemenman, I., Basso, K., Wiggins, C., Stolovitzky, G., Dalla Favera, R., Califano, A. BMC. Bioinformatics (2006) [Pubmed]
  63. Methods for the inference of biological pathways and networks. Bumgarner, R.E., Yeung, K.Y. Methods. Mol. Biol (2009) [Pubmed]
  64. Single nucleotide polymorphism characterization by mRNA expression imbalance assessment. Wojnowski, L., Brockmöller, J. Pharmacogenetics. (2004) [Pubmed]
  65. An integration of genome-wide association study and gene expression profiling to prioritize the discovery of novel susceptibility Loci for osteoporosis-related traits. Hsu, Y.H., Zillikens, M.C., Wilson, S.G., Farber, C.R., Demissie, S., Soranzo, N., Bianchi, E.N., Grundberg, E., Liang, L., Richards, J.B., Estrada, K., Zhou, Y., van Nas, A., Moffatt, M.F., Zhai, G., Hofman, A., van Meurs, J.B., Pols, H.A., Price, R.I., Nilsson, O., Pastinen, T., Cupples, L.A., Lusis, A.J., Schadt, E.E., Ferrari, S., Uitterlinden, A.G., Rivadeneira, F., Spector, T.D., Karasik, D., Kiel, D.P. PLoS. Genet. (2010) [Pubmed]
  66. Genetic inheritance of gene expression in human cell lines. Monks, S.A., Leonardson, A., Zhu, H., Cundiff, P., Pietrusiak, P., Edwards, S., Phillips, J.W., Sachs, A., Schadt, E.E. Am. J. Hum. Genet (2004) [Pubmed]
  67. Genetic analysis of genome-wide variation in human gene expression. Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S., Cheung, V.G. Nature (2004) [Pubmed]
  68. Genome-wide associations of gene expression variation in humans. Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S.E., Tavare, S., Deloukas, P., Dermitzakis, E.T. PLoS. Genet (2005) [Pubmed]
  69. Genetics of gene expression surveyed in maize, mouse and man. Schadt, E.E., Monks, S.A., Drake, T.A., Lusis, A.J., Che, N., Colinayo, V., Ruff, T.G., Milligan, S.B., Lamb, J.R., Cavet, G., Linsley, P.S., Mao, M., Stoughton, R.B., Friend, S.H. Nature (2003) [Pubmed]
  70. Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R., Shoemaker, D.D. Science (2003) [Pubmed]
  71. Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Göring, H.H., Curran, J.E., Johnson, M.P., Dyer, T.D., Charlesworth, J., Cole, S.A., Jowett, J.B., Abraham, L.J., Rainwater, D.L., Comuzzie, A.G., Mahaney, M.C., Almasy, L., MacCluer, J.W., Kissebah, A.H., Collier, G.R., Moses, E.K., Blangero, J. Nat. Genet. (2007) [Pubmed]
  72. Genetics of global gene expression. Rockman, M.V., Kruglyak, L. Nat. Rev. Genet (2006) [Pubmed]
  73. Genetics of human gene expression: mapping DNA variants that influence gene expression. Cheung, V.G., Spielman, R.S. Nat. Rev. Genet (2009) [Pubmed]
  74. Population genomics of human gene expression. Stranger, B.E., Nica, A.C., Forrest, M.S., Dimas, A., Bird, C.P., Beazley, C., Ingle, C.E., Dunning, M., Flicek, P., Koller, D., Montgomery, S., Tavaré, S., Deloukas, P., Dermitzakis, E.T. Nat. Genet. (2007) [Pubmed]
  75. High-resolution mapping of expression-QTLs yields insight into human gene regulation. Veyrieras, J.B., Kudaravalli, S., Kim, S.Y., Dermitzakis, E.T., Gilad, Y., Stephens, M., Pritchard, J.K. PLoS. Genet. (2008) [Pubmed]
  76. Evidence for the cure of HIV infection by CCR5Δ32/Δ32 stem cell transplantation. Allers, K., Hütter, G., Hofmann, J., Loddenkemper, C., Rieger, K., Thiel, E., Schneider, T. Blood. (2010) [Pubmed]
  77. Genetics of HIV-1 infection: chemokine receptor CCR5 polymorphism and its consequences. Carrington, M., Dean, M., Martin, M.P., O'Brien, S.J. Hum. Mol. Genet. (1999) [Pubmed]
  78. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. Nicolae, D.L., Gamazon, E., Zhang, W., Duan, S., Dolan, M.E., Cox, N.J. PLoS. Genet () [Pubmed]
  79. Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Moffatt, M.F., Kabesch, M., Liang, L., Dixon, A.L., Strachan, D., Heath, S., Depner, M., von Berg, A., Bufe, A., Rietschel, E., Heinzmann, A., Simma, B., Frischer, T., Willis-Owen, S.A., Wong, K.C., Illig, T., Vogelberg, C., Weiland, S.K., von Mutius, E., Abecasis, G.R., Farrall, M., Gut, I.G., Lathrop, G.M., Cookson, W.O. Nature (2007) [Pubmed]
  80. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Musunuru, K., Strong, A., Frank-Kamenetsky, M., Lee, N.E., Ahfeldt, T., Sachs, K.V., Li, X., Li, H., Kuperwasser, N., Ruda, V.M., Pirruccello, J.P., Muchmore, B., Prokunina-Olsson, L., Hall, J.L., Schadt, E.E., Morales, C.R., Lund-Katz, S., Phillips, M.C., Wong, J., Cantley, W., Racie, T., Ejebe, K.G., Orho-Melander, M., Melander, O., Koteliansky, V., Fitzgerald, K., Krauss, R.M., Cowan, C.A., Kathiresan, S., Rader, D.J. Nature (2010) [Pubmed]
  81. Liver and adipose expression associated SNPs are enriched for association to type 2 diabetes. Zhong, H., Beaulaurier, J., Lum, P.Y., Molony, C., Yang, X., Macneil, D.J., Weingarth, D.T., Zhang, B., Greenawalt, D., Dobrin, R., Hao, K., Woo, S., Fabre-Suver, C., Qian, S., Tota, M.R., Keller, M.P., Kendziorski, C.M., Yandell, B.S., Castro, V., Attie, A.D., Kaplan, L.M., Schadt, E.E. PLoS. Genet (2010) [Pubmed]
  82. Mapping complex disease traits with global gene expression. Cookson, W., Liang, L., Abecasis, G., Moffatt, M., Lathrop, M. Nat. Rev. Genet (2009) [Pubmed]
  83. Mapping the genetic architecture of gene expression in human liver. Schadt, E.E., Molony, C., Chudin, E., Hao, K., Yang, X., Lum, P.Y., Kasarskis, A., Zhang, B., Wang, S., Suver, C., Zhu, J., Millstein, J., Sieberts, S., Lamb, J., GuhaThakurta, D., Derry, J., Storey, J.D., Avila-Campillo, I., Kruger, M.J., Johnson, J.M., Rohl, C.A., van Nas, A., Mehrabian, M., Drake, T.A., Lusis, A.J., Smith, R.C., Guengerich, F.P., Strom, S.C., Schuetz, E., Rushmore, T.H., Ulrich, R. PLoS. Biol (2008) [Pubmed]
  84. Genome-wide allele-specific analysis: insights into regulatory variation. Pastinen, T. Nat. Rev. Genet () [Pubmed]
  85. Transcriptome genetics using second generation sequencing in a Caucasian population. Montgomery, S.B., Sammeth, M., Gutierrez-Arcelus, M., Lach, R.P., Ingle, C., Nisbett, J., Guigo, R., Dermitzakis, E.T. Nature (2010) [Pubmed]
  86. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Pickrell, J.K., Marioni, J.C., Pai, A.A., Degner, J.F., Engelhardt, B.E., Nkadori, E., Veyrieras, J.B., Stephens, M., Gilad, Y., Pritchard, J.K. Nature (2010) [Pubmed]
  87. RNA-Seq: a revolutionary tool for transcriptomics. Wang, Z., Gerstein, M., Snyder, M. Nat. Rev. Genet (2009)
  88. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Gupta, R.A., Shah, N., Wang, K.C., Kim, J., Horlings, H.M., Wong, D.J., Tsai, M.C., Hung, T., Argani, P., Rinn, J.L., Wang, Y., Brzoska, P., Kong, B., Li, R., West, R.B., van de Vijver, M.J., Sukumar, S., Chang, H.Y. Nature () [Pubmed]
  89. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Khalil, A.M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B.E., van Oudenaarden, A., Regev, A., Lander, E.S., Rinn, J.L. Proc. Natl. Acad. Sci. U. S. A (2009) [Pubmed]
  90. The epigenomics of cancer. Jones, P.A., Baylin, S.B. Cell (2007) [Pubmed]
  91. Environmental epigenomics and disease susceptibility. Jirtle, R.L., Skinner, M.K. Nat. Rev. Genet (2007) [Pubmed]
  92. Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia. Raval, A., Tanner, S.M., Byrd, J.C., Angerman, E.B., Perko, J.D., Chen, S.S., Hackanson, B., Grever, M.R., Lucas, D.M., Matkovic, J.J., Lin, T.S., Kipps, T.J., Murray, F., Weisenburger, D., Sanger, W., Lynch, J., Watson, P., Jansen, M., Yoshinaga, Y., Rosenquist, R., de Jong, P.J., Coggill, P., Beck, S., Lynch, H., de la Chapelle, A., Plass, C. Cell (2007) [Pubmed]
  93. Epigenetic regulation of the tumor suppressor gene TCF21 on 6q23-q24 in lung and head and neck cancer. Smith, L.T., Lin, M., Brena, R.M., Lang, J.C., Schuller, D.E., Otterson, G.A., Morrison, C.D., Smiraglia, D.J., Plass, C. Proc. Natl. Acad. Sci. U. S. A (2006) [Pubmed]
  94. Human DNA methylomes at base resolution show widespread epigenomic differences. Lister, R., Pelizzola, M., Dowen, R.H., Hawkins, R.D., Hon, G., Tonti-Filippini, J., Nery, J.R., Lee, L., Ye, Z., Ngo, Q.M., Edsall, L., Antosiewicz-Bourget, J., Stewart, R., Ruotti, V., Millar, A.H., Thomson, J.A., Ren, B., Ecker, J.R. Nature (2009) [Pubmed]
  95. Heritable germline epimutation of MSH2 in a family with hereditary nonpolyposis colorectal cancer. Chan, T.L., Yuen, S.T., Kong, C.K., Chan, Y.W., Chan, A.S., Ng, W.F., Tsui, W.Y., Lo, M.W., Tam, W.Y., Li, V.S., Leung, S.Y. Nat. Genet (2006) [Pubmed]
  96. Germline epimutation of MLH1 in individuals with multiple cancers. Suter, C.M., Martin, D.I., Ward, R.L. Nat. Genet (2004) [Pubmed]
  97. Epimutations and cancer predisposition: importance and mechanisms. Hesson, L.B., Hitchins, M.P., Ward, R.L. Curr. Opin. Genet. Dev () [Pubmed]
  98. Genomic surveys by methylation-sensitive SNP analysis identify sequence-dependent allele-specific DNA methylation. Kerkel, K., Spadola, A., Yuan, E., Kosek, J., Jiang, L., Hod, E., Li, K., Murty, V.V., Schupf, N., Vilain, E., Morris, M., Haghighi, F., Tycko, B. Nat. Genet. (2008)
  99. Global patterns of cis variation in human cells revealed by high-density allelic expression analysis. Ge, B., Pokholok, D.K., Kwan, T., Grundberg, E., Morcos, L., Verlaan, D.J., Le, J., Koka, V., Lam, K.C., Gagne, V., Dias, J., Hoberman, R., Montpetit, A., Joly, M.M., Harvey, E.J., Sinnett, D., Beaulieu, P., Hamon, R., Graziani, A., Dewar, K., Harmsen, E., Majewski, J., Goring, H.H., Naumova, A.K., Blanchette, M., Gunderson, K.L., Pastinen, T. Nat. Genet (2009) [Pubmed]
  100. Allele-specific DNA methylation: beyond imprinting. Tycko, B. Hum. Mol. Genet (2010) [Pubmed]
  101. Allelic imbalance (AI) identifies novel tissue-specific cis-regulatory variation for human UGT2B15. Sun, C., Southard, C., Witonsky, D.B., Olopade, O.I., Di Rienzo, A. Hum. Mutat (2010) [Pubmed]
  102. RNA regulation of epigenetic processes. Mattick, J.S., Amaral, P.P., Dinger, M.E., Mercer, T.R., Mehler, M.F. Bioessays. (2009) [Pubmed]
  103. Non-coding RNAs: regulators of disease. Taft, R.J., Pang, K.C., Mercer, T.R., Dinger, M., Mattick, J.S. J. Pathol. (2010) [Pubmed]
  104. A piRNA pathway primed by individual transposons is linked to de novo DNA methylation in mice. Aravin, A.A., Sachidanandam, R., Bourc'his, D., Schaefer, C., Pezic, D., Toth, K.F., Bestor, T., Hannon, G.J. Mol. Cell. (2008) [Pubmed]
  105. Genetic variation in microRNA networks: the implications for cancer research. Ryan, B.M., Robles, A.I., Harris, C.C. Nat. Rev. Cancer. (2010) [Pubmed]
  106. Variable Tandem Repeats Accelerate Evolution of Coding and Regulatory Sequences. Gemayel, R., Vinces, M.D., Legendre, M., Verstrepen, K.J. Annu. Rev. Genet () [Pubmed]
  107. Chromosome crosstalk in three dimensions. Göndör, A., Ohlsson, R. Nature. (2009) [Pubmed]
  108. CTCF binding at the H19 imprinting control region mediates maternally inherited higher-order chromatin conformation to restrict enhancer access to Igf2. Kurukuti, S., Tiwari, V.K., Tavoosidana, G., Pugacheva, E., Murrell, A., Zhao, Z., Lobanenkov, V., Reik, W., Ohlsson, R. Proc. Natl. Acad. Sci. U. S. A. (2006) [Pubmed]
  109. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Zhao, Z., Tavoosidana, G., Sjölinder, M., Göndör, A., Mariano, P., Wang, S., Kanduri, C., Lezcano, M., Sandhu, K.S., Singh, U., Pant, V., Tiwari, V., Kurukuti, S., Ohlsson, R. Nat. Genet. (2006) [Pubmed]
  110. Nonallelic transvection of multiple imprinted loci is organized by the H19 imprinting control region during germline development. Sandhu, K.S., Shi, C., Sjölinder, M., Zhao, Z., Göndör, A., Liu, L., Tiwari, V.K., Guibert, S., Emilsson, L., Imreh, M.P., Ohlsson, R. Genes. Dev. (2009) [Pubmed]
  111. A distal single nucleotide polymorphism alters long-range regulation of the PU.1 gene in acute myeloid leukemia. Steidl, U., Steidl, C., Ebralidze, A., Chapuy, B., Han, H.J., Will, B., Rosenbauer, F., Becker, A., Wagner, K., Koschmieder, S., Kobayashi, S., Costa, D.B., Schulz, T., O'Brien, K.B., Verhaak, R.G., Delwel, R., Haase, D., Trümper, L., Krauter, J., Kohwi-Shigematsu, T., Griesinger, F., Tenen, D.G. J. Clin. Invest. (2007) [Pubmed]
  112. The gene encoding R-spondin 4 (RSPO4), a secreted protein implicated in Wnt signaling, is mutated in inherited anonychia. Blaydon, D.C., Ishii, Y., O'Toole, E.A., Unsworth, H.C., Teh, M.T., Rüschendorf, F., Sinclair, C., Hopsu-Havu, V.K., Tidman, N., Moss, C., Watson, R., de Berker, D., Wajid, M., Christiano, A.M., Kelsell, D.P. Nat. Genet. (2006) [Pubmed]
  113. Mutations in ABCA12 underlie the severe congenital skin disease harlequin ichthyosis. Kelsell, D.P., Norgett, E.E., Unsworth, H., Teh, M.T., Cullup, T., Mein, C.A., Dopping-Hepenstal, P.J., Dale, B.A., Tadini, G., Fleckman, P., Stephens, K.G., Sybert, V.P., Mallory, S.B., North, B.V., Witt, D.R., Sprecher, E., Taylor, A.E., Ilchyshyn, A., Kennedy, C.T., Goodyear, H., Moss, C., Paige, D., Harper, J.I., Young, B.D., Leigh, I.M., Eady, R.A., O'Toole, E.A. Am. J. Hum. Genet. (2005) [Pubmed]
  114. Genomewide single nucleotide polymorphism microarray mapping in basal cell carcinomas unveils uniparental disomy as a key somatic event. Teh, M.T., Blaydon, D., Chaplin, T., Foot, N.J., Skoulakis, S., Raghavan, M., Harwood, C.A., Proby, C.M., Philpott, M.P., Young, B.D., Kelsell, D.P. Cancer. Res. (2005) [Pubmed]
  115. FOXM1 upregulation is an early event in human squamous cell carcinoma and it is enhanced by nicotine during malignant transformation. Gemenetzidis, E., Bose, A., Riaz, A.M., Chaplin, T., Young, B.D., Ali, M., Sugden, D., Thurlow, J.K., Cheong, S.C., Teo, S.H., Wan, H., Waseem, A., Parkinson, E.K., Fortune, F., Teh, M.T. PLoS. One. (2009) [Pubmed]
  116. Upregulation of FOXM1 induces genomic instability in human epidermal keratinocytes. Teh, M.T., Gemenetzidis, E., Chaplin, T., Young, B.D., Philpott, M.P. Mol. Cancer. (2010) [Pubmed]
  117. Depletion of BRCA1 impairs differentiation but enhances proliferation of mammary epithelial cells. Furuta, S., Jiang, X., Gu, B., Cheng, E., Chen, P.L., Lee, W.H. Proc. Natl. Acad. Sci. U. S. A. (2005) [Pubmed]
  118. Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers. Lim, E., Vaillant, F., Wu, D., Forrest, N.C., Pal, B., Hart, A.H., Asselin-Labat, M.L., Gyorki, D.E., Ward, T., Partanen, A., Feleppa, F., Huschtscha, L.I., Thorne, H.J., Fox, S.B., Yan, M., French, J.D., Brown, M.A., Smyth, G.K., Visvader, J.E., Lindeman, G.J. Nat. Med. (2009) [Pubmed]
  119. BRCA1 basal-like breast cancers originate from luminal epithelial progenitors and not from basal stem cells. Molyneux, G., Geyer, F.C., Magnay, F.A., McCarthy, A., Kendrick, H., Natrajan, R., Mackay, A., Grigoriadis, A., Tutt, A., Ashworth, A., Reis-Filho, J.S., Smalley, M.J. Cell. Stem. Cell. (2010) [Pubmed]
  120. A locus on 19p13 modifies risk of breast cancer in BRCA1 mutation carriers and is associated with hormone receptor-negative breast cancer in the general population. Antoniou, A.C., Wang, X., Fredericksen, Z.S., McGuffog, L., Tarrell, R., Sinilnikova, O.M., Healey, S., Morrison, J., Kartsonaki, C., Lesnick, T., Ghoussaini, M., Barrowdale, D., Peock, S., Cook, M., Oliver, C., Frost, D., Eccles, D., Evans, D.G., Eeles, R., Izatt, L., Chu, C., Douglas, F., Paterson, J., Stoppa-Lyonnet, D., Houdayer, C., Mazoyer, S., Giraud, S., Lasset, C., Remenieras, A., Caron, O., Hardouin, A., Berthet, P., Hogervorst, F.B., Rookus, M.A., Jager, A., van den Ouweland, A., Hoogerbrugge, N., van der Luijt, R.B., Meijers-Heijboer, H., Gómez García, E.B., Devilee, P., Vreeswijk, M.P., Lubinski, J., Jakubowska, A., Gronwald, J., Huzarski, T., Byrski, T., Górski, B., Cybulski, C., Spurdle, A.B., Holland, H., Goldgar, D.E., John, E.M., Hopper, J.L., Southey, M., Buys, S.S., Daly, M.B., Terry, M.B., Schmutzler, R.K., Wappenschmidt, B., Engel, C., Meindl, A., Preisler-Adams, S., Arnold, N., Niederacher, D., Sutter, C., Domchek, S.M., Nathanson, K.L., Rebbeck, T., Blum, J.L., Piedmonte, M., Rodriguez, G.C., Wakeley, K., Boggess, J.F., Basil, J., Blank, S.V., Friedman, E., Kaufman, B., Laitman, Y., Milgrom, R., Andrulis, I.L., Glendon, G., Ozcelik, H., Kirchhoff, T., Vijai, J., Gaudet, M.M., Altshuler, D., Guiducci, C., Loman, N., Harbst, K., Rantala, J., Ehrencrona, H., Gerdes, A.M., Thomassen, M., Sunde, L., Peterlongo, P., Manoukian, S., Bonanni, B., Viel, A., Radice, P., Caldes, T., de la Hoya, M., Singer, C.F., Fink-Retter, A., Greene, M.H., Mai, P.L., Loud, J.T., Guidugli, L., Lindor, N.M., Hansen, T.V., Nielsen, F.C., Blanco, I., Lazaro, C., Garber, J., Ramus, S.J., Gayther, S.A., Phelan, C., Narod, S., Szabo, C.I., Benitez, J., Osorio, A., Nevanlinna, H., Heikkinen, T., Caligo, M.A., Beattie, M.S., Hamann, U., Godwin, A.K., Montagna, M., Casella, C., Neuhausen, S.L., Karlan, B.Y., Tung, N., Toland, A.E., Weitzel, J., Olopade, O., Simard, J., Soucy, P., Rubinstein, W.S., Arason, A., Rennert, G., Martin, N.G., Montgomery, G.W., Chang-Claude, J., Flesch-Janys, D., Brauch, H., Severi, G., Baglietto, L., Cox, A., Cross, S.S., Miron, P., Gerty, S.M., Tapper, W., Yannoukakos, D., Fountzilas, G., Fasching, P.A., Beckmann, M.W., Dos Santos Silva, I., Peto, J., Lambrechts, D., Paridaens, R., Rüdiger, T., Försti, A., Winqvist, R., Pylkäs, K., Diasio, R.B., Lee, A.M., Eckel-Passow, J., Vachon, C., Blows, F., Driver, K., Dunning, A., Pharoah, P.P., Offit, K., Pankratz, V.S., Hakonarson, H., Chenevix-Trench, G., Easton, D.F., Couch, F.J. Nat. Genet. (2010) [Pubmed]
  121. Induction of Human Epithelial Stem/Progenitor Expansion by FOXM1. Gemenetzidis, E., Elena-Costea, D., Parkinson, E.K., Waseem, A., Wan, H., Teh, M.T. Cancer. Res. (2010) [Pubmed]
  122. Laboratory mouse models for the human genome-wide associations. Kitsios, G.D., Tangri, N., Castaldi, P.J., Ioannidis, J.P. PLoS. One. (2010) [Pubmed]
  123. Environmental and genetic activation of a brain-adipocyte BDNF/leptin axis causes cancer remission and inhibition. Cao, L., Liu, X., Lin, E.J., Wang, C., Choi, E.Y., Riban, V., Lin, B., During, M.J. Cell. (2010) [Pubmed]
  124. Males still dominate animal studies. Zucker, I., Beery, A.K. Nature. (2010) [Pubmed]
  125. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Heintzman, N.D., Hon, G.C., Hawkins, R.D., Kheradpour, P., Stark, A., Harp, L.F., Ye, Z., Lee, L.K., Stuart, R.K., Ching, C.W., Ching, K.A., Antosiewicz-Bourget, J.E., Liu, H., Zhang, X., Green, R.D., Lobanenkov, V.V., Stewart, R., Thomson, J.A., Crawford, G.E., Kellis, M., Ren, B. Nature (2009) [Pubmed]
  126. ChIP-seq accurately predicts tissue-specific activity of enhancers. Visel, A., Blow, M.J., Li, Z., Zhang, T., Akiyama, J.A., Holt, A., Plajzer-Frick, I., Shoukry, M., Wright, C., Chen, F., Afzal, V., Ren, B., Rubin, E.M., Pennacchio, L.A. Nature (2009) [Pubmed]
  127. Genomic views of distant-acting enhancers. Visel, A., Rubin, E.M., Pennacchio, L.A. Nature (2009) [Pubmed]
  128. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Birney, E., Stamatoyannopoulos, J.A., Dutta, A., Guigo, R., Gingeras, T.R., Margulies, E.H., Weng, Z., Snyder, M., Dermitzakis, E.T., Thurman, R.E., Kuehn, M.S., Taylor, C.M., Neph, S., Koch, C.M., Asthana, S., Malhotra, A., Adzhubei, I., Greenbaum, J.A., Andrews, R.M., Flicek, P., Boyle, P.J., Cao, H., Carter, N.P., Clelland, G.K., Davis, S., Day, N., Dhami, P., Dillon, S.C., Dorschner, M.O., Fiegler, H., Giresi, P.G., Goldy, J., Hawrylycz, M., Haydock, A., Humbert, R., James, K.D., Johnson, B.E., Johnson, E.M., Frum, T.T., Rosenzweig, E.R., Karnani, N., Lee, K., Lefebvre, G.C., Navas, P.A., Neri, F., Parker, S.C., Sabo, P.J., Sandstrom, R., Shafer, A., Vetrie, D., Weaver, M., Wilcox, S., Yu, M., Collins, F.S., Dekker, J., Lieb, J.D., Tullius, T.D., Crawford, G.E., Sunyaev, S., Noble, W.S., Dunham, I., Denoeud, F., Reymond, A., Kapranov, P., Rozowsky, J., Zheng, D., Castelo, R., Frankish, A., Harrow, J., Ghosh, S., Sandelin, A., Hofacker, I.L., Baertsch, R., Keefe, D., Dike, S., Cheng, J., Hirsch, H.A., Sekinger, E.A., Lagarde, J., Abril, J.F., Shahab, A., Flamm, C., Fried, C., Hackermuller, J., Hertel, J., Lindemeyer, M., Missal, K., Tanzer, A., Washietl, S., Korbel, J., Emanuelsson, O., Pedersen, J.S., Holroyd, N., Taylor, R., Swarbreck, D., Matthews, N., Dickson, M.C., Thomas, D.J., Weirauch, M.T., Gilbert, J., Drenkow, J., Bell, I., Zhao, X., Srinivasan, K.G., Sung, W.K., Ooi, H.S., Chiu, K.P., Foissac, S., Alioto, T., Brent, M., Pachter, L., Tress, M.L., Valencia, A., Choo, S.W., Choo, C.Y., Ucla, C., Manzano, C., Wyss, C., Cheung, E., Clark, T.G., Brown, J.B., Ganesh, M., Patel, S., Tammana, H., Chrast, J., Henrichsen, C.N., Kai, C., Kawai, J., Nagalakshmi, U., Wu, J., Lian, Z., Lian, J., Newburger, P., Zhang, X., Bickel, P., Mattick, J.S., Carninci, P., Hayashizaki, Y., Weissman, S., Hubbard, T., Myers, R.M., Rogers, J., Stadler, P.F., Lowe, T.M., Wei, C.L., Ruan, Y., Struhl, K., Gerstein, M., Antonarakis, S.E., Fu, Y., Green, E.D., Karaoz, U., Siepel, A., Taylor, J., Liefer, L.A., Wetterstrand, K.A., Good, P.J., Feingold, E.A., Guyer, M.S., Cooper, G.M., Asimenos, G., Dewey, C.N., Hou, M., Nikolaev, S., Montoya-Burgos, J.I., Loytynoja, A., Whelan, S., Pardi, F., Massingham, T., Huang, H., Zhang, N.R., Holmes, I., Mullikin, J.C., Ureta-Vidal, A., Paten, B., Seringhaus, M., Church, D., Rosenbloom, K., Kent, W.J., Stone, E.A., Batzoglou, S., Goldman, N., Hardison, R.C., Haussler, D., Miller, W., Sidow, A., Trinklein, N.D., Zhang, Z.D., Barrera, L., Stuart, R., King, D.C., Ameur, A., Enroth, S., Bieda, M.C., Kim, J., Bhinge, A.A., Jiang, N., Liu, J., Yao, F., Vega, V.B., Lee, C.W., Ng, P., Shahab, A., Yang, A., Moqtaderi, Z., Zhu, Z., Xu, X., Squazzo, S., Oberley, M.J., Inman, D., Singer, M.A., Richmond, T.A., Munn, K.J., Rada-Iglesias, A., Wallerman, O., Komorowski, J., Fowler, J.C., Couttet, P., Bruce, A.W., Dovey, O.M., Ellis, P.D., Langford, C.F., Nix, D.A., Euskirchen, G., Hartman, S., Urban, A.E., Kraus, P., Van Calcar, S., Heintzman, N., Kim, T.H., Wang, K., Qu, C., Hon, G., Luna, R., Glass, C.K., Rosenfeld, M.G., Aldred, S.F., Cooper, S.J., Halees, A., Lin, J.M., Shulha, H.P., Zhang, X., Xu, M., Haidar, J.N., Yu, Y., Ruan, Y., Iyer, V.R., Green, R.D., Wadelius, C., Farnham, P.J., Ren, B., Harte, R.A., Hinrichs, A.S., Trumbower, H., Clawson, H., Hillman-Jackson, J., Zweig, A.S., Smith, K., Thakkapallayil, A., Barber, G., Kuhn, R.M., Karolchik, D., Armengol, L., Bird, C.P., de Bakker, P.I., Kern, A.D., Lopez-Bigas, N., Martin, J.D., Stranger, B.E., Woodroffe, A., Davydov, E., Dimas, A., Eyras, E., Hallgrimsdottir, I.B., Huppert, J., Zody, M.C., Abecasis, G.R., Estivill, X., Bouffard, G.G., Guan, X., Hansen, N.F., Idol, J.R., Maduro, V.V., Maskeri, B., McDowell, J.C., Park, M., Thomas, P.J., Young, A.C., Blakesley, R.W., Muzny, D.M., Sodergren, E., Wheeler, D.A., Worley, K.C., Jiang, H., Weinstock, G.M., Gibbs, R.A., Graves, T., Fulton, R., Mardis, E.R., Wilson, R.K., Clamp, M., Cuff, J., Gnerre, S., Jaffe, D.B., Chang, J.L., Lindblad-Toh, K., Lander, E.S., Koriabine, M., Nefedov, M., Osoegawa, K., Yoshinaga, Y., Zhu, B., de Jong, P.J. Nature (2007) [Pubmed]
  129. ChIP-Seq identification of weakly conserved heart enhancers. Blow, M.J., McCulley, D.J., Li, Z., Zhang, T., Akiyama, J.A., Holt, A., Plajzer-Frick, I., Shoukry, M., Wright, C., Chen, F., Afzal, V., Bristow, J., Ren, B., Black, B.L., Rubin, E.M., Visel, A., Pennacchio, L.A. Nat. Genet (2010) [Pubmed]
  130. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Kunarso, G., Chia, N.Y., Jeyakani, J., Hwang, C., Lu, X., Chan, Y.S., Ng, H.H., Bourque, G. Nat. Genet (2010) [Pubmed]
  131. A systematic approach to understand the functional consequences of non-protein coding risk regions. Coetzee, G.A., Jia, L., Frenkel, B., Henderson, B.E., Tanay, A., Haiman, C.A., Freedman, M.L. Cell. Cycle (2010) [Pubmed]
  132. 8q24 prostate, breast, and colon cancer risk loci show tissue-specific long-range interaction with MYC. Ahmadiyeh, N., Pomerantz, M.M., Grisanzio, C., Herman, P., Jia, L., Almendro, V., He, H.H., Brown, M., Liu, X.S., Davis, M., Caswell, J.L., Beckwith, C.A., Hills, A., Macconaill, L., Coetzee, G.A., Regan, M.M., Freedman, M.L. Proc. Natl. Acad. Sci. U. S. A (2010) [Pubmed]
  133. An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a MYC enhancer. Wasserman, N.F., Aneas, I., Nobrega, M.A. Genome. Res (2010) [Pubmed]
  134. The three 'C's of chromosome conformation capture: controls, controls, controls. Dekker, J. Nat. Methods (2006)
  135. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Simonis, M., Klous, P., Splinter, E., Moshkin, Y., Willemsen, R., de Wit, E., van Steensel, B., de Laat, W. Nat. Genet (2006)
  136. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Zhao, Z., Tavoosidana, G., Sjölinder, M., Göndör, A., Mariano, P., Wang, S., Kanduri, C., Lezcano, M., Sandhu, K.S., Singh, U., Pant, V., Tiwari, V., Kurukuti, S., Ohlsson, R. Nat. Genet (2006)
  137. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., Sandstrom, R., Bernstein, B., Bender, M.A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L.A., Lander, E.S., Dekker, J. Science (2009)
  138. Statistical false positive or true disease pathway? Todd, J.A. Nat. Genet (2006)
  139. ChIP-seq: welcome to the new frontier. Mardis, E.R. Nat. Methods (2007)
  140. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A., Thiessen, N., Griffith, O.L., He, A., Marra, M., Snyder, M., Jones, S. Nat. Methods (2007)
  141. Detection of large-scale variation in the human genome. Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., Scherer, S.W., Lee, C. Nat. Genet. (2004)
  142. Copy number variations and clinical cytogenetic diagnosis of constitutional disorders. Lee, C., Iafrate, A.J., Brothman, A.R. Nat. Genet (2007)
  143. Large-Scale Copy Number Polymorphism in the Human Genome. Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., Maner, S., Massa, H., Walker, M., Chi, M., Navin, N., Lucito, R., Healy, J., Hicks, J., Ye, K., Reiner, A., Gilliam, T.C., Trask, B., Patterson, N., Zetterberg, A., Wigler, M. Science (2004)
  144. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Jaenisch, R., Bird, A. Nat. Genet (2003)
  145. Covalent histone modifications--miswritten, misinterpreted and mis-erased in human cancers. Chi, P., Allis, C.D., Wang, G.G. Nat. Rev. Cancer (2010)
  146. Common genetic variants for breast cancer: 32 largely refuted candidates and larger prospects. Ioannidis, J.P. J. Natl. Cancer Inst. (2006)
  147. Human evolutionary genetics: origins, peoples & disease. Jobling, M.A., Hurles, M., Tyler-Smith, C. (2004)
  148. Introduction to Genetic Analysis. Griffiths, A.J.F., Wessler, S.R., Lewontin, R.C., Gelbart, W.M., Suzuki, D.T., Miller, J.H. (2005)
  149. Genetic mapping in human disease. Altshuler, D., Daly, M.J., Lander, E.S. Science (2008)
  150. The dictionary of cell biology. Lackie, J.M., Dow, J.A.T., Blackshaw, S.E. (1995)
  151. Linkage disequilibrium in the human genome. Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T., Kouyoumjian, R., Farhadian, S.F., Ward, R., Lander, E.S. Nature (2001)
  152. MicroRNAs: target recognition and regulatory functions. Bartel, D.P. Cell (2009)
  153. Replacement of normal with mutant alleles in the genome of normal human cells unveils mutation-specific drug responses. Di Nicolantonio, F., Arena, S., Gallicchio, M., Zecchin, D., Martini, M., Flonta, S.E., Stella, G.M., Lamba, S., Cancelliere, C., Russo, M., Geuna, M., Appendino, G., Fantozzi, R., Medico, E., Bardelli, A. Proc. Natl. Acad. Sci. U. S. A (2008)
  154. Requirement for p53 and p21 to sustain G2 arrest after DNA damage. Bunz, F., Dutriaux, A., Lengauer, C., Waldman, T., Zhou, S., Brown, J.P., Sedivy, J.M., Kinzler, K.W., Vogelstein, B. Science (1998)
  155. Targeted inactivation of p53 in human cells does not result in aneuploidy. Bunz, F., Fauth, C., Speicher, M.R., Dutriaux, A., Sedivy, J.M., Kinzler, K.W., Vogelstein, B., Lengauer, C. Cancer. Res (2002)
  156. Human cancer cells require ATR for cell cycle progression following exposure to ionizing radiation. Hurley, P.J., Wilsker, D., Bunz, F. Oncogene (2007)
  157. The Chk2 tumor suppressor is not required for p53 responses in human cancer cells. Jallepalli, P.V., Lengauer, C., Vogelstein, B., Bunz, F. J. Biol. Chem. (2003)
  158. Genetic knockouts and knockins in human somatic cells. Rago, C., Vogelstein, B., Bunz, F. Nat. Protoc. (2007)
  159. Improved methods for the generation of human gene knockout and knockin cell lines. Topaloglu, O., Hurley, P.J., Yildirim, O., Civin, C.I., Bunz, F. Nucleic. Acids. Res (2005)
  160. The nuclear function of p53 is required for PUMA-mediated apoptosis induced by DNA damage. Wang, P., Yu, J., Zhang, L. Proc. Natl. Acad. Sci. U. S. A (2007)
  161. Facile methods for generating human somatic cell gene knockouts using recombinant adeno-associated viruses. Kohli, M., Rago, C., Lengauer, C., Kinzler, K.W., Vogelstein, B. Nucleic. Acids. Res (2004)
  162. An improved zinc-finger nuclease architecture for highly specific genome editing. Miller, J.C., Holmes, M.C., Wang, J., Guschin, D.Y., Lee, Y.L., Rupniewski, I., Beausejour, C.M., Waite, A.J., Wang, N.S., Kim, K.A., Gregory, P.D., Pabo, C.O., Rebar, E.J. Nat. Biotechnol. (2007)
  163. Targeted gene addition into a specified location in the human genome using designed zinc finger nucleases. Moehle, E.A., Rock, J.M., Lee, Y.L., Jouvenot, Y., DeKelver, R.C., Gregory, P.D., Urnov, F.D., Holmes, M.C. Proc. Natl. Acad. Sci. U. S. A (2007)
  164. Knockout rats via embryo microinjection of zinc-finger nucleases. Geurts, A.M., Cost, G.J., Freyvert, Y., Zeitler, B., Miller, J.C., Choi, V.M., Jenkins, S.S., Wood, A., Cui, X., Meng, X., Vincent, A., Lam, S., Michalkiewicz, M., Schilling, R., Foeckler, J., Kalloway, S., Weiler, H., Menoret, S., Anegon, I., Davis, G.D., Zhang, L., Rebar, E.J., Gregory, P.D., Urnov, F.D., Jacob, H.J., Buelow, R. Science (2009)
  165. Targeted gene inactivation in zebrafish using engineered zinc-finger nucleases. Meng, X., Noyes, M.B., Zhu, L.J., Lawson, N.D., Wolfe, S.A. Nat. Biotechnol. (2008) [Pubmed]
  166. Biobanks. Population databases boom, from Iceland to the U.S. Kaiser, J. Science (2002) [Pubmed]
  167. Morpholino oligos: making sense of antisense?. Heasman, J. Dev. Biol. (2002) [Pubmed]
  168. A Third-Generation Lentivirus Vector with a Conditional Packaging System. Dull, T., Zufferey, R., Kelly, M., Mandel, R.J., Nguyen, M., Trono, D., Naldini, L. J. Virol. (1998)
  169. GWAS meets microarray: are the results of genome-wide association studies and gene-expression profiling consistent? Prostate cancer as an example. Gorlov, I.P., Gallick, G.E., Gorlova, O.Y., Amos, C., Logothetis, C.J. PLoS. One. (2009) [Pubmed]
  170. Computation for ChIP-seq and RNA-seq studies. Pepke, S., Wold, B., Mortazavi, A. Nat. Methods. (2009) [Pubmed]
  171. Alternative expression analysis by RNA sequencing. Griffith, M., Griffith, O.L., Mwenifumbo, J., Goya, R., Morrissy, A.S., Morin, R.D., Corbett, R., Tang, M.J., Hou, Y.C., Pugh, T.J., Robertson, G., Chittaranjan, S., Ally, A., Asano, J.K., Chan, S.Y., Li, H.I., McDonald, H., Teague, K., Zhao, Y., Zeng, T., Delaney, A., Hirst, M., Morin, G.B., Jones, S.J., Tai, I.T., Marra, M.A. Nat. Methods. (2010) [Pubmed]
  172. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Cirulli, E.T., Singh, A., Shianna, K.V., Ge, D., Smith, J.P., Maia, J.M., Heinzen, E.L., Goedert, J.J., Goldstein, D.B. Genome. Biol. (2010) [Pubmed]
  173. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Li, Y., Willer, C.J., Ding, J., Scheet, P., Abecasis, G.R. Genet. Epidemiol. (2010) [Pubmed]
  174. Human variation databases. Küntzer, J., Eggle, D., Klostermann, S., Burtscher, H. Database. (Oxford). (2010) [Pubmed]
  175. dbSNP: the NCBI database of genetic variation. Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., Sirotkin, K. Nucleic. Acids. Res. (2001) [Pubmed]
  176. Public data archives for genomic structural variation. Church, D.M., Lappalainen, I., Sneddon, T.P., Hinton, J., Maguire, M., Lopez, J., Garner, J., Paschall, J., DiCuccio, M., Yaschenko, E., Scherer, S.W., Feuk, L., Flicek, P. Nat. Genet. (2010) [Pubmed]
  177. Varietas: a functional variation database portal. Paananen, J., Ciszek, R., Wong, G. Database. (Oxford). (2010) [Pubmed]
  178. The UCSC Genome Browser Database: 2008 update. Karolchik, D., Kuhn, R.M., Baertsch, R., Barber, G.P., Clawson, H., Diekhans, M., Giardine, B., Harte, R.A., Hinrichs, A.S., Hsu, F., Kober, K.M., Miller, W., Pedersen, J.S., Pohl, A., Raney, B.J., Rhead, B., Rosenbloom, K.R., Smith, K.E., Stanke, M., Thakkapallayil, A., Trumbower, H., Wang, T., Zweig, A.S., Haussler, D., Kent, W.J. Nucleic. Acids. Res. (2008) [Pubmed]
  179. International network of cancer genome projects. Hudson, T.J., Anderson, W., Aretz, A., Barker, A.D., Bell, C., Bernabé, R.R., Bhan, M.K., Calvo, F., Eerola, I., Gerhard, D.S., Guttmacher, A., Guyer, M., Hemsley, F.M., Jennings, J.L., Kerr, D., Klatt, P., Kolar, P., Kusada, J., Lane, D.P., Laplace, F., Youyong, L., Nettekoven, G., Ozenberger, B., Peterson, J., Rao, T.S., Remacle, J., Schafer, A.J., Shibata, T., Stratton, M.R., Vockley, J.G., Watanabe, K., Yang, H., Yuen, M.M., Knoppers, B.M., Bobrow, M., Cambon-Thomsen, A., Dressler, L.G., Dyke, S.O., Joly, Y., Kato, K., Kennedy, K.L., Nicolás, P., Parker, M.J., Rial-Sebbag, E., Romeo-Casabona, C.M., Shaw, K.M., Wallace, S., Wiesner, G.L., Zeps, N., Lichter, P., Biankin, A.V., Chabannon, C., Chin, L., Clément, B., de Alava, E., Degos, F., Ferguson, M.L., Geary, P., Hayes, D.N., Hudson, T.J., Johns, A.L., Kasprzyk, A., Nakagawa, H., Penny, R., Piris, M.A., Sarin, R., Scarpa, A., Shibata, T., van de Vijver, M., Futreal, P.A., Aburatani, H., Bayés, M., Bowtell, D.D., Campbell, P.J., Estivill, X., Gerhard, D.S., Grimmond, S.M., Gut, I., Hirst, M., López-Otín, C., Majumder, P., Marra, M., McPherson, J.D., Nakagawa, H., Ning, Z., Puente, X.S., Ruan, Y., Shibata, T., Stratton, M.R., Stunnenberg, H.G., Swerdlow, H., Velculescu, V.E., Wilson, R.K., Xue, H.H., Yang, L., Spellman, P.T., Bader, G.D., Boutros, P.C., Campbell, P.J., Flicek, P., Getz, G., Guigó, R., Guo, G., Haussler, D., Heath, S., Hubbard, T.J., Jiang, T., Jones, S.M., Li, Q., López-Bigas, N., Luo, R., Muthuswamy, L., Ouellette, B.F., Pearson, J.V., Puente, X.S., Quesada, V., Raphael, B.J., Sander, C., Shibata, T., Speed, T.P., Stein, L.D., Stuart, J.M., Teague, J.W., Totoki, Y., Tsunoda, T., Valencia, A., Wheeler, D.A., Wu, H., Zhao, S., Zhou, G., Stein, L.D., Guigó, R., Hubbard, T.J., Joly, Y., Jones, S.M., Kasprzyk, A., Lathrop, M., López-Bigas, N., Ouellette, B.F., Spellman, P.T., Teague, J.W., Thomas, G., Valencia, A., Yoshida, T., Kennedy, K.L., Axton, M., Dyke, S.O., Futreal, P.A., Gerhard, D.S., Gunter, C., Guyer, M., Hudson, T.J., McPherson, J.D., Miller, L.J., Ozenberger, B., Shaw, K.M., Kasprzyk, A., Stein, L.D., Zhang, J., Haider, S.A., Wang, J., Yung, C.K., Cross, A., Liang, Y., Gnaneshan, S., Guberman, J., Hsu, J., Bobrow, M., Chalmers, D.R., Hasel, K.W., Joly, Y., Kaan, T.S., Kennedy, K.L., Knoppers, B.M., Lowrance, W.W., Masui, T., Nicolás, P., Rial-Sebbag, E., Rodriguez, L.L., Vergely, C., Yoshida, T., Grimmond, S.M., Biankin, A.V., Bowtell, D.D., Cloonan, N., deFazio, A., Eshleman, J.R., Etemadmoghadam, D., Gardiner, B.A., Kench, J.G., Scarpa, A., Sutherland, R.L., Tempero, M.A., Waddell, N.J., Wilson, P.J., McPherson, J.D., Gallinger, S., Tsao, M.S., Shaw, P.A., Petersen, G.M., Mukhopadhyay, D., Chin, L., DePinho, R.A., Thayer, S., Muthuswamy, L., Shazand, K., Beck, T., Sam, M., Timms, L., Ballin, V., Lu, Y., Ji, J., Zhang, X., Chen, F., Hu, X., Zhou, G., Yang, Q., Tian, G., Zhang, L., Xing, X., Li, X., Zhu, Z., Yu, Y., Yu, J., Yang, H., Lathrop, M., Tost, J., Brennan, P., Holcatova, I., Zaridze, D., Brazma, A., Egevad, L., Prokhortchouk, E., Banks, R.E., Uhlén, M., Cambon-Thomsen, A., Viksna, J., Ponten, F., Skryabin, K., Stratton, M.R., Futreal, P.A., Birney, E., Borg, A., Børresen-Dale, A.L., Caldas, C., Foekens, J.A., Martin, S., Reis-Filho, J.S., Richardson, A.L., Sotiriou, C., Stunnenberg, H.G., Thomas, G., van de Vijver, M., van't Veer, L., Calvo, F., Birnbaum, D., Blanche, H., Boucher, P., Boyault, S., Chabannon, C., Gut, I., Masson-Jacquemier, J.D., Lathrop, M., Pauporté, I., Pivot, X., Vincent-Salomon, A., Tabone, E., Theillet, C., Thomas, G., Tost, J., Treilleux, I., Calvo, F., Bioulac-Sage, P., Clément, B., Decaens, T., Degos, F., Franco, D., Gut, I., Gut, M., Heath, S., Lathrop, M., Samuel, D., Thomas, G., Zucman-Rossi, J., Lichter, P., Eils, R., Brors, B., Korbel, J.O., Korshunov, A., Landgraf, P., Lehrach, H., Pfister, S., Radlwimmer, B., Reifenberger, G., Taylor, M.D., von Kalle, C., Majumder, P.P., Sarin, R., Rao, T.S., Bhan, M.K., Scarpa, A., Pederzoli, P., Lawlor, R.A., Delledonne, M., Bardelli, A., Biankin, A.V., Grimmond, S.M., Gress, T., Klimstra, D., Zamboni, G., Shibata, T., Nakamura, Y., Nakagawa, H., Kusada, J., Tsunoda, T., Miyano, S., Aburatani, H., Kato, K., Fujimoto, A., Yoshida, T., Campo, E., López-Otín, C., Estivill, X., Guigó, R., de Sanjosé, S., Piris, M.A., Montserrat, E., González-Díaz, M., Puente, X.S., Jares, P., Valencia, A., Himmelbauer, H., Quesada, V., Bea, S., Stratton, M.R., Futreal, P.A., Campbell, P.J., Vincent-Salomon, A., Richardson, A.L., Reis-Filho, J.S., van de Vijver, M., Thomas, G., Masson-Jacquemier, J.D., Aparicio, S., Borg, A., Børresen-Dale, A.L., Caldas, C., Foekens, J.A., Stunnenberg, H.G., van't Veer, L., Easton, D.F., Spellman, P.T., Martin, S., Barker, A.D., Chin, L., Collins, F.S., Compton, C.C., Ferguson, M.L., Gerhard, D.S., Getz, G., Gunter, C., Guttmacher, A., Guyer, M., Hayes, D.N., Lander, E.S., Ozenberger, B., Penny, R., Peterson, J., Sander, C., Shaw, K.M., Speed, T.P., Spellman, P.T., Vockley, J.G., Wheeler, D.A., Wilson, R.K., Hudson, T.J., Chin, L., Knoppers, B.M., Lander, E.S., Lichter, P., Stein, L.D., Stratton, M.R., Anderson, W., Barker, A.D., Bell, C., Bobrow, M., Burke, W., Collins, F.S., Compton, C.C., DePinho, R.A., Easton, D.F., Futreal, P.A., Gerhard, D.S., Green, A.R., Guyer, M., Hamilton, S.R., Hubbard, T.J., Kallioniemi, O.P., Kennedy, K.L., Ley, T.J., Liu, E.T., Lu, Y., Majumder, P., Marra, M., Ozenberger, B., Peterson, J., Schafer, A.J., Spellman, P.T., Stunnenberg, H.G., Wainwright, B.J., Wilson, R.K., Yang, H. Nature. (2010) [Pubmed]
  180. BioMart Central Portal--unified access to biological data. Haider, S., Ballester, B., Smedley, D., Zhang, J., Rice, P., Kasprzyk, A. Nucleic Acids Res. (2009) [Pubmed]
  181. Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. Topham, C.M., McLeod, A., Eisenmenger, F., Overington, J.P., Johnson, M.S., Blundell, T.L. J. Mol. Biol. (1993) [Pubmed]
  182. SIFT: Predicting amino acid changes that affect protein function. Ng, P.C., Henikoff, S. Nucleic Acids Res. (2003) [Pubmed]
  183. Predicting deleterious amino acid substitutions. Ng, P.C., Henikoff, S. Genome Res. (2001) [Pubmed]
  184. Accounting for human polymorphisms predicted to affect protein function. Ng, P.C., Henikoff, S. Genome Res. (2002) [Pubmed]
  185. Human non-synonymous SNPs: server and survey. Ramensky, V., Bork, P., Sunyaev, S. Nucleic Acids Res. (2002) [Pubmed]
  186. Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Dehouck, Y., Grosfils, A., Folch, B., Gilis, D., Bogaerts, P., Rooman, M. Bioinformatics. (2009) [Pubmed]
  187. A neural-network-based method for predicting protein stability changes upon single point mutations. Capriotti, E., Fariselli, P., Casadio, R. Bioinformatics. (2004) [Pubmed]
  188. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Capriotti, E., Fariselli, P., Casadio, R. Nucleic Acids Res. (2005) [Pubmed]
  189. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Capriotti, E., Calabrese, R., Casadio, R. Bioinformatics. (2006) [Pubmed]
  190. Prediction of protein stability changes for single-site mutations using support vector machines. Cheng, J., Randall, A., Baldi, P. Proteins. (2006) [Pubmed]