A State of Dis“array”: Two Recent Studies Highlight Common Problems Associated with Using Illumina’s 450k Array for Epigenetics Research
Epigenetic analysis is one of the hottest areas in all of biological research. Covalent modifications to DNA, RNA, and proteins, which leave the primary sequence of these molecules unchanged, are known to regulate numerous cellular processes and contribute to many important human disease phenotypes. One of the most intensely studied epigenetic modifications is DNA methylation, owing to its relatively stable nature and the numerous tools and technologies available to measure it. Many researchers use the Illumina Infinium HumanMethylation450 BeadChip Array (450k array) for genome-wide studies of DNA methylation because it is relatively cheap and covers many previously identified sites of DNA methylation. However, the 450k array also carries some major disadvantages. For example, it is only available for human samples, does not allow investigation outside of the pre-designed probe set, and does not allow simultaneous interrogation of both genetic mutations and epigenetic modifications. Two recent publications, described below, highlight additional technical problems with the probe design and analysis methods used with the Illumina 450k array.
First, a report from the laboratory of Rosanna Weksberg (Chen et al.) demonstrated that many of the probes in the Illumina 450k array either cross-hybridize to non-targeted genomic regions or target loci that contain known single nucleotide polymorphisms (SNPs), both of which interfere with analysis of DNA methylation levels. The cross-reactive probes were discovered in a study designed to investigate sex-specific differences in DNA methylation patterns in a cohort of control subjects from the Assessment of Risk for Colorectal Tumours in Canada (ARCTIC) study. There were 16,532 autosomal CpG sites that showed significant differences in DNA methylation between males and females. Closer investigation revealed that sequences homologous to the autosomal loci targeted by the probes were also present on the sex (X and Y) chromosomes, suggesting that the observed sex-specific autosomal differences could be attributed to cross-reactivity of the probes rather than to true differences in methylation. The authors identified a total of 29,233 probes that have a high likelihood of cross-hybridizing to regions of the genome other than their specific target. Furthermore, the researchers identified probes targeting loci with known SNPs and found that nearly half of the probes (49.3%) overlap at least one SNP, and 13.8% of the probes have known SNPs within the targeted CpG site itself (mutations at either the cytosine or guanine position). The authors concluded that sites of differential DNA methylation identified using the Illumina 450k array should be confirmed using a second independent assay, such as a Next-Gen sequencing-based approach, especially when the sites are targeted by cross-reactive probes or contain known SNPs.
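To make the recommendation above concrete, here is a minimal Python sketch of how a list of differentially methylated probes might be split into sites that can be taken forward directly and sites that need confirmation by an independent assay. It assumes the cross-reactive and SNP-overlapping probe lists have already been loaded as sets of probe IDs. The function name and the probe IDs are hypothetical placeholders for illustration, not values from the paper.

```python
def split_flagged_probes(hits, cross_reactive, snp_overlapping):
    """Split differentially methylated probe IDs into two lists.

    hits            -- probe IDs called as differentially methylated
    cross_reactive  -- probe IDs flagged as likely to cross-hybridize
    snp_overlapping -- probe IDs whose target overlaps a known SNP
    """
    flagged = set(cross_reactive) | set(snp_overlapping)
    retained = [p for p in hits if p not in flagged]
    needs_validation = [p for p in hits if p in flagged]
    return retained, needs_validation


# Hypothetical probe IDs, for illustration only.
hits = ["cg0000001", "cg0000002", "cg0000003", "cg0000004"]
cross_reactive = {"cg0000002"}
snp_overlapping = {"cg0000004"}

retained, needs_validation = split_flagged_probes(
    hits, cross_reactive, snp_overlapping
)
print("retained:", retained)
print("confirm with a second assay:", needs_validation)
```

Keeping the flagged probes in a separate list, rather than silently discarding them, reflects the authors' advice that these sites be confirmed rather than ignored.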
In a separate study from Columbia University, researchers demonstrated that 450k arrays may produce spurious results stemming from batch effects and from the use of common pathway analysis software. In their report, Harper et al. analyzed blood samples collected from groups exposed to either high or low arsenic concentrations in their drinking water. The researchers prepared DNA from the two sample groups and plated the samples onto 450k arrays either sequentially according to exposure (Run One) or randomly (Run Two). Plating by exposure group ties exposure status to plate position, introducing a potential source of batch effects that can confound downstream analysis. Depending on whether or not the investigators controlled for batch effects using advanced computational tools, Run One produced hundreds to thousands more differentially methylated CpG sites between the exposure groups than Run Two. Furthermore, the researchers identified just 25 sites that were differentially methylated between the high and low arsenic exposure groups in both Run One and Run Two. Finally, the correlation between Run One and Run Two in the β values, which are used to calculate the extent of methylation at each CpG site, was very poor (r ≈ 0.3). A parallel result from the study revealed that common pathway analysis software designed to analyze data from gene expression arrays can further confuse results when applied to methylation array data. The authors ran simulations using 100 random CpG sites represented on the Illumina 450k array and the popular Ingenuity IPA software. Even though the sites were chosen at random, the software returned significant associations with various diseases and gene pathways, for example cancer and cellular development. The authors attributed this finding to the fact that many of the array’s CpG sites were selected because they are associated with gene loci of particular interest to researchers studying common diseases, rather than being distributed evenly to represent CpG methylation across the genome.
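The plating issue described above can be illustrated with a small Python sketch. It compares a sequential layout, where samples are loaded in exposure order, with a shuffled layout, and counts how many samples from each exposure group land on each chip. This is only an illustration under stated assumptions, not the layout used by Harper et al.: the sample IDs, group sizes, and function names are hypothetical, and the figure of 12 samples per chip is an assumption about the array format.

```python
import random
from collections import Counter

SAMPLES_PER_CHIP = 12  # assumption: samples loaded per chip


def assign_chips(samples, samples_per_chip=SAMPLES_PER_CHIP, seed=None):
    """Assign samples to chips in the given order, or shuffled if seed is set."""
    order = list(samples)
    if seed is not None:
        random.Random(seed).shuffle(order)
    return [(i // samples_per_chip, s) for i, s in enumerate(order)]


def exposure_by_chip(layout):
    """Count samples from each exposure group on each chip."""
    counts = {}
    for chip, (_sample_id, exposure) in layout:
        counts.setdefault(chip, Counter())[exposure] += 1
    return counts


# Hypothetical cohort: 12 high-exposure and 12 low-exposure samples.
samples = [(f"S{i:02d}", "high") for i in range(12)]
samples += [(f"S{i:02d}", "low") for i in range(12, 24)]

sequential = exposure_by_chip(assign_chips(samples))
randomized = exposure_by_chip(assign_chips(samples, seed=1))

print("sequential:", {chip: dict(c) for chip, c in sequential.items()})
print("randomized:", {chip: dict(c) for chip, c in randomized.items()})
```

In the sequential layout each chip holds a single exposure group, so any chip-to-chip technical difference is indistinguishable from an exposure effect. Shuffling before plating spreads both groups across chips, which is what allows batch and exposure to be separated in the analysis.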
These reports clearly highlight the need for caution when interpreting studies using the Illumina Infinium HumanMethylation450 BeadChip Array. One notion that is gaining traction in the research community is to use bisulfite sequencing, including genome-wide Next-Gen sequencing-based approaches, rather than the Illumina 450k array, for DNA methylation studies. The plummeting costs of Next-Gen sequencing for epigenetic analysis, together with its benefits over array-based methods, such as the ability to investigate DNA methylation in any species with a reference genome, the ability to simultaneously identify both genetic mutations and epigenetic modifications, and the lack of confinement to pre-determined regions, might outweigh the currently lower cost of the array-based approaches. What do you think? Which methods do you prefer to use for your DNA methylation studies? Does learning about the inherent technical issues with the Illumina 450k array make you question studies that use this method?
Chen YA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, & Weksberg R (2013). Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics: official journal of the DNA Methylation Society, 8 (2), 203-9. PMID: 23314698
Harper KN, Peters B, & Gamble MV (2013). Batch Effects and Pathway Analysis: Two Potential Perils in Cancer Studies Involving DNA Methylation Array Analysis. Cancer epidemiology, biomarkers & prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology. PMID: 23629520