Batch correction in DESeq2 10 Entering edit mode AB ▴ 110 @ab-8975 Last seen 17 months ago United States Hi Everyone, I am working on RNA-Seq data. I'm using DESeq2 for my analysis.I have 20 samples from 3 batches. I am testing for 2 conditions, cond1 and cond2. dds<-DESeqDataSetFromMatrix(countData=countTable3,colData=coldata,design = ~cond1*cond2) When i performed PCA, I could clearly see some batch effect. I read in the forum that adding batch to the design in DESeq removes the batch effect. But I am not sure if this is the right way to go about it because I can still see the same batch effect dds<-DESeqDataSetFromMatrix(countData=countTable3,colData=coldata,design = ~batch+cond1*cond2) I tried using Combat but I'm unable to use combat results in DESeq. It gives me the following error. Error in DESeqDataSet(se, design = design, ignoreRank) : I am not sure if removeBatchEffects() function can be used with DESeq. Can someone please help me out here. Thanks rnaseq batch effect deseq2 combat removebatcheffect() • 43k views ADD COMMENT • link updated 7.7 years ago by adam • 0 • written 8.2 years ago by AB ▴ 110 21 Entering edit mode Ryan C. Thompson ★ 7.9k @ryan-c-thompson-5618 Last seen 6 months ago Scripps Research, La Jolla, CA Just to be clear, there's an important difference between removing a batch effect and modelling a batch effect. Including the batch in your design formula will model the batch effect in the regression step, which means that the raw data are not modified (so the batch effect is not removed), but instead the regression will estimate the size of the batch effect and subtract it out when performing all other tests. In addition, the model's residual degrees of freedom will be reduced appropriately to reflect the fact that some degrees of freedom were "spent" modelling the batch effects. This is the preferred approach for any method that is capable of using it (this includes DESeq2). You would only remove the batch effect (e.g. using limma's I don't have experience with ComBat, but I would expect that you run it on log-transformed CPM values, while DESeq2 expects raw counts as input. I couldn't tell you how to properly use the two methods together. ADD COMMENT • link 8.2 years ago Ryan C. Thompson ★ 7.9k Entering edit mode I agree, that is the way DESeq handles batch effects. Is there a way to visualize or diagnose how the batch effects were modeled? For example, I can do I am thinking the some transform of ADD REPLY • link 7.0 years ago ysdel ▴ 40 Entering edit mode You should always put the condition of interest at the end of the design formula for safety (see vignette). You can just use ADD REPLY • link 7.0 years ago Michael Love 41k 1 Entering edit mode Thank you. Well, plotCounts plots the counts for a single gene, so yes, you can see the batch effects and the treatment effect in the counts. But I was thinking of a way to look at the (all/top) genes like plotPCA does and conclude aha! DESeq2 correctly modeled/removed the batch effects so the log2fc/pvalues we are getting are due to the treatment and not the batch effects. ADD REPLY • link 7.0 years ago ysdel ▴ 40 10 Entering edit mode Batch effects are gene-specific, and DESeq2 fits gene-specific coefficients for the batch term. If you want to get an idea how much batch variability contributes to a PCA plot, I've recommended the following approach on the support site before: variance stabilize the counts, apply limma's removeBatchEffect to assay(vsd), then use plotPCA to plot the residuals. An example: Make some simulated data with a batch effect: VST, remove batch effect, then plotPCA: ADD REPLY • link 7.0 years ago Michael Love 41k Entering edit mode Hi, Michael. Can I used the batch removed data in the DESeq2 function? If I can, how should I run it? Using vsd instead of dds? ADD REPLY • link 6.4 years ago ksuasteguic • 0 3 Entering edit mode DESeq() only takes as input the original counts, and this is on purpose. This is the optimal statistical approach. To account for batch, you put the variables at the beginning of the design, e.g. ~batch + condition ADD REPLY • link 6.4 years ago Michael Love 41k Entering edit mode Hi Michael, Could you please state how I can get DEG from the data where we have removed batch effect using VST. Hoping for prompt response. Thanks. ADD REPLY • link 5.0 years ago bansal.ankush13 • 0 Entering edit mode See this answer directly above your comment from 17 months ago: https://support.bioconductor.org/p/76099/#101057 ADD REPLY • link 5.0 years ago Michael Love 41k Entering edit mode Hi Michael, There is a great difference between the normalized count table when using:vsd <- vst(dds)assay(vsd)WITHOUT batch removal compared to the normalized count table generated by DESeq2 using:dds <- estimateSizeFactors(dds); counts(dds, normalized=TRUE) In order to compare the two normalized count tables I use:library(dataCompareR) rCompare(normalizedcounts, normalizedtransformed_counts) What is essentially the great difference between the two normalizations? Thanks ADD REPLY • link 4.2 years ago Matan G. ▴ 50 1 Entering edit mode This is answered below in my reply to Adam’s answer. ADD REPLY • link 4.2 years ago Michael Love 41k Entering edit mode Thank you. By saying "This is just normalization for sequencing depth" you mean when using the following DESeq function:DESeq2 using: dds <- estimateSizeFactors(dds); counts(dds, normalized=TRUE) ? So the other normalization (ie vst(dds)) takes other things into consideration? Best ADD REPLY • link 4.2 years ago Matan G. ▴ 50 Entering edit mode Hello Michael,I am sorry for reviving an old post, but there seems to be an small error in the answer. As Gordon Smith recommends the design matrix should be included in the call to removeBatchEffect, so that the third line should be Hope I got it right... ADD REPLY • link 4.0 years ago Sam ▴ 10 Entering edit mode Sure, I’ll add this to the documentation for unbalanced batch designs. ADD REPLY • link 4.0 years ago Michael Love 41k Entering edit mode One more question removal the batch effect : Which is more correct : to remove the batch effect from the normalized counts and to perform vst on the corrected results, or to transform the normalized counts via vst, and remove the batch effect from the vst transformed results, like in the code above? ADD REPLY • link 4.0 years ago Sam ▴ 10 1 Entering edit mode Using the approach in the vignette (the latter). ADD REPLY • link 4.0 years ago Michael Love 41k 2 Entering edit mode chris.gene ▴ 20 @chrisgene-9766 Last seen 8.0 years ago I used the sva package. It defines surrogate variables which I then included in the analysis. https://www.bioconductor.org/packages/release/bioc/html/sva.html "The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiment." ADD COMMENT • link 8.0 years ago chris.gene ▴ 20 Entering edit mode adam • 0 @adam-10826 Last seen 7.6 years ago I believe the "counts" function in DESeq2 will provide the type of normalization you you looking for. Make sure to use the "normalized=TRUE" flag. ADD COMMENT • link 7.7 years ago adam • 0 3 Entering edit mode This is just normalization for sequencing depth. I agree with Ryan's answer here, which is to add the known batches to the design formula. (And if the batches are not known, they can be estimated using the sva or RUVseq packages.) ADD REPLY • link 7.7 years ago Michael Love 41k
some values in assay are negativeremoveBatchEffect
function) if you were going to do some kind of downstream analysis that can't model the batch effects, such as training a classifier.plotPCA(rlog(dds))
to plot the PCA showing the batch effects. I model the effects as ~ condition + replicate
and compute the results. Now I'd like some way to see how the resulting log2fc was only due to the condition and the replicate/batch effects are "separate".coef(dds)
provides what I am thinking of.plotCounts
with intgroup=c("replicate","condition") to split apart replicate effects and condition effects. The batch term fit by DESeq2 will be the mean of the normalized count which are plotted.dds <- makeExampleDESeqDataSet(betaSD=1,interceptMean=10)dds$batch <- factor(rep(c("A","B"),each=6))
vsd <- vst(dds)plotPCA(vsd, "batch")assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$batch)plotPCA(vsd, "batch")
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$batch, design=~Condition)
Login before adding your answer.