Background: A recent large-scale analysis of Gene Expression Omnibus (GEO) data found evidence of spatial defects in a substantial fraction of the Affymetrix microarrays deposited there. Nevertheless, in contrast to quality assessment, artefact detection is not widely used in standard gene expression analysis pipelines. Furthermore, although approaches have been proposed to detect diverse types of spatial noise on arrays, the correction of these artefacts is usually either left to summarization methods or avoided by discarding the affected arrays completely.
Results: We show that state-of-the-art robust summarization procedures are vulnerable to artefacts on arrays and cannot appropriately correct for them. To address this problem, we present a simple approach that detects artefacts with high recall and precision, which we further improve by taking into account the spatial layout of arrays. Finally, we propose a correction method for these artefacts that substitutes the values of defective probes using probeset information. We show that this approach can correct defective probe measurements appropriately.
Conclusions: While summarization is insufficient to correct for defective probes, this problem can be addressed in a straightforward way by the methods we present for identification and correction of defective probes. As these methods output CEL files with corrected probe values that serve as input to standard normalization and summarization procedures, they can be easily integrated into existing microarray analysis pipelines as an extra step.
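The idea of substituting defective probe values using probeset information can be illustrated with a minimal sketch. The abstract does not specify the substitution rule, so the sketch assumes one for illustration only: each flagged probe is replaced by the median log-intensity of the unflagged probes in the same probeset. The function name `correct_defective_probes`, the dictionary-based data layout and the example values are all hypothetical and are not taken from the software described here.

```python
from statistics import median


def correct_defective_probes(log_intensity, probeset_of, defective):
    """Return a copy of log_intensity with flagged probes replaced.

    log_intensity: dict mapping probe id -> log2 intensity
    probeset_of:   dict mapping probe id -> probeset id
    defective:     set of probe ids flagged as defective

    Assumed rule (illustration only): a flagged probe takes the median
    of the unflagged probes in its probeset.
    """
    # Collect the values of intact probes, grouped by probeset.
    intact = {}
    for probe, value in log_intensity.items():
        if probe not in defective:
            intact.setdefault(probeset_of[probe], []).append(value)

    corrected = dict(log_intensity)
    for probe in defective:
        values = intact.get(probeset_of[probe])
        # If every probe of the probeset is flagged, there is nothing to
        # substitute from, so the original value is left unchanged.
        if values:
            corrected[probe] = median(values)
    return corrected


# Hypothetical example: probe p3 lies in an artefact region of probeset A,
# and q1 is the only probe of probeset B.
values = {"p1": 8.0, "p2": 8.2, "p3": 14.5, "p4": 7.9, "q1": 5.0}
sets = {"p1": "A", "p2": "A", "p3": "A", "p4": "A", "q1": "B"}
fixed = correct_defective_probes(values, sets, {"p3", "q1"})
```

In this sketch the corrected values keep the same per-probe layout as the input, which mirrors the point made above: a corrected probe-level table can be passed on to standard normalization and summarization unchanged.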