ContextMap 2.0

Sequencing of RNA (RNA-seq) using next-generation sequencing has become the standard approach for profiling the transcriptomic state of a cell. This requires mapping the sequencing reads to determine their transcriptomic origin.
Recently, we developed a context-based mapping approach, ContextMap, which determines the most likely origin of a read by evaluating the context of the read in the form of alignments of other reads to the same genomic region. In the original implementation, the focus was on improving initial mappings provided by other mapping tools.

Here, we present ContextMap 2.0, an extension of the original ContextMap method, which can be used as a standalone tool without relying on initial mappings by other tools. We show that it yields highly accurate read mappings and is very robust against sequencing errors. The design of ContextMap 2.0 allows for massively parallelized data processing, resulting in reasonable running times despite the higher complexity of the context-based approach.

ContextMap 2.0 is an open-source software project released under the Artistic License.

Methods

The standalone ContextMap 2.0 algorithm consists of five major steps (see figure below):

(A) In the first step, ContextMap 2.0 aligns reads to a given reference genome using a modified Bowtie version that performs alignments in both forward and backward directions so that reads from exon-exon junctions (split reads) are also detected.

(B) These initial alignments are then used to calculate contexts, defined as sets of reads originating from the same stretch of the genome. For this purpose, ContextMap clusters read alignments based on their genomic starting position, allowing multiple alignments per read. Extension of alignments and identification of the most likely mapping for each read are then performed independently for each context (steps C-D), with integration performed only in the last step (E). This strategy allows ContextMap to make heavy use of multi-core machines by processing many contexts in parallel.
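
The clustering in step (B) can be illustrated with a minimal sketch. This is not the ContextMap implementation: the Alignment record, the build_contexts function and the max_gap threshold are hypothetical names and values chosen for illustration. The sketch assumes that a context is a run of alignments on one chromosome whose sorted start positions are never more than max_gap bases apart, and that a read with several alignments may appear in several contexts.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not ContextMap's data model)."""
    read_id: str
    chrom: str
    start: int


def build_contexts(alignments, max_gap=10_000):
    """Group alignments into contexts by genomic start position.

    Assumption: a new context begins whenever the distance between two
    consecutive start positions on a chromosome exceeds max_gap.
    """
    by_chrom = defaultdict(list)
    for aln in alignments:
        by_chrom[aln.chrom].append(aln)

    contexts = []
    for chrom in sorted(by_chrom):
        current = []
        for aln in sorted(by_chrom[chrom], key=lambda a: a.start):
            if current and aln.start - current[-1].start > max_gap:
                contexts.append(current)
                current = []
            current.append(aln)
        if current:
            contexts.append(current)
    return contexts


example = [
    Alignment("r1", "chr1", 100),
    Alignment("r2", "chr1", 150),
    Alignment("r1", "chr1", 50_000),  # second candidate alignment of r1
    Alignment("r3", "chr2", 10),
]
contexts = build_contexts(example, max_gap=1_000)
```

Because each resulting list is independent of the others, a design like this lets the per-context steps be handed to separate worker threads or processes.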

(C) Within each context, a large number of additional candidate alignments can be created for each read with little impact on runtime. ContextMap creates all possible alignments for each read that satisfy the maximum mismatch criterion, including, for example, additional split read alignments derived from full read alignments that overlap with a previously identified splice site.
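
As a rough illustration of how a split read candidate could be derived from a full read alignment in step (C), consider the sketch below. It is a simplified stand-in under stated assumptions, not ContextMap's code: the function split_candidates, the 0-based half-open coordinates, the representation of a splice site as a (donor_end, acceptor_start) pair and the default max_mismatches value are all hypothetical.

```python
def split_candidates(read, ref, start, junctions, max_mismatches=2):
    """Derive split alignments from a full alignment of `read` at ref[start:].

    Assumptions (illustrative only):
      - coordinates are 0-based and half-open,
      - each junction is (donor_end, acceptor_start): the upstream exon ends
        before donor_end and the downstream exon begins at acceptor_start.
    Returns tuples (left_start, left_end, right_start, right_end, mismatches).
    """
    end = start + len(read)
    candidates = []
    for donor_end, acceptor_start in junctions:
        # Only junctions strictly inside the full alignment can split the read.
        if not (start < donor_end < end):
            continue
        right_len = len(read) - (donor_end - start)
        target = ref[start:donor_end] + ref[acceptor_start:acceptor_start + right_len]
        if len(target) < len(read):
            continue  # downstream part would run off the reference
        mismatches = sum(a != b for a, b in zip(read, target))
        if mismatches <= max_mismatches:
            candidates.append(
                (start, donor_end, acceptor_start, acceptor_start + right_len, mismatches)
            )
    return candidates


# Toy reference: exon (0-8), intron (8-16), exon (16-24).
ref = "AAAACCCC" + "GGGGGGGG" + "TTTTAAAA"
read = "CCCCTTTT"  # spans the exon-exon junction
found = split_candidates(read, ref, start=4, junctions=[(8, 16)])
```

In this toy example the unspliced alignment at position 4 has four mismatches against the intron, whereas the derived split alignment has none, so only the split candidate satisfies the mismatch criterion.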

(D) + (E) Resolution of the many ambiguous alignments for each read is performed in steps (D) and (E), first within contexts (D) and subsequently between contexts (E). Both of these resolution steps are based on a scoring scheme that takes into account the number of reads aligned within and around a particular read alignment. If a transcriptome annotation is provided, ContextMap prefers candidate split read alignments corresponding to known junctions. In both (D) and (E) the alignment with the highest support score is chosen for each read instead of simply the alignment with the minimum number of mismatches, resulting in a unique mapping for each read first within each context (step D) and finally across all contexts (E).
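
The two-stage resolution in steps (D) and (E) can be sketched as follows. The scoring function here is a deliberately simplified stand-in: it only counts alignments of other reads starting within a fixed window, whereas the actual ContextMap scoring scheme is described in the publications below. The names support_score, best_per_read and resolve, the window parameter and the (read_id, chrom, start) tuple layout are assumptions made for this example.

```python
def support_score(candidate, context, window=100):
    """Simplified support: number of alignments of OTHER reads in the same
    context that start within `window` bases of the candidate."""
    read_id, chrom, start = candidate
    return sum(
        1
        for other_id, other_chrom, other_start in context
        if other_id != read_id
        and other_chrom == chrom
        and abs(other_start - start) <= window
    )


def best_per_read(scored):
    """Keep the highest-scoring (alignment, score) pair for each read.
    Ties are resolved in favour of the first alignment seen."""
    best = {}
    for aln, score in scored:
        read_id = aln[0]
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (aln, score)
    return best


def resolve(contexts, window=100):
    """Step D: pick one alignment per read within each context.
    Step E: pick one alignment per read across all contexts."""
    winners = []
    for context in contexts:
        scored = [(aln, support_score(aln, context, window)) for aln in context]
        winners.extend(best_per_read(scored).values())
    final = best_per_read(winners)
    return {read_id: aln for read_id, (aln, _) in final.items()}


contexts = [
    [("r1", "chr1", 100), ("r2", "chr1", 120), ("r3", "chr1", 150)],
    [("r1", "chr5", 9000)],  # isolated second candidate for r1
]
mapping = resolve(contexts)
```

Here read r1 is assigned to chr1 because two other reads support that location, while its candidate on chr5 has no support. A preference for annotated junctions could be added as a bonus term in the score, but that is left out of this sketch.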

Publications

Thomas Bonfert, Evelyn Kirner, Gergely Csaba, Ralf Zimmer, Caroline C. Friedel, ContextMap 2: fast and accurate context-based RNA-seq mapping, BMC Bioinformatics, vol. 16, pp. 122, 2015.

Thomas Bonfert, Gergely Csaba, Ralf Zimmer, Caroline C. Friedel, Mining RNA-Seq Data for Infections and Contaminations, PLoS ONE, vol. 8, pp. e73071, 2013.

Thomas Bonfert, Gergely Csaba, Ralf Zimmer, Caroline C. Friedel, A context-based approach to identify the most likely mapping for RNA-seq experiments, BMC Bioinformatics, vol. 13(Suppl 6), pp. S9, 2012.

Downloads

Current release version: 2.7.9
Latest changes:

Prediction of polyA cleavage sites from RNA-seq data (manuscript in preparation)
Support of soft-clipped alignments
ContextMap works with unmodified versions of BWA, Bowtie 1 and Bowtie 2
Detection of reads spanning an arbitrary number of exons

ContextMap v2.7.9	contextmap_v2.7.9.zip
ContextMap v2.7.9 Source	contextmap_source_v2.7.9.zip
Manual	Manual
In silico data sets	in_silico_sets.tar.gz
Supplementary Material	Supplementary Material