Hauptnavigation

druckenprint

Inhaltsbereich

Algorithmic Bioinformatics

Efficient algorithms and space-efficient data structures play an important role in Bioinformatics. Without new, clever, and ingenious algorithmic ideas the recent challenges from modern Life Sciences could not been handled. Thus, the recent progress in Bioinformatics and System Biology relies heavily on these algorithmic concepts. Clearly, the problems covered by Algorithmic Bioinformatics and which we have addressed stem from various areas of Life Sciences.

Research Team

Researchers

Prof. Dr. Volker Heun

Alumni

Succinct Data Structures in Bioinformatics

Range Minimum Queries and Suffix Arrays

Project Website

The Range-Minimum-Query-Problem (RMQ) is to preprocess an array in linear time such that all subsequent queries asking for the position of a minimal element between two specified indices can be obtained in constant time. The first algorithm that never uses more than 2n+o(n) bits has been developed, which does not rely on rank- and select-queries or other succinct data structures. The importance of this result is stressed by simplifying and reducing the space consumption of the Enhanced Suffix Array, while retaining all its capabilities to simulate suffix trees.

Johannes Fischer, Volker Heun, Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays, SIAM Journal on Computing, vol. 40, no. 2, pp. 465-492, 2011.

Compressed Range Minimum Queries

It is shown that for compressible input arrays the RMQ-information can be compressed as well. In particular, the information for RMQs can be stored within the same entropy bounds that are achieved by the currently best schemes for storing the underlying array itself in compressed form, while still being able to access a logarithmic number of contiguous bits in constant time. Evaluations show that the practical space consumptions of the non-compressed schemes scale surprisingly well with their theoretical guarantees, and for compressible input arrays the new compressed schemes can indeed reduce the space, with little or no slowdown in query time.

Johannes Fischer, Volker Heun, Horst Martin Stühler, Practical Entropy-Bounded Schemes for O(1)-Range Minimum Queries, Proceedings of the 2008 Data Compression Conference (DCC’08) (J.A. Storer, M.W. Marcellin, eds.), IEEE Computer Society, Snowbird, Utah, U.S.A., March 25-27, 2008, pp. 272-281, 2008.

Range Median of Minima Queries

A natural extension of RMQ is considered, where a further constraint is that if the minimum in a query interval is not unique, then the query should return an approximation of the median position among all positions that attain this minimum. A succinct preprocessing scheme using Dn+o(n) bits in addition to the static input array (for a small constant D) has been developed, such that subsequent range median of minima queries can be answered in constant time. This data structure can be built in linear time, with little extra space needed at construction time. Several new combinatorial concepts are introduced such as Super-Cartesian Trees and Super-Ballot Numbers, which are believed to have other interesting applications in the future.

Johannes Fischer, Volker Heun, Range Median of Minima Queries, Super-Cartesian Trees, and Text Indexing, Proceedings of the International Workshop on Combinatorial Algorithms (IWOCA’08) (Mirka Miller, Koichi Wada, eds.), College Publications, vol. 12, September 13-15, 2008, Nagoya, Japan, pp. 239-252, 2010.

BiBTex

Johannes Fischer, Volker Heun, Finding Range Minima in the Middle: Approximations and Applications, Mathematics in Computer Science, vol. 3, no. 1, pp. 17-30, 2010.

String Mining in Bioinformatics

String Mining

Project Website

A new algorithmic framework has been developed for solving frequency-related data mining queries on databases of strings in optimal time, i.e., in time linear in the input and the output size. The additional space is linear in the input size. This framework can be used to mine frequent strings, emerging strings, and strings that pass other statistical tests, e.g., the 2-test. The advantages of array-based data structures (compared with dynamic data structures such as trees) are good locality behavior and extensibility to secondary memory. The algorithm is tested on real-world biological data and demonstrates that the approach also works well in practice.

Johannes Fischer, Volker Heun, Stefan Kramer, Optimal String Mining Under Frequency Constraints, Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’06) (J. Fürnkranz, T. Scheffer, M. Spiliopoulou, eds.), Springer-Verlag, vol. 4213, Berlin, Germany, September 18-22, 2006, pp. 139-150, 2006.

Algorithms for Evolutionary Aspects

Hybroscale - Computation of hybridization networks

Project Website

Hybroscale is developed specifically for the research of hybridization networks including its computation and visualization.

Celine Scornavacca, Simone Linz, Benjamin Albrecht, A First Step Toward Computing All Hybridization Networks For Two Rooted Binary Phylogenetic Trees, Journal of Computational Biology, vol. 19, no. 11, pp. 1227-1242, 2012.

Benjamin Albrecht, Celine Scornavacca, Alberto Cenci, Daniel H Huson, Fast computation of minimum hybridization networks, Bioinformatics, vol. 28, no. 2, pp. 191-197, 2011.

Sorting by Prefix reversals

Sorting by Prefix Reversals, also known as Pancake Flipping, is the problem of transforming a given permutation into the identity permutation, where the only allowed operations are reversals of a prefix of the permutation. The problem complexity is still unknown. The first polynomial-time 2-approximation algorithm to solve this problem has been developed. Empirical tests suggest that the average performance is in fact better than 2.

Johannes Fischer, Simon W. Ginzinger, A 2-Approximation Algorithm for Sorting by Prefix Reversals, Proceedings of the 13th Annual European Symposium on Algorithms (ESA’05) (Gerth Stølting Brodal, Stefano Leonardi, eds.), Springer-Verlag, vol. 3669, Mallorca, Spain, October 3-6, 2005, pp. 415-425,, 2005.

Jeremias Weihmann, Genome-Rearrangements: Sortieren mit erweiterten Transreversals, Diploma Thesis, LFE Bioinformatik / LMU München, 2009.

BiBTex

Reconstructing Ultrametric Trees

The reconstruction of an ultrametric tree from a distance matrix is a very frequent subproblem in clustering or reconstructing evolutionary trees, both are common problems in Bioinformatics. In his famous book, Gusfield presented a very simple, but not time-optimal recursive algorithm for this problem (see also errata of Gusfield's book). It has been shown that a simple modification of Gusfield's algorithm allows a time-optimal solution, which is the first simple optimal algorithm for this problem.

Volker Heun, Analysis of a Modification of Gusfield’s Recursive Algorithm for Reconstructing Ultrametric Trees, Information Processing Letters, vol. 108, no. 4, pp. 222-225, 2008.

Protein Folding

Combinatorial Protein Folding

The extended cubic lattice is a natural extension of the cubic lattice for the HP model proposed by Dill et al that bypasses its major drawback, its bipartiteness, by adding plane diagonals. In this model, general folding algorithms which achieve an approximation ratio of 59/70 for all protein sequences and an approximation ratio of 37/42 for a restricted but quite natural subset of HP-sequences have been proposed.

Volker Heun, Approximate Protein Folding in the HP Side Chain Model on Extended Cubic Lattices, Discrete Applied Mathematics, vol. 127, no. 1, pp. 163-177, 2003.

Experimental Protein Structure Determination

Analysis of Chemical Shift Data for NMR Structure Determination

Project Website

The goal of this project is to achieve a considerable speed-up in the NMR structure solving process. Therefore computational methods are developed which allow the creation of models for a protein structure at an experimental stage in which this is not possible by hand. These models may then be refined and tested for correctness using the "standard" way. The time saving results from the need for fewer experiments in when building the initial model.

Automatic Correction of Inconsistent Chemical Shift Referencing

Project Website

The construction of a consistent protein chemical shift database is an important step toward making more extensive use of this data in structural studies. Nowadays available data is frequently impurified, in particular with respect to chemical shift referencing, which is often either inaccurate or inconsistently annotated. CheckShift is at tool for preprocessing chemical shift data to detect and correct referencing errors.

druckenprint

nach obentop

Servicebereich

Beliebte LinksQuick Links

Informationen fürInformation for

Schüler

Studiengang Bioinformatik

Studierende

Hiwi-Jobs

Suche

Links und Funktionen

Navigationspfad

Hauptnavigation

Inhaltsbereich

Algorithmic Bioinformatics

Researchers

Alumni

Range Minimum Queries and Suffix Arrays

Compressed Range Minimum Queries

Range Median of Minima Queries

String Mining

Hybroscale - Computation of hybridization networks

Sorting by Prefix reversals

Reconstructing Ultrametric Trees

Combinatorial Protein Folding

Analysis of Chemical Shift Data for NMR Structure Determination

Automatic Correction of Inconsistent Chemical Shift Referencing

Servicebereich

Beliebte LinksQuick Links

Informationen fürInformation for

Schüler

Studierende

Fußzeile