The world's first wiki where authorship really matters (Nature Genetics, 2008). Due credit and reputation for authors. Imagine a global collaborative knowledge base for original thoughts. Search thousands of articles and collaborate with scientists around the globe.
wikigene or wiki gene protein drug chemical gene disease author authorship tracking collaborative publishing evolutionary knowledge reputation system wiki2.0 global collaboration genes proteins drugs chemicals diseases compound
Hoffmann, R. A wiki for the life sciences where authorship matters. Nature Genetics (2008)
 
 
 
 
 

Short superstrings and the structure of overlapping strings.

Given a collection of strings S = [s1,...,sn] over an alphabet sigma, a superstring alpha of S is a string containing each si as a substring, that is, for each i, 1 < or = i < or = n, alpha contains a block of magnitude of si consecutive characters that match si exactly. The shortest superstring problem is the problem of finding a superstring alpha of minimum length. The shortest superstring problem has applications in both computational biology and data compression. The shortest superstring problem is NP-hard (Gallant et al., 1980); in fact, it was recently shown to be MAX SNP-hard (Blum et al., 1994). Given the importance of the applications, several heuristics and approximation algorithms have been proposed. Constant factor approximation algorithms have been given in Blum et al. (1994) (factor of 3), Teng and Yao (1993) (factor of 2 8/9), Czumaj et al. (1994) (factor of 2 5/6), and Kosaraju et al. (1994) (factor of 2 50/63). Informally, the key to any algorithm for the shortest superstring problem is to identify sets of strings with large amounts of similarity, or overlap. Although the previous algorithms and their analyses have grown increasingly sophisticated, they reveal remarkably little about the structure of strings with large amounts of overlap. In this sense, they are solving a more general problem than the one at hand. In this paper, we study the structure of strings with large amounts of overlap and use our understanding to give an algorithm that finds a superstring whose length is no more than 2 3/4 times that of the optimal superstring. Our algorithm runs in O(magnitude of S + n3) time, which matches that of previous algorithms. We prove several interesting properties about short periodic strings, allowing us to answer questions of the following form: Given a string with some periodic structure, characterize all the possible periodic strings that can have a large amount of overlap with the first string.[1]

References

  1. Short superstrings and the structure of overlapping strings. Armen, C., Stein, C. J. Comput. Biol. (1995) [Pubmed]
 
WikiGenes - Universities