An essential introduction to the building blocks of modern text processing
String algorithms make it possible to process, store, and manipulate text efficiently. Their applications range from search engines and social networks that regularly process terabytes of information to areas like genomics, where the genome of an organism can be encoded as a long string of letters. This book provides an incisive introduction to the concepts and applications that every practitioner in the field needs to know. Ideal for the classroom and self-study, it guides readers from the fundamentals of string processing to advanced computational methods, presents useful data structures and proof techniques for strings and other data, and serves as an on-ramp to cutting-edge research in string algorithms.
- Discusses topics ranging from exact string matching and efficient edit distance computation to modern string data structures, sketching methods, and generative models of strings
- Covers data structures and transforms such as suffix trees, suffix arrays, wavelet trees, the Burrows-Wheeler transform, the FM index, and compressed bit vectors
- Presents an array of algorithms along with their proofs of correctness and running time
- Develops the skills needed to design and implement new string algorithms and teaches algorithmic techniques that apply beyond string processing
- Invaluable for anyone interested in processing large collections of string data, including genomic sequences and text for training large language models
- Includes hundreds of exercises and explanatory figures
- An indispensable resource for graduate students, advanced undergraduates, researchers, and practitioners