In its introduction, the Python 3 Text Processing with NLTK 3 Cookbook claims to skip the preamble and ignore pedagogy, letting you jump straight into text processing. It does skip the preamble, but I would argue that the rest of the claim is false – it definitely does not ignore pedagogy. The examples in this book are practical, understandable and well explained.
The book is intended for those familiar with Python who want to use it to process natural language. Following this credo, there is no discussion of software design and no attempt to write especially elegant code. I tend to nitpick at code quality, and although nothing in the code examples here upset me, they didn’t awe me with their subtle beauty. However, the raw power of NLTK, combined with the flexibility of Python, impressed me deeply.
The author takes you on a trip through a large section of natural language processing, starting with text tokenization and using WordNet. I really enjoyed the ideas on computing the semantic “distance” between different words by traversing synset trees. The book then goes on to show you how to replace and correct words, tag parts of speech in texts, chunk texts and transform text chunks, and classify text. The whole thing is rounded off by a discussion of distributed processing, with some nice examples of how to use execnet as a simple but effective message-passing interface.
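To make the idea of semantic “distance” concrete, here is a minimal sketch of a path-based measure on a toy tree. It is not code from the book and it does not call NLTK: the `TOY_HYPERNYMS` dictionary and the function names are made up for illustration. NLTK's WordNet interface offers comparable measures over real synsets (for example `path_similarity`); the sketch only assumes the common convention of scoring two words as 1 / (edges between them + 1).

```python
# Toy "is-a" tree: each word maps to its parent; the root maps to None.
# These entries are invented for illustration and are not taken from WordNet.
TOY_HYPERNYMS = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "hammer": "artifact",
}


def path_to_root(word, parents=TOY_HYPERNYMS):
    """Return the chain [word, parent, ..., root]."""
    path = []
    while word is not None:
        path.append(word)
        word = parents[word]
    return path


def path_distance(a, b, parents=TOY_HYPERNYMS):
    """Count the edges on the shortest path between a and b in the tree."""
    # Map every ancestor of b (including b itself) to its distance from b.
    steps_from_b = {node: i for i, node in enumerate(path_to_root(b, parents))}
    # Walk upward from a; the first shared node is the lowest common ancestor.
    for steps_from_a, node in enumerate(path_to_root(a, parents)):
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node]
    raise ValueError("words do not share a root")


def path_similarity(a, b, parents=TOY_HYPERNYMS):
    """Score in (0, 1]: identical words score 1, distant words approach 0."""
    return 1.0 / (path_distance(a, b, parents) + 1)


if __name__ == "__main__":
    for pair in [("dog", "dog"), ("dog", "cat"), ("dog", "hammer")]:
        print(pair, path_distance(*pair), round(path_similarity(*pair), 2))
```

In this toy tree, “dog” and “cat” meet at “animal” after one step each, so they come out closer than “dog” and “hammer”, which only meet at the root.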
Reading all these examples made me want to go out and write a search engine or a text classifier – with NLTK, daunting tasks in this field become easy.
Above and beyond the practical text processing material in this book, what I enjoyed most was its coverage of various machine learning algorithms. The book is definitely not about machine learning, but it offers a glimpse into that world, enough that you can understand what you’re doing even if you’re just using what different libraries give you out of the box. I appreciated these more extended explanations, which I often miss in texts involving machine learning.