Review: Python 3 Text Processing with NLTK 3 Cookbook

In its introduction, the Python 3 Text Processing with NLTK 3 Cookbook claims to skip the preamble and ignore pedagogy, letting you jump straight into text processing. Although it does skip the preamble, I would argue that this statement is false – it definitely does not skip the pedagogy. The examples this book shows you are practical, understandable and well-explained.

The book is intended for those familiar with Python who want to use it to process natural language. Following this credo, there is no discussion of software design and no attempt to write especially elegant code. I tend to nitpick at code quality, and although there was nothing that upset me in the code examples here, they didn’t awe me with their subtle beauty. However, the raw power of NLTK, combined with the flexibility of Python, impressed me deeply.

The author takes you on a trip through a large section of natural language processing, starting with text tokenization and using WordNet. I really enjoyed the ideas on computing the semantic “distance” between different words by traversing synset trees. The book then goes on to show you how to replace and correct words, tag parts of speech in texts, chunk texts and transform text chunks, and classify text. The whole thing is rounded off by a discussion of distributed processing, with some nice examples of how to use execnet as a simple but effective message-passing interface.
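To give a flavor of the tree-traversal idea, here is a minimal sketch of a path-based distance. It is not code from the book and not NLTK's implementation: the `HYPERNYMS` taxonomy and the function names are made up for illustration, and a real WordNet lookup (for example via the `path_similarity` method that NLTK offers on synsets) works on a much larger graph. The score assumed here is 1 / (number of edges between the two concepts + 1).

```python
# Toy hypernym tree: each concept points to its parent concept.
# This hand-made taxonomy is an assumption for illustration, not WordNet data.
HYPERNYMS = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "animal": None,
}


def path_to_root(word):
    """Return the chain of concepts from `word` up to the root."""
    path = []
    while word is not None:
        path.append(word)
        word = HYPERNYMS[word]
    return path


def path_distance(a, b):
    """Count the edges on the shortest path between two concepts in the tree."""
    depth_in_b = {node: depth for depth, node in enumerate(path_to_root(b))}
    for steps_up_from_a, node in enumerate(path_to_root(a)):
        if node in depth_in_b:  # the first shared ancestor is the lowest one
            return steps_up_from_a + depth_in_b[node]
    raise ValueError("concepts share no ancestor")


def path_similarity(a, b):
    """Score in (0, 1]; 1.0 means the two concepts are identical."""
    return 1 / (path_distance(a, b) + 1)


print(path_similarity("dog", "wolf"))  # two edges apart
print(path_similarity("dog", "cat"))   # four edges apart, so a lower score
```

In this toy tree, "dog" and "wolf" meet at "canine", while "dog" and "cat" only meet at "mammal", so the first pair scores higher.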

Reading all these examples made me want to go out and write a search engine or a text classifier – with NLTK, daunting tasks in this field become easy.

Above and beyond the practical text processing material in this book, what I enjoyed most was its coverage of various machine learning algorithms. The book is definitely not about machine learning, but it affords you a glimpse into that world, so that you can understand what you’re doing even if you’re just using what different libraries give you out of the box. I appreciated these more extended explanations, which I often miss in texts involving machine learning.
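As one example of what sits behind an out-of-the-box text classifier, here is a rough sketch of naive Bayes with add-one smoothing in plain Python. The choice of algorithm is an assumption made for illustration, the training data is made up, and the `train` and `classify` helpers are hypothetical names rather than anything from the book or from NLTK (which ships its own `NaiveBayesClassifier`).

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count word occurrences per label from (words, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for words, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(words)
    return word_counts, label_counts


def classify(words, word_counts, label_counts):
    """Pick the label with the highest log-probability under naive Bayes."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one (Laplace) smoothing so unseen words don't zero the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, purely for illustration.
TRAINING = [
    ("great practical examples".split(), "pos"),
    ("clear and well explained".split(), "pos"),
    ("confusing and badly organized".split(), "neg"),
    ("dull confusing examples".split(), "neg"),
]

model = train(TRAINING)
print(classify("clear practical examples".split(), *model))
```

The whole model is just two tables of counts; classification multiplies per-word probabilities (as a sum of logs) for each label and keeps the best one.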


My name’s Daniel Lee. I’m an enthusiast for open source and sharing. I grew up in the United States and did my doctorate in Germany. I've founded a company for planning solar power. I've worked on analog space suit interfaces, drones and a bunch of other things in my free time. I'm also involved in standards work for meteorological data. I worked for a while at the German Weather Service on improving forecasts for weather and renewable power production. I later led the team for data ingest there before I started my current job, engineering software and data formats at EUMETSAT.
