What is Natural Language Processing?

What it is and why use it.

Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, like speech and text, by software.

The study of natural language processing has been around for more than 50 years and grew out of the field of linguistics with the rise of computers.

In this post, you will discover what natural language processing is and why it is so important.

After reading this post, you will know:

  • What natural language is and how it is different from other types of data.
  • What makes working with natural language so challenging.
  • Where the field of NLP came from and how it is defined by modern practitioners.

Natural Language

Natural language refers to the way we humans communicate with each other, namely through speech and text.

We are surrounded by text.

Think about how much text you see each day:

  • Signs
  • Menus
  • Email
  • SMS
  • Web Pages

And so much more. The list is endless.

Now think about speech.

We may speak to each other, as a species, more than we write. It may even be easier to learn to speak than to write.

Voice and text are how we communicate with each other.

Given the importance of this type of data, we must have methods to understand and reason about natural language, just like we do for other types of data.

Linguistics

Linguistics is the scientific study of language, including its grammar, semantics, and phonetics.

Classical linguistics involved devising and evaluating rules of language. Great progress was made on formal methods for syntax and semantics, but for the most part, the interesting problems in natural language understanding resist clean mathematical formalisms.

Broadly, a linguist is anyone who studies language, but more colloquially, a self-described linguist may be more focused on working out in the field.

Mathematics is the tool of science. Mathematicians working on natural language may refer to their study as mathematical linguistics, focusing exclusively on the use of discrete mathematical formalisms and theory for natural language (e.g. formal languages and automata theory).
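To make "formal languages and automata theory" a little more concrete, here is a minimal sketch of a finite automaton in Python. Everything in it is an assumption made for illustration: the part-of-speech tags (DET, ADJ, NOUN), the state names, and the toy rule that a noun phrase is a determiner, any number of adjectives, then a noun. Real grammars are far richer than this.

```python
# Toy deterministic finite automaton (DFA) that accepts tag sequences of
# the form DET ADJ* NOUN. Tags, states, and the rule itself are
# hypothetical and chosen only for illustration.
TRANSITIONS = {
    ("start", "DET"): "need_noun",
    ("need_noun", "ADJ"): "need_noun",
    ("need_noun", "NOUN"): "done",
}
ACCEPTING = {"done"}


def accepts(tags):
    """Return True if the tag sequence is in the toy formal language."""
    state = "start"
    for tag in tags:
        # Look up the next state; a missing entry means the sequence is rejected.
        state = TRANSITIONS.get((state, tag))
        if state is None:
            return False
    return state in ACCEPTING


print(accepts(["DET", "ADJ", "NOUN"]))   # "the red sign"
print(accepts(["DET", "NOUN", "NOUN"]))  # not covered by the toy rule
```

A hand-written rule like this is precise, but it only covers what its author anticipated, which hints at why much of natural language resists clean formalisms.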

Statistical Natural Language Processing

Computational linguistics also became known by the name natural language processing, or NLP, to reflect the more engineering-based or empirical approach of the statistical methods.

The statistical dominance of the field also often leads to NLP being described as Statistical Natural Language Processing, perhaps to distance it from the classical computational linguistics methods.
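To give a flavor of what "statistical" means here, below is a minimal sketch of a bigram model in Python. It is not taken from any particular textbook or library: the function name `bigram_model`, the three-sentence corpus, and the whitespace tokenization are all assumptions made for illustration. Instead of writing rules by hand, it estimates from counts how likely one word is to follow another.

```python
from collections import Counter, defaultdict


def bigram_model(sentences):
    """Estimate P(next_word | word) from raw counts (maximum likelihood)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    # Normalize each row of counts into a probability distribution.
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: n / total for word, n in followers.items()}
    return model


# Tiny made-up corpus, for illustration only.
corpus = [
    "we read the menu",
    "we read the sign",
    "we send the email",
]
model = bigram_model(corpus)
print(model["we"])
print(model["the"])
```

With this toy corpus, "read" follows "we" in two of three sentences, so the model assigns it a higher probability than "send". The estimates come from the data rather than from hand-written rules, which is the empirical approach described above.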

Natural Language Processing

As machine learning practitioners interested in working with text data, we are concerned with the tools and methods from the field of Natural Language Processing.

We have seen the path from linguistics to NLP in the previous section. Now, let’s take a look at how modern researchers and practitioners define what NLP is all about.

In perhaps one of the more widely used textbooks, written by top researchers in the field, the authors refer to the subject as “linguistic science,” permitting discussion of both classical linguistics and modern statistical methods.

With the points above in mind, I hope you can see a bit more clearly why natural language processing is so important.

I began a small project integrating NLP with Ruby using this great resource: http://rubynlp.org/.

Recent software engineering graduate who enjoys exploring the intersection between business and code.