Difficulty: beginner
Estimated Time: 20 minutes

Natural Language Processing (NLP) covers a range of tasks, including:

  • Part of speech tagging
  • Word segmentation
  • Named entity recognition
  • Machine translation
  • Question answering
  • Sentiment analysis
  • Topic segmentation and recognition
  • Natural language generation

All of these tasks start with preparing text for further processing. In this lab you will learn how to use n-grams for text classification.

Introduction to n-grams

Step 1 of 4


To start working with Python, use the following command:


Instead of using single words as features when building the vocabulary, one can use n-grams: sequences of n consecutive tokens. Let's start simple and create bigrams (n = 2) for the following list of words:

sentence_tokens = ['This', 'is', 'where', 'all', 'the', 'people', 'are', 'going', 'on', 'Friday']

To create bigrams, use zip to walk through two copies of the list that start at different positions: the list itself and the list shifted by one element.

def bigrams(tokens):
    # Pair each token with its successor; zip stops at the shorter list.
    return zip(tokens, tokens[1:])

Let's have a look:
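A minimal sketch of that check, assuming Python 3, where zip returns a lazy iterator and therefore has to be wrapped in list before printing. The variable name bigram_list is introduced here only for illustration.

```python
sentence_tokens = ['This', 'is', 'where', 'all', 'the', 'people', 'are', 'going', 'on', 'Friday']

def bigrams(tokens):
    # Pair each token with the one that follows it.
    return zip(tokens, tokens[1:])

# In Python 3, zip is lazy, so materialize it before printing.
bigram_list = list(bigrams(sentence_tokens))
print(bigram_list)
```

Ten tokens yield nine bigrams, starting with ('This', 'is') and ending with ('on', 'Friday'), because the shifted list is one element shorter and zip stops when the shorter input runs out.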


Task: Create Trigrams function

Create a function for trigram generation. Use the zip function to walk through three lists with different starting points.

def trigrams(tokens):
    pass  # TODO: zip tokens with two shifted copies of itself

Once done, print out the trigrams:
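One possible solution, offered as a sketch rather than the lab's reference answer. It assumes Python 3 and the same sentence_tokens list used for the bigrams. The ngrams helper and the trigram_list variable are hypothetical names, added only to show how the same zip idea might generalize to any n.

```python
sentence_tokens = ['This', 'is', 'where', 'all', 'the', 'people', 'are', 'going', 'on', 'Friday']

def trigrams(tokens):
    # Zip the list with copies shifted by one and by two positions.
    return zip(tokens, tokens[1:], tokens[2:])

def ngrams(tokens, n):
    # Hypothetical generalization: zip n copies, the i-th shifted by i.
    return zip(*(tokens[i:] for i in range(n)))

# zip is lazy in Python 3, so convert to a list before printing.
trigram_list = list(trigrams(sentence_tokens))
print(trigram_list)
```

With ten tokens this produces eight trigrams, from ('This', 'is', 'where') to ('going', 'on', 'Friday'). Calling ngrams(sentence_tokens, 3) should give the same sequence.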