Natural Language Processing covers a range of tasks, such as:
- Part of speech tagging
- Word segmentation
- Named entity recognition
- Machine translation
- Question answering
- Sentiment analysis
- Topic segmentation and recognition
- Natural language generation
Another such task is classifying text based on its content. In this scenario you will learn how to use the Bag of Words and tf-idf models to perform this task.
A text classification task starts by providing a training set of documents and their categories (labels) to a machine learning algorithm. Once the model is trained, it can be used to categorize new examples.
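To make the train-then-categorize workflow concrete, here is a minimal sketch in plain Python. The classifier is a deliberately naive stand-in (it scores each label by how often the input's words appeared under that label during training), not the model used later in this scenario. The function names `train` and `predict` and the sample documents are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train(documents, labels):
    """Count how often each word appears under each label (toy model)."""
    word_counts = defaultdict(Counter)
    for text, label in zip(documents, labels):
        word_counts[label].update(text.lower().split())
    return word_counts

def predict(model, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical training set: documents paired with their categories
train_docs = [
    "great movie loved it",
    "terrible plot hated it",
    "loved the acting",
    "hated the ending",
]
train_labels = ["positive", "negative", "positive", "negative"]

model = train(train_docs, train_labels)

# Categorize a new, unseen example
prediction = predict(model, "loved this movie")
print(prediction)
```

A real classifier would replace the word-overlap score with a learned model, but the shape of the workflow stays the same: fit on labeled documents, then predict labels for new ones.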
Representing text adds some complexity when formulating a machine learning problem. Usually a dataset takes the form of rows, one per data point, with each row described by a set of features.
In our case every document is a data point and its label is a category, but what would the features be?
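One possible answer, following the Bag of Words idea mentioned above, is to treat every distinct word in the corpus as a feature and use its count in a document as the feature value. The sketch below assumes simple whitespace tokenization and two made-up documents; the helper name `bag_of_words` is hypothetical.

```python
from collections import Counter

# Hypothetical two-document corpus
documents = ["the cat sat on the mat", "the dog chased the cat"]

# Vocabulary: every distinct word, sorted so the column order is stable
vocabulary = sorted({word for doc in documents for word in doc.split()})

def bag_of_words(doc, vocabulary):
    """Return the count of each vocabulary word in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

# One row per document, one column per vocabulary word
features = [bag_of_words(doc, vocabulary) for doc in documents]

print(vocabulary)
for row in features:
    print(row)
```

Each document becomes a fixed-length row of numbers, which is the tabular form most machine learning algorithms expect. Word order is discarded, which is where the "bag" in the name comes from.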