DATA 410 - Natural Language Processing

Upper Division

Prerequisites
DATA 305, DATA 320; minimum grade C-.

This course is a practical introduction to the most widely used techniques, strategies, and toolkits for natural language processing (NLP). Text classification is one of the most common NLP tasks in practice, with applications such as news categorization, spam filtering, and sentiment analysis. Students will learn how to go from raw text to predicted classes using both traditional methods (e.g., linear classifiers) and deep learning techniques (e.g., convolutional neural networks). Students will also learn to treat text as a sequence of words and to predict the next words given the preceding ones, a task known in NLP as language modeling, which is used for search suggestions, machine translation, chatbots, and so on. Finally, students will learn about vectors that represent meaning, using modern tools for word and sentence embeddings such as word2vec, and will discuss how to embed whole documents with topic models.

Repeatable
No

Additional Notes
Previous course number: DATA 152

Course credits: 4