Python+Machine Learning tutorial – Working with textual data
Machine learning on text data is useful for social network analytics, for instance to perform sentiment analysis. Extracting a “machine learnable” representation from raw text is an art in itself. In this session we will introduce the bag-of-words representation and its implementation in scikit-learn via its text vectorizers. We will discuss preprocessing with NLTK, n-gram extraction, TF-IDF weighting and the use of SciPy sparse matrices. Finally, we will use that data to train and evaluate a Naive Bayes classifier and a linear Support Vector Machine.
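As a rough illustration of the pipeline outlined in the abstract, here is a minimal sketch, assuming scikit-learn and SciPy are installed. The toy corpus, labels and variable names are invented for this example and are not taken from the session; NLTK preprocessing is left out to keep the sketch short.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Toy sentiment corpus, made up for illustration only.
train_texts = [
    "I love this movie, it is great",
    "what a great and wonderful film",
    "I hate this movie, it is awful",
    "what an awful and boring film",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_texts = ["a great film", "an awful movie"]
test_labels = ["pos", "neg"]

# Bag of words over unigrams and bigrams, reweighted with TF-IDF.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary, then vectorize
X_test = vectorizer.transform(test_texts)        # reuse the learned vocabulary

# The vectorizer returns a SciPy sparse matrix: most n-grams are absent
# from any given document, so only non-zero entries are stored.
print("sparse:", issparse(X_train), "shape:", X_train.shape)

# Train and evaluate the two classifiers named in the abstract.
predictions = {}
for model in (MultinomialNB(), LinearSVC()):
    model.fit(X_train, train_labels)
    predicted = list(model.predict(X_test))
    predictions[type(model).__name__] = predicted
    print(type(model).__name__, predicted, accuracy_score(test_labels, predicted))
```

With a corpus this small the accuracy figures mean little; on real data one would typically hold out a larger test set or use cross-validation.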
Speaker Details
Olivier Grisel is a software engineer in the Parietal team at Inria. He works to improve the speed and scalability of the scikit-learn machine learning library for the Python / NumPy / SciPy ecosystem. He also likes to share interesting machine learning papers and tricks on Twitter: @ogrisel
- Date:
- Speakers: Olivier Grisel
- Affiliation: INRIA