Python+Machine Learning tutorial – Working with textual data

Machine learning on text data is useful for social network analytics, for instance to perform sentiment analysis. Extracting a “machine learnable” representation from raw text is an art in itself. In this session we will introduce the bag-of-words representation and its implementation in scikit-learn via its text vectorizers. We will discuss preprocessing with NLTK, n-gram extraction, TF-IDF weighting and the use of SciPy sparse matrices. Finally, we will use that data to train and evaluate a Naive Bayes classifier and a linear Support Vector Machine.
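
As a rough illustration of the pipeline described above, the following minimal sketch turns a few sentences into a TF-IDF weighted bag of words (unigrams and bigrams) stored as a SciPy sparse matrix, then fits a Naive Bayes classifier and a linear SVM on it. The toy sentences, labels and variable names are made up for this example and are not taken from the session material; NLTK preprocessing is left out, and a real evaluation would need a held-out test set.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: 1 = positive sentiment, 0 = negative sentiment.
docs = [
    "I love this movie, it is great",
    "What a wonderful, great experience",
    "I hate this movie, it is terrible",
    "What an awful, terrible experience",
]
labels = [1, 1, 0, 0]

# Bag of words with unigrams and bigrams, weighted by TF-IDF.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)  # SciPy sparse matrix, one row per document

print("sparse:", issparse(X), "shape:", X.shape)

# Train the two classifiers mentioned in the abstract on the same features.
nb = MultinomialNB().fit(X, labels)
svm = LinearSVC().fit(X, labels)

# Vectorize unseen text with the vocabulary learned above, then predict.
new_docs = ["a great and wonderful film", "a terrible and awful film"]
X_new = vectorizer.transform(new_docs)
nb_pred = nb.predict(X_new)
svm_pred = svm.predict(X_new)
print("Naive Bayes:", list(nb_pred))
print("Linear SVM:", list(svm_pred))
```

Words that never appeared in the training sentences (here "and" and "film") are simply ignored by `transform`, which is why the vectorizer must be fitted once and reused for new text.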

Speaker Details

Olivier Grisel is a software engineer in the Parietal team at Inria. He works on improving the speed and scalability of the scikit-learn machine learning library for the Python / NumPy / SciPy ecosystem. He also likes to share interesting machine learning papers and tricks on Twitter: @ogrisel

Date:
Speakers:
Olivier Grisel
Affiliation:
INRIA