Python+Machine Learning tutorial – Data munging for predictive modeling with pandas and scikit-learn

Building predictive models first requires shaping the data into a format that meets the mathematical assumptions of machine learning algorithms. In this session we will introduce the pandas DataFrame, a data structure for munging heterogeneous data into a representation suitable for most scikit-learn models. In particular, we will address problems such as imputing missing values and encoding categorical variables. We will illustrate these concepts by combining pandas-based feature engineering with scikit-learn's Logistic Regression, Random Forests and Gradient Boosted Trees.
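As a rough illustration of the kind of workflow the session covers, the sketch below imputes a missing numeric value, one-hot encodes a categorical column with pandas, and fits a scikit-learn Logistic Regression. The toy data and the column names (age, embarked, survived) are made up for this example and are not taken from the tutorial material; median imputation is just one possible strategy.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data, invented for illustration: one numeric and one categorical
# feature, each containing a missing value, plus a binary target.
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    "embarked": ["S", "C", "S", None, "Q", "S"],
    "survived": [0, 1, 1, 1, 0, 1],
})

features = df[["age", "embarked"]].copy()

# Impute the missing numeric value with the column median.
features["age"] = features["age"].fillna(features["age"].median())

# One-hot encode the categorical column. dummy_na=True adds an extra
# indicator column so that "missing" is kept as its own category.
features = pd.get_dummies(features, columns=["embarked"], dummy_na=True)

# scikit-learn models expect a purely numeric 2D array.
X = features.to_numpy(dtype=float)
y = df["survived"].to_numpy()

model = LogisticRegression().fit(X, y)
predictions = model.predict(X)
```

The same X and y arrays could be passed to a tree-based estimator such as RandomForestClassifier or GradientBoostingClassifier instead; in a real project the model would be evaluated on held-out data rather than on the rows it was fitted on.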

Speaker Details

Olivier Grisel is a software engineer in the Parietal team of Inria. He works to improve the speed and scalability of the scikit-learn machine learning library for the Python / NumPy / SciPy ecosystem. He also likes to share interesting Machine Learning papers and tricks on Twitter: @ogrisel

Date:
Speakers:
Olivier Grisel
Affiliation:
French Institute for Research in Computer Science and Automation (INRIA)