Two views on regression with PyMC3 and scikit-learn

Colin Carroll

Contents:

  1. Introduction
  2. Installation
  3. Also See
  4. Linear Regression
  5. Probabilistic Programming
  6. Bayesian Linear Regression

Introduction

This is a series of three essays based on my notes from a 2017 PyData NYC tutorial. The first two essays are completely independent and may be used as an introduction to linear regression or to probabilistic programming, respectively. The third builds on both and uses the data set introduced in the first, but it is fine to read on its own if you are already familiar with linear regression and probabilistic programming.

The essays are accompanied by Jupyter notebooks and exercises. You can install the requirements by following the instructions below. All material is hosted on GitHub, and comments may be posted there as issues.

These talks cover a reasonable portion of three undergraduate math courses, but require only a hazy memory of those subjects to follow. We derive linear regression, along with regularization, from the standpoint of

The goal of this talk is to help participants understand the math underlying so much of modern machine learning. I do not expect attendees (or readers) to come away with new models or libraries to try, but I do expect them to be better at tasks like diagnosing problems in the linear parts of their neural networks, explaining why logistic regression will be good (or bad) for a task, and giving colleagues some intuition for regularization.
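For reference, the linear regression and regularization mentioned above can be stated compactly as follows. This is a standard summary in my own notation (X an n x p design matrix, y an n-vector of targets, lambda a penalty strength), not necessarily the notation or derivation used in the essays. The last comment notes the well-known link between the two views in the title: ridge regression is the posterior mode under a Gaussian prior.

```latex
% Ordinary least squares (assumes X has full column rank so X^T X is invertible).
\hat{w}_{\mathrm{OLS}} = \arg\min_{w} \lVert y - Xw \rVert_2^2 = (X^\top X)^{-1} X^\top y

% Ridge (L2-regularized) regression with penalty strength \lambda > 0.
\hat{w}_{\lambda} = \arg\min_{w} \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2
                  = (X^\top X + \lambda I)^{-1} X^\top y

% Bayesian reading: with likelihood y \mid w \sim \mathcal{N}(Xw, \sigma^2 I)
% and prior w \sim \mathcal{N}(0, \tau^2 I), the posterior mode (MAP estimate)
% equals the ridge solution with \lambda = \sigma^2 / \tau^2.
```

The ridge closed form follows from setting the gradient -2X^T(y - Xw) + 2 lambda w to zero.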

Installation

If you wish to follow along with the essays and exercises (which I recommend), the easiest way to install the requirements is with conda, which is fastest to get via Miniconda. Installing with pip and the supplied requirements.txt file should also work.

  1. Clone the repository from https://github.com/ColCarroll/pydata_nyc2017:
    git clone https://github.com/ColCarroll/pydata_nyc2017.git
  2. Navigate to the folder:
    cd pydata_nyc2017
  3. Create the conda environment:
    conda env create -f environment.yml
  4. Activate the environment with one of:
    conda activate pydata_nyc20173.6  # conda 4.4 and later
    source activate pydata_nyc20173.6  # older conda, OSX/Linux
    activate pydata_nyc20173.6  # older conda, Windows
  5. Start the Jupyter notebook server:
    jupyter notebook
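Once the environment is activated, a quick way to confirm that it resolved correctly is to check that the core libraries can be found. The sketch below is one way to do this; the module list (numpy, sklearn, pymc3, notebook) is an assumption based on the libraries the essays name, and the repository's environment.yml remains the authoritative list. It uses importlib.util.find_spec, which locates a top-level module without importing it.

```python
"""Sanity check that the core libraries for the notebooks are importable.

Run from inside the activated environment. REQUIRED_MODULES is an assumption
based on the libraries the essays use; environment.yml is authoritative.
"""
import importlib.util

# Import names (not necessarily conda/pip package names) assumed to be needed:
# scikit-learn is imported as `sklearn`, the classic notebook server as `notebook`.
REQUIRED_MODULES = ["numpy", "sklearn", "pymc3", "notebook"]


def missing_packages(module_names):
    """Return the top-level module names that cannot be found."""
    missing = []
    for name in module_names:
        # For a top-level name, find_spec returns None if the module is not
        # installed, and does not execute the module's code.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    not_found = missing_packages(REQUIRED_MODULES)
    if not_found:
        print("Missing:", ", ".join(not_found))
    else:
        print("All required modules found.")
```

Saved under a name of your choosing (say, the hypothetical check_env.py), it can be run with `python check_env.py`; anything reported as missing points to an environment that did not install cleanly.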

Also See

There were a number of other talks and workshops at PyData NYC 2017 that covered similar Bayesian approaches. To mention a few of the talks, along with links to their videos (to be added once the videos are posted):

There were also three other workshops that, like this one, were not videotaped.