Two views on regression with PyMC3 and scikit-learn

Introduction

This is a series of three essays based on my notes from a 2017 PyData NYC tutorial. The first two essays are completely independent and may be used as an introduction to linear regression or probabilistic programming, respectively. The third builds on the first two and uses the data set introduced in the first, but it is fine to read on its own if you are already familiar with linear regression and probabilistic programming.

The essays go along with Jupyter notebooks and exercises. You can install the requirements by following the instructions below. All material is hosted on GitHub, and comments may be posted there as issues.

These talks cover a reasonable portion of three undergraduate courses in math, but require only a hazy memory of the subjects to follow. We derive linear regression, along with regularization, from the standpoint of

calculus, as the minimizer of a cost function,
linear algebra, as a projection onto a subspace, and
statistics, as the maximum a posteriori estimate.
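
As a rough illustration of how these three views line up, here is a minimal NumPy sketch. It is not taken from the notebooks: the synthetic data, the variable names, and the regularization strength `alpha` are all assumptions made for the example.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the essays' data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

# Calculus: setting the gradient of ||y - Xw||^2 to zero gives the
# normal equations X^T X w = X^T y.
w_calculus = np.linalg.solve(X.T @ X, X.T @ y)

# Linear algebra: X w is the orthogonal projection of y onto the column
# space of X, so the residual is orthogonal to every column of X.
w_projection, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ w_projection

# Statistics: with Gaussian noise and a zero-mean Gaussian prior on w,
# the posterior mode is the ridge solution. Here alpha stands for the
# ratio of noise variance to prior variance (value chosen arbitrarily).
alpha = 1.0
w_map = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print("normal equations:", w_calculus)
print("projection:      ", w_projection)
print("MAP / ridge:     ", w_map)
```

The first two estimates agree up to floating point error, while the regularized one is pulled toward zero, which is the effect of the prior.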

The goal of this talk is to help participants understand the math underlying so much of modern machine learning. I do not expect that attendees (or readers) will have new models or libraries to try, but I do expect that they will be better at tasks like diagnosing problems in the linear parts of their neural networks, explaining why logistic regression will be good (or bad) for a task, and giving colleagues some intuition for regularization.

Installation

If you wish to follow along in the essays and exercises (which I would recommend), the easiest way to install the requirements is with conda, which is quickest to get via Miniconda. Installation should also work via pip and the supplied requirements.txt file.

Clone the repository from https://github.com/ColCarroll/pydata_nyc2017:
```
git clone https://github.com/ColCarroll/pydata_nyc2017.git
```
Navigate to the folder:
```
cd pydata_nyc2017
```
Create the conda environment:
```
conda env create -f environment.yml 
```

Activate the environment with one of:

conda activate pydata_nyc20173.6  # new conda

source activate pydata_nyc20173.6  # OSX/Linux

activate pydata_nyc20173.6  # Windows

Start the Jupyter notebook server:
```
jupyter notebook
```
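
Once the server is running, a quick way to confirm that the environment is in place is to check that the main libraries can be found. This is only a sketch: the package list below is an assumption inferred from the title, so adjust it to match environment.yml.

```python
from importlib.util import find_spec

# Assumed package list (inferred from the title, not read from environment.yml).
packages = ["numpy", "sklearn", "pymc3"]

# find_spec returns None for a top-level package that is not installed.
status = {name: find_spec(name) is not None for name in packages}

for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If anything is reported as missing, the most likely cause is that the conda environment was not activated before starting the server.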

Also See

There were a number of other talks and workshops at PyData NYC 2017 covering similar Bayesian approaches. To mention a few talks, along with links to their videos (coming soon! placeholder for now):

Understanding NBA Foul Calls with Python, Austin Rochford
Diamond: mixed-effects models in Python, Timothy Sweetser
Bayesian inference in computational chemistry, Chaya D. Stern
Keynote, Andrew Gelman
Turning PyMC3 into scikit-learn, Nicole Carlson
An Attempt At Demystifying Bayesian Deep Learning, Eric J. Ma

There were also three other workshops which (like this one) were not videotaped.

pomegranate: fast and flexible probabilistic modeling in python, Jacob Schreiber
Bayesian Statistics from Scratch: Building up to MCMC, Justin Bozonier
Stan: Bayesian Modeling and Inference Made Easy, Bob Carpenter
