- Introduction
- Installation
- Also See
- Linear Regression
- Probabilistic Programming
- Bayesian Linear Regression

This is a series of three essays, based on my notes from a 2017 PyData NYC tutorial. The first two essays are completely independent and may be used as an introduction to linear regression or to probabilistic programming, respectively. The third builds on those two and uses the data set introduced in the first, but it is fine to read on its own if you are already familiar with linear regression and probabilistic programming.

The essays go along with Jupyter notebooks and exercises. You can install the requirements by following the instructions below. All material is hosted on GitHub, and comments may be posted there as issues.

These talks cover a reasonable portion of three undergraduate math courses, but require only a hazy memory of those subjects to follow. We derive linear regression, along with regularization, from the standpoint of

- calculus, as the minimizer of a cost function,
- linear algebra, as a projection onto a subspace, and
- statistics, as a maximum likelihood or maximum a posteriori (MAP) estimate.
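As a minimal sketch (not code from the tutorial's notebooks), the three viewpoints can be checked numerically with NumPy. The synthetic data, the noise variance `sigma2`, and the prior variance `tau2` are illustrative assumptions. The calculus and linear algebra routes give the same ordinary least squares weights. The statistical route, assuming Gaussian noise and a zero-mean Gaussian prior on the weights, gives the ridge-regularized weights. Here the MAP estimate is found by plain gradient descent on the negative log posterior, so that it is not simply the ridge formula restated.

```python
import numpy as np

# Synthetic data (an assumption for illustration; not the tutorial's data set).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=n)

# 1. Calculus: setting the gradient of the cost ||Xw - y||^2 to zero
#    gives the normal equations X^T X w = X^T y.
w_calculus = np.linalg.solve(X.T @ X, X.T @ y)

# 2. Linear algebra: Xw is the orthogonal projection of y onto the column
#    space of X; np.linalg.lstsq returns that least-squares solution.
w_projection, *_ = np.linalg.lstsq(X, y, rcond=None)

# The residual of an orthogonal projection is orthogonal to every column of X.
residual = y - X @ w_projection

# 3. Statistics: with likelihood y ~ N(Xw, sigma2 * I) and prior w ~ N(0, tau2 * I),
#    the negative log posterior is (up to a constant)
#    ||y - Xw||^2 / (2 * sigma2) + ||w||^2 / (2 * tau2).
sigma2, tau2 = 0.01, 1.0  # hypothetical noise and prior variances


def neg_log_posterior_grad(w):
    """Gradient of the negative log posterior with respect to w."""
    return X.T @ (X @ w - y) / sigma2 + w / tau2


# Gradient descent with step 1/L, where L is the largest Hessian eigenvalue.
hessian = X.T @ X / sigma2 + np.eye(p) / tau2
step = 1.0 / np.linalg.eigvalsh(hessian).max()
w_gd = np.zeros(p)
for _ in range(2000):
    w_gd -= step * neg_log_posterior_grad(w_gd)

# Multiplying the objective by 2 * sigma2 shows it is ridge regression
# with penalty lam = sigma2 / tau2, which has a closed-form solution.
lam = sigma2 / tau2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("normal equations:", w_calculus)
print("projection:      ", w_projection)
print("MAP (grad desc): ", w_gd)
print("MAP (ridge):     ", w_map)
```

Dropping the prior (letting `tau2` grow without bound) sends `lam` to zero, and the MAP estimate reduces to the maximum likelihood estimate, which is the ordinary least squares solution from the first two routes.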

The goal of this talk is to help participants understand the math underlying so much of modern machine learning. I do not expect attendees (or readers) to come away with new models or libraries to try, but I do expect that they will be better at tasks like diagnosing problems in the linear parts of their neural networks, explaining why logistic regression will be good (or bad) for a task, and giving colleagues some intuition for regularization.

If you wish to follow along with the essays and exercises (which I recommend), the easiest way to install the requirements is with conda, which is fastest to get via Miniconda. Installation should also work with pip and the supplied requirements.txt file.

- Clone the repository from https://github.com/ColCarroll/pydata_nyc2017:
git clone https://github.com/ColCarroll/pydata_nyc2017.git

- Navigate to the folder:
cd pydata_nyc2017

- Create the conda environment:
conda env create -f environment.yml

- Activate the environment with the command that matches your conda version and platform:
conda activate pydata_nyc20173.6 # conda 4.4 and newer

source activate pydata_nyc20173.6 # older conda, OSX/Linux

activate pydata_nyc20173.6 # older conda, Windows

- Start the Jupyter notebook server:
jupyter notebook

There were a number of other talks and workshops at PyData NYC 2017 covering similar Bayesian approaches. A few of the talks are listed below; links to their videos will be added once they are available.

- Understanding NBA Foul Calls with Python, by Austin Rochford
- Diamond: mixed-effects models in Python, by Timothy Sweetser
- Bayesian inference in computational chemistry, by Chaya D. Stern
- Keynote, by Andrew Gelman
- Turning PyMC3 into scikit-learn, by Nicole Carlson
- An Attempt At Demystifying Bayesian Deep Learning, by Eric J. Ma

There were also three other workshops that, like this one, were not videotaped:

- pomegranate: fast and flexible probabilistic modeling in Python, by Jacob Schreiber
- Bayesian Statistics from Scratch: Building up to MCMC, by Justin Bozonier
- Stan: Bayesian Modeling and Inference Made Easy, by Bob Carpenter