Finding and Using Data

Dr. Colin Carroll, Kensho

hbr.org

iquantny.tumblr.com

Finding Data


R> # Catalogue the datasets shipped with every installed package
R> all.datasets <- data.frame(
  data(package = .packages(all.available = TRUE))$results
)

R> all.datasets[, c("Item", "Package", "Title")]
      Item   Package                   Title
      Nile  datasets  Flow of the River Nile
    Orange  datasets  Growth of Orange Trees
 txhousing   ggplot2     Housing sales in TX
      acme      boot  Monthly Excess Returns
Earthquake      nlme    Earthquake Intensity
      Meat      nlme      Tenderness of meat
          
  • Body Temperature Series of Two Beavers
  • Snail Mortality Data
  • Urine Analysis Data
  • Yearly Numbers of Important Discoveries
  • Distances Between European Cities and Between US Cities
  • The World's Telephones
  • Student Admissions at UC Berkeley
  • Number of Flaws in Cloth
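
Any dataset in that catalogue can be loaded by name. The following is a minimal sketch, assuming only base R and its bundled datasets package. The variable names `catalogue` and `hits` are illustrative and not part of the original slides.

```r
# Load a built-in dataset by name; Nile ships with base R's datasets package.
data(Nile)
class(Nile)     # a time series ("ts") object
summary(Nile)

# Build a small catalogue for a single package and search its titles.
# data()$results is a character matrix with the columns
# Package, LibPath, Item and Title.
catalogue <- as.data.frame(
  data(package = "datasets")$results,
  stringsAsFactors = FALSE
)

# Case-insensitive keyword search over the dataset titles.
hits <- catalogue[grepl("beaver", catalogue$Title, ignore.case = TRUE),
                  c("Item", "Title")]
print(hits)
```

Restricting the search to one package keeps the example fast. Passing `.packages(all.available = TRUE)` instead, as in the snippet above, widens it to everything installed.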

kaggle.com/datasets

  • Hillary Clinton's emails
  • World Development Indicators from the World Bank
  • San Francisco Salaries
  • World University Rankings
  • Ocean Ship Logbooks

City of Boston

data.cityofboston.gov

  • Approved Building Permits
  • Crime Incident Reports
  • Property Assessment 2014
  • Employee Earnings Reports 2014
  • Closed Pothole Cases

github.com/fivethirtyeight/data

  • Objects in Bob Ross paintings
  • Tattoos in the NBA
  • Popularity of the Oxford Comma
  • Age of members of Congress
  • Life expectancy of Avengers
  • Cursing in Quentin Tarantino Movies
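
Because these datasets live in a public GitHub repository as CSV files, they can usually be read straight into R. The sketch below is hedged: the helper `fivethirtyeight_url` is hypothetical, and the branch name `master` and the file path `bob-ross/elements-by-episode.csv` are assumptions about the repository layout that should be checked against the repository before use.

```r
# Hypothetical helper: build the raw-file URL for a CSV in the
# fivethirtyeight/data repository. The branch name is an assumption.
fivethirtyeight_url <- function(path, branch = "master") {
  paste0(
    "https://raw.githubusercontent.com/fivethirtyeight/data/",
    branch, "/", path
  )
}

# Assumed location of the Bob Ross data; verify it in the repository.
bob_ross_url <- fivethirtyeight_url("bob-ross/elements-by-episode.csv")
print(bob_ross_url)

# Reading the file requires network access, so it is left commented out:
# bob_ross <- read.csv(bob_ross_url, stringsAsFactors = FALSE)
# str(bob_ross)
```

The same `read.csv` call works for a file downloaded by hand from any of the portals listed above; only the path changes.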

Using Data!

Demo

Jupyter Notebook

Enter the Hadleyverse

Demo

Jupyter Notebook