Deep Visual Inference: Teaching Computers To See Rather Than Calculate Correlation

Giora Simchoni
July 31st, 2019

JSM 2019

This RPres/html is available at Github or at:

Who am I

  • Graduated with an MSc in Statistics from TAU in 2010
  • Data Scientist (otherwise they won't hire me), subspecies Statistician
  • 888, eBay, IBM, vFunction
  • Blogger: Sex, Drugs and Data
  • R/Python enthusiast: GitHub

Line 'Em Up!

Does your plot contain a signal over noise?

  • The key to understanding Visual Inference:
  • A plot is a statistic
  • Permute your data a few times, gather a few plots
  • Judge your plot vs. the distribution of plots or run a survey
  • Assumption-free, Parameter-free
  • But how to present a distribution of plots?
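The permutation step above can be sketched in a few lines. This is a minimal illustration and not the code behind the talk: the function name `make_lineup`, the default of 20 panels, and the toy data are all assumptions made for the example. It builds the data for a lineup by shuffling y to break any relation with x, then hides the real data at a random position among the permuted copies.

```python
import random

def make_lineup(x, y, n_panels=20, seed=None):
    """Return (panels, true_pos): n_panels datasets, exactly one of them real.

    Hypothetical helper for illustration; each panel is an (x, y) pair of lists.
    """
    rng = random.Random(seed)
    panels = []
    for _ in range(n_panels - 1):
        y_perm = list(y)
        rng.shuffle(y_perm)  # permuting y breaks any x-y relation
        panels.append((list(x), y_perm))
    true_pos = rng.randrange(n_panels)  # hide the real data at a random slot
    panels.insert(true_pos, (list(x), list(y)))
    return panels, true_pos

# Toy data, made up for this example: y depends on x plus noise
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(32)]
y = [2 * xi + data_rng.gauss(0, 1) for xi in x]

panels, true_pos = make_lineup(x, y, n_panels=20, seed=1)
```

Each panel would then be drawn with the same plotting code and shown side by side. If there is no relation in the data, a viewer picks the real panel with probability 1/n_panels, so 1/20 = 0.05 for a lineup of 20.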

Like so...

Is there a relation between a car's Engine Displacement and its Horse Power? (n = 32)

[Figure: lineup of plots, Engine Displacement vs. Horse Power (chunk Lineup-example1)]

And so...

Is there a relation between Gender and Answer? (n = 843)

Q: Is it rude to bring a baby on a plane?

[Figure: lineup of plots, Gender vs. Answer (chunk Lineup-example2)]

And so...

Is there a relation between an actor's gender and the number of roles since his/her character ended? (n = 129)

[Figure: lineup of plots, actor's gender vs. number of roles (chunk Lineup-example3)]

Wait, did you just say “judge a plot”, as in “classify an image”, and the year is >= 2012?

Deep Learning can't solve all your problems

  • But it sure is good at Computer Vision
  • My idea: give a neural network thousands of scatter plots (mosaic plots, swarm plots)
  • Of varying linear correlation (Cramér's V, t statistic)
  • Train it to predict correlation (not calculate!)
  • If it's good (low MSE), show it the lineup
  • Make it choose the scatter plot with the highest score
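The data side of this idea can be sketched as follows, for the scatter plot case only. Everything here is an assumption made for illustration: the helper names, the 16 x 16 binary grid standing in for a rendered plot image, and the use of the sample Pearson correlation as the training label. The network itself is not shown, and in the lineup step the absolute sample correlation is used as a loudly labeled stand-in for the trained network's score.

```python
import math
import random

def simulate_scatter(rho, n=50, rng=random):
    """Draw n points from a bivariate normal with population correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1 - rho ** 2) * z)
    return xs, ys

def pearson_r(xs, ys):
    """Sample Pearson correlation, used here as the regression label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def rasterize(xs, ys, size=16):
    """Turn a point cloud into a size x size 0/1 grid (crude stand-in for a plot image)."""
    lo_x, hi_x = min(xs), max(xs)
    lo_y, hi_y = min(ys), max(ys)
    grid = [[0] * size for _ in range(size)]
    for a, b in zip(xs, ys):
        col = min(size - 1, int((a - lo_x) / (hi_x - lo_x) * size))
        row = min(size - 1, int((b - lo_y) / (hi_y - lo_y) * size))
        grid[size - 1 - row][col] = 1  # row 0 is the top of the image
    return grid

def make_training_set(n_plots, rng):
    """Pairs of (image, label) with correlations spread over (-1, 1)."""
    data = []
    for _ in range(n_plots):
        xs, ys = simulate_scatter(rng.uniform(-1, 1), rng=rng)
        data.append((rasterize(xs, ys), pearson_r(xs, ys)))
    return data

def pick_from_lineup(panels, score):
    """Index of the panel with the highest score."""
    scores = [score(p) for p in panels]
    return max(range(len(panels)), key=lambda i: scores[i])

rng = random.Random(42)
train = make_training_set(200, rng)
image, label = train[0]

# Toy lineup: 19 null panels plus one strongly correlated panel at position 7
panels = [simulate_scatter(0.0, rng=rng) for _ in range(19)]
panels.insert(7, simulate_scatter(0.9, rng=rng))

# Stand-in scorer: |sample r| on the raw points, NOT a trained network
choice = pick_from_lineup(panels, lambda p: abs(pearson_r(*p)))
```

In a real pipeline the `score` argument would be the trained network's prediction on each panel's image, and the mosaic and swarm plot variants would use Cramér's V and the t statistic as labels instead.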

A Convolutional Network

(Full code in my blog post Book'em Danno! and through References)

If it's good?

Oh, it's quite good

And, it picks the original plot