Ever since RStudio announced version 1.2 of their IDE, in which you can run both R and Python with the help of the reticulate package, my life has changed. In fact, I think it is the most important revolution in an R programmer’s work since the tidyverse and R Markdown revolutions, also brought to you by RStudio. In my experience there’s almost nothing you can do in R and not in Python, and vice versa. However, there are things I absolutely prefer to do in R (e.g. scraping, data wrangling, statistics and visualization) and things I absolutely prefer to do in Python (e.g. working with JSON and dictionaries). Yes, that’s right. I’m coming out as Bilingual!

So, I thought about a nice application to demonstrate this new power bestowed upon me, and I suddenly remembered a field of research which was very popular around 1990-2010¹: Facial Attractiveness. Or, more specifically, finding what makes a face attractive, and the averaging of faces. One work which immediately comes to mind is the one in which the faces of people from different nationalities were averaged to create a sort of facial prototype of each nationality. See example:

I think the work originated in the Face Research Lab in Glasgow University, but I haven’t been able to find the exact paper describing it.

So what does Python have in this field of research that R hasn’t (yet)? The OpenCV library for Computer Vision. R has various CV packages, such as imager and magick which I will use here, and you can implement CV algorithms yourself (e.g. SSD, which I did here). But it seems there is nothing quite like OpenCV in R; specifically, I’m interested here in an implementation of the Face Detection algorithm by Viola and Jones². The Viola-Jones detector is fast and was very popular back in the day, before Deep Learning. It may still lie somewhere on your mobile phone, though. I will explain some of it later on.

So the idea is:

  1. Scraping some images containing faces in R
  2. Extracting the faces with Python’s OpenCV
  3. Averaging the faces with R’s magick.

And in order to not die of boredom, I’m going to do this with the covers of American Vogue from the past 70 years :) Here’s the outline in a single image:

Strike a cover (FAILED!)

Well, the idea here was to scrape the web for all American Vogue covers since 1950, and I failed. I couldn’t find any site which contains these images and allows downloading them programmatically. The Google Images strategy I used here was no good either, because it only produces the covers as very small thumbnails. So I’m embarrassed to say I did this manually! But not for 12 * 70 = 840 covers. I downloaded the September issue cover for each and every year, if it had a clear frontal face on it. If not, I looked at other months until I had 10 good covers for each decade since 1950:


get_covers_for_decade <- function(decade) {
  list.files("C:/SDAD_materials/vogue_covers", full.names = TRUE) %>%
    discard(!str_detect(., str_sub(decade, 1, 3)))
}

covers <- tibble(decade = 1900 + 5:11*10) %>%
  mutate(cover_file = map(decade, get_covers_for_decade)) %>%
  unnest(cover_file) %>%
  mutate(cover = map(cover_file, image_read))

collage_covers_for_decade <- function(decade_) {
  images_list <- covers %>%
    filter(decade == decade_) %>%
    mutate(cover = map(cover, image_resize, "120x200!"),
           cover = map(cover, image_border, "black", "3x3")) %>%
    pull(cover)
  image_append(c(image_append(image_join(images_list[1:5])),
                 image_append(image_join(images_list[6:10]))),
               stack = TRUE)
}







Strike a face

And now for the real magic! I’ll load the reticulate package, and activate the Python console:
library(reticulate)
repl_python()

And this is Python code, importing OpenCV and performing Viola-Jones face detection on the cover of the 1952 September issue. Notice I need the haarcascade_frontalface_default.xml file, and I had to specify the full scary path to get it:

import cv2 as cv
cover_file = "C:/SDAD_materials/vogue_covers/1952_09.jpg"
face_cascade = cv.CascadeClassifier(
  "C:/.../haarcascade_frontalface_default.xml")  # full (elided) path to the xml
img = cv.imread(cover_file)
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray)

To access the result we need to quit the Python console and go back to R, where we now have an object named py holding everything from the Python session:

py$faces
##      [,1] [,2] [,3] [,4]
## [1,]  329  400  491  491
## [2,]  867 1423  110  110
## [3,]  467  968   72   72

What did we get? We got 3 faces! Each face is a rectangle inside the cover image, so for each face we have four numbers: the X and Y location of the top-left corner, and the width and height in pixels. We can draw these rectangles on the cover of the 1952 September issue:

get_rect_params <- function(face) {
  list(xleft = face[1],
       ybottom = face[2],
       xright = face[1] + face[3],
       ytop = face[2] + face[4])
}

img <- image_read(py$cover_file) %>% image_draw()
for (i in 1:nrow(py$faces)) {
  rp <- get_rect_params(py$faces[i, ])
  rect(rp$xleft, rp$ybottom, rp$xright, rp$ytop, border = "red", lwd = 5)
}
dev.off()

img %>% image_resize("30%")

Alas, we got two false positive faces. To alleviate this issue I will call the detectMultiScale3() function (instead of detectMultiScale()), which also lets me retrieve a level of certainty for each detected face, sort the faces by it, and choose the single face with the highest certainty. If no faces are detected I will return an empty tuple. Finally I will put all this in a get_face() function, in a get_face.py script:

import cv2 as cv
import numpy as np

def get_face(file):
  face_cascade = cv.CascadeClassifier(
    "C:/.../haarcascade_frontalface_default.xml")  # full (elided) path to the xml
  img = cv.imread(file)
  gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
  faces = face_cascade.detectMultiScale3(gray, outputRejectLevels=True)
  if len(faces[0]) > 0:
    # sort faces by certainty (faces[2]), descending, and keep the single best
    return faces[0][np.flip(np.argsort(np.ndarray.flatten(faces[2])), 0)][0]
  else:
    return faces[0]

Back in R I simply perform source_python() to load the get_face.py script:

source_python("get_face.py")

And now I have a get_face() function in R, ready at my disposal!

cover_file <- "C:/SDAD_materials/vogue_covers/1952_09.jpg"
face <- get_face(cover_file)
face
## [1] 329 400 491 491

The face I got is a simple vector in R, and I can see if it’s the face I want:

img <- image_read(cover_file) %>% image_draw()
rp <- get_rect_params(face)
rect(rp$xleft, rp$ybottom, rp$xright, rp$ytop, border = "red", lwd = 5)
dev.off()
img %>% image_resize("30%")

Cool. Let’s now write a function crop_face() which wraps get_face(), crops just the face out of the cover, and resizes it to equal width and height. If no face was detected it returns NULL, which later plays well with magick::image_join(). Then we’ll load all covers and crop the face out of each one into a nice table:

crop_face <- function(cover_file) {
  face <- get_face(cover_file)
  if (length(face) > 0) {
    xbottom <- face[1]
    ytop <- face[2]
    width <- face[3]
    height <- face[4]
    crop_str <- str_c(width, "x", height, "+", xbottom, "+", ytop)
    cropped_face <- image_read(cover_file) %>%
      image_draw() %>%
      image_crop(crop_str) %>%
      image_resize("200x200!") %>%
      image_modulate(brightness = 120)
    dev.off()
    cropped_face
  } else {
    NULL
  }
}

faces_table <- tibble(decade = 1900 + 5:11*10) %>%
  mutate(cover_file = map(decade, get_covers_for_decade)) %>%
  unnest(cover_file) %>%
  mutate(cover = map(cover_file, image_read),
         face = map(cover_file, crop_face))

## # A tibble: 70 x 4
##    decade cover_file                         cover          face          
##     <dbl> <chr>                              <list>         <list>        
##  1   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
##  2   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
##  3   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
##  4   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
##  5   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
##  6   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
##  7   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
##  8   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
##  9   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
## 10   1950 C:/SDAD_materials/vogue_covers/19~ <S3: magick-i~ <S3: magick-i~
## # ... with 60 more rows

The faces_table has 70 rows: 10 faces for each of 7 decades. Let’s see them all:

collage_faces_for_decade <- function(decade_) {
  images_list <- faces_table %>%
    filter(decade == decade_) %>%
    mutate(face = map(face, image_border, "black", "3x3")) %>%
    pull(face)
  image_append(c(image_append(image_join(images_list[1:5])),
                 image_append(image_join(images_list[6:10]))),
               stack = TRUE)
}








Now it looks really easy, but if you’re following my steps, notice the huge bias in selecting these images in the first place. This, for example, is a beautiful cover featuring Lupita Nyong’o which I couldn’t use:

lupita <- "C:/SDAD_materials/lupita_on_vogue.jpg"

And I couldn’t use it because this was the face get_face() returned:


You might see a face in this pattern, but it’s definitely not Lupita’s. I don’t know if this can be regarded as yet another example of a face detection algorithm which is biased against men and women of non-white ethnicities - this false positive effect also occurred on covers featuring white women, and I did get Lupita eventually from a different cover, as well as Serena Williams, Beyonce, Naomi Campbell and Beverly Johnson. But it is a possibility one always has to consider.

Strike an average

Fortunately, to get a quick average face we have the image_average() function from the wonderful magick package. Let’s see those averages!

get_average_face_for_decade <- function(decade_) {
  faces_table %>%
    filter(decade == decade_) %>%
    pull(face) %>%
    image_join() %>%
    image_average()
}

av_1950 <- get_average_face_for_decade(1950)
av_1960 <- get_average_face_for_decade(1960)
av_1970 <- get_average_face_for_decade(1970)
av_1980 <- get_average_face_for_decade(1980)
av_1990 <- get_average_face_for_decade(1990)
av_2000 <- get_average_face_for_decade(2000)
av_2010 <- get_average_face_for_decade(2010)

present_average_face <- function(av_face) {
  image_composite(image_blank(206, 206, "black"), av_face, offset = "+3+3")
}








How cool is that? All together now:

annotate_decade <- function(av_face, decade) {
  image_append(c(
    present_average_face(av_face) %>% image_resize("50%"),
    image_blank(100, 30, "white")
  ), stack = TRUE) %>%
    image_annotate(str_c(decade, "s"), "South", size = 20,
                   boxcolor = "white") %>%
    image_border("white", "2x3")
}
average_faces <- image_append(c(annotate_decade(av_1950, 1950),
                                annotate_decade(av_1960, 1960),
                                annotate_decade(av_1970, 1970),
                                annotate_decade(av_1980, 1980),
                                annotate_decade(av_1990, 1990),
                                annotate_decade(av_2000, 2000),
                                annotate_decade(av_2010, 2010)))

image_append(c(image_blank(730, 80, "white") %>%
                 image_annotate("Average Faces on the cover of American Vogue by Decade",
                                "Center", size = 25),
               average_faces,
               image_blank(740, 50, "white")), stack = TRUE)

Strike an average with PCA

I made sure that what image_average() actually does is a simple average of the faces, pixel by pixel. I’m no expert on face averaging, but I’m pretty sure that apart from some algorithm to align the faces - so that at least all pairs of eyes sit right on top of each other - there is extensive use of PCA here.
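The pixel-by-pixel average is easy to sketch. Here it is in NumPy - an illustration of what image_average() computes, not magick’s actual code - on two tiny made-up “images”:

```python
import numpy as np

def average_faces(faces):
    """Average equal-shape uint8 image arrays pixel by pixel."""
    # cast to float first so the sum doesn't overflow uint8
    stacked = np.stack([f.astype(np.float64) for f in faces])
    return stacked.mean(axis=0).round().astype(np.uint8)

# two tiny 2x2 single-channel "images"
a = np.array([[0, 100], [200, 50]], dtype=np.uint8)
b = np.array([[100, 100], [0, 150]], dtype=np.uint8)
avg = average_faces([a, b])  # [[50, 100], [100, 100]]
```

The cast to float before averaging matters: summing uint8 arrays directly would wrap around at 255.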

Here I will perform PCA on the 1950s faces, project the faces onto just the first principal component (out of 10 possible), average them and see whether I get a better result. The PCA is done separately for the Red, Green and Blue channels of the image; I then read the channels back as a 3-channel image and plot it. This Cross Validated answer was very helpful.

get_channel_vec <- function(mat, channel_num) {
  matrix(mat[, , channel_num], 1, dim(mat)[1] * dim(mat)[2])
}

faces_50 <- faces_table %>%
  filter(decade == 1950) %>%
  mutate(face_mat = map(face, function(face) image_data(face) %>% as.numeric()),
         red = map(face_mat, get_channel_vec, 1),
         green = map(face_mat, get_channel_vec, 2),
         blue = map(face_mat, get_channel_vec, 3))

get_projected_face_single_channel <- function(channel_str, nPCs = 1) {
  X <- do.call("rbind", faces_50[[channel_str]])
  pca <- prcomp(t(X), scale. = TRUE)
  X_proj <- pca$x[, 1:nPCs] %*% t(pca$rotation[, 1:nPCs])
  X_proj <- scale(X_proj, center = FALSE , scale = 1/pca$scale)
  X_proj <- scale(X_proj, center = -1 * pca$center, scale = FALSE)
  face_proj <- apply(X_proj, 1, mean)
  array(face_proj, c(200, 200, 1))
}

av_face <- abind::abind(
  get_projected_face_single_channel("red"),
  get_projected_face_single_channel("green"),
  get_projected_face_single_channel("blue"), along = 3)

image_append(c(
  image_append(c(present_average_face(av_1950),
                 image_blank(200, 30, "white") %>%
                   image_annotate("Original (or 10 PCs)", "Center", size = 20)),
               stack = TRUE) %>%
    image_border("white", "2x3"),
  image_append(c(image_read(av_face) %>% present_average_face(),
                 image_blank(200, 30, "white") %>%
                   image_annotate("With PCA (1 PC)", "Center", size = 20)),
               stack = TRUE) %>%
    image_border("white", "2x3")
))

To me the result is slightly better, i.e. smoother, cleaner. But it’s not that different, so I’ll move on.
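If the matrix algebra in get_projected_face_single_channel() looks opaque, here is the same project-and-reconstruct idea in plain NumPy, on made-up data. It is only a sketch: it centers the columns but skips the extra scaling that prcomp(scale. = TRUE) performs.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # 100 "pixels" x 10 "faces"

# center each column (face), as prcomp centers its variables
mu = X.mean(axis=0)
Xc = X - mu

# rows of Vt are the principal directions (right singular vectors)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1                           # keep only the first principal component
scores = Xc @ Vt[:k].T          # coordinates of each pixel on the first PC
X_proj = scores @ Vt[:k] + mu   # reconstruction using just that PC

avg_face = X_proj.mean(axis=1)  # average the reconstructed "faces"
```

With k equal to the number of faces (here 10, i.e. all PCs) the reconstruction is exact, which is why the original is the same as the “10 PCs” version.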

Viola who?

So, how does this Viola-Jones thingy work? I will try to explain it at a very high level. Then I will ask the questions you should be asking yourselves, which will hopefully motivate you to read the actual paper, which is surprisingly easy to read.

The Viola-Jones detector is given a square sub-window of the image and starts by looking for the most basic features of any face: the horizontal line where the eyes meet the upper cheeks, and the vertical line which is the bridge of the nose.

From here the detector works like a waterfall, or a cascade (hence the “Cascade” in the OpenCV CascadeClassifier class). If for a given sub-window the detector finds these basic features, it moves on to the next stage of finding more complex features (e.g. the horizontal line of the lips). If not - it discards this sub-window and never looks at it again. The same goes for moving from stage 2 to stage 3, etc. This makes sense because the eyes and nose are the most prominent features of a frontal face, and if you can’t detect (the potential for) eyes and nose anywhere in a sub-window, there’s no point in looking for more subtle features.

There are in total 38 such stages. The first has just 2 features, the second not 2 but 10, and the 3rd and 4th have 25 each. In total, across 38 stages, a single sub-window will go through some 6K facial features before it is declared “a face”! But if it fails at any of the 38 stages, it is discarded. This is OK because in practice we expect very few faces in an image, and a sub-window will go through only 10 features on average, which speeds things up. Finally, those 6K features spread across 38 stages are the contents of the haarcascade_frontalface_default.xml file OpenCV provides us. You can easily read it and work with it in R. With RStudio version 1.2 you don’t have to.
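The cascade control flow itself fits in a few lines. Here is a toy sketch in Python, with made-up stage functions standing in for the real Haar-feature stages (the idea, not OpenCV’s implementation):

```python
def cascade_classify(window, stages):
    """Reject a sub-window at the first stage it fails; accept only if all pass."""
    for stage in stages:
        if not stage(window):
            return False  # discarded, never looked at again
    return True  # survived every stage: declared "a face"

# toy stages: each just thresholds a fake detection 'score'
stages = [lambda w, t=t: w["score"] > t for t in (0.1, 0.3, 0.5)]

cascade_classify({"score": 0.4}, stages)  # fails the third stage -> False
cascade_classify({"score": 0.9}, stages)  # passes all three -> True
```

Since most sub-windows fail the cheap early stages, the expensive later stages run on very few candidates - that is the whole trick.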

But this is just one square sub-window. For a given 600 x 400 image a face could be… anywhere! From a small 24x24 window, anywhere in the image, to a face covering the entire image. That’s a lot of squares. How can the detector work in real time? [hint: Integral Image]
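A teaser for the integral-image hint: precompute a cumulative-sum table once, and the sum over any rectangle - and hence any Haar feature, which is a difference of rectangle sums - costs just four lookups, regardless of the rectangle’s size. A NumPy sketch:

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded so that ii[y, x] = img[:y, :x].sum()."""
    return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in O(1), via four corner lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(16).reshape(4, 4)  # toy 4x4 "image"
ii = integral_image(img)
rect_sum(ii, 1, 1, 2, 2)  # == img[1:3, 1:3].sum() == 30
```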

And what do you mean by “finding a feature”, “finding the eyes”? [hint: Haar Features, the “haar” in haarcascades]

And how were these features decided upon anyway? What if I wanted to detect coffee mugs in images, would I have to create my own features? What is a coffee mug feature? And why does the first stage have 2 features and the second 10, why Giora why?! [hint: Boosting]

In short, RTFM, Queen.

Strike a summary

As I said, the power of switching between R and Python in RStudio with the help of the reticulate package opens up many possibilities. It may also save a great deal of development time, as you may no longer need methods “implemented both in R and in Python”. Finally, I too feel that a good Data Scientist needs to know both data dialects (and you will probably encounter even more dialects, as I have, e.g. Scala/Java and Go). That’s all.

  1. Maybe still is, I actually don’t know

  2. Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. IEEE Conference on Computer Vision and Pattern Recognition.