I hate Emojis. I’m sorry, I do. So I decided to make my own. Automatically.

# Strike a POS!

The idea is to take a given piece of text and automatically replace some words with custom-made emojis, which are basically images. Let’s worry about finding images for our emojis later. For now, suppose you have some text, e.g. a WhatsApp correspondence between you and a friend. Which words will be replaced by emojis, and which won’t?

An emoji would probably replace a noun (e.g. “cat” in “did you remember to feed the cat?”), sometimes an adjective (e.g. “sick” in “I feel sick”). So we’re going to identify nouns! This process is called “Part-Of-Speech Tagging”, or POS tagging for short. In R, there are several ways to do this off the shelf. I’m going to go with the NLP and openNLP libraries. Let’s see how it’s done using a Dumbledore quote and a simple function I modified a bit from here:

library(NLP)
library(openNLP)

dumbledore <- "Happiness can be found even in the darkest of times, if one only remembers to turn on the light."

tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags,
       nouns = s[a3w][POStags %in% c("NN", "NNS", "NNP", "NNPS", "CD")])
}

tagPOS(dumbledore)$POStagged

## [1] "Happiness/NNP can/MD be/VB found/VBN even/RB in/IN the/DT darkest/NN of/IN times/NNS ,/, if/IN one/PRP only/RB remembers/VBZ to/TO turn/VB on/IN the/DT light/NN ./."

We got our sentence back, only now each and every POS is tagged:

- “Happiness” is of tag “NNP” (proper noun, singular)
- “can” is of tag “MD” (modal)
- “be” is of tag “VB” (verb, base form)
- “found” is of tag “VBN” (verb, past participle)
- Etc.

For a full list of POS tag abbreviations and their meanings see here. You’ll notice that tagPOS also returns a vector of all nouns. This is what we’re going to extract from any given text.

# Google me an Image

Where does one go these days to retrieve oneself a fine piece of image? Why, it’s Google Images. Recently I moaned about Google changing their API, so it’s hard to get thousands of images from their search results. But we only need one :)

library(tidyverse)
library(rvest)
library(stringr)

downloadNounImage <- function(noun) {
  url <- paste0("https://www.google.com/search?q=", noun,
                "&tbm=isch&tbo=u&source=univ&sa=X&ved=0ahUKEwipiu-4iOrUAhXFPBQKHVTzAJMQsAQIMw&biw=1920&bih=950")
  fileName <- paste0("nounsImages/", noun, ".jpg")
  read_html(url) %>%
    html_nodes("img") %>%
    html_attr("src") %>%
    discard(!str_detect(., "encrypted")) %>%
    head(1) %>%
    download.file(., fileName, mode = "wb")
  return(fileName)
}

The downloadNounImage function, courtesy of the rvest package, will simply download to disk the first image found on Google Images for a single-word search term, in our case, a noun. These are the first few images we got:

# All Together Now

So let’s write some text, a typical correspondence between my partner and me (did I mention we have three girls?):

text <- "Hi, when are you coming home? Gee, I don't know... I guess I'm leaving at 5. Why? You need anything? Yes, I need you to pick up some diapers, formula for the babies, some fruits, vegetables, juice... OK, do we have enough milk? Yes, but you could buy some chocolate milk.
Are you sure? Last time you said we had enough and I was left without my coffee on a Saturday morning. Harrumph. OK. Oh, and did you happen to pay for kindergarten next year? No... Forgot... Angry face. Will do it tomorrow. Oh, I need you to pick up Eli's sandals from Grandma. She needs them tomorrow for the trip. OK, I'll see you in a bit. Don't take too long. Babies are getting on my nerves. Haha, OK, bye Bitch."

Let’s POS-tag it to extract the nouns and scrape their images:

taggedText <- tagPOS(text)
nounsTable <- tibble(noun = taggedText$nouns) %>%
  mutate(fileName = map_chr(noun, downloadNounImage))

nounsTable
## # A tibble: 30 x 2
##          noun                  fileName
##         <chr>                     <chr>
##  1       home       nounImages/home.jpg
##  2          5          nounImages/5.jpg
##  3   anything   nounImages/anything.jpg
##  4    diapers    nounImages/diapers.jpg
##  5    formula    nounImages/formula.jpg
##  6     babies     nounImages/babies.jpg
##  7     fruits     nounImages/fruits.jpg
##  8 vegetables nounImages/vegetables.jpg
##  9      juice      nounImages/juice.jpg
## 10       milk       nounImages/milk.jpg
## # ... with 20 more rows
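A small aside (my addition, not part of the original pipeline): the same noun can show up several times in a single conversation, and there is no point hitting Google once per occurrence. Here is a sketch of a caching wrapper, assuming the downloadNounImage function and nounsImages/ folder from above; getNounImages is my own hypothetical name:

```r
library(tidyverse)

# build the noun -> image table, downloading each distinct noun at most once
# and skipping nouns whose image is already on disk from a previous run;
# assumes downloadNounImage() from above is defined
getNounImages <- function(nouns) {
  tibble(noun = unique(nouns)) %>%
    mutate(fileName = paste0("nounsImages/", noun, ".jpg")) %>%
    mutate(fileName = map2_chr(noun, fileName,
                               ~ if (file.exists(.y)) .y else downloadNounImage(.x)))
}
```

This keeps the rest of the code unchanged: the table still has one noun and fileName per row, just without duplicate downloads.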

That’s it: we have our nouns and we have our images for emojis; the only thing left is to print this correspondence nicely. I’m using the grid and magick packages here, and looping through the words. If I can’t find a word in my nouns list, I print it as is. If it is in my nouns list, I render its image, fixed-size and with a black border. You can see that I’m struggling with grid. Please, feel free to help me :(

library(magick)
library(grid)

textSplitted <- str_extract_all(text, "[a-zA-Z]+|[:punct:]|[0-9]+|\n|[ ]")[[1]]
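As a side note (this example is mine, not from the original post), it helps to see what that regex actually produces: words, runs of digits, punctuation marks, newlines, and single spaces each become their own token, which is what lets the loop below advance x token by token:

```r
library(stringr)

# same pattern as above, on a short sample string
tokens <- str_extract_all("Hi, I'm leaving at 5.\nWhy?",
                          "[a-zA-Z]+|[:punct:]|[0-9]+|\n|[ ]")[[1]]
tokens
# note that "I'm" splits into "I", "'", "m": letters and punctuation never mix in one token
```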

parMai <- par()$mai
par(mai = c(0, 0, 0, 0))
plot.new()
grid.rect(gp = gpar(fill = "aliceblue"))

x <- 0.05
y <- 0.95

for (i in 1:length(textSplitted)) {
  word <- textSplitted[i]
  if (x > 0.9) {
    x <- 0.05
    y <- y - 0.045
  }
  if (word == "\n") {
    x <- 0.05
    y <- y - 0.045
  } else if (word %in% nounsTable$noun) {
    img <- image_read(paste0("nounsImages/", word, ".jpg")) %>%
      image_border(color = "black", "2x2")
    xleft <- ifelse(x > 0.75, x + 0.03, ifelse(x < 0.2, x - 0.03, x))
    ybottom <- ifelse(y < 0.3, y - 0.05, ifelse(y < 0.5, y - 0.025, ifelse(y < 0.7, y - 0.01, y)))
    rasterImage(img, xleft = xleft, ybottom = ybottom, xright = xleft + 0.04, ytop = ybottom + 0.04)
    x <- x + 0.05
  } else {
    grid.text(word, x = x, y = y, gp = gpar(fontsize = 13, col = "black"), just = "left")
    lengthWLastWord <- length(str_extract_all(word, "W")[[1]])
    lengthCapitalizedLastWord <- length(str_extract_all(word, "[A-Z]")[[1]]) -
      length(str_extract_all(word, "I")[[1]]) - lengthWLastWord
    lengthSkinnyLetters <- length(str_extract_all(word, "l|i|I|,|'|\\.")[[1]])
    lengthRegularLastWord <- nchar(word) - lengthCapitalizedLastWord - lengthSkinnyLetters
    x <- x + lengthRegularLastWord * 0.013 + lengthSkinnyLetters * 0.008 +
      lengthCapitalizedLastWord * 0.017 + lengthWLastWord * 0.018
  }
}

par(mai = parMai)
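The x advance in the else branch is a homemade width heuristic: skinny glyphs (l, i, I, commas, apostrophes, periods), capitals, and the extra-wide “W” each get their own increment. Pulling that arithmetic into a helper makes it easier to tweak; estimateWordWidth is my own name, not from the original code, and it reproduces the loop’s sums as written, including the quirk that a “W” is counted as a regular letter plus a W bonus:

```r
library(stringr)

# crude per-word width estimate in npc units, mirroring the loop above:
# skinny glyphs get 0.008, regulars 0.013, capitals (except "I") 0.017,
# and each "W" adds a 0.018 bonus on top of its regular-letter width
estimateWordWidth <- function(word) {
  nW <- str_count(word, "W")
  nCap <- str_count(word, "[A-Z]") - str_count(word, "I") - nW
  nSkinny <- str_count(word, "l|i|I|,|'|\\.")
  nRegular <- nchar(word) - nCap - nSkinny
  nRegular * 0.013 + nSkinny * 0.008 + nCap * 0.017 + nW * 0.018
}

estimateWordWidth("Hi")  # 0.017 for the capital H plus 0.008 for the skinny i = 0.025
```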

# I Still Hate Emojis

Well. Was this fun? Yes. Was this useful? Not without a full-blown app. Am I going to make this app? No! Feel free to take inspiration and make millions of dollars out of this idea. I still hate emojis, I’m sorry! Sue me, I like words.