I’ve always wanted to play with Sonification:

“… the use of non-speech audio to convey information or perceptualize data.”

(Wikipedia, the source of all knowledge)

Don’t get me wrong, I am as thrilled by the sight of a nice visualization as the next (data geek) guy. But a good sound can really make me weep for joy. And let’s face it, the only new thing anyone can tell you about the Amazon stock is how it sounds:

Haha, what in the world is going on? It’s actually pretty simple:

1. I downloaded the AMZN monthly stock price since 2000 from here.
2. I used the sonify package to treat this time series as the pitch of a sound wave, associating “ups” and “downs” in the data with high and low wave frequency, respectively.
3. I saved it to a .wav file with the tuneR package (and of course uploaded it to SoundCloud so I can share it with you).
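```r
library(readr)
library(sonify)
library(tuneR)  # provides writeWave()

# hypothetical file name for the downloaded monthly AMZN prices
amzn <- read_csv("AMZN.csv")
obj <- sonify(amzn$Close, duration = 17, play = FALSE)
writeWave(obj, "amzn.wav")
```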

If you plot a piece of the .Data slot of obj, you’ll see the sound wave:

```r
plot(obj@.Data[10:1000, 1], type = "l", xlab = "time", ylab = "amplitude",
     main = "Portion of the AMZN Stock Price Sound Wave")
```
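To make step 2 less of a black box, here is a minimal base-R sketch of the idea. It is not sonify's implementation: the helper series_to_wave() is hypothetical, and mapping the series' own range linearly onto 440 to 880 Hz is an assumption modelled on what we believe is sonify's default frequency range (its flim argument).

```r
# Hypothetical helper (not part of sonify): turn a numeric series into a
# sine wave whose pitch follows the series.
series_to_wave <- function(y, duration = 2, flim = c(440, 880), smp_rate = 44100) {
  n <- round(duration * smp_rate)
  # stretch the series over all audio samples by linear interpolation
  y_interp <- approx(seq_along(y), y, n = n)$y
  rng <- range(y_interp)
  # rescale to [0, 1]; a constant series is mapped to the lowest pitch
  scaled <- if (diff(rng) > 0) (y_interp - rng[1]) / diff(rng) else rep(0, n)
  # "ups" become high frequencies, "downs" low frequencies
  freq <- flim[1] + scaled * diff(flim)
  # accumulate phase so the pitch glides without clicks
  phase <- 2 * pi * cumsum(freq) / smp_rate
  list(freq = freq, wave = sin(phase))
}

res <- series_to_wave(c(1, 3, 2, 5), duration = 0.5, smp_rate = 8000)
range(res$freq)  # 440 880
```

Accumulating the phase, rather than computing sin(2 * pi * freq * t) directly, is what keeps a changing pitch smooth.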

# The Sound of Probability

This has been done before, and with data having more than a single dimension. You can read all about Sonification in this free book. This week I’m interested in how probability sounds, or more specifically, how common probability distributions sound. When I taught Intro to Probability and Statistics back in university, I often had a hard time giving students intuition for what a distribution means, what its parameters do to it, and why they have to sit there and listen to me when they signed up for a degree in Psychology, not Statistics¹. I wonder if some students react better to sound than to sight.

## Continuous Distributions

#### Normal Distribution

What does that Bell Curve sound like? Let’s sonify the density of a Norm(0, 1) distribution:

```r
sonify(dnorm(seq(-3, 3, 0.1), 0, 1), duration = 6)
```

Like a police siren, actually. Try to imagine what would happen if we increase the standard deviation to 2 (imagine before you play!)

```r
sonify(dnorm(seq(-5.5, 5.5, 0.1), 0, 2), duration = 6)
```

Did that surprise you? Well, it surprised me. The two distributions sound almost the same! I expected the Normal distribution with a higher standard deviation to be “flatter” and “wider”, because of increased dispersion. But then I remembered what happens when you plot the two side by side:

```r
par(mfcol = c(1, 2))
x <- seq(-3, 3, 0.1)
plot(x, dnorm(x, 0, 1), type = "l", main = "Norm(0, 1) Distribution", ylab = "f")
x <- seq(-5.5, 5.5, 0.1)
plot(x, dnorm(x, 0, 2), type = "l", main = "Norm(0, 2) Distribution", ylab = "f")
par(mfcol = c(1, 1))
```

They look exactly the same! Except they’re not, not really. If you look at the x and y axes you will see that the Norm(0, 2) is in fact “flatter” and “wider”, and the way to see it is to draw them together in the same plot, using the same scale:

```r
x <- seq(-3, 3, 0.1)
plot(x, dnorm(x, 0, 1), type = "l", main = "Norm(0, 1) and Norm(0, 2) Distributions",
     ylab = "f", xlim = c(-5.5, 5.5))
x <- seq(-5.5, 5.5, 0.1)
lines(x, dnorm(x, 0, 2), col = "red")
legend("topleft", c("Norm(0, 1)", "Norm(0, 2)"), col = c("black", "red"), lty = 1)
```

Interestingly, what happened in the two graphs above also happened with the sounds. The two distributions were turned into sound waves separately, using the same duration but different scales, so it’s very hard to tell them apart. If you evaluate the Norm(0, 1) density on the same x-axis range as the Norm(0, 2) (i.e. c(-5.5, 5.5)) and play the two in stereo, you should hear the difference:

```r
norm_0_1 <- sonify(dnorm(seq(-5.5, 5.5, 0.1), 0, 1), play = FALSE, duration = 6)
norm_0_2 <- sonify(dnorm(seq(-5.5, 5.5, 0.1), 0, 2), play = FALSE, duration = 6)
# Norm(0, 1) in the left channel, Norm(0, 2) in the right channel
norm_stereo <- stereo(mono(as(norm_0_1, "Wave")), mono(as(norm_0_2, "Wave")))
play(norm_stereo)
```

Listen to this carefully, you should be able to differentiate between two distributions, one going “up” before the other. And what if we decrease the standard deviation to 0.1?

```r
sonify(dnorm(seq(-5.5, 5.5, 0.1), 0, 0.1), duration = 6)
```

Yes! A very “narrow” distribution / sound.
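Why can we hear the width but not the height? A plausible explanation, assuming sonify maps each series’ own minimum and maximum onto a fixed pitch range: every peak then lands on the same top pitch, so the lower peak of Norm(0, 2) is lost and only its extra width survives. The sketch below mimics that mapping with a hypothetical rescale01() helper.

```r
x <- seq(-5.5, 5.5, 0.1)
# hypothetical helper: map a series onto [0, 1], as a pitch mapping would
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
p1 <- rescale01(dnorm(x, 0, 1))
p2 <- rescale01(dnorm(x, 0, 2))
# both peaks map to the top of the range, so the height difference is lost...
c(max(p1), max(p2))
# ...but the wider curve spends more grid points in the upper half of the range
c(sum(p1 > 0.5), sum(p2 > 0.5))
```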

OK, by now you either hate this, love this, or have no idea what is going on but feel it might be worth continuing.

What about the mean parameter? Let’s increase it from 0 to 1:

```r
sonify(dnorm(seq(-5.5, 5.5, 0.1), 1, 1), duration = 6)
```

Right. If you compare this to the Norm(0, 1) sound you should hear that the distribution / sound has shifted “to the right”.
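Assuming the pitch rises and falls with the density, the highest note moves exactly as far as the mean does. A quick check on the same grid:

```r
x <- seq(-5.5, 5.5, 0.1)
# grid point where each density (and so, presumably, the pitch) peaks
peak_0 <- x[which.max(dnorm(x, 0, 1))]
peak_1 <- x[which.max(dnorm(x, 1, 1))]
shift <- peak_1 - peak_0
shift
```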

#### Exponential Distribution

The Exponential distribution is suitable for modelling the time until the next “event” occurs, e.g. the next incoming call in a call center. The probability² that the next call occurs at time $$t$$ decays exponentially as $$t$$ grows. Let’s see Exp(1) and Exp(0.1):

```r
x <- seq(0, 10, 0.1)
plot(x, dexp(x, 1), type = "l", main = "Exp(1) and Exp(0.1) Distributions", ylab = "f")
lines(x, dexp(x, 0.1), col = "red")
legend("topright", c("Exp(1)", "Exp(0.1)"), col = c("black", "red"), lty = 1)
```
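For reference, R’s dexp(x, rate) uses the rate parameterization, so the curves above are the density below; a larger $$\lambda$$ means both a faster decay and a shorter expected waiting time:

$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0, \qquad \mathbb{E}[T] = \frac{1}{\lambda}$$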

As can be seen, the higher the rate parameter $$\lambda$$, the quicker the decay, i.e. the sooner we expect the “event” to happen. Let’s hear this:

```r
exp_1 <- sonify(dexp(seq(0, 10, 0.1), 1), play = FALSE, duration = 6)
exp_01 <- sonify(dexp(seq(0, 10, 0.1), 0.1), play = FALSE, duration = 6)
# Exp(1) in the left channel, Exp(0.1) in the right channel
exp_stereo <- stereo(mono(as(exp_1, "Wave")), mono(as(exp_01, "Wave")))
play(exp_stereo)
```

Do you hear the second distribution “going down” more slowly? Will it help you remember the meaning of the Exponential distribution parameter $$\lambda$$?³
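The slower descent can be put in numbers: the Exp($$\lambda$$) density halves every $$\ln 2 / \lambda$$ time units, so over the 0 to 10 grid Exp(1) halves about 14 times, while Exp(0.1) does not even halve twice.

```r
rates <- c(1, 0.1)
half_life <- log(2) / rates                    # time for the density to halve
remaining <- dexp(10, rates) / dexp(0, rates)  # end-of-grid density relative to its start
data.frame(rate = rates, half_life = half_life, remaining = signif(remaining, 3))
```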

#### Laplace Distribution

The Laplace distribution isn’t really common, but it’s beautiful and I’m interested in hearing it.

```r
library(rmutil)

x <- seq(-5, 5, 0.1)
plot(x, dlaplace(x, 0, 1), type = "l", main = "Laplace(0, 1) Distribution", ylab = "f")

sonify(dlaplace(seq(-5, 5, 0.1), 0, 1), duration = 6)
```

The Laplace is also called the Double Exponential distribution, because it’s actually two mirroring Exponential distributions back to back. So if we bind the sounds of two Exponential distributions, it should sound the same as Laplace:

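```r
# Exp(1) density on the same half-width as the Laplace grid above
exp_1 <- dexp(seq(0, 5, 0.1), 1)
# mirror it around 0, dropping the duplicated value at 0
sonify(c(rev(exp_1), exp_1[-1]), duration = 6)
```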

And it does, how lovely.
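Numerically, the mirrored Exp(1) curve is the Laplace(0, 1) density up to a factor of 2. Assuming sonify rescales each series to a fixed pitch range, that constant factor should not change what you hear. This sketch writes the Laplace density out by hand, so rmutil is not needed:

```r
x <- seq(-5, 5, 0.1)
# Laplace(0, 1) density: exp(-|x|) / 2
laplace <- exp(-abs(x)) / 2
# Exp(1) density on [0, 5], mirrored around 0 without repeating the value at 0
e <- dexp(seq(0, 5, 0.1), rate = 1)
mirrored <- c(rev(e), e[-1])
# same shape, up to the constant factor 1/2
all.equal(mirrored / 2, laplace)
```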

#### Chi Square Distribution

This one is closely related to the Exponential: the Chi-Square distribution with 2 degrees of freedom is exactly the Exp(0.5) distribution:

```r
sonify(dchisq(seq(0, 10, 0.1), 2), duration = 6)
```
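A one-line numerical check in base R. Note that the 0.5 is a rate (R’s dexp() parameterization), i.e. a mean of 2:

```r
x <- seq(0, 10, 0.1)
# Chi-Square with 2 degrees of freedom vs Exponential with rate 0.5
same <- all.equal(dchisq(x, df = 2), dexp(x, rate = 0.5))
same
```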

But most people know (and dread) the Chi-Square distribution when it looks and sounds like this:

```r
x <- seq(0, 20, 0.1)
plot(x, dchisq(x, 6), type = "l", main = "Chi-Square(6) Distribution", ylab = "f")

sonify(dchisq(seq(0, 20, 0.1), 6), duration = 6)
```
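Where is the highest note? Assuming the pitch follows the density, it sits at the mode of Chi-Square($$k$$), which is $$k - 2$$ for $$k \ge 2$$, so at 4 here. A quick check on the plotting grid:

```r
x <- seq(0, 20, 0.1)
# grid point where the Chi-Square(6) density peaks
mode_6 <- x[which.max(dchisq(x, df = 6))]
mode_6
```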

#### Beta Distribution

Another photogenic distribution is the Beta distribution, specifically when its parameters are between 0 and 1:

```r
x <- seq(0.01, 0.99, 0.01)
plot(x, dbeta(x, 0.5, 0.5), type = "l", main = "Beta(0.5, 0.5) Distribution", ylab = "f")

sonify(dbeta(seq(0.01, 0.99, 0.01), 0.5, 0.5), duration = 6)
```
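Why does the grid run from 0.01 to 0.99 rather than from 0 to 1? The Beta(0.5, 0.5) density is $$1 / (\pi \sqrt{x(1-x)})$$, which is infinite at both ends, and an infinite value presumably cannot be mapped to a pitch. That reading of the grid choice is our interpretation; the densities themselves are easy to check:

```r
x <- c(0, 0.01, 0.5, 0.99, 1)
d <- dbeta(x, 0.5, 0.5)
d
# interior values match the closed form 1 / (pi * sqrt(x * (1 - x)))
all.equal(d[2:4], 1 / (pi * sqrt(x[2:4] * (1 - x[2:4]))))
```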

Recall your probability course: what should a Beta(1, 1) distribution sound like?

```r
sonify(dbeta(seq(0.01, 0.99, 0.01), 1, 1), duration = 6)
```

Right! The Beta(1, 1) distribution is equivalent to a Uniform(0, 1) distribution - hence you’re hearing a “flat line”!
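The equivalence is easy to confirm on the same grid with base R:

```r
x <- seq(0.01, 0.99, 0.01)
# Beta(1, 1) and Uniform(0, 1) densities coincide: both equal 1 on (0, 1)
flat <- all.equal(dbeta(x, 1, 1), dunif(x, 0, 1))
flat
```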

## Discrete Distributions

#### Binomial Distribution

The beloved Binomial distribution models the number of successes in $$n$$ independent, identical experiments, each with probability of success $$p$$. Tossing a fair coin $$n$$ times and counting the number of Heads is the classic example. How does a Binom(10, 0.2) distribution look and sound?

```r
x <- 0:10
barplot(dbinom(x, 10, 0.2), main = "Binom(10, 0.2) Distribution", ylab = "p",
        col = "white", names.arg = x, space = 0)

sonify(dbinom(0:10, 10, 0.2), duration = 6, interpolation = "constant")
```

Ha! Cool! Notice I used interpolation = "constant" to avoid smoothing the sound, which suits a discrete distribution.
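Behind the steps is the Binomial probability mass function, $$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$. Writing it out by hand reproduces dbinom() and shows where the highest step falls:

```r
n <- 10
p <- 0.2
k <- 0:n
# the Binomial pmf written out: choose(n, k) * p^k * (1 - p)^(n - k)
pmf <- choose(n, k) * p^k * (1 - p)^(n - k)
all.equal(pmf, dbinom(k, n, p))
k[which.max(pmf)]  # most likely number of successes
```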

Again, recall your probability course: what would a Binom(10000, 0.2) distribution sound like?

```r
sonify(dbinom(1800:2200, 10000, 0.2), duration = 6, interpolation = "constant")
```

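Nice! It sounds just like a Normal(2000, 40) distribution, thanks to the Normal approximation to the Binomial: the mean is $$np = 2000$$ and the standard deviation is $$\sqrt{np(1-p)} = 40$$.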
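The approximation can be checked directly. The range 1800:2200 is the mean plus or minus five standard deviations, and across it the exact Binomial probabilities and the Normal density stay very close:

```r
n <- 10000
p <- 0.2
k <- 1800:2200
mu <- n * p                     # mean of Binom(n, p)
sigma <- sqrt(n * p * (1 - p))  # standard deviation of Binom(n, p)
# largest gap between the exact pmf and the approximating Normal density
max_gap <- max(abs(dbinom(k, n, p) - dnorm(k, mu, sigma)))
c(mu = mu, sigma = sigma, max_gap = max_gap)
```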

#### Geometric Distribution

The Geometric distribution is the discrete analogue of the Exponential distribution: when you repeat the same experiment with probability of success $$p$$, how many tries will it take until you “succeed”? E.g. getting home drunk, in the dark, with a set of 10 keys, trying to open the door… Theoretically it could take forever.

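```r
x <- 1:20  # number of tries until the first success
# R's dgeom() counts failures before the first success, hence x - 1
barplot(dgeom(x - 1, 0.2), main = "Geom(0.2) Distribution", ylab = "p",
        col = "white", names.arg = x, space = 0)

sonify(dgeom(x - 1, 0.2), duration = 6, interpolation = "constant")
```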

Sounds like my daughters’ favorite toy!
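One subtlety: in R, dgeom(x, p) counts the failures before the first success (starting at 0), so the number of tries is x + 1. Written in terms of tries, the pmf is $$P(\text{first success on try } k) = (1-p)^{k-1} p$$, with an expected $$1/p$$ tries:

```r
p <- 0.2
tries <- 1:20
# P(first success on try k) = (1 - p)^(k - 1) * p
pmf <- (1 - p)^(tries - 1) * p
all.equal(pmf, dgeom(tries - 1, p))
# expected number of tries, and how much probability the 20-try window covers
c(expected_tries = 1 / p, mass_shown = sum(pmf))
```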

# I’m Dead Beat

Let’s just throw in everything we did together:

```r
y <- c(dnorm(seq(-5, 5, 0.1), 0, 1), dexp(seq(0, 10, 0.1), 1), dchisq(seq(0, 10, 0.1), 3),
       dbeta(seq(0.01, 0.99, 0.01), 0.5, 0.5), dlaplace(seq(-5, 5, 0.1), 0, 1))
plot(y, type = "l", main = "")
```

This either looks like part of a Kandinsky painting, a sketch of the Two Towers, or… a single cycle of a crazy cool beat!

```r
sonify(y, duration = 6)
```

Weird as I like it, but let’s speed it up and repeat it a few times:

```r
sonify(rep(y, 10), duration = 15)
```

Ooh. Very fresh beat. I’d like to hear Tyler the Creator rap to this beat (put your earphones on!):

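```r
library(tuneR)  # readWave(), mono(), stereo(), play()

tylerTheCreator <- readWave("~/Tyler The Creator Sway Acapella Freestyle Remix (Back To Back).wav")
probabilityBeat <- sonify(rep(y, 43), duration = 60, play = FALSE)
tylerTheCreatorMono <- mono(tylerTheCreator)
probabilityBeatMono <- mono(as(probabilityBeat, "Wave"))
# pad the shorter track with silence so both channels have the same length
# (stereo() may also require matching sampling rates and bit depths)
gap <- length(probabilityBeatMono@left) - length(tylerTheCreatorMono@left)
if (gap > 0) {
  tylerTheCreatorMono@left <- c(tylerTheCreatorMono@left, rep(0, gap))
} else if (gap < 0) {
  probabilityBeatMono@left <- c(probabilityBeatMono@left, rep(0, -gap))
}
tylerWithBeat <- stereo(tylerTheCreatorMono, probabilityBeatMono)
play(tylerWithBeat)
```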

Dope! I have made Tyler the Creator rap to Probability distributions! And all in R!
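For the curious, the tempo arithmetic: one cycle of the beat is the whole vector y, so each cycle lasts the total duration divided by the number of repetitions. The sketch recomputes the size of y from its five grids and the seconds per cycle for the three versions above:

```r
# grid sizes of the five densities glued together in y
n_points <- length(seq(-5, 5, 0.1)) + length(seq(0, 10, 0.1)) +
  length(seq(0, 10, 0.1)) + length(seq(0.01, 0.99, 0.01)) + length(seq(-5, 5, 0.1))
n_points
# seconds per cycle: total duration divided by the number of repetitions
sec_per_cycle <- c(single = 6 / 1, sped_up = 15 / 10, with_tyler = 60 / 43)
sec_per_cycle
```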

# Ending Sounds

I wonder if Ross Ihaka and Robert Gentleman, who created R, ever thought this would happen. But on a serious note, some people are visually impaired and cannot see all these wonderful visualizations running around. Sound is a good alternative for conveying data to visually impaired people. I have demonstrated the sounds of a few distributions here, but this could be developed much further: data often has more dimensions, and there are ways to sonify those too. I intend to read a bit more; this is not the end of my journey with Sonification.

1. OK, Sonification would probably not have helped here

2. Actually the density, not the probability, but we won’t go into that

3. If the answer is no, it’s perfectly OK!