Jan Van Balen


What a WAV file looks like

2020-07-10

When the WaveNet paper was first published, it was accompanied by a blog post that featured the following helpful GIF:

GIF of wave form zoomed in and zoomed back out.

I'd been noticing that pretty much every other blog post about WaveNet since then has reused the figure. It appears to help convey, to people with some experience in machine learning but perhaps not with audio data, how impressively dense audio data actually is. In particular, it suggests that modeling audio in the time domain, shown here (as opposed to the time-frequency domain), might just be really difficult.

Earlier today, Ethan Hein shared on Twitter that he's also using this figure in his teaching:

This gif is the single most valuable teaching resource I have for explaining how digital audio works. https://t.co/tLj6kr1wnl
— Ethan Hein (@ethanhein) July 10, 2020

However, I thought it might be even more useful if it also reflected that, when audio is digitized, it's not just the time axis that is discretized, but also the amplitude. So I made a version of the figure that reflects that a little better.
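To make the two kinds of discretization concrete, here is a minimal sketch in Python with NumPy. It is not the code behind the figures; the `digitize` helper, the 44.1 kHz sample rate, the 440 Hz test tone and the bit depths are all assumptions picked for illustration.

```python
import numpy as np


def digitize(signal, bit_depth=16):
    """Round a float signal in [-1, 1] to signed integer levels, PCM-style.

    Hypothetical helper for illustration only.
    """
    max_level = 2 ** (bit_depth - 1) - 1  # 32767 for 16 bits
    clipped = np.clip(signal, -1.0, 1.0)
    return np.round(clipped * max_level).astype(np.int32)


sample_rate = 44100  # assumed CD-style rate, in samples per second

# Discrete time axis: one second of audio is just 44100 evenly spaced instants.
t = np.arange(sample_rate) / sample_rate

# A 440 Hz test tone at half of full scale, still with float amplitudes.
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Discrete amplitude axis: every sample snaps to the nearest integer level.
pcm16 = digitize(tone, bit_depth=16)
pcm4 = digitize(tone, bit_depth=4)  # deliberately coarse, so the steps show

print(len(pcm16), "samples in one second")
print(len(np.unique(pcm4)), "distinct amplitude levels used at 4 bits")
```

At 16 bits the steps are far too fine to see at any normal zoom level, which is why a deliberately coarse bit depth is handy when you want the amplitude grid to show up in a plot.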

Here's a few seconds of a piece for cello by Max Richter:

GIF of wave form zoomed in to 1ms and zoomed back out to 1s.

And here's a pop song, with a more complex arrangement:

Or how about some of the analog electronics of Elmer Bernstein's eerie soundtrack to (appropriately) Charles and Ray Eames's 1977 short film, Powers of Ten:

Finally, here's a beautiful one contributed by Vincent Lostanlen, a Shepard-Risset glissando:

it's y(t) = 2 y(2t) all the way down! pic.twitter.com/ztFR5ERWgA
— Vincent Lostanlen (@lostanlen) July 13, 2020

If you want to make your own, get the code here, or investigate below: