Jan Van Balen


What a WAV file looks like

2020-07-10

When the WaveNet paper was first published, it was accompanied by a blog post that featured the following helpful GIF:

GIF of wave form zoomed in and zoomed back out.

I've noticed that pretty much every blog post about WaveNet since then has reused the figure. It seems to help convey, to people with some experience in machine learning but perhaps not with audio data, how impressively dense audio data actually is. In particular, it suggests that modeling audio in the time domain, shown here (as opposed to the time-frequency domain), might just be really difficult.
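To put a rough number on "dense", here is a minimal sketch that counts how many samples fall inside windows of different lengths. The two sample rates are assumptions for illustration (16 kHz, a common rate for speech, and 44.1 kHz, the CD rate), and `samples_in_window` is a hypothetical helper, not something from the figure's code.

```python
def samples_in_window(sample_rate_hz: int, window_s: float) -> int:
    """Number of samples covering a window of the given duration."""
    return round(sample_rate_hz * window_s)


# Assumed sample rates: 16 kHz (speech) and 44.1 kHz (CD audio).
for rate in (16_000, 44_100):
    # Zoom levels from one second down to one millisecond.
    for window in (1.0, 0.1, 0.01, 0.001):
        n = samples_in_window(rate, window)
        print(f"{rate:>6} Hz, {window * 1000:>6.0f} ms -> {n:>6} samples")
```

Even a single second of audio is a sequence of tens of thousands of values, which is the scale a time-domain model has to deal with.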

Earlier today, Ethan Hein shared on Twitter that he's also using this figure in his teaching:


However, I thought it might be even more useful if it also reflected that, when audio is digitized, it's not just the time axis that is discretized, but also the amplitude. So I made a version of the figure that reflects that a little better.
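For readers who want to see both kinds of discretization in numbers rather than pictures, here is a small sketch. It is not the code behind the figures: it assumes a pure 440 Hz sine tone, a 44.1 kHz sample rate, and signed integer levels like those of 16-bit PCM, and `sample_and_quantize` is a hypothetical helper written for this illustration.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bit_depth):
    """Sample a unit-amplitude sine wave and round each sample to an integer level."""
    max_level = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16-bit audio
    levels = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # discrete time axis
        x = math.sin(2 * math.pi * freq_hz * t)   # continuous amplitude in [-1, 1]
        levels.append(round(x * max_level))       # discrete amplitude axis
    return levels


# About one millisecond (44 samples) of the assumed tone, at two bit depths.
coarse = sample_and_quantize(440, 44_100, 44, bit_depth=4)
fine = sample_and_quantize(440, 44_100, 44, bit_depth=16)

print(coarse)
print(len(set(coarse)), "distinct levels at 4 bits")
print(len(set(fine)), "distinct levels at 16 bits")
```

At 4 bits the staircase is obvious because many neighbouring samples land on the same level; at 16 bits the steps are still there, just far too small to see without zooming in.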

Here's a few seconds of a piece for cello by Max Richter:

GIF of wave form zoomed in to 1ms and zoomed back out to 1s.

And here's a pop song, with a more complex arrangement:

GIF of wave form zoomed in to 1ms and zoomed back out to 1s.

Or how about some of the analog electronics of Elmer Bernstein's eerie soundtrack to (appropriately) Charles and Ray Eames's 1970 short film, The Powers of Ten:

GIF of wave form zoomed in to 1ms and zoomed back out to 1s. GIF of wave form zoomed in to 1ms and zoomed back out to 1s.


Finally, here's a beautiful one contributed by Vincent Lostanlen, a Shepard-Risset glissando:


If you want to make your own, get the code here, or investigate below: