Audio Visualisation with the Web Audio API

Monday, 29th December 2014

The Web Audio API has been evolving over the last couple of years and opens the web up to many possibilities with sound and music. It's still not perfect, different browsers behave in different ways, but we are getting there. I know it's not something everybody likes, and annoying, pointless audio can cause fits of rage in some users. We've all experienced that 'Which fucking tab is playing music?' moment before, and it seems like a remnant of the Flash era. But how is a Musician or Sound Designer meant to showcase their work in fun and interesting ways on the web?

In this article we are going to create an Audio Visualisation using the DOM and the Web Audio API. Because of some of the issues with Firefox not correctly handling CORS, and Safari seemingly reporting no signal (a value of 128) across the board when requesting byte data on an AnalyserNode, this demo is Chrome only.

Take a look at the demo.

The Audio component

First off we are going to create the audio component of our demo. We need to have a look at the possibilities of preloading or streaming files and processing the audio data as it plays.

Creating the Audio Context

The AudioContext is the main backbone of the Web Audio API, and an interface that handles the creation and processing of individual audio nodes. First things first, we will initialise an over-arching AudioContext for our build. We will later use this context to create both buffer source and script processor audio nodes which we will 'wire' together.

/* Setup an AudioContext, default to the non-prefixed version if possible. */
var context;

/* Try instantiating a new AudioContext, throw an error if it fails. */
try {
    /* Setup an AudioContext, falling back to the WebKit-prefixed constructor. */
    context = new (window.AudioContext || window.webkitAudioContext)();
} catch(e) {
    throw new Error('The Web Audio API is unavailable');
}

Preloading MP3 over XHR

Since the second iteration of XMLHttpRequest we have been able to do some rather funky things with fetching data from the server. In this instance we are going to request an .mp3 audio file as an ArrayBuffer, which makes it infinitely easier to interact with the Web Audio API.

/* Create a new XHR object. */
var xhr = new XMLHttpRequest();
/* Open a GET request connection to the .mp3 */
xhr.open('GET', '/path/to/audio.mp3', true);
/* Set the XHR responseType to arraybuffer */
xhr.responseType = 'arraybuffer';
xhr.onload = function() {
    /* The file's ArrayBuffer is available at xhr.response */
};
/* Send the request. */
xhr.send();

In the XHR's onload handler, the array buffer of the file will be available in the response property, not the usual responseText. Now that we have that array buffer we can continue and create a buffer source on the audio context. First we will need to use the audio context's decodeAudioData asynchronous method to convert our ArrayBuffer into an AudioBuffer.

/* Hoist the buffer source to the top of our demo. */
var sound;

xhr.onload = function() {
    sound = context.createBufferSource();

    context.decodeAudioData(xhr.response, function(buffer) {
        /* Set the buffer to our decoded AudioBuffer. */
        sound.buffer = buffer;
        /* Wire the AudioBufferSourceNode into the AudioContext */
        sound.connect(context.destination);
    });
};

At this point, we have an AudioBufferSourceNode holding our decoded audio buffer. By simply calling sound.start() at the end of the decodeAudioData callback, the sound should play.
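
The decode-then-play sequence described here can also be written with a promise. This is only a minimal sketch: `decodeAudio` and `playBuffer` are hypothetical helper names, and it assumes the callback form of `decodeAudioData` (success callback, then error callback) used elsewhere in this article.

```javascript
/* Hypothetical helper: wraps the callback form of `decodeAudioData`
 * (success callback, then error callback) in a Promise. */
function decodeAudio(context, arrayBuffer) {
    return new Promise(function(resolve, reject) {
        context.decodeAudioData(arrayBuffer, resolve, reject);
    });
}

/* Hypothetical helper: creates a one-shot buffer source, wires it to the
 * context's destination and starts playback immediately. */
function playBuffer(context, buffer) {
    var source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);
    source.start();
    return source;
}
```

Inside the XHR's onload handler this could be used as `decodeAudio(context, xhr.response).then(function(buffer) { playBuffer(context, buffer); });`.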

This method of preloading files over XHR is all very well and good for small files, but perhaps we don't want the user to have to wait until the whole file is downloaded before we start playing. Which leads us into a slightly different method which allows us to use the streaming capabilities of the HTMLMediaElement.

Streaming with the HTML Media Element

For streaming we can use an <audio> element instantiated from JavaScript. Using the createMediaElementSource method, we can connect our audio element directly into our context whilst still retaining the HTMLMediaElement API methods such as play() and pause(). Rather than waiting for our file to be fully available using the canplaythrough event, we listen to the canplay event to find out as soon as enough data is downloaded for the file to be played for at least a few frames.

/* Hoist our MediaElementAudioSourceNode variable. */
var sound,
    /* Instantiate a new `<audio>` element. Although Chrome supports `new Audio()`,
     * Firefox requires the element to be created with `createElement`. */
    audio = new Audio();

/* Add a `canplay` event handler for when the file is ready to start. */
audio.addEventListener('canplay', function() {
    /* Now that the file `canplay`, create a 
     * MediaElementAudioSourceNode from the `<audio>` element. */
    sound = context.createMediaElementSource(audio);
    /* Wire the MediaElementAudioSourceNode into the AudioContext */
    sound.connect(context.destination);
    /* Here we use `play` on the `<audio>` element instead
     * of `start` on the MediaElementAudioSourceNode. */
    audio.play();
});
audio.src = '/path/to/audio.mp3';

This method reduces a lot of code and is more suitable for our demo's implementation, so let's clean this whole thing up with some promises and a bit of a Class definition of a Sound.

/* Hoist some variables. */
var audio, context;

/* Try instantiating a new AudioContext, throw an error if it fails. */
try {
    /* Setup an AudioContext. */
    context = new AudioContext();
} catch(e) {
    throw new Error('The Web Audio API is unavailable');
}

/* Define a `Sound` Class */
var Sound = {
    /* Give the sound an element property initially undefined. */
    element: undefined,
    /* Define a class method of play which instantiates a new Media Element
     * Source each time the file plays, once the file has completed disconnect 
     * and destroy the media element source. */
    play: function() { 
        var sound = context.createMediaElementSource(this.element);
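        this.element.onended = function() {
            /* Disconnect and release the media element source. */
            sound.disconnect();
            sound = null;
        };
        /* Wire the MediaElementAudioSourceNode into the AudioContext. */
        sound.connect(context.destination);
        /* Call `play` on the MediaElement. */
        this.element.play();
    }
};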

/* Create an async function which returns a promise of a playable audio element. */
function loadAudioElement(url) {
    return new Promise(function(resolve, reject) {
        var audio = new Audio();
        audio.addEventListener('canplay', function() {
            /* Resolve the promise, passing through the element. */
            resolve(audio);
        });
        /* Reject the promise on an error. */
        audio.addEventListener('error', reject);
        audio.src = url;
    });
}

/* Let's load our file. */
loadAudioElement('/path/to/audio.mp3').then(function(elem) {
    /* Instantiate the Sound class into our hoisted variable. */
    audio = Object.create(Sound);
    /* Set the element of `audio` to our MediaElement. */
    audio.element = elem;
    /* Immediately play the file. */
    audio.play();
}, function(event) {
    /* The promise is rejected with the `error` event, so let's throw
     * the error from the MediaElement that fired it. */
    throw event.target.error;
});

Now we have our file playing, let's go ahead and start attempting to get the frequency data from our audio.

Implement Audio Processing

To start listening to the live data from the audio context as our file plays, we need to wire up two separate audio nodes. These nodes can be defined from the start, as soon as we have created the audio context. The first is a ScriptProcessorNode, an interface that allows us to process the audio. The second is an AnalyserNode, which provides us with real-time frequency and waveform/time domain analysis information.

/* Hoist some variables. */
var audio,
    context = new (window.AudioContext ||
                   window.webkitAudioContext)(),
    /* Create a script processor node with a `bufferSize` of 1024. */
    processor = context.createScriptProcessor(1024),
    /* Create an analyser node */
    analyser = context.createAnalyser();

/* Wire the processor into our audio context. */
processor.connect(context.destination);
/* Wire the analyser into the processor */
analyser.connect(processor);

/* Define a Uint8Array to receive the analyser's data. */
var data = new Uint8Array(analyser.frequencyBinCount);

Now we have defined our analyser node and a data array, we have to make a slight change to our Sound class definition. Instead of wiring the audio's media element source only into our audio context, we now should wire through our analyser as well. We should also add an audioprocess handler to the processor node at this point and remove it when the file ends.

/* Removed for brevity... */
play: function() { 
    var sound = context.createMediaElementSource(this.element);
    this.element.onended = function() {
        sound.disconnect();
        sound = null;
        /* Noop the audioprocess handler when the file finishes. */
        processor.onaudioprocess = function() {};
    };
    /* Add the following line to wire into the analyser. */
    sound.connect(analyser);
    sound.connect(context.destination);

    processor.onaudioprocess = function() {
        /* Populate the data array with the waveform (time domain) data. */
        analyser.getByteTimeDomainData(data);
    };
    /* Call `play` on the MediaElement. */
    this.element.play();
}

This now means the audio nodes are wired up in the following way:

MediaElementSourceNode => AnalyserNode => ScriptProcessorNode => AudioContext
MediaElementSourceNode => AudioContext
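
As a rough sketch, that wiring can be collected into one function. `wireGraph` is a hypothetical name, and the sketch assumes the source is also connected straight to the destination so the file stays audible, since the script processor here only reads data and writes nothing to its output.

```javascript
/* Hypothetical helper: connects the nodes in the order shown above.
 * Each argument only needs a `connect` method, as AudioNodes have. */
function wireGraph(source, analyser, processor, destination) {
    /* Audible path: the source plays straight through to the output. */
    source.connect(destination);
    /* Analysis path: source => analyser => processor => output. */
    source.connect(analyser);
    analyser.connect(processor);
    processor.connect(destination);
}
```

In the demo this would be called as `wireGraph(sound, analyser, processor, context.destination)`.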

If you were simply to add a console.log(data) to the end of the audioprocess handler you would see a number of large arrays of integers fill up the console pretty quickly. This is exactly the data we are interested in.

To get the Frequency data instead, we would simply change the line in the audioprocess handler to:

/* Populate the data array with the frequency data. */
analyser.getByteFrequencyData(data);

Frequency Data vs. Waveform/Time Domain Data

There are actually 4 different methods that the AnalyserNode gives us. Two of these methods relate to the Frequency Data and two to the Waveform or Time Domain Data. Each of these two sets of data can be copied from the analyser as either byte or float data. This means however that our typed array will have to be of the correct type relating to what we choose. In terms of byte data, as you have seen, we should provide an unsigned byte array, but with float data, we need to provide a Float32Array.
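
To make the pairing of method and array type concrete, here is a small sketch. The `READERS` table, the `createReader` helper and its `kind` strings are made up for illustration; only the four `get...Data` method names come from the AnalyserNode itself. The array length follows this article in using `frequencyBinCount`.

```javascript
/* Hypothetical lookup table: which AnalyserNode method goes with
 * which typed array. The keys are illustrative names only. */
var READERS = {
    'byte-frequency': { method: 'getByteFrequencyData', ArrayType: Uint8Array },
    'byte-waveform': { method: 'getByteTimeDomainData', ArrayType: Uint8Array },
    'float-frequency': { method: 'getFloatFrequencyData', ArrayType: Float32Array },
    'float-waveform': { method: 'getFloatTimeDomainData', ArrayType: Float32Array }
};

/* Hypothetical helper: allocates a correctly typed array once and
 * returns a `read` function that refills it from the analyser. */
function createReader(analyser, kind) {
    var spec = READERS[kind];
    if (!spec) {
        throw new Error('Unknown data kind: ' + kind);
    }
    var buffer = new spec.ArrayType(analyser.frequencyBinCount);
    return {
        data: buffer,
        read: function() {
            analyser[spec.method](buffer);
            return buffer;
        }
    };
}
```

Calling `createReader(analyser, 'float-frequency').read()` inside the audioprocess handler would then hand back a Float32Array without any further changes to the calling code.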

I have found that the Waveform/Time Domain data provides a much smoother output, whereas the Frequency data provides a much more visual representation of the idiosyncrasies of the audio. It is worth experimenting with these outputs and seeing how you can transform the data to suit your visual representation.
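
As one example of such a transformation, byte waveform values are centred on 128 when there is no signal, so they can be mapped onto a signed amplitude of roughly -1 to 1. The helper names `toAmplitude` and `peakAmplitude` below are hypothetical, and this is just one possible mapping.

```javascript
/* The byte waveform value reported when there is no signal. */
var NO_SIGNAL = 128;

/* Hypothetical helper: maps a byte waveform value (0 to 255)
 * onto a signed amplitude centred on zero. */
function toAmplitude(byteValue) {
    return (byteValue - NO_SIGNAL) / NO_SIGNAL;
}

/* Hypothetical helper: finds the largest absolute amplitude in a
 * block of byte waveform data, e.g. to drive a single 'loudness' value. */
function peakAmplitude(data) {
    var peak = 0;
    for (var i = 0; i < data.length; i++) {
        var amplitude = Math.abs(toAmplitude(data[i]));
        if (amplitude > peak) {
            peak = amplitude;
        }
    }
    return peak;
}
```

A single peak value like this could drive one property of a whole element, while the per-sample values suit per-slice effects.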

The visual component

Now we have all that audio guff out of the way, the fun and experimentation begins. There is so much potential for the use of that data to manipulate elements on the page, whether it be altering paths of an SVG or affecting the birth of new particles on a Canvas. However, for this demo, we are going to create the visualisation using DOM nodes and requestAnimationFrame. This means we have quite a lot of versatility with our output. For the benefit of performance we should only use compositable CSS properties such as transform and opacity.

The initial setup

First let's add an image to our document and set up some CSS. In the case of the Fourth of 5 logo, it is a transparent SVG and the circular background is created using a border-radius in CSS.

<div class="logo-container">
    <img class="logo" src="/path/to/image.svg"/>
</div>

.logo-container, .logo, .container, .clone {
    width: 300px;
    height: 300px;
    position: absolute;
    top: 0; bottom: 0;
    left: 0; right: 0;
    margin: auto;
}

.logo-container, .clone {
    background: black;
    border-radius: 200px;
}

.mask {
    overflow: hidden;
    will-change: transform;
    position: absolute;
    transform: none;
    top: 0; left: 0;
}

Essentially, we are going to slice up the image into a number of columns or slices. We should define the number of slices we want as a constant and then look into cloning the element that number of times. Let's start setting up the JavaScript:

/* Start of the visual component, let's define some constants. */
var NUM_OF_SLICES = 300,
    /* The `STEP` constant allows us to step through all the data we receive,
     * instead of just the first `NUM_OF_SLICES` elements in the array. */
    STEP = Math.floor(data.length / NUM_OF_SLICES),
    /* When the analyser receives no data, all values in the array will be 128. */
    NO_SIGNAL = 128;

/* Get the element we want to 'slice'. */
var logo = document.querySelector('.logo-container');

/* We need to store our 'slices' to interact with them later. */
var slices = [],
    rect = logo.getBoundingClientRect(),
    /* Thankfully Chrome supplies us a width and
     * height property in our `TextRectangle` object. */
    width = rect.width,
    height = rect.height,
    widthPerSlice = width / NUM_OF_SLICES;

/* Create a container `<div>` to hold our 'slices'. */
var container = document.createElement('div');
container.className = 'container';
container.style.width = width + 'px';
container.style.height = height + 'px';
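
To see what the `STEP` constant does in isolation, here is a small sketch that lists which indices of the data array each slice will read. `sampleIndices` is a hypothetical helper, and it assumes the data array has at least as many entries as there are slices (otherwise the step rounds down to 0 and every slice reads the first value).

```javascript
/* Hypothetical helper: returns the index into the data array that
 * each slice reads, stepping evenly through the array. */
function sampleIndices(dataLength, numSlices) {
    var step = Math.floor(dataLength / numSlices),
        indices = [];
    for (var i = 0, n = 0; i < numSlices; i++, n += step) {
        indices.push(n);
    }
    return indices;
}
```

With the analyser's default `frequencyBinCount` of 1024 and 300 slices, the step is 3, so the slices read indices 0, 3, 6 and so on up to 897.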

Creating the 'slices'

For each 'slice' we want to create a mask of the original element with a width of our widthPerSlice and offset it on the x-axis based on its index in the array.

You will notice that for the mask elements we use a 2-dimensional CSS matrix rather than the usual transform helper functions. I have found that elements, particularly vector images or DOM nodes, seem to blur as they scale, suggesting the browser caches them at a specific size and scales that cached version. To prevent this artefacting we manually define a matrix.
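
To make the matrix form concrete, here is a small sketch of a function that builds the transform string for one 'slice'. `sliceMatrix` is a hypothetical name. The six arguments of a CSS `matrix()` are, in order, scaleX, skewY, skewX, scaleY, translateX and translateY, so only the fourth and fifth change here.

```javascript
/* Hypothetical helper: builds a 2d CSS matrix that scales a slice on the
 * y-axis and translates it by `offsetX` pixels on the x-axis. */
function sliceMatrix(scaleY, offsetX) {
    return 'matrix(1,0,0,' + scaleY + ',' + offsetX + ',0)';
}
```

For example, `sliceMatrix(1, 12)` gives `matrix(1,0,0,1,12,0)`, which is an unscaled slice moved 12 pixels to the right.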

/* Let's create our 'slices'. */
for (var i = 0; i < NUM_OF_SLICES; i++) {
    /* Calculate the `offset` for each individual 'slice'. */
    var offset = i * widthPerSlice;

    /* Create a mask `<div>` for this 'slice'. */
    var mask = document.createElement('div');
    mask.className = 'mask';
    mask.style.width = widthPerSlice + 'px';
    /* For the best performance, and to prevent artefacting when we
     * use `scale` we instead use a 2d `matrix` that is in the form:
     * matrix(scaleX, 0, 0, scaleY, translateX, translateY). We initially
     * translate by the `offset` on the x-axis. */
    mask.style.transform = 'matrix(1,0,0,1,' + offset + ',0)';

    /* Clone the original element. */
    var clone = logo.cloneNode(true);
    clone.className = 'clone';
    clone.style.width = width + 'px';
    /* We won't be changing this transform so we don't need to use a matrix. */
    clone.style.transform = 'translate3d(' + -offset + 'px,0,0)';
    mask.style.height = clone.style.height = height + 'px';

    /* Nest the clone in the mask, and the mask in the container. */
    mask.appendChild(clone);
    container.appendChild(mask);

    /* We need to maintain the `offset` for when we
     * alter the transform in `requestAnimationFrame`. */
    slices.push({ offset: offset, elem: mask });
}

/* Replace the original element with our new container of 'slices'. */
document.body.replaceChild(container, logo);

We should now see... nothing different. What we have done is replace our original element with 300 separate 'slices' that line up to recreate the original element. It's easier to see this under the hood: look at the DOM tree to see our 300 .mask elements. These are the elements we are going to affect with data.

Defining our render function

Although our audioprocess handler is receiving data very quickly, we don't want to overwhelm the browser with too many composition changes. So we hold on to the data until the browser reports that it is ready for another paint. That's where requestAnimationFrame comes in.

/* Create our `render` function to be called every available frame. */
function render() {
    /* Request a `render` on the next available frame.
     * No need to polyfill because we are in Chrome. */
    requestAnimationFrame(render);

    /* Loop through our 'slices' and use the STEP(n) data from the
     * analysers data. */
    for (var i = 0, n = 0; i < NUM_OF_SLICES; i++, n+=STEP) {
        var slice = slices[i],
            elem = slice.elem,
            offset = slice.offset;

        /* Make sure the val is positive and divide it by `NO_SIGNAL`
         * to get a value suitable for use on the Y scale. */
        var val = Math.abs(data[n]) / NO_SIGNAL;
        /* Change the scaleY value of our 'slice', while keeping its
         * original offset on the x-axis. */
        elem.style.transform = 'matrix(1,0,0,' + val + ',' + offset + ',0)';
        elem.style.opacity = val;
    }
}

/* Call the `render` function initially. */
render();
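
The idea of holding on to the data until the next frame can also be sketched on its own. This is a variation on the loop above, not what the demo does: it adds a 'dirty' flag so that frames with no new data skip the drawing work. `createFrameLoop` and its `schedule` argument are hypothetical; `schedule` stands in for `requestAnimationFrame` so the sketch can run outside a browser.

```javascript
/* Hypothetical helper: `schedule` stands in for `requestAnimationFrame`,
 * `draw` is called at most once per frame with the most recent data. */
function createFrameLoop(schedule, draw) {
    var latest = null,
        dirty = false;

    function frame() {
        /* Queue the next frame first, as the `render` function does. */
        schedule(frame);
        /* Only draw when new data has arrived since the last frame. */
        if (dirty) {
            dirty = false;
            draw(latest);
        }
    }

    return {
        /* Called from the audio side as often as data arrives. */
        update: function(data) {
            latest = data;
            dirty = true;
        },
        /* Starts the loop. */
        start: frame
    };
}
```

In the browser, `schedule` could be `function(fn) { requestAnimationFrame(fn); }`, with `update(data)` called at the end of the audioprocess handler.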

That's it! You should have a fancy jiggling DOM construct! It's now up to you to start experimenting with what you have.

In terms of performance, open dev tools and under Rendering options turn on 'Show paint rectangles' and 'Show composited layer borders'. Because we have used only compositable CSS properties, you can see that after an initial flash of green from the paint rectangles, there should be no more.

At this point, we have enough code to start thinking about pulling individual components of this into separate module definitions using something like browserify. However, I will leave that up to you.