Novum Organum

Explorations in Interactive & Immersive Media

This page documents ongoing creative and technical work exploring how new tools can be applied to interactive and immersive media.

As a starting point, I take work begun during Graduate Audio Production at the University of Colorado Denver. A portion of that course was devoted to creating what we might call amorphous audio corpora: one session would be spent recording 20 or so minutes of drums, another 20 or so minutes of synthesizers, and so on. No explicit attention was paid to how these sessions would relate, say, rhythmically or tonally. The challenge was then to produce a consolidated piece of music out of the disparate results.

To “make sense” of the different audio corpora, I turned to machine learning tools I had been introduced to during a summer course at Stanford’s Center for Computer Research in Music and Acoustics. Specifically, I employed objects from the Fluid Corpus Manipulation (or FluCoMa for short) library, developed by a multi-university team of researchers.

In my case, I built what might be called a Novelty Slicer-Looper. This device (which can be downloaded here) scrubs through a corpus of audio, delineates novel sections, and then loops those sections at a desired rate to create zones of musical stability. Novelty can be assessed along different dimensions, such as loudness or spectral centroid. Perhaps the most holistic and compact measure is built on mel-frequency cepstral coefficients (MFCCs): a single spectral frame of audio can be effectively characterized with as few as 19 numbers. (The technique has been deployed on-chip in cellphones to optimize voice transmission.)
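A rough idea of the underlying analysis (not the FluCoMa objects themselves) can be sketched in Python: compute an MFCC vector per frame, then place slice points where successive frames differ the most. The file name, threshold, and peak spacing below are hypothetical.

```python
import numpy as np
import librosa
from scipy.signal import find_peaks

# Hypothetical input: one of the long, amorphous recording sessions.
y, sr = librosa.load("drum_session.wav", sr=None, mono=True)

# Characterize each spectral frame with a compact MFCC vector.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=19)  # shape: (19, n_frames)

# A simple novelty curve: how much each frame differs from the one before it.
novelty = np.linalg.norm(np.diff(mfccs, axis=1), axis=0)
novelty /= novelty.max()

# Frames where novelty peaks above a threshold become candidate slice points.
peaks, _ = find_peaks(novelty, height=0.5, distance=100)
slice_times = librosa.frames_to_time(peaks, sr=sr)
print(slice_times)
```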

An additional feature of the Novelty Slicer-Looper is that a novelty threshold can be set. For my purposes, I set the threshold so that it delineated five sections, because I wanted to map those sections onto a traditional pop structure of Intro(I)-Verse(II)-Chorus(III)-Verse(II)-Chorus(III)-Bridge(IV)-Verse(II)-Chorus(III)-Outro(V). To achieve this remapping, I performed a statistical analysis of the sections the Novelty Slicer-Looper delineated. Since the loudest section of a pop song is usually the chorus, I mapped the loudest delineated section to the chorus, and so on for the rest of the sections.
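The remapping step can be sketched roughly in Python. The section boundaries, file name, and loudness-to-role ordering below are illustrative assumptions, not the exact analysis used in the device.

```python
import numpy as np
import librosa

# Hypothetical slice points (in seconds) produced by the Novelty Slicer-Looper.
boundaries = [0.0, 42.0, 95.0, 150.0, 210.0, 262.0]  # five sections

y, sr = librosa.load("consolidated_corpus.wav", sr=None, mono=True)

# Measure the average RMS loudness of each delineated section.
loudness = []
for start, end in zip(boundaries[:-1], boundaries[1:]):
    segment = y[int(start * sr):int(end * sr)]
    loudness.append(librosa.feature.rms(y=segment).mean())

# Rank sections from quietest to loudest, then assign pop-song roles:
# loudest -> Chorus, quietest -> Intro (an assumed ordering for the rest).
roles_by_loudness = ["Intro", "Outro", "Verse", "Bridge", "Chorus"]
order = np.argsort(loudness)  # section indices, quietest to loudest
section_roles = {int(idx): roles_by_loudness[rank] for rank, idx in enumerate(order)}
print(section_roles)
```

Finally, the result can be heard here: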

As I turn to my thesis research, I have begun to consider how machine learning tools could aid in the reconstruction of acoustic environments. That research is geared toward creating a more dynamically navigable form of ambisonic impulse response technology. Traditionally, impulse responses (be they stereophonic or ambisonic) are recorded from a static position. While ambisonics allow for angular movement (rotating the listener in place), they do not allow for any translational movement through the space; supporting translation requires a radical rethink of how impulse responses are recorded and processed. Without going into too much technical detail at this stage, an analogy can be drawn to the great strides made in 3D graphics and photogrammetry: most systems that derive navigable 3D visual scenes from discretized data streams (such as a collection of photos) employ machine learning techniques like neural radiance fields and Gaussian splatting. Even the basic mathematical process of convolution reverb, essentially a sliding multiplication of a dry signal against a recorded impulse response, can potentially be replaced by machine learning processes such as non-negative matrix factorization cross-synthesis and audio feature transport.
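For reference, the conventional baseline is simple to state in code: convolve a dry signal with an impulse response recorded at a single fixed position. A minimal Python sketch of that baseline, with hypothetical file names, might look like this:

```python
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

# Hypothetical inputs: a dry recording and an impulse response captured
# at a single, fixed position in the space.
dry, sr = sf.read("dry_voice.wav")
ir, ir_sr = sf.read("room_impulse_response.wav")
assert sr == ir_sr, "signal and impulse response must share a sample rate"

# Collapse both to mono for simplicity.
if dry.ndim > 1:
    dry = dry.mean(axis=1)
if ir.ndim > 1:
    ir = ir.mean(axis=1)

# Convolution reverb: a sliding multiply-and-sum of the dry signal
# against the impulse response (computed here via the FFT for speed).
wet = fftconvolve(dry, ir, mode="full")

# Normalize and write the result.
wet /= np.max(np.abs(wet))
sf.write("wet_voice.wav", wet, sr)
```

Each output sample is a weighted sum of past input samples, with the impulse response supplying the weights; it is precisely this fixed, single-position kernel that a translationally navigable approach would need to move beyond.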