Spectral Smoothing for Music Information Retrieval

Spectral smoothing is a technique for removing noise from signals so that trends are easier to gauge.

What is spectral smoothing? Why?

Basic spectrogram of a JTTP submission

Spectral smoothing filters noise from a signal, either with a lowpass filter or with rolling windows. Uses vary by field. In medicine, for example, the technique filters noise from recordings of rapidly moving muscles, such as those used for blinking. In finance, it gives traders and strategists insight into noisy, rapidly changing investments. Smoothing is roughly equivalent to lowpass filtering because the goal is to remove rapid fluctuations, and rapid fluctuations are the higher-frequency content of a signal. Both approaches involve windowing, which takes discrete blocks of samples from the signal for digital representation and adds its own effects to the result. In addition, rolling-window aggregations such as means and sums, computed at each point over the surrounding period, help preserve some of the underlying data.
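To make the lowpass comparison concrete, here is a minimal sketch of how much a rolling mean lets through at different frequencies. The function name `moving_average_gain` and the 8-sample window are our own illustrative choices, not something taken from the analysis in this post.

```python
import cmath
import math


def moving_average_gain(window, freq):
    """Gain of a `window`-point rolling mean for a sinusoid at `freq` cycles per sample."""
    # A rolling mean is a filter whose taps are all 1/window, so its frequency
    # response is the average of `window` complex exponentials.
    response = sum(cmath.exp(-2j * math.pi * freq * k) for k in range(window)) / window
    return abs(response)


# Slow trends pass almost untouched; faster fluctuations are attenuated.
for freq in (0.0, 0.005, 0.05, 0.25):
    print(f"{freq:5.3f} cycles/sample -> gain {moving_average_gain(8, freq):.3f}")
```

A gain near 1 means the component survives the smoothing, and a gain near 0 means it is averaged away. The rolling mean is a fairly crude lowpass filter, which is one reason a dedicated filter is sometimes preferred.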

Method of exploration

For the JTTP – Jeu de temps / Time Play contest, we wanted to learn from past winning audio pieces. With different exploratory analyses, it is possible to find informative representations of the differences between winning submissions and those that do not win. For our piece DigiTral, we collected many of our favorite compositions to build a classifier. The classifier was by no means a success; however, we did find some clues about important features. Spectral centroids seemed to have a large effect on the classifier’s ability to discriminate between the two classes.

We wanted a way to see, as humans, which differences might be blatantly obvious. At first glance, however, a spectrogram alone was not enough to tell a winner from a non-winner without the audio. In a second attempt, we plotted the spectral centroid superimposed on the spectrogram.
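For readers who have not met the spectral centroid before, it is the magnitude-weighted mean frequency of each short frame of audio. The sketch below computes it with a naive DFT from the standard library. The frame size, hop size, Hann window, and function names are assumptions made for illustration and are not the settings used for the plots in this post. In practice a dedicated audio analysis library would be much faster.

```python
import cmath
import math


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, using a naive DFT."""
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    weighted_sum = 0.0
    magnitude_sum = 0.0
    # Only bins 0..n/2 are needed for a real-valued signal.
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        magnitude = abs(coeff)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    # A silent frame has no spectrum to weight, so report 0 Hz.
    return weighted_sum / magnitude_sum if magnitude_sum > 0 else 0.0


def centroid_track(signal, sample_rate, frame_size=256, hop=256):
    """Spectral centroid of each frame: the line drawn over the spectrogram."""
    return [spectral_centroid(signal[start:start + frame_size], sample_rate)
            for start in range(0, len(signal) - frame_size + 1, hop)]


# Synthetic test tone (not a JTTP piece): 500 Hz, then 2000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) for i in range(512)]
tone += [math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE) for i in range(512)]
track = centroid_track(tone, SAMPLE_RATE)
```

Plotting `track` against time on the same axes as the spectrogram gives the kind of overlay described above: the centroid sits low while the 500 Hz tone plays and jumps up when the 2000 Hz tone starts.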

The spectrogram and spectral centroid pattern of a selected piece.

The winner shows a wide range of spectral centroid values, and its spectrogram has more pockets with no energy. These pockets, together with a steep crescendo from 4m10 to 5m00, seem to be the largest differences from the non-selected piece.

The spectrogram and spectral centroid pattern of a non-selected piece.

The non-winner shows a narrower spectral centroid range and fewer black pockets, and therefore more consistency across the spectrum.

The two basic images of spectrum and spectral centroid show a few differences. The spectral centroids seem to sit around the critical hearing frequencies. This range was established by Bell Labs to make telephones more efficient, and it reflects the fact that human hearing is most sensitive roughly between 1 kHz and 5 kHz. To know more, check out the Fletcher-Munson curves. Apart from that, we see that the winner has a more dynamic spectral centroid, with more pockets within the spectrum throughout the timeline. Low frequencies are removed just before the 1m40 mark, while high frequencies drop out around 4m10 and crescendo back in at 5m00. This play on dynamics over time may be a clue to the cues judges rely on when scoring submissions. To get a clearer view of the difference, let’s smooth the spectral centroid.

In conclusion, smoothing the spectral centroid makes more of the background information (the spectrogram) visible, so more information can be digested at a glance. After smoothing, the dynamics of the spectral centroid in the selected piece become even more noticeable compared to the non-selected piece. Because averaging the signal with a window of 60 periods (samples) cleared up so much noise, we can tell that the composition has greater variation at a micro time frame. An easy assessment would be that both short-term spectral changes and structural sections with silence or removed low frequencies could work to our advantage. With only these four spectrograms to compare and no basic mathematical tests, however, it is difficult to assess whether such objective measurements can be taken as concrete wisdom.
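The smoothing step, and the idea of reading short-term variation from what the smoothing removes, can be sketched as follows. This assumes a centered rolling mean with windows that shrink at the edges, which may differ from the exact variant behind the plots. The function names and the two synthetic centroid tracks are hypothetical and are not measurements from any JTTP piece.

```python
import statistics


def rolling_mean(values, window=60):
    """Centered rolling mean; windows shrink at the edges instead of padding."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i - half + window)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def short_term_variation(values, window=60):
    """Standard deviation of what the smoothing removed (raw minus smoothed)."""
    smoothed = rolling_mean(values, window)
    residual = [v - s for v, s in zip(values, smoothed)]
    return statistics.pstdev(residual)


# Hypothetical centroid tracks in Hz: same average, different frame-to-frame jitter.
steady = [2000.0 + 50.0 * (-1) ** i for i in range(600)]
lively = [2000.0 + 400.0 * (-1) ** i for i in range(600)]
print(short_term_variation(steady), short_term_variation(lively))
```

A larger value for one track means its smoothed curve hides more frame-to-frame movement. Turning that into a real comparison between selected and non-selected pieces would still need far more than four examples.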

What we decided to do next was take the average of all the audio signals over the length of the shortest JTTP 2013 winner. Having no idea what would come out, we rendered the result to an audio file so we could evaluate both pictures and sound. We don’t want to post the full average, for two reasons.

  • The audio is too similar to that of the winners and approaches a boundary where it could harm their work
  • We want to encourage people to visit the Sonus website and support the artists

Sample of averaged signal from JTTP 2013 Winners
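The averaging itself is simple. A minimal sketch, assuming every piece has already been decoded to a mono list of samples at the same sample rate (the function name `average_signals` and the tiny example values are ours, for illustration only):

```python
def average_signals(signals):
    """Sample-wise mean of several mono signals, truncated to the shortest one."""
    if not signals:
        raise ValueError("need at least one signal")
    shortest = min(len(signal) for signal in signals)
    count = len(signals)
    return [sum(signal[i] for signal in signals) / count for i in range(shortest)]


# Two toy signals of different lengths; the result is as long as the shorter one.
mixed = average_signals([[1.0, 3.0, 5.0], [3.0, 5.0]])
```

Dividing by the number of signals keeps the result within the range of the inputs, so the average cannot clip if the originals did not.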

If you are interested in finding out more about this genre of music please check out the JTTP 2013 Winners page.

For now, we are piloting this series. In the future, we hope to bring to light a list of usable considerations for composing high-quality experimental and acousmatic art. If you would like to see more, please show your interest through your support!