Can aubio be used to detect rhythm-only segments?


Does aubio have a way to detect sections of a piece of audio that lack tonal elements -- rhythm only? I tested a piece of music that has 16 seconds of rhythm at the start, but all the aubiopitch and aubionotes algorithms seemed to detect tonality during the rhythmic section. Could it be tuned somehow to distinguish tonal from non-tonal onsets? Or is there a related library that can do this?
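(For context: one standard signal-level discriminator, independent of aubio, is spectral flatness, the geometric mean over the arithmetic mean of the power spectrum. It sits near 0 for tonal frames and well above it for noise-like, percussive frames. A minimal sketch, with arbitrary frame size and window choices:)

```python
import numpy as np

def spectral_flatness(frame, eps=1e-10):
    """Geometric mean / arithmetic mean of the power spectrum.
    Near 0 for tonal frames; much larger for noise-like (percussive) frames."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2 + eps
    return np.exp(np.mean(np.log(power))) / np.mean(power)

# toy check: a 440 Hz sine (tonal) vs white noise (broadband, rhythm-like)
sr, n = 44100, 2048
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(n)
print(spectral_flatness(tone))   # small: energy concentrated in one bin
print(spectral_flatness(noise))  # much larger: energy spread across bins
```

Thresholding this per frame could flag the rhythm-only stretch, though the threshold would need tuning per recording.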



Use a spectrum analyser to detect sections with high amplitude. If you can program, you could take each section and average the frequencies (and amplitudes) present, to give you an idea of the instrument(s) involved in creating that amplitude peak.
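A minimal numpy sketch of that idea (frame-level RMS to find the loud sections, then an average magnitude spectrum over just those frames; the frame/hop sizes and threshold ratio are arbitrary starting points):

```python
import numpy as np

def loud_section_spectrum(signal, frame=2048, hop=1024, threshold_ratio=0.5):
    """Average the magnitude spectra of frames whose RMS exceeds
    threshold_ratio * (max RMS): a rough profile of what's playing when it's loud."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame, hop)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    loud = rms > threshold_ratio * rms.max()
    spectra = np.array([np.abs(np.fft.rfft(f)) for f in frames])
    return spectra[loud].mean(axis=0)

# toy usage: quiet noise with a loud 440 Hz burst in the middle
sr = 44100
rng = np.random.default_rng(1)
sig = 0.01 * rng.standard_normal(sr)
t = np.arange(sr // 4) / sr
sig[sr // 2: sr // 2 + sr // 4] += np.sin(2 * np.pi * 440 * t)
avg = loud_section_spectrum(sig)
print(np.argmax(avg) * sr / 2048)  # peak frequency of the loud section, near 440 Hz
```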

Hope that helps - if you're using python I could give you some pointers how to program this!?

Regards

Tony


Been busy the past couple of days, but I started looking into this today...

It'll take a while to perfect, I guess, but I thought I'd give you a few thoughts and some code I've started working on to attack this!

Firstly, pseudo code's a good way to design an initial method.

1/ Use import matplotlib.pyplot as plt to spectrum-analyse the audio, and plot the various FFT and audio signals.

2/ import numpy as np for basic array-like structure handling.

(I know this is more than pseudo code, but hey :-)

3/ plt.specgram creates spectral maps of your audio. Apart from the image it draws (which can be used to start manually deconstructing your audio file), it returns 4 structures.

eg

ffts,freqs,times,img = plt.specgram(signal,Fs=44100)

ffts is a 2-dimensional array in which each column is the FFT (Fast Fourier Transform) of one time section, and each row corresponds to a frequency bin.

The plain-vanilla specgram analyses time sections 256 samples long, stepping 128 samples forwards each time.

This gives a very low frequency resolution at a pretty fast frame rate.

As separate musical events merge into a single sound at more or less 10 Hz, I decided to use the specgram options to divide the audio into 4096-sample sections (circa 10 Hz frequency resolution at 44100 Hz), stepping forwards every 2048 samples (ie about 20 times a second).

This gives a decent frequency resolution, and with the time sections a 20th of a second apart, it is faster than people can perceive individual notes.
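The numbers behind that choice can be checked directly (assuming a 44100 Hz sample rate):

```python
Fs, NFFT, noverlap = 44100, 4096, 2048
hop = NFFT - noverlap
freq_resolution = Fs / NFFT       # Hz per FFT bin
frames_per_second = Fs / hop      # how often a new time section starts
print(freq_resolution)            # ~10.77 Hz per bin
print(frames_per_second)          # ~21.5 sections per second
```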

This means calling the specgram as follows:

plt.specgram(signal,Fs=44100,NFFT=4096,noverlap=2048,mode='magnitude')

(Note the mode argument: 'magnitude' seems to give me amplitudes between 0 and 0.1. I have an open problem with the FFT not giving me amplitudes on the same scale as the audio signal; you may have seen the question I posted. But here we are...)

4/ Next I decided to get rid of noise in the FFTs returned. This means we can concentrate on frequencies with a decent amplitude, and zero out the noise which (in my experience) is always present in FFTs.

Here is (are) my function(s):

def gate(signal,minAmplitude):
    # zero any element not strictly above minAmplitude;
    # ((a-m)+|a-m|)/2 is a ReLU, so the test is True only when a > minAmplitude
    return np.array([int((((a-minAmplitude)+abs(a-minAmplitude))/2) > 0) * a for a in signal])

Looks a bit crazy, and I'm sure a proper mathematician could come up with something more efficient, but this is the best I could invent. It zeros any frequencies whose amplitude is not above minAmplitude.
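For what it's worth, the ReLU-style expression appears to be equivalent to a plain threshold with np.where, which is also vectorised. A quick check (redefining gate as above so this runs standalone):

```python
import numpy as np

def gate(signal, minAmplitude):
    # the original ReLU-style formulation
    return np.array([int((((a - minAmplitude) + abs(a - minAmplitude)) / 2) > 0) * a
                     for a in signal])

def gate_where(signal, minAmplitude):
    # keep values strictly above the threshold, zero the rest
    return np.where(signal > minAmplitude, signal, 0.0)

x = np.random.default_rng(2).random(1000) * 0.01
print(np.allclose(gate(x, 0.001), gate_where(x, 0.001)))  # the two agree
```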

Here is the relevant code to call it on the ffts returned by plt.specgram. My actual function is more involved, as it is part of a class and references other functions, but this standalone version should be enough:

def fft_noise_gate(ffts,minAmplitude=0.001,check=True):
    '''
    zero the amplitudes of frequencies
    with amplitudes below minAmplitude
    across all the ffts
    check - plot middle fft just because!
    '''
    nffts = ffts.shape[1]
    gated_ffts = []
    for f in range(nffts):
        fft = ffts[...,f]
        # Anyone got a more efficient noise gate formula? Best I could think up!
        fft_gated = gate(fft,minAmplitude)
        gated_ffts.append(fft_gated)
    # transpose back so the result keeps the same (freqs, times) shape as the input
    gated = np.array(gated_ffts).T
    if check:
        # plot the middle fft just to see!
        plt.plot(gated[...,int(nffts/2)])
        plt.show(block=False)
    return gated
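To sanity-check the gating step on its own, here is a toy 2-D array standing in for plt.specgram's output, gated column by column with the same 0.001 default threshold (gate redefined so this runs standalone):

```python
import numpy as np

def gate(signal, minAmplitude):
    return np.array([int((((a - minAmplitude) + abs(a - minAmplitude)) / 2) > 0) * a
                     for a in signal])

rng = np.random.default_rng(3)
ffts = rng.random((5, 4)) * 0.0005   # low-level 'noise' everywhere (5 bins x 4 frames)
ffts[2, :] = 0.05                    # one strong frequency bin across all frames
gated = np.array([gate(ffts[:, f], 0.001) for f in range(ffts.shape[1])]).T
print(np.count_nonzero(gated))       # only the strong bin survives: 4 nonzero entries
```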

This should give you a start. I'm still working on it and will get back to you when I've got further, but if you have any ideas, please share them.

Anyway, my strategy from here is to:

1/ find the peaks (ie the start of any sounds), then

2/ look for ranges of frequencies which rise and fall in unison (ie make up a single sound), and

3/ differentiate them into individual instruments (sound sources, more specifically), and plot the times and amplitudes thereof to create your analysis (score).
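A rough sketch of step 1/, finding peaks as sharp jumps in frame energy (the 4x jump factor and the toy energy track are made-up starting points, not tuned values):

```python
import numpy as np

def find_attacks(energies, jump=4.0, floor=1e-6):
    """Indices of frames whose energy jumps sharply above the previous frame:
    candidate 'start of a sound' points."""
    e = np.maximum(energies, floor)  # avoid dividing by zero in silent frames
    return [i for i in range(1, len(e)) if e[i] / e[i - 1] > jump]

# toy frame-energy track: near-silence, a burst, decay, then a second burst
energies = np.array([0.001, 0.001, 0.5, 0.4, 0.3, 0.01, 0.01, 0.6, 0.5])
print(find_attacks(energies))  # -> [2, 7]
```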

Hope you're having fun with it - I know I am.

As I said any thoughts...

Regards

Tony