mcsm-benchs: Creating benchmarks of MCS Methods

We introduce a public, open-source, Python-based toolbox for benchmarking multi-component signal (MCS) analysis methods, implemented either in Python or in Matlab.

The goal of this toolbox is to provide the signal-processing community with a common framework that allows researcher-independent comparisons between methods and favors reproducible research.

To make this toolbox as flexible as possible, the methods to compare, the tests, the signal-generation code and the performance-evaluation functions were conceived as separate modules, so that each can be modified independently. The only restriction this poses is that the methods must satisfy some requirements regarding the shape of their input and output parameters.

On the one hand, the tests and the performance-evaluation functions are encapsulated in the class Benchmark. On the other hand, the signals used in the benchmark are generated by the methods of the class SignalBank.

In order to compare different methods with possibly different parameters, we need to set up a few things before running the benchmark. A Benchmark object receives some input parameters to configure the test:

  • task: This can be 'denoising', 'detection', 'component_denoising' or 'inst_frequency'. The first one computes the quality reconstruction factor (QRF) using the output of the method, whereas the second simply consists in detecting whether a signal is present or not. Finally, 'component_denoising' compares the QRF component-wise, and 'inst_frequency' computes the mean squared error of the instantaneous-frequency estimates.

  • N: The length of the signals used in the simulation, i.e. how many samples the signals should have.

  • methods: A dictionary of methods. Each entry of this dictionary is the function that implements one of the methods to test.

  • parameters: A dictionary of parameters. Each entry of this dictionary is an iterable of positional and/or keyword arguments. So that the benchmark knows which parameters should be passed to each method, the keys of this dictionary must be the same as the keys of the methods dictionary. An example of this is shown below.

  • SNRin: A list or tuple of values of SNR to test.

  • repetitions: The number of times the experiment should be repeated with different realizations of noise.

  • signal_ids: A list of signal ids (corresponding to the names of the signals in the class SignalBank) can be passed here in order to test the methods on those signals. Optionally, the user can pass a dictionary where each key is used as an identifier and the corresponding value is a numpy array with a personalized signal (see the sketch below).
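
For example, a minimal sketch of the dictionary option could look as follows (the signals my_sine and my_chirp below are made up for illustration and are not part of the toolbox):

import numpy as np

N = 256
n = np.arange(N)
my_signals = {
    'my_sine':  np.sin(2*np.pi*0.10*n),                        # a pure tone
    'my_chirp': np.sin(2*np.pi*(0.05*n + 0.10*n**2/(2*N))),    # a linear chirp (normalized frequency from 0.05 to 0.15)
}
# These can then be passed to the benchmark as: Benchmark(..., signal_ids=my_signals)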

A dummy test

First, let us define a dummy method for testing. Methods should receive a numpy array of shape (N,), where N is the number of time samples of the signal. Additionally, they can receive any number of positional or keyword arguments, to allow testing different combinations of input parameters. The shape of the output depends on the task (signal denoising or detection). Therefore, the recommended signature of a method is the following:

output = a_method(noisy_signal, *args, **kwargs)

If one sets task='denoising', output should be an (N,) numpy array, i.e. with the same shape as the input parameter noisy_signal, whereas if task='detection', the output should be boolean (0 or False for no signal, and 1 or True otherwise).
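
For instance, a minimal (and purely illustrative) detection method satisfying this signature could be a naive energy detector; the threshold used below is arbitrary and not part of the toolbox:

import numpy as np

def dummy_detector(noisy_signal, *args, **kwargs):
    # Report a detection when the empirical variance of the observation
    # exceeds an arbitrary threshold (for unit-variance noise, a variance
    # well above 1 suggests that a signal is present).
    return bool(np.var(noisy_signal) > 1.5)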

After this, we need to create a dictionary of methods to pass to the Benchmark object at the moment of instantiation.

[4]:
import numpy as np
from numpy import pi as pi
import pandas as pd
from matplotlib import pyplot as plt
from mcsm_benchs.Benchmark import Benchmark
from mcsm_benchs.ResultsInterpreter import ResultsInterpreter
from mcsm_benchs.SignalBank import SignalBank
from utils import spectrogram_thresholding, get_stft

Creating a dictionary of methods

Let’s create a dictionary of methods to benchmark. As an example, we will compare two strategies for spectrogram thresholding. The first one is hard thresholding, in which the thresholding function applied to a coefficient $F$ with threshold $\lambda$ is defined as $\eta_{\lambda}^{\mathrm{hard}}(F) = F\,\mathbb{1}_{\{|F|>\lambda\}}$. The second one is soft thresholding, here defined as $\eta_{\lambda}^{\mathrm{soft}}(F) = \frac{F}{|F|}\,\max(|F|-\lambda,\,0)$.

These two approaches are implemented in the Python function spectrogram_thresholding(signal, lam, fun='hard') (imported above from utils), which receives a signal to clean, a positional argument lam and a keyword argument fun that can be either 'hard' or 'soft'.
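
As a point of reference, the two rules above could be applied element-wise to an array of STFT coefficients as in the sketch below. This is only an illustration of the formulas, not the spectrogram_thresholding implementation from utils, which also computes the STFT and reconstructs the time-domain signal:

import numpy as np

def threshold_coefficients(F, lam, fun='hard'):
    # Element-wise thresholding of (possibly complex) STFT coefficients F.
    absF = np.abs(F)
    if fun == 'hard':
        return F * (absF > lam)                                  # zero-out coefficients with |F| <= lam
    elif fun == 'soft':
        return F * np.maximum(1.0 - lam/(absF + 1e-12), 0.0)     # shrink magnitudes towards zero by lam
    else:
        raise ValueError("fun must be 'hard' or 'soft'")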

Our dictionary of methods will then consist of two methods: hard thresholding and soft thresholding. For now, let’s use a fixed value of lam for each of them (1.0 for the hard and 2.0 for the soft threshold).

[5]:

def method_1(noisy_signal, *args, **kwargs):
    # If additional input parameters are needed, they can be passed in a tuple using
    # *args or **kwargs and then parsed.
    xr = spectrogram_thresholding(noisy_signal, 1.0, fun='hard')
    return xr

def method_2(noisy_signal, *args, **kwargs):
    # If additional input parameters are needed, they can be passed in a tuple using
    # *args or **kwargs and then parsed.
    xr = spectrogram_thresholding(noisy_signal, 2.0, fun='soft')
    return xr

# Create a dictionary of the methods to test.
my_methods = {
    'Method 1': method_1,
    'Method 2': method_2,
    }

The *args and **kwargs arguments in the example above allow us to pass extra parameters to our methods. This is useful for testing a single method with several combinations of input parameters. In order to do this, we should give the Benchmark object a dictionary of parameters. An example of this functionality is shown in the next section. For now, let's leave the input parameter parameters at its default value of None.

Now we are ready to instantiate a Benchmark object and run a test using the proposed methods and parameters. The benchmark constructor receives the name of a task (which defines the performance function of the test), a dictionary of the methods to test, the desired length of the signals used in the simulation, a dictionary of the parameters that should be passed to each method, a list of the SNR values to test, and the number of repetitions to run for each configuration. Once the object is created, use the class method run_test() to start the experiments.

Remark 1: You can use the ``verbosity`` parameter to show more or fewer messages during the progress of the experiments. There are 6 levels of verbosity, from ``verbosity=0`` (indicating just the start and the end of the experiments) to ``verbosity=5`` (showing the progress of each method and parameter combination).

Remark 2: Parallelizing the experiments is also possible, by passing the parameter ``parallelize = True``.

[6]:
benchmark = Benchmark(task = 'denoising',
                        methods = my_methods,
                        N = 256,
                        SNRin = [10,20],
                        repetitions = 1000,
                        signal_ids=['LinearChirp', 'CosChirp',],
                        verbosity=0,
                        parallelize=False)

benchmark.run_test() # Run the test. The results for each variable of the simulation are stored within the Benchmark object.
benchmark.save_to_file('saved_benchmark') # Save the benchmark to a file.
Method run_test() will be deprecated in newer versions. Use run() instead.
Running benchmark...
100%|██████████| 2/2 [00:00<00:00,  2.17it/s]
100%|██████████| 2/2 [00:00<00:00,  2.19it/s]
[6]:
True
[7]:
same_benchmark = Benchmark.load_benchmark('saved_benchmark') # Load the benchmark from a file.

Now the results of the test are stored in the Benchmark object as a nested dictionary. In order to get the results in a human-readable way using a DataFrame, and also for further analysis and reproducibility, we can use the class method get_results_as_df().

[8]:
results_df = benchmark.get_results_as_df() # This formats the results on a DataFrame
results_df
[8]:
Method Parameter Signal_id Repetition 10 20
2000 Method 1 ((), {}) CosChirp 0 11.886836 22.377151
2001 Method 1 ((), {}) CosChirp 1 12.722813 22.764552
2002 Method 1 ((), {}) CosChirp 2 12.397020 22.369574
2003 Method 1 ((), {}) CosChirp 3 11.548922 22.221890
2004 Method 1 ((), {}) CosChirp 4 12.083445 22.151975
... ... ... ... ... ... ...
1995 Method 2 ((), {}) LinearChirp 995 6.838226 17.240144
1996 Method 2 ((), {}) LinearChirp 996 6.713695 18.924030
1997 Method 2 ((), {}) LinearChirp 997 6.418767 18.474438
1998 Method 2 ((), {}) LinearChirp 998 6.711867 18.170891
1999 Method 2 ((), {}) LinearChirp 999 7.773640 19.012253

4000 rows × 6 columns
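
Besides running benchmarks, the SignalBank class and the Benchmark.sigmerge() utility can also be used on their own. For instance, the following cell generates a linear chirp, contaminates it with white Gaussian noise at an SNR of 15 dB, applies hard thresholding and plots the original and recovered signals: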

[9]:
sb = SignalBank(N=1024)
s = sb.signal_linear_chirp()
noise = np.random.randn(1024)
x = Benchmark.sigmerge(s, noise, 15)

xr = spectrogram_thresholding(x,1.0,'hard')

fig,ax = plt.subplots(1,1)
ax.plot(s,label='Original Signal')
ax.plot(xr,alpha=0.5,label='Recovered Signal')
ax.legend()
[9]:
<matplotlib.legend.Legend at 0x73e23f1644f0>
../_images/notebooks_demo_benchmark_11_1.png

As we can see, the DataFrame shows the results organized in columns. The first column corresponds to the method identifier, whose values are taken from the keys of the dictionary of methods. The second column enumerates the parameters used (more on this in the next section). The third column corresponds to the signal identifier, using the signal names from the SignalBank class. The next column shows the repetition number of the experiment. Finally, the remaining columns show the results obtained for each of the SNR values used in the experiments. Since task = 'denoising', these values correspond to the QRF computed as QRF = 10*np.log10(E(s)/E(s-sr)), where E(x) is the energy of x, and s and sr are the noiseless signal and the reconstructed signal, respectively.
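
As a quick sanity check, the QRF of the reconstruction shown in the figure above can be computed directly from s and xr (defined in the previous cell):

def qrf(s, s_r):
    # Quality Reconstruction Factor in dB: energy of the clean signal over
    # the energy of the reconstruction error.
    return 10*np.log10(np.sum(s**2)/np.sum((s - s_r)**2))

print(f'QRF = {qrf(s, xr):.2f} dB')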

Passing different parameters to the methods.

It is common that a method depends on certain input parameters (thresholds, multiplicative factors, etc.). Therefore, it is useful to be able to repeat the tests with different parameters, instead of creating multiple versions of one method. We can pass an array of parameters to a method, provided it parses them internally. In order to indicate to the benchmark which parameter combinations should be given to each method, a dictionary of parameters can be passed.

Let us now create this dictionary. The parameter combinations should be given as an iterable of tuples, where each element has the form ((positional_args), {keyword_args}) and is passed as the additional parameters of the method (the corresponding method, of course, should implement how to deal with the variable number of input parameters). For this to work, the keys of this dictionary should be the same as those of the methods dictionary.

We can now see in more detail how to pass different parameters to our methods. For instance, spectrogram_thresholding() depends on a threshold lam (a positional argument) and on the thresholding function fun (a keyword argument).

Now let us create methods that wrap this function and then define the dictionary of methods for our benchmark. Notice that each method simply forwards *args and **kwargs to the wrapped function.

[10]:
def method_1(noisy_signal, *args, **kwargs):
    # If additional input parameters are needed, they can be passed in a tuple using
    # *args or **kwargs and then parsed.
    xr = spectrogram_thresholding(noisy_signal, *args, **kwargs)
    return xr


def method_2(noisy_signal, *args, **kwargs):
    # If additional input parameters are needed, they can be passed in a tuple using
    # *args or **kwargs and then parsed.
    xr = spectrogram_thresholding(noisy_signal, *args, **kwargs)
    return xr

# Create a dictionary of the methods to test.
my_methods = {
    'Method 1': method_1,
    'Method 2': method_2,
    }

Having done this, we can define the different combinations of parameters using the corresponding dictionary:

[11]:
# Create a dictionary of the different combinations of thresholds to test.
# Remember the keys of this dictionary should be same as the methods dictionary.
my_parameters = {
    'Method 1': [((thr,),{'fun': 'hard',}) for thr in np.arange(1.0,4.0,1.0)],
    'Method 2': [((thr,),{'fun': 'soft',}) for thr in np.arange(1.0,4.0,1.0)],
}

print(my_parameters['Method 1'])
[((1.0,), {'fun': 'hard'}), ((2.0,), {'fun': 'hard'}), ((3.0,), {'fun': 'hard'})]

So now we have three combinations of input parameters for each method, which will be passed one by one so that all the experiments are carried out for each combination. Let us set up the benchmark and run a test using this new configuration of methods and parameters. After that, we can use the Benchmark class method get_results_as_df() to obtain a table with the results, as before:

[12]:
benchmark = Benchmark(task = 'denoising',
                    methods = my_methods,
                    parameters=my_parameters,
                    N = 256,
                    SNRin = [10,20,30],
                    repetitions = 15,
                    signal_ids=['LinearChirp', 'CosChirp',],
                    verbosity=0,
                    parallelize=False,
                    write_log=True,
                    )

benchmark.run() # Run the benchmark.
benchmark.save_to_file('saved_benchmark') # Save the benchmark to a file.
Running benchmark...
100%|██████████| 3/3 [00:00<00:00, 41.75it/s]
100%|██████████| 3/3 [00:00<00:00, 35.88it/s]
[12]:
True
[13]:
benchmark = Benchmark.load_benchmark('saved_benchmark')
benchmark.log
[13]:
[]

The experiments have been repeated for every combination of parameters, listed in the second column of the table as Parameter.

[14]:
results_df = benchmark.get_results_as_df() # This formats the results on a DataFrame

Generating plots with the Results Interpreter.

[15]:
# Summary interactive plots with Plotly and a report.
from plotly.offline import  iplot
interpreter = ResultsInterpreter(benchmark)
interpreter.save_report(bars=True)

[15]:
True
[16]:

figs = interpreter.get_summary_plotlys(bars=True)
for fig in figs:
    iplot(fig)

[Two interactive Plotly summary figures, not rendered in this text export.]

Checking elapsed time for each method

[17]:
df = interpreter.elapsed_time_summary()
df
[17]:
Average time (s) Std (s)
CosChirp-Method 1-((1.0,), {'fun': 'hard'}) 0.000252 0.000106
CosChirp-Method 1-((2.0,), {'fun': 'hard'}) 0.000196 0.000015
CosChirp-Method 1-((3.0,), {'fun': 'hard'}) 0.000198 0.000019
CosChirp-Method 2-((1.0,), {'fun': 'soft'}) 0.000207 0.000015
CosChirp-Method 2-((2.0,), {'fun': 'soft'}) 0.000178 0.000007
CosChirp-Method 2-((3.0,), {'fun': 'soft'}) 0.000253 0.000024
LinearChirp-Method 1-((1.0,), {'fun': 'hard'}) 0.000169 0.000015
LinearChirp-Method 1-((2.0,), {'fun': 'hard'}) 0.000233 0.000087
LinearChirp-Method 1-((3.0,), {'fun': 'hard'}) 0.000220 0.000054
LinearChirp-Method 2-((1.0,), {'fun': 'soft'}) 0.000209 0.000015
LinearChirp-Method 2-((2.0,), {'fun': 'soft'}) 0.000210 0.000011
LinearChirp-Method 2-((3.0,), {'fun': 'soft'}) 0.000226 0.000036
[18]:
interpreter.rearrange_data_frame()
[18]:
SNRin Method Parameter Signal_id Repetition QRF
0 10 Method 1 ((1.0,), {'fun': 'hard'}) CosChirp 0 11.886836
1 10 Method 1 ((1.0,), {'fun': 'hard'}) CosChirp 1 12.722813
2 10 Method 1 ((1.0,), {'fun': 'hard'}) CosChirp 2 12.397020
3 10 Method 1 ((1.0,), {'fun': 'hard'}) CosChirp 3 11.548922
4 10 Method 1 ((1.0,), {'fun': 'hard'}) CosChirp 4 12.083445
... ... ... ... ... ... ...
535 30 Method 2 ((3.0,), {'fun': 'soft'}) LinearChirp 10 25.606940
536 30 Method 2 ((3.0,), {'fun': 'soft'}) LinearChirp 11 25.779403
537 30 Method 2 ((3.0,), {'fun': 'soft'}) LinearChirp 12 25.338387
538 30 Method 2 ((3.0,), {'fun': 'soft'}) LinearChirp 13 23.843373
539 30 Method 2 ((3.0,), {'fun': 'soft'}) LinearChirp 14 25.568327

540 rows × 6 columns