SpecAugment
Operates directly on the spectrogram: warps the features, masks blocks of frequency channels, and masks blocks of time steps.
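The two masking steps above can be sketched with NumPy. This is a minimal illustration, not the reference SpecAugment implementation: the function name `mask_spectrogram` and the width limits are assumptions chosen for the example, and the warping step is omitted.

```python
import numpy as np

def mask_spectrogram(spec, max_freq_width=8, max_time_width=20, rng=None):
    """Zero out one random block of frequency channels and one of time steps.

    spec: 2-D array shaped (n_freq_channels, n_time_steps).
    The width limits are illustrative defaults, not values from the source.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's array untouched
    n_freq, n_time = out.shape

    # Frequency mask: draw a width, then a start index so the block fits.
    f = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: same idea along the time axis.
    t = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out

spec = np.ones((40, 100))
masked = mask_spectrogram(spec, rng=np.random.default_rng(0))
```

Masking with zeros assumes a mean-normalized spectrogram; masking with the spectrogram mean is a common alternative.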
AddBackgroundNoise
Mix in another sound, e.g. background noise. A folder of background noise sounds to mix in must be specified. [Datasets available]
ApplyImpulseResponse
Convolve the audio with a random impulse response. [Datasets available]
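The convolution step can be sketched as follows. This is only an illustration under stated assumptions: `apply_impulse_response` is a hypothetical helper, the impulse responses are passed in as arrays rather than loaded from a dataset folder, and the peak rescaling is a design choice of this sketch, not something the source specifies.

```python
import numpy as np

def apply_impulse_response(signal, impulse_responses, rng=None):
    """Convolve `signal` with one randomly chosen impulse response.

    impulse_responses: non-empty list of 1-D arrays (assumed already
    loaded and at the same sample rate as `signal`).
    """
    rng = np.random.default_rng() if rng is None else rng
    ir = impulse_responses[int(rng.integers(len(impulse_responses)))]

    # Full convolution is longer than the input; keep the original length.
    wet = np.convolve(signal, ir)[: len(signal)]

    # Rescale so the output peak matches the input peak (avoids clipping).
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(signal)) / peak)
    return wet

dry = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)
same = apply_impulse_response(dry, [np.array([1.0])])  # unit impulse: no change
```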
AddShortNoises
Mix in various short sounds (bursts that may overlap) with random pauses between them.
RoomSimulator
Simulates millions of different room dimensions, a wide distribution of reverberation times and signal-to-noise ratios, and a range of microphone and sound source locations.
SpeedPerturbation
Change the speed of the audio signal, producing 3 versions of the original signal with speed factors of 0.9, 1.0 and 1.1
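A rough sketch of producing the three speed-perturbed versions is shown below. It resamples with linear interpolation to stay dependency-light; that is a simplification assumed for this example (a band-limited resampler is the usual choice in practice), and the function names are hypothetical.

```python
import numpy as np

def speed_perturb(signal, factor):
    """Resample so the signal plays `factor` times faster (pitch shifts too)."""
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(signal)), signal)

def three_speed_versions(signal, factors=(0.9, 1.0, 1.1)):
    """Return one resampled copy of `signal` per speed factor."""
    return [speed_perturb(signal, f) for f in factors]

tone = np.sin(2 * np.pi * 5 * np.arange(1000) / 1000)
slow, same, fast = three_speed_versions(tone)
```

A factor of 0.9 yields a longer, slower signal, and 1.1 a shorter, faster one; the 1.0 copy is the unmodified original.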
PitchShift
Pitch shift the sound up or down without changing the tempo
AddGaussianSNR
Add Gaussian noise to the input. A random signal-to-noise ratio (SNR) is picked uniformly on the decibel scale. This aligns with human hearing, which is closer to logarithmic than linear.
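The SNR sampling described above can be sketched like this. The function name and the default SNR range are assumptions made for the example, not settings taken from the source.

```python
import numpy as np

def add_gaussian_snr(signal, min_snr_db=5.0, max_snr_db=40.0, rng=None):
    """Add Gaussian noise at an SNR drawn uniformly in decibels.

    Returns the noisy signal and the SNR (in dB) that was drawn.
    The default range is illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    snr_db = rng.uniform(min_snr_db, max_snr_db)

    # SNR_dB = 20 * log10(signal_rms / noise_rms), solved for noise_rms.
    signal_rms = np.sqrt(np.mean(signal ** 2))
    noise_rms = signal_rms / (10.0 ** (snr_db / 20.0))

    noise = rng.normal(0.0, noise_rms, size=signal.shape)
    return signal + noise, snr_db

clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy, drawn_snr_db = add_gaussian_snr(clean, 10.0, 30.0, rng=np.random.default_rng(0))
```

Sampling uniformly in decibels rather than in linear amplitude spreads the examples evenly across perceptually distinct noise levels.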
SevenBandParametricEQ
Adjusts the level of seven frequency bands with a parametric equalizer. Because this changes the timbre but keeps the overall "class" of the sound the same (depending on the application), it can be used for data augmentation to make ML models more robust to varying frequency spectra.
Vocal Tract Length Perturbation (VTLP)
Transforms spectrograms by applying a random linear warp along the frequency dimension.
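A simplified sketch of the frequency warp follows. It uses a single linear scaling of the bin index with clipping at the top of the spectrum; published VTLP variants typically use a piecewise-linear warp, so treat this as an assumption-laden illustration. The names `vtlp_warp` and `random_vtlp` and the 0.9 to 1.1 range are hypothetical.

```python
import numpy as np

def vtlp_warp(spec, alpha):
    """Linearly warp the frequency axis of `spec` by the factor `alpha`.

    spec: 2-D array shaped (n_freq_bins, n_time_steps).
    alpha > 1 moves spectral content toward higher bins.
    """
    n_freq, n_time = spec.shape
    bins = np.arange(n_freq)
    # Output bin k reads from input position k / alpha, clipped to valid bins.
    src = np.clip(bins / alpha, 0, n_freq - 1)
    out = np.empty_like(spec, dtype=float)
    for t in range(n_time):
        out[:, t] = np.interp(src, bins, spec[:, t])
    return out

def random_vtlp(spec, low=0.9, high=1.1, rng=None):
    """Apply vtlp_warp with a warp factor drawn uniformly from [low, high]."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.uniform(low, high)
    return vtlp_warp(spec, alpha), alpha

peak_spec = np.zeros((64, 5))
peak_spec[20, :] = 1.0
warped, alpha = random_vtlp(peak_spec, rng=np.random.default_rng(0))
```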