AI4Bharat

IndicASR

Data Augmentation

Collection of data augmentation libraries: https://github.com/AgaMiko/data-augmentation-review/blob/master/README.md#Audio

Audiomentations (`https://github.com/iver56/audiomentations`)

Implements several waveform and spectrogram transform techniques

Runs on CPU; GPU support is available through the companion torch-audiomentations library

Speech Augmentation Techniques


| Name | Description | Implementation | Paper (if any) | Augmentation Type |
| --- | --- | --- | --- | --- |
| SpecAugment | Warps the features, masks blocks of frequency channels, and masks blocks of time steps. | | | Spectrogram |
| AddBackgroundNoise | Mixes in another sound, e.g. a background noise. A folder of background-noise sounds to mix in must be specified. [Datasets available] | | | Waveform |
| ApplyImpulseResponse | Convolves the audio with a random impulse response. [Datasets available] | | | Waveform |
| AddShortNoises | Mixes in various (bursts of overlapping) sounds with random pauses between them. | | | Waveform |
| RoomSimulator | Simulates millions of different room dimensions, a wide distribution of reverberation times and signal-to-noise ratios, and a range of microphone and sound source locations. | | | Waveform |
| SpeedPerturbation | Changes the speed of the audio signal, producing 3 versions of the original signal with speed factors of 0.9, 1.0 and 1.1. | | | Waveform |
| PitchShift | Shifts the pitch of the sound up or down without changing the tempo. | | | Waveform |
| AddGaussianSNR | Adds Gaussian noise to the input. A random signal-to-noise ratio (SNR) is picked uniformly on the decibel scale, which aligns with human hearing (more logarithmic than linear). | | | Waveform |
| SevenBandParametricEQ | Adjusts the volume of seven frequency bands. Because this changes the timbre but keeps the overall "class" of the sound the same (depending on application), it can be used to make ML models more robust to varying frequency spectra. | | | Waveform |
| Vocal Tract Length Perturbation (VTLP) | Transforms spectrograms using a random linear warping along the frequency dimension. | | | Spectrogram |
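A few of the techniques listed above are simple enough to sketch directly in NumPy, which may help make the descriptions concrete. The block below is only an illustration under stated assumptions, not the audiomentations implementation: the function names, the mask widths, and the 5 to 40 dB SNR range are hypothetical choices, the speed change uses plain linear interpolation rather than a proper band-limited resampler, and the SpecAugment sketch covers only the two masking steps (no warping).

```python
import numpy as np


def speed_perturb(samples, factor):
    """Resample so the signal plays `factor` times faster (changes pitch too).

    Assumption: linear interpolation stands in for a real resampler.
    """
    n_out = int(round(len(samples) / factor))
    old_idx = np.arange(len(samples))
    new_idx = np.linspace(0, len(samples) - 1, num=n_out)
    return np.interp(new_idx, old_idx, samples).astype(samples.dtype)


def add_gaussian_snr(samples, min_snr_db=5.0, max_snr_db=40.0, rng=None):
    """Add Gaussian noise at an SNR drawn uniformly on the decibel scale."""
    rng = np.random.default_rng() if rng is None else rng
    snr_db = rng.uniform(min_snr_db, max_snr_db)
    signal_rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
    # SNR_dB = 20 * log10(signal_rms / noise_rms), solved for noise_rms
    noise_rms = signal_rms / (10 ** (snr_db / 20))
    noise = rng.normal(0.0, noise_rms, size=samples.shape)
    return (samples + noise).astype(samples.dtype), snr_db


def mask_spectrogram(spec, max_freq_width=8, max_time_width=20, rng=None):
    """Zero one random block of frequency channels and one of time steps.

    `spec` is a 2-D array shaped (frequency, time).
    """
    rng = np.random.default_rng() if rng is None else rng
    masked = spec.copy()
    n_freq, n_time = masked.shape
    f = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    masked[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    masked[:, t0:t0 + t] = 0.0
    return masked


rng = np.random.default_rng(0)
wave = rng.uniform(-0.5, 0.5, size=16000).astype(np.float32)  # 1 s at an assumed 16 kHz

# Three versions of the signal, as in the SpeedPerturbation row
versions = {factor: speed_perturb(wave, factor) for factor in (0.9, 1.0, 1.1)}

noisy, snr_db = add_gaussian_snr(wave, rng=rng)

spec = rng.random((80, 200))  # stand-in for an 80-bin feature matrix
masked = mask_spectrogram(spec, rng=rng)
```

Each function takes an optional `rng` so that a seeded generator can make the augmentation reproducible during debugging.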


Experiments

Augmentation 1:

TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),

PitchShift(min_semitones=-4, max_semitones=4, p=0.5)

Augmentation 2:

TimeStretch(min_rate=0.9, max_rate=1.1, p=0.3)
