AI4Bharat

Data Augmentation

Audiomentations

Python library that implements several waveform and spectrogram transforms for audio data augmentation
Runs on CPU; GPU support is provided by the companion library torch-audiomentations
Speech Augmentation Techniques
SpecAugment: Warps the features, masks blocks of frequency channels, and masks blocks of time steps.
AddBackgroundNoise: Mixes in another sound, e.g. background noise. A folder of background-noise sounds to mix in must be specified. [Datasets available]
ApplyImpulseResponse: Convolves the audio with a random impulse response. [Datasets available]
AddShortNoises: Mixes in various (bursts of overlapping) sounds with random pauses in between.
RoomSimulator: Simulates millions of different room dimensions, a wide distribution of reverberation times and signal-to-noise ratios, and a range of microphone and sound-source locations.
SpeedPerturbation: Changes the speed of the audio signal, producing three versions of the original signal with speed factors of 0.9, 1.0, and 1.1.
PitchShift: Shifts the pitch up or down without changing the tempo.
AddGaussianSNR: Adds Gaussian noise to the input at a random signal-to-noise ratio (SNR) picked uniformly on the decibel scale. This aligns with human hearing, which is more logarithmic than linear.
SevenBandParametricEQ: Adjusts the volume of seven frequency bands with randomized gains (a seven-band parametric equalizer). Because this transform changes the timbre but keeps the overall "class" of the sound the same (depending on the application), it can make ML models more robust to varying frequency spectra.
Vocal Tract Length Perturbation (VTLP): Transforms spectrograms using a random linear warping along the frequency dimension.
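The AddGaussianSNR entry can be made concrete with a small pure-Python sketch. It is not the audiomentations implementation: the helper names `rms` and `add_gaussian_snr` and the default 5 to 40 dB range are assumptions for illustration. Drawing the SNR uniformly in decibels and inverting SNR_dB = 20 log10(signal RMS / noise RMS) fixes the standard deviation of the added noise:

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def add_gaussian_snr(samples, min_snr_db=5.0, max_snr_db=40.0, rng=random):
    """Add white Gaussian noise at an SNR drawn uniformly in decibels.

    Hypothetical helper, not the audiomentations API; the default dB range
    is an illustrative assumption. Returns (noisy_samples, snr_db).
    """
    # Uniform in dB means log-uniform in the signal-to-noise amplitude ratio.
    snr_db = rng.uniform(min_snr_db, max_snr_db)
    # SNR_dB = 20 * log10(signal_rms / noise_rms)  =>  solve for noise_rms.
    noise_rms = rms(samples) / (10 ** (snr_db / 20))
    # Zero-mean Gaussian noise with that RMS (its standard deviation).
    noisy = [x + rng.gauss(0.0, noise_rms) for x in samples]
    return noisy, snr_db
```

For example, `add_gaussian_snr(clean, rng=random.Random(0))` returns a noisy copy plus the SNR that was drawn. Sampling in dB spends as many draws on 5 to 10 dB as on 35 to 40 dB, which is the logarithmic-hearing argument made in the table.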
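The SpeedPerturbation entry describes a three-way copy of each utterance at speed factors 0.9, 1.0, and 1.1. A minimal sketch, assuming plain linear-interpolation resampling (real pipelines typically use a higher-quality resampler), with hypothetical names `change_speed` and `speed_perturb`. Resampling this way changes tempo and pitch together, which is what separates it from TimeStretch and PitchShift:

```python
def change_speed(samples, factor):
    """Play `samples` `factor` times faster by resampling (pitch shifts too).

    factor > 1 shortens the signal, factor < 1 lengthens it. Linear
    interpolation is a simplifying assumption, not a production resampler.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor  # read position in the original signal
        left = int(pos)
        frac = pos - left
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        out.append((1 - frac) * samples[left] + frac * samples[right])
    return out


def speed_perturb(samples, factors=(0.9, 1.0, 1.1)):
    """Return one resampled copy of the signal per speed factor."""
    return {f: change_speed(samples, f) for f in factors}
```

With 100 input samples, the 0.9 copy has 111 samples and the 1.1 copy has 90, so the training set roughly triples in size.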
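The masking part of SpecAugment can be sketched on a spectrogram stored as a list of time frames. This sketch covers frequency and time masking only and omits time warping. The mask counts and maximum widths are illustrative assumptions, masked cells are set to 0.0 on the assumption that features are mean-normalized (so zero equals the mean), and `spec_augment_mask` is a hypothetical name:

```python
import random


def spec_augment_mask(spec, num_freq_masks=2, max_freq_width=8,
                      num_time_masks=2, max_time_width=20, rng=random):
    """SpecAugment-style frequency and time masking (no time warping).

    `spec` is a list of frames, each a list of frequency-bin values, assumed
    mean-normalized so 0.0 is the mean. Mask counts/widths are assumptions.
    Returns a new spectrogram and leaves the input unchanged.
    """
    out = [list(frame) for frame in spec]
    n_frames = len(out)
    n_bins = len(out[0]) if out else 0

    # Frequency masks: zero a band of consecutive bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_bins))
        start = rng.randint(0, n_bins - width)
        for frame in out:
            for f in range(start, start + width):
                frame[f] = 0.0

    # Time masks: zero a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_bins

    return out
```

Masks may overlap, so the masked area is at most the sum of the drawn widths.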
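The ApplyImpulseResponse entry boils down to a convolution with an impulse response (IR) picked at random. A minimal pure-Python sketch, assuming the IRs are already loaded as lists of floats and that the reverberant tail is trimmed so the output keeps the input length; `convolve` and `apply_random_impulse_response` are hypothetical names. Direct convolution is O(N·M), so real IRs with thousands of taps would normally use FFT-based convolution instead:

```python
import random


def convolve(signal, ir):
    """Full linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out


def apply_random_impulse_response(signal, impulse_responses, rng=random):
    """Convolve `signal` with an IR chosen at random from `impulse_responses`.

    Trimming to the input length is an assumption that keeps the audio
    aligned with its transcript timing.
    """
    ir = rng.choice(impulse_responses)
    return convolve(signal, ir)[: len(signal)]
```

For instance, convolving `[1.0, 2.0, 3.0]` with the two-tap IR `[1.0, 0.5]` and trimming gives `[1.0, 2.5, 4.0]`: each sample picks up half of the previous one as a crude echo.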

Experiments

Augmentation 1:
TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
PitchShift(min_semitones=-4, max_semitones=4, p=0.5)
Augmentation 2:
TimeStretch(min_rate=0.9, max_rate=1.1, p=0.3)
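Assuming each configuration above was wrapped in an audiomentations `Compose` (the page does not say so), every transform fires independently with its own probability p. With librosa-style semantics a `TimeStretch` rate r > 1 shortens the audio, so duration scales by 1/r, and a `PitchShift` of s semitones scales frequencies by 2^{s/12}. Under those assumptions the implied proportions and ranges are:

```latex
\begin{aligned}
&\text{Augmentation 1 } (p_{\mathrm{TS}} = p_{\mathrm{PS}} = 0.5):\\
&\quad P(\text{neither}) = (1-0.5)(1-0.5) = 0.25,\quad
P(\text{exactly one}) = 2(0.5)(0.5) = 0.5,\quad
P(\text{both}) = 0.25\\
&\quad r \in [0.8,\ 1.25] \;\Rightarrow\; 1/r \in [0.8,\ 1.25],\qquad
s \in [-4,\ 4] \;\Rightarrow\; 2^{s/12} \in [0.794,\ 1.260]\\
&\text{Augmentation 2 } (p_{\mathrm{TS}} = 0.3):\\
&\quad P(\text{untouched}) = 0.7,\qquad
r \in [0.9,\ 1.1] \;\Rightarrow\; 1/r \in [0.909,\ 1.111]
\end{aligned}
```

Augmentation 1's rate range is symmetric on a log scale (0.8 = 1/1.25), so slow-downs and speed-ups are equally extreme; Augmentation 2 applies a milder, less frequent stretch and leaves about 70% of examples unchanged.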

 