VideoDB Documentation

Overlay a Word-Counter on Video Stream

Introduction

With an endless stream of new video content on our feeds, engaging the audience with dynamic visual elements can make educational and promotional videos much more impactful. VideoDB's suite of features allows you to enhance videos with programmatic editing.

In this tutorial, we'll create a video that counts and displays each instance of a specified word as it's spoken. We'll index the video's spoken words, use VideoDB's Keyword Search to locate every occurrence, and then apply audio and text overlays so that a counter updates in real time with synchronized audio cues.

Setup

📦 Installing packages

%pip install videodb

🔑 API Keys

Before proceeding, ensure you have access to VideoDB. Get your API key from the VideoDB Console (free for the first 50 uploads, no credit card required). 🎉

import os
from getpass import getpass

import videodb

# Prompt user for API key securely
api_key = getpass("Please enter your VideoDB API Key: ")
os.environ["VIDEO_DB_API_KEY"] = api_key
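When the notebook is re-run, prompting for the key every time can get tedious. The sketch below shows one possible way to prompt only when the environment variable is missing. `resolve_api_key` is a hypothetical helper written for this illustration and is not part of the `videodb` package; the variable name is the one used in the code above.

```python
import os
from getpass import getpass

ENV_VAR = "VIDEO_DB_API_KEY"  # same variable name the tutorial sets


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the API key, prompting only when the environment lacks one.

    Hypothetical helper for illustration; not part of the videodb package.
    """
    key = env.get(ENV_VAR, "").strip()
    if not key:
        # Nothing stored yet: ask once, then cache it for later cells
        key = prompt("Please enter your VideoDB API Key: ").strip()
        if not key:
            raise ValueError("An API key is required to continue.")
        env[ENV_VAR] = key
    return key
```

Calling `resolve_api_key()` with no arguments reads and updates `os.environ`; the `env` and `prompt` parameters exist only to make the helper easy to exercise without a terminal.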


Steps

🌐 Step 1: Connect to VideoDB

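Import `connect` from the VideoDB library and establish a session. `connect()` picks up the API key from the `VIDEO_DB_API_KEY` environment variable, and `get_collection()` returns the collection that will hold the uploaded video.

from videodb import connect

conn = connect()
coll = conn.get_collection()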

🗳️ Step 2: Upload Video

Upload the video and play it to confirm that it loaded correctly. The YouTube video referenced in the code below is used throughout this tutorial.

video = coll.upload(url="https://www.youtube.com/watch?v=Js4rTM2Z1Eg")
video.play()

📝 Step 3: Indexing Spoken Words

Index the video to identify and timestamp all spoken words.

video.index_spoken_words()

🔍 Step 4: Keyword Search

Search within the video for the keyword ("education" in this example), and note each occurrence.

from videodb import SearchType

result = video.search(query="education", search_type=SearchType.keyword)
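Before building overlays, it can help to list where the keyword was found. The following is a minimal sketch that formats occurrence times as `MM:SS`. It runs on a stand-in `StubShot` class rather than real search results; `StubShot`, `format_timestamp`, and `list_occurrences` are hypothetical names, and the `end` field is an assumption (the tutorial itself only relies on `shot.start`).

```python
from dataclasses import dataclass


@dataclass
class StubShot:
    """Stand-in for a search-result shot; holds only the fields used here."""
    start: float
    end: float  # assumed field; the tutorial code only reads `start`


def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def list_occurrences(shots):
    """Return one 'MM:SS-MM:SS' string per shot, ordered by start time."""
    ordered = sorted(shots, key=lambda s: s.start)
    return [
        f"{format_timestamp(s.start)}-{format_timestamp(s.end)}"
        for s in ordered
    ]


shots = [StubShot(125.8, 127.1), StubShot(12.3, 13.0)]
for line in list_occurrences(shots):
    print(line)
```

With real results, the same function could be tried on `result.shots`, provided each shot exposes both `start` and `end`.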

🎼 Step 5: Setup Timeline and Audio

Initialize the timeline and prepare an audio asset to use for each word occurrence.

from videodb.editor import Timeline, Track, Clip, AudioAsset, VideoAsset, TextAsset
from videodb.editor import Font, Background, Alignment, HorizontalAlignment, VerticalAlignment, Position, Offset
from videodb import MediaType

timeline = Timeline(conn)

# Upload the twink sound effect
audio = conn.upload(url="https://github.com/video-db/videodb-cookbook-assets/raw/main/audios/twink.mp3", media_type=MediaType.audio)

💬 Step 6: Overlay Text and Audio

Add text and audio overlays at each instance where the word is spoken using the `Track` and `Clip` pattern.

Note: The `audio_offset` delay is optional. It shifts each counter update and audio cue by one second so that they line up better with the spoken word.

video_duration = min(300, int(video.length))  # First 5 minutes only
audio_offset = 1  # Delay audio/text update by 1 second for better sync

# Create timeline and tracks
timeline = Timeline(conn)
video_track = Track()
text_track = Track()
audio_track = Track()

# Add video clip (first 5 minutes)
video_clip = Clip(
    asset=VideoAsset(id=video.id, start=0),
    duration=video_duration
)
video_track.add_clip(0, video_clip)

# Filter shots within our duration
shots_in_range = [s for s in result.shots if int(s.start) + audio_offset < video_duration]

# Add text overlays that update at each word occurrence
for i, shot in enumerate(shots_in_range):
    trigger_time = int(shot.start) + audio_offset

    # Initial "Count-0" from start until first word
    if i == 0 and trigger_time > 0:
        text_asset = TextAsset(
            text="Count-0",
            font=Font(family="Do Hyeon", size=72, color="#000100"),
            background=Background(color="#F702A4", opacity=1.0),
            alignment=Alignment(horizontal=HorizontalAlignment.right, vertical=VerticalAlignment.top),
        )
        text_clip = Clip(asset=text_asset, duration=trigger_time,
                         position=Position.top_right, offset=Offset(x=-0.05, y=0.05))
        text_track.add_clip(0, text_clip)

    # Duration until next word or end of video
    if i + 1 < len(shots_in_range):
        next_trigger = int(shots_in_range[i + 1].start) + audio_offset
    else:
        next_trigger = video_duration
    text_dur = next_trigger - trigger_time

    # Text overlay with updated count
    text_asset = TextAsset(
        text=f"Count-{i + 1}",
        font=Font(family="Do Hyeon", size=72, color="#000100"),
        background=Background(color="#F702A4", opacity=1.0),
        alignment=Alignment(horizontal=HorizontalAlignment.right, vertical=VerticalAlignment.top),
    )
    text_clip = Clip(asset=text_asset, duration=text_dur, position=Position.top_right, offset=Offset(x=-0.05, y=0.05))
    text_track.add_clip(trigger_time, text_clip)

    # Audio cue at same trigger time
    if trigger_time < video_duration - 2:
        audio_clip = Clip(asset=AudioAsset(id=audio.id), duration=2)
        audio_track.add_clip(trigger_time, audio_clip)

# Add all tracks to timeline
timeline.add_track(video_track)
timeline.add_track(text_track)
timeline.add_track(audio_track)
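The scheduling arithmetic in this step can be checked in isolation, without calling VideoDB. The sketch below is one way to compute the counter segments and audio cue times from a list of occurrence start times. `plan_counter` is a hypothetical helper, not a VideoDB function. It follows the loop above, with two deliberate differences that are assumptions of this sketch: it skips zero-length segments when two occurrences fall in the same second, and it emits at most one audio cue per second.

```python
def plan_counter(starts, video_duration, offset=1, cue_length=2):
    """Plan counter labels and audio cues from keyword start times.

    Returns (segments, cues). Each segment is a (start, duration, label)
    tuple; each cue is a start time at which a cue of `cue_length` seconds
    still fits before the clip ends. Hypothetical helper for illustration.
    """
    # Shift every occurrence by the offset, keep those inside the clip,
    # and sort so the labels count upward in time order.
    triggers = sorted(int(s) + offset for s in starts)
    triggers = [t for t in triggers if t < video_duration]

    segments = []
    if triggers and triggers[0] > 0:
        # Show "Count-0" from the start until the first occurrence
        segments.append((0, triggers[0], "Count-0"))
    for i, start in enumerate(triggers):
        end = triggers[i + 1] if i + 1 < len(triggers) else video_duration
        if end > start:  # skip zero-length segments from same-second hits
            segments.append((start, end - start, f"Count-{i + 1}"))

    # At most one cue per second, and only where the full cue fits
    cues = sorted({t for t in triggers if t < video_duration - cue_length})
    return segments, cues


segments, cues = plan_counter([4.2, 9.9, 9.1, 29.5], video_duration=30)
for start, duration, label in segments:
    print(start, duration, label)
print(cues)
```

Each segment maps onto one text clip in the tutorial code: `duration` becomes the `Clip` duration and `start` becomes the first argument to `text_track.add_clip`. Each cue time plays the same role for `audio_track.add_clip`.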

⚡️ Step 7: Generate and Play the Stream

Finally, generate a streaming URL for your edited video and play it.

from videodb import play_stream

stream_url = timeline.generate_stream()
play_stream(stream_url)

The resulting stream shows the counter updating at each occurrence of the word "education".

Conclusion

This tutorial showed how to use VideoDB to build a video that programmatically counts and displays how often a specific keyword is spoken. The same approach can be adapted to other applications where dynamic text overlays add value to video content.