Advanced Visual Search Pipelines


Let's take a deep dive into the Scene and Frame objects.

Scene

A Scene object describes a unique event in the video. From a timeline perspective, it's a timestamp range.
info
video_id : ID of the parent video object
start : start time in seconds
end : end time in seconds
description : string description
Each Scene object has a frames attribute, which holds a list of Frame objects.

Frame

Each Scene can be described by a list of frames. Each Frame object primarily carries the URL of the image and its description.
info
id : ID of the frame object
url : URL of the image
frame_time : timestamp of the frame in the video
description : string description
video_id : ID of the parent video object
scene_id : ID of the parent scene object
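To make the hierarchy concrete, here is a minimal sketch of the Scene/Frame relationship using plain Python dataclasses. The class names mirror the SDK objects above, but these local stand-ins (and the sample IDs and URLs) are illustrative only, not the actual videodb SDK classes.

```python
from dataclasses import dataclass, field
from typing import List

# Local stand-ins mirroring the attributes listed above
# (illustration only -- the real classes come from the videodb SDK).

@dataclass
class Frame:
    id: str
    url: str
    frame_time: float   # seconds into the video
    description: str
    video_id: str
    scene_id: str

@dataclass
class Scene:
    video_id: str
    start: float        # seconds
    end: float          # seconds
    description: str
    frames: List[Frame] = field(default_factory=list)

scene = Scene(video_id="v-1", start=0.0, end=30.0, description="intro shot")
scene.frames.append(
    Frame(id="f-1", url="https://example.com/f1.jpg", frame_time=15.0,
          description="presenter on stage", video_id="v-1", scene_id="s-1")
)
# Each frame sits inside its parent scene's timestamp range
print(scene.frames[0].frame_time)
```

Note how a frame knows both its video_id and its scene_id, so frame-level descriptions can always be traced back to the scene they belong to.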


We provide easy-to-use objects and functions to bring flexibility to designing your visual understanding pipeline. With these tools, you have the freedom to:
Extract scenes according to your use case.
Go down to frame-level abstraction.
Assign labels and custom model descriptions to each frame.
Use multiple models and prompts for each scene or frame to convert visual information into text.
Send multiple frames to a vision model for better temporal activity understanding.

extract_scenes()

This function accepts an extraction_type and extraction_config and returns a SceneCollection object, which keeps information about all the extracted scenes. Check out the Scene Extraction Algorithms guide for more details.
from videodb import SceneExtractionType

scene_collection = video.extract_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 30, "select_frames": ["middle"]},
)

Capture Temporal Change

Vision models excel at describing images, but videos add complexity because the information changes over time. With our pipeline, you can maintain image-level understanding in frames and combine them using LLMs at the scene level to capture temporal or activity-related understanding.
You are free to iterate through each scene and frame to describe the information for indexing purposes.
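As one way to picture that combination step, here is a hedged sketch: summarize_frames and its llm parameter are hypothetical names, standing in for whatever text model you call to merge per-frame descriptions into one scene-level description.

```python
def summarize_frames(frame_descriptions, llm=None):
    """Combine per-frame descriptions into one scene-level description.

    `llm` is a placeholder for any text-model callable (hypothetical, not
    part of the videodb SDK); when it is None we fall back to simple
    ordered concatenation so the flow is still visible and runnable.
    """
    joined = " -> ".join(frame_descriptions)
    prompt = f"Describe the activity across these frames: {joined}"
    if llm is not None:
        return llm(prompt)   # e.g. a call into your LLM of choice
    return joined            # naive fallback: ordered concatenation

frames = ["a person picks up a ball",
          "the person throws the ball",
          "a dog catches the ball"]
print(summarize_frames(frames))
```

Because the frame descriptions are ordered by frame_time, even the naive fallback preserves the temporal sequence; a real LLM call would compress it into an activity-level summary.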

Get scene collection
scene_collection = video.get_scene_collection("scene_collection_id")

Iterate through each scene and frame

Iterate over scenes and frames and attach descriptions coming from an external pipeline, be it a custom CV pipeline or custom model output.
print("This is scene collection id", scene_collection.id)
print("This is scene collection config", scene_collection.config)

# Get scenes from the collection
scenes = scene_collection.scenes

# Iterate through each scene
for scene in scenes:
    print(f"Scene Duration {scene.start}-{scene.end}")
    # Iterate through each frame in the scene
    for frame in scene.frames:
        print(f"Frame at {frame.frame_time} {frame.url}")
        frame.description = "bring text from external sources/pipeline"

Create Scene by custom annotation

These annotations can come from your application or from an external vision model, if you extract the description using any vision LLM.
for scene in scenes:
    scene.description = "summary of frame-level descriptions"

Using this pipeline, you have the freedom to design your own flow. In the example above, we've described each frame in the scene independently, but some vision models accept multiple images in one go as well. Feel free to customise your flow to your needs:
Experiment with sending multiple frames to a vision model.
Use prompts to describe multiple frames, then assign the resulting descriptions to the scene.
Integrate your own vision model into the pipeline.
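For instance, batching frames before sending them to a vision model could look like the following sketch. describe_scene, vision_model, and the stub below are hypothetical stand-ins for your own code, not videodb SDK functions.

```python
def chunked(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def describe_scene(frame_urls, vision_model, batch_size=4):
    """Send frames to a vision model in batches and merge the answers.

    `vision_model` is a placeholder callable (an assumption, not an SDK
    API) that takes a list of image URLs and returns one text
    description covering the whole group.
    """
    parts = [vision_model(batch) for batch in chunked(frame_urls, batch_size)]
    return " ".join(parts)

# Stub model so the flow is runnable without any external service
stub_model = lambda batch: f"{len(batch)} frames described"
print(describe_scene([f"frame_{i}.jpg" for i in range(10)], stub_model))
```

Batching lets the model see adjacent frames together, which is what gives it a chance at temporal understanding rather than a series of disconnected single-image captions.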

We'll soon be adding more details and strategies for effective and advanced multimodal search. We welcome your input on which strategies have worked best in your specific use cases.
Here's our 🎙️ channel where we brainstorm about such ideas.

Once you have a description of each scene in place, you can index and search the information using the following functions.
from videodb import IndexType

# Create a new index and assign a name to it
index_id = video.index_scenes(scenes=scenes, name="My Custom Model")

# Search using the index_id
res = video.search(
    query="first 29 sec",
    index_type=IndexType.scene,
    index_id=index_id,
)

res.play()

 