VideoDB Documentation

Scene Extraction Algorithms

SceneExtractionType

A video is a series of images called frames. These frames can be processed using multimodal models or computer vision pipelines. There are many ways to identify how concepts change over time in a video.

SceneExtractionType and extraction_config control how scenes are identified. Both can be passed as arguments to either of two functions:

The index_scenes() function.

The extract_scenes() function.

Check out Advanced Visual Search Pipelines for Scene and Frame object details.

Time-based extraction is a simple way to break a video into scenes. You define the interval at which the video is split into scenes; for example, you may treat every 10 seconds as one scene. This method is useful when you have no information about the nature of the video, or when the video is random and dynamic. You can even create scenes at a 1-second interval.

This method has the following extraction_config options:

time: The interval (in seconds) at which scenes are segmented. Default value is 10, which means each 10-second segment is a scene.

frame_count: The number of frames to extract per scene. This allows you to increase the number of frames collected for more context. Default value is 1.

select_frames: A list of frames to select from each segment. The list can contain any of the strings "first", "middle", or "last", which select the respective frames. Default value is ["first"].

Note: Use either the select_frames or the frame_count strategy to extract frames for a scene.

from videodb import SceneExtractionType

# traffic_video is assumed to be a Video object that has already been uploaded.
wait_index = traffic_video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 4, "frame_count": 5},
    prompt="Identify when multiple cars are slowing down or waiting. Mention that cars are waiting or stopping and also specify the lane as left, middle, or right. For example, you can say `cars in the middle lanes are waiting`.",
    name="wait_index",
)

# Alternative arguments that use select_frames instead of frame_count:
extraction_type=SceneExtractionType.time_based,
extraction_config={"time": 10, "select_frames": ["first"]},
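To make the time-based options concrete, here is a minimal, self-contained sketch of how a video could be split into fixed-length scenes and how frame timestamps could be chosen inside each one. It does not call VideoDB and is not its implementation. The function names `time_based_scenes` and `frame_timestamps` are hypothetical, the even spacing used for `frame_count` is an assumption, treating "last" as the segment's end time is an approximation, and rejecting a config that sets both strategies is one possible reading of the note above.

```python
from typing import List, Optional, Tuple


def time_based_scenes(duration: float, time: float = 10.0) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive (start, end) segments of `time` seconds.

    The final segment is shorter when duration is not a multiple of `time`.
    """
    if time <= 0:
        raise ValueError("time must be positive")
    scenes = []
    i = 0
    while i * time < duration:
        start = i * time
        scenes.append((start, min(start + time, duration)))
        i += 1
    return scenes


def frame_timestamps(
    start: float,
    end: float,
    frame_count: Optional[int] = None,
    select_frames: Optional[List[str]] = None,
) -> List[float]:
    """Return approximate timestamps of the frames to sample from one scene."""
    # Assumption: the two strategies are mutually exclusive.
    if frame_count is not None and select_frames is not None:
        raise ValueError("use either frame_count or select_frames, not both")
    if frame_count is None and select_frames is None:
        select_frames = ["first"]  # documented default for select_frames
    if select_frames is not None:
        positions = {"first": start, "middle": (start + end) / 2, "last": end}
        return [positions[name] for name in select_frames]
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    # Assumption: frames are spread evenly across the scene.
    step = (end - start) / frame_count
    return [start + k * step for k in range(frame_count)]


if __name__ == "__main__":
    for scene_start, scene_end in time_based_scenes(10, time=4):
        print(scene_start, scene_end, frame_timestamps(scene_start, scene_end, frame_count=2))
```

A smaller `time` yields more, shorter scenes; a larger `frame_count` gives each scene more frames at the cost of more images to process.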


Shot-based extraction builds on the observation that videos share context between timestamps. A scene is a logical segment of a video that completes a concept, and scene changes can be identified from the visual content of the video.

Scene changes are detected from significant changes in the visual content, such as transitions, lighting, and movement.

This method has the following extraction_config options:

threshold: Determines the sensitivity of the model to scene changes within the video. Default value is 20, which is known to work well for detecting camera shot changes.

frame_count: The number of frames to pick from each shot. Default value is 1. Increasing this number selects more frames from each shot, which can provide a more detailed analysis of the scene.

extraction_type=SceneExtractionType.shot_based,
extraction_config={"threshold": 20, "frame_count": 4},
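The role of threshold and frame_count can be illustrated with a simplified, self-contained sketch. The source does not say how VideoDB measures visual change or what scale the threshold uses, so the mean absolute pixel difference below is only a stand-in metric, and `mean_abs_diff`, `shot_boundaries`, and `pick_frames` are hypothetical names. Frames are modelled as flat lists of grayscale values purely for illustration.

```python
from typing import List, Sequence, Tuple


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def shot_boundaries(
    frames: Sequence[Sequence[float]], threshold: float = 20.0
) -> List[Tuple[int, int]]:
    """Group frame indices into shots, returned as inclusive (first, last) pairs.

    A new shot starts whenever the change from the previous frame reaches
    the threshold (stand-in metric, not VideoDB's detector).
    """
    if not frames:
        return []
    shots = []
    shot_start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) >= threshold:
            shots.append((shot_start, i - 1))
            shot_start = i
    shots.append((shot_start, len(frames) - 1))
    return shots


def pick_frames(first: int, last: int, frame_count: int = 1) -> List[int]:
    """Pick up to frame_count evenly spaced frame indices from one shot."""
    length = last - first + 1
    count = min(frame_count, length)
    return [first + (k * length) // count for k in range(count)]


if __name__ == "__main__":
    # Five tiny two-pixel "frames" with two abrupt changes.
    frames = [[10, 10], [12, 11], [200, 190], [198, 192], [50, 40]]
    for first, last in shot_boundaries(frames, threshold=20):
        print((first, last), pick_frames(first, last, frame_count=4))
```

In this sketch a lower threshold splits the video into more shots, because smaller changes count as cuts, while a higher threshold merges them into fewer, longer shots.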
