Skip to content
videodb
VideoDB Documentation
  • Pages
    • Welcome to VideoDB Docs
    • Quick Start Guide
      • Video Indexing Guide
      • Semantic Search
      • Collections
      • Public Collections
      • Callback Details
      • Ref: Subtitle Styles
      • Language Support
      • Guide: Subtitles
      • How Accurate is Your Search?
    • Visual Search and Indexing
      • icon picker
        Scene Extraction Algorithms
      • Custom Annotations
      • Scene-Level Metadata: Smarter Video Search & Retrieval
      • Advanced Visual Search Pipelines
      • Playground for Scene Extractions
      • Deep Dive into Prompt Engineering : Mastering Visual Indexing
      • How VideoDB Solves Complex Visual Analysis Tasks
      • Multimodal Search: Quickstart
      • Conference Slide Scraper with VideoDB
    • Examples and Tutorials
      • Dubbing - Replace Soundtrack with New Audio
      • VideoDB: Adding AI Generated voiceovers to silent footage
      • Beep curse words in real-time
      • Remove Unwanted Content from videos
      • Instant Clips of Your Favorite Characters
      • Insert Dynamic Ads in real-time
      • Adding Brand Elements with VideoDB
      • Elevating Trailers with Automated Narration
      • Add Intro/Outro to Videos
      • Audio overlay + Video + Timeline
      • Building Dynamic Video Streams with VideoDB: Integrating Custom Data and APIs
      • AI Generated Ad Films for Product Videography
      • Fun with Keyword Search
      • Overlay a Word-Counter on Video Stream
      • Generate Automated Video Outputs with Text Prompts | VideoDB
      • VideoDB x TwelveLabs: Real-Time Video Understanding
      • Multimodal Search
      • How I Built a CRM-integrated Sales Assistant Agent in 1 Hour
      • Make Your Video Sound Studio Quality with Voice Cloning
      • Automated Traffic Violation Reporter
    • Live Video→ Instant Action
    • Generative Media Quickstart
      • Generative Media Pricing
    • Video Editing Automation
      • Fit & Position: Aspect Ratio Control
      • Trimming vs Timing: Two Independent Timelines
      • Advanced Clip Control: The Composition Layer
      • Caption & Subtitles: Auto-Generated Speech Synchronization
      • Notebooks
    • Transcoding Quickstart
    • director-light
      Director - Video Agent Framework
      • Agent Creation Playbook
      • Setup Director Locally
    • Workflows and Integrations
      • zapier
        Zapier Integration
        • Auto-Dub Videos & Save to Google Drive
        • Create & Add Intelligent Video Highlights to Notion
        • Create GenAI Video Engine - Notion Ideas to Youtube
        • Automatically Detect Profanity in Videos with AI - Update on Slack
        • Generate and Store YouTube Video Summaries in Notion
        • Automate Subtitle Generation for Video Libraries
        • Solve customers queries with Video Answers
      • n8n
        N8N Workflows
        • AI-Powered Meeting Intelligence: Recording to Insights Automation
        • AI Powered Dubbing Workflow for Video Content
        • Automate Subtitle Generation for Video Libraries
        • Automate Interview Evaluations with AI
        • Turn Meeting Recordings into Actionable Summaries
        • Auto-Sync Sales Calls to HubSpot CRM with AI
        • Instant Notion Summaries for Your Youtube Playlist
    • Meeting Recording SDK
    • github
      Open Source
      • llama
        LlamaIndex VideoDB Retriever
      • PromptClip: Use Power of LLM to Create Clips
      • StreamRAG: Connect ChatGPT to VideoDB
    • mcp
      VideoDB MCP Server
    • videodb
      Give your AI, Eyes and Ears
      • Building Infrastructure that “Sees” and “Edits”
      • Agents with Video Experience
      • From MP3/MP4 to the Future with VideoDB
      • Dynamic Video Streams
      • Why do we need a Video Database Now?
      • What's a Video Database ?
      • Enhancing AI-Driven Multimedia Applications
      • Beyond Traditional Video Infrastructure
    • Customer Love
    • Join us
      • videodb
        Internship: Build the Future of AI-Powered Video Infrastructure
      • Ashutosh Trivedi
        • Playlists
        • Talks - Solving Logical Puzzles with Natural Language Processing - PyCon India 2015
      • Ashish
      • Shivani Desai
      • Gaurav Tyagi
      • Rohit Garg
      • Edge of Knowledge
        • Language Models to World Models: The Next Frontier in AI
        • Society of Machines
          • Society of Machines
          • Autonomy - Do we have the choice?
          • Emergence - An Intelligence of the collective
        • Building Intelligent Machines
          • Part 1 - Define Intelligence
          • Part 2 - Observe and Respond
          • Part 3 - Training a Model
      • Updates
        • VideoDB Acquires Devzery: Expanding Our AI Infra Stack with Developer-First Testing Automation

Scene Extraction Algorithms

SceneExtractionType

A video is a series of images that are called frames, these frames can be processed using multimodal modals or computer vision pipelines. There are many ways to identify the temporal change of concepts in the video.
Screenshot 2024-07-04 at 11.41.39 AM.jpg
SceneExtractionType and extraction_config can be used with two functions as parameters for scene identification.
It can be passed to index_scenes() function as an argument.
It can be pass as an argument to extract_scenes() function.
Checkout for Scene and Frame object details.

Screenshot 2024-07-04 at 12.03.45 PM.jpg
Time based extraction is a simple way to break video into scenes. You define a frequency at which you want to split the video in scenes, for example, you may consider every 10 second as a one scene. This method is useful when you have no information about the nature of video or the video is random & dynamic. You can even create scenes with 1 second time interval.
This method has following extraction_config :
time : The interval (in seconds) at which scenes are segmented. Default value is 10 — which means every 10 seconds segment is a scene.
frame_count: The number of frames to extract per scene. This allows you to increase the number of frames collected for more context. Default value is 1.
select_frames: A list of frames to select from each segment. The list can contain strings from ["first", "middle", or "last"] which selects the respective frames. Default value is ["first"]
Note: You can use either select_frames or frame_count strategy to extract frames for the scene.

wait_index = traffic_video.index_scenes(
extraction_type=SceneExtractionType.time_based,
extraction_config={"time": 4, "frame_count": 5},
prompt="Identify when multiple cars are slowing down or waiting. Mention that cars are waiting or stopping and also specify the lane as left, middle, or right. For example, you can say `cars in the middle lanes are waiting`.",
name="wait_index"
)
extraction_type=SceneExtractionType.time_based,
extraction_config={"time":10, "select_frames": ['first']},


Screenshot 2024-07-04 at 12.13.39 PM.jpg
Videos share context between timestamps. A scene is a logical segment of a video that completes a concept. You can identify scene changes based on visual content within the video.
Key factors for calculating changes are significant changes in the visual content, such as transitions, lights and movement.
This method has following extraction_config :
threshold: Determines the sensitivity of the model towards scene changes within the video. Default value is 20, which known to be good for detecting camera shot changes from a video.
frame_count: Accepts a number that specifies how many frames to pick from each shot. Default value is 1. Increasing this number will result in more frames being selected from each shot, which could provide a more detailed analysis of the scene.
extraction_type=SceneExtractionType.shot_based,
extraction_config={"threshold":20, "frame_count":4},


 
Want to print your doc?
This is not the way.
Try clicking the ··· in the right corner or using a keyboard shortcut (
CtrlP
) instead.