Part 2 - Observe and Respond

In Part 1 of Building Intelligent Machines, we discussed that we can define intelligence as a measure of the magnitude of interaction with the world: the more ways, or dimensions, of interacting with the world, the greater the intelligence. Decision making is a big part of increasing those dimensions of interaction, and understanding how humans make decisions can help us build intelligent machines.
Human decision making is based on information. There is a lot of information we are born with, stored in our DNA: making a sound, crawling, eating, and many other things we know how to do as soon as we come to life. They are part of our DNA, and we don't have to learn them.
But that information alone is not sufficient for humans to survive. We keep gathering information from our world to refine our understanding of it. We learn from it, and that, as you know, is not a simple process. Sometimes we fail, sometimes we succeed.
In this post, I'll discuss the process of learning. The human behavior and characteristics discussed here relate only to rational decision making; let's leave the complexities of human behavior to the experts.
As mentioned in Part 1, there are a few questions to ponder while reading, so I'd request you to keep your notepad handy.
HOW DO WE GATHER INFORMATION?
We are born with five senses: vision, touch, taste, hearing, and smell. Consciously or unconsciously, we keep gathering information from this world, the environment around us.
What happens after we have gathered this information? Perhaps a better question to think is — Why do we gather information?
This question gives us the idea of a goal. The most important goal of any life is to survive.
Without a goal, gathering information is not useful, and we rarely learn from something that is not useful. All learning needs a goal. Goals can also be unconscious, hidden, and not entirely clear to us, but we respond in accordance with them.

LEARNING

Learning, at its most primal, emerges from a process of observation and response. Before we dive deep into this, I have a question for you. What is this object?
[Image: a tree]
I know most of us will call it a tree, which is true. But let's think for a while about this question: why did you call this object a tree?
Have you seen it before or somebody taught you that it’s a tree? Maybe you saw it in a book. Maybe you “just know” that it’s a tree. Let’s go deeper…
Do you remember all the trees you ever saw?
The last tree you saw?
The first tree you saw?
I am sure the answer to all the above questions is No!
The way our brain is able to identify a tree is based on some kind of similarity. We might think, "Hey, this thing grows up from the soil, has leaves, has a stem, and many other characteristics that we collectively call a Tree." The brain doesn't store everything; it processes information very differently. There are no databases inside.
In some ways, you have understood the fundamental principle of the tree.
We gathered the information and made it "our own". We "learned" what a tree is. We observed it through our eyes, and our goal in gathering this information was to know the world better.
After gathering the information, our brain does some processing according to the goal, and that processing is what "learning" is. When we interact with the world, going through the process of observe and respond, we discover its underlying principles.
We understand this idea scientifically as well. In the computing world, we can create or discover a space, or environment, in which two things are close together even when there is no immediate surface similarity between them. Closeness depends on shared characteristics, some kind of similarity. These spaces are called latent spaces. For example, in a latent space these two trees would be close together.
[Image: two trees]
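To make the idea of a latent space concrete, here is a minimal sketch. The feature vectors below are made up for illustration (real latent spaces are learned, not hand-crafted): each object is a point in a space of characteristics, and closeness is measured with cosine similarity.

```python
import math

def cosine_similarity(a, b):
    # Closeness in the space: cosine of the angle between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical features: [grows from soil, has leaves, has a stem, moves around]
oak_tree  = [1.0, 1.0, 1.0, 0.0]
pine_tree = [1.0, 0.8, 1.0, 0.0]
dog       = [0.0, 0.0, 0.0, 1.0]

print(cosine_similarity(oak_tree, pine_tree))  # close to 1.0: the trees cluster together
print(cosine_similarity(oak_tree, dog))        # 0.0: no shared characteristics
```

Even though no two trees look identical, they share characteristics, so they land near each other in this space; that is the same trick our brain seems to use when it calls both of them "a tree".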

OBSERVE AND RESPOND

Coming back to our decision-making framework of "observe and respond", we have introduced another important part of the process: "learning", which happens to be its outcome.
Through this principle, we can also have computing machines go through the same process of observation and response: they can expand their interaction with their environment and, as a result, learn characteristics that give them more decision-making power.
To build a machine with the power of decision making, we should also ask where machines can be better than humans at decision making. Since we still don't understand human emotions and other aspects of intelligence, we can only create purely rational decision-making machines or software.
Humans have both a positive and a negative relationship with emotions. For example, while driving, anxiety and hurry can cause an accident, whereas in a competitive game the same emotions can make a miracle happen. Positive emotions have pushed humans to expand their boundaries.
Machines don't get drunk, and they don't sleep. So we know we can make use of purely rational, highly analytical decision-making machines. I believe driving is better left to machines. What tasks do you think machines can do better than humans?
In the coming post, we will discuss how we can apply the principle of observation and response to create intelligent machines. A computer's world is a vector world, made of vectors in vector spaces. The fun part is creating a way to communicate our world to that computational world.
 