Amazon Nova Act

In March 2025, Amazon AGI Labs released Nova Act, a research-preview Python SDK and vision model for building browser agents. Instead of asking a model to finish an entire workflow from one prompt, Nova Act encourages developers to break jobs into small, “atomic” act() calls. Each call takes a screenshot plus a natural-language instruction, returns a low-level action plan (click, type, scroll), and executes it through Playwright. By constraining the model to focused steps, Amazon reports over 90% accuracy on hard UI widgets (date pickers, dropdowns, pop-ups) and top scores on the ScreenSpot and GroundUI-Web benchmarks, outperforming Claude 3.7 Sonnet and OpenAI CUA on most sub-tasks.
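The decomposition into atomic steps can be sketched without the SDK itself. In the minimal example below, `run_atomic_steps` and `BOOKING_STEPS` are hypothetical names introduced for illustration, and `act` is any callable that accepts one instruction string; in a real session it would presumably be the SDK's act() method, which is an assumption here rather than something this sketch verifies.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_atomic_steps(
    act: Callable[[str], object], steps: Iterable[str]
) -> Tuple[List[str], Optional[str]]:
    """Run each instruction as its own act() call.

    Returns the steps that completed and, if one failed, a short
    description of the failure (None when every step succeeded).
    """
    completed: List[str] = []
    for step in steps:
        try:
            act(step)  # one focused instruction per call
        except Exception as exc:
            # Stop at the first failing step so the caller knows where to resume.
            return completed, f"{step!r} failed: {exc}"
        completed.append(step)
    return completed, None


# Hypothetical workflow, written as small steps rather than one long prompt.
BOOKING_STEPS = [
    "open the date picker",
    "select the 15th of next month",
    "choose 'Economy' from the cabin dropdown",
    "close the cookie pop-up if one is shown",
]
```

Because the failure report names the exact step, a developer can retry or rewrite that one instruction instead of re-prompting the whole workflow.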
Key features
Reliability-first design – atomic commands stitched together by the developer; suited to workflows that must run unattended.
Playwright integration – the SDK exposes the underlying page object so you can mix direct Python (e.g., typing passwords) with model-driven steps.
Structured extraction – pass a JSON or Pydantic schema to act() and receive validated data instead of free-form text.
Parallel sessions & headless mode – spin up many Nova Act instances for map-reduce-style scraping or run workflows on a schedule once stable.
Trace & video logging – every run can emit an HTML replay and optional MP4 for debugging.
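The map-reduce pattern mentioned under parallel sessions might look roughly like the following sketch. It uses only the standard library: `scrape_in_parallel` and `cheapest` are hypothetical helpers, and `fetch_one` stands in for a function that would open its own browser session for one URL and return a dictionary such as `{"price": 19.99}`. How that function talks to Nova Act is deliberately left out, since it depends on the SDK version in use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def scrape_in_parallel(
    fetch_one: Callable[[str], dict],
    urls: Iterable[str],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Map step: run fetch_one for every URL, each in its own worker thread."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the other sessions.
                results[url] = {"error": str(exc)}
    return results


def cheapest(results: Dict[str, dict]) -> Optional[str]:
    """Reduce step: return the URL with the lowest price, ignoring failures."""
    priced = {url: r["price"] for url, r in results.items() if "price" in r}
    return min(priced, key=priced.get) if priced else None
```

Recording errors per URL keeps one failed session from discarding the results of the others, which matters for workflows meant to run unattended.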
Preview limitations & safety notes
U.S.-only waitlist; runs on macOS or Ubuntu with Python 3.10+.
No CAPTCHA solving; watch for prompt injection, and keep secrets out of prompts (use direct Playwright calls instead).
Amazon stores prompts + screenshots for service improvement; email nova-act@amazon.com to delete data.
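The advice to keep secrets out of prompts can be illustrated with a short sketch. Here `sign_in` is a hypothetical helper, `act` is any callable that takes an instruction string, and `page` is assumed to be a Playwright page object such as the one the SDK is said to expose; `page.keyboard.type` is Playwright's own keyboard API. The prompt wording is illustrative only.

```python
from typing import Callable


def sign_in(act: Callable[[str], object], page, username: str, password: str) -> None:
    """Log in without ever placing the password inside a prompt."""
    # The model handles navigation and focus; the username is not a secret here.
    act(f"enter the username {username} and click on the password field")
    # The password is typed by Playwright directly, so it never appears in a prompt.
    page.keyboard.type(password)
    act("click the sign in button")
```

This keeps the secret out of the prompt text only. Screenshots taken afterwards still capture whatever the page renders, so a field that displays its contents in plain text would remain a concern.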
Nova Act positions itself as “Selenium you can talk to”: a pragmatic, reliability-oriented step toward fully autonomous agents, and the first building block in Amazon’s broader Nova model family.