Amazon Nova Act

In March 2025, Amazon AGI SF Lab released Nova Act as a research preview: a Python SDK paired with a vision model for building browser agents. Rather than asking a model to complete an entire workflow from a single prompt, Nova Act encourages developers to break a job into small, “atomic” act() calls. Each call pairs a screenshot of the current page with a natural-language instruction, produces a low-level action plan (click, type, scroll), and executes it through Playwright. With the model constrained to focused steps, Amazon reports over 90% accuracy in its internal evaluations on UI widgets that are hard for agents (date pickers, dropdowns, pop-ups) and top scores on the ScreenSpot and GroundUI-Web benchmarks, outperforming Claude 3.7 Sonnet and OpenAI CUA on most sub-tasks.
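The decomposition into atomic steps can be sketched as follows. This is a minimal illustration, assuming the `NovaAct` context manager, its `starting_page` argument, and the `act()` method as shown in the preview SDK's README at the time of writing. `run_steps`, `COFFEE_MAKER_STEPS`, and the prompts are hypothetical names chosen for this example, and the helper only relies on an object that exposes `act(prompt)`.

```python
from typing import Iterable, List, Protocol


class Actor(Protocol):
    """Anything with an act(prompt) method, for example a NovaAct session."""

    def act(self, prompt: str): ...


def run_steps(agent: Actor, steps: Iterable[str]) -> List[object]:
    """Issue one small instruction per act() call and collect the results.

    An exception raised by any step propagates immediately, so later steps
    never run against a page that is in an unexpected state.
    """
    results = []
    for step in steps:
        results.append(agent.act(step))
    return results


# Hypothetical workflow, split into atomic instructions instead of one long prompt.
COFFEE_MAKER_STEPS = [
    "search for a coffee maker",
    "select the first result",
    "scroll down until you see 'add to cart' and then click it",
]


def demo_atomic_steps() -> None:
    """Run the steps in a real browser; needs the nova-act package and an API key."""
    from nova_act import NovaAct  # lazy import so run_steps works without the SDK

    with NovaAct(starting_page="https://www.amazon.com") as nova:
        run_steps(nova, COFFEE_MAKER_STEPS)
```

Keeping each prompt to one observable action makes it easier to see which step failed and to retry or rewrite only that step.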
Key features
Reliability-first design – atomic commands stitched together by the developer; suited to workflows that must run unattended.
Playwright integration – the SDK exposes the underlying page object so you can mix direct Python (e.g., typing passwords) with model-driven steps.
Structured extraction – pass a JSON or Pydantic schema to act() and receive validated data instead of free-form text.
Parallel sessions & headless mode – spin up many Nova Act instances for map-reduce-style scraping or run workflows on a schedule once stable.
Trace & video logging – every run can emit an HTML replay and an optional MP4 for debugging.
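Structured extraction and parallel headless sessions from the list above might be combined along these lines. This is a sketch under stated assumptions: the `schema=` keyword on `act()`, the `matches_schema` and `parsed_response` fields on its result, and the `headless=True` constructor flag follow the preview README as of this writing and may change. `BOOK_LIST_SCHEMA`, `extract`, `extract_parallel`, and `open_nova_session` are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

# JSON Schema describing the data we want back; the field names are illustrative.
BOOK_LIST_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "books": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "author": {"type": "string"},
                },
                "required": ["title", "author"],
            },
        }
    },
    "required": ["books"],
}


def extract(agent: Any, prompt: str, schema: Dict[str, Any]) -> Optional[Any]:
    """Ask for schema-shaped output; return None if the response did not validate."""
    result = agent.act(prompt, schema=schema)
    if not result.matches_schema:
        return None
    return result.parsed_response


def extract_parallel(
    open_session: Callable[[str], Any],
    urls: List[str],
    prompt: str,
    schema: Dict[str, Any],
    max_workers: int = 4,
) -> List[Optional[Any]]:
    """Map step: one independent session per URL; results keep the order of `urls`."""

    def worker(url: str) -> Optional[Any]:
        # open_session(url) must return a context manager that yields an agent.
        with open_session(url) as agent:
            return extract(agent, prompt, schema)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, urls))


def open_nova_session(url: str) -> Any:
    """Session factory for real runs; needs the nova-act package and an API key."""
    from nova_act import NovaAct  # lazy import so the helpers work without the SDK

    return NovaAct(starting_page=url, headless=True)
```

A real run would pass `open_nova_session` as the factory, for example `extract_parallel(open_nova_session, urls, "Return the currently visible list of books", BOOK_LIST_SCHEMA)`, and then reduce the per-page results in ordinary Python. Passing the factory as a parameter keeps the helpers testable without a browser.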
Preview limitations & safety notes
U.S.-only waitlist; requires macOS or Ubuntu and Python 3.10+.
No CAPTCHA solving; watch for prompt injection, and keep secrets out of prompts (use direct Playwright calls instead).
Amazon stores prompts + screenshots for service improvement; email nova-act@amazon.com to delete data.
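The advice to keep secrets out of prompts can be illustrated with a short sign-in helper. This is a minimal sketch, assuming a session object that exposes `act(prompt)` and a Playwright `page` attribute, as the preview SDK's `NovaAct` is documented to do. `sign_in`, `demo_sign_in`, the login URL, and the username are hypothetical.

```python
from typing import Any


def sign_in(agent: Any, username: str, password: str) -> None:
    """Log in without ever placing the password in a model prompt.

    `agent` is assumed to expose act(prompt) and a Playwright `page` attribute.
    """
    # Model-driven step: the username is not secret, so it may appear in the prompt.
    agent.act(f"enter username {username} and click on the password field")
    # Direct Playwright call: keystrokes go straight to the focused field,
    # so the password is not part of any prompt sent to the model.
    agent.page.keyboard.type(password)
    # Back to a model-driven step for the final click.
    agent.act("sign in")


def demo_sign_in() -> None:
    """Interactive run; needs the nova-act package, an API key, and a real login page."""
    from getpass import getpass

    from nova_act import NovaAct  # lazy import: sign_in itself has no SDK dependency

    # Hypothetical URL and username, for illustration only.
    with NovaAct(starting_page="https://example.com/login") as nova:
        sign_in(nova, "janedoe", getpass("Password: "))
```

This keeps the password out of the prompt text only. Anything visible on screen when a screenshot is taken can still be captured, so sensitive values shown in clear text on the page need separate care.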
Nova Act positions itself as “Selenium you can talk to”: a pragmatic, reliability-oriented step toward fully autonomous agents, and the first building block in Amazon’s broader Nova model family.