User Experience

Interaction UX

Privacy and Consumer Designs

In consumer settings, trust remains a fundamental barrier to browser-based autonomy. Most users are uncomfortable delegating full remote browser automation to an agent, particularly when it involves sensitive actions like entering passwords, credit card details, or accessing private accounts. Today, consumers rely on local credential managers (e.g., Chrome’s autofill) and selectively authorize trusted apps via familiar interfaces. Asking a remote agent to perform these tasks introduces friction, both technically and psychologically. Designing agent workflows that respect user privacy expectations, by keeping sensitive data local, offering granular control, and maintaining transparency, will be critical for adoption at scale.

Managing Simultaneous Tasks

For complex, multi-step use cases such as booking a trip from city A to city B, which typically involve several services at once, Operator booked items one at a time even though my prompt asked it to put all the pieces together before making any bookings. It tried to book the flight first, then the rental car, then the hotel, whereas as a user I typically look at all of these together. Given that these tasks run in independent browser instances, running them concurrently and letting them share results would be a much better way to show the user all the travel options and the trade-offs between price, amenities, and timing.
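As a rough illustration of the concurrent approach, the sketch below runs three searches side by side and only then assembles complete itineraries, without booking anything. It is a minimal sketch under stated assumptions: `search`, `plan_trip`, the `Option` type, and all names and prices are hypothetical stand-ins, not part of Operator or any real booking API.

```python
import asyncio
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Option:
    kind: str    # "flight", "hotel", or "car"
    name: str
    price: float


async def search(kind: str, candidates: list[tuple[str, float]]) -> list[Option]:
    """Hypothetical stand-in for one browser session searching one service."""
    await asyncio.sleep(0)  # placeholder for real browsing work
    return [Option(kind, name, price) for name, price in candidates]


async def plan_trip(budget: float) -> list[tuple[Option, ...]]:
    # Run all three searches concurrently rather than one after another.
    flights, hotels, cars = await asyncio.gather(
        search("flight", [("F1", 300.0), ("F2", 450.0)]),
        search("hotel", [("H1", 400.0), ("H2", 250.0)]),
        search("car", [("C1", 120.0), ("C2", 90.0)]),
    )
    # Combine the results into complete itineraries; nothing is booked here.
    bundles = [
        bundle
        for bundle in product(flights, hotels, cars)
        if sum(option.price for option in bundle) <= budget
    ]
    # Cheapest complete itinerary first, so trade-offs are easy to compare.
    return sorted(bundles, key=lambda b: sum(option.price for option in b))
```

Calling `asyncio.run(plan_trip(700.0))` returns whole flight, hotel, and car combinations within budget, which the agent could present together before the user commits to any single booking.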

Personalized In-Line UI Components

While Operator’s chat interface is designed to automate workflows end-to-end, there are cases where preferences cannot be confidently inferred, forcing users to type unnecessary details and adding friction. The ideal approach should be to automate when possible, but when automation falls short, Operator should surface adaptive filter bubbles or selection prompts, increasingly personalized based on individual user behavior rather than offering the same choices to all users. This mirrors familiar patterns in consumer interfaces, like search + filters, which balance guidance with control.

A glimpse of this evolution can already be seen in the use case of hotel bookings, where Operator presents multiple options, recommends the best match based on user objectives (e.g., budget, proximity, amenities), and prompts the user with a “Confirm” action to proceed (as shown in the example screenshot below).

⁠

Over time, we expect chat interfaces to evolve from pure text into text plus adaptive, personalized UI components that adjust dynamically to user needs, reduce friction, and preserve the lightweight, conversational nature of the interaction. As an example, Perplexity currently surfaces these as in-line chat widgets.

⁠

Visualizing the Agent’s Process Flow for Complex Tasks

For complex, multi-step workflows, relying solely on chat becomes limiting. Users need the ability to see, edit, and interact with the agent’s reasoning as a structured flow, visualizing sub-tasks, branches, and outcomes, not just linear text. This concept of a prompt graph, where users guide agent behavior through editable workflows, mirrors paradigms already proven in the enterprise, particularly in low-code platforms like UiPath. While enterprise tools optimize for precision and compliance, the same structure, executed with better UX, could enable long-range consumer workflows as well, bringing clarity, control, and trust to autonomy at scale. Moreover, building scaffolding on top of this through concepts such as Agent Inbox is a helpful way to track progress, manage exceptions, and guide the agent.
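To make the prompt graph idea concrete, here is a minimal sketch of a workflow stored as editable steps with dependencies, plus an inbox-style list of steps that wait for the user. The `Step` type, the `needs_approval` flag, and both helper functions are assumptions made for illustration; they do not describe how Operator, UiPath, or Agent Inbox are implemented.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    name: str
    depends_on: list[str] = field(default_factory=list)
    needs_approval: bool = False  # pause for the user before this step runs


def execution_order(steps: list[Step]) -> list[str]:
    """Return one valid order in which the steps can run."""
    graph = {step.name: set(step.depends_on) for step in steps}
    return list(TopologicalSorter(graph).static_order())


def inbox(steps: list[Step]) -> list[str]:
    """Steps that should surface to the user for review."""
    return [step.name for step in steps if step.needs_approval]


trip = [
    Step("search_flights"),
    Step("search_hotels"),
    Step("compare", depends_on=["search_flights", "search_hotels"]),
    Step("pay", depends_on=["compare"], needs_approval=True),
]
```

Because the workflow is plain data, a user interface could let the user add, remove, or reorder steps and then recompute the execution order, while the inbox view collects only the steps that need a decision.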

Form Filling & Action Hand-Offs

In travel booking workflows, Operator often completed navigation but handed control back to the user for final form submissions (e.g., traveler details, payment). While intended to minimize risk around sensitive actions, forcing manual intervention introduces unnecessary friction, especially when much of this information could be pre-filled through personalization. Notably, in some cases, we found that Operator could still be nudged to fill out parts of the form upon explicit user instruction, though this was not the default behavior. Allowing users to tune whether Operator completes forms automatically, or asks for confirmation before submission, would offer a more seamless and customizable experience. A visual process flow view would further enhance trust. By transparently mapping steps like navigation, form filling, and payment, users could review or approve final actions upfront, balancing control with automation.
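One way to picture the tunable hand-off is as a user-level policy that decides which fields the agent fills and whether it may submit. The sketch below is hypothetical: the `FormPolicy` values, the `SENSITIVE_FIELDS` set, and `plan_form` are illustrative names, and the choice to always leave sensitive fields to the user is an assumption rather than observed Operator behavior.

```python
from enum import Enum


class FormPolicy(Enum):
    HAND_OFF = "hand_off"          # user fills and submits the form
    FILL_THEN_CONFIRM = "confirm"  # agent fills, user approves submission
    AUTO_SUBMIT = "auto"           # agent fills and submits


# Illustrative only: fields this sketch never lets the agent fill.
SENSITIVE_FIELDS = {"card_number", "cvv", "password"}


def plan_form(fields: list[str], policy: FormPolicy) -> dict:
    """Decide which fields the agent fills and whether it may submit."""
    if policy is FormPolicy.HAND_OFF:
        agent_fills = []
    else:
        agent_fills = [f for f in fields if f not in SENSITIVE_FIELDS]
    user_fills = [f for f in fields if f not in agent_fills]
    return {
        "agent_fills": agent_fills,
        "user_fills": user_fills,
        # Submit automatically only if nothing is left for the user.
        "agent_may_submit": policy is FormPolicy.AUTO_SUBMIT and not user_fills,
    }
```

For example, `plan_form(["name", "email", "cvv"], FormPolicy.FILL_THEN_CONFIRM)` has the agent pre-fill the name and email, leaves the CVV to the user, and withholds submission until the user confirms.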

Agent Explainability

Agent Reasoning Transparency & Personalized Control

While Operator surfaces browser actions during task execution, it often does not make its underlying reasoning, such as decision thresholds, trade-offs, or search strategies, explicit to the user. During testing, we had to actively ask questions like:

What rating thresholds did you use to select this hotel?

How many options did you consider before making a decision?

What parameters influenced your map analysis?

Although Operator answered these questions reasonably when prompted, the burden remained on the user to interrogate the agent, highlighting a gap in default reasoning explainability. Exposing lightweight reasoning traces inline (e.g., thresholds applied, options considered, trade-offs made) would build trust without disrupting task flow. Importantly, users differ in how much reasoning visibility they want. Some users may prefer frequent clarifications early on to build confidence. Others, over time, will prefer fewer interruptions as personalization strengthens. Thus, allowing tunable control over reasoning transparency, from full explainability to streamlined execution, would make Operator adaptable to varying user trust levels.
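A lightweight reasoning trace with tunable visibility could look roughly like the following. This is a sketch under assumptions: `DecisionTrace`, `Verbosity`, and `render` are hypothetical names, and the fields simply mirror the questions we had to ask manually (thresholds, options considered, trade-offs).

```python
from dataclasses import dataclass, field
from enum import Enum


class Verbosity(Enum):
    SILENT = 0   # act without commentary
    SUMMARY = 1  # one line per decision
    FULL = 2     # thresholds, counts, and trade-offs


@dataclass
class DecisionTrace:
    choice: str
    options_considered: int
    thresholds: dict[str, float] = field(default_factory=dict)
    trade_offs: list[str] = field(default_factory=list)


def render(trace: DecisionTrace, level: Verbosity) -> list[str]:
    """Turn a recorded decision into inline lines at the chosen detail level."""
    if level is Verbosity.SILENT:
        return []
    lines = [f"Chose {trace.choice} from {trace.options_considered} options."]
    if level is Verbosity.FULL:
        lines += [f"Threshold: {k} >= {v}" for k, v in trace.thresholds.items()]
        lines += [f"Trade-off: {t}" for t in trace.trade_offs]
    return lines
```

A new user might start at `Verbosity.FULL` and move toward `SUMMARY` or `SILENT` as confidence grows, without the agent changing how it records its decisions.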

Outcome Presentation and Decision-Making

In workflows with multiple viable outcomes (e.g., booking a hotel, selecting a product), Operator often surfaces a single recommendation without presenting underlying alternatives or trade-offs. For many users, particularly in high-cost, high-sensitivity domains like travel, this limits decision confidence. Providing users with a personalized shortlist of options, annotated with the agent’s rationale and key trade-offs (e.g., price vs. location, amenities vs. cost), would improve decision-making while maintaining efficiency. High-trust users could choose to follow a recommendation immediately, while others could explore curated alternatives based on their risk and preference profile. Again, tunable control could let users define whether they want to see a single best recommendation, a curated shortlist with trade-offs, or broader search results when needed. This flexibility would better align Operator with user expectations across consumer and enterprise use cases. Here is a reference screenshot of what Operator presented, in a rudimentary format:

⁠
