UI automation? AI-based automation? You need both.


Recently, a number of leading AI companies have launched new capabilities that leverage their powerful foundational large language models (LLMs) to quickly automate many actions that people take on screens. Anthropic’s Computer Use, Amazon Q, and the upcoming OpenAI “Operator” can all quickly understand screens, operate the software being used, and emulate the user’s desired actions, without any coding or model training.

AI-based automation is a new way to automate. It is significantly different from UI automation, which relies on trained models and rules-based approaches to extract information and interact with screens, systems, and software. Because AI-based automation is so much simpler to use, some industry observers have suggested that it might supplant UI automation.

We have performed extensive evaluation of these new technologies, and we are excited by their potential to extend automation’s impact across enterprises and users. In fact, we are integrating the best of them into our platform. Because they allow AI to interact with software the way that humans do, we believe they can truly revolutionize interactions between people and screens. They hold the potential to boost personal productivity to new heights and allow practically anyone to become a citizen developer to automate their tedious, repetitive tasks. 

However, AI-based automation will never supplant UI automation in every process. For example, UI automation is a far better choice for high-volume, mission-critical automated processes that involve accessing multiple systems and working with sensitive or proprietary data. These types of processes abound throughout enterprises.

To understand why that’s so, let’s take a quick dive into how each approach works.

UI automation and LLM-based automation work differently—and that matters

AI-based approaches typically employ a multimodal LLM (one that can interpret images, text, audio, and other inputs) to “read” a screen and take action. In broad strokes, these approaches take a series of pictures of the visible screen, transmit these pictures to the vendor’s cloud, and run the vendor’s models on them. The LLM interprets the information on the screen, predicts the actions the human would take, and then sends instructions to take the predicted action: for example, opening an application, copying, or entering data.
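The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's actual API: `Action`, `run_agent`, `capture_screen`, `stub_model`, and `execute` are all hypothetical names, the "screen" is a plain dictionary, and the remote multimodal model is replaced by a local stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str          # hypothetical action types: "type" or "done"
    target: str = ""   # name of the on-screen field to act on
    text: str = ""     # text to enter, if any


def run_agent(
    capture_screen: Callable[[], bytes],
    predict_action: Callable[[bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask the model for the next action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                 # picture of the visible screen
        action = predict_action(screenshot, history)  # in a real system: a call to a remote model
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


# Simulated screen: a single empty form field (assumption for the demo).
screen: Dict[str, str] = {"name_field": ""}


def capture_screen() -> bytes:
    # Stand-in for a screenshot: a byte rendering of the visible state.
    return f"name_field={screen['name_field']!r}".encode()


def stub_model(screenshot: bytes, history: List[Action]) -> Action:
    # Stand-in for the multimodal LLM: fill the field if it looks empty.
    if b"name_field=''" in screenshot:
        return Action("type", "name_field", "Ada")
    return Action("done")


def execute(action: Action) -> None:
    if action.kind == "type":
        screen[action.target] = action.text


history = run_agent(capture_screen, stub_model, execute)
```

Note that every pass through the loop involves a fresh capture and a fresh prediction, so the sequence of actions is decided at run time by the model rather than written down in advance.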

Conversely, in UI automation, robots follow a pre-developed set of instructions to complete defined tasks. They run within the environment of the customer and/or user. Data is only interpreted locally, and robots follow a clear, deterministic set of instructions. Recent AI-powered advances have significantly improved stability and reliability, addressing many of UI automation’s initial issues with brittleness and breakage.
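By contrast, a deterministic workflow can be sketched as a fixed list of steps applied to named UI elements. This is a simplified illustration, not how UiPath robots are implemented: the `WORKFLOW` list, the `run_workflow` function, and the dictionary standing in for the application's UI are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Each step is (operation, selector, value). All names here are hypothetical.
Step = Tuple[str, str, str]

WORKFLOW: List[Step] = [
    ("type", "invoice_number", "INV-001"),
    ("type", "amount", "250.00"),
    ("click", "submit", ""),
]


def run_workflow(ui: Dict[str, str], steps: List[Step]) -> List[str]:
    """Apply each predefined step in order; fail loudly if an element is missing."""
    log: List[str] = []
    for op, selector, value in steps:
        if selector not in ui:
            raise KeyError(f"selector not found: {selector}")
        if op == "type":
            ui[selector] = value
        elif op == "click":
            ui[selector] = "clicked"
        else:
            raise ValueError(f"unknown operation: {op}")
        log.append(f"{op}:{selector}")
    return log


# Simulated application screen with three named elements.
ui = {"invoice_number": "", "amount": "", "submit": ""}
log = run_workflow(ui, WORKFLOW)
```

Given the same starting screen, the same steps run in the same order every time, and everything happens locally. The trade-off is that a missing element stops the run instead of being worked around.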

The clear benefits of UI automation

The differences between these two approaches make ALL the difference when automating complex, high-volume, multisystem processes that require high security and accuracy. For these types of workflows, UI automation is a much better option. Here’s why:

Accuracy and completeness: Mission-critical processes like order-to-cash depend on the accurate extraction, movement, and posting of data from one place to another, as well as the documentation and communications surrounding these activities. In this area, AI-based approaches cannot match UI automation’s performance.

For example, an analysis of UiPath data shows that 96.5% of all our customers’ automations run successfully with our UI automation approaches. Publicly available data on AI-based automation suggests that it’s significantly less reliable. Anthropic, for instance, reported a 14.9% accuracy rate on a benchmark designed to evaluate how well models use computers, far below the human skill level of 70-75%. While accuracy will surely improve over time, there’s a long way to go before achieving parity with UI automation.

There are other issues as well. All LLMs are prone to hallucinations and can take unpredictable actions. Anthropic researchers have noted instances where their LLM suddenly went off task, clicking the wrong screens or inexplicably downloading photos of national parks. UI automation’s deterministic robots simply lack the capacity to go rogue like that.

Then there’s the matter of completeness. An approach that takes pictures of the visible screen may miss data on dropdown lists that extend below the margins. And it might overlook short-lived actions that weren’t occurring when the pictures were taken. UI automation doesn’t have these issues.
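The dropdown case can be illustrated with a toy model. The sketch assumes that a screenshot exposes only the rows currently drawn on screen, while a UI automation tool reads the full list from the application's element tree. The option list, the `VISIBLE_ROWS` value, and both function names are hypothetical.

```python
from typing import List

# Hypothetical dropdown with more options than fit on screen.
OPTIONS: List[str] = ["Austria", "Belgium", "Canada", "Denmark", "Estonia", "Finland"]
VISIBLE_ROWS = 4  # assumption: only four rows are rendered without scrolling


def visible_in_screenshot(options: List[str], visible_rows: int, scroll_offset: int = 0) -> List[str]:
    """What a picture of the screen can show: only the rows currently drawn."""
    return options[scroll_offset:scroll_offset + visible_rows]


def read_from_element_tree(options: List[str]) -> List[str]:
    """What structural access to the control can return: every option."""
    return list(options)


seen = visible_in_screenshot(OPTIONS, VISIBLE_ROWS)
missed = [option for option in read_from_element_tree(OPTIONS) if option not in seen]
```

In this toy setup, `missed` holds the two options below the fold. A screenshot-based approach would have to scroll and capture again to find them.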

Security and governance: When it comes to ensuring privacy, blocking malicious incursions, and keeping proprietary data within firewalls, UI automation is a significantly less risky choice. For example, with UI automation, only the data that is needed is gathered. In contrast to AI-based automation, UI automation involves no wholesale extraction of screenshots that may inadvertently contain sensitive data. Moreover, the UiPath software robots that perform our UI automation can be credentialed and their access to sensitive data controlled. That level of security is not currently available with AI-based automation.

Computational load and speed: AI-based automation is computationally intensive and requires moving data from local computers to the cloud and back again. In contrast, completing UI automation’s prescribed, directed actions on a screen in a local environment requires relatively little processing power.

Moreover, all the complex back-and-forth from a local environment to an external LLM makes AI-based automation slower than UI automation executed locally in the customer’s or user’s environment. These differences result in material differences in costs and cycle times, particularly when processing many millions of transactions a year.
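To see how per-step latency compounds at volume, here is a back-of-the-envelope calculation. Every number in it is a hypothetical placeholder chosen for illustration, not a measurement of any product: two million transactions a year, ten screen actions per transaction, 0.2 seconds per local action, and 3 seconds per action that needs a cloud round trip.

```python
def total_hours(transactions: int, steps_per_transaction: int, seconds_per_step: float) -> float:
    """Total processing time in hours for a given volume and per-step latency."""
    return transactions * steps_per_transaction * seconds_per_step / 3600


# Hypothetical inputs (assumptions, not benchmarks).
TRANSACTIONS_PER_YEAR = 2_000_000
STEPS_PER_TRANSACTION = 10
LOCAL_SECONDS_PER_STEP = 0.2   # assumed latency of a local, scripted action
REMOTE_SECONDS_PER_STEP = 3.0  # assumed latency of screenshot upload plus model inference

local_hours = total_hours(TRANSACTIONS_PER_YEAR, STEPS_PER_TRANSACTION, LOCAL_SECONDS_PER_STEP)
remote_hours = total_hours(TRANSACTIONS_PER_YEAR, STEPS_PER_TRANSACTION, REMOTE_SECONDS_PER_STEP)
slowdown = remote_hours / local_hours

print(f"local: {local_hours:,.0f} h, remote: {remote_hours:,.0f} h, ratio: {slowdown:.1f}x")
```

The absolute figures depend entirely on the assumed inputs. The point is only that a per-step difference of a few seconds turns into thousands of hours once it is multiplied across millions of transactions.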

For us, it’s not either/or—it’s both

As we look toward the future, it’s clear that the rise of AI-based automation represents a great leap forward for certain types of processes and activities. The world is close to realizing the dream of putting on-the-fly, no-code, prompt-driven automation into the hands of virtually everyone who uses screens and software, ushering in a new era of personal productivity and performance unlike anything we have seen before.

We are already taking steps to bring these types of capabilities into the UiPath Platform™. In particular, we’ll soon be incorporating them into our end-user experiences like Autopilot™ for Everyone, as well as providing AI-based automation as an additional option for citizen developers and automation experts.

We know that enterprises will want to take advantage of these new capabilities—but want to do so safely and with full control. Therefore, we have been expanding our platform’s capabilities to provide the necessary orchestration, management, and governance that enterprises require, regardless of which model, or models, they adopt.

But even as we expand the functionality and support for AI-based automation, we are continuing to advance our UI automation capabilities—because UI automation will be the best solution for a wide range of critical enterprise processes. We will continue to leverage emerging AI advances to make our UI automation even more intelligent, more facile at understanding and acting without significant coding and training, and more resilient. A prime example: the new UiPath Healing Agent (now in public preview), which can self-heal failing automations.

In short, UiPath believes in a future of AI-powered automation in all its forms, including both UI- and AI-based approaches. Each has unique strengths; each is the better choice for a particular set of automation opportunities. Our goal is to make both available—along with any new AI approaches that emerge—through an enterprise platform that can orchestrate, govern, and manage the full panoply of automation options available both today and in the future.
