
Apple researchers work to prevent AI from taking actions you haven't approved

Apple continues to clarify what AI should be allowed to do on your behalf


A recent paper from Apple and the University of Washington tackles exactly this problem: teaching AI to understand the consequences of its actions on a smartphone.

Artificial intelligence agents are getting better at everyday tasks. These systems can navigate apps, fill out forms, make purchases, or change settings. They can often do this without needing our direct input.

Autonomous actions are set to be part of the big upcoming Siri update, which may arrive in 2026. Apple showed where it wants Siri to go during its WWDC 2024 keynote.

The company wants Siri to perform tasks on your behalf, such as ordering event tickets online. That kind of automation sounds convenient, but it also raises a serious question: what happens if the AI taps "Delete account" instead of "Log in"?

Understanding the stakes of mobile UI automation

Mobile devices are personal. They store our banking applications, medical records, photos and personal messages.

An AI acting on our behalf needs to know which actions are harmless and which carry lasting or risky consequences. People need systems that know when to stop and ask for confirmation.

Most AI research has focused on getting agents to work at all: recognizing buttons, navigating screens, and following instructions. Far less attention has been paid to what those actions mean for the user once they have been taken.

Not every action carries the same level of risk. Tapping "refresh" is low risk. Tapping "transfer funds" is high risk.

Creating a map of risky and safe actions

The researchers began by consulting people who work in AI safety and user interface design. They wanted to create a "taxonomy," or structured list, of the different kinds of impact a UI action can have.

The team looked at questions such as: Can the agent's action be undone? Does it affect only the user, or other people too? Does it change privacy settings or cost money?

The paper shows how the researchers developed a way to label any mobile action along several dimensions. For example, deleting a message may be reversible within two minutes, but not after. Sending money is usually irreversible without outside help.

The taxonomy matters because it gives AI a framework for reasoning about an action. It is a checklist of what could go wrong, and of why an action might require extra confirmation.
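To make the idea concrete, here is a minimal sketch of how such a taxonomy might be expressed in code. The type names, dimensions, and the high-risk roll-up are illustrative assumptions inferred from the examples above, not Apple's actual schema.

```swift
import Foundation

// Illustrative sketch of a UI-action impact taxonomy (assumed, not Apple's).
enum Reversibility {
    case reversible                      // can always be undone
    case reversibleWithin(TimeInterval)  // undoable only for a limited window
    case irreversible                    // cannot be undone without outside help
}

struct UIActionImpact {
    let label: String            // e.g. "Transfer funds", "Delete message"
    let reversibility: Reversibility
    let affectsOthers: Bool      // does it touch anyone besides the user?
    let changesPrivacy: Bool     // does it alter privacy settings?
    let costsMoney: Bool         // does it have a financial consequence?

    // Crude roll-up: anything hard to undo or outward-facing is high risk.
    var isHighRisk: Bool {
        if case .irreversible = reversibility { return true }
        return affectsOthers || changesPrivacy || costsMoney
    }
}

// The article's examples, expressed in this sketch:
let deleteMessage = UIActionImpact(
    label: "Delete message",
    reversibility: .reversibleWithin(120), // undoable for about two minutes
    affectsOthers: false, changesPrivacy: false, costsMoney: false)

let sendMoney = UIActionImpact(
    label: "Transfer funds",
    reversibility: .irreversible,
    affectsOthers: true, changesPrivacy: false, costsMoney: true)
```

Structuring the impacts this way is what lets an agent, or a researcher evaluating one, ask the checklist questions mechanically instead of guessing.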

Training AI to see the difference

The study gathered real examples of such actions by asking participants to record them in a simulated mobile environment.

Simulating the impacts of UI actions in mobile interfaces. Image credit: Apple

Instead of simple, low-stakes tasks such as browsing or searching, they concentrated on high-stakes actions. Examples included changing account passwords, sending messages, and updating payment details.

The team combined the new data with existing datasets that mostly covered safe, routine interactions, then annotated everything using their taxonomy.

Finally, they tested five large language models, including versions of OpenAI's GPT-4. The team wanted to see whether these models could predict an action's impact level or classify its properties.

Adding the taxonomy to the AI prompts helped, improving accuracy at judging when an action was risky. But even the best-performing model, multimodal GPT-4, got it right only about 58% of the time.
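The paper's exact prompts aren't reproduced here, but the general pattern the article describes, embedding the taxonomy in the prompt before asking for a judgment, might look something like the sketch below. The prompt wording is entirely an assumption.

```swift
// Hypothetical prompt construction: the taxonomy text is placed in the
// prompt, and the model is asked to classify a single UI action.
let taxonomy = """
Dimensions: reversibility (reversible / reversible within a time window /
irreversible), scope (user-only / affects others), privacy (changes
privacy settings or not), financial (costs money or not).
"""

func riskPrompt(for action: String) -> String {
    """
    You are rating a mobile UI action before an agent performs it.
    Use this taxonomy of impacts:
    \(taxonomy)
    Action: "\(action)"
    Answer LOW or HIGH risk, then list the dimensions that apply.
    """
}

// Usage: send riskPrompt(for: "Tap 'Transfer funds'") to a model of your
// choice. Per the study, taxonomy-augmented prompts improved accuracy,
// though the best model still managed only about 58%.
```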

Why AI safety matters for mobile apps

The study showed that AI models often overestimated risk. They would flag harmless actions as high risk, such as clearing the history of an empty calculator.

A cautious bias may sound safer. But it can make AI assistants annoying, or outright useless, if they constantly ask for confirmation when none is needed.

The web interface participants used to generate UI traces with impact annotations. Image credit: Apple

More worrying (if not surprising), the models struggled with nuanced judgments. They found it hard to decide when something was reversible or how it might affect another person.

Users want automation that is both useful and safe. An AI agent that deletes an account without asking could be a disaster. An agent that refuses to change the volume without permission is useless.

What happens next to make AI safer

The researchers say their taxonomy can guide development. For example, users could set their own preferences for when they want to be asked for approval.

The approach also supports transparency and tuning. It helps AI designers pinpoint where models fail, especially on real, high-stakes tasks.
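As a rough illustration of user-set approval preferences, the sketch below reuses the hypothetical UIActionImpact type from the earlier example. The policy names are assumptions, not part of the paper.

```swift
// Illustrative user-configurable approval policy (assumed, not Apple's).
enum ApprovalPolicy {
    case alwaysAsk            // confirm every agent action
    case askWhenHighRisk      // confirm only high-risk actions
    case neverAsk             // fully autonomous
}

func agentShouldConfirm(_ action: UIActionImpact,
                        policy: ApprovalPolicy) -> Bool {
    switch policy {
    case .alwaysAsk:       return true
    case .askWhenHighRisk: return action.isHighRisk
    case .neverAsk:        return false
    }
}

// With .askWhenHighRisk, "Transfer funds" prompts the user, while
// "Delete message" (undoable for two minutes) does not.
print(agentShouldConfirm(sendMoney, policy: .askWhenHighRisk))      // true
print(agentShouldConfirm(deleteMessage, policy: .askWhenHighRisk))  // false
```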

Mobile UI automation will only grow as AI becomes more integrated into our daily lives. The research shows that teaching AI to see buttons is not enough.

It must also understand the human meaning behind a tap. And that is a tall order for artificial intelligence.

Human behavior is messy and context-dependent. Pretending a machine can resolve that complexity without error is wishful thinking at best, negligence at worst.
