as AI agents get closer to taking real actions on our behalf (messaging someone, buying something, changing account settings, and so on). Here's what they found. This isn't just a study of whether agents can click the right button; it's about whether they can foresee what happens after they press it, and whether they should proceed at all.
From the researchers:
"While prior research has studied the mechanics of how AI agents might navigate UIs and understand UI structure, the effects of agents and their autonomous actions, particularly those that may be risky or irreversible, remain largely underexplored. In this work, we investigate the real-world impacts and consequences of mobile UI actions taken by AI agents."

In other words, past work mostly looked at mechanics such as finding the right button or scrolling through options; this study set out to go a few steps further.
For the study, the researchers recruited participants and asked them to use real mobile apps and record actions they would feel uncomfortable having an AI perform without their permission: things like sending messages, changing passwords, editing profile details, or making financial transactions. Each recorded action was then classified along dimensions such as:

- Is the action informational, transactional, communicative, or just basic navigation?
- Impact on the user: Can it affect the user's privacy, data, behavior, or digital assets?
- Frequency: Is it something typically done once in a while, or over and over?

Other questions included "Does this alert anyone else?" and "Does this leave a trace?", all things an agent should take into account before acting on the user's behalf.

Once the dataset was built, the team ran it through five large language models, including GPT-4, Google Gemini, and Apple's own Ferret-UI, to see how well they could classify the impact of each action. The result? Google Gemini performed best in so-called zero-shot tests (56% accuracy), which measure how well an AI can handle tasks it hasn't been explicitly trained on. Meanwhile, the multimodal version of GPT-4 led the pack (58% accuracy) at judging impacts when asked to reason step by step using chain-of-thought prompting.

While much of the conversation around AI agents focuses on whether they can complete tasks ("book a flight," "cancel this subscription," and so on), the real safety challenge is making sure an agent knows when to ask for confirmation, or when not to act at all. Beyond getting agents to do what people actually want, Apple's study adds a new dimension: it asks how well AI agents can foresee the consequences of their actions, and what they do with that information before they act.
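To make the evaluation setup more concrete, here is a minimal sketch in Python of the kind of impact-classification query such a benchmark might pose. The dimension names, label sets, and prompt wording below are illustrative assumptions, not the study's published taxonomy or prompts, and no model API is called.

```python
# Illustrative sketch (not the study's actual protocol): build zero-shot and
# chain-of-thought prompts that ask a model to rate the impact of a mobile UI action.
# The dimensions and labels below are assumptions based on the questions described above.

from dataclasses import dataclass


@dataclass
class UIAction:
    app: str
    description: str  # e.g. "tap 'Send' on a drafted message to a contact"


# Hypothetical impact dimensions, loosely mirroring the article's questions.
DIMENSIONS = {
    "intent": ["navigation", "informational", "communicative", "transactional"],
    "user_impact": ["none", "privacy", "data", "behavior", "digital_assets"],
    "frequency": ["routine", "occasional", "rare"],
    "alerts_others": ["yes", "no"],
    "leaves_trace": ["yes", "no"],
}


def zero_shot_prompt(action: UIAction) -> str:
    """Ask for labels directly, with no intermediate reasoning (zero-shot)."""
    dims = "\n".join(f"- {name}: one of {labels}" for name, labels in DIMENSIONS.items())
    return (
        f"In the app '{action.app}', an AI agent is about to perform: {action.description}\n"
        f"Classify the impact of this action along each dimension:\n{dims}\n"
        "Answer with exactly one label per dimension."
    )


def chain_of_thought_prompt(action: UIAction) -> str:
    """Ask the model to reason step by step about consequences before labeling."""
    return (
        zero_shot_prompt(action)
        + "\nFirst, think step by step about what happens after this action is taken,"
          " who is affected, and whether it can be undone. Then give the labels."
    )


if __name__ == "__main__":
    example = UIAction(app="Messages", description="tap 'Send' on a drafted message")
    print(zero_shot_prompt(example))
```

The only difference between the two prompts is the explicit request to reason about consequences before answering, which is the essence of the chain-of-thought setup behind the multimodal GPT-4 result mentioned above.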