Author: Hartley Charlton
Apple researchers have developed an artificial intelligence system called ReALM (Reference Resolution as Language Modeling) that aims to radically improve how voice assistants understand and respond to commands.
In a research paper (via VentureBeat), Apple describes a new system for how large language models can tackle reference resolution, which involves deciphering ambiguous references to on-screen entities as well as understanding conversational and background context. As a result, ReALM could lead to more intuitive and natural interactions with devices.
Reference resolution is an important part of natural language understanding, allowing users to use pronouns and other indirect references in conversation without confusion. For digital assistants, this capability has historically been a major challenge, as it requires interpreting a wide range of verbal cues and visual information. Apple's ReALM system attempts to solve this problem by converting the complex process of reference resolution into a pure language modeling task. In doing so, it can understand references to visual elements displayed on the screen and integrate this understanding into the conversational flow.
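To illustrate the idea of recasting reference resolution as a language modeling task, the sketch below builds a text prompt that lists candidate on-screen entities alongside the user's utterance, so a language model can simply name the entity the user means. The prompt wording, function name, and entities here are invented for illustration; ReALM's actual formulation is described in Apple's paper.

```python
# Hypothetical sketch: framing reference resolution as a prompt for a
# language model. The entity list and question format are illustrative
# assumptions, not Apple's actual prompt design.

def build_resolution_prompt(entities: list[str], utterance: str) -> str:
    """Serialize candidate entities and a user utterance into an LM prompt."""
    numbered = "\n".join(f"{i}. {e}" for i, e in enumerate(entities, start=1))
    return (
        "On-screen entities:\n"
        + numbered
        + f'\nUser says: "{utterance}"'
        + "\nWhich entity number does the user refer to?"
    )

prompt = build_resolution_prompt(
    ["Pharmacy: 555-0100", "Pizza place: 555-0188"],
    "call the bottom one",
)
print(prompt)
```

A fine-tuned model receiving such a prompt only needs to emit an entity identifier, which is what lets an ordinary language model stand in for a specialized resolution pipeline.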
ReALM reconstructs the visual layout of the screen using textual representations. This involves parsing on-screen entities and their locations to generate a text format that captures the content and structure of the screen. Apple researchers found that this strategy, combined with fine-tuning language models specifically for reference resolution tasks, significantly outperforms traditional methods, including the capabilities of OpenAI's GPT-4.
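The screen-to-text step described above can be sketched roughly as follows: group on-screen elements into rows by vertical position, then emit them top-to-bottom and left-to-right. The data structure, row tolerance, and tab-separated output are simplifying assumptions for illustration; the exact encoding is defined in Apple's paper.

```python
# Simplified, hypothetical sketch of turning a screen's elements into a
# textual layout. Real UI trees are far richer; this only demonstrates
# the ordering idea (top-to-bottom rows, left-to-right within a row).

from dataclasses import dataclass

@dataclass
class ScreenElement:
    text: str   # visible label of the UI element
    x: float    # left edge of its bounding box
    y: float    # top edge of its bounding box

def serialize_screen(elements: list[ScreenElement],
                     row_tolerance: float = 10.0) -> str:
    """Render elements as text: rows top-to-bottom, items left-to-right."""
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    rows: list[list[ScreenElement]] = []
    for el in ordered:
        # Start a new row when the element sits clearly below the last row.
        if rows and abs(el.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(el)
        else:
            rows.append([el])
    return "\n".join(
        "\t".join(e.text for e in sorted(row, key=lambda e: e.x))
        for row in rows
    )

screen = [
    ScreenElement("Call", 10, 100),
    ScreenElement("555-1234", 120, 100),
    ScreenElement("Contacts", 10, 40),
]
print(serialize_screen(screen))
# → "Contacts" on the first line, then "Call" and "555-1234" on the second
```

Feeding a representation like this into the prompt is what lets a text-only language model reason about what is "on screen" without any image input.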
ReALM could allow users to interact with digital assistants far more efficiently by referring to whatever is currently displayed on their screen, without needing to give precise and detailed instructions. This could make voice assistants much more useful in a variety of situations, such as helping drivers navigate infotainment systems while driving, or assisting users with disabilities by providing an easier, more accurate means of indirect interaction.
Apple has already published several research papers in the field of artificial intelligence. Last month, the company introduced a new method for training large language models that seamlessly integrates both textual and visual information. Apple is expected to unveil a number of artificial intelligence features at WWDC in June.
Tag: Artificial Intelligence