Apple releases details of new artificial intelligence model “MM1”

By Hartley Charlton

Apple researchers have developed a new method for training large language models (LLMs) that seamlessly integrates both textual and visual information.


The company's findings, detailed in a research paper entitled “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training,” demonstrate a new approach to creating smarter and more flexible artificial intelligence systems. Apple says that by training on a diverse dataset spanning image-caption pairs, interleaved image-text documents, and text-only data, the MM1 model sets a new standard in AI's ability to perform tasks such as image captioning, visual question answering, and natural language inference.
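To make the data mixture concrete, the sketch below shows one way such a blended pre-training stream could be assembled. It is purely illustrative: the class, function names, and sampling weights are assumptions for demonstration, not Apple's actual pipeline.

    import random
    from dataclasses import dataclass
    from typing import List

    # Illustrative sketch only; names and weights are assumptions,
    # not taken from Apple's training code.

    @dataclass
    class Example:
        images: List[str]   # zero or more image paths
        text: str           # caption, document text, or plain text

    def sample_pretraining_batch(
        caption_pairs: List[Example],     # single image + short caption
        interleaved_docs: List[Example],  # documents mixing images with text
        text_only: List[Example],         # plain text, no images
        batch_size: int = 8,
    ) -> List[Example]:
        """Draw a mixed batch from the three data sources.

        The weights here are placeholder values chosen for the sketch;
        the point is simply that every batch blends all three sources.
        """
        sources = [caption_pairs, interleaved_docs, text_only]
        weights = [0.45, 0.45, 0.10]
        batch = []
        for _ in range(batch_size):
            source = random.choices(sources, weights=weights, k=1)[0]
            batch.append(random.choice(source))
        return batch

Mixing the three sources in every batch is what lets a single model learn both grounded image understanding and fluent text generation, rather than training separate systems for each.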

Apple's research focuses on combining different types of training data and model architectures, allowing the AI to understand and generate language based on a mix of visual and linguistic cues. This capability is vital for tasks that require a detailed understanding of the world, such as interpreting complex images or answering questions that involve visual elements.

The paper also highlights MM1's exceptional in-context learning capabilities, particularly in the largest configuration with 30 billion parameters. This version reportedly demonstrates remarkable multi-step reasoning over multiple images using few-shot “chain-of-thought” prompting, a technique that allows the AI to perform complex, open-ended problem solving based on minimal examples.
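The following snippet illustrates what a few-shot, multi-image chain-of-thought prompt might look like. The structure and field names are hypothetical, not Apple's interface; the idea is that a couple of worked image-plus-reasoning examples precede a new question, and the model is asked to continue the reasoning itself.

    # Hypothetical few-shot "chain-of-thought" prompt over multiple images.
    # The layout below is an illustration, not Apple's actual API.

    few_shot_prompt = [
        # Worked example 1: image, question, and step-by-step reasoning.
        {"image": "menu.jpg",
         "question": "How much do two beers cost?",
         "reasoning": "One beer is $6. Two beers are 2 x $6 = $12.",
         "answer": "$12"},
        # Worked example 2: another image with explicit intermediate steps.
        {"image": "receipt.jpg",
         "question": "What is the total with a 10% tip?",
         "reasoning": "The subtotal is $40. A 10% tip is $4, so the total is $44.",
         "answer": "$44"},
        # New query: only the image and question are given; the model is
        # expected to produce the reasoning steps and the final answer.
        {"image": "fridge.jpg",
         "question": "How many eggs are left if I use three?"},
    ]

    def render_prompt(examples):
        """Flatten the structured examples into one text prompt, leaving the
        reasoning of the final example for the model to fill in."""
        parts = []
        for ex in examples:
            parts.append(f"<image: {ex['image']}>")
            parts.append(f"Q: {ex['question']}")
            if "reasoning" in ex:
                parts.append(f"Let's think step by step. {ex['reasoning']}")
                parts.append(f"A: {ex['answer']}")
            else:
                parts.append("Let's think step by step.")
        return "\n".join(parts)

    print(render_prompt(few_shot_prompt))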

This research is part of Apple's broader effort to expand its artificial intelligence capabilities amid growing competition. Earlier today, Bloomberg's Mark Gurman reported that Apple is in talks with Google to license Google's Gemini generative large language models to power new features coming to the iPhone as part of iOS 18.

Tag: Artificial Intelligence