Apple Intelligence – The Test Apps That Paved the Way for Apple's Generative AI

Internally, Apple used several test apps before launching Apple Intelligence.

Apple Intelligence is the product of more than a year of relentless testing. Here's what Apple engineers used to ensure the quality of its AI software.

For Apple, 2024 is undoubtedly the year of artificial intelligence. The company has long been working on machine learning features, and its latest operating systems usher in a whole new set of AI-powered enhancements known collectively as Apple Intelligence.

While the generative AI tools themselves were announced in June at WWDC 2024, only a few of them made their public debut with the first developer betas of iOS 18.1 and macOS 15.1. Since then, Apple has been releasing more and more AI-powered improvements with subsequent beta releases.

At the time of writing, the iOS 18.1 and macOS 15.1 updates are nearing the end of beta testing, while the first developer beta of iOS 18.2 has just been released. Months after the big announcement, some Apple Intelligence features are still only available in beta versions of Apple’s operating systems.

According to people who spoke to AppleInsider and accurately revealed many Apple Intelligence features months before launch, the company spent a year working on its internal generative AI tools before they were finally released to the public.

During development, Apple tried to keep the full scope of its AI efforts secret. Individual AI projects were given their own code names, as was the case with the email categorization feature known as Project BlackPearl.

However, Apple Intelligence as a whole was known by the codename Greymatter, an unmistakable reference to the type of tissue found in the human brain. Some of Apple's internal test apps also had names that obscured their overall purpose.

The internal tools Apple used for testing before Apple Intelligence went public

During the development of Apple Intelligence, Apple used at least two dedicated test apps and environments to test its AI software.

Apple used several test apps during the development of iOS 18 and macOS Sequoia.

The two apps in question are known as 1UP, a reference to Nintendo’s popular Super Mario series, and Smart Replies Tester. The latter is aptly named, given that AI-powered “smart replies” have since appeared in release versions of Apple’s operating systems, in the Mail and Messages apps.

We’re told that internal distributions of iOS 18.0 and macOS 15.0 Sequoia included many of the core Apple Intelligence frameworks used in the iOS 18.1 and macOS 15.1 public betas.

The frameworks were needed for testing and were included alongside the standard development and configuration utilities found in Apple’s internal operating systems.

Various AI-related features could be toggled on and off via feature flags in the Livability app. Apple engineers then used 1UP and Smart Replies Tester, the two dedicated AI test apps, to exercise various aspects and use cases of Apple Intelligence.
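To illustrate the general idea of gating unreleased features behind flags, here is a minimal, purely hypothetical sketch in Python. The class and flag names are invented for illustration; Apple's actual internal tooling is not public.

```python
# Hypothetical sketch of feature-flag gating. The FeatureFlags class and
# the "smart_replies" flag name are illustrative, not Apple's actual API.
class FeatureFlags:
    def __init__(self):
        self._flags = {}

    def set(self, name, enabled):
        self._flags[name] = enabled

    def is_enabled(self, name):
        # Unknown flags default to off, the safe choice for unreleased features.
        return self._flags.get(name, False)

flags = FeatureFlags()
flags.set("smart_replies", True)

if flags.is_enabled("smart_replies"):
    print("Smart Replies UI shown")
```

The defensive default (unknown flags are off) is what makes this pattern safe for shipping code that contains dormant, unannounced features.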

1UP – Testing Text Generation with AI Models

The 1UP app, found even in the earliest internal builds of iOS 18 and macOS Sequoia, was used to test generative AI features related to text. The app itself included a variety of different test options and parameters that could be tweaked as needed.

The 1UP app included tests related to text generation and references to the Ajax large language model.

People familiar with the app told AppleInsider that it contains direct references to Apple's long-known internal large language model (LLM), known as Ajax, which can run on-device.

The 1UP app includes several test variants, organized into different sections. One of the tests involves text generation. This part of the app was used to test “autoregressive text generation from a suggestion,” people familiar with the app told us.

It allowed its users to choose between different AI models, including the aforementioned on-device Ajax LLM. The app also included a setting to adjust the maximum number of tokens generated, which could be set between 30 and 100, with a default of 48.
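The phrase "autoregressive text generation from a suggestion" describes a loop in which each new token is predicted from everything generated so far. The following toy Python sketch shows that loop with a `max_tokens` parameter clamped to the 30–100 range the article describes; the model here is a dummy stand-in, not Apple's Ajax.

```python
# Toy illustration of autoregressive text generation from a prompt: each
# step feeds the text generated so far back into the model. The token
# bounds (30-100, default 48) mirror the article; the model is a stand-in.
def generate(model, prompt, max_tokens=48):
    # Clamp to the range the 1UP app reportedly allowed.
    max_tokens = max(30, min(100, max_tokens))
    tokens = prompt.split()
    for _ in range(max_tokens):
        next_token = model(tokens)   # predict from everything so far
        if next_token is None:       # model signals end of sequence
            break
        tokens.append(next_token)
    return " ".join(tokens)

# Dummy "model" that repeats the last word until four tokens exist.
def dummy_model(tokens):
    return tokens[-1] if len(tokens) < 4 else None

print(generate(dummy_model, "hello there"))  # -> "hello there there there"
```

The key property is that the loop is sequential: token N+1 cannot be produced until token N exists, which is why a maximum-token setting directly bounds generation time.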

1UP – Document Analysis, Topic Analysis, and Text Understanding

From what we were told, it is clear that Apple has placed a lot of emphasis on document and file understanding for the AI. Some of the tests found in the 1UP app focused on document and text analysis. Regardless of whether the user input consisted of raw text, a PDF file, or a Word document, Apple's software had to identify key information in the text, such as phone numbers, addresses, languages, and the author of the text, if applicable.

A mockup of 1UP's user interface, based on information provided to us by people familiar with the app.

Web search history from Safari and conversations from Messages could also be analyzed for keywords, or “themes,” as they were known in the app. These could be words that were repeated frequently, or words that seemed central to the text. Apple-specific terms were also recognized, and key sentences were highlighted.
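At its simplest, finding "themes" in this sense can be approximated by word-frequency counting, as in the Python sketch below. This is only a crude illustration of the idea the article describes; real topic analysis is far more sophisticated, and the stop-word list here is invented for the demo.

```python
# Minimal sketch of extracting "themes": words that repeat frequently in
# a text. Real topic analysis would be far richer; this only counts words.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is"}

def themes(text, top_n=3):
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]

sample = "The meeting about the budget moved. The budget meeting is now Friday."
print(themes(sample))  # -> ['meeting', 'budget', 'about']
```

Frequency alone misses words that are "central to the text" without being repeated, which is presumably where a language model improves on simple counting.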

The app could also match information found in text or a document with information about the user, such as whether a phone number was saved in the user's Contacts, or whether an event was found in Calendar.
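Cross-referencing detected entities against user data can be sketched as a simple lookup, shown below in Python. The contact data, the phone-number pattern, and the matching logic are all invented for illustration; they are not how Apple's software works internally.

```python
# Hypothetical sketch of matching phone numbers found in text against a
# user's saved contacts. Contacts, pattern, and logic are illustrative.
import re

contacts = {"+1-555-0100": "Alice", "+1-555-0199": "Bob"}

def match_phone_numbers(text):
    # Simplistic pattern for the demo; real detection handles many formats.
    found = re.findall(r"\+1-555-\d{4}", text)
    return {number: contacts.get(number, "unknown") for number in found}

msg = "Call me at +1-555-0100 or try +1-555-0123."
print(match_phone_numbers(msg))
# -> {'+1-555-0100': 'Alice', '+1-555-0123': 'unknown'}
```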

The significance of 1UP tests and tips about Apple Intelligence

1UP’s tests hinted at what would eventually become Apple Intelligence features, such as a revamped Siri with personal context and Writing Tools. With Apple Intelligence, users can edit text and create summaries of conversations that highlight key details like names, dates, and locations.

Apple’s own AI suggestions also showed that the company was exploring multiple levels of summaries, including summaries just 10 or 20 words long. AppleInsider paraphrased many of these suggestions before publishing them.

The 1UP tests also hint at what Apple wanted to do with Safari: have its AI draw on information from the web pages a user visits. That idea eventually led to the predictive search feature now known as Highlights.

Text generation and document analysis are now handled by ChatGPT, not Apple’s AI

With the first developer beta of iOS 18.2, Apple has significantly improved Siri by integrating with OpenAI’s ChatGPT. Queries and requests that Siri can’t handle are passed on to ChatGPT, though only with the user’s explicit approval.

Tests in the 1UP app appear to reflect the functionality made possible by ChatGPT integration in iOS 18.2.

iOS 18.2 also introduces a new splash screen that outlines some of the key features made possible by ChatGPT integration, such as text generation in Writing Tools and document analysis.

The 1UP app contains tests for much of the same things, suggesting that Apple may have wanted to implement ChatGPT-like features independently, using its own AI models.

Along with the 1UP app, Apple used another internal app known as Smart Replies Tester.

Smart Replies Tester – Evaluating AI-Generated Responses

In iOS 18.1, Apple introduced AI-powered Smart Replies in Mail and Messages. The feature makes it much easier to respond to an email or message in Apple's built-in apps.

Internally, Apple used a dedicated app to test Smart Replies. It instantly generated multiple responses based on the text you entered.

On iPhone, Smart Replies appear as suggestions above the keyboard in Mail. Apple Intelligence can generate replies to direct questions for the user to send, though it is often less useful in other situations.

Smart Replies Tester was apparently created specifically to test how well Apple’s AI could generate a response and how quickly. The app measured the response generation time in milliseconds.

According to people familiar with the matter, the internal app consists of several test menus where users can enter text and instantly receive several AI-generated Smart Replies. This happens entirely on-device, and the responses change as soon as the entered text is changed in any way.
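Measuring generation time in milliseconds, as the tester reportedly did, amounts to timing the generator call around its invocation. The Python sketch below shows that pattern with a placeholder reply generator; the replies and function names are invented, not Apple's.

```python
# Sketch of timing reply generation in milliseconds. The generator here
# is a placeholder, not Apple's on-device model.
import time

def generate_replies(text):
    # Placeholder: real smart replies would come from an on-device model.
    return ["Sounds good!", f"Can we talk about '{text}' later?"]

def timed_replies(text):
    start = time.perf_counter()
    replies = generate_replies(text)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return replies, elapsed_ms

replies, ms = timed_replies("Lunch tomorrow?")
print(f"{len(replies)} replies in {ms:.2f} ms")
```

Re-running this on every keystroke, as the article describes, is what makes the millisecond measurement matter: suggestions must regenerate faster than the user types.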

Smart Replies can be found in the Mail app on iOS 18.2.

The app can also be used with the image caption model, which is downloaded separately. Bulk captioning of images was also possible. In a related feature in the iOS 18.1 beta, the Photos app now includes a much-improved search feature that allows people to find images containing specific objects or locations with relative ease.

While Smart Replies Tester is clearly about AI, other internal apps also offer insight into Apple’s approach and mindset when it comes to AI.

Megadome — Your Personal Context, All in One App

Another internal Apple app, Megadome, serves as the perfect visual companion for Siri’s upcoming personal context feature, which is powered by Apple Intelligence.

The Megadome app collects user data and organizes it into different categories.

According to people familiar with the matter, the Megadome app can collect relevant information about a user, sort it into categories, and present it in neatly organized cards.

The app can display the most important information about the user, including their full name, important locations, relationships, groups, contact information, organizations, installed software, and more. Megadome appears to collect this information from system apps that the user has interacted with.

This information can also be viewed in the form of a so-called “reality graph,” which visualizes the relationships between entities and locations in a diagram.
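A graph of entities connected by labeled relationships can be modeled with a simple adjacency list, sketched below in Python. The entity names and relations are invented for illustration; this is only a toy version of the "reality graph" idea, not Apple's implementation.

```python
# Toy "reality graph": entities as nodes, relationships as labeled edges.
# All names and relations here are invented for illustration.
from collections import defaultdict

class RealityGraph:
    def __init__(self):
        self.edges = defaultdict(list)

    def relate(self, a, relation, b):
        self.edges[a].append((relation, b))

    def relations_of(self, entity):
        return self.edges[entity]

g = RealityGraph()
g.relate("User", "sibling", "Dana")
g.relate("User", "works_at", "Acme Corp")
g.relate("Dana", "lives_in", "Portland")

print(g.relations_of("User"))
# -> [('sibling', 'Dana'), ('works_at', 'Acme Corp')]
```

Traversing such a graph is what would let an assistant answer a question like "where does my sister live?" by following the edges User → Dana → Portland.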

Why Apple Made Megadome and What Features It Reflects

While the idea of an app that knows everything about you may seem like a nightmare at first glance, it is just an internal tool, not something created for the general public. Its existence ultimately makes sense when everything is considered in context.

Apple's Megadome app offers some insight into the company's thought process.

With Apple Intelligence, Siri will be able to process natural language. The virtual assistant will also gain a clear understanding of the user's so-called personal context as a result of the AI upgrade.

This means that Siri will be able to understand facts about the user's life – the different people and places that are important to them. In many ways, the Megadome app is the embodiment of this idea. Apple wanted to create a tool that could understand important aspects of someone's life and use those details to help the user.

What does this mean for the future of Apple Intelligence?

Apple's internal apps often serve as an accurate indicator of what will happen in the near future. While they may contain various puns, memes, and obscure inside jokes, the company’s test apps can reveal quite a bit about features in the works.

Apple’s test apps often contain tidbits about upcoming features, like those powered by Apple Intelligence.

While names like 1UP, GreyParrot, and Megadome may not mean anything to the average user, almost everyone has used Calculator or tested Apple Intelligence in some form.

This phenomenon is hardly new. Back in 2020, an internal app known as Gobi painted a pretty good picture of what would eventually become App Clips. If any information about future test apps emerges, we’ll likely be able to infer an upcoming feature.

In the meantime, the iOS 18.2 update introduces a number of highly anticipated Apple Intelligence features. Image Playground and Visual Intelligence are among the key updates found in iOS 18.2.
