Meta CEO Mark Zuckerberg's statement on artificial intelligence raised serious concerns after he said the company has more user data than was used to train ChatGPT, and will soon use it to train its own artificial intelligence systems.
The company's plan to use Facebook and Instagram posts and comments to train a rival chatbot raises concerns about both privacy and toxicity …
Zuckerberg announced the plan after the company's latest earnings report, as reported by Bloomberg.
For many people, Facebook is the Internet, and its user base continues to grow, according to the latest financial results from Meta Platforms Inc. But Mark Zuckerberg isn't just celebrating that continued growth. He wants to take advantage of this by using data from Facebook and Instagram to create powerful general-purpose artificial intelligence […]
[Zuckerberg said] “The next key part of our leadership is learning from unique data and feedback loops in our products… There are hundreds of billions of public images and tens of billions of public videos on Facebook and Instagram, which we estimate is larger than the Common Crawl dataset, and people also share large numbers of public text messages in comments on our services.”
Common Crawl refers to a huge archive of 250 billion web pages, which makes up the majority of the text used to train ChatGPT. With access to an even larger set of data, Meta may be able to create a smarter chatbot.
As Bloomberg notes, it's not just the sheer volume of data that can give Meta an advantage – it's that much of the data is interactive.
The pile of data it sits on is especially valuable because so much of it comes from comment threads. Any text that represents human dialogue is critical for training so-called conversational agents, which is why OpenAI has been heavily leveraging online forum Reddit Inc. to create its own popular chatbot.
But the Bloomberg piece also points out two big red flags. First, Meta will effectively train its AI on highly private messages and conversations between friends in Facebook comments. This raises serious privacy concerns.
Second, anyone who has ever read a comment section anywhere on the Internet knows that the percentage of toxic content is high. While thoughtful users discuss issues, there is no shortage of commenters resorting to personal attacks and crude insults, and a worrying portion of those are racist and sexist.
This is a problem for any chatbot: the training system has to filter such content out. Apple is probably more careful than anyone else in its chatbot development efforts, which contributes to Siri's very late relaunch, but the situation here could be particularly bad.
Some content on Facebook that is marked as toxic is not reviewed by humans and remains on the site. Even worse: when Zuckerberg said Meta had more data than Common Crawl, he was probably referring to the company's historical archive, which would include all the hyperbolic political content and fake news that was on the site before Zuckerberg made every effort to clean it up.
And this is the company that said just days ago that a fake video of President Biden should be allowed to remain on the platform because it was edited by a human and not by an artificial intelligence system. Its standards, even today, are therefore not very high.
Photo by Maria Shalabaeva on Unsplash