Google is using its expansive library of YouTube videos to train its artificial intelligence models, including Gemini and the Veo 3 video and audio generator, CNBC has learned.
The tech company is turning to its catalog of 20 billion YouTube videos to train these new-age AI tools, according to a person who was not authorized to speak publicly about the matter. Google confirmed to CNBC that it relies on its vault of YouTube videos to train its AI models, but the company said it only uses a subset of its videos for the training and that it honors specific agreements with creators and media companies.
“We’ve always used YouTube content to make our products better, and this hasn’t changed with the advent of AI,” said a YouTube spokesperson in a statement. “We also recognize the need for guardrails, which is why we’ve invested in robust protections that allow creators to protect their image and likeness in the AI era — something we’re committed to continuing.”
Such use of YouTube videos has the potential to lead to an intellectual property crisis for creators and media companies, experts said.
While YouTube says it has shared this information previously, experts who spoke with CNBC said it’s not widely understood by creators and media organizations that Google is training its AI models using its video library.
YouTube didn’t say how many of the 20 billion videos on its platform or which ones are used for AI training. But given the platform’s scale, training on just 1% of the catalog would amount to 2.3 billion minutes of content, which experts say is more than 40 times the training data used by competing AI models.
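The scale claim above can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes an average video length of roughly 11.5 minutes, which is simply the figure implied by the article's own numbers, not a published YouTube statistic:

```python
# Back-of-envelope check: 1% of a 20-billion-video catalog, at an assumed
# average length of 11.5 minutes per video (the value implied by the
# article's figures, not an official YouTube number).
catalog_size = 20_000_000_000        # total YouTube videos, per the article
sampled = int(catalog_size * 0.01)   # a hypothetical 1% training subset
avg_minutes = 11.5                   # assumed average video length
total_minutes = sampled * avg_minutes
print(f"{total_minutes / 1e9:.1f} billion minutes")  # → 2.3 billion minutes
```

At that assumed average length, a 1% sample of 200 million videos does work out to about 2.3 billion minutes of footage.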
The company shared in a blog post published in September that YouTube content could be used to “improve the product experience … including through machine learning and AI applications.” Users who have uploaded content to the service have no way of opting out of letting Google train on their videos.
“It’s plausible that they’re taking data from a lot of creators that have spent a lot of time and energy and their own thought to put into these videos,” said Luke Arrigoni, CEO of Loti, a company that works to protect digital identity for creators. “It’s helping the Veo 3 model make a synthetic version, a poor facsimile, of these creators. That’s not necessarily fair to them.”
CNBC spoke with multiple leading creators and IP professionals; none said they were aware, or had been informed by YouTube, that their content could be used to train Google’s AI models.
The revelation that YouTube is training on its users’ videos is noteworthy after Google in May announced Veo 3, one of the most advanced AI video generators on the market. In its unveiling, Google showcased cinematic-level video sequences, including a scene of an old man on a boat and another showing Pixar-like animals talking with one another. Both the visuals and the audio of these scenes were entirely AI-generated.
According to YouTube, an average of 20 million videos are uploaded to the platform each day, by everyone from independent creators to nearly every major media company. Many creators say they are now concerned they may be unknowingly helping to train a system that could eventually compete with or replace them.
“It doesn’t hurt their competitive advantage at all to tell people what kind of videos they train on and how many they trained on,” Arrigoni said. “The only thing that it would really impact would be their relationship to creators.”
Even if Veo 3’s final output does not directly replicate existing work, the generated content fuels commercial tools that could compete with the creators who made the training data possible, all without credit, consent or compensation, experts said.
When uploading a video to the platform, the user is agreeing that YouTube has a broad license to the content.
“By providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use that Content,” the terms of service read.
“We’ve seen a growing number of creators discover fake versions of themselves circulating across platforms — new tools like Veo 3 are only going to accelerate the trend,” said Dan Neely, CEO of Vermillio, which helps individuals protect their likeness from being misused and also facilitates secure licensing of authorized content.
Neely’s company has challenged AI platforms for generating content that allegedly infringes on its clients’ intellectual property, both individual and corporate. Neely says that although YouTube has the right to use this content, many of the content creators who post on the platform are unaware that their videos are being used to train video-generating AI software.
Vermillio uses a proprietary tool called Trace ID to assess whether an AI-generated video has significant overlap with a human-created video. Trace ID assigns scores on a scale of zero to 100. Any score over 10 for a video with audio is considered meaningful, Neely said.
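Trace ID's method is proprietary and not public, but the general idea of a 0-to-100 overlap score can be illustrated with a toy sketch: extract feature vectors (embeddings) from two videos, compare them with cosine similarity, and rescale. The function below is purely hypothetical and is not Vermillio's algorithm:

```python
# Purely illustrative toy "overlap score": cosine similarity between two
# equal-length feature vectors (e.g., embeddings of two videos), rescaled
# to a 0-100 range. Vermillio's actual Trace ID method is proprietary.
import math

def overlap_score(a, b):
    """Return cosine similarity of two feature vectors, scaled to 0-100."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    cosine = dot / (norm_a * norm_b)
    return max(0.0, cosine) * 100  # clamp negatives; 100 = identical direction

if __name__ == "__main__":
    # Two hypothetical embeddings that point in similar directions
    original = [0.9, 0.1, 0.4]
    generated = [0.8, 0.2, 0.5]
    print(round(overlap_score(original, generated), 1))
```

Under a scheme like this, identical vectors score 100 and unrelated (orthogonal) vectors score 0, which matches the scale described above.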
In one example cited by Neely, a video from YouTube creator Brodie Moss closely matched content generated by Veo 3. Trace ID attributed a score of 71 to the original video with the audio alone scoring over 90.
Some creators told CNBC they welcome the opportunity to use Veo 3, even if it may have been trained on their content.
“I try to treat it as friendly competition more so than these are adversaries,” said Sam Beres, a creator with 10 million subscribers on YouTube. “I’m trying to do things positively because it is the inevitable, but it’s kind of an exciting inevitable.”
Google includes an indemnification clause for its generative AI products, including Veo, which means that if a user faces a copyright challenge over AI-generated content, Google will take on legal responsibility and cover the associated costs.
YouTube announced a partnership with Creative Artists Agency in December to develop access for top talent to identify and manage AI-generated content that features their likeness. YouTube also has a tool for creators to request a video to be taken down if they believe it abuses their likeness.
However, Arrigoni said that the tool hasn’t been reliable for his clients.
YouTube also allows creators to opt out of third-party training by select AI companies, including Amazon, Apple and Nvidia, but users are not able to stop Google from training on their videos for its own models.
The Walt Disney Company and Universal filed a joint lawsuit last Wednesday against the AI image generator Midjourney, alleging copyright infringement, the first lawsuit of its kind out of Hollywood.
“The people who are losing are the artists and the creators and the teenagers whose lives are upended,” said Sen. Josh Hawley, R-Mo., in May at a Senate hearing about the use of AI to replicate the likeness of humans. “We’ve got to give individuals powerful enforceable rights in their images, in their property, in their lives back again, or this is just never going to stop.”
Disclosure: Universal is part of NBCUniversal, the parent company of CNBC.
https://www.cnbc.com/2025/06/19/google-youtube-ai-training-veo-3.html