What we actually know about Suno and Udio's training data
Nobody outside the courtroom knows what Suno or Udio actually trained on, and anyone telling you otherwise is reading a proxy. The viral tool going around lets you search four public datasets that circulate among AI developers. Together they hold around 21 million tracks, and the largest is LAION-DISCO-12M. The tool shows what is floating around the AI world; it is not proof of what any company used. The Atlantic, which built the tool, says so directly: the report does not prove that any specific company used any specific dataset. The real training data is sealed. Suno and Udio are fighting in federal court to keep their training-data figures out of public view, and a pivotal summary-judgment hearing in the Sony case is scheduled for July 2026. So when you look yourself up and see your songs, you have found them in an open research dataset, not inside Suno. That distinction is the whole story, and almost nobody is making it.
The tool everyone is using checks the wrong thing
The thing blowing up your feed (artists pulling up their own names and finding dozens of their songs) is real and worth paying attention to. It is built on four datasets that a researcher found in academic papers and on AI data-sharing sites, holding more than 21 million recordings between them. Major names are in there: Taylor Swift, Bad Bunny, and a very long list of others.
But here is the part that gets lost in the screenshot: those are open datasets that circulate among developers. They are evidence of what is available to train on, not proof of what any one company put into its model. The reporting itself is careful about this. The confident version, "my music trained Suno," is a leap the data does not support yet.
What is actually sealed, and why it matters
The real answer lives in discovery, and discovery is locked. In the major-label cases against Suno and Udio, both companies are fighting to seal their training-data figures, arguing competitors could use the numbers against them. A pivotal summary-judgment hearing in the Sony case is on the calendar for July 2026.
Until that opens up, the actual contents of those training sets are unknown to the public. We have allegations and audio-fingerprinting estimates from the plaintiffs' side. We do not have the receipts. Treating a public proxy as if it were the sealed truth is how a real issue turns into a bad argument.
Who benefits from the panic
Watching this play out feels like déjà vu. Remember how streaming "wasn't ready" for artists until every major label had quietly locked in its licensing deals? Same playbook here: create legal pressure, sue loudly, let the dust settle, then emerge with licensing agreements. UMG settled with Udio. Warner settled with Suno. The litigation doubles as a negotiating tool.
And just like streaming, there is no guarantee any of that trickles down. The labels can spend months talking about protecting creatives from AI, then sign deals where the actual creators see pennies on the dollar. So when the narrative gets loud and certain, ask who the certainty serves. It is usually not the writer or the producer.
The part nobody wants to sit with
There is an uncomfortable possibility inside the sealed data: it might not match the tidy story. The real training sets could lean more on independent and unprotected catalog than on major-label hits, or they might not. We do not know. That is exactly the point. Certainty is being sold on every side of this right now, and nobody outside the litigation has earned it.
What you can actually control
You cannot audit Suno's sealed training data. You can make sure your own house is in order, so that if damages claims, opt-outs, or licensing frameworks ever materialize, you are actually eligible to participate. The single biggest gap I see: artists assume that being on Spotify means their songs are registered. It does not. Registration with the U.S. Copyright Office is what unlocks statutory damages, and most catalogs I look at have real gaps in their registration.
If you want to see where you stand, you can check your catalog for free at copyrightcheck.app, and the companion guide below walks through why distribution is not registration.
Trying to make sense of where this lands for you?
Get a straight read, no hype, no panic. $100/hour, 60 minutes, phone or video.
Book a consultation