Stripping names from data used to be enough. In the age of AI, it no longer is — and that changes everything about how we think about privacy.
Imagine your hospital shares research data with scientists. Your name isn’t in it, because the hospital has taken steps to preserve privacy. It doesn’t contain your address or any other obvious personally identifiable information (PII). By legal definition, the data is anonymous. And yet somewhere, an AI model pieces together your ZIP code, your age, and your diagnosis date and figures out exactly who you are. This might sound like science fiction, but it’s a reality in 2026, and it is forcing everyone to fundamentally rethink one of privacy law’s most important assumptions: that removing your name from a dataset makes it safe.
Spoiler: it doesn’t. Not anymore, and not in an era of unprecedented technological advancement and AI.
For decades, “anonymized data” was treated like a get-out-of-jail-free card: an organization stripped out obvious identifiers such as name, phone number, email, and ID number, and the data was considered harmless. Privacy laws such as Europe’s GDPR and America’s HIPAA embraced this approach, placing lighter obligations (or none at all) on data that has been properly anonymized or de-identified. The logic was simple: if you can’t find the person, you can’t harm them. Organizations could share anonymized data freely for research, advertising, healthcare, and AI training, and for a while that assumption held.
“Anonymous data is no longer personal data.” That assumption is increasingly unstable.
~ Gaurav Mehta, CINO, Concur-Consent Manager
What’s changed? Three things arrived at roughly the same time: vastly more data, vastly cheaper computing, and vastly smarter AI. Together, they’ve made anonymization far more fragile than anyone expected, which is a nightmare in the making for organizations that rely on conventional anonymization techniques.
The interesting case of Netflix
One of the most famous privacy disasters of the 2000s didn’t involve a hacker stealing passwords. It involved the movies we watch. Netflix released an anonymized dataset of user movie ratings to help researchers improve its recommendation algorithm. It contained no names or emails, just ratings. Researchers then cross-referenced those ratings with public IMDb reviews and identified individual users. The dataset wasn’t as anonymous as anyone thought.
The lesson hit hard: data doesn’t need your name to reveal your identity. A handful of unusual behaviors, such as rating an obscure French film four stars and then a documentary about beekeeping, can make you statistically unique. Unique enough to find. That was in 2006, twenty years ago, and today the problem is far worse.
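A toy sketch makes the Netflix lesson concrete. This is not the researchers’ actual algorithm; it is a simplified illustration that assumes two small, made-up tables (`anonymized` and `public_reviews`, both hypothetical) and a hypothetical exact-match scoring rule that counts shared (movie, rating) pairs.

```python
# Toy linkage sketch: all data and function names here are hypothetical.

# "Anonymized" release: random IDs mapped to {movie: stars}
anonymized = {
    "user_0417": {"Obscure French Film": 4, "Beekeeping Documentary": 5, "Blockbuster A": 3},
    "user_0932": {"Blockbuster A": 3, "Blockbuster B": 4},
    "user_1288": {"Blockbuster A": 3, "Blockbuster B": 4, "Rom-Com C": 2},
}

# Public reviews posted under real (here, made-up) identities
public_reviews = {
    "alice": {"Obscure French Film": 4, "Beekeeping Documentary": 5},
    "bob": {"Blockbuster A": 3},
}


def match_score(anon_ratings, public_ratings, tolerance=0):
    """Count movies both profiles rated, with star ratings within `tolerance`."""
    return sum(
        1
        for movie, stars in public_ratings.items()
        if movie in anon_ratings and abs(anon_ratings[movie] - stars) <= tolerance
    )


def identify(public_ratings):
    """Return the single best-matching anonymous ID, or None if the match is ambiguous."""
    scores = {uid: match_score(r, public_ratings) for uid, r in anonymized.items()}
    top = max(scores.values())
    winners = [uid for uid, s in scores.items() if s == top]
    return winners[0] if top > 0 and len(winners) == 1 else None


print(identify(public_reviews["alice"]))  # user_0417: two rare ratings single her out
print(identify(public_reviews["bob"]))    # None: everyone rated Blockbuster A three stars
```

The popular film matches everyone and identifies no one, while two obscure ratings are enough to pick out a single record.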
You leak more data than you think
Every day, your smartphone generates thousands of behavioral signals. Most of them seem trivial on their own. Together, they paint a portrait of you that’s more revealing than any government ID. Think about everything your phone knows about you.
None of these signals includes your name, but together they form what researchers call a behavioral fingerprint: a pattern so unique to you that it’s practically as reliable as your actual fingerprint. Here’s what makes this especially troubling: you can change your email. You can change your phone number. You can even change your name. But you cannot easily change how you type, how you walk, or how you move through a city. Your behavioral fingerprint sticks with you.
AI has made re-identification cheap
Ten years ago, de-anonymizing a dataset required specialist researchers, expensive tools, and a lot of time. It wasn’t impossible, but the effort acted as a practical barrier. Most data was safe, not because it was truly anonymous, but because attacking it wasn’t worth the cost. Today, AI has demolished that barrier and put these capabilities into the hands of ordinary people. Modern machine learning is extraordinarily good at exactly the skills needed for re-identification: spotting patterns, connecting dots across datasets, filling in missing information, and building profiles from sparse clues. What once took weeks of specialist work can now happen automatically, at scale, in minutes.
The cost of re-identification is rapidly approaching zero.
It gets worse. Researchers have shown that just four or five location data points (home, office, gym, a favorite restaurant) can uniquely identify most people in a mobility dataset. No name is required. Add payment metadata, rideshare records, or social media check-ins, and “anonymous” location data becomes a direct line to an individual.
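The location finding can be explored with what mobility researchers call unicity: for each person, how often does a handful of their points match nobody else? The sketch below is an illustration, not the method of any specific study. It assumes a tiny, made-up set of traces recorded as hypothetical (place, hour) pairs, and it enumerates every k-point subset exactly instead of sampling, which only works at toy scale.

```python
from itertools import combinations

# Hypothetical traces: person -> set of (place, hour-of-day) observations
traces = {
    "u1": {("home_A", 7), ("office_X", 9), ("gym_1", 18), ("cafe_Q", 13)},
    "u2": {("home_B", 7), ("office_X", 9), ("gym_1", 18), ("cafe_R", 13)},
    "u3": {("home_C", 8), ("office_Y", 9), ("gym_1", 18), ("cafe_Q", 13)},
}


def matching_people(points, traces):
    """People whose trace contains every one of the given points."""
    wanted = set(points)
    return [person for person, trace in traces.items() if wanted <= trace]


def unicity(traces, k):
    """Average share of each person's k-point subsets that match only that person."""
    shares = []
    for trace in traces.values():
        subsets = list(combinations(sorted(trace), k))
        unique = sum(1 for pts in subsets if len(matching_people(pts, traces)) == 1)
        shares.append(unique / len(subsets))
    return sum(shares) / len(shares)


for k in (1, 2, 4):
    print(k, round(unicity(traces, k), 2))
# 1 0.42
# 2 0.78
# 4 1.0
```

Even in this three-person toy, a single point is unique less than half the time, two points most of the time, and a full four-point trace always.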
Combination is the real threat
The deepest flaw in traditional anonymization thinking is imagining each dataset in isolation. In the real world, data from dozens of different sources gets combined, and combination is where anonymization collapses.
Each individual dataset may look harmless on its own; the danger appears when they are joined. Privacy risk isn’t a property of a single dataset. It’s a property of an ecosystem.
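Here is a minimal sketch of how combination breaks anonymization: a classic linkage join on shared quasi-identifiers (ZIP code, birth year, and sex). Both datasets and every name in them are fictional, and the exact-match join is a simplifying assumption; real-world linkage usually has to cope with messier data.

```python
# Hypothetical "anonymized" clinical records: no names, but quasi-identifiers remain
health_records = [
    {"zip": "560034", "birth_year": 1984, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "560034", "birth_year": 1991, "sex": "M", "diagnosis": "asthma"},
    {"zip": "560095", "birth_year": 1984, "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical public dataset that does carry (fictional) names
public_profiles = [
    {"name": "Priya", "zip": "560034", "birth_year": 1984, "sex": "F"},
    {"name": "Rahul", "zip": "560034", "birth_year": 1991, "sex": "M"},
    {"name": "Meera", "zip": "560095", "birth_year": 1984, "sex": "F"},
    {"name": "Anita", "zip": "560095", "birth_year": 1984, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """The tuple of quasi-identifier values both datasets share."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(records, profiles):
    """Attach a name to each record whose quasi-identifiers match exactly one profile."""
    by_key = {}
    for p in profiles:
        by_key.setdefault(key(p), []).append(p["name"])
    linked = []
    for r in records:
        names = by_key.get(key(r), [])
        if len(names) == 1:
            linked.append((names[0], r["diagnosis"]))
    return linked


print(link(health_records, public_profiles))
# [('Priya', 'diabetes'), ('Rahul', 'asthma')]
```

Neither dataset pairs a name with a diagnosis, yet the join does so for two of the three records; the third survives only because two public profiles happen to share its quasi-identifiers.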
It’s not just about identity, it’s about inference
Here’s a shift that most privacy discussions miss: the new threat isn’t just that someone finds out who you are. It’s what they can figure out about you, even if your identity technically stays hidden. AI systems can now draw such inferences from behavioral patterns alone.
Real harm, whether discrimination, manipulation, or exploitation, doesn’t require anyone to know your name. It only requires knowing enough about you. Anonymization doesn’t protect against inference. It was never designed to.
Storing data now, unlocking it later
Some privacy researchers are now raising an even longer-term concern. What if adversaries such as corporations, governments, or criminal organizations are collecting anonymized datasets today, knowing they can’t fully de-anonymize them yet, but betting that future AI or computing power will make it possible? This so-called “store now, identify later” risk means anonymization isn’t just a snapshot judgment. It’s a bet on the future, and given how rapidly AI capabilities are advancing, it’s a bet with increasingly poor odds.
What does this mean for the DPDPA?
Legal frameworks around the world are slowly waking up to this reality, while India’s DPDPA has yet to develop jurisprudence on the question. The key question is shifting from “is this data anonymized?” to “how re-identifiable is this data, with what technology, by whom, and at what cost?” That is a much harder question, and it puts pressure on every organization that handles data to stop treating anonymization as a checkbox and start treating it as a continuous risk assessment.
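One way to turn “how re-identifiable is this data?” into a number that can be re-checked over time is k-anonymity: the size of the smallest group of records that share the same quasi-identifier values. The sketch below is a hedged illustration with made-up records, a hypothetical choice of quasi-identifiers, and a hypothetical generalization rule. k-anonymity is only one metric, and a high k does not rule out inference about the group a person falls into.

```python
from collections import Counter

# Hypothetical choice of quasi-identifiers for this toy table
QUASI_IDENTIFIERS = ("zip", "age")


def group_sizes(records):
    """How many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)


def k_anonymity(records):
    """Size of the smallest group: every record hides among at least k - 1 others."""
    return min(group_sizes(records).values())


def share_unique(records):
    """Fraction of records whose quasi-identifier combination appears only once."""
    sizes = group_sizes(records)
    singles = sum(1 for r in records if sizes[tuple(r[q] for q in QUASI_IDENTIFIERS)] == 1)
    return singles / len(records)


def generalize(record):
    """Hypothetical coarsening: keep three PIN/ZIP digits, bucket age by decade."""
    return {**record, "zip": record["zip"][:3] + "***", "age": record["age"] // 10 * 10}


records = [
    {"zip": "560034", "age": 34, "diagnosis": "diabetes"},
    {"zip": "560038", "age": 37, "diagnosis": "asthma"},
    {"zip": "560095", "age": 52, "diagnosis": "hypertension"},
    {"zip": "560071", "age": 58, "diagnosis": "arthritis"},
]

print(k_anonymity(records), share_unique(records))  # 1 1.0
coarse = [generalize(r) for r in records]
print(k_anonymity(coarse), share_unique(coarse))    # 2 0.0
```

Coarsening raises k from 1 to 2 in this toy table, but the score only holds for the quasi-identifiers you chose and the auxiliary data that exists today, which is why it has to be re-run as new datasets and capabilities appear.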
For companies operating under India’s DPDPA, or GDPR in Europe, or HIPAA in healthcare, the implication is clear: regulatory exemptions for “anonymized” data were written in a different technological era. The law may say the data is safe, but AI might disagree.
Privacy is not binary, and it never really was. It was never a wall you either had or didn’t have; it was always a question of how hard you were to find and how much finding you was worth to someone. What AI has changed is the effort required. In the past, finding you was expensive. Soon, it may cost nothing at all.