The future runs on fake data—and so do its jobs

Hiring and synthetic data trends
April 7, 2025
Jason Stevens

“Director of Synthetic Data” wasn’t a job five years ago. Now, Visa, Mastercard, and OpenAI are hiring for it. At OpenNova, we’re mapping these hiring shifts in real time as clients tell us, “We don’t always have enough data to train our next wave of Generative Artificial Intelligence (GenAI) models.”

Many predict we’ll run out of high-quality organic data (text, images, video) within the next few years. Even when we have the data available, there are increasing privacy concerns about how personal data gets used or protected inside GenAI systems.

AI has always learned from human-generated data—books, images, conversations, transactions. But now, it’s starting to learn from itself, creating a synthetic feedback loop where models generate the very data they train on. Inside OpenNova, this forces us to rethink AI talent² completely: We’re not hiring people to work with data anymore—we need people who can build the (synthetic) data.

We already see healthcare examples: University of Florida Health and NVIDIA trained SynGatorTron on over 2 million patient records to generate synthetic medical data, allowing AI to model rare diseases without violating privacy laws.

While Visa mentioned “synthetic” only once in its 2024 financial report, the reality looks much different. In fact, around five years ago, Visa began using GenAI to create synthetic fraudulent transactions: fake payment records designed to mimic real-world fraud patterns. The result? Authorization models can score transactions for risk even where examples of real fraud are scarce.
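To make the idea concrete, here is a toy sketch of what “synthetic fraudulent transactions” means in practice. Every field, pattern, and threshold below is our illustrative assumption, not Visa’s actual feature set or pipeline; real systems would use learned generative models rather than hand-written rules.

```python
import random

def synth_transaction(rng, fraudulent=False):
    """Generate one fake payment record. Fields and fraud patterns
    here are illustrative assumptions, not Visa's real schema."""
    if fraudulent:
        # Mimic a stereotypical fraud pattern: odd-hour, card-not-present,
        # high-amount purchase far from the cardholder's home region.
        return {
            "amount": round(rng.uniform(500, 5000), 2),
            "hour": rng.choice([1, 2, 3, 4]),
            "card_present": False,
            "foreign": True,
            "label": 1,
        }
    return {
        "amount": round(rng.uniform(5, 200), 2),
        "hour": rng.randint(8, 22),
        "card_present": rng.random() < 0.7,
        "foreign": rng.random() < 0.05,
        "label": 0,
    }

def synth_dataset(n, fraud_rate=0.1, seed=42):
    """Build a labeled dataset in which scarce real fraud is padded
    out with synthetic examples for training a risk-scoring model."""
    rng = random.Random(seed)
    return [synth_transaction(rng, rng.random() < fraud_rate) for _ in range(n)]
```

The point of the sketch: a risk model can be trained on thousands of labeled fraud examples even when only a handful of real ones exist.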

Director of Synthetic Data role

Visa appears to be forming an entire department reporting to the Director of Synthetic Data, Data & AI Platforms, to pioneer progress in a highly regulated arena. What intrigues our team is the weight of responsibility conferred on the role compared with traditional analytics directorships. Where a conventional analytics director studies data, this director creates it: crafting fake fraud with generative models such as GANs and VAEs, built on TensorFlow and PyTorch, with differential privacy protecting the underlying records. Candidates will need 10+ years in AI/ML and 5+ in synthetic data to assume the role.
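The differential-privacy half of that job description can be illustrated with the classic Laplace mechanism: release a statistic about real cardholders only after adding calibrated noise, so no individual record can be reverse-engineered. This is a generic textbook sketch, not Visa’s implementation.

```python
import math
import random

def dp_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy via the
    Laplace mechanism. A counting query has sensitivity 1, so the
    noise scale is 1/epsilon: smaller epsilon means more noise
    and therefore stronger privacy."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling from a Laplace(0, 1/epsilon) distribution.
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

Averaged over many releases the noisy counts stay honest, while any single release reveals little about any one transaction.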

While Visa’s director is figuring out how to stop fraud before it happens using AI-generated transactions, Applied Intuition’s Head of Synthetic Datasets is making sure self-driving cars can “see” the road—by training AI on fake but realistic driving scenarios before they ever hit the street.

The pressure on this role will be immense: the models the team develops train the perception systems of autonomous vehicles. If the synthetic data is wrong, cars crash. We can imagine synthetic environments that simulate edge cases, from a child running into traffic at dusk to an icy mountain road where human drivers fail.

Neither leadership position, at Visa or Applied Intuition, is for the faint-hearted. If an AI model fails at Visa, fraud spikes. At Applied Intuition, failure means self-driving cars misinterpret reality. The latter role requires an MSc or PhD in computer science or robotics: you must know machine learning inside out as you race to bring complex projects across the finish line.

“Through 2026, 60% of AI projects will be abandoned—simply because the data isn’t ready.”
—Gartner

We are invested in where companies like Applied Intuition are heading with synthetic data. In Florida, JM Family has been making AI investments through SKAVISION, focusing on AI-driven computer vision. The tech ingeniously turns security cameras into a brain for dealerships, flagging a vehicle that waits too long at an inspection bay: AI detects the slowdown and alerts staff before bottlenecks pile up.

In the future, we envision how synthetic data could help train AI models on rare or unseen scenarios, such as:

  • Unusual service lane layouts that don’t exist in their current dataset.
  • Edge cases, like sudden rush-hour surges or unexpected equipment failures.
  • Low-light or obstructed camera situations where actual footage is limited.
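For the low-light case in particular, the simplest flavor of synthetic data is augmentation: manufacturing dark, noisy frames from normally lit footage. The sketch below is our own crude illustration with made-up parameter values; production pipelines would use rendered or generative imagery rather than pixel arithmetic.

```python
import random

def simulate_low_light(frame, brightness=0.2, noise=8, seed=0):
    """Turn a normally lit grayscale frame (rows of 0-255 ints) into
    a synthetic low-light sample: scale brightness down, then add
    Gaussian sensor noise. Parameter values are illustrative."""
    rng = random.Random(seed)
    return [
        [max(0, min(255, int(px * brightness + rng.gauss(0, noise))))
         for px in row]
        for row in frame
    ]
```

A perception model trained on such frames sees "footage" of conditions the real cameras have barely recorded.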

Over time, we expect to converse with them and others about crucial synthetic engineering roles and the risks of this new technology. For instance, it’s becoming clear that synthetic data is prone to model collapse: if AI trains only on its own fake data, it’s like photocopying a photocopy; things get fuzzy, and decisions get sloppy.
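The photocopy analogy can be made concrete with a toy simulation of our own devising: fit a simple model to data, sample a fresh dataset from the model, refit, and repeat. Because this toy generator oversamples high-probability regions (it rejects draws beyond two standard deviations), the spread of the data decays generation after generation, which is the statistical signature of model collapse.

```python
import random
import statistics

def next_generation(samples, rng, n=500):
    """Fit a Gaussian to the previous generation, then sample the next
    generation from that fit -- i.e., train on the model's own output.
    Like many generators, this one favors high-probability regions:
    draws beyond two standard deviations are rejected, clipping tails."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if abs(x - mu) <= 2 * sigma:
            out.append(x)
    return out

def collapse_demo(generations=20, seed=7):
    """Track each generation's spread; it shrinks geometrically,
    like the detail lost when photocopying a photocopy."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(500)]
    spreads = [statistics.stdev(samples)]
    for _ in range(generations):
        samples = next_generation(samples, rng)
        spreads.append(statistics.stdev(samples))
    return spreads
```

After twenty self-training generations, the data's variance has all but vanished; the cure, as practitioners note, is to keep anchoring training sets in real-world data.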

Consequently, we are moving fast to nurture our US and nearshore passive talent² pools and gear up for the synthetic future. At OpenNova, we’re tracking the First 100 Synthetic Data Engineers—the pioneers shaping AI’s next evolution. These candidates must be up for the challenge of training AI without breaking its connection to reality.

Ask Lexset, the AI startup behind Seahaven, an on-demand synthetic data generator for 3D assets. It’s building a team of synthetic data engineers to tackle the sim-to-real transfer problem, ensuring AI models trained in synthetic environments can perform reliably in unpredictable real-world conditions.

These synthetic engineering hires need expertise in 3D modeling, procedural content generation, and AI simulation techniques to create synthetic datasets that capture real-world complexity. They must be skilled in Python, C++, Unreal Engine, NVIDIA Omniverse, and synthetic data validation frameworks, ensuring AI models generalize beyond perfect lab conditions.
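One standard tool for the sim-to-real problem is domain randomization: instead of one photorealistic scene, generate thousands of scenes with randomized lighting, weather, and layout so the model learns features that survive the jump to reality. The sketch below shows the idea; the parameter names and ranges are hypothetical, and in practice each configuration would drive a renderer such as Unreal Engine or NVIDIA Omniverse.

```python
import random

def randomized_scene(rng):
    """Sample one synthetic scene configuration with randomized
    rendering parameters. Ranges are illustrative assumptions."""
    return {
        "sun_elevation_deg": rng.uniform(-5, 60),  # dusk through midday
        "fog_density": rng.uniform(0.0, 0.3),
        "camera_height_m": rng.uniform(1.2, 1.8),
        "texture_seed": rng.randrange(10**6),      # randomized surfaces
        "pedestrian_count": rng.randint(0, 12),
    }

def scene_batch(n, seed=0):
    """Generate a batch of scene configs to feed a rendering pipeline."""
    rng = random.Random(seed)
    return [randomized_scene(rng) for _ in range(n)]
```

The design bet: a model that has seen the road under every fog density and sun angle the sampler can produce is less likely to be surprised by the one reality serves up.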

Without their work, AI risks learning from a world that doesn’t exist.

While a recent digital twin study found that hybrid datasets (synthetic + real data) outperformed purely real-world training, this tale has more twists. As a Director of Analytics or an engineer building the future of AI, your breakthroughs may challenge this assumption—pushing us toward a synthetic-only reality faster than we think.

Hiring managers should ask us, as elite talent² providers, whether we truly have the depth to source the Ph.D.-level robotics whiz or the GANs-and-PyTorch master.

And that’s a conversation we would love to have.