Who gets to shape the voice of AI?
- Pamela Minnoch

- Sep 16
- 2 min read
If you ask most people where AI gets its "facts", they'll guess Wikipedia, maybe news sites, or the open web at large. And that's true, to a point. But I was surprised to learn recently that in a Semrush analysis of 150,000 AI citations, the single biggest source wasn't a polished encyclopedia or a respected newsroom. It was Reddit, and by a mile.

For marketers, SEOs, and business owners, this news means something practical: if you want your business to show up in AI Overviews, ChatGPT responses, or Perplexity answers, having a footprint on Reddit dramatically increases your odds.
But Reddit isn't just another platform. It's a culture. It's messy, hilarious, creative, and sometimes toxic. It's where the internet workshopped its memes, refined its slang, and tested its in-jokes. Wikipedia might supply facts, but Reddit supplies voice. And for AI models designed to converse like humans, that voice is gold. That's why training datasets like The Pile drew heavily on Reddit-sourced content, and why in 2024 OpenAI and Google both signed licensing deals with Reddit. They weren't chasing scale, they were chasing quality of discourse.
The problem is that the quality of that discourse comes with highs and lows.
Depending on which survey you trust, around two-thirds of Reddit users are men. That gender skew shows up in the conversations, and not always in ways that feel welcoming. Many women, queer people, and minorities report that while Reddit has brilliant sub-communities, its wider general spaces can feel dismissive or even hostile.
So we find ourselves at a crossroads. AI systems that will increasingly mediate how we learn, shop, and interact are being trained on voices drawn disproportionately from spaces that don't reflect everyone equally. These models aren't just learning facts, they're learning tone, cultural cues, and conversational styles.
That raises uncomfortable but necessary questions:
If one of the most influential inputs into AI systems is a platform skewed by gender and culture, what biases are being amplified?
How do we democratise the sources that shape AI's voice so it reflects the breadth of human experience, not just the loudest or most dominant groups?
What responsibility falls on tech companies, governments, and communities to broaden the data that models learn from?
I'm not dismissing Reddit. Its communities are ingenious, funny, and often generous. But if we're serious about building AI that reflects society rather than a subset of it, we can't ignore the imbalance in the raw material being used.
We have to remember that the future voice of AI isn't abstract. It's being built right now from the conversations we're having, and the ones we're not.
If Reddit is shaping the way AI talks, how do we make sure the future voice of AI reflects all of us, not just the loudest corners of the internet?


