The news hit my inbox like a monsoon storm, a deluge of numbers and names that, for a moment, felt far removed from the dusty streets of Yangon. AfterQuery, a startup founded by two 23-year-olds, had just announced hitting $100 million in revenue, primarily from selling AI training data to giants like Anthropic and OpenAI. One hundred million dollars. For data. It is a staggering sum, a testament to the insatiable appetite of large language models for information, for the very essence of human communication packaged and processed.
Here, in Myanmar, the stakes are different. While Silicon Valley celebrates its latest crop of wunderkinds turning data into digital gold, we are often fighting for basic connectivity, for the right to speak freely online, for the very existence of our digital lives. The contrast is stark, almost painful. On one side, a thriving ecosystem where raw data is refined into multi-million dollar enterprises; on the other, a nation where internet shutdowns are a weapon, and access to information is a constant struggle. This isn't just about economic disparity; it's about the fundamental imbalance in who gets to shape the future of artificial intelligence, and whose voices are included, or excluded, from its foundational layers.
I have spent years reporting on how technology can be a lifeline here, how it connects us, informs us, and sometimes, even protects us. But this latest development from AfterQuery, while impressive in its own right, highlights a growing chasm. The data that fuels these powerful AI models, the narratives and experiences that teach them about the world, are overwhelmingly drawn from regions with robust infrastructure, stable governance, and open digital societies. What about the rest of us? What about the rich tapestry of languages, cultures, and histories that exist beyond the well-trodden digital paths?
"The current model of AI data acquisition is inherently extractive, not inclusive," says Dr. Myat Thiri, a computational linguist who runs a small, underfunded project archiving Burmese oral histories in Mandalay. "Companies like AfterQuery are brilliant at what they do, but they are operating within a system that prioritizes volume and accessibility over representation. Our languages, our unique cultural nuances, they are simply not part of the mainstream datasets, and if they are, they are often poorly labeled or misinterpreted. This leads to AI models that reflect a very narrow worldview." Her words echo a sentiment I hear often, a quiet frustration that our stories remain largely untold in the digital realm.
Consider the sheer volume of data AfterQuery must be collecting. Their success implies a sophisticated operation, likely leveraging advanced scraping techniques, human annotation, and quality control processes. They are providing the raw material, the digital clay, from which OpenAI's GPT models and Anthropic's Claude are molded. These models are then used globally, influencing everything from search results to creative writing, from medical diagnostics to educational tools. If the foundational data is biased, incomplete, or unrepresentative, then the AI built upon it will inevitably carry those same flaws.
"It is a classic case of 'garbage in, garbage out,' but on a global scale," explains U Hla Win, a veteran software engineer who now mentors young tech enthusiasts in a clandestine community center in Yangon. "If the data doesn't reflect the diversity of human experience, then the AI will only serve a fraction of humanity effectively. For us, it means AI tools that struggle with our language, misunderstand our cultural context, or simply ignore our specific needs. This is about survival, not convenience, when you think about how AI could be used for disaster relief, education, or even preserving our endangered languages." He emphasizes how crucial it is for AI to be robust and culturally aware, especially in crisis-prone regions.
One of AfterQuery's founders, Maya Sharma, was quoted in a recent TechCrunch article saying, "Our goal was to provide high-quality, diverse datasets to accelerate AI development. We saw a gap, and we filled it." While their entrepreneurial spirit is undeniable, the definition of "diverse" often remains skewed. Does it include data from regions where internet access is intermittent, where digital literacy is low, or where political instability makes data collection inherently difficult and dangerous? Often, the answer is no.
According to a report by the MIT Technology Review, less than 1% of the world's top AI datasets originate from low-income countries, despite these nations representing over 40% of the global population. This statistical imbalance is not just a footnote; it is a profound structural problem that perpetuates digital inequality. When AI models are trained predominantly on data from wealthier, more digitally advanced societies, they naturally perform better in those contexts and for those populations. For people in Myanmar, this means that the cutting-edge AI tools that promise to revolutionize industries and improve lives might remain largely inaccessible or ineffective.
Imagine a future where AI-powered medical diagnostics struggle to identify diseases prevalent in Southeast Asia because they were trained on predominantly Western patient data. Or an educational AI that cannot effectively teach Burmese children because it lacks understanding of their curriculum, their cultural learning styles, or even their spoken dialects. These are not hypothetical scenarios; they are the logical outcomes of a data ecosystem that overlooks vast swathes of humanity.
The success of AfterQuery and similar companies should serve as a wake-up call. While we celebrate innovation and entrepreneurial achievement, we must also critically examine the foundations upon which these empires are built. The immense wealth being generated from AI data needs to be reinvested, not just into more powerful models, but into initiatives that democratize data collection, empower local communities to contribute their own information, and ensure that the benefits of AI are truly global.
Projects like the Myanmar Digital Rights Forum, though small, are working tirelessly to advocate for digital inclusion and data sovereignty. They understand that controlling our data, and ensuring its ethical representation, is paramount. We need a global effort, perhaps even a new kind of data commons, where communities in places like Myanmar can contribute their unique linguistic and cultural datasets in a way that is fair and ethical, and that ensures their voices are heard by the machines increasingly shaping our world. The journey to a truly equitable AI begins not just with algorithms, but with the data that feeds them, and the people whose lives that data represents. Without that, the promise of AI will remain a luxury for the few, rather than a tool for liberation for all.