Turning Asean’s AI data challenges into opportunities

THE Southeast Asia (SEA) region is quickly emerging as a global powerhouse in AI.

According to recent forecasts from IDC, AI and generative AI spending across Asia-Pacific (APAC), including SEA, is expected to reach an astounding $110 billion by 2028, growing at a compound annual rate of 24 percent.

This growth positions APAC as a central player in driving advancements in AI and related technologies. Yet, as AI models evolve, so does the demand for immense volumes of data — a need that comes with significant challenges around data quality, privacy and the risk of overtraining.

COMPLEX JOURNEY. As AI continues to evolve, organizations need practical strategies to maintain momentum without overextending resources. CONTRIBUTED PHOTO from HITACHI VANTARA

Data deluge and quality control

To understand the scale, consider this: ChatGPT was trained on 300 billion words. For context, reading a novel every day for 80 years would cover less than 1 percent of that. Even that number pales in comparison to models like Databricks' DBRX, which was trained on 12 trillion tokens, and that is before GPT-4 even enters the picture.

Experts warn that the demand for training data could outpace the world's supply of public human text data as early as 2026.

For SEA, where language diversity, cultural nuance and varied data standards add to the complexity, collecting high-quality, accurate data becomes even tougher. The region's linguistic landscape alone poses a unique challenge: more than 1,200 languages are spoken across SEA, with Indonesia accounting for over 700 and the Philippines for around 175 recognized languages.

This vast linguistic diversity complicates standardized data collection and processing, as Natural Language Processing (NLP) systems often struggle to accommodate the wide range of local dialects and languages. If AI relies on low-quality or biased data, it risks producing unreliable or skewed results. And there’s more at stake: data privacy is a crucial issue. Without careful regulation, the push to gather more data could undermine public trust.

Overtraining woes

Overtraining presents a distinct challenge. AI models that are too finely tuned to their training data often fail when applied to new information, resulting in limited adaptability and accuracy. The problem intensifies when models are trained on data that is itself AI-generated, which can create a feedback loop that amplifies existing biases.

However, synthetic data, or data generated artificially, has its uses. It’s especially valuable in fields like autonomous vehicles or life sciences, where real-world data can be scarce. By generating simulated scenarios, AI models can learn and adapt, even when real data is hard to come by.
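To make "simulated scenarios" concrete, here is a deliberately simple, hypothetical Python sketch of synthetic data for an autonomous-driving use case: braking distances generated from a basic physics formula plus noise. The variable names, ranges and formula are illustrative assumptions, not any vendor's pipeline; production-scale synthetic data usually comes from GPU-heavy simulators or generative models, which is where the costs discussed next arise.

```python
# Toy synthetic-data example (illustrative only): simulate braking-distance
# scenarios when real-world driving logs are scarce.
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1_000

speed_kmh = rng.uniform(20, 120, n)        # simulated vehicle speeds
wet_road = rng.random(n) < 0.3             # 30% of scenarios are wet
friction = np.where(wet_road, 0.4, 0.7)    # rough friction coefficients

# Physics-based braking distance d = v^2 / (2 * mu * g), plus sensor-style noise.
speed_ms = speed_kmh / 3.6
braking_m = speed_ms**2 / (2 * 9.81 * friction) + rng.normal(0, 1.0, n)

# These generated rows could augment a scarce real dataset for a driving model.
print(braking_m[:5].round(1))
```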

But this method isn't without costs. Creating synthetic data requires intensive Graphics Processing Unit (GPU) computation, which translates into high energy demands and operational expenses. In this light, the AI industry's current pace begins to look less like a sprint and more like a marathon.

Balancing act for long-term success

That’s a lot to unpack, both in terms of data and the potential consequences. While AI holds transformative promise, it’s also generating a kind of “AI fatigue.” Nearly 90 percent of AI proof-of-concept projects won’t progress to production soon — a reality check for many who expected rapid returns.

For organizations hoping to succeed, it’s about managing expectations and adopting a sustained approach. While only a few current AI initiatives may deliver substantial breakthroughs, those few have the potential to reshape entire sectors. As Everest Group recently pointed out, the AI journey may be challenging, but its long-term rewards make the effort worthwhile.

As AI continues to evolve, organizations need practical strategies to maintain momentum without overextending resources. Here’s how to maximize efficiency and prepare for the road ahead.

Embrace small language models

Large language models (LLMs) may get most of the attention, but smaller language models (SLMs) have unique benefits. SLMs are derived from LLMs but refined to focus on specific tasks, allowing organizations to tailor models to particular needs.

Imagine designing a system to monitor train operations. An LLM, packed with general information, might be too broad to be effective. An SLM, however, can be trained specifically on operational guidelines and technical details, making it a better fit for that job.

For broader inquiries, such as teaching a child about butterflies, an LLM’s expansive knowledge might serve better. While SLMs offer efficiency and cost savings, they’re most effective with highly concentrated data — an important consideration as companies scale their AI applications.
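For readers who want a feel for how such an SLM comes about, below is a minimal sketch, using the Hugging Face transformers library, of fine-tuning a small open base model on a narrow corpus such as rail-operations manuals. The model name, file name and training settings are illustrative assumptions, not a description of any specific product.

```python
# Minimal sketch: adapt a small pretrained language model to a narrow domain.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

MODEL_NAME = "distilgpt2"                     # small, general-purpose base model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token     # GPT-2 family has no pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Hypothetical corpus of rail-operations manuals, one passage per line.
data = load_dataset("text", data_files={"train": "train_ops_manuals.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-train-ops",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False gives a causal-LM objective; the collator also builds labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the result is a narrow "SLM" tuned to operational text
```

The point is the shape of the workflow, a compact base model plus a concentrated, domain-specific dataset, rather than the particular choices shown.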

Upgrade data infrastructure

GPUs are essential for powering today’s AI models, but their high energy demands can conflict with sustainability goals. By enhancing the infrastructure surrounding GPUs, companies can improve performance while reducing environmental impact. Consider these steps for a more balanced infrastructure:

– Utilize tools for data cleansing and labeling (a simple example follows at the end of this section).

– Partner with suppliers committed to sustainability.

– Choose storage options with Energy Star certification.

– Collaborate with eco-conscious partners to optimize performance.

Strategically investing in sustainable infrastructure supports AI’s growth without excessive power consumption.
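As a concrete, if deliberately simple, illustration of the data cleansing step listed above, the sketch below trims, deduplicates and filters a hypothetical CSV of training text with pandas. The file and column names are assumptions for the example; real pipelines layer language detection, deduplication at scale and labeling workflows on top of this.

```python
# Basic data cleansing sketch for a hypothetical training-text CSV.
import pandas as pd

df = pd.read_csv("raw_training_text.csv")     # e.g. columns: id, text, label

df["text"] = (df["text"]
              .astype(str)
              .str.strip()                                 # trim stray whitespace
              .str.replace(r"\s+", " ", regex=True))       # collapse runs of spaces

df = df.dropna(subset=["text", "label"])      # drop incomplete rows
df = df[df["text"].str.len() > 0]             # drop empty strings
df = df.drop_duplicates(subset=["text"])      # remove exact duplicates

df.to_csv("cleaned_training_text.csv", index=False)
```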

AI as a collaborative process

The road to effective AI solutions isn’t straightforward, and achieving success requires input from all areas of an organization. A team approach helps reduce bias and leads to more practical outcomes. Whether you’re using a model like Llama3 or a customized SLM, the key is to match the tool to your goals.

Identify your organization’s specific objectives, clarify your desired outcomes, and then plan a structured approach to reach them.

AI’s long road ahead

AI development is more of a marathon than a sprint, with challenges like data limitations and model overtraining to overcome. But AI capabilities are growing: techniques like retrieval-augmented generation (RAG) are becoming standard, and more efficient methods for SLM creation are on the horizon.
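For those unfamiliar with RAG, the sketch below shows its retrieval half in the simplest possible form: score a handful of documents against a question and prepend the best matches to the prompt the language model will answer from. The documents, the query and the TF-IDF retriever are illustrative assumptions; production systems typically use vector embeddings, a dedicated vector store and a hosted or local model for the final generation step.

```python
# Minimal retrieval step of a RAG pipeline (illustrative documents and query).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Track inspections must be logged within 24 hours of completion.",
    "Signal faults are escalated to the regional operations centre.",
    "Rolling stock maintenance follows a 90-day service cycle.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF cosine)."""
    q_vec = vectorizer.transform([query])
    scores = cosine_similarity(q_vec, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

query = "How quickly must a track inspection be logged?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # in a full RAG pipeline this prompt goes to the language model
```

Grounding answers in retrieved text is what lets RAG ease the pressure to cram every fact into a model's training data.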

Modernizing data infrastructure is also helping make AI systems more scalable and eco-friendly.

The journey may be complex, but in SEA, where innovation is strong and AI investment is rising, developments in scale, simplicity and sustainability are bringing AI closer to a balanced and impactful future.

Joe Ong is the vice president and general manager for Asean at Hitachi Vantara, a global provider of data storage and infrastructure solutions, including a variety of services on data management and AI-powered hybrid cloud solutions for enterprises.
