
How will AI develop at the edge?

More than a year after the launch of ChatGPT, artificial intelligence (AI) is the most influential technology trend and now sits on the management agenda of almost every company. Generative AI will shape the everyday work of many people and bring about drastic changes.

We will be using AI applications every day on our phones, laptops, tablets, smartwatches and other “smart” devices such as cameras. These devices sit at the edges of the Internet and of corporate networks. How will AI develop there in the coming years?

Where do AI models run?

AI developers are best placed to answer this question themselves. The data center market in particular is currently striving to accommodate the expected, but by no means guaranteed, surge in demand for generative AI applications.

Training of AI models currently takes place primarily in large data centers, which resemble the supercomputers of the last decade more than they do corporate networks. These facilities require enormous amounts of power, cooling and tens of thousands of GPUs.

But what happens afterwards? Where does the model run once training is complete? Over its lifetime, each AI model will require more computing power for use than for training, because it is trained only once, over a relatively short period, and then used by millions of people for years. That usage typically happens in distributed fashion, on the simpler GPUs or CPUs of everyday users.
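A rough back-of-the-envelope calculation makes this concrete. All of the numbers below are illustrative assumptions, not figures from Cloudflare:

```ts
// Illustrative assumptions only; none of these numbers are real measurements.
const trainingGpus = 10_000;      // GPUs dedicated to a single training run
const trainingDays = 90;          // training takes roughly three months
const trainingGpuHours = trainingGpus * trainingDays * 24; // 21.6M GPU-hours

const dailyUsers = 10_000_000;    // users once the model is deployed
const queriesPerUserPerDay = 10;
const gpuSecondsPerQuery = 2;     // inference cost of one request
const lifetimeYears = 3;          // years the model stays in service

const inferenceGpuHours =
  (dailyUsers * queriesPerUserPerDay * gpuSecondsPerQuery * 365 * lifetimeYears) / 3600;
// ≈ 60.8M GPU-hours: under these assumptions, lifetime inference needs roughly
// three times the compute of training, and the gap grows with users and years.
console.log({ trainingGpuHours, inferenceGpuHours });
```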

AI training can therefore be carried out centrally, since building a model takes several months and latency plays a minor role there. Once the model and the AI application are out in the real world and used by more and more people, however, the time needed to load and respond can become mission-critical.

To anticipate this development in time, it is worth thinking about how the technology will affect edge infrastructure. Cloudflare already operates Nvidia GPUs in over 100 cities and offers Workers AI, our generative AI service. This year we will continue the rollout across our global network, which comprises data centers in more than 300 cities.

AI inference

Some experts expect that AI inference can be delegated to the mobile devices of the end users working with AI applications. The current generation of phones certainly packs plenty of processing power: the A17 chip in the iPhone 15, with its six GPU cores, can process 4K video at up to 60 fps. For demanding AI applications, however, that is still not enough.

There are certainly AI applications that can run on mobile devices, but their scope remains limited. These devices have far less processor, memory and battery capacity than the data centers and servers of corporate networks.

People like to talk about what their latest phones can do with their powerful GPUs, but compared with an Nvidia GPU running in a server, there are still orders of magnitude of difference in processing power. Some applications can certainly run on phones and tablets, such as voice recognition for AI systems like Google Assistant. Given the limited hardware of these devices, however, we expect large and advanced AI models to be better suited to processing at the edge of the network than on the device itself.
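To put “orders of magnitude” into perspective, here is a hedged comparison; both throughput figures are rough ballpark assumptions for illustration, not vendor specifications:

```ts
// Ballpark figures, assumed for illustration; real specs vary by precision and workload.
const phoneGpuTflops = 2;     // rough throughput of a recent phone GPU
const serverGpuTflops = 300;  // rough low-precision throughput of a data center GPU

const ratio = serverGpuTflops / phoneGpuTflops; // = 150, about two orders of magnitude
console.log(`A single server GPU offers ~${ratio}x the raw throughput of a phone GPU.`);
// And an edge location can combine many such GPUs, widening the gap further.
```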

Meta’s Llama 2 is over 100 gigabytes in size, so it is far too heavy for our mobile devices. If we host such a model at the edge and run the AI inference on GPUs there, the bandwidth and performance limitations largely disappear. Combining such an edge-hosted application with the capabilities of a mobile device creates a powerful combination.
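As a sketch of that combination, a phone could keep lightweight pre-processing local and hand the heavy model call to a nearby edge endpoint. The endpoint URL, model name and payload shape below are hypothetical, not an actual Cloudflare API:

```ts
// Hypothetical mobile-side sketch; endpoint, model name and payload are assumptions.
async function askEdgeModel(question: string): Promise<string> {
  // Cheap work stays on the device: trimming, truncating, basic validation.
  const prompt = question.trim().slice(0, 2000);
  if (prompt.length === 0) throw new Error("Empty question");

  // The ~100 GB model itself runs on GPUs at the nearest edge location.
  const res = await fetch("https://ai.example.com/v1/infer", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama-2-70b", prompt }),
  });
  if (!res.ok) throw new Error(`Inference failed: ${res.status}`);
  const { answer } = (await res.json()) as { answer: string };
  return answer;
}
```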

Where latency matters

Our business model is based on data centers close to users: a few of them play an essential role, but most are small and ubiquitous. They run at cloud providers and telecommunications companies or in regional data centers. Every geographical region is different and every country has its own challenges. The result is a large, global infrastructure focused on reducing latency.

We are within 50 milliseconds of 95 percent of the world’s population. What can you do with that? Providing security and distributing content both add value at that distance. AI inference is a logical complement, because you really have to think about how latency affects performance and what can be done to speed up (AI) applications.

We will continue to roll out this plan in the near future. Because generative AI is changing so quickly, the application scenarios are also evolving. Some workloads, such as video and image generation, take time anyway, so reducing latency there has limited impact.

Because generative AI changes so quickly, its use cases are constantly evolving.

Tony van den Berge

Users are regularly frustrated by the speed of ChatGPT responses, but that likely has more to do with how fast the model itself can generate output (and with GPU capacity constraints) than with users’ physical proximity. Even so, this kind of application can also benefit from decentralized processing at the edge.

Latency is becoming increasingly important for the new generation of AI applications. With a virtual assistant such as Siri, users want the interaction to feel like a real conversation, and that requires a sophisticated combination of mobile, cloud and edge capabilities. In the longer term, self-driving cars, for example, can also benefit from generative AI to better perceive their environment.

Today’s autonomous vehicles are already adept at image recognition, but a large language model could help interpret those images. The car may detect a man or a child at the side of the road; an LLM is likely better able to understand that a child might suddenly cross into oncoming traffic. Because of the extremely low latency required, autonomous cars will probably always rely on onboard computing power for their real-time decisions and reactions.

Another generative AI application for the edge is compliance. In some regions, the use of data is heavily regulated, and the mainstream adoption of generative AI could lead to much greater government control. Countries may require different versions of models that reflect their own views on freedom of information, privacy, copyright and labor protection.

Workers AI

Cloudflare’s Workers AI already offers many AI capabilities, but it also has some limitations. It currently supports Meta’s Llama 2 7B and M2M100 1.2B, OpenAI’s Whisper, Hugging Face’s distilbert-sst-2-int8, Microsoft’s ResNet-50 and BAAI’s bge-base-en-v1.5. In the near future, however, we will add more and more models via Hugging Face.
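As a minimal sketch of calling one of these models from a Worker, assuming a Workers AI binding named AI is configured in wrangler.toml:

```ts
// Minimal Worker sketch; assumes a Workers AI binding named AI in wrangler.toml.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = (await request.json()) as { prompt: string };

    // The model runs on GPUs in the Cloudflare location closest to the user.
    const result = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
      messages: [{ role: "user", content: prompt }],
    });
    return Response.json(result);
  },
};
```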

Our initial focus was on supporting the most common use cases. At the same time, we need solutions to manage the costs of hosting proprietary models and running them in our cloud. Caching is an important consideration: in how many locations should the same model run, and how quickly should it be available in each of them?
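One way to think about that trade-off, sketched below, is to reuse identical inference answers within a single location via the Workers Cache API. The cache-key scheme and TTL are illustrative choices of ours, not a description of how Workers AI actually distributes models:

```ts
// Sketch: reuse identical inference answers within one edge location.
// The cache-key host and TTL are illustrative, not product behavior.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

async function cachedInference(prompt: string, env: Env): Promise<Response> {
  // The Cache API stores GET requests, so derive a synthetic GET key from the prompt.
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(prompt),
  );
  const hex = [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  const cacheKey = new Request(`https://ai-cache.internal/${hex}`); // hypothetical host

  const cache = caches.default; // Workers-specific cache, local to this data center
  const hit = await cache.match(cacheKey);
  if (hit) return hit; // answered without touching a GPU

  const result = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
    messages: [{ role: "user", content: prompt }],
  });
  const response = Response.json(result, {
    headers: { "Cache-Control": "max-age=300" }, // keep answers for five minutes
  });
  await cache.put(cacheKey, response.clone());
  return response;
}
```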

There are also customers who want specific applications, and we are working out how best to address those. The first version of Workers AI is intended to show a broad audience what is possible with business AI applications. Since its launch, our team has continued working on the next versions.

Sufficient demand for AI

We already see many companies taking their first steps toward generative AI. Some have built their own chatbots on Cloudflare’s edge. Our goal is to make it as easy as possible for developers to test and deploy applications. AI is still in its infancy, so many applications are still in beta.

In the near future, AI applications will keep improving and turn into tools that people work with every day, with strict SLAs for uptime and performance. It is important that people tell us what they want to achieve; that feedback will drive a transformation comparable to the major technological changes of the past.

The rise and development of AI reminds me of earlier turning points in the IT world, such as the rise of the Internet some 40 years ago, followed by mobile devices and the cloud. Now that AI is here, this technology looks set to be even bigger and more impactful than all the others combined: it takes the advantages of those earlier waves and builds on them to bring this new technology to market.

Evolving network

It is still difficult to predict how deep and far-reaching the AI transformation will be. There is also a risk that the bubble will burst, in which case we will have to make every effort to reuse the GPUs for other applications, for example, making our own network even smarter with AI.

Then again, there is of course a good chance that this technology will live up to its hype: that soon every company will be working with its own models (or its own version of a model) and that everyone will regularly speak to a virtual assistant via voice or even video.

The rapid growth of AI applications may require a change in the scale at which we operate in the short term. Much more capacity would then be needed in the small and telecom-focused data centers than previously requested, which in turn calls for more large-scale data center and edge deployments.

Within Cloudflare there are different versions of what we call the edge. There is, for example, an edge located in a patch closet in a partner’s data center, as opposed to our own larger infrastructure in populous cities such as Amsterdam and New York.

Cloudflare’s network continues to evolve and change over time; it is like a living thing. At the same time, we are investing in people who truly understand the hyperscale market, so that our teams grow in their ability to drive innovation in this context. The bottom line is that we have already laid a solid foundation for all the AI applications that will come our way.

This is a post from Tony van den Berge, VP European Regional Markets at Cloudflare.

Source: IT Daily
