IBM extends serverless computing to GPU workloads for enterprise artificial intelligence and simulation

Running simulation and high-performance workloads effectively is an ongoing challenge that requires input from several stakeholders, including infrastructure teams, cybersecurity professionals and, of course, the ever-watchful finance officer.

Running thousands of these compute-intensive jobs usually involves many concurrent processes and is expensive on traditional infrastructure. The latest update to IBM Cloud Code Engine, GPU-enabled serverless fleets, may reduce that complexity. Fleets combine high-performance computing power with a managed, pay-as-you-go serverless model: the user submits work through a single endpoint, and the large-scale deployment behind it is handled automatically.

High-performance computing without infrastructure friction

Enterprises running large-scale AI training, risk simulation, or generative workloads commonly face two issues: limited GPU access and rising infrastructure and cloud costs. Serverless fleets offer an alternative. Instead of maintaining dedicated GPU clusters, organizations can dispatch large batches of compute jobs through a single endpoint.

The IBM system provisions GPU-backed virtual machines, executes the workload, and releases the resources after completion. IBM says this approach improves utilization and cost visibility, with customers paying only for the active run.

In practice, this could help financial institutions, for example, run risk models faster, or allow media companies to run their workloads without investing in GPU farms or entering into long-term leases. For many organizations, this could mean faster innovation and lower operating costs.

Implementation realities

IBM suggests that serverless fleets can manage workloads at scale with “essentially zero SRE staff.” That claim is ambitious, but the model does simplify orchestration. Code Engine can determine the number of worker instances needed and scale them to match the work required, which reduces the tuning usually needed to balance parallel GPU tasks.
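To make the sizing idea concrete, here is a minimal sketch of how a fleet size could be derived from a batch of jobs and a deadline. It is an illustration only, not IBM's algorithm: the function name `estimate_workers`, its parameters, and the one-task-at-a-time assumption per worker are all hypothetical.

```python
import math


def estimate_workers(num_tasks, seconds_per_task, deadline_seconds, max_workers):
    """Hypothetical sizing rule: smallest worker count that meets the deadline.

    Assumes every task takes the same time and each worker runs one task
    at a time. Real schedulers account for far more (startup time, retries,
    uneven task durations), so treat this as a rough first estimate.
    """
    if num_tasks <= 0:
        return 0
    if seconds_per_task <= 0 or deadline_seconds <= 0 or max_workers <= 0:
        raise ValueError("durations and max_workers must be positive")

    # How many tasks one worker can finish, back to back, before the deadline.
    tasks_per_worker = int(deadline_seconds // seconds_per_task)
    if tasks_per_worker < 1:
        raise ValueError("a single task does not fit inside the deadline")

    needed = math.ceil(num_tasks / tasks_per_worker)
    # Cap at a quota so a large batch cannot request unlimited instances.
    return min(needed, max_workers)


if __name__ == "__main__":
    # 10,000 tasks of 90 seconds each, to be finished within one hour.
    print(estimate_workers(10_000, 90, 3_600, max_workers=500))
```

With a cap lower than the computed need, the function returns the cap, which means the batch would miss the deadline. That trade-off between quota and completion time is the kind of tuning a managed service takes over.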

However, adopting the platform would require careful oversight, with a strong focus on cost control, a pervasive challenge in serverless environments. Enterprises will need a clear view of their current workload patterns, and of any compliance issues, before effectively outsourcing GPU-intensive workloads to a managed cloud.

Market and ecosystem context

IBM joins other hyperscalers in adapting serverless platforms for high-performance computing. AWS supports GPU-enabled containers through Fargate with ECS or EKS, and Microsoft Azure offers GPU-enabled containers in Azure Container Apps. IBM Cloud Code Engine is different, the company says, in supporting web applications, event-driven functions, and GPU-intensive batch jobs, all managed from a single environment.

What it means for you

For CIOs and cloud directors, IBM’s serverless fleets represent a step toward the cloud’s promised elasticity and its ability to handle high-performance computing. The model could at least lower the barriers to entry for GPU-intensive workloads, especially for teams without readily available DevOps resources. However, leaders might consider some or all of the following before adopting:

  • How do the costs of on-demand use compare with reserved-capacity models?
  • Are data management and security critical issues?
  • Are there cost-tracking methods that can keep track of managed jobs?
  • Can the scalability and predictability of representative jobs be tested in a pilot?
  • How does IBM’s offering compare on capability and price with similar solutions from other hyperscalers?
  • Are the workloads suitable for running in-house, and what would the long-term OPEX of that choice be?
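The first question above, on-demand versus reserved capacity, comes down to a break-even calculation. The sketch below shows the arithmetic under simple assumptions: a flat hourly on-demand rate and a fixed monthly reserved cost. The function names and all prices are hypothetical placeholders, not IBM's or any other provider's pricing.

```python
def break_even_hours(reserved_monthly_cost, on_demand_rate):
    """GPU hours per month at which both pricing models cost the same."""
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return reserved_monthly_cost / on_demand_rate


def cheaper_option(gpu_hours, on_demand_rate, reserved_monthly_cost):
    """Return which model costs less for the expected monthly usage."""
    on_demand_cost = gpu_hours * on_demand_rate
    if on_demand_cost < reserved_monthly_cost:
        return "on-demand"
    if on_demand_cost > reserved_monthly_cost:
        return "reserved"
    return "equal"


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    rate = 3.0         # cost per GPU hour, on demand
    reserved = 1500.0  # fixed monthly cost of reserved capacity
    print(break_even_hours(reserved, rate))     # hours where costs match
    print(cheaper_option(200, rate, reserved))  # light, bursty usage
    print(cheaper_option(600, rate, reserved))  # heavy, steady usage
```

Bursty workloads that stay below the break-even point tend to favor pay-per-use, while steady, heavy usage tends to favor reserved capacity. A short pilot can supply the usage figure to plug in.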

Serverless GPU computing is still evolving, but IBM’s approach offers enterprises another way to explore large-scale AI and simulation workloads without unnecessary infrastructure costs.

(Image source: “Buddha Said He Wanted to Talk to Me” by Trey Ratcliff is licensed under CC BY-NC-SA 2.0.)


CloudTech News is powered by TechForge Media.
