As generative artificial intelligence (GenAI) continues to capture the public’s attention, one thing is more apparent than ever: the pace of innovation shows no sign of slowing. That innovation is visible not only in the revolutionary improvements to the GenAI models themselves but also in the supporting technologies that make them relevant to the enterprise. Retrieval augmented generation (RAG), for example, has overtaken fine-tuning as the preferred approach to infusing large language models (LLMs) with an organization’s data.

With GenAI changing so rapidly, many organizations feel they must place a risky bet on a single AI platform vendor, such as a hyperscaler, and go all in on one primary cloud to navigate GenAI’s unfamiliar waters. Many of these enterprises are instead looking for a consistent, flexible underlying AI foundation for both GenAI and predictive AI: one that provides the core capabilities to build and augment models, serve them in AI-enabled applications and manage and monitor models over their lifecycle. With this approach, enterprises can reduce the risk of vendor lock-in by adopting an AI platform that is flexible enough to run on-premises, on different cloud platforms or at the edge, which lets organizations pivot and adapt as GenAI evolves.

Even before we released Red Hat OpenShift AI as a fully managed cloud service, our early beta customers expressed strong interest in an on-premises version of the offering. Today, more than 80% of Red Hat OpenShift AI customers adopt the self-managed version for on-premises use. As an add-on to Red Hat OpenShift, the leading application platform that runs on-premises, on all major public clouds and even at the edge, Red Hat OpenShift AI inherits many of the underlying capabilities of Red Hat OpenShift. Treating AI as an extension of the application environment improves the efficiency of developers and data scientists alike.

Red Hat OpenShift AI functional summary

Let’s summarize some of the capabilities of Red Hat OpenShift AI as a single platform for both GenAI and predictive AI.

Model training - projects

Red Hat OpenShift AI provides several workbench images and the ability to add custom images through an admin user interface. The project user interface (UI) lets users organize model development files, data connections and other artifacts needed for a given project. Model development files can be created from out-of-the-box or custom workbench images that provide access to popular libraries, packages and tools, including Jupyter notebooks, PyTorch and RStudio. Projects can be shared with specific permissions to enable collaboration with colleagues. Projects also allow users to configure cluster storage for saving project data and provide access to capabilities including pipelines, model serving and monitoring for data scientists, developers and other users who contribute to the AI lifecycle.
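For example, a workbench notebook can read training data through a project data connection. The sketch below assumes an S3-compatible data connection that injects the standard AWS_* environment variables into the workbench; the bucket and object names are hypothetical.

```python
# Minimal sketch: reading training data from an S3-compatible data connection
# inside an OpenShift AI workbench. Assumes the data connection injects the
# standard AWS_* environment variables; bucket and key names are hypothetical.
import os

import boto3
import pandas as pd

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],        # injected by the data connection
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

# Download a (hypothetical) training dataset and load it for exploration.
s3.download_file(os.environ["AWS_S3_BUCKET"], "datasets/train.csv", "train.csv")
df = pd.read_csv("train.csv")
print(df.head())
```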

Model training - distributed workloads

Distributed model training is a method that leverages multiple cluster nodes simultaneously for faster, more efficient model training. This approach can be used for both predictive AI training and GenAI training and tuning use cases, enabling tasks that might otherwise be computationally infeasible.

The distributed workloads stack built into Red Hat OpenShift AI includes technologies for training, validation, tuning and inference. CodeFlare provides a user-friendly framework for the training stack that simplifies job orchestration and monitoring, and it integrates with technologies like Ray for distributed workloads and Kueue for job scheduling and queuing.

The distributed workloads feature integrates with advanced accelerator support to optimize the utilization of cluster nodes. Jobs can be prioritized and distributed in both interactive and batch modes. Distributed workloads can also be launched from within data science pipelines to take advantage of the increased computing capacity.
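As a rough illustration, the CodeFlare SDK can be used from a workbench to request a Ray cluster for a distributed training job. The cluster name, project namespace and resource sizes below are hypothetical, and the exact ClusterConfiguration argument names vary across SDK versions.

```python
# Minimal sketch: requesting a Ray cluster through the CodeFlare SDK from a
# workbench notebook. Names and sizes are illustrative; argument spellings
# differ between SDK versions.
from codeflare_sdk import Cluster, ClusterConfiguration

cluster = Cluster(ClusterConfiguration(
    name="fraud-training",                 # hypothetical cluster name
    namespace="my-data-science-project",   # hypothetical project namespace
    num_workers=2,
    worker_cpu_requests=4,
    worker_memory_requests=8,              # GiB
    worker_extended_resource_requests={"nvidia.com/gpu": 1},
))

cluster.up()          # Kueue admits the cluster when resources are available
cluster.wait_ready()  # block until the Ray cluster is running
cluster.details()     # print cluster status and dashboard URI

# ... submit Ray training jobs against the cluster ...

cluster.down()        # free the resources when training is finished
```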

GPUs and accelerators

One of the most valuable capabilities of Red Hat OpenShift AI is self-service access to GPUs. IT operations teams can pre-define GPU resource configurations, both on-premises and in the cloud, that data scientists and application developers can then select for their project tasks. The product supports a range of accelerators, including NVIDIA GPUs, Intel Habana Gaudi devices and AMD GPUs. The accelerator profiles feature enables administrators to configure the accelerator types that are most appropriate for a given workload. Users can select accelerators in Red Hat OpenShift AI from both the model development and model serving user interfaces.
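Once an accelerator has been selected for a workbench, a quick check inside a notebook confirms that the framework can see the device. A minimal PyTorch example:

```python
# Minimal sketch: confirming that the accelerator selected for a workbench is
# visible to PyTorch before starting a training run.
import torch

if torch.cuda.is_available():
    print(f"{torch.cuda.device_count()} GPU(s) available: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected; falling back to CPU.")
```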

Data science pipelines

The data science pipelines component, a pipeline tool tailored to machine learning training and based on Kubeflow Pipelines, allows data scientists to automate the steps to deliver and test models in development and production. A sample pipeline might gather data, process it, train a model, download the existing model, compare it with the new model and push the new model into DevTest if it performs better. Pipelines can be versioned, tracked and managed like other AI project artifacts. In addition, a visual editor provides a drag-and-drop interface for creating and automating these pipelines. Data science pipelines can also run distributed workloads.
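Pipelines can also be defined in code with the Kubeflow Pipelines (kfp) SDK and imported into a project’s pipeline server. The sketch below is a minimal two-step example; the component logic, names and dataset location are purely illustrative.

```python
# Minimal sketch: a two-step pipeline defined with the Kubeflow Pipelines (kfp)
# SDK. Component logic, names and the dataset location are hypothetical.
from kfp import dsl

@dsl.component(base_image="python:3.11")
def prepare_data() -> str:
    # Placeholder for data gathering and processing logic.
    return "s3://example-bucket/processed/train.csv"   # hypothetical location

@dsl.component(base_image="python:3.11")
def train_model(dataset_uri: str) -> float:
    # Placeholder for model training; returns a hypothetical accuracy score.
    print(f"Training on {dataset_uri}")
    return 0.92

@dsl.pipeline(name="example-training-pipeline")
def training_pipeline():
    data_task = prepare_data()
    train_model(dataset_uri=data_task.output)

if __name__ == "__main__":
    from kfp import compiler
    # Compile to an IR YAML file that can be imported into the pipeline server.
    compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```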

Model serving

The model serving UI is integrated directly into the Red Hat OpenShift AI dashboard and projects UI to serve models from providers and frameworks like Hugging Face, ONNX, PyTorch, TensorFlow and others. Users can select a model serving platform based on KServe or ModelMesh, choose from the model servers and runtimes provided with Red Hat OpenShift AI or integrate their own inference engines or runtimes, such as NVIDIA Triton. Cluster resources, such as CPUs and GPUs, can be scaled as the workload requires. The enhanced model serving stack utilizes open source technologies like KServe, Caikit, vLLM and TGIS to help with serving models.
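Once deployed, a model can be called over a standard REST inference API. The sketch below assumes a model on the KServe-based single-model serving platform exposed through the Open Inference Protocol v2; the route, model name, token and input tensor are hypothetical.

```python
# Minimal sketch: calling a deployed model over the Open Inference Protocol v2
# REST API. The route, model name, token and input tensor are all hypothetical.
import requests

ROUTE = "https://fraud-model-my-project.apps.example.com"   # inference endpoint route
MODEL = "fraud-model"
TOKEN = "sha256~example-token"                              # if token auth is enabled

payload = {
    "inputs": [
        {"name": "dense_input", "shape": [1, 4], "datatype": "FP32",
         "data": [0.31, 1.29, 0.04, 0.97]}
    ]
}

resp = requests.post(
    f"{ROUTE}/v2/models/{MODEL}/infer",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["outputs"])
```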

Model monitoring

The model monitoring dashboard helps ops-oriented users track operations and performance metrics for model servers and deployed models. Visualizations include metrics such as the number of successful and failed inference requests, average inference response time and compute utilization. This data helps guide users to take appropriate action, such as adding compute resources when the number of requests and the average response time are trending upward.
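The same metrics can also be retrieved programmatically from the OpenShift monitoring stack for alerting or reporting. The sketch below queries the cluster’s Thanos querier over the standard Prometheus HTTP API; the route, token and metric name are assumptions and will differ by cluster and serving platform.

```python
# Minimal sketch: pulling a model-serving metric from the OpenShift monitoring
# stack (Thanos querier) instead of reading it from the dashboard. The route,
# bearer token and metric name are assumptions.
import requests

THANOS = "https://thanos-querier-openshift-monitoring.apps.example.com"
TOKEN = "sha256~example-token"   # a token with permission to read metrics
QUERY = 'sum(rate(modelmesh_api_request_milliseconds_count[5m]))'  # hypothetical metric

resp = requests.get(
    f"{THANOS}/api/v1/query",
    params={"query": QUERY},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```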

Technology partner integrations

In addition to previous integrations directly in the Red Hat OpenShift AI product with vendors like Starburst, Anaconda, Pachyderm, NVIDIA and Intel, Red Hat is collaborating with others, including AMD, Elastic, Run:ai and Stability AI, to deliver expanded integrations for a variety of GenAI use cases.

On the hardware platform side, Red Hat has announced optimized support and integrations with Intel® Enterprise AI and NVIDIA NIM microservices on Red Hat OpenShift AI. Dell introduced enhancements to Dell APEX Cloud Platform for Red Hat OpenShift to address AI use cases with Red Hat OpenShift AI. Additionally, Cisco has created a Cisco Validated Design for MLOps on Red Hat OpenShift AI.

Red Hat OpenShift AI is a foundational component in IBM watsonx.ai, providing fundamental AI tooling and services for GenAI workloads. Watsonx.ai offers an enterprise studio for AI builders to deliver GenAI applications with low code/no code requirements, easy-to-use workflows for model development and access to a library of IBM foundation models and curated open source models. Red Hat OpenShift and Red Hat OpenShift AI are embedded technical prerequisites for watsonx.ai software.

Disconnected environments

Because of security and regulatory compliance considerations, many Red Hat OpenShift AI customers require disconnected deployments. Organizations ranging from government and financial services to healthcare and manufacturing require support for air-gapped installations. Disconnected clusters typically sit on a restricted network, often behind a firewall, which makes deployments more challenging and requires support for private registries that mirror the necessary container images.

Edge

One of the biggest tests of an AI platform is the ability to support edge environments. Red Hat OpenShift AI model serving at the edge extends the deployment of AI models to remote locations securely, consistently and at scale. Model serving at the edge helps simplify the process of deploying models to the edge, drive consistency across environments and safeguard the inferencing process. This capability is currently available only for single-node Red Hat OpenShift.

Try Red Hat OpenShift AI in your own cluster here, learn more about our patterns, demos and recipes on GenAI and predictive AI here and read more about building an operational foundation for GenAI here.


About the authors

Jeff DeMoss is a Senior Manager of Product Management for Red Hat OpenShift AI, a platform for developing, training, serving, and monitoring AI/ML models. Jeff was previously a product leader at SAS Institute for a suite of Software-as-a-Service (SaaS) applications used by organizations to apply analytics capabilities to optimize their marketing activities.


Will McGrath is a senior principal product marketing manager for Red Hat’s AI/ML cloud service, database access service, and other cloud data services on Red Hat OpenShift. He has more than 30 years of experience in the IT industry. Before Red Hat, Will worked for 12 years as strategic alliances manager for media and entertainment technology partners.

