Kamiwaza Private AI Inference Mesh and Data Engine

Event: AI Field Day 4

Appearance: Kamiwaza Presents with Intel at AI Field Day 4

Company: Kamiwaza.AI

Video Links:

Personnel: Luke Norris, Matt Wallace

Luke Norris and Matt Wallace from Kamiwaza presented their mission to help enterprises achieve a trillion inferences a day, which they believe is key to the fifth industrial revolution. They discussed the massive scale at which they aim to operate, targeting Fortune 500 and Global 2000 companies, and addressing real-world use cases and problems. They explained the origin of the company name Kamiwaza, which means “superhuman” and reflects their goal to bring superhuman capabilities to enterprises.

The presenters focused on the importance of inferencing at scale, rather than model training, which they believe is better left to the small number of experts employed by major tech companies. They emphasized that enterprises will use multiple foundational models and will need to manage these models effectively for various tasks. Kamiwaza aims to provide a full-stack generative AI solution that addresses the core problems of scale for an enterprise.

They introduced two key features of their solution: the Inference Mesh and the Distributed Data Engine. These features allow AI deployment anywhere, including on-premises, cloud, core, and edge, and work on a variety of hardware. They explained that the Inference Mesh and Distributed Data Engine work together to route inference requests efficiently, even when the data is in different locations. This hybrid approach is designed to enable massive scale data processing with LLMs (Large Language Models).
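
Kamiwaza did not walk through implementation details, but the routing idea can be sketched generically: send each inference request to the location that already holds the relevant data, and fall back to wherever capacity exists. The Python below is purely illustrative; the site names, dataset labels, and selection policy are invented for the example and are not Kamiwaza’s code.

```python
from dataclasses import dataclass

# Hypothetical illustration of locality-aware inference routing; this is not
# Kamiwaza's implementation, only a sketch of the concept described above.

@dataclass
class Site:
    name: str
    datasets: set          # names of datasets resident at this location
    has_capacity: bool     # whether local inference capacity has headroom

SITES = [
    Site("on-prem-dc", {"claims", "policies"}, has_capacity=True),
    Site("cloud-east", {"web-logs"}, has_capacity=True),
    Site("edge-factory", {"sensor-feeds"}, has_capacity=False),
]

def route_inference(dataset: str) -> Site:
    """Prefer running inference where the data already lives; otherwise
    fall back to any site with spare capacity."""
    for site in SITES:
        if dataset in site.datasets and site.has_capacity:
            return site
    return next(s for s in SITES if s.has_capacity)

print(route_inference("claims").name)        # -> on-prem-dc (data-local)
print(route_inference("sensor-feeds").name)  # -> on-prem-dc (fallback, edge has no headroom)
```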


Developer and Operational Productivity in Google Cloud with Duet AI

Event: AI Field Day 4

Appearance: Google Cloud Presents Cloud Inferencing with Intel at AI Field Day 4

Company: Google Cloud

Video Links:

Personnel: Ameer Abbas

In this session, we’ll demonstrate how Duet AI enhances developer and operational productivity. We’ll explore how Google’s state-of-the-art AI is applied to address real-world development and operations challenges. Topics include context-aware code completion, licensing compliance assistance, code explanation, test generation, operational troubleshooting, and more. We’ll share customer successes and insights from within Google that inform continuous improvement of AI productivity tools.

Ameer Abbas, Senior Product Manager at Google Cloud, provides a demonstration of Duet AI and its application in enhancing developer and operational productivity. He explains how Google’s state-of-the-art AI is applied to real-world development and operations challenges, emphasizing its role in assisting with context-aware code completion, licensing compliance, code explanation, test generation, operational troubleshooting, and more.

Ameer highlights the division of Google’s AI solutions for consumers and enterprises, mentioning products like Gemini (formerly Bard), MakerSuite, the PaLM API, Workspace, and Vertex AI. Vertex AI is a platform for expert practitioners to build, extend, tune, and serve their own machine learning models, while Duet AI offers ready-to-consume solutions built on top of foundational models.

He discusses the importance of modern applications that are dynamic, scalable, performant, and intelligent, and how they contribute to business outcomes. Ameer references the DevOps Research and Assessment (DORA) community and its focus on key metrics like lead time for changes, deployment frequency, failure rate, and recovery time for incidents.

The presentation includes a live demo where Ameer uses Duet AI within the Google Cloud Console and an Integrated Development Environment (IDE) to perform various tasks such as generating API specs, creating a Python Flask app, and troubleshooting errors. He demonstrates how Duet AI can understand and generate code based on prompts, interact with existing code files, and provide explanations and suggestions for code improvements. Ameer also shows how Duet AI can assist with generating unit tests, documentation, and fixing errors, and he touches on its capability to learn from user interactions for future improvements.
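
For a sense of what the demo produced, the snippet below is the kind of minimal Flask application a prompt such as “create a Python Flask app” might yield from a coding assistant. It is a generic illustration, not the code Duet AI generated in the session.

```python
# Minimal Flask app of the kind a "create a Python Flask app" prompt might yield
# from a coding assistant; illustrative only, not the code generated in the demo.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/health")
def health():
    # Simple liveness endpoint
    return jsonify(status="ok")

@app.route("/echo", methods=["POST"])
def echo():
    # Echo back the posted JSON, a common scaffold for a generated API
    payload = request.get_json(silent=True) or {}
    return jsonify(received=payload)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```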

The demo showcases how Duet AI can be integrated into different stages of the software development lifecycle and how it can be a valuable tool for developers and operators in the cloud environment. Ameer concludes by mentioning future features like the ability to have Duet AI perform actions on the user’s behalf and the incorporation of agents for proactive assistance.


Google Cloud AI Platforms and Infrastructure

Event: AI Field Day 4

Appearance: Google Cloud Presents Cloud Inferencing with Intel at AI Field Day 4

Company: Google Cloud

Video Links:

Personnel: Brandon Royal

In this session, we’ll explore how Vertex AI, Google Kubernetes Engine (GKE) and Google Cloud’s AI Infrastructure provide a robust platform for AI development, training and inference. We’ll discuss hardware choices for inference (CPUs, GPUs, TPUs), showcasing real-world examples. We’ll cover distributed training and inference with GPUs/TPUs and optimizing AI performance on GKE using tools like autoscaling and dynamic workload scheduling.

Brandon Royal, product manager at Google Cloud, discusses deploying AI on Google Cloud’s AI infrastructure. The session focuses on how Google Cloud is applying AI to solve customer problems and the trends in AI, particularly the platform shift towards generative AI. Brandon discusses the AI infrastructure designed for generative AI, covering topics such as inference, serving, training, and fine-tuning, and how these are applied in Google Cloud.

Brandon explains the evolution of AI models, particularly open models, and their importance for flexibility in deployment and optimization. He highlights that many AI startups and unicorns choose Google Cloud for their AI infrastructure and platforms. He also introduces Gemma, a new open model released by Google DeepMind, which is lightweight, state-of-the-art, and built on the same technology as Google’s Gemini model. Gemma is available with open weights on platforms like Hugging Face and Kaggle.

The session then shifts to a discussion about AI platforms and infrastructure, with a focus on Kubernetes and Google Kubernetes Engine (GKE) as the foundation for open models. Brandon emphasizes the importance of flexibility, performance, and efficiency in AI workloads and how Google provides a managed experience with GKE Autopilot.

He also touches on the hardware choices for inference, including CPUs, GPUs, and TPUs, and how Google Cloud offers the largest selection of AI accelerators in the market. Brandon shares customer stories, such as Palo Alto Networks’ use of CPUs for deep learning models in threat detection systems. He also discusses the deployment of models on GKE, including autoscaling and dynamic workload scheduling.

Finally, Brandon provides a live demo of deploying the Gemma model on GKE, showcasing how to use the model for generating responses and how it can be augmented with retrieval-augmented generation for more grounded responses. He also demonstrates the use of Gradio, a chat-based interface for interacting with models, and discusses the scaling and management of AI workloads on Google Cloud.
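
For readers who want a feel for the demo’s shape, the sketch below pairs a Gradio chat interface with a model endpoint of the sort a Gemma deployment on GKE might expose. The service URL and request/response fields are assumptions for illustration, not the actual demo configuration.

```python
# Sketch of a Gradio chat front end talking to a model-serving endpoint, similar
# in spirit to the Gemma-on-GKE demo. The service URL and JSON fields are
# assumptions for illustration, not the actual demo configuration.
import requests
import gradio as gr

ENDPOINT = "http://gemma-service.default.svc.cluster.local:8000/generate"  # hypothetical

def chat(message, history):
    # history (prior turns) is ignored here to keep the sketch minimal
    resp = requests.post(ENDPOINT, json={"prompt": message, "max_tokens": 256}, timeout=60)
    resp.raise_for_status()
    return resp.json().get("text", "")

gr.ChatInterface(chat, title="Gemma on GKE (illustrative)").launch()
```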


AI without GPUs: Using Intel AMX CPUs on VMware vSphere with Tanzu Kubernetes

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents Private AI with Intel at AI Field Day 4

Company: VMware by Broadcom

Video Links:

Personnel: Earl Ruby

Looking to deploy AI models using your existing data center investments? VMware and Intel have collaborated to announce VMware Private AI with Intel. VMware Private AI with Intel will help enterprises build and deploy private and secure AI models running on VMware Cloud Foundation and boost AI performance by harnessing Intel’s AI software suite and 4th Generation Intel® Xeon® Scalable Processors with built-in accelerators. In this session we’ll explain how to set up Tanzu Kubernetes to run AI/ML workloads that utilize AMX CPUs.

Earl Ruby, R&D engineer at VMware by Broadcom, presented the deployment of AI models without GPUs, focusing on the use of Intel AMX CPUs with Tanzu Kubernetes on vSphere. He discussed the benefits of AMX, an AI accelerator built into Intel’s Sapphire Rapids and Emerald Rapids Xeon CPUs, which can run AI workloads without separate GPU accelerators. vSphere 8 supports AMX, and many ML frameworks are already optimized for Intel CPUs.

He demonstrated video processing with OpenVINO on vSphere 8, showing real-time processing with high frame rates on a VM with limited resources and no GPUs. This demonstration highlighted the power of AMX and OpenVINO’s model compression, which reduces memory and compute requirements.
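
A minimal OpenVINO inference loop on CPU looks roughly like the following; the model file and input shape are placeholders rather than the assets used in the demo, and on AMX-capable Xeons the CPU plugin can use the matrix extensions for quantized or bfloat16 models without code changes.

```python
# Minimal OpenVINO CPU inference loop; the model file and input shape are
# placeholders, not the assets from the demo.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                        # hypothetical IR model
compiled = core.compile_model(model, device_name="CPU")

frame = np.random.rand(1, 3, 224, 224).astype(np.float32)   # stand-in for one video frame
result = compiled([frame])                                  # run a single inference
print(next(iter(result.values())).shape)                    # shape of the first output tensor
```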

For deploying AMX-powered workloads on Kubernetes, Earl explained that Tanzu is VMware’s Kubernetes distribution optimized for vSphere, with lifecycle management tools, storage, networking, and high availability features. He detailed the requirements for making AMX work on vSphere, including using hardware with Sapphire Rapids or Emerald Rapids CPUs, running the Linux kernel 5.16 or later, and using hardware version 20 for virtualizing AMX instructions.

Earl provided a guide for setting up Tanzu to use AMX, including adding a content library with the correct Tanzu Kubernetes releases (TKRs) and creating a new VM class. He showed how to create a cluster definition file for Tanzu Kubernetes clusters that specifies the use of the HWE kernel TKR and the AMX VM class for worker nodes.
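
The cluster definition he described boils down to pointing the worker node pool at an AMX-capable VM class and an HWE-kernel Tanzu Kubernetes release. The Python dict below sketches that structure for illustration only; the API version, class names, and TKR names are placeholders, and the real definition is a YAML manifest whose exact schema is documented by VMware.

```python
# Illustrative shape of a Tanzu Kubernetes cluster definition that pins worker
# nodes to an AMX-capable VM class and an HWE-kernel TKR, written as a Python
# dict for readability. API version, class names, and TKR names are placeholders.
cluster_definition = {
    "apiVersion": "run.tanzu.vmware.com/v1alpha3",   # assumed CRD version
    "kind": "TanzuKubernetesCluster",
    "metadata": {"name": "amx-inference-cluster", "namespace": "ml-team"},
    "spec": {
        "topology": {
            "controlPlane": {
                "replicas": 3,
                "vmClass": "best-effort-medium",
                "tkr": {"reference": {"name": "v1.26-default-tkr"}},      # placeholder
            },
            "nodePools": [{
                "name": "amx-workers",
                "replicas": 4,
                "vmClass": "amx-guaranteed-large",                        # custom VM class exposing AMX (HW version 20)
                "tkr": {"reference": {"name": "v1.26-hwe-kernel-tkr"}},   # HWE-kernel TKR, placeholder
            }],
        }
    },
}
```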

Finally, he presented performance results for Llama 2 7B (7-billion-parameter) LLM inference running on a single 4th-generation Xeon CPU, demonstrating that it could deliver average inference latency under 100 milliseconds, which is suitable for chatbot response times.


AI without GPUs: Using Intel AMX CPUs on VMware vSphere for LLMs

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents Private AI with Intel at AI Field Day 4

Company: VMware by Broadcom

Video Links:

Personnel: Earl Ruby

Looking to deploy AI models using your existing data center investments? VMware and Intel have collaborated to announce VMware Private AI with Intel. VMware Private AI with Intel will help enterprises build and deploy private and secure AI models running on VMware Cloud Foundation and boost AI performance by harnessing Intel’s AI software suite and 4th Generation Intel® Xeon® Scalable Processors with built-in accelerators. In this session we’ll explain the technology behind AMX CPUs and demonstrate LLMs running on AMX CPUs.

Earl Ruby, R&D Engineer at VMware by Broadcom, discusses leveraging AI without the need for GPUs, focusing on using CPUs for AI workloads. He talks about VMware’s collaboration with Intel on VMware Private AI with Intel, which enables enterprises to build and deploy private AI models on-premises using VMware Cloud Foundation and Intel’s AI software suite along with the 4th Generation Intel Xeon Scalable Processors with built-in accelerators.

Ruby highlights the benefits of Private AI, including data privacy, intellectual property protection, and the use of established security tools in a vSphere environment. He explains the technology behind Intel’s Advanced Matrix Extensions (AMX) and how they can accelerate AI/ML workloads without the need for separate GPU accelerators. AMX units are built into every core of Intel’s Sapphire Rapids and Emerald Rapids Xeon CPUs, allowing AI and non-AI workloads to run side by side in a virtualized environment. Ruby demonstrates the performance of Large Language Models (LLMs) running on AMX-enabled CPUs compared to older CPUs without AMX, showing a significant improvement in speed and efficiency.
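
One common way to exercise AMX from PyTorch on these CPUs is to optimize the model with Intel Extension for PyTorch in bfloat16, as sketched below. This is a generic pattern, not necessarily the software stack used in Ruby’s demo, and the model name is a placeholder.

```python
# Generic pattern for exercising AMX from PyTorch on 4th-gen Xeons: optimize the
# model with Intel Extension for PyTorch in bfloat16. Not necessarily the stack
# used in the demo; the model name is a placeholder.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"   # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()
model = ipex.optimize(model, dtype=torch.bfloat16)   # enables AMX-friendly kernels

inputs = tok("Summarize Intel AMX in one sentence.", return_tensors="pt")
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```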

He also discusses the operational considerations when choosing between CPU and GPU for AI workloads, emphasizing that CPUs should be used when performance is sufficient and cost or power consumption are concerns, while GPUs should be used for high-performance needs, especially when low latency or frequent fine-tuning of large models is required.


Deploy AI Everywhere on Intel Xeon CPUs

Event: AI Field Day 4

Appearance: Intel Presents at AI Field Day 4

Company: Intel

Video Links:

Personnel: Ronak Shah

There’s a major AI hype cycle today, but what do businesses actually need? Today’s enterprises typically benefit from AI as a general-purpose, mixed workload instead of a purely dedicated one. Intel AI Product Director Ro Shah contextualizes the time and place for inferencing, nimble vs giant AI models, hardware and software options – all with TCO in mind. He leads into customer and partner examples to ground this in reality and avoid the FOMO.

Ro Shah, AI Product Director at Intel, discusses the deployment of AI, particularly focusing on inferencing, on Intel Xeon CPUs. He explains that while deep learning training often requires accelerators, deployment can be effectively handled by a mix of CPUs and accelerators. Shah emphasizes that CPUs are a good fit for mixed general-purpose and AI workloads, offering ease of deployment and total cost of ownership (TCO) benefits.

Shah describes a customer usage model where AI deployment bifurcates into two scenarios: large-scale dedicated AI cycles, which may require accelerators, and mixed workloads with general-purpose and AI cycles, where CPUs are advantageous. He provides a threshold for model size, suggesting CPUs for models with less than 20 billion parameters, and accelerators for anything larger. Using customer examples, Shah illustrates the advantages of deploying AI on CPUs for mixed workloads, such as video conferencing with added AI features like real-time transcription and speech translation. He also touches on the capabilities of Intel CPUs in client-side applications and the potential for on-premises deployment for enterprise customers.

Shah moves on to discuss generative AI and the use of large language models, noting that CPUs can meet latency requirements up to about 20 billion parameters. He shows performance data for specific models, highlighting the importance of next-token latency in determining whether a CPU or an accelerator is appropriate for a given task.
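
Next-token latency is straightforward to measure directly, which is how this CPU-versus-accelerator decision is typically grounded. The sketch below times greedy single-token steps with a small Hugging Face model as a stand-in; it is a generic measurement harness, not Intel’s benchmark methodology.

```python
# Rough harness for measuring next-token latency, the metric Shah highlights when
# deciding between CPU and accelerator. Uses a small placeholder model and greedy
# decoding without a KV cache.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"   # placeholder; swap in the model under evaluation
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

ids = tok("Enterprises deploying AI on CPUs", return_tensors="pt")["input_ids"]
latencies = []
with torch.no_grad():
    for _ in range(32):                                   # generate 32 tokens, one step at a time
        start = time.perf_counter()
        logits = model(ids).logits
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        latencies.append(time.perf_counter() - start)
        ids = torch.cat([ids, next_id], dim=-1)

print(f"mean next-token latency: {1000 * sum(latencies) / len(latencies):.1f} ms")
```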

Regarding software, Shah stresses the importance of upstreaming optimizations to standard tools like PyTorch and TensorFlow, and mentions Intel-specific tools like OpenVINO and Intel Neural Compressor for performance improvements. He also covers the ease of transitioning between Xeon generations and how Intel’s broad ecosystem presence allows for AI deployment everywhere.


Insights into AI from Futurum Intelligence

Event: AI Field Day 4

Appearance: Insights into AI from Futurum Intelligence

Company: The Futurum Group

Video Links:

Personnel: Stephen Foskett

Stephen Foskett discusses the Futurum Group’s Intelligence platform, which is focused on workplace intelligence, customer experience, and AI. Foskett demonstrates the Intelligence platform, which is based on surveys of IT decision-makers and is updated every six months. The AI market data presented is collected by Keith Kirkpatrick and includes information about AI platform usage, vendor partnerships, and plans for changing or adding vendors. The data is global and includes a variety of industries, not just IT companies.

The platform allows users to access detailed data, including the actual survey questions and demographic information of respondents. It also shows key findings, such as the percentage of people using specific vendors for SaaS products and end-to-end AI solutions. Users can filter data by industry, region, and other criteria.

Foskett also previews upcoming data on AI chipsets and DevOps tools. He emphasizes the usefulness of the data for industry professionals, including product marketers, managers, and analysts, who need to make informed decisions based on market trends and competition.


Discussing the VAST Data Solution for AI

Event: AI Field Day 4

Appearance: VAST Data Presents at AI Field Day 4

Company: VAST Data

Video Links:

Personnel: John Mao, Keith Townsend, Neeloy Bhattacharyya

In this discussion, Keith Townsend interviewed John Mao and Neeloy Bhattacharyya from VAST Data. They discuss the company’s recent growth, including closing a funding round that values the company at $9.1 billion, due in part to significant sales and growth in the data storage market. To start, Keith asks about a recent report that VAST Data has achieved a 6% share of the data center flash storage market, which would be notable for an independent software data platform company.

The conversation shifts to VAST Data’s role in AI, noting that about 60% of their business is used for AI and high-performance computing (HPC) workloads. VAST Data has been involved in AI since before it became a trending topic and has been working with large customers and AI service providers. The discussion then moves on to the unique aspects of the VAST platform that make it suitable for AI workloads. They talk about the company’s vision and strategy, which extends beyond traditional storage to include capabilities that address the entire AI data pipeline. VAST Data’s global namespace, which is write-consistent and allows for distributed inference and model serving, is a key feature that facilitates the AI pipeline by providing a common dataset for different locations to access and work from.

They also discuss VAST’s multi-protocol platform and its disaggregated shared-everything architecture, which allows for intelligent data movement based on actual workload needs rather than pre-staging data. Keith asks about how VAST helps with the data gravity problem and the challenges of moving compute closer to the data. Neeloy explains that VAST’s architecture, including its highly scalable metadata layer, allows for a better understanding of data access patterns and more intelligent pre-staging of data.

Finally, they touch upon the VAST DataBase, which helps with the data preparation phase by assigning structure to unstructured data and accelerating ETL tools and query engines. This reduces the time necessary for data preparation, which is a significant part of the AI project lifecycle.


Optimized Storage from Supermicro and Solidigm to Accelerate Your Al Data Pipeline

Event: AI Field Day 4

Appearance: Solidigm Presents at AI Field Day 4

Company: Solidigm, Supermicro

Video Links:

Personnel: Paul McLeod, Wendell Wenjen

Wendell Wenjen and Paul McLeod from Supermicro discuss challenges and solutions for AI and machine learning data storage. Supermicro is a company that provides servers, storage, GPU-accelerated servers, and networking solutions, with a significant portion of their revenue being AI-related.

They highlighted the challenges in AI operations and machine learning operations, specifically around data management, which includes collecting data, transforming it, and feeding it into GPU clusters for training and inference. They also emphasized the need for a large capacity of storage to handle the various phases of the AI data pipeline.

Supermicro has a wide range of products designed to cater to each stage of the AI data pipeline, from data ingestion, which requires a large data lake, to the training phase, which requires retaining large amounts of data for model development and validation. They also discussed the importance of efficient data storage solutions and introduced the concept of an “IO Blender effect,” where multiple data pipelines run concurrently, creating a mix of different IO profiles.

Supermicro delved deeper into the storage solutions, highlighting their partnership with WEKA, a software-defined storage company, and how their architecture is optimized for AI workloads. They explained the importance of NVMe flash storage, which can outpace processors, and the challenges of scaling such storage solutions. They also discussed Supermicro’s extensive portfolio of storage servers, ranging from multi-node systems to petascale architectures, designed to accommodate different customer needs.

Supermicro’s approach to storage for AI includes a two-tiered solution with flash storage for high performance and disk-based storage for high capacity at a lower cost. They also touched on the role of GPU direct storage in reducing latency and the flexibility of their software-defined storage solutions.

The presentation concluded with an overview of Supermicro’s product offerings for different AI and machine learning workloads, from edge devices to large data center storage solutions.


Why Storage Matters for AI with Solidigm

Event: AI Field Day 4

Appearance: Solidigm Presents at AI Field Day 4

Company: Solidigm

Video Links:

Personnel: Ace Stryker, Alan Bumgarner

In this presentation, Ace Stryker and Alan Bumgarner of Solidigm discuss the importance of storage in AI workloads. They explain that as AI models and datasets grow, efficient and high-performance storage becomes increasingly critical. They introduce their company, Solidigm, which emerged from SK Hynix’s acquisition of Intel’s storage group, and they offer a range of SSD products suitable for AI applications.

The discussion covers several key points:

  1. The growing AI market and the shift from centralized to distributed compute and storage, including the edge.
  2. The dominance of hard drives for AI data and the opportunity for transitioning to flash storage.
  3. The role of storage in AI workflows, including data ingestion, preparation, training, and inference.
  4. The Total Cost of Ownership (TCO) benefits of SSDs over hard drives, considering factors like power consumption, space, and cooling.
  5. The Solidigm product portfolio, emphasizing different SSDs for various AI tasks, and the importance of choosing the right storage based on workload demands.
  6. A customer case study from Kingsoft in China, which saw a significant reduction in data processing time by moving to an all-flash array.
  7. The future potential of AI and the importance of SSDs in enabling efficient AI computing.

The session also includes questions from the Field Day delegates covering technical aspects of Solidigm storage products, such as the role of their Cloud Storage Acceleration Layer (CSAL), and a discussion of the importance of consulting with customers to understand their specific AI workload requirements for optimal storage solutions.
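
The TCO argument (point 4 above) is essentially arithmetic over drive capacity, power draw, and footprint. The toy comparison below shows the shape of that calculation; every number in it is a hypothetical placeholder, not a figure from Solidigm’s presentation.

```python
# Toy SSD-vs-HDD TCO shape: drives needed for a target capacity, their power
# draw, and annual energy cost. All numbers are hypothetical placeholders.
def drive_count(capacity_tb, drive_tb):
    return -(-capacity_tb // drive_tb)   # ceiling division

TARGET_TB = 10_000           # 10 PB raw target (placeholder)
HDD_TB, HDD_WATTS = 20, 9    # placeholder HDD capacity and power per drive
SSD_TB, SSD_WATTS = 61, 15   # placeholder QLC SSD capacity and power per drive
PRICE_PER_KWH = 0.12         # placeholder energy price

for name, per_drive_tb, watts in [("HDD", HDD_TB, HDD_WATTS), ("SSD", SSD_TB, SSD_WATTS)]:
    n = drive_count(TARGET_TB, per_drive_tb)
    annual_kwh = n * watts * 24 * 365 / 1000
    print(f"{name}: {n} drives, {annual_kwh:,.0f} kWh/yr, ${annual_kwh * PRICE_PER_KWH:,.0f}/yr energy")
```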


Real-World Use of Private AI at VMware by Broadcom

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware by Broadcom

Video Links:

Personnel: Ramesh Radhakrishnan

This session offers a deep dive into VMware’s internal AI services used by VMware employees, including our services for coding-assist, document search using retrieval augmented generation (RAG), and our internal LLM API.

In this presentation, Ramesh Radhakrishnan of VMware discusses the company’s internal use of AI, particularly large language models (LLMs), for various applications. He leads the AI Platform and Solutions team and shares insights into VMware’s AI services, which were developed even before the advent of LLMs.

Large language models (LLMs) are versatile tools that can address a wide range of use cases with minimal modification. VMware has developed internal AI services for coding assistance, document search using Retrieval-Augmented Generation (RAG), and an internal LLM API. Content generation, question answering, code generation, and the use of AI agents are some of the key use cases for LLMs at VMware.

VMware has implemented a Cloud Smart approach, leveraging open-source LLMs trained on the public cloud to avoid the environmental impact of running their own GPUs. The company has worked with Stanford to create a domain-adapted model for VMware documentation search, which significantly improved search performance compared to traditional keyword search.

The VMware Automated Question Answering System (Wacqua) is an information retrieval system based on language models, which allows users to ask questions and get relevant answers without browsing through documents. The system’s implementation involves complex processes, including content gathering, preprocessing, indexing, caching, and updating documentation.

VMware has scaled up its GPU capacity to accommodate the increased demand from software developers empowered by AI tools. The AI platform at VMware provides a GPU pool resource, developer environments, coding use cases, and LLM APIs, all running on a common platform.

Data management is highlighted as a potential bottleneck for AI use cases, and standardizing on a platform is critical for offering services to end-users efficiently. Collaboration between AI teams and infrastructure teams is essential to ensure that both the models and the infrastructure can support the workload effectively.

Ramesh encourages organizations to start small with open-source models, identify key performance indicators (KPIs), and focus on solving business problems with AI. The session concludes with Ramesh emphasizing the importance of a strategic approach to implementing AI and the benefits of leveraging a shared platform for AI services.


Running Best-of-Breed AI Services on a Common Platform with VMware Cloud Foundation

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware by Broadcom

Video Links:

Personnel: Shawn Kelly

VMware Cloud Foundation streamlines AI production, providing enterprises with unmatched flexibility, control, and choice. Through Private AI, businesses can seamlessly deploy best-in-class AI services across various environments, while ensuring privacy and security. Join us to explore VMware’s collaborations with IBM watsonx, Intel AI, and AnyScale Ray, delivering cutting-edge AI capabilities on top of VMware’s Private Cloud platform.

Shawn Kelly, Principal Engineer at Broadcom, discusses the benefits of using VMware Cloud Foundation (VCF) to run AI services. He explains that VCF solves many of the infrastructure challenges associated with AI projects, such as agility, workload migration, avoiding idle compute resources, scaling, lifecycle management, privacy, and security.

He addresses concerns about VCF not being the only platform for AI, noting that while other products like vSphere and vSAN are still in use, VCF is the strategic direction for VMware, particularly for their strategic customers. He clarifies that VCF includes underlying vSphere technology and that using VCF inherently involves using vSAN.

Kelly also talks about performance, mentioning that VMware’s hypervisor scheduler has been optimized over two decades to match bare-metal speeds, with only a plus or minus 2% performance difference in AI workloads. He confirms that VMware supports NVIDIA’s NVLink, which allows multiple GPUs to connect directly to each other.

The talk then moves on to VMware’s Private AI, which is an architectural approach that balances business AI benefits with privacy and compliance needs. Kelly highlights collaborations with Anyscale’s Ray, an open-source framework for scaling Python AI workloads, and IBM watsonx, which brings IBM Watson capabilities on-premises for customers with specific data compliance requirements.

He covers the integration of Ray with vSphere, demonstrating how it can quickly spin up worker nodes (raylets) for AI tasks. He also addresses licensing concerns, noting that while NVIDIA handles GPU licensing, Ray is an open-source plugin without additional licensing costs.
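
Ray itself is the open-source piece here; the snippet below shows the generic pattern of fanning Python tasks out across whatever worker nodes the cluster has available, which is what the vSphere integration provisions behind the scenes. It is plain Ray usage, not the vSphere-specific launcher shown in the session.

```python
# Plain open-source Ray usage: fan Python tasks out across whatever worker nodes
# the cluster has available (in the demo, nodes provisioned by the vSphere
# integration). The vSphere-specific cluster launch itself is not shown here.
import ray

ray.init()   # connects to an existing cluster if one is configured, else runs locally

@ray.remote
def score_batch(batch):
    # Stand-in for an inference or preprocessing task
    return sum(len(item) for item in batch)

batches = [["alpha", "beta"], ["gamma"], ["delta", "epsilon", "zeta"]]
futures = [score_batch.remote(b) for b in batches]
print(ray.get(futures))   # results gathered back from the workers
```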

For IBM watsonx, Kelly discusses the stack setup with VMware Cloud Foundation at the base, followed by OpenShift and watsonx on top. He emphasizes security features, such as secure boot, identity and access management, and VM encryption. He also mentions the choice of proprietary, open-source, and third-party AI models available on the platform. Kelly briefly touches on use cases enabled by watsonx, such as code generation, contact center resolution, IT operations automation, and advanced information retrieval. He concludes by directing listeners to a blog for more information on Private AI with IBM watsonx.


VMware Private AI Foundation with NVIDIA Demo

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware by Broadcom

Video Links:

Personnel: Justin Murray

This VMware Private AI Foundation with NVIDIA demo works with the data scientist user as well as the VMware system administrator/devops person. A data scientist can reproduce their LLM environment rapidly on VMware Cloud Foundation (VCF). This is done through a self-service portal or through assistance from a VCF system administrator. We show that a VCF administrator can serve the data scientist with a set of VMs, created in a newly automated way from deep learning VM images, with all the deep learning tooling and platforms already active in them. We show a small LLM example application running on this setup to give the data scientist a head-start on their work.

In this presentation, Justin Murray, product marketing engineer from Broadcom, demonstrates VMware Private AI Foundation with NVIDIA technology. The demo is structured to show how the end user, particularly a data scientist, can benefit from the solution. Key points from the transcript include:

  1. Application Demonstration: Justin begins by showcasing a chatbot application powered by a large language model (LLM) which utilizes retrieval-augmented generation (RAG). The bot is demonstrated to answer questions more accurately after updating its knowledge base.
  2. Deep Learning VMs: The demo highlights the use of virtual machines (VMs) that come pre-loaded with deep learning toolkits, which are essential for data scientists. These VMs can be rapidly provisioned using Aria Automation, and they can be customized with specific tool bundles as per the data scientist’s requirements.
  3. Containers and VMs: Justin explains the solution uses a combination of containers and VMs, with NVIDIA components shipped as containers that can be run using Docker or integrated into Kubernetes clusters.
  4. Private AI Foundation Availability: The Private AI Foundation with NVIDIA is mentioned to be an upcoming product that will be available for purchase in the current quarter, with some customers already having early access to the beta version.
  5. Automation and User Interface: The Aria Automation tool is showcased, which allows data scientists or DevOps personnel to request resources through a simple interface, choosing the amount of GPU power they require.
  6. GPU Visibility: The demo concludes with a look at GPU visibility, showing how vCenter can be used to monitor GPU consumption at both the host and VM level, which is important for managing resources in LLM operations.
  7. Customer Use and Power Consumption: Justin notes that there’s interest in both dedicated VMs for data scientists and shared infrastructure like Kubernetes. He also acknowledges the importance of power consumption as a concern for those using GPUs.

VMware Private AI Foundation with NVIDIA aims to simplify the deployment and management of AI applications and infrastructure for data scientists, offering a combination of automation, privacy, and performance monitoring tools.


VMware Private AI Foundation with NVIDIA Overview

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware by Broadcom

Video Links:

Personnel: Justin Murray

VMware Private AI Foundation with NVIDIA is a fully integrated solution featuring generative AI software and accelerated computing from NVIDIA, built on VMware Cloud Foundation and optimized for AI. The solution includes integrated AI tools to empower enterprises to customize models and run generative AI applications adjacent to their data while addressing corporate data privacy, security and control concerns. The platform will feature NVIDIA NeMo, which combines customization frameworks, guardrail toolkits, data curation tools and pretrained models to offer enterprises an easy, cost-effective and fast way to adopt generative AI.

In this presentation, Justin Murray, Product Marketing Engineer at VMware by Broadcom, discusses the VMware Private AI Foundation with NVIDIA, which is a solution designed to run generative AI applications with a focus on privacy, security, and control for enterprises. The platform is built on VMware Cloud Foundation and optimized for AI, featuring NVIDIA NeMo for customization and generative AI model deployment.

Murray explains the architecture of the solution, which includes a self-service catalog for data scientists to easily access their tools, GPU monitoring in the vCenter interface, and deep learning VMs pre-packaged with data science toolkits. He emphasizes the importance of vector databases, particularly pgvector, which is central to retrieval-augmented generation (RAG). RAG combines database technology with large language models to provide up-to-date and private responses to queries.
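
The pgvector role in RAG is easiest to see as a similarity query: document-chunk embeddings live in a Postgres table, and the nearest chunks to a query embedding become the LLM’s context. The sketch below uses psycopg2 with invented table and connection details; it is a generic pgvector pattern, not VMware’s implementation.

```python
# Generic pgvector pattern behind RAG: store chunk embeddings in Postgres and
# pull the nearest chunks for a query embedding. DSN, table name, and embedding
# size are invented for illustration.
import psycopg2

conn = psycopg2.connect("dbname=rag user=postgres")      # placeholder DSN
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS doc_chunks (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(384)
    );
""")
conn.commit()

query_embedding = [0.01] * 384   # would come from the same embedding model as the documents
vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
cur.execute(
    "SELECT content FROM doc_chunks ORDER BY embedding <-> %s::vector LIMIT 5;",
    (vec_literal,),
)
for (content,) in cur.fetchall():
    print(content)   # retrieved chunks are then passed to the LLM as context
conn.close()
```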

He also touches on the GPU operator and Triton inference server from NVIDIA for managing GPU drivers and scalable model inference. Murray notes that the solution is designed to be user-friendly for data scientists and administrators serving them, with a focus on simplifying the deployment and management of AI applications.

Murray mentions that the solution is compatible with various vector databases and is capable of being used with private data, making it suitable for industries like banking. He also indicates that there is substantial demand for this architecture across different industries, with over 60 customers globally interested in it before the product’s general availability.

The presentation aims to provide technical details about the VMware Private AI Foundation with NVIDIA, including its components, use cases, and the benefits it offers to enterprises looking to adopt generative AI while maintaining control over their data.


Introduction to VMware Private AI

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware by Broadcom

Video Links:

Personnel: Chris Wolf

VMware Private AI brings compute capacity and AI models to where enterprise data is created, processed, and consumed, whether that is in a public cloud, enterprise data center, or at the edge. VMware Private AI consists of both product offerings (VMware Private AI Foundation with NVIDIA) and a VMware Private AI Reference Architecture for Open Source to help customers achieve their desired AI outcomes by supporting best-in-class open source software (OSS) technologies today and in the future. VMware’s interconnected and open ecosystem supports flexibility and choice in customers’ AI strategies.

Chris Wolf, the Global Head of AI and Advanced Services at VMware by Broadcom, discusses VMware’s Private AI initiative, which was announced in August 2023. The goal of Private AI is to democratize generative AI and ignite business innovation across all enterprises while addressing privacy and control concerns. VMware focuses on providing AI infrastructure, optimizations, security, data privacy, and data serving, leaving higher-level AI services to AI ISVs (Independent Software Vendors). This non-competitive approach makes it easier for VMware to partner with ISVs, since VMware does not directly compete with them in offering top-level AI services, unlike public clouds.

Wolf shares an example of VMware’s code-generation use case, where an internal solution based on an open-source model achieved a 92% acceptance rate among software engineers working on the ESXi kernel. He discusses the importance of governance and compliance, particularly in AI-generated code, and mentions VMware’s AI council and governance practices.

He highlights use cases such as call center resolution and advanced information retrieval across various industries. VMware’s solution emphasizes flexibility, choice of hardware and software, simplifying deployment, and mitigating risks. Wolf also notes VMware’s capability to stand up an AI cluster with preloaded models in about three seconds, which is not possible in public clouds or on bare metal.

The discussion covers the advantages of VMware Private AI in managing multiple AI projects within large enterprises, including efficient resource utilization and integration with existing operational tools, leading to lower total cost of ownership.

Wolf touches on the trend of AI adoption at the edge, the importance of security features within VMware’s stack, and the curated ecosystem of partners that VMware is building. He points out that VMware’s Private AI solution can leverage existing IT investments by bringing AI models to where the data already resides, such as on VMware Cloud Foundation (VCF).

Finally, Wolf previews upcoming Tech Field Day sessions that will go into detail about VMware’s collaborations with NVIDIA, Intel, and IBM, showcasing solutions like Private AI Foundation with NVIDIA and the watsonx SaaS service deployed on-premises. He encourages attendees to participate in these sessions to learn more about VMware’s AI offerings.


VMware by Broadcom Private AI Primer – An Emerging Category

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware by Broadcom

Video Links:

Personnel: Chris Wolf

Private AI as an architectural approach that aims to balance the business gains from AI with the practical privacy and compliance needs of the organization. What is most important is that privacy and control requirements are satisfied, regardless of where AI models and data are deployed. This session will walk through the core tenets of Private AI and the common use cases that it addresses.

Chris Wolf, Global Head of AI and Advanced Services at VMware by Broadcom, discusses the evolution of application innovation, highlighting the shift from PC applications to business productivity tools, web applications, and mobile apps, and now the rise of AI applications. He emphasizes that AI is not new, with its use in specialized models for fraud detection being a longstanding practice. Chris notes that financial services with existing AI expertise have quickly adapted to generative AI with large language models, and he cites a range of industry use cases, such as VMware’s use of SaaS-based AI services for marketing content creation.

He mentions McKinsey’s projection of the annual potential economic value for generative AI being around $4.4 trillion, indicating a significant opportunity for industry transformation. Chris discusses the early adoption of AI in various regions, particularly in Japan, where the government invests in AI to compensate for a shrinking population and maintain global competitiveness.

The conversation shifts to privacy concerns in AI, with Chris explaining the concept of Private AI, which is about maintaining business gains from AI while ensuring privacy and compliance needs. He discusses the importance of data sovereignty, control, and not wanting to inadvertently benefit competitors with shared AI services. Chris also highlights the need for access control to prevent unauthorized access to sensitive information through AI models.

He then outlines the importance of choice, cost, performance, and compliance in the AI ecosystem, asserting that organizations should not be locked into a single vertical AI stack. Chris also describes the potential for fine-tuning language models with domain-specific data and the use of technologies like retrieval augmented generation (RAG) for simplifying AI use cases.

Finally, Chris emphasizes the need for adaptability in AI solutions and mentions VMware’s focus on adding value to the ecosystem through partnerships. He briefly touches on technical implementation, including leveraging virtualization support for GPU resources and partnering with companies like IBM Watson for model serving and management. He concludes by providing resources for further information on VMware’s AI initiatives.


Dell Technologies APEX Cloud Platform Cluster Expansion

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

Michael Wells, a Tech Marketing Engineer at Dell Technologies, presents a demonstration on scalability and cluster expansion using the APEX Cloud Platform, specifically focusing on adding worker nodes to an OpenShift cluster. The process involves searching for new nodes, running compatibility checks to ensure they match the existing cluster, and then configuring settings such as the node name, IP address, TPM passphrase, location information, NIC settings, and network settings. The system pre-populates certain values like VLAN IDs based on the existing setup and then validates the configuration before adding the node to the cluster.

He highlights how the APEX Cloud Platform integrates infrastructure management directly into the cloud OS experience, offering a unique solution for different cloud operating models. He also discusses the advantages of installing Red Hat OpenShift on bare metal, which includes better performance due to the absence of a hypervisor, reduced licensing requirements, and a smaller attack surface. Additionally, he explains the benefits of lifecycle management of both OpenShift and hardware together, simplifying the deployment process and providing developers with more direct access to hardware resources.

Wells also touches on the topic of OpenShift virtualization, explaining that running virtual machines inside of OpenShift as pods allows for pod-to-pod networking and avoids the need for routing traffic through an ingress controller. This setup can be more efficient for workloads that need to communicate with other OpenShift services.


Dell Technologies APEX Cloud Platform Lifecycle Management

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

Michael Wells, a Tech Marketing Engineer for the APEX Cloud Platform at Dell Technologies, demonstrates the lifecycle management process for updating Red Hat OpenShift and Azure clusters on the platform. The process involves:

  1. Configuring support portal access with a username and password to check for online updates from the Dell support site.
  2. Using a local update process when no online updates are available by uploading and decompressing an update bundle.
  3. Running pre-checks to ensure the cluster is healthy and in a suitable state for updating.
  4. Reviewing the update details, including versions of software to be updated.
  5. Executing the update, which includes hardware (BIOS, firmware, drivers), OpenShift software, core OS, CSI, and APEX Cloud Platform Foundation software, all in a single workflow to optimize efficiency and minimize reboots.
  6. Applying updates to Azure clusters in a similar fashion, including compliance checks and cluster health pre-checks.
  7. Temporarily disabling lockdown mode on servers during the update process and re-enabling it afterward.
  8. Performing a rolling update across nodes, with each node being updated one at a time in a non-disruptive manner.

The update process is designed to be efficient, reducing downtime by controlling the sequence of updates and using parallel staging where possible. The system provides detailed progress information and time estimates throughout the process.


Dell Technologies APEX Cloud Platform Management Experience

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

In this presentation, Michael Wells, Tech Marketing Engineer at Dell Technologies, discusses the management experience of the APEX Cloud Platform. He highlights the platform’s ability to provide a consistent hybrid management experience across different environments without requiring users to leave their usual management interfaces.

Wells demonstrates the integration of Dell APEX Cloud Platform within the OpenShift web console, showing how users can view node information, cluster status, CPU and memory usage, and manage hardware components directly from the console. He mentions that the platform is set to support hosted control planes (formerly HyperShift) and discusses the ability to expand or remove worker nodes within the cluster.

He also covers the platform’s update mechanism, security features (including certificate management), and support capabilities, such as dial-home alerts and integration with Cloud IQ for hardware-related issues. Additionally, Wells touches on how hardware alerts are integrated into OpenShift alerting, allowing users to leverage existing monitoring and notification setups.

Wells then shifts to discussing the Azure side of things, showing similar capabilities within the Windows Admin Center for Azure Stack HCI, including physical views of nodes, detailed component information, and compliance checks.

Finally, he emphasizes the consistency of the Dell APEX Cloud Platform across different cloud operating systems and how it integrates infrastructure management with cluster management tools used by administrators. He notes the upcoming VMware integration and the ability to lock infrastructure settings for security.


Dell Technologies APEX Cloud Platform Hardware Configurations

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

Michael Wells, Tech Marketing Engineer for Dell Technologies, discusses the hardware configurations for the APEX Cloud Platform.

  • The APEX Cloud Platform uses specialized configurations of PowerEdge servers called MC nodes, specifically the MC660 (1U, 10-drive) and MC760 (2U, 24-drive).
  • The nodes support 4th Generation Intel Xeon Scalable processors with 2 to 4 terabytes of memory per node, which is currently limited by supply chain issues rather than technical constraints.
  • There are options for NVMe and SSD storage configurations, as well as Nvidia GPU support, with the 1U supporting single-width cards and the 2U supporting both single-width and double-width cards.
  • Michael mentions a white paper released in November of the previous year about implementing OpenShift AI and a generative AI solution on the APEX Cloud Platform, using Llama 2 and RAG to build a chatbot trained against Dell’s technical documentation.

Michael explains that the MC nodes have a subset of components that are continuously validated to ensure support and control over the configurations. This approach excludes the possibility of using existing servers customers may already have, as the solution requires common building blocks for simplicity and manageability.

There’s also a mention of the possibility of connecting to PowerFlex storage, which supports various operating systems and allows for the connection of bare metal, hypervisors, and other systems. This could be a way for customers to use existing hardware and gradually transition to the APEX Cloud Platform.