This video is part of the appearance, “VMware by Broadcom Presents Private AI with Intel at AI Field Day 4”. It was recorded from 9:45 to 10:45 on February 22, 2024, as part of AI Field Day 4.
Watch on YouTube
Watch on Vimeo
Looking to deploy AI models using your existing data center investments? VMware and Intel have partnered to deliver VMware Private AI with Intel, which helps enterprises build and deploy private, secure AI models running on VMware Cloud Foundation. It boosts AI performance by harnessing Intel’s AI software suite and 4th Generation Intel® Xeon® Scalable Processors with built-in accelerators. In this session, we explain the technology behind Intel’s AMX-enabled CPUs and demonstrate LLMs running on them.
Earl Ruby, R&D Engineer at VMware by Broadcom, discusses leveraging AI without the need for GPUs, focusing on using CPUs for AI workloads. He talks about VMware’s collaboration with Intel on VMware Private AI with Intel, which enables enterprises to build and deploy private AI models on-premises using VMware Cloud Foundation and Intel’s AI software suite along with the 4th Generation Intel Xeon Scalable Processors with built-in accelerators.
Ruby highlights the benefits of Private AI, including data privacy, intellectual property protection, and the use of established security tools in a vSphere environment. He explains the technology behind Intel’s Advanced Matrix Extensions (AMX) and how they can accelerate AI/ML workloads without separate GPU accelerators. AMX units are built into each core of Intel’s Sapphire Rapids and Emerald Rapids Xeon processors, allowing AI and non-AI workloads to run side by side in a virtualized environment. Ruby demonstrates the performance of Large Language Models (LLMs) running on AMX-enabled CPUs compared to older CPUs without AMX, showing a significant improvement in speed and efficiency.
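Although not part of the session itself, a quick way to confirm that a Linux VM actually sees the AMX instruction set is to look for the `amx_*` feature flags in `/proc/cpuinfo`. A minimal sketch (the helper name `amx_flags` is our own, for illustration):

```python
def amx_flags(cpuinfo_text: str) -> set:
    """Return the AMX-related CPU feature flags found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # AMX exposes amx_tile (tile registers), amx_bf16, and amx_int8
            return {f for f in line.split() if f.startswith("amx")}
    return set()

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = amx_flags(f.read())
        print("AMX flags:", sorted(flags) if flags else "none (AMX not available)")
    except FileNotFoundError:
        print("/proc/cpuinfo not found (non-Linux system)")
```

If `amx_tile`, `amx_bf16`, and `amx_int8` appear, the guest can use AMX-accelerated inference paths such as those in Intel’s AI software suite.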
He also discusses the operational considerations when choosing between CPUs and GPUs for AI workloads, emphasizing that CPUs should be used when their performance is sufficient and cost or power consumption is a concern, while GPUs should be used for high-performance needs, especially when low latency or frequent fine-tuning of large models is required.
Personnel: Earl Ruby