This video is part of the appearance, “Intel Presents at AI Field Day 4”. It was recorded as part of AI Field Day 4 at 8:00-8:30 on February 22, 2024.
Watch on YouTube
Watch on Vimeo
There’s a major AI hype cycle today, but what do businesses actually need? Today’s enterprises typically benefit from AI as a general-purpose, mixed workload rather than a purely dedicated one. Intel AI Product Director Ro Shah contextualizes the time and place for inferencing, nimble vs. giant AI models, and hardware and software options, all with TCO in mind. He grounds this in reality with customer and partner examples to avoid the FOMO.
Ro Shah, AI Product Director at Intel, discusses deploying AI, particularly inferencing, on Intel Xeon CPUs. He explains that while deep learning training often requires accelerators, deployment can be handled effectively by a mix of CPUs and accelerators. Shah emphasizes that CPUs are a good fit for mixed general-purpose and AI workloads, offering ease of deployment and total cost of ownership (TCO) benefits.
Shah describes a customer usage model in which AI deployment bifurcates into large-scale dedicated AI cycles, which may require accelerators, and mixed workloads combining general-purpose and AI cycles, where CPUs are advantageous. He offers a model-size threshold: CPUs for models with fewer than roughly 20 billion parameters, accelerators for anything larger. Using customer examples, Shah illustrates the advantages of deploying AI on CPUs for mixed workloads, such as video conferencing with added AI features like real-time transcription and speech translation. He also touches on the capabilities of Intel CPUs in client-side applications and the potential for on-premises deployment for enterprise customers.
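To make that decision rule concrete, here is a minimal sketch of the sizing heuristic in Python; the function name, threshold constant, and boolean flag are illustrative assumptions for this write-up, not an Intel API:

```python
# Illustrative only: a toy encoding of the sizing heuristic Shah describes.
# The ~20B-parameter threshold comes from the talk; everything else is assumed.
PARAM_THRESHOLD = 20e9

def suggest_target(num_params: float, dedicated_ai_cycles: bool) -> str:
    """Suggest a deployment target for an inference workload."""
    if dedicated_ai_cycles or num_params > PARAM_THRESHOLD:
        return "accelerator"  # large-scale or dedicated AI cycles
    return "cpu"  # mixed general-purpose + AI work under ~20B parameters

print(suggest_target(7e9, dedicated_ai_cycles=False))   # -> cpu
print(suggest_target(70e9, dedicated_ai_cycles=False))  # -> accelerator
```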
Shah moves on to discuss generative AI and the use of large language models, noting that CPUs can meet latency requirements for models up to about 20 billion parameters. He shows performance data for specific models, highlighting next-token latency as the key metric for deciding whether a CPU or an accelerator is appropriate for a given task.
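Next-token latency is straightforward to measure yourself. The sketch below assumes the Hugging Face transformers library and uses “gpt2” purely as a stand-in checkpoint, not a model Shah benchmarked; it times each decode step of greedy generation on CPU:

```python
# Hedged sketch: measuring prefill and next-token latency for a causal LM on CPU.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tokenizer("The key question for CPU inference is", return_tensors="pt")
ids = inputs["input_ids"]

latencies = []
with torch.no_grad():
    past = None
    for _ in range(32):  # generate 32 tokens greedily, timing each step
        start = time.perf_counter()
        out = model(input_ids=ids, past_key_values=past, use_cache=True)
        past = out.past_key_values
        ids = out.logits[:, -1:].argmax(dim=-1)  # next token becomes next input
        latencies.append(time.perf_counter() - start)

# The first step processes the whole prompt (prefill); the rest approximate
# steady-state next-token latency.
print(f"prefill: {latencies[0] * 1e3:.1f} ms, "
      f"avg next-token: {sum(latencies[1:]) / len(latencies[1:]) * 1e3:.1f} ms")
```

If the average next-token latency stays within the application’s response budget, the model is a CPU candidate by Shah’s rule of thumb; if not, an accelerator is indicated.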
Regarding software, Shah stresses the importance of upstreaming optimizations to standard tools like PyTorch and TensorFlow, and mentions Intel-specific tools like OpenVINO and Intel Neural Compressor for performance improvements. He also covers the ease of transitioning between Xeon generations and how Intel’s broad ecosystem presence allows for AI deployment everywhere.
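As a concrete, hedged example of that software stack, the sketch below applies Intel Extension for PyTorch (ipex) to an off-the-shelf torchvision model; the model choice and the bfloat16 setting are assumptions for illustration, not details from the talk:

```python
# Hedged sketch: applying Intel's CPU optimizations to an existing PyTorch
# model via intel_extension_for_pytorch (ipex). Model and shapes are placeholders.
import torch
import intel_extension_for_pytorch as ipex
import torchvision.models as models

model = models.resnet50(weights=None).eval()
# ipex.optimize applies operator fusion and memory-format/dtype optimizations;
# bfloat16 benefits from AMX/AVX-512 support on recent Xeon generations.
model = ipex.optimize(model, dtype=torch.bfloat16)

x = torch.randn(1, 3, 224, 224)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    y = model(x)
print(y.shape)
```

OpenVINO and Intel Neural Compressor offer similar deployment-focused paths, such as converting a trained model for optimized CPU inference or applying post-training quantization.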
Personnel: Ronak Shah