This video is part of the appearance, “Google Cloud Presents at AI Infrastructure Field Day 2 – Afternoon”. It was recorded as part of AI Infrastructure Field Day 2 from 13:00 to 16:30 on April 22, 2025.
Vivek Saraswat, Group Product Manager at Google Cloud Storage, presented on analytics storage and AI, focusing on data preparation and data lakes. He emphasized the close ties between analytics and AI workloads and highlighted key innovations built to address related challenges. The presentation showed that analytics plays a crucial role in the AI data pipeline, particularly in ingestion, data preparation, and cleaning.
Saraswat explained how customers increasingly build unified data lakehouses using open metadata table formats such as Apache Iceberg. This approach enables both analytics and AI workloads, including running analytics on AI data. He cited Snap as a customer example: it processes trillions of user events weekly, using Spark for data preparation and cleaning on top of Google Cloud Storage. Google Cloud Storage offers optimizations such as the Cloud Storage Connector, Anywhere Cache, and Hierarchical Namespace (HNS) to enhance data preparation.
Saraswat covered the concept of a data lakehouse, which combines structured and unstructured data in a unified platform with a separation layer built on open table formats. Examples from Snowflake, Databricks, Uber, and Google Cloud’s BigQuery tables for Apache Iceberg illustrated the diverse architectures employed. Saraswat also addressed common customer challenges such as data fragmentation, performance bottlenecks, and optimizing for resilience, security, and cost. He offered solutions including Storage Intelligence, Anywhere Cache, and Bucket Relocate, and referenced customer case studies such as Spotify and Two Sigma.
Personnel: Vivek Saraswat