Watch on YouTube
Watch on Vimeo
Sean Derrington from Google Cloud’s storage group presented on the company’s efforts to optimize AI and workload infrastructure, focusing on the needs of large-scale customers. Google Cloud has been working on a comprehensive system, referred to as the AI hypercomputer, which integrates hardware and software to help customers efficiently manage their AI tasks. The hardware layer includes a broad portfolio of accelerators like GPUs and TPUs, tailored for different workloads. The network capabilities of Google Cloud ensure predictable and consistent performance globally. Additionally, Google Cloud offers various framework packages and managed services like Vertex AI, which supports different AI activities, from building and training models to serving them.
Derrington highlighted the recent release of Parallel Store, Google Cloud’s first managed parallel file system, and Hyperdisk ML, a read-only block storage service. These new storage solutions are designed to handle the specific demands of AI workloads, such as training, checkpointing, and serving. Parallel Store, for instance, is built on local SSDs and is suitable for scratch storage, while Hyperdisk ML allows multiple hosts to access the same data, making it ideal for AI applications. The presentation also touched on the importance of selecting the right storage solution based on the size and nature of the training data set, checkpointing needs, and serving requirements. Google Cloud’s open ecosystem, including partnerships with companies like SciCom, offers additional storage options like GPFS-based solutions.
The presentation emphasized the need for customers to carefully consider their storage requirements, especially as they scale their AI operations. Different storage solutions are suitable for different scales of operations, from small-scale jobs requiring low latency to large-scale, high-throughput needs. Google Cloud aims to provide consistent and flexible storage solutions that can seamlessly transition from on-premises to cloud environments. The goal is to simplify the decision-making process for customers and ensure they have access to the necessary resources, such as H100s, which might not be available on-premises. The session concluded with a promise to delve deeper into the specifics of Parallel Store and other storage solutions, highlighting their unique capabilities and use cases.
Personnel: Sean Derrington
Thank you for being part of the Tech Field Day community! Our mailing list is a great way to stay up to date on our events and technical content, and we appreciate your signup.
We promise that we’ll never spam you, send ads, or sell your information. This list will only be used to communicate with our community about our events and content. And we’ll limit it to no more than one message per week.
Although we only need your email address, it would be nice if you provided a little more information to help us get to know you better!