HPE Aruba Networking Comprehensive Security for All Network Devices

Event: Networking Field Day Exclusive at HPE Discover

Appearance: HPE Aruba Networking Presents at HPE Discover 2024

Company: HPE Aruba Networking

Video Links:

Personnel: Ram Krishnan, Scott Koster, Yash Nagaraju

Explore how HPE Aruba Networking delivers robust defense mechanisms against cyber threats across the entire network for unmanaged and managed devices, including IoT. Discover how HPE Aruba Networking CX switching solutions include built-in Zero Trust capabilities with AI-powered visibility, application-based policy, and micro-segmentation. Experience seamless protection enabled by SD-WAN augmented with SWG (Secure Web Gateway), shielding all network devices and accelerating SASE adoption.


Oxide Computer Company Console Demo and VM Provisioning

Event: Cloud Field Day 20

Appearance: Oxide Computer Company Presents at Cloud Field Day 20

Company: Oxide Computer Company

Video Links:

Personnel: Travis Haymore

Travis demonstrates the Oxide console and deploys multiple virtual machines using Terraform. He also covers silos, images, projects, and instances.

Travis Haymore from Oxide Computer Company presented a detailed demonstration of their console and virtual machine (VM) provisioning capabilities during Cloud Field Day 20. He began by providing an overview of their co-location environment, specifically highlighting a rack deployed in the CoreSite data center in Milpitas. This rack serves multiple purposes, including internal use, customer demos, and evaluations. Haymore introduced the concept of “silos” as a means of achieving multi-tenancy, where resources such as CPU, memory, and storage are logically and cryptographically separated. Each silo has a distinct API endpoint, ensuring a clear demarcation between different users or customers.
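The silo-to-project hierarchy described above can be modeled schematically. The sketch below is illustrative only, not Oxide's actual API; in particular, the per-silo endpoint naming scheme is a hypothetical placeholder, included just to show that each silo presents its own API surface.

```python
from dataclasses import dataclass, field

@dataclass
class Project:
    """A logical container for instances, disks, snapshots, and images within a silo."""
    name: str
    instances: list = field(default_factory=list)

@dataclass
class Silo:
    """A multi-tenancy boundary: resources in one silo are invisible to another."""
    name: str
    projects: dict = field(default_factory=dict)

    @property
    def api_endpoint(self) -> str:
        # Hypothetical naming scheme -- the point is that each silo
        # exposes a distinct API endpoint.
        return f"https://{self.name}.example-rack.oxide.internal"

    def create_project(self, name: str) -> Project:
        project = Project(name)
        self.projects[name] = project
        return project

# Two tenants sharing one rack, logically separated:
demo = Silo("customer-demo")
internal = Silo("internal")
demo.create_project("eval-workloads")

print(demo.api_endpoint)        # distinct endpoint per silo
print(list(internal.projects))  # [] -- one silo sees none of the other's projects
```

In the real product the separation is enforced cryptographically and at the API layer, not merely by object identity as in this toy model.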

In his demo, Haymore showcased the process of creating and managing projects within a silo. Projects act as logical containers for different efforts within an organization, allowing for varied configurations and user privileges. He demonstrated how to create a new project, manage instances, disks, snapshots, and images, and configure virtual private clouds (VPCs) and floating IPs. Using the UI, he provisioned a VM by selecting the necessary resources and an operating system image, in this case, Ubuntu. He emphasized the ease of passing SSH keys via Cloud-Init for immediate access to the instances, which is particularly useful for teams needing quick and secure access.
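Passing SSH keys at provision time typically rides on cloud-init user data. A minimal sketch of such a payload follows; `ssh_authorized_keys` is a standard cloud-config key, but the key string is a placeholder and exactly how the payload is attached to an instance depends on the provisioning tool.

```python
# Compose a minimal cloud-init "cloud-config" payload that injects an SSH
# public key, so the instance is reachable immediately after first boot.
def make_user_data(public_key: str) -> str:
    return (
        "#cloud-config\n"
        "ssh_authorized_keys:\n"
        f"  - {public_key}\n"
    )

# Placeholder key for illustration:
user_data = make_user_data("ssh-ed25519 AAAA...example user@laptop")
print(user_data)
```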

Haymore also touched on more advanced configurations, such as attaching external storage and managing network interfaces. He highlighted the flexibility of the system, noting that while many customers prefer using the API, CLI, or Terraform provider for automation, the UI remains a useful tool for less sophisticated users. The demo concluded with an example of using Terraform to automate the provisioning of multiple instances, demonstrating the system’s responsiveness and ease of use. Haymore also discussed the future potential for expanding storage and memory within the rack, and the possibility of integrating different hardware profiles to meet specific needs, emphasizing Oxide’s commitment to flexibility and scalability in their cloud computing solutions.


Oxide Cloud Computer Customer Use Cases

Event: Cloud Field Day 20

Appearance: Oxide Computer Company Presents at Cloud Field Day 20

Company: Oxide Computer Company

Video Links:

Personnel: Steve Tuck

Steve Tuck discusses some of Oxide’s key customer segments, across federal, financial, and cloud SaaS players.

Steve Tuck, co-founder and CEO of Oxide Computer Company, presented at Cloud Field Day 20, focusing on the diverse customer segments that are leveraging Oxide’s solutions. These segments include federal agencies, financial services, and cloud SaaS companies, each with unique needs and challenges. For instance, federal agencies are undergoing initiatives like the Digital First Public Experience, which aims to digitize a significant portion of government documents. These agencies often deal with on-premises IT infrastructures, and the introduction of Oxide’s cloud computing appliances has allowed them to streamline operations significantly. For example, tasks that once took months can now be accomplished in hours, leading to increased efficiency and effectiveness in handling projects.

In the federal space, Oxide’s solutions have been particularly transformative. One federal agency shared an anecdote about how their highly skilled security engineers were spending 60% of their time on tasks like VMware license management and BIOS updates instead of focusing on critical security projects. By deploying Oxide’s appliances, these engineers could redirect their efforts towards more impactful work, thereby increasing the agency’s overall productivity. The deployment process itself has been straightforward, with Oxide racks integrating seamlessly into existing environments and providing enhanced telemetry and visibility. This has been especially beneficial for networking teams, who can now quickly identify and resolve issues, leading to a smoother and more efficient operation.

Cloud SaaS companies and financial services firms are also finding significant value in Oxide’s offerings. For cloud SaaS companies, the ability to extend their platforms beyond the public cloud to reach more data and provide better performance has been a game-changer. For example, an e-commerce company using Oxide’s solutions can reduce latency and improve customer experience by deploying their platform closer to customer assets. Similarly, financial services firms are moving away from DIY infrastructure to focus on building financial products, leveraging Oxide’s API-driven, cloud-like operations to simplify their IT environments. Overall, Oxide’s solutions are enabling these organizations to achieve greater efficiency, transparency, and performance, making it an attractive option across various industries.


What Does Oxide Value Mean to Oxide Computer Company?

Event: Cloud Field Day 20

Appearance: Oxide Computer Company Presents at Cloud Field Day 20

Company: Oxide Computer Company

Video Links:

Personnel: Bryan Cantrill

Oxide offers the values that hyperscaler infrastructure offers, but on prem. This is in sharp contrast to existing on-prem solutions, which are commodity infrastructure cobbled together from disparate vendors – a result of market capture, not innovation. By co-designing like the hyperscalers, Oxide offers a rack-scale system it takes complete responsibility for – a system that is robustly supportable, debuggable, and open source.

Oxide Computer Company, co-founded by Bryan Cantrill, aims to deliver the robust infrastructure benefits of hyperscalers but for on-premises environments. Unlike traditional on-prem solutions that rely on pieced-together components from various vendors, Oxide’s approach involves a holistic co-design of hardware and software, similar to hyperscalers like Google and Amazon. This co-design allows Oxide to offer a rack-scale system that is highly supportable, debuggable, and built on open-source principles. Cantrill emphasizes that this integrated approach solves many of the inefficiencies and complexities that have historically plagued on-prem systems, which often require significant manual intervention and suffer from poor inter-vendor integration.

Reflecting on the evolution of computing infrastructure, Cantrill notes that the initial on-prem era required extensive manual setup and maintenance. The rise of cloud computing in the mid-2000s, facilitated by advancements in internet ubiquity, open-source software, and commodity hardware, allowed hyperscalers to innovate rapidly. However, this led to a divide where only hyperscalers could leverage modern, efficient infrastructure, leaving traditional on-prem systems outdated and inefficient. Oxide aims to bridge this gap by offering on-prem solutions that bring the same level of efficiency and innovation found in hyperscaler environments, but with the added benefits of ownership, compliance, and localized control.

Oxide’s unique value proposition lies in its ability to offer a fully integrated system where both hardware and software are designed to work seamlessly together. This integration not only enhances performance and reliability but also simplifies support and troubleshooting. Cantrill highlights that Oxide’s commitment to transparency, open-source development, and deep partnerships with suppliers enables them to solve complex supply chain issues and deliver a resilient product. By focusing on engineering rigor and first principles, Oxide ensures that their systems are not only high-performing but also highly supportable, providing a delightful customer experience. This approach allows Oxide to meet the needs of enterprises that require the flexibility and control of on-prem solutions without sacrificing the innovations and efficiencies of modern cloud infrastructure.


What is the Oxide Cloud Computer?

Event: Cloud Field Day 20

Appearance: Oxide Computer Company Presents at Cloud Field Day 20

Company: Oxide Computer Company

Video Links:

Personnel: Steve Tuck

The Oxide Cloud Computer is a vertically integrated platform built at rack-scale for efficiency and operational benefits, with a control plane for elasticity and multi-tenancy, and with networking built in.

The Oxide Cloud Computer is a vertically integrated platform designed to bring the efficiencies and operational benefits of hyperscale cloud infrastructure to on-premises environments. Unlike traditional enterprise IT setups, which require piecing together hardware from various vendors and dealing with integration and maintenance complexities, Oxide offers a rack-scale solution that integrates hardware, software, and networking from the ground up. This approach is inspired by the design principles of cloud hyperscalers, who have moved away from the “kit car” approach of assembling individual servers and instead build at the rack level for better density, energy efficiency, and operational simplicity.

A key feature of the Oxide Cloud Computer is its focus on an API-first mentality, which simplifies automation and management tasks. By building their own switch and incorporating a fully programmable network stack using P4, Oxide ensures that the entire system is optimized for performance and observability. This allows for advanced features like delay-driven multipath routing, which constantly optimizes packet paths based on real-time latency data. Furthermore, the system’s hardware and software are co-designed to work seamlessly together, enabling capabilities such as dynamic power orchestration and power capping, which are crucial for managing energy efficiency and addressing power constraints in data centers.
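The delay-driven multipath idea can be illustrated with a toy model: keep a smoothed latency estimate per path and steer new traffic toward the currently fastest one. This is a conceptual Python sketch only; Oxide's actual implementation lives in the programmable P4 switch pipeline, not in host software.

```python
# Toy sketch of delay-driven multipath routing: maintain an exponentially
# weighted moving average (EWMA) of measured delay per path, and bias new
# flows toward the path with the lowest estimate.
class DelayDrivenMultipath:
    def __init__(self, paths, alpha=0.2):
        self.alpha = alpha                       # EWMA smoothing factor
        self.latency = {p: 0.0 for p in paths}   # estimated delay per path (us)

    def record(self, path, measured_us):
        # Fold a new per-path delay measurement into the running estimate.
        old = self.latency[path]
        self.latency[path] = (1 - self.alpha) * old + self.alpha * measured_us

    def pick_path(self):
        # Steer traffic to the path with the lowest estimated delay.
        return min(self.latency, key=self.latency.get)

mp = DelayDrivenMultipath(["path-a", "path-b"])
mp.record("path-a", 120.0)
mp.record("path-b", 40.0)
print(mp.pick_path())  # path-b, the lower-latency path
```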

The Oxide Cloud Computer also emphasizes ease of use and scalability. The system features blind-mate server sleds that can be easily added or replaced without the need for complex cabling, reducing the time and effort required to scale up capacity. This design allows for quick deployment and minimizes downtime, enabling developers to be productive within hours of installation. Additionally, the platform includes an elastic storage service and a set of networking and security services akin to those found in public clouds, providing a familiar and powerful environment for developers. By offering a CapEx model with a support subscription for updates and maintenance, Oxide aims to deliver a cost-effective and long-lasting solution for on-premises cloud computing.


Who is Oxide Computer Company?

Event: Cloud Field Day 20

Appearance: Oxide Computer Company Presents at Cloud Field Day 20

Company: Oxide Computer Company

Video Links:

Personnel: Steve Tuck

The Oxide founders experienced the challenges of managing on prem cloud infrastructure firsthand – their pains alchemized into Oxide. The team consists of leading technologists specialized across the stack (from companies like Sun, Meta, Amazon, Google, etc.).

Oxide Computer Company was founded in 2019 by Steve Tuck and Bryan Cantrill, who previously worked together at Joyent, a cloud computing company. Their mission is to bring the benefits of cloud computing architecture, such as elastic infrastructure services and API-driven automation, to on-premises environments. They believe that cloud computing should not be limited to public clouds but should be ubiquitous, available to enterprises that still maintain significant IT infrastructure outside the public cloud. The founders’ experience at companies like Dell and Sun Microsystems informed their understanding of the challenges and inefficiencies in the current market, leading them to create a solution that integrates hardware and software into a unified product.

The team at Oxide is composed of top-tier technologists from various leading companies such as Sun, Meta, Amazon, and Google, covering the entire stack from hardware to software. This diverse expertise allows Oxide to take a holistic approach to designing their cloud computer, ensuring that every layer is optimized and integrated seamlessly. The company has embraced a remote-first model, which has enabled them to attract talent from around the world, including experts from the GE medical team and other specialized fields. This diverse and highly skilled team is the nucleus of Oxide, driving their ambitious goal of revolutionizing on-premises cloud computing.

Oxide’s focus is on the large enterprise and cloud SaaS markets, with significant interest from the federal sector and financial services. These sectors require the automation and developer tooling provided by cloud-native architectures but need to maintain control over their data and applications due to security, latency, and regulatory requirements. Oxide’s transparent approach, including sharing their software on GitHub and discussing their journey on the “Oxide and Friends” podcast, reflects their commitment to openness and innovation. Their partnerships with key suppliers, like Sanyo Denki, have enabled them to co-design high-efficiency systems, further enhancing their products’ performance and reliability.


Cloud Field Day at Google Cloud Wrap-Up

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Bobby Allen

Wrapping up the Cloud Field Day 20 presentation by Google Cloud, Bobby Allen emphasized the importance of innovation in the realm of cloud computing. He pointed out that innovation happens at the intersection of where the model lives and where the application runs, highlighting the significance of both components. Allen referenced various Google Cloud tools presented during this day-long session such as Vertex, GKE, and Cloud Run, and demonstrated through examples how these tools can be leveraged to build generative AI applications. He stressed the flexibility and adaptability of these tools depending on the specific needs and stages of a project, suggesting that different iterations and combinations can be used simultaneously to optimize outcomes.

Allen also discussed the broader implications of leveraging AI within different layers of the platform and the software development lifecycle (SDLC). He reinforced the idea that AI itself is not the end goal but a means to enhance and transform existing processes and applications. By showcasing tools like Cloud Assist and Code Assist along with AI-centric platforms, he illustrated how these technologies can be integrated to provide substantial improvements. The emphasis was on Google Cloud’s role as a transformation partner rather than just a technology provider, offering numerous options that avoid technical debt and allow for flexible decision-making.


Google Kubernetes Engine – The Container Platform for AI at Scale from Google Cloud

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Brandon Royal

Brandon Royal, a Product Manager at Google Cloud, describes how Kubernetes can be leveraged for AI applications, particularly focusing on model training and serving. He begins by emphasizing the growing importance of generative AI across many organizations, highlighting that Google Kubernetes Engine (GKE) provides a robust platform for integrating AI into products and services. The platform is designed to handle the increasing complexity and scale of AI models, which demand high efficiency and cost-effectiveness. Royal mentions that GKE, often referred to as the operating system of Google’s AI hypercomputer, orchestrates workloads across storage, compute, and networking to deliver optimal price performance.

Royal addresses the challenges of scaling AI workloads, noting that model sizes are growing and pushing the limits of infrastructure. To tackle these challenges, GKE offers several optimizations, such as dynamic workload scheduling and container preloading, which enhance the efficiency and utilization of AI resources like CPUs, GPUs, and TPUs. He introduces the concept of “goodput,” a metric for measuring machine learning productivity, which includes scheduling goodput, runtime goodput, and program goodput. These metrics help ensure that resources are utilized effectively, minimizing idle time and maximizing forward progress in model training. Royal also highlights the importance of leveraging open-source frameworks like Ray and Kubeflow, which integrate seamlessly with GKE to provide a comprehensive AI development and deployment environment.
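As a rough illustration of how the component metrics compose: in Google's framing, overall goodput is the fraction of wall-clock time spent making useful forward progress, and it is the product of the component goodputs. The numbers below are invented for illustration.

```python
# Illustrative goodput calculation. Overall ML goodput decomposes
# multiplicatively into its components; the figures here are made up.
scheduling_goodput = 0.95  # fraction of time resources were actually scheduled
runtime_goodput    = 0.90  # fraction surviving failures and restarts
program_goodput    = 0.85  # fraction of scheduled time doing useful compute

overall = scheduling_goodput * runtime_goodput * program_goodput
print(f"overall goodput: {overall:.3f}")  # ~0.727
```

The multiplicative form makes the point of the metric clear: a loss in any one stage (scheduling delays, job restarts, or idle accelerators) drags down the whole.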

The presentation includes a demo showcasing the optimization capabilities of GKE. Royal demonstrates how container preloading and persistent volume claims can significantly reduce the time required to deploy AI models. By preloading container images and sharing model weights across instances, GKE can cut down deployment times from several minutes to mere seconds. This optimization is crucial for large-scale AI deployments, where efficiency and speed are paramount. Royal concludes by encouraging the audience to explore the resources and tutorials available for building AI platforms on GKE, emphasizing that these optimizations can provide a competitive edge in the fast-evolving field of AI.


Google Cloud Run and GenAI Apps

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Lisa Shen

In this presentation, Lisa Shen, a product manager at Google Cloud, introduces Cloud Run, Google Cloud’s serverless runtime platform, and discusses its integration with generative AI (GenAI) applications. Cloud Run simplifies the deployment and scaling of modern workloads by removing the overhead of infrastructure management. Built on container technology, it offers flexibility, portability, and cost-saving benefits, as users only pay when their code is running. Shen highlights two primary resources within Cloud Run: services for HTTP endpoints and jobs for executing tasks to completion, making it suitable for various use cases, including web applications and batch data processing.
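Cloud Run's documented container contract is simply that the service listens on the port given in the `PORT` environment variable (which Cloud Run defaults to 8080). A minimal stdlib-only sketch of a service that satisfies that contract, with an illustrative handler body:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Cloud Run runs any container whose process listens on $PORT.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from Cloud Run\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep this sketch quiet; real services would log to stdout.
        pass

def serve():
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), Handler).serve_forever()

# In the container entrypoint you would call serve(); deployment is then
# e.g. `gcloud run deploy my-service --source .` (service name illustrative).
```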

Shen provides examples of companies like L’Oreal and Ford that have adopted Cloud Run to modernize their infrastructure and accelerate innovation. L’Oreal, for instance, used Cloud Run to implement a GenAI service for its employees, resulting in the rapid launch of L’Oreal GPT. Similarly, Ford transitioned to a Cloud Run-first approach to enhance scalability and reliability in vehicle design and manufacturing. These examples illustrate Cloud Run’s ability to improve developer velocity, reduce costs, and simplify application deployment, making it an attractive option for both cloud-native and traditional enterprises.

The presentation includes a demonstration of building a GenAI application using Cloud Run and Vertex AI. Shen explains how Cloud Run can handle various architectural components of GenAI applications, such as serving and orchestration, data ingestion, and quality evaluation. The demo showcases the process of deploying a web-based application that queries Cloud Run release notes, highlighting the ease of use and efficiency of Cloud Run in handling such tasks. Shen emphasizes that while Cloud Run is primarily for serving and orchestrating applications, more complex tasks like model fine-tuning and heavy lifting are better suited for Vertex AI or Google Kubernetes Engine (GKE). The session concludes with a discussion on managing costs and scaling with Cloud Run, ensuring that users can deploy applications efficiently without unexpected expenses.


Google Cloud Vertex AI Platform

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Neama Dadkhahnikoo

Google Cloud’s Vertex AI platform is built on a rich history of innovation and enterprise readiness, offering an integrated AI-optimized portfolio. The platform leverages Google’s groundbreaking technologies such as TPUs and the transformer architecture, which have been instrumental in the development of large language models (LLMs) like Gemini. Gemini stands out for its multimodal capabilities, allowing it to process and reason across text, images, audio, and video simultaneously. This multimodal approach enables advanced functionalities like identifying specific moments in a video or understanding complex prompts that combine text and images. The platform also emphasizes flexibility and choice, providing options for different model sizes and prompting windows to match various use cases and cost considerations.

The presentation highlighted the practical applications of Vertex AI through several demos. One notable example demonstrated the model’s ability to process a 44-minute video and accurately identify a specific scene based on a text prompt, showcasing its capability to handle long context understanding. Another demo illustrated the use of a multimodal prompt, where a simple doodle was used to locate a corresponding scene in the video. These examples underscore the potential of Vertex AI in real-world scenarios, such as customer service chatbots, sports highlight identification, and even complex tasks like code transformation and financial document analysis. The platform’s ability to cache context and perform batch processing further enhances its efficiency and cost-effectiveness.

Vertex AI also focuses on enterprise readiness, ensuring data security, governance, and compliance. The platform provides tools for model evaluation, monitoring, and customization, allowing enterprises to tailor models to their specific needs while protecting their data. Features like grounding APIs help ensure the accuracy of model outputs by linking responses to verified data sources, addressing concerns about AI-generated content’s reliability. Additionally, the platform supports various levels of coding expertise, from no-code to full-code, making it accessible to a wide range of users. With its comprehensive suite of tools and emphasis on security and flexibility, Vertex AI positions itself as a robust solution for enterprises looking to leverage AI for diverse applications.


Generate Storage Insights with Gemini in Google Cloud

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Manjul Sahay

In this presentation, Manjul Sahay, a Group Product Manager at Google Cloud, introduces a new feature designed to provide valuable insights into cloud storage using the Gemini Cloud Assist portfolio. He highlights the unique challenges faced by customers managing vast amounts of data across numerous projects and buckets, often handled by a small team of administrators. The traditional approach involves extensive manual effort to export metadata, build data pipelines, and develop automation, which can be time-consuming and complex. To address this, Google Cloud has developed a set of features that simplify the analysis and management of storage at scale, focusing on ease of use and operational efficiency.

Sahay explains that the new feature, introduced at Cloud Next, leverages storage insights datasets and BigQuery to provide daily snapshots of object metadata. This data is then processed using Gemini to generate actionable insights through natural language queries, eliminating the need for specialized SQL knowledge or complex data pipelines. The feature allows users to type in questions in plain language and receive accurate, verified answers, making it accessible to both administrators and general users. Pre-curated prompts cover common queries related to usage, savings, security, and data discovery, ensuring high accuracy and reliability. The demo showcased how users can quickly identify storage distribution across regions, check for public access vulnerabilities, and manage cost by locating and addressing orphaned or unnecessary data.

The presentation also addresses the potential for future enhancements, such as integrating more advanced security and access control features. While the current focus is on reading and understanding metadata, Sahay hints at the possibility of expanding these capabilities to include more complex operations and other storage services. He emphasizes the importance of AI in accelerating analysis and providing deeper insights, while cautioning against fully automated actions without human oversight. The feature is currently in experimental preview, with plans for general availability in the coming months, promising to significantly improve storage management for Google Cloud customers by reducing complexity and enhancing operational efficiency.


Gemini Code Assist in Google Cloud

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Rakesh Dhoopar

Rakesh Dhoopar, Director of Product Management at Google, presented Gemini Code Assist at Cloud Field Day 20, focusing on enhancing developer productivity and addressing common challenges in the coding world. He discussed how onboarding new developers can be slow due to the time required to get them up to speed on a project, and how excessive context switching and technical debt can further hinder productivity. Dhoopar emphasized the importance of reducing repetitive tasks and providing tools that assist in writing and maintaining code efficiently.

Dhoopar highlighted several capabilities of Gemini Code Assist, such as code generation and code completion. He explained the difference between these two features: code completion helps developers by predicting and finishing code as they type, while code generation allows developers to specify what they need in natural language, and the tool generates the entire code. He also mentioned the integration of Code Assist with Snyk for real-time vulnerability scanning, ensuring that the generated code is secure and complies with enterprise standards. Additionally, Gemini Code Assist can explain code in natural language and generate test plans and unit tests, significantly easing the developer’s burden.

The presentation also covered the technical aspects of Gemini Code Assist, including its ability to handle large context windows with up to one million tokens, which can represent a substantial portion of a codebase. This capability allows the tool to provide context-aware suggestions by analyzing the entire codebase, including local files, open tabs, and remote repositories. Dhoopar explained the importance of maintaining security and privacy by using mechanisms like Developer Connect and Cloud Build to manage and convert code into embeddings stored in AlloyDB. This ensures that the actual code remains within the customer’s VPC, addressing security concerns while leveraging the power of large language models to enhance developer productivity.
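The retrieval pattern behind codebase-aware assistance can be sketched schematically: index code chunks as vectors, then surface the chunks nearest a query as context. The presentation did not detail the pipeline beyond Developer Connect, Cloud Build, and AlloyDB, so the toy below uses a bag-of-words vector as a stand-in for a real embedding model.

```python
import math
from collections import Counter

# Toy stand-in for an embedding model: a bag-of-words vector.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Code chunks" to index (illustrative):
chunks = [
    "def connect_db(host, port): open a database connection",
    "def render_template(name): render an html template",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieve the chunk most similar to a natural-language query.
query = embed("how do I open a database connection")
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])  # the database chunk is retrieved as context
```

The production version differs in every component (learned embeddings, a vector store in AlloyDB, code kept inside the customer's VPC), but the shape of the retrieval step is the same.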


Gemini Cloud Assist in Google Cloud

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Bobby Allen

Gemini Cloud Assist, a feature of Google Cloud, serves as an extensible cloud intelligence tool designed to enhance user efficiency by providing actionable insights and recommendations. Bobby Allen emphasizes that Gemini Cloud Assist integrates the intelligence from Google’s Gemini model to supercharge workloads on Google Cloud. This feature is particularly beneficial for users who are not necessarily building AI but are looking to optimize their existing cloud infrastructure. It offers insights on various aspects such as cost-saving opportunities, operational efficiencies, and application design improvements, all contextualized within the user’s specific cloud environment.

One of the primary advantages of Gemini Cloud Assist is its ability to save time and reduce technical debt. As organizations face increasing demands without corresponding increases in budget or personnel, tools like Gemini Cloud Assist become essential. The feature provides actionable insights directly within the Google Cloud console, allowing users to address inefficiencies such as underutilized resources or potential upgrade issues. For example, it can identify idle clusters that may be candidates for cost-saving measures like autopilot mode and even offer commands and best practices to implement these changes. This functionality ensures that users can maintain optimal performance and cost-efficiency without needing to be experts in every aspect of their cloud environment.

Gemini Cloud Assist also addresses the challenge of keeping up with rapidly evolving technology. As training can quickly become outdated, the feature acts as a knowledgeable assistant, providing real-time, resource-aware insights and recommendations. It helps users navigate complex cloud environments by surfacing relevant information and best practices, thereby reducing the cognitive load on IT professionals. Additionally, the tool supports user queries through a chat interface, offering contextual answers based on the user’s specific resources. This makes it easier for users to implement best practices and optimize their cloud infrastructure effectively, ensuring they stay ahead of potential issues and maintain a high level of operational efficiency.


Google Cloud Network Infrastructure for AI/ML

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Victor Moreno

Victor Moreno, a product manager at Google Cloud, presented on the network infrastructure Google Cloud has developed to support AI and machine learning (AI/ML) workloads. The exponential growth of AI/ML models necessitates moving vast amounts of data across networks, making it impossible to rely on a single TPU or host. Instead, thousands of nodes must communicate efficiently, which Google Cloud achieves through a robust software-defined network (SDN) that includes hardware acceleration. This infrastructure ensures that GPUs and TPUs can communicate at line rates, dealing with challenges like load balancing and data center topology restructuring to match traffic patterns.

Google Cloud’s AI/ML network infrastructure involves two main networks: one for GPU-to-GPU communication and another for connecting to external storage and data sources. The GPU network is designed to handle high bandwidth and low latency, essential for training large models distributed across many nodes. This network uses a combination of electrical and optical switching to create flexible topologies that can be reconfigured without physical changes. The second network connects the GPU clusters to storage, ensuring periodic snapshots of the training process are stored efficiently. This dual-network approach allows for high-performance data processing and storage communication within the same data center region.

In addition to the physical network infrastructure, Google Cloud leverages advanced load balancing techniques to optimize AI/ML workloads. By using custom metrics like queue depth, Google Cloud can significantly improve response times for AI models. This optimization is facilitated by tools such as the Open Request Cost Aggregation (ORCA) framework, which allows for more intelligent distribution of requests across model instances. These capabilities are integrated into Google Cloud’s Vertex AI service, providing users with scalable, efficient AI/ML infrastructure that can automatically adjust to workload demands, ensuring high performance and reliability.
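The custom-metric load balancing described above can be sketched in a few lines. This is an illustrative toy model, not Google's ORCA implementation or the actual Vertex AI scheduler: it simply routes each incoming request to the model replica reporting the smallest queue depth, the kind of backend-reported cost signal ORCA aggregates. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """A model-serving backend that reports a custom load metric."""
    name: str
    queue_depth: int = 0  # backend-reported metric (ORCA-style), not CPU or RPS


def pick_replica(replicas):
    """Choose the replica with the lowest reported queue depth."""
    return min(replicas, key=lambda r: r.queue_depth)


def dispatch(replicas, n_requests):
    """Route n_requests one at a time, updating the reported metric as
    each request lands on a backend's queue."""
    assignments = []
    for _ in range(n_requests):
        target = pick_replica(replicas)
        target.queue_depth += 1  # request is now queued on that backend
        assignments.append(target.name)
    return assignments
```

Compared with plain round-robin, a queue-depth signal keeps slow replicas (those with long inference queues) from receiving new work, which is why such metrics can cut tail latency for AI serving.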


AI Workloads and Hardware Accelerators – Introducing the Google Cloud AI Hypercomputer

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Ishan Sharma

Ishan Sharma, a Senior Product Manager for Google Kubernetes Engine (GKE), presented advancements for running AI workloads on Google Cloud during Cloud Field Day 20. He emphasized the rapid evolution of AI research and its practical applications across various sectors, such as content generation, pharmaceutical research, and robotics. Google Cloud’s infrastructure, including its AI Hypercomputer, is designed to support these complex AI models by providing robust and scalable solutions. Google’s extensive experience in AI, backed by over a decade of research, numerous publications, and technologies like the Transformer model and Tensor Processing Units (TPUs), positions it uniquely to meet the needs of customers looking to integrate AI into their workflows.

Sharma highlighted why customers prefer Google Cloud for AI workloads, citing the platform’s performance, flexibility, and reliability. Google Cloud offers a comprehensive portfolio of AI supercomputers that cater to different workloads, from training to serving. The infrastructure is built on a truly open and comprehensive stack, supporting both Google-developed models and those from third-party partners. Additionally, Google Cloud ensures high reliability and security, with metrics focused on actual work done rather than just capacity. The global scale of Google Cloud, with 37 regions and cutting-edge infrastructure, combined with a commitment to 100% renewable energy, makes it an attractive option for AI-driven enterprises.

The presentation also covered the specifics of Google Cloud’s AI Hypercomputer, a state-of-the-art platform designed for high performance and efficiency across the entire stack from hardware to software. This includes various AI accelerators like GPUs and TPUs, and features like the Dynamic Workload Scheduler (DWS) for optimized resource management. Sharma explained how GKE supports AI workloads with tools like Kueue for job queueing and DWS for dynamic scheduling, enabling better utilization of resources. Additionally, GKE’s flexibility allows it to handle both training and inference workloads efficiently, offering features like rapid node startup and GPU sharing to drive down costs and improve performance.
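The quota-based queueing that Kueue and DWS provide can be illustrated with a toy admission loop. This is a hedged sketch of the general idea (admit jobs while accelerator quota remains, hold the rest), not GKE or Kueue code; the job names and GPU counts are invented.

```python
from collections import deque


def admit_jobs(jobs, total_gpus):
    """Admit queued jobs in FIFO order while GPU quota remains; the rest
    wait until capacity frees up. A toy model of quota-based job queueing
    in the spirit of Kueue, not actual GKE behavior."""
    queue = deque(jobs)  # each entry: (job_name, gpus_requested)
    running, waiting = [], []
    free = total_gpus
    while queue:
        name, gpus = queue.popleft()
        if gpus <= free:
            free -= gpus          # reserve accelerators for this job
            running.append(name)
        else:
            waiting.append(name)  # stays queued; would retry when jobs finish
    return running, waiting
```

The point of queueing at this layer is utilization: rather than each team pinning idle accelerators, jobs declare their needs up front and the scheduler packs them against shared quota.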


Security in Google Cloud

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Glen Messenger

In his presentation at Cloud Field Day 20, Glen Messenger, Product Manager for Google’s GKE security team, discussed the complexities and challenges of securing Kubernetes environments. He emphasized that while Kubernetes offers significant power and flexibility, these attributes also introduce substantial complexity, making security a primary concern for users. Many Kubernetes users have experienced security incidents, either in production or during deployment, highlighting the need for robust security measures. Google’s approach to GKE security focuses on reducing risk, enhancing compliance, and improving operational efficiency. Messenger introduced the concept of Kubernetes Security Posture Management (KSPM), which is designed to automate security and compliance specifically for Kubernetes environments.

Messenger detailed several key areas of focus within KSPM, including vulnerability management, threat detection, and compliance and governance. For vulnerability management, Google has developed GKE Security Posture, a tool that performs runtime-based vulnerability detection on clusters, providing detailed insights into container OS vulnerabilities and language packages. The tool is designed to be user-friendly, allowing customers to filter vulnerabilities by severity, region, cluster, and other parameters. In terms of threat detection, Messenger highlighted the capabilities of GKE Threat Detection, which utilizes both log detection and behavior-based detection methods to identify and mitigate potential threats. This service is integrated with Google’s Security Command Center, providing a comprehensive view of threats across the entire GCP environment.

Regarding compliance and governance, Messenger explained that GKE compliance tools help customers adhere to industry standards and set governance guardrails. These tools provide dashboards that show compliance status and detailed remediation steps for identified issues. Additionally, Google’s policy controller, which utilizes OPA Gatekeeper, allows for the customization of policies to meet specific compliance requirements. Messenger concluded the presentation by addressing questions about automated remediation, the ability to filter and mute known vulnerabilities, and protections against data encryption attacks. Overall, Google’s GKE security efforts aim to simplify the management of security and compliance in Kubernetes environments, enabling customers to innovate while minimizing risk.


AI/ML Storage Workloads in Google Cloud

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Sean Derrington

Sean Derrington from Google Cloud’s storage group presents advancements in cloud storage, particularly for AI and ML workloads. Google Cloud has focused on optimizing storage solutions to support the unique requirements of AI and ML applications, such as the need for high throughput and low latency. Key innovations include the Anywhere Cache, which allows data to be cached close to GPU and TPU resources to accelerate training processes, and the parallel file system, which is based on Intel DAOS and is designed to handle ultra-low latency and high throughput. These advancements aim to provide flexible and scalable storage options that can adapt to various workloads and performance needs.

Derrington also highlights the introduction of Hyperdisk ML, a block storage offering that enables data volumes to be attached read-only across thousands of hosts, further speeding up data loading for training. Furthermore, Google Cloud has introduced Cloud Storage FUSE with caching, which allows customers to mount a bucket as if it were a file system, reducing storage costs and improving training efficiency by eliminating the need for multiple data copies. These solutions are designed to decrease the time required for training epochs, thereby enhancing the overall efficiency of AI and ML workloads.
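The Cloud Storage FUSE idea is that once a bucket is mounted (for example with `gcsfuse my-bucket /mnt/training-data` — the bucket name and mount path here are hypothetical), training code reads objects with ordinary file I/O instead of SDK download calls. A minimal sketch of what the consuming side looks like, assuming such a mount exists:

```python
import os


def iter_samples(mount_point, suffix=".txt"):
    """Yield (filename, contents) for each matching object under the mount.
    From the program's point of view this is plain file I/O; with a FUSE
    mount, the reads are translated into Cloud Storage requests behind
    the scenes."""
    for name in sorted(os.listdir(mount_point)):
        if name.endswith(suffix):
            path = os.path.join(mount_point, name)
            with open(path) as f:  # no SDK client, no explicit download step
                yield name, f.read()
```

Because the code only sees a directory path, the same loader works unchanged against a local dataset copy or a mounted bucket, which is what removes the need for staging multiple data copies.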

In addition to AI and ML optimizations, Google Cloud has focused on providing robust storage solutions for other workloads, such as GKE and enterprise applications. Filestore offers various instance types—Basic, Zonal, and Regional—each catering to different performance, capacity, and availability needs. Filestore Multi-Share allows for the provisioning of small persistent volumes, scaling automatically as needed. HyperDisk also introduces storage pools, enabling the pooling of IOPS and capacity across multiple volumes, thus optimizing resource usage and cost. These storage solutions are designed to support both stateless and stateful workloads, ensuring high availability and seamless failover capabilities.


Running Modern Workloads in Google Cloud

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: William Denniss

William Denniss, product manager at Google, introduced GKE Autopilot during his presentation at Cloud Field Day 20. He explained that GKE Autopilot is a simplified way of using Google Kubernetes Engine (GKE) that makes Kubernetes itself the primary interface, eliminating the need for users to manage the underlying infrastructure. Denniss emphasized that Kubernetes sits between traditional virtual machines (VMs) and fully managed services like Cloud Run, offering a balanced approach that provides flexibility without the complexity of managing low-level resources. He highlighted that Kubernetes is particularly beneficial for complex workloads, such as high-availability databases and AI training jobs, which require robust orchestration capabilities.

Denniss discussed the traditional challenges of managing Kubernetes, such as configuring node pools and handling security concerns. He explained that GKE Autopilot addresses these issues by collapsing the complex layers of infrastructure management into a more streamlined process. With Autopilot, users only need to interact with the Kubernetes API, while Google manages the underlying VMs and other infrastructure components. This approach reduces the administrative burden on users and allows them to focus on their workloads rather than the intricacies of infrastructure management. Denniss also mentioned that this model shifts the responsibility for infrastructure issues to Google, providing users with a more reliable and hands-off experience.

Discussing this solution with the delegates, Denniss concluded by emphasizing the importance of understanding the trade-offs between control and convenience, suggesting that while Autopilot may not be suitable for every use case, it offers significant benefits for those looking to simplify their Kubernetes management.


Running Enterprise Workloads in Google Cloud

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Jeff Welsch

Jeff Welsch, product manager at Google Cloud, discusses the opportunity of running enterprise workloads in the cloud, emphasizing that enterprise use cases are substantial for many customers. He outlines Google’s compute organization, which includes offerings such as virtual machines, TPUs, GPUs, block storage, and enterprise solutions like VMware, SAP, and Microsoft. Welsch explains that Google Cloud is focused on optimizing infrastructure to meet customer requirements, especially in light of challenges like increasing compute demands from AI and the plateauing of Moore’s Law. Google Cloud’s approach involves leveraging AI capabilities and modern infrastructure to improve performance, reliability, security, and cost efficiency, while also prioritizing sustainability.

Welsch introduces Google’s Titanium technology, which aims to optimize infrastructure by breaking out of traditional server limitations and disaggregating performance capabilities. Titanium allows for tiered offloading, improving CPU responsiveness and storage performance, as exemplified by the HyperDisk service. He highlights that Titanium enables better optimization and efficiency, providing benefits like reduced latency and improved price performance without requiring customers to consume more resources. Additionally, Titanium supports dynamic resource management, allowing for live migration and non-disruptive maintenance, which enhances the overall reliability and performance of enterprise workloads.

The presentation also covers specific enterprise workloads like Microsoft, VMware, and SAP. Google Cloud offers robust support for Microsoft workloads, with features like cost optimization, live migration, and integration with AI-based modernization tools. For VMware, Google Cloud provides a seamless, integrated experience with the Google Cloud VMware Engine, facilitating easy migration and access to Google Cloud services. SAP workloads benefit from Google Cloud’s memory-optimized instances and tight integration with AI and machine learning capabilities. Welsch concludes by emphasizing Google Cloud’s commitment to optimizing infrastructure to meet the diverse needs of enterprise applications, ensuring performance, reliability, and cost-effectiveness.


Google Cloud Overview and Cloud Field Day Introduction

Event: Cloud Field Day 20

Appearance: Google Cloud Presents at Cloud Field Day 20

Company: Google Cloud

Video Links:

Personnel: Bobby Allen

In this presentation, Bobby Allen from Google Cloud provides an overview of the themes and topics to be discussed during their full-day session at Cloud Field Day 20. He begins by acknowledging the vast scope of Google Cloud, noting that this presentation focuses on foundational topics like storage, networking, and security, as well as the importance of AI in today’s tech landscape. Specifically, he discusses AI’s integration into the platform and its role in the software development lifecycle (SDLC). Throughout the presentation, Allen introduces his “Bobby-isms” and frames the discussion with key considerations to ponder throughout the day.

Allen underscores that Google Cloud is not just another cloud provider but a platform that supports billions of users globally, requiring robust, planet-scale infrastructure. He introduces the concept of Google Distributed Cloud, which offers various solutions for those who can’t always use the public cloud due to regulatory or operational constraints. These solutions include software-only options, Google-connected hardware, and air-gapped solutions for environments with limited connectivity. He also mentions modernization tools like Migration Center and Migrate to Containers, which help transition legacy workloads to more modern architectures like containers and serverless computing.

Throughout the presentation, Allen emphasizes the importance of balancing new technologies with existing, proven solutions. He introduces the idea that AI is not an end in itself but a means to enhance other applications and use cases. Using the analogy of AI as a “sauce” that improves the “dish” (the core application), he stresses the need for practical, customer-focused solutions. Allen also differentiates between incremental improvements (Neos) and groundbreaking innovations (Kainos), urging a balanced approach to technology adoption.