Optimized Storage from Supermicro and Solidigm to Accelerate Your AI Data Pipeline

Event: AI Field Day 4

Appearance: Solidigm Presents at AI Field Day 4

Company: Solidigm, Supermicro

Video Links:

Personnel: Paul McLeod, Wendell Wenjen

Wendell Wenjen and Paul McLeod from Supermicro discuss challenges and solutions for AI and machine learning data storage. Supermicro is a company that provides servers, storage, GPU-accelerated servers, and networking solutions, with a significant portion of their revenue being AI-related.

They highlight the challenges in AI operations and machine learning operations, specifically around data management: collecting data, transforming it, and feeding it into GPU clusters for training and inference. They also emphasize the need for large storage capacity to handle the various phases of the AI data pipeline.

Supermicro has a wide range of products designed to cater to each stage of the AI data pipeline, from data ingestion, which requires a large data lake, to the training phase, which requires retaining large amounts of data for model development and validation. They also discuss the importance of efficient data storage solutions and introduce the concept of an “IO blender effect,” where multiple data pipelines run concurrently, presenting the storage system with a mix of different IO profiles.
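
To make the IO blender effect concrete, here is a small, self-contained Python sketch (an illustration of the concept, not Supermicro code): three pipeline stages issue IO concurrently, and the storage layer observes a single interleaved stream of mixed profiles.

```python
# Illustrative sketch of the IO blender (concept demo, not Supermicro code):
# three pipeline stages issue IO concurrently, so the storage layer sees one
# interleaved stream mixing sequential and random profiles.
import queue
import random
import threading
import time

blended = queue.Queue()  # what the storage system actually observes

def pipeline(name, pattern, size_kb, ops):
    offset = 0
    for _ in range(ops):
        if pattern == "sequential":
            offset += size_kb * 1024            # next block follows the last
        else:
            offset = random.randrange(1 << 30)  # seek anywhere
        blended.put((name, pattern, offset, size_kb))
        time.sleep(random.random() / 1000)      # simulate per-IO work

stages = [
    ("ingest", "sequential", 1024, 5),     # large sequential writes
    ("training", "random", 4, 5),          # small random reads
    ("checkpoint", "sequential", 512, 5),  # bursty sequential writes
]
threads = [threading.Thread(target=pipeline, args=s) for s in stages]
for t in threads:
    t.start()
for t in threads:
    t.join()

while not blended.empty():
    print(blended.get())  # interleaved IOs from all stages: the "blender"
```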

The presenters delve deeper into the storage solutions, highlighting Supermicro’s partnership with WEKA, a software-defined storage company, and how WEKA’s architecture is optimized for AI workloads. They explain the importance of NVMe flash storage, which is now fast enough to outpace processors, and the challenges of scaling such storage solutions. They also cover Supermicro’s extensive portfolio of storage servers, ranging from multi-node systems to petascale architectures, designed to accommodate different customer needs.

Supermicro’s approach to storage for AI includes a two-tiered solution with flash storage for high performance and disk-based storage for high capacity at a lower cost. They also touch on the role of NVIDIA GPUDirect Storage in reducing latency and the flexibility of their software-defined storage solutions.

The presentation concludes with an overview of Supermicro’s product offerings for different AI and machine learning workloads, from edge devices to large data center storage solutions.


Why Storage Matters for AI with Solidigm

Event: AI Field Day 4

Appearance: Solidigm Presents at AI Field Day 4

Company: Solidigm

Video Links:

Personnel: Ace Stryker, Alan Bumgarner

In this presentation, Ace Stryker and Alan Bumgarner of Solidigm discuss the importance of storage in AI workloads. They explain that as AI models and datasets grow, efficient and high-performance storage becomes increasingly critical. They introduce their company, Solidigm, which emerged from SK Hynix’s acquisition of Intel’s storage group, and they offer a range of SSD products suitable for AI applications.

The discussion covers several key points:

  1. The growing AI market and the shift from centralized to distributed compute and storage, including the edge.
  2. The dominance of hard drives for AI data and the opportunity for transitioning to flash storage.
  3. The role of storage in AI workflows, including data ingestion, preparation, training, and inference.
  4. The Total Cost of Ownership (TCO) benefits of SSDs over hard drives, considering factors like power consumption, space, and cooling (see the sketch after this list).
  5. The Solidigm product portfolio, emphasizing different SSDs for various AI tasks, and the importance of choosing the right storage based on workload demands.
  6. A customer case study from Kingsoft in China, which saw a significant reduction in data processing time by moving to an all-flash array.
  7. The future potential of AI and the importance of SSDs in enabling efficient AI computing.
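
To illustrate point 4, here is a back-of-the-envelope TCO sketch in Python. All figures (drive capacities, wattage, density, energy price) are illustrative assumptions, not Solidigm’s numbers; the point is that fewer, denser drives shrink rack space and power draw over the life of the system.

```python
# Back-of-the-envelope TCO sketch (all figures are illustrative assumptions,
# not Solidigm's numbers): drives, rack units, and energy cost to reach a
# target capacity with HDDs versus high-capacity SSDs.
def rack_tco(target_tb, drive_tb, watts_per_drive, drives_per_ru,
             kwh_cost=0.12, years=5):
    drives = -(-target_tb // drive_tb)        # ceiling division
    rack_units = -(-drives // drives_per_ru)
    energy_kwh = drives * watts_per_drive / 1000 * 24 * 365 * years
    return {"drives": drives, "rack_units": rack_units,
            "energy_usd": round(energy_kwh * kwh_cost)}

# 1 PB usable, with hypothetical drive specs:
print("HDD:", rack_tco(1000, drive_tb=20, watts_per_drive=8, drives_per_ru=12))
print("SSD:", rack_tco(1000, drive_tb=60, watts_per_drive=15, drives_per_ru=24))
```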

The session also includes questions from the Field Day delegates covering technical aspects of Solidigm storage products, such as the role of the Cloud Storage Acceleration Layer (CSAL), along with a discussion of the importance of consulting with customers to understand their specific AI workload requirements for optimal storage solutions.


Real-World Use of Private AI at VMware by Broadcom

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware

Video Links:

Personnel: Ramesh Radhakrishnan

This session offers a deep dive into VMware’s internal AI services used by VMware employees, including our services for coding-assist, document search using retrieval augmented generation (RAG), and our internal LLM API.

In this presentation, Ramesh Radhakrishnan of VMware discusses the company’s internal use of AI, particularly large language models (LLMs), for various applications. He leads the AI Platform and Solutions team and shares insights into VMware’s AI services, which were developed even before the advent of LLMs.

Large language models (LLMs) are versatile tools that can address a wide range of use cases with minimal modification. VMware has developed internal AI services for coding assistance, document search using Retrieval-Augmented Generation (RAG), and an internal LLM API. Content generation, question answering, code generation, and the use of AI agents are some of the key use cases for LLMs at VMware.

VMware has implemented a Cloud Smart approach, leveraging open-source LLMs trained on the public cloud to avoid the environmental impact of running their own GPUs. The company has worked with Stanford to create a domain-adapted model for VMware documentation search, which significantly improved search performance compared to traditional keyword search.

The VMware Automated Question Answering System (Wacqua) is an information retrieval system based on language models, which allows users to ask questions and get relevant answers without browsing through documents. The system’s implementation involves complex processes, including content gathering, preprocessing, indexing, caching, and updating documentation.
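
The summary above maps to a familiar retrieve-then-answer pipeline. The following minimal Python sketch (an illustration only, not VMware’s Wacqua implementation) shows the preprocessing, indexing, caching, and retrieval steps in miniature.

```python
# Minimal retrieve-then-answer sketch (illustrative only; not VMware's
# Wacqua implementation): preprocess docs, build an inverted index, and
# cache answers to repeated questions.
from collections import defaultdict

docs = {
    "kb-1": "vMotion moves a running virtual machine between hosts",
    "kb-2": "vSAN aggregates local disks into a shared datastore",
}

index = defaultdict(set)           # term -> set of doc ids
for doc_id, text in docs.items():  # preprocessing + indexing
    for term in text.lower().split():
        index[term].add(doc_id)

cache = {}                         # question -> answer, avoids re-retrieval

def answer(question):
    if question in cache:
        return cache[question]
    scores = defaultdict(int)      # score docs by matching query terms
    for term in question.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    best = max(scores, key=scores.get) if scores else None
    result = docs[best] if best else "No matching document found."
    cache[question] = result       # updating the cache, as in the pipeline
    return result

print(answer("how does vMotion move a virtual machine?"))
```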

VMware has scaled up its GPU capacity to accommodate the increased demand from software developers empowered by AI tools. The AI platform at VMware provides a GPU pool resource, developer environments, coding use cases, and LLM APIs, all running on a common platform.

Data management is highlighted as a potential bottleneck for AI use cases, and standardizing on a platform is critical for offering services to end-users efficiently. Collaboration between AI teams and infrastructure teams is essential to ensure that both the models and the infrastructure can support the workload effectively.

Ramesh encourages organizations to start small with open-source models, identify key performance indicators (KPIs), and focus on solving business problems with AI. The session concludes with Ramesh emphasizing the importance of a strategic approach to implementing AI and the benefits of leveraging a shared platform for AI services.


Running Best-of-Breed AI Services on a Common Platform with VMware Cloud Foundation

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware

Video Links:

Personnel: Shawn Kelly

VMware Cloud Foundation streamlines AI production, providing enterprises with unmatched flexibility, control, and choice. Through Private AI, businesses can seamlessly deploy best-in-class AI services across various environments, while ensuring privacy and security. Join us to explore VMware’s collaborations with IBM watsonx, Intel AI, and AnyScale Ray, delivering cutting-edge AI capabilities on top of VMware’s Private Cloud platform.

Shawn Kelly, Principal Engineer at Broadcom, discusses the benefits of using VMware Cloud Foundation (VCF) to run AI services. He explains that VCF solves many of the infrastructure challenges associated with AI projects, such as agility, workload migration, avoiding idle compute resources, scaling, lifecycle management, privacy, and security.

He addresses concerns about VCF not being the only platform for AI, noting that while other products like vSphere and vSAN are still in use, VCF is the strategic direction for VMware, particularly for their strategic customers. He clarifies that VCF includes underlying vSphere technology and that using VCF inherently involves using vSAN.

Kelly also talks about performance, mentioning that VMware’s hypervisor scheduler has been optimized over two decades to match bare-metal speeds, with only a plus or minus 2% performance difference in AI workloads. He confirms that VMware supports NVIDIA’s NVLink, which allows multiple GPUs to connect directly to each other.

The talk then moves on to VMware’s Private AI, which is an architectural approach that balances business AI benefits with privacy and compliance needs. Kelly highlights collaborations with Anyscale’s Ray, an open-source framework for scaling Python AI workloads, and IBM watsonx, which brings IBM watsonx capabilities on-premises for customers with specific data compliance requirements.

He covers the integration of Ray with vSphere, demonstrating how it can quickly spin up worker nodes (raylets) for AI tasks. He also addresses licensing concerns, noting that while NVIDIA handles GPU licensing, Ray is open source and carries no additional licensing costs.
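
For readers unfamiliar with Ray, the workflow looks roughly like the following sketch using Ray’s public Python API (generic Ray usage; the vSphere plugin and cluster provisioning described above are not shown).

```python
# Minimal Ray usage sketch (generic Ray API; the vSphere-specific plugin and
# cluster setup are not shown). Ray fans tasks out to worker processes on
# whatever nodes the cluster provides.
import ray

ray.init()  # on an existing cluster: ray.init(address="auto")

@ray.remote
def score_batch(batch):
    # placeholder for real inference / feature work
    return sum(batch) / len(batch)

batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
futures = [score_batch.remote(b) for b in batches]  # scheduled on raylets
print(ray.get(futures))  # [2.0, 5.0, 8.0]
```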

For IBM watsonx, Kelly discusses the stack setup with VMware Cloud Foundation at the base, followed by OpenShift and watsonx on top. He emphasizes security features, such as secure boot, identity and access management, and VM encryption. He also mentions the choice of proprietary, open-source, and third-party AI models available on the platform. Kelly briefly touches on use cases enabled by watsonx, such as code generation, contact center resolution, IT operations automation, and advanced information retrieval. He concludes by directing listeners to a blog for more information on Private AI with IBM watsonx.


VMware Private AI Foundation with NVIDIA Demo

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware

Video Links:

Personnel: Justin Murray

This VMware Private AI Foundation with NVIDIA demo works with the data scientist user as well as the VMware system administrator/devops person. A data scientist can reproduce their LLM environment rapidly on VMware Cloud Foundation (VCF). This is done through a self-service portal or through assistance from a VCF system administrator. We show that a VCF administrator can serve the data scientist with a set of VMs, created in a newly automated way from deep learning VM images, with all the deep learning tooling and platforms already active in them. We show a small LLM example application running on this setup to give the data scientist a head-start on their work.

In this presentation, Justin Murray, product marketing engineer from Broadcom, demonstrates VMware Private AI Foundation with NVIDIA technology. The demo is structured to show how the end user, particularly a data scientist, can benefit from the solution. Key points from the transcript include:

  1. Application Demonstration: Justin begins by showcasing a chatbot application powered by a large language model (LLM) which utilizes retrieval-augmented generation (RAG). The bot is demonstrated to answer questions more accurately after updating its knowledge base.
  2. Deep Learning VMs: The demo highlights the use of virtual machines (VMs) that come pre-loaded with deep learning toolkits, which are essential for data scientists. These VMs can be rapidly provisioned using Aria Automation, and they can be customized with specific tool bundles as per the data scientist’s requirements.
  3. Containers and VMs: Justin explains the solution uses a combination of containers and VMs, with NVIDIA components shipped as containers that can be run using Docker or integrated into Kubernetes clusters.
  4. Private AI Foundation Availability: The Private AI Foundation with NVIDIA is mentioned to be an upcoming product that will be available for purchase in the current quarter, with some customers already having early access to the beta version.
  5. Automation and User Interface: The Aria Automation tool is showcased, which allows data scientists or DevOps personnel to request resources through a simple interface, choosing the amount of GPU power they require.
  6. GPU Visibility: The demo concludes with a look at GPU visibility, showing how vCenter can be used to monitor GPU consumption at both the host and VM level, which is important for managing resources in LLM operations (a host-level monitoring sketch follows this list).
  7. Customer Use and Power Consumption: Justin notes that there’s interest in both dedicated VMs for data scientists and shared infrastructure like Kubernetes. He also acknowledges the importance of power consumption as a concern for those using GPUs.
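
As referenced in point 6, host-level GPU visibility comes down to sampling utilization and memory per device. The demo used vCenter’s built-in views; the sketch below shows the same idea with NVIDIA’s NVML Python bindings (pynvml), purely as an illustration.

```python
# Illustrative host-level GPU monitoring via NVML (pynvml bindings); the
# demo itself used vCenter's built-in GPU views rather than this code.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # % busy last sample
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes used / total
    print(f"GPU {i}: {util.gpu}% busy, {mem.used / mem.total:.0%} memory used")
pynvml.nvmlShutdown()
```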

VMware Private AI Foundation with NVIDIA aims to simplify the deployment and management of AI applications and infrastructure for data scientists, offering a combination of automation, privacy, and performance monitoring tools.


VMware Private AI Foundation with NVIDIA Overview

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware

Video Links:

Personnel: Justin Murray

VMware Private AI Foundation with NVIDIA is a fully integrated solution featuring generative AI software and accelerated computing from NVIDIA, built on VMware Cloud Foundation and optimized for AI. The solution includes integrated AI tools to empower enterprises to customize models and run generative AI applications adjacent to their data while addressing corporate data privacy, security and control concerns. The platform will feature NVIDIA NeMo, which combines customization frameworks, guardrail toolkits, data curation tools and pretrained models to offer enterprises an easy, cost-effective and fast way to adopt generative AI.

In this presentation, Justin Murray, Product Marketing Engineer at VMware by Broadcom, discusses the VMware Private AI Foundation with NVIDIA, which is a solution designed to run generative AI applications with a focus on privacy, security, and control for enterprises. The platform is built on VMware Cloud Foundation and optimized for AI, featuring NVIDIA NeMo for customization and generative AI model deployment.

Murray explains the architecture of the solution, which includes a self-service catalog for data scientists to easily access their tools, GPU monitoring in the vCenter interface, and deep learning VMs pre-packaged with data science toolkits. He emphasizes the importance of vector databases, particularly pgvector, which is central to retrieval-augmented generation (RAG). RAG combines database technology with large language models to provide up-to-date and private responses to queries.
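
As a concrete illustration of the retrieval step, the sketch below uses pgvector from Python (assuming a local PostgreSQL with the pgvector extension installed and the psycopg2 driver; the table layout and embedding values are made up for the example).

```python
# Minimal pgvector sketch (assumes local PostgreSQL with the pgvector
# extension available and psycopg2 installed; table and values are made up).
import psycopg2

conn = psycopg2.connect("dbname=rag")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""CREATE TABLE IF NOT EXISTS chunks (
                 id bigserial PRIMARY KEY,
                 content text,
                 embedding vector(3))""")  # real embeddings are 768+ dims
cur.execute("INSERT INTO chunks (content, embedding) VALUES (%s, %s)",
            ("vSAN overview", "[0.1, 0.9, 0.2]"))
# nearest-neighbor search by L2 distance: the retrieval step of RAG
cur.execute("SELECT content FROM chunks ORDER BY embedding <-> %s LIMIT 3",
            ("[0.1, 0.8, 0.3]",))
print(cur.fetchall())
conn.commit()
```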

He also touches on the NVIDIA GPU Operator and Triton Inference Server for managing GPU drivers and scalable model inference. Murray notes that the solution is designed to be user-friendly for data scientists and the administrators serving them, with a focus on simplifying the deployment and management of AI applications.

Murray mentions that the solution is compatible with various vector databases and is capable of being used with private data, making it suitable for industries like banking. He also indicates that there is substantial demand for this architecture across different industries, with over 60 customers globally interested in it before the product’s general availability.

The presentation aims to provide technical details about the VMware Private AI Foundation with NVIDIA, including its components, use cases, and the benefits it offers to enterprises looking to adopt generative AI while maintaining control over their data.


Introduction to VMware Private AI

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware

Video Links:

Personnel: Chris Wolf

VMware Private AI brings compute capacity and AI models to where enterprise data is created, processed, and consumed, whether that is in a public cloud, enterprise data center, or at the edge. VMware Private AI consists of both product offerings (VMware Private AI Foundation with NVIDIA) and a VMware Private AI Reference Architecture for Open Source to help customers achieve their desired AI outcomes by supporting best-in-class open source software (OSS) technologies today and in the future. VMware’s interconnected and open ecosystem supports flexibility and choice in customers’ AI strategies.

Chris Wolf, the Global Head of AI and Advanced Services at VMware by Broadcom, discusses VMware’s Private AI initiative, which was announced in August 2023. The goal of Private AI is to democratize generative AI and ignite business innovation across all enterprises while addressing privacy and control concerns. VMware focuses on providing AI infrastructure, optimizations, security, data privacy, and data serving, leaving higher-level AI services to AI ISVs (Independent Software Vendors). This non-competitive approach makes it easier for VMware to partner with ISVs, since VMware does not directly compete with them in offering top-level AI services, unlike the public clouds.

Wolf shares an example of VMware’s code generation use case with a 92% acceptance rate by software engineers using an internal solution based on an open-source model for the ESXi kernel. He discusses the importance of governance and compliance, particularly in AI-generated code, and mentions VMware’s AI council and governance practices.

He highlights use cases such as call center resolution and advanced information retrieval across various industries. VMware’s solution emphasizes flexibility, choice of hardware and software, simplifying deployment, and mitigating risks. Wolf also notes VMware’s capability to stand up an AI cluster with preloaded models in about three seconds, which is not possible in public clouds or on bare metal.

The discussion covers the advantages of VMware Private AI in managing multiple AI projects within large enterprises, including efficient resource utilization and integration with existing operational tools, leading to lower total cost of ownership.

Wolf touches on the trend of AI adoption at the edge, the importance of security features within VMware’s stack, and the curated ecosystem of partners that VMware is building. He points out that VMware’s Private AI solution can leverage existing IT investments by bringing AI models to where the data already resides, such as on VMware Cloud Foundation (VCF).

Finally, Wolf previews upcoming Tech Field Day sessions that will go into detail about VMware’s collaborations with NVIDIA, Intel, and IBM, showcasing solutions like Private AI Foundation with NVIDIA and on-premises deployment of the watsonx SaaS service. He encourages attendees to participate in these sessions to learn more about VMware’s AI offerings.


VMware by Broadcom Private AI Primer – An Emerging Category

Event: AI Field Day 4

Appearance: VMware by Broadcom Presents at AI Field Day 4

Company: VMware

Video Links:

Personnel: Chris Wolf

Private AI is an architectural approach that aims to balance the business gains from AI with the practical privacy and compliance needs of the organization. What is most important is that privacy and control requirements are satisfied, regardless of where AI models and data are deployed. This session will walk through the core tenets of Private AI and the common use cases that it addresses.

Chris Wolf, Global Head of AI and Advanced Services at VMware by Broadcom, discusses the evolution of application innovation, highlighting the shift from PC applications to business productivity tools, web applications, and mobile apps, and now the rise of AI applications. He emphasizes that AI is not new, with its use in specialized models for fraud detection being a longstanding practice. Chris notes that financial services with existing AI expertise have quickly adapted to generative AI with large language models, and he cites a range of industry use cases, such as VMware’s use of SaaS-based AI services for marketing content creation.

He mentions McKinsey’s projection that generative AI could add around $4.4 trillion in annual economic value, indicating a significant opportunity for industry transformation. Chris discusses the early adoption of AI in various regions, particularly in Japan, where the government invests in AI to compensate for a shrinking population and maintain global competitiveness.

The conversation shifts to privacy concerns in AI, with Chris explaining the concept of Private AI, which is about maintaining business gains from AI while ensuring privacy and compliance needs. He discusses the importance of data sovereignty, control, and not wanting to inadvertently benefit competitors with shared AI services. Chris also highlights the need for access control to prevent unauthorized access to sensitive information through AI models.

He then outlines the importance of choice, cost, performance, and compliance in the AI ecosystem, asserting that organizations should not be locked into a single vertical AI stack. Chris also describes the potential for fine-tuning language models with domain-specific data and the use of technologies like retrieval augmented generation (RAG) for simplifying AI use cases.

Finally, Chris emphasizes the need for adaptability in AI solutions and mentions VMware’s focus on adding value to the ecosystem through partnerships. He briefly touches on technical implementation, including leveraging virtualization support for GPU resources and partnering with companies like IBM Watson for model serving and management. He concludes by providing resources for further information on VMware’s AI initiatives.


Dell Technologies APEX Cloud Platform Cluster Expansion

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

Michael Wells, a Tech Marketing Engineer at Dell Technologies, presents a demonstration on scalability and cluster expansion using the APEX Cloud Platform, specifically focusing on adding worker nodes to an OpenShift cluster. The process involves searching for new nodes, running compatibility checks to ensure they match the existing cluster, and then configuring settings such as the node name, IP address, TPM passphrase, location information, NIC settings, and network settings. The system pre-populates certain values like VLAN IDs based on the existing setup and then validates the configuration before adding the node to the cluster.

He highlights how the APEX Cloud Platform integrates infrastructure management directly into the cloud OS experience, offering a unique solution for different cloud operating models. He also discusses the advantages of installing Red Hat OpenShift on bare metal, which includes better performance due to the absence of a hypervisor, reduced licensing requirements, and a smaller attack surface. Additionally, he explains the benefits of lifecycle management of both OpenShift and hardware together, simplifying the deployment process and providing developers with more direct access to hardware resources.

Wells also touches on the topic of OpenShift virtualization, explaining that running virtual machines inside of OpenShift as pods allows for pod-to-pod networking and avoids the need for routing traffic through an ingress controller. This setup can be more efficient for workloads that need to communicate with other OpenShift services.


Dell Technologies APEX Cloud Platform Lifecycle Management

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

Michael Wells, a Tech Marketing Engineer for the APEX Cloud Platform at Dell Technologies, demonstrates the lifecycle management process for updating Red Hat OpenShift and Azure clusters on the platform. The process involves:

  1. Configuring support portal access with a username and password to check for online updates from the Dell support site.
  2. Using a local update process when no online updates are available by uploading and decompressing an update bundle.
  3. Running pre-checks to ensure the cluster is healthy and in a suitable state for updating.
  4. Reviewing the update details, including versions of software to be updated.
  5. Executing the update, which includes hardware (BIOS, firmware, drivers), OpenShift software, CoreOS, CSI, and APEX Cloud Platform Foundation software, all in a single workflow to optimize efficiency and minimize reboots.
  6. Applying updates to Azure clusters in a similar fashion, including compliance checks and cluster health pre-checks.
  7. Temporarily disabling lockdown mode on servers during the update process and re-enabling it afterward.
  8. Performing a rolling update across nodes, with each node being updated one at a time in a non-disruptive manner.

The update process is designed to be efficient, reducing downtime by controlling the sequence of updates and using parallel staging where possible. The system provides detailed progress information and time estimates throughout the process.
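
A simplified sketch of that rolling flow is below (illustrative only; the node structure and update steps are hypothetical stand-ins for the platform’s real workflow): pre-check the cluster, then update one node at a time, toggling lockdown mode around each node’s update.

```python
# Simplified orchestration sketch of the rolling-update flow described above
# (illustrative; node fields and update steps are hypothetical stand-ins).
def pre_check(cluster):
    """Cluster must be healthy before any update begins."""
    return all(node["healthy"] for node in cluster)

def update_node(node):
    node["lockdown"] = False       # temporarily disable lockdown mode
    for step in ("BIOS", "firmware", "drivers", "platform software"):
        print(f"{node['name']}: updating {step}")
    node["lockdown"] = True        # re-enable lockdown afterward

cluster = [{"name": f"node{i}", "healthy": True, "lockdown": True}
           for i in range(1, 4)]

if pre_check(cluster):
    for node in cluster:           # one node at a time: non-disruptive
        update_node(node)
```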


Dell Technologies APEX Cloud Platform Management Experience

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

In this presentation, Michael Wells, Tech Marketing Engineer at Dell Technologies, discusses the management experience of the APEX Cloud Platform. He highlights the platform’s ability to provide a consistent hybrid management experience across different environments without requiring users to leave their usual management interfaces.

Wells demonstrates the integration of Dell APEX Cloud Platform within the OpenShift web console, showing how users can view node information, cluster status, CPU and memory usage, and manage hardware components directly from the console. He mentions that the platform is set to support hosted control planes (formerly HyperShift) and discusses the ability to expand or remove worker nodes within the cluster.

He also covers the platform’s update mechanism, security features (including certificate management), and support capabilities, such as dial-home alerts and integration with CloudIQ for hardware-related issues. Additionally, Wells touches on how hardware alerts are integrated into OpenShift alerting, allowing users to leverage existing monitoring and notification setups.

Wells then shifts to discussing the Azure side of things, showing similar capabilities within the Windows Admin Center for Azure Stack HCI, including physical views of nodes, detailed component information, and compliance checks.

Finally, he emphasizes the consistency of the Dell APEX Cloud Platform across different cloud operating systems and how it integrates infrastructure management with cluster management tools used by administrators. He notes the upcoming VMware integration and the ability to lock infrastructure settings for security.


Dell Technologies APEX Cloud Platform Hardware Configurations

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

Michael Wells, Tech Marketing Engineer for Dell Technologies, discusses the hardware configurations for the APEX Cloud Platform.

  • The APEX Cloud Platform uses specialized configurations of PowerEdge servers called MC nodes, specifically the MC660 (1U 10 drive) and MC760 (2U 24 drive).
  • The nodes support 4th Gen Intel Xeon Scalable processors with 2 to 4 terabytes of memory per node, a limit currently imposed by supply chain constraints rather than technical ones.
  • There are options for NVMe and SSD storage configurations, as well as NVIDIA GPU support, with the 1U supporting single-width cards and the 2U supporting both single-width and double-width cards.
  • Michael mentions a white paper released in November of the previous year about implementing OpenShift AI and a generative AI solution on the APEX Cloud Platform, using Llama 2 and RAG to build a chatbot trained against Dell’s technical documentation.

Michael explains that the MC nodes have a subset of components that are continuously validated to ensure support and control over the configurations. This approach excludes the possibility of using existing servers customers may already have, as the solution requires common building blocks for simplicity and manageability.

There’s also a mention of the possibility of connecting to PowerFlex storage, which supports various operating systems and allows for the connection of bare metal, hypervisors, and other systems. This could be a way for customers to use existing hardware and gradually transition to the APEX Cloud Platform.


Dell Technologies APEX Cloud Platform Cluster Deployment

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Michael Wells

Michael Wells, a Tech Marketing Engineer at Dell Technologies, discusses the APEX Cloud Platform and its deployment process for Microsoft Azure and Red Hat OpenShift. He explains that the deployment for both platforms involves a similar set of steps, such as node discovery, configuration settings, and network information. APEX Cloud Platform for Azure is built on Microsoft’s HCI OS as part of the new Premier partner tier, which allows for deeper integration and collaboration with Microsoft.

The deployment results in a fully configured cluster, an OpenShift cluster on one side and an Azure Stack HCI cluster on the other. The OpenShift cluster includes Red Hat CoreOS, Kubernetes, and Dell SDS storage, while the Azure Stack HCI cluster uses Storage Spaces Direct, Hyper-V, and the Microsoft SDN stack. Both deployments include the APEX Cloud Platform Foundation software, which integrates with the cloud OS management experience.

Michael also discusses licensing, entitlements for advanced cluster management and security included with the OpenShift Platform Plus subscription, and the unique capabilities of the Cloud Platform Foundation software. He emphasizes that the APEX Cloud Platform family is designed to offer the same types of results and efficiencies across different cloud OSes.

Lastly, Michael hints at upcoming features, such as the addition of Dell SDS support for the APEX Cloud Platform for Microsoft Azure, which will allow for greater scalability and storage independence.


Dell Technologies APEX Block Storage for Public Cloud Multi-AZ Storage Resilience

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Kiruthika Gopal

Kiruthika Gopal, a Product Manager at Dell Technologies, discusses the deployment and resilience of APEX Block Storage, particularly in a scenario where one of the three availability zones (AZs) is taken offline. The demonstration uses SQL Server 2022 to illustrate the process, but the principles apply to any application.

The APEX Block Storage cluster is set up with six storage instances, two in each AZ. Kiruthika emphasizes the importance of verifying the system’s health before simulating an outage by shutting down all instances in one AZ. Using PowerFlex Manager, they browse block volumes and check the health statistics and metadata manager to ensure everything is connected and functioning correctly.

The simulation involves manually stopping two instances in one AZ through the AWS portal and observing the impact on the cluster. Despite a temporary dip in transactions per minute (TPM), the cluster remains online, and the application continues to function. The cluster demonstrates self-healing capabilities, as the rebuild process completes in under 30 seconds, restoring the cluster to a healthy state with four nodes.
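
The same outage could be simulated programmatically rather than through the AWS console. The boto3 sketch below is a hypothetical equivalent (the region, AZ name, and tag filter are assumptions for illustration, not part of the demo).

```python
# Hedged sketch of simulating the AZ outage with boto3 instead of the AWS
# console (region, AZ, and tag filter are assumptions for illustration).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_instances(Filters=[
    {"Name": "availability-zone", "Values": ["us-east-1a"]},
    {"Name": "tag:role", "Values": ["apex-block-storage"]},  # hypothetical tag
    {"Name": "instance-state-name", "Values": ["running"]},
])
ids = [i["InstanceId"]
       for r in resp["Reservations"] for i in r["Instances"]]
ec2.stop_instances(InstanceIds=ids)   # take one AZ's storage nodes offline
print("stopped:", ids)
# Later, ec2.start_instances(InstanceIds=ids) brings the AZ back so the
# rebalance can be observed.
```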

Next, Kiruthika restarts the two stopped instances to observe how the cluster rebalances the workload. With all six instances running again, the cluster quickly returns to normal performance after a brief dip during the rebuild. This test confirms the resilience and self-healing nature of the APEX Block Storage cluster in the event of an AZ outage.


Dell Technologies APEX Navigator for Multicloud Storage APEX Block Storage for AWS Deployment

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Chad Gray

In this presentation, Chad Gray from Dell Technologies is demonstrating how to deploy the APEX Block Storage for AWS using APEX Navigator. He explains the process in four easy steps:

  1. Select Product, Cloud, and Region: Chad selects APEX Block Storage for AWS with Navigator, version 4.5.1, and chooses a region available in the US.
  2. Connect Cloud Account: He selects a previously set up cloud account that was added to APEX Navigator.
  3. Deployment Configuration: Here, Chad provides a deployment name, selects a performance tier (balanced or performance optimized), sets the minimum usable capacity and IOPS, chooses the availability level (single AZ or multi-AZ), and decides whether to deploy to an existing VPC or create a new one. He opts to create a new VPC and inputs IP ranges. He also names a key pair for SSH access to the storage instances, which will be stored in AWS Secrets Manager.
  4. Review Configuration and Deployment: Chad mentions that there’s a free 90-day evaluation license, and he reviews the AWS resources that will be deployed, noting that they will incur costs.

The deployment can be monitored through APEX Navigator, and the process takes around two hours to complete. The demonstration shows how APEX Navigator simplifies the setup of APEX Block Storage in AWS by automating the deployment, which would be more complex if done manually.


Dell Technologies APEX Navigator for Multicloud Storage AWS Account Connection Demo

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Chad Gray

Chad Gray of Dell Technologies presented APEX Navigator, a product designed to simplify the management of multicloud storage, particularly block storage for AWS. He discussed the product’s five focus areas: security, deployment, management, monitoring, and mobility. Gray emphasized the importance of secure access to customer AWS accounts and explained how APEX Navigator uses AWS roles and policies to access these accounts without the need for exchanging access keys.

During the presentation, Gray demonstrated how to connect an AWS account to APEX Navigator using a custom trust policy and permission policy generated by the platform. He also discussed federated login capabilities with identity providers such as Active Directory, allowing for single sign-on across Dell’s APEX and CloudIQ services.
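
The pattern Gray describes is standard AWS cross-account role trust. The boto3 sketch below shows its general shape (the account ID, role name, and external ID are hypothetical; the actual policies are generated by APEX Navigator).

```python
# Illustrative sketch of the role-based access pattern described above
# (standard AWS IAM cross-account trust; all identifiers are hypothetical,
# not Dell's actual generated policies).
import json
import boto3

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # vendor account (hypothetical)
        "Action": "sts:AssumeRole",
        # ExternalId guards against the confused-deputy problem
        "Condition": {"StringEquals": {"sts:ExternalId": "example-external-id"}},
    }],
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="ApexNavigatorAccess",  # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# A permission policy scoping what the role may do would be attached next,
# e.g. with iam.put_role_policy(...).
```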

Gray mentioned that all the steps he demonstrated in the UI can be automated through APIs, and that Dell recently released a Terraform provider for APEX. He highlighted the availability of infrastructure as code examples for teams using tools like Terraform.

Lastly, Gray showed how to audit access and account management activities within APEX Navigator and within the AWS account using CloudTrail. He pointed out features like tagging sessions with job names and IDs, and passing the source identity of the user for better traceability of actions taken within the customer’s AWS environment.


Dell Technologies APEX Block Storage For Public Cloud Deep Dive

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Kiruthika Gopal

Kiruthika Gopal, a product manager, presents the key features and differentiators of APEX Block Storage. The main points of differentiation include extreme performance, with millions of IOPS for a single volume; flexibility and scalability, with the ability to scale up to 512 storage nodes and independently scale compute and storage; multi-AZ durability, offering the option to deploy storage clusters within a single or multiple availability zones without requiring additional capacity; and hybrid cloud mobility, allowing seamless data movement between on-prem and cloud across regions.

APEX Block Storage offers up to one petabyte per volume, multi-AZ deployment without extra capacity penalties, thin provisioning, snapshots and clones without additional fees, and asynchronous replication with a feature called snap mobility.

The product also provides multi-AZ durability by spreading storage nodes across availability zones, offering protection against entire rack failures. This feature has been tested in on-prem environments for over a decade and is now extended to the public cloud.

Questions from the audience cover topics such as the upgrade process for new instance types, the potential for split-brain scenarios in multi-AZ deployments, the testing methodology used to generate performance numbers, and the orchestration of testing and recovery processes.

Kiruthika also discusses cost savings through thin provisioning and snapshot savings, and the scalability of APEX Block Storage, with deployments ranging from 10 terabytes to multiple petabytes. The pricing model is subscription-based, factoring in capacity and the number of storage nodes.

Finally, Kiruthika touches on the ease of deployment using Dell APEX Navigator, which can set up the necessary AWS infrastructure and deploy the software in four simple steps based on inputted IOPS and capacity requirements.


Dell Technologies APEX Block Storage For Public Cloud Overview

Event: Cloud Field Day 19

Appearance: Dell Technologies Presents at Cloud Field Day 19

Company: Dell Technologies

Video Links:

Personnel: Kiruthika Gopal

Kiruthika Gopal, a product manager at Dell Technologies, discusses the features and benefits of Dell APEX Block Storage, which is part of Dell’s universal storage layer initiative aimed at bringing Dell storage to the public cloud. APEX Block Storage is based on PowerFlex IP, a software-defined storage solution that has been around for over a decade, known for its performance, scalability, and flexibility.

APEX Block Storage is available on both AWS and Azure through their respective marketplaces or directly from Dell. It aims to lower the total cost of ownership (TCO) for customers and allows for the deployment of high-performance, mission-critical applications in the public cloud. It also supports seamless data mobility across different environments and offers a unique Multi-AZ Durability feature to increase resiliency.

Kiruthika highlights the product’s ability to deliver 100x better performance compared to existing public cloud storage, based on internal testing. She explains that APEX Block Storage can scale up to hundreds of nodes, achieving millions of IOPS. The product is designed to not compete with public cloud providers but to enhance the customer experience by addressing needs not currently met in the public cloud.

The discussion also covers the technical aspects of deploying APEX Block Storage, including the use of EC2 instances on AWS and virtual machines on Azure, as well as the integration with Dell Data Domain Virtual Edition for data backup. Additionally, Kiruthika addresses questions about the product’s performance, cost-effectiveness, software client requirements, and the six-nines availability claim.


Policy Assistant and Experience Insights with Cisco Secure Access

Event: Tech Field Day Extra at Cisco Live EMEA 2024

Appearance: Cisco Security Presents at Tech Field Day Extra at Cisco Live EMEA

Company: Cisco

Video Links:

Personnel: Fay Lee, Justin Murphy

This presentation by Fay Lee and Justin Murphy focuses on Cisco Secure Access and the integration of generative AI (Gen AI) and ThousandEyes technology for enhancing security and user experience.

Justin Murphy introduces Cisco Secure Access, which is a cloud-provided security solution offering secure access to apps and the internet. It includes features such as proxy capabilities, data loss prevention (DLP), malware inspection, firewall, and intrusion prevention systems (IPS). He focuses on the data loss prevention aspect and the use of AI applications like ChatGPT by employees, emphasizing the need to monitor their use and prevent data leaks. Cisco has added an AI assistant to simplify policy creation, which can automate repeatable processes and reduce deployment time by about 70%. The assistant can create rules for different users and groups, and monitor and control access to AI applications, preventing the upload of sensitive data. Justin demonstrates how the AI assistant works through a live demo and a video, showing how it can block an employee named Jeff from uploading code to ChatGPT.

Fay Lee then takes over to discuss Experience Insights, a new feature in Cisco Secure Access powered by ThousandEyes technology. It aims to reduce the mean-time-to-response for user experience issues and is integrated into the Secure Access dashboard. Experience Insights allows administrators to monitor endpoint performance, network performance, and the performance of the connection from Cisco’s cloud infrastructure to the applications users are accessing. Fay provides a live demo of the Experience Insights feature, showing a map of connected users, details of their connection, and the performance of commonly used SaaS applications. She explains how administrators can drill down into specific user data to troubleshoot issues and how integration with ThousandEyes provides additional insights.

The presentation ends with a Q&A session where the audience asks about the capabilities and integration of the AI assistant and Experience Insights, the limitations of the ThousandEyes integration, and the possibility of integration with other Cisco products like Meraki.


Cisco Event-Driven Automation with Shangxin Du

Event: Tech Field Day Extra at Cisco Live EMEA 2024

Appearance: Cisco Cloud Networking Presents at Tech Field Day Extra at Cisco Live EMEA

Company: Cisco

Video Links:

Personnel: Shangxin Du

Shangxin Du, a technical marketing engineer from Cisco’s data center switching team, discusses Event-Driven Automation (EDA) in network operations. EDA is a method that automates network configuration changes in response to specific events, aiming to streamline repetitive tasks and mitigate risks during network incidents.

Initially, Shangxin outlines how customers currently manage network configuration, using tools like Ansible, Terraform, Python, or SSH to automate tasks individually or through controllers like Cisco’s ACI for more centralized management. He also touches on the concept of Infrastructure as Code (IaC) and CI/CD pipelines for more integrated change management.

Next, he discusses network observability, emphasizing the importance of monitoring the network for operational data, which is vital for understanding the network’s real-time status. He explains how Cisco’s NX-OS supports streaming telemetry, and how ACI uses a centralized controller (APIC) to manage configurations and operational data.

Shangxin then introduces the concept of Event-Driven Automation, which combines configuration automation with monitoring to automatically respond to network events. This can help in automating low-risk repetitive tasks, remediating incidents, and enriching support tickets with relevant data for quicker resolution.

He provides a demonstration of EDA using Ansible Rulebooks, which define sources, rules, and actions based on network events (a minimal rule-engine sketch in Python follows the list). The demo includes two use cases:

  1. Auto-segmentation in ACI, where endpoints are automatically moved to the correct Endpoint Group (EPG) based on MAC address mapping.
  2. Auto-remediation on NX-OS, where a leaf switch is removed from the forwarding path if multiple uplinks go down, to prevent it from affecting network traffic.
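
Real Ansible Rulebooks are YAML documents executed by ansible-rulebook, but the control flow they encode is easy to see in a few lines of Python. The sketch below mimics the two demo use cases with hypothetical event shapes and print statements standing in for the actual ACI/NX-OS API calls.

```python
# Python mimic of the rulebook control flow (illustrative only; real Ansible
# Rulebooks are YAML run by ansible-rulebook, and the actions here are print
# stand-ins for the actual ACI / NX-OS API calls).
EPG_MAP = {"aa:bb:cc:dd:ee:ff": "web-epg"}  # hypothetical MAC -> EPG mapping

RULES = [
    {   # use case 1: auto-segmentation in ACI
        "match": lambda e: e["type"] == "endpoint_attach",
        "action": lambda e: print(
            f"move {e['mac']} to EPG {EPG_MAP.get(e['mac'], 'quarantine')}"),
    },
    {   # use case 2: auto-remediation when uplinks fail
        "match": lambda e: e["type"] == "uplink_down" and e["links_down"] >= 2,
        "action": lambda e: print(
            f"remove leaf {e['switch']} from the forwarding path"),
    },
]

def handle(event):
    """Run every matching rule's action for an incoming event."""
    for rule in RULES:
        if rule["match"](event):
            rule["action"](event)

handle({"type": "endpoint_attach", "mac": "aa:bb:cc:dd:ee:ff"})
handle({"type": "uplink_down", "switch": "leaf-101", "links_down": 2})
```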

Shangxin concludes that EDA offers limitless possibilities, allowing any source of events to trigger any automation response, depending on the rules defined. He also answers a question about the possibility of implementing a low-code solution for EDA in the Nexus world, similar to what’s available in other Cisco solutions like DNA Center. He suggests that while it’s a good idea, the current approach is to use existing tools and infrastructure for automation due to the diversity of customer preferences and practices.