HighPoint’s Rocket 7638D Boosts AI with GPU-Storage Link

I’m thrilled to sit down with Chloe Maraina, a Business Intelligence expert with a deep passion for big data and a visionary approach to data management and integration. With her extensive background in data science, Chloe brings a unique perspective to the intersection of cutting-edge hardware and AI workloads. Today, we’re diving into the innovative world of GPU and storage interconnection, focusing on a groundbreaking solution that’s accelerating AI and machine learning tasks. Our conversation explores how direct connections between GPUs and storage are transforming data processing, the technology behind bypassing traditional bottlenecks, and the impact on performance for large-scale datasets. Let’s get started.

Can you start by explaining what this new PCIe 5.0 switch card is and why it stands out in the realm of AI and machine learning hardware?

Absolutely. This PCIe 5.0 switch card, HighPoint’s Rocket 7638D, is a game-changer in how we handle data flow for AI and machine learning workloads. It’s designed to create a direct link between Nvidia AI GPUs and NVMe storage, cutting out the middlemen: the CPU and system memory. What makes it unique is its ability to leverage high-speed PCIe 5.0 technology to facilitate this direct connection, which drastically reduces latency and boosts efficiency. It’s tailored for environments where massive datasets need to be processed quickly, something that’s critical for AI training and inference tasks.

How does this direct connection between GPUs and NVMe storage actually function with a card like this?

The magic lies in a feature called GPUDirect Storage, part of Nvidia’s broader GPUDirect family of technologies, which reached general availability around the Ampere generation of data-center GPUs. With this card, data can move straight from NVMe storage to the GPU without being routed through the CPU. It uses PCIe 5.0 lanes and specific hardware, namely a compatible switch, to enable peer-to-peer direct memory access, or P2P DMA. This means the GPU can pull data directly from storage as if they’re speaking the same language, streamlining the entire process and avoiding unnecessary detours through other system components.
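
To make that data path concrete, here’s a minimal sketch in C using Nvidia’s cuFile API from libcufile, the programming interface for GPUDirect Storage. The file path and transfer size are hypothetical, and it assumes a GDS-capable Linux host with the nvidia-fs driver loaded; treat it as a sketch of the workflow, not a production-ready reader.

```c
// Minimal GPUDirect Storage read: NVMe -> PCIe switch -> GPU memory,
// with no CPU bounce buffer. Compile (roughly): nvcc -o gds_read gds_read.c -lcufile
#define _GNU_SOURCE                                // for O_DIRECT
#include <cufile.h>
#include <cuda_runtime.h>
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>

int main(void) {
    const size_t size = 1 << 20;                   // 1 MiB transfer, for illustration
    const char *path = "/mnt/nvme/dataset.bin";    // hypothetical dataset file

    cuFileDriverOpen();                            // initialize the GDS driver

    int fd = open(path, O_RDONLY | O_DIRECT);      // O_DIRECT bypasses the page cache
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr = {0};
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    descr.handle.fd = fd;

    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);             // register the file with cuFile

    void *dev_buf;
    cudaMalloc(&dev_buf, size);                    // destination lives in GPU memory
    cuFileBufRegister(dev_buf, size, 0);           // register it for direct DMA

    // The actual direct transfer: storage writes straight into GPU memory.
    ssize_t n = cuFileRead(fh, dev_buf, size, /*file_offset=*/0, /*buf_offset=*/0);
    printf("read %zd bytes directly into GPU memory\n", n);

    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    cuFileHandleDeregister(fh);
    cuFileDriverClose();
    return 0;
}
```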

What are the biggest advantages of bypassing the CPU and system memory in data processing for AI workloads?

Bypassing the CPU and memory offers a couple of huge benefits. First, it slashes latency—data gets to the GPU much faster since it’s not waiting for the CPU to process or buffer it. This is critical for time-sensitive tasks like real-time inference. Second, it offloads a ton of work from the CPU and system memory, freeing them up for other tasks. This not only improves overall system efficiency but also prevents bottlenecks, especially when you’re dealing with enormous datasets that would otherwise bog down traditional setups.
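
For contrast, here’s roughly what the conventional, CPU-mediated path looks like in CUDA terms. The buffer handling is illustrative, but it shows the two hops and the staging copy that a direct GPU-storage link removes.

```c
// Conventional read path: NVMe -> system RAM (CPU-managed) -> GPU over PCIe.
// Every batch costs a staging buffer, an extra copy, and CPU cycles.
#include <cuda_runtime.h>
#include <unistd.h>

void bounce_buffer_read(int fd, void *dev_buf, size_t size, off_t offset) {
    void *host_buf = NULL;
    cudaMallocHost(&host_buf, size);         // pinned staging buffer in system RAM
    pread(fd, host_buf, size, offset);       // hop 1: NVMe -> system memory
    cudaMemcpy(dev_buf, host_buf, size,      // hop 2: system memory -> GPU
               cudaMemcpyHostToDevice);
    cudaFreeHost(host_buf);                  // the staging buffer was pure overhead
}
```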

Could you dive deeper into Nvidia’s GPUDirect technology and how a card like this supports its capabilities?

Sure. GPUDirect is a family of technologies Nvidia rolled out to enable direct data transfers between GPUs and other devices; the flavor that matters here, GPUDirect Storage, became generally available around the Ampere era. It’s a big deal for AI workloads because it eliminates the overhead of CPU involvement, which can be a major slowdown with large data volumes. A card like the Rocket 7638D supports this by using PCIe Gen 5 for ultra-high bandwidth and incorporating hardware that allows P2P DMA. This ensures the GPU and storage can communicate directly, which is essential for maintaining speed and predictability in data-intensive applications.

The card uses a specific Broadcom switch. Can you explain why this component is critical to its functionality?

The Broadcom PEX 89048 switch is a key piece of the puzzle because not all PCIe Gen 5 switches support the P2P DMA capability required for GPUDirect. This particular switch is designed to handle that direct communication between devices, making it possible to build systems that fully utilize GPUDirect workflows. For system integrators, this means they can confidently assemble high-performance setups knowing the hardware foundation supports these advanced features without compatibility hiccups.
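
On the software side, an integrator can at least probe whether the GDS stack is usable before deploying. Here’s a hedged sketch assuming the CUDA toolkit’s cufile.h is installed; a successful open confirms the driver stack is present, though verifying the full P2P path through the switch remains a platform-level exercise (Nvidia ships a gdscheck utility for that).

```c
// Probe for GPUDirect Storage availability via the cuFile driver.
#include <cufile.h>
#include <stdio.h>

int main(void) {
    CUfileError_t status = cuFileDriverOpen();
    if (status.err != CU_FILE_SUCCESS) {
        // GDS unavailable: cuFile would fall back to CPU bounce-buffer compat mode
        printf("GPUDirect Storage not usable here (err=%d)\n", status.err);
        return 1;
    }
    printf("GDS driver opened; direct NVMe-to-GPU transfers are possible\n");
    cuFileDriverClose();
    return 0;
}
```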

What kind of storage performance and capacity can users expect from an adapter like this?

Users can expect some seriously impressive numbers. The card can support up to 16 NVMe drives, which translates to a potential storage capacity of up to 2 petabytes. That’s massive, especially for AI training datasets. On top of that, it delivers bandwidth up to 64 GB/s, which in real-world terms means lightning-fast data transfers. For applications like model training or real-time data processing, this kind of performance ensures that GPUs aren’t sitting idle waiting for data—they’re getting fed at full speed.
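
Those headline figures are easy to sanity-check. Assuming the highest-capacity NVMe SSDs shipping today (about 122.88 TB each) and a PCIe 5.0 x16 link running at 32 GT/s per lane with 128b/130b encoding:

$$16 \times 122.88\,\text{TB} \approx 1.97\,\text{PB}, \qquad 16 \times \frac{32\,\text{GT/s} \times \tfrac{128}{130}}{8\,\text{bits/byte}} \approx 63\,\text{GB/s},$$

which lines up with the quoted 2 PB and 64 GB/s ceilings (with the usual caveat that real-world throughput lands a bit below the raw link rate).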

How does this card manage to support multiple GPU nodes without compromising on storage performance?

It’s designed with dedicated bandwidth allocation in mind. The card has 48 PCIe 5.0 lanes, with a portion reserved for NVMe storage and the rest for connectivity to multiple GPUs. This setup ensures that each GPU gets its own slice of high-speed access without stepping on the toes of storage performance. Scenarios like distributed AI training or multi-GPU rendering benefit immensely from this, as it prevents any single component from becoming a bottleneck, even under heavy load.

The adapter is described as especially useful for large-scale training datasets. Can you elaborate on how it enhances machine learning workflows?

Absolutely. For machine learning, especially with large-scale training datasets, the bottleneck often comes down to how quickly you can stream data to the GPU for processing. This adapter allows datasets to flow directly from NVMe storage to GPUs without delays, cutting down total training time significantly. Beyond training, it also helps with tasks like real-time data ingestion, where immediate GPU responses are needed, and high-speed data preparation or augmentation, ensuring the entire pipeline—from raw data to model output—runs smoothly and efficiently.
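
In code, that streaming pattern is just a loop over the earlier cuFile setup. This is a hedged sketch in which train_step is a hypothetical placeholder for launching the model’s forward and backward passes on each batch.

```c
// Training ingest loop: each batch streams straight from NVMe into a
// GPU-resident buffer, so the GPU never idles on a host-side staging copy.
#include <cufile.h>
#include <sys/types.h>
#include <stddef.h>

void train_step(void *dev_batch, size_t bytes);  // hypothetical training hook

void ingest_and_train(CUfileHandle_t fh, void *dev_buf,
                      size_t batch_bytes, size_t num_batches) {
    for (size_t batch = 0; batch < num_batches; ++batch) {
        off_t offset = (off_t)(batch * batch_bytes);
        ssize_t n = cuFileRead(fh, dev_buf, batch_bytes, offset, 0);
        if (n < (ssize_t)batch_bytes) break;  // short read: end of data or an error
        train_step(dev_buf, batch_bytes);     // consume the batch on the GPU
    }
}
```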

What is your forecast for the future of direct GPU-storage interconnections in AI and machine learning?

I think we’re just scratching the surface of what’s possible with direct GPU-storage interconnections. As AI models grow larger and datasets become even more massive, the demand for solutions that eliminate latency and maximize throughput will only increase. We’ll likely see broader adoption of technologies like GPUDirect across various industries, not just AI, but also in areas like scientific computing and real-time analytics. Hardware will continue to evolve with even higher bandwidth standards and tighter integration, making these direct connections the norm rather than the exception. It’s an exciting space to watch!
