The New Compute Ratios: Detailed calculation of the CPU, GPU, and LPU chip counts and ratios within the Vera-Rubin superpod.
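To show the shape of such a calculation, here is a minimal sketch of deriving per-pod chip ratios. The counts below are placeholders for illustration only, not actual Vera-Rubin or LPX specifications:

```python
from fractions import Fraction

# Placeholder per-pod chip counts (assumed, NOT real product specs).
cpu_count = 36    # hypothetical Vera CPUs per superpod
gpu_count = 72    # hypothetical Rubin GPUs per superpod
lpu_count = 18    # hypothetical Groq LPX chips per superpod

# Exact ratios, kept as fractions rather than floats.
gpu_per_cpu = Fraction(gpu_count, cpu_count)
lpu_per_gpu = Fraction(lpu_count, gpu_count)

print(f"GPU:CPU ratio = {gpu_per_cpu}")   # with these placeholders: 2
print(f"LPU:GPU ratio = {lpu_per_gpu}")   # with these placeholders: 1/4
```

The same two-line ratio arithmetic applies once real counts are substituted.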
Disaggregated Decoding: Explanation of how the Groq LPX chip is specialized for the Feed-Forward Network (FFN) computation of the decode phase, separating it from the attention calculation performed on the Rubin GPUs.
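The split above can be sketched in a toy decode step, with the two devices simulated as plain functions: attention runs where the KV cache lives (the GPU side), and the weight-heavy FFN runs on the LPU side. Shapes, function names, and the residual wiring are illustrative assumptions, not a real Vera-Rubin or Groq API:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16        # model (hidden) dimension, illustrative
D_FF = 64     # FFN inner dimension, illustrative
W_up = rng.standard_normal((D, D_FF)) * 0.1
W_down = rng.standard_normal((D_FF, D)) * 0.1

def gpu_attention(q, kv_cache):
    # GPU side: KV-cache bound. Score the query against cached keys and
    # return the attention-weighted sum of cached values.
    keys, values = kv_cache
    scores = keys @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

def lpu_ffn(x):
    # LPU side: weight-streaming bound. Two dense matmuls with a ReLU,
    # the part of decode dominated by large weight matrices.
    return np.maximum(x @ W_up, 0.0) @ W_down

def decode_step(q, kv_cache):
    # One disaggregated decode step: the attention output is shipped from
    # the GPU side to the LPU side for the FFN (the interconnect transfer
    # is omitted in this sketch), then added back as a residual.
    attn_out = gpu_attention(q, kv_cache)
    return attn_out + lpu_ffn(attn_out)

# usage: a single decode step over a cache of 8 past tokens
keys = rng.standard_normal((8, D))
values = rng.standard_normal((8, D))
q = rng.standard_normal(D)
out = decode_step(q, (keys, values))
```

The point of the split is that the two halves stress different hardware resources, so each can run on the device best suited to it.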
Inference Speed Frontier: Analysis of how the addition of LPX racks enables new low-latency benchmarks, advertised at over 1,000 tokens/sec per user.
AI Infrastructure Deployment: Guidance on selecting between Vera-Rubin NVL72 (high throughput) and adding LPX (low latency/interactivity) based on workload.
Speculative Decoding (SpecDec): Analysis of a largely unaddressed opportunity — how LPX's massive SRAM bandwidth is well suited to accelerating SpecDec by running smaller "draft" models.
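The draft-and-verify loop at the heart of SpecDec can be sketched with deterministic toy stand-ins for the two models. Everything here — the tiny vocabulary, the model functions, the acceptance rule — is illustrative; it shows only the control flow, not any Groq or NVIDIA API:

```python
VOCAB_SIZE = 8  # toy vocabulary, illustrative only

def draft_model(context):
    # Deterministic stand-in for a small, fast draft model
    # (in practice this is where LPX's SRAM bandwidth would help).
    return (sum(context) * 7 + 3) % VOCAB_SIZE

def target_model(context):
    # Deterministic stand-in for the large target model; it mostly agrees
    # with the draft, so some speculated tokens get accepted.
    if sum(context) % 3:
        return (sum(context) * 7 + 3) % VOCAB_SIZE
    return (sum(context) + 1) % VOCAB_SIZE

def speculative_step(context, k=4):
    # 1) Draft model autoregressively proposes k tokens (cheap, serial).
    draft, ctx = [], list(context)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    # 2) Target model checks each position (a single batched pass on real
    #    hardware), accepting the longest agreeing prefix and taking its
    #    own token at the first disagreement.
    accepted, ctx = [], list(context)
    for t in draft:
        expected = target_model(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    return accepted

print(speculative_step([1], 4))     # [2, 4]: one draft token accepted
print(speculative_step([1, 2], 4))  # [4]: draft rejected immediately
```

Each step emits between 1 and k+1 tokens for one target-model pass, which is the source of the speedup when the draft model agrees often.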
The VLIW Debate: Examination of whether Groq's Very Long Instruction Word (VLIW) architecture is a core differentiator, concluding that its main benefit is deterministic execution that guarantees performance for customer Service Level Agreements (SLAs).