We provide customers with a range of communication products and services, combining reasonable prices with high quality.
Training scenario: Huawei breaks global record with 698 GiB/s performance
In the 3D U-Net training test, which has the highest storage bandwidth requirements, Huawei OceanStor A-series storage achieved three global firsts while keeping GPU utilization above 90%.
The OceanStor A800, in a single 8U dual-node configuration, can support training for 255 H100 GPUs while sustaining a stable 698 GiB/s of bandwidth.
The OceanStor A600, in a single 2U dual-node configuration, can support training for 76 H100 GPUs, delivering 108 GiB/s per U and 104 GiB/s per client.
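As a rough sanity check of these figures, the per-GPU and per-U bandwidths can be derived directly from the reported numbers. The per-GPU values below are our own arithmetic, not officially published results:

```python
# Derived bandwidth figures from the reported MLPerf Storage results.
# Per-GPU numbers are back-of-the-envelope arithmetic, not official data.

# OceanStor A800: single 8U dual-node system, 255 H100 GPUs, 698 GiB/s sustained
a800_bw, a800_gpus, a800_rack_units = 698, 255, 8
a800_per_gpu = a800_bw / a800_gpus      # ~2.7 GiB/s sustained per GPU
a800_per_u = a800_bw / a800_rack_units  # ~87 GiB/s per rack unit

# OceanStor A600: single 2U dual-node system, 76 H100 GPUs, 108 GiB/s per U
a600_per_u, a600_rack_units, a600_gpus = 108, 2, 76
a600_total = a600_per_u * a600_rack_units  # 216 GiB/s for the 2U system
a600_per_gpu = a600_total / a600_gpus      # ~2.8 GiB/s sustained per GPU

print(f"A800: {a800_per_gpu:.2f} GiB/s per GPU, {a800_per_u:.1f} GiB/s per U")
print(f"A600: {a600_per_gpu:.2f} GiB/s per GPU")
```

Both systems land in the same ballpark of roughly 2.7-2.8 GiB/s of sustained read bandwidth per H100, which is consistent with the 3D U-Net workload saturating storage rather than compute.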
Checkpointing scenario: OceanStor A-series storage outperforms the second-place result by 6.7x
In the checkpointing test, which simulates a single 8-GPU server with 8-way concurrency, the Huawei OceanStor A series delivered outstanding performance.
In the Llama 3 8B scenario, single-client read and write bandwidth reached 40.2 GiB/s and 20.5 GiB/s respectively, ranking first.
In the Llama 3 70B scenario, single-client read bandwidth reached 68.8 GiB/s and write bandwidth reached 62.4 GiB/s, ranking first and leading the second-place result by 6.7x.
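To put the write bandwidths in perspective, here is a back-of-the-envelope estimate of how long a weights-only checkpoint would take at these rates. The parameter counts and the bf16 (2 bytes per parameter) assumption are illustrative and not part of the benchmark; real training checkpoints that include optimizer state are several times larger:

```python
# Rough checkpoint-write-time estimate from the reported write bandwidths.
# Model sizes and bf16 (2 bytes/param) are illustrative assumptions;
# optimizer state would make a real checkpoint substantially larger.

GIB = 1024 ** 3  # bytes per GiB

def checkpoint_seconds(params: float, write_gib_s: float,
                       bytes_per_param: int = 2) -> float:
    """Time to write a weights-only checkpoint at a given bandwidth."""
    return params * bytes_per_param / (write_gib_s * GIB)

# Reported single-client write bandwidths: 20.5 GiB/s (8B), 62.4 GiB/s (70B)
t_8b = checkpoint_seconds(8e9, 20.5)    # under a second
t_70b = checkpoint_seconds(70e9, 62.4)  # a couple of seconds

print(f"Llama 3 8B:  ~{t_8b:.1f} s")
print(f"Llama 3 70B: ~{t_70b:.1f} s")
```

Under these assumptions a 70B weights-only checkpoint (about 130 GiB in bf16) would drain in roughly two seconds at 62.4 GiB/s, which is why checkpoint write bandwidth matters for minimizing training stalls.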
Huawei OceanStor A-series storage continues to innovate, accelerating the deployment of large-model applications
To meet growing compute demands, Huawei OceanStor A-series storage scales performance linearly with the number of clients and storage nodes through multi-dimensional technological innovation. It can deliver stable cluster bandwidth of up to 100 TB/s, efficiently supporting large-scale training data access and accelerating the full pipeline of large-model training and inference.
The series also offers excellent scalability: clusters scale out horizontally to EB-level capacity to meet massive data storage needs. For data resilience, architectural innovation delivers 99.999% reliability. An innovative data paradigm built on a PB-scale globally shared KV Cache resource pool reduces time to first token (TTFT) by up to 90% while preserving inference accuracy, and raises inference throughput by more than 10x in long-sequence scenarios, significantly improving the inference experience. In addition, a built-in RAG knowledge base supports multimodal retrieval across scalars, vectors, tensors, graphs, and more, greatly lowering the barrier to using large AI models.
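To illustrate why a PB-scale shared KV Cache pool is relevant, the sketch below estimates per-token KV-cache size for a Llama-3-70B-like model. The architecture numbers (80 layers, 8 KV heads via grouped-query attention, head dimension 128, bf16) are our own assumptions for illustration, not figures from this announcement:

```python
# Estimate KV-cache footprint for a Llama-3-70B-like model.
# Architecture numbers (80 layers, 8 GQA KV heads, head dim 128, bf16)
# are illustrative assumptions, not vendor-published figures.

def kv_bytes_per_token(layers: int = 80, kv_heads: int = 8,
                       head_dim: int = 128, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache stored per generated/prefilled token."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes  # 2x for K and V

per_token = kv_bytes_per_token()               # 327,680 B = 320 KiB/token
ctx_gib = 131072 * per_token / 1024 ** 3       # one 128K-token context

print(f"{per_token} B per token, ~{ctx_gib:.0f} GiB per 128K-token context")
```

At roughly 40 GiB of KV cache per 128K-token context, a single GPU's memory holds only a handful of long contexts, which is the motivation for offloading and globally sharing prefill results in an external PB-scale pool.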
Looking ahead, Huawei OceanStor A-series storage will continue to deepen its expertise, launching leading products and solutions for HPC, AI large-model training and inference, and other scenarios, building a fully intelligent future together with customers.
Email: Lilicheng0510@163.com
Flat/Rm P, 4/F, Lladro Centre, 72 Hoi Yuen Road, Kwun Tong, Hong Kong, China