Researchers in South Korea have introduced OmniXtend, a technology designed to relieve the memory shortages that arise when training large-scale artificial intelligence (AI) models. The architecture uses standard Ethernet networks to combine the memory and accelerators of physically separate servers into a single large pool.
As AI models grow, the volume of data needed to train them grows rapidly as well. Developers face a significant challenge: even the most powerful graphics processing units (GPUs) quickly run out of built-in memory. This constraint, known as the “memory wall,” severely limits the efficiency of AI workloads.
Traditionally, expanding memory capacity required the purchase and installation of new, expensive servers. OmniXtend transforms this approach by allowing memory to be shared between computers over a standard Ethernet network, creating a unified virtual space.
One of the key advantages of this new technology is its ability to overcome the limitations of conventional server connections, such as those using PCIe interfaces, which cannot link devices over long distances. By employing standard Ethernet, OmniXtend enables the connection of multiple physically distant devices.
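The idea of a unified space built from memory on separate machines can be illustrated with a toy address map. The Python sketch below is not OmniXtend itself, which runs on dedicated hardware. Under assumed names (`Region`, `MemoryPool`, `locate`, and the server labels are all hypothetical), it only shows how a single flat address range could be resolved to a particular machine and a local offset.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    node: str  # hypothetical label for the machine that owns this memory
    size: int  # number of bytes the machine contributes to the pool


class MemoryPool:
    """Toy model: lays per-machine regions end to end in one flat address range."""

    def __init__(self, regions):
        self.regions = list(regions)
        self.starts = []  # first pooled address of each region
        total = 0
        for region in self.regions:
            self.starts.append(total)
            total += region.size
        self.total = total  # overall pool capacity in bytes

    def locate(self, addr):
        """Map a pooled address to (owning machine, offset inside that machine)."""
        if not 0 <= addr < self.total:
            raise ValueError(f"address {addr} is outside the pool")
        index = bisect_right(self.starts, addr) - 1
        return self.regions[index].node, addr - self.starts[index]


# Two machines contribute 64 and 32 bytes; the pool exposes 96 bytes in total.
pool = MemoryPool([Region("server-a", 64), Region("server-b", 32)])
print(pool.total)       # 96
print(pool.locate(70))  # ('server-b', 6)
```

In this simplified picture, software sees one 96-byte range, and the lookup decides which machine an access would have to reach over the network.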
During testing with large language models, the technology demonstrated significant benefits:
- Cost Savings: Memory expansion for AI can now occur without the need for server replacements or data center overhauls;
- Increased Speed: Performance dropped sharply when the neural networks ran short of memory. Once additional memory was attached over Ethernet, performance more than doubled compared with the memory-constrained runs;
- Operational Stability: Specialized boards and data transmission engines allowed computers to exchange information in real time without substantial delays.
South Korean scientists have already showcased a working system at major technology summits, including RISC-V events in Europe and the United States. They are currently leading a dedicated working group under the Linux Foundation to establish OmniXtend as an open global standard for configuring AI networks.
Looking ahead, the researchers plan to transfer this technology to companies that manufacture hardware and software for data centers. Additionally, there are intentions to adapt the system for use in onboard computers for vehicles and ships, as well as to optimize collaboration among various types of processors.
In short, OmniXtend aims to ease memory limitations in AI training by letting machines share memory over standard Ethernet networks. The researchers report cost savings, higher processing speed, and stable operation, and they plan to bring the technology to a broader range of industries.
