October 2005

apeNEXT:

Excellence in custom HPCs


One of the most powerful HPC platforms in the world, apeNEXT is a generation of custom supercomputers capable of delivering application-specific solutions for the most demanding problems.

Perfectly suited to lattice and mesh computation, apeNEXT enables the direct, switchless interconnection of thousands of programmable computational units. Because there is no central switch to saturate, computational power can grow with the number of units without the network becoming a bottleneck.

apeNEXT is the result of a long-standing collaboration between leading European laboratories (INFN, Italy; DESY, Germany; University of Paris-Sud, France) and Exadron, the HPC division of the Eurotech group.

From its inception, the APE collaboration aimed to develop and deliver a very high-performance, special-purpose architecture for Quantum Chromodynamics (QCD), traditionally one of the hardest and most computationally intensive challenges.

Since then, the APE supercomputers have evolved into extremely dense, cost-effective platforms for computational problem solving in physics and engineering.

apeNEXT architecture

apeNEXT is highly modular: its basic element is a 16-way motherboard carrying 16 processing nodes, on which memory, computational units and communication links are tightly coupled.

Up to 16 motherboards fit into one crate, and up to two crates can be installed in one rack.
A full rack therefore provides 512 custom computational units, each directly connected to its neighbours in a user-configurable topology.
Any number of racks can be added simply by connecting them together.
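The rack figures above can be checked with a short back-of-the-envelope calculation. Combining the node count with the per-node figure of approximately 1.6 GFlops given in the specifications yields a rough aggregate peak per rack. This is only an upper-bound estimate that assumes every node runs at its nominal rate; it is not a measured or sustained application figure.

```latex
\begin{aligned}
N_{\text{rack}} &= 2\ \text{crates} \times 16\ \tfrac{\text{boards}}{\text{crate}} \times 16\ \tfrac{\text{nodes}}{\text{board}} = 512\ \text{nodes},\\
P_{\text{rack}} &\approx 512 \times 1.6\ \text{GFlops} \approx 819\ \text{GFlops} \approx 0.8\ \text{TFlops}.
\end{aligned}
```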


apeNEXT specifications

  • Array of independent processing nodes;
  • Each node is a complete single-chip processor with its own private memory banks (256 MB to 1 GB);
  • Peak double-precision floating-point performance of approximately 1.6 GFlops per node;
  • VLIW-based control structure;
  • Software-controlled program cache;
  • Prefetch queues for local and remote memory access;
  • Six bidirectional links for low-latency, high-bandwidth data communication;
  • 3D torus topology;
  • Additional "7th" link for fast I/O;
  • Low-latency global tree for synchronization;
  • Serial network for slow control, bootstrap and debugging;
  • Hosted by a cluster of Linux PCs;
  • Programmable in TAO and in C.

Proven solutions for:

  • Quantum Chromodynamics
  • High Energy and Nuclear Physics
