Architecture
Each Compute or I/O node is a single ASIC with associated DRAM chips. The ASIC integrates two 700 MHz PowerPC 440 embedded processors, each with a double-pipeline, double-precision floating-point unit (FPU); a cache subsystem with a built-in DRAM controller; and the logic to support multiple communication subsystems. The dual FPUs give each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS. The two CPUs of a node are not cache coherent with one another.
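The 5.6 GFLOPS figure can be reproduced with a short calculation. The sketch below is illustrative only: the core count and clock rate come from the text, while the value of 4 floating-point operations per cycle per core is an assumption (one fused multiply-add per FPU pipeline per cycle), and the function and constant names are hypothetical.

```python
# Figures taken from the text: two cores per node, each clocked at 700 MHz.
CORES_PER_NODE = 2
CLOCK_HZ = 700e6

# Assumption (not stated in the text): each double-pipeline FPU completes one
# fused multiply-add per pipeline per cycle, i.e. 2 pipelines x 2 FLOPs = 4.
FLOPS_PER_CYCLE_PER_CORE = 4


def node_peak_gflops(cores=CORES_PER_NODE,
                     clock_hz=CLOCK_HZ,
                     flops_per_cycle=FLOPS_PER_CYCLE_PER_CORE):
    """Return the theoretical peak of one node in GFLOPS."""
    return cores * clock_hz * flops_per_cycle / 1e9


if __name__ == "__main__":
    print(f"Peak per node: {node_peak_gflops():.1f} GFLOPS")
```

Under that assumption the product 2 × 700 MHz × 4 matches the quoted peak; sustained performance on real workloads would be lower.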
Because all essential subsystems are integrated on a single chip, each Compute or I/O node dissipates little power (about 17 watts, including DRAMs). This allows very aggressive packaging of up to 1024 Compute nodes, plus additional I/O nodes, in a standard 19-inch cabinet, within reasonable limits of electrical power supply and air cooling. The resulting performance metrics (FLOPS per watt, FLOPS per m² of floor space, and FLOPS per unit cost) allow scaling up to very high performance.
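To make the packaging argument concrete, the per-node figures can be scaled to a full cabinet. This is a rough sketch using only the numbers quoted in the text (5.6 GFLOPS and about 17 W per node, 1024 Compute nodes per cabinet). It deliberately ignores the additional I/O nodes, power-conversion losses and cooling overhead, so the results are lower bounds on power and upper bounds on efficiency; all names are hypothetical.

```python
# Per-node figures quoted in the text.
NODE_PEAK_GFLOPS = 5.6
NODE_POWER_W = 17.0        # approximate, includes DRAMs
NODES_PER_CABINET = 1024   # Compute nodes only; I/O nodes are ignored here


def cabinet_estimate(nodes=NODES_PER_CABINET,
                     node_gflops=NODE_PEAK_GFLOPS,
                     node_power_w=NODE_POWER_W):
    """Return rough cabinet-level peak, power and efficiency estimates."""
    return {
        "peak_tflops": nodes * node_gflops / 1000.0,
        "power_kw": nodes * node_power_w / 1000.0,
        "gflops_per_watt": node_gflops / node_power_w,
    }


if __name__ == "__main__":
    for name, value in cabinet_estimate().items():
        print(f"{name}: {value:.3f}")
```

With these inputs a cabinet comes to roughly 5.7 TFLOPS of theoretical peak at about 17.4 kW for the nodes alone, or about 0.33 GFLOPS per watt.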
Each Blue Gene/L node is attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication, and a global interrupt network for fast barriers. The I/O nodes, which run the Linux operating system, provide communication with the outside world via an Ethernet network. Finally, a separate, private Ethernet network provides access to any node for configuration, booting and diagnostics.
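The defining property of a 3D toroidal network is that each node has six nearest neighbours, with links wrapping around at the edges of every dimension. The following minimal sketch illustrates that addressing scheme; it is not Blue Gene/L system software, and the 8 × 8 × 16 dimensions are a hypothetical layout chosen only because they multiply to 1024 nodes.

```python
def torus_neighbors(coord, dims):
    """Return the six nearest neighbours of `coord` in a 3D torus.

    The modulo operation implements the wraparound links that distinguish
    a torus from a plain 3D mesh.
    """
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a, b, dims):
    """Return the minimum hop count between nodes `a` and `b`.

    In each dimension a message can travel in either direction, so the
    distance is the shorter of the direct path and the wraparound path.
    """
    return sum(min(abs(p - q), n - abs(p - q))
               for p, q, n in zip(a, b, dims))


if __name__ == "__main__":
    dims = (8, 8, 16)  # hypothetical layout, 8 * 8 * 16 = 1024 nodes
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (7, 0, 0), dims))
```

Because of the wraparound, the corner node (0, 0, 0) is one hop from (7, 0, 0) rather than seven, which halves the worst-case distance in each dimension compared with a mesh of the same size.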
Two other networks are also available: a Gigabit Ethernet network connecting the compute and I/O nodes, and a JTAG network used for booting, control and monitoring.