Architecture of JUMP-1

Concept

In traditional CC-NUMA machines such as DASH/FLASH, Alewife and NUMA-Q, the distributed shared memory (DSM) is managed in units of a cache line, and data from other clusters is copied into a cache attached to each processor. The consistency protocol is a simple invalidate policy, the interconnection network is a simple mesh or ring, and the directory scheme is based on one-to-one data transfer.

Although such a mechanism works efficiently in systems with a limited number of processors, it is not suitable for a system with thousands of processors. For example, a large amount of memory is required for the caches and the directory. In addition, an invalidate policy based on one-to-one data transfer often causes network congestion when many processors share the same data.
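The directory cost mentioned above can be made concrete with a small calculation. The sketch below assumes a hypothetical full-map directory (one presence bit per node for every cache line) and a 32-byte line; neither figure comes from this page, and real machines use various directory formats, so the numbers only illustrate the trend.

```python
def full_map_directory_overhead(num_nodes: int, line_bytes: int) -> float:
    """Ratio of directory bits to data bits for a full-map directory.

    Assumption (not from the JUMP-1 description): every cache line keeps
    one presence bit per node, and no other state bits are counted.
    """
    if num_nodes <= 0 or line_bytes <= 0:
        raise ValueError("num_nodes and line_bytes must be positive")
    data_bits = line_bytes * 8
    return num_nodes / data_bits


if __name__ == "__main__":
    # Hypothetical 32-byte cache line; node counts chosen for illustration.
    for nodes in (16, 256, 1024, 4096):
        ratio = full_map_directory_overhead(nodes, 32)
        print(f"{nodes:5d} nodes: directory is {ratio:.4f}x the data size")
```

Under these assumptions the directory is as large as the data itself at 256 nodes and many times larger at thousands of nodes, which is the scaling problem the paragraph describes.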

In order to address these problems, the following methods are used in JUMP-1.

Structure of JUMP-1

Overview

JUMP-1 uses SuperSPARC+ processors as its element processors. Each cluster board has four SuperSPARC+ processors, 16 MB of SDRAM, and MBP-light, a custom processor which manages the distributed shared memory.

Clusters are connected by the interconnection network, RDT (Recursive Diagonal Torus).

Structure of Cluster Boards

This picture shows a cluster board of JUMP-1. It has 4 PEs, L2 cache controllers, cluster bus chips, MBIF (maintenance bus interface), STAFF-Link interface, cluster memory and so on.

Each SuperSPARC+ is connected to the cluster bus via an L2 cache controller. The cluster bus is 64 bits wide, and requests sent on the cluster bus are processed by MBP-light. MBP-light manages the distributed shared memory, STAFF-Link (parallel I/O), MBIF, the RDT router, etc.

Interconnection Network

The interconnection network of JUMP-1 is RDT (Recursive Diagonal Torus). To manage distributed shared memory, it is important to support an efficient multicast mechanism. RDT is suitable for multicasting because it includes both torus and fat-tree structures.
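To see why multicast matters for shared-memory management, the following toy model compares one-to-one delivery with tree-style multicast. It is only a counting sketch: the function names and the fanout parameter are hypothetical, and it does not model the actual RDT multicast or the JUMP-1 protocol.

```python
def unicast_source_messages(sharers: int) -> int:
    """One-to-one delivery: the source sends one message per sharer."""
    return sharers


def tree_multicast_levels(sharers: int, fanout: int) -> int:
    """Forwarding levels needed when every reached node forwards to
    `fanout` further nodes (hypothetical tree multicast).

    After d levels, fanout + fanout**2 + ... + fanout**d nodes are reached.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    reached, level_width, levels = 0, 1, 0
    while reached < sharers:
        level_width *= fanout      # nodes newly reached at this level
        reached += level_width
        levels += 1
    return levels


if __name__ == "__main__":
    sharers, fanout = 1000, 4      # illustrative values only
    print("one-to-one: source sends", unicast_source_messages(sharers), "messages")
    print("multicast : source sends", fanout, "messages over",
          tree_multicast_levels(sharers, fanout), "forwarding levels")
```

In this model the source of a one-to-one scheme injects a message for every sharer, while with multicast it injects only `fanout` messages and the remaining copies are made inside the network.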

In a massively parallel system with a large number of nodes, it is also important to keep the diameter of the interconnection network small. RDT has a simple structure, yet it keeps the diameter small.
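As a baseline for the diameter concern, the sketch below measures the diameter of a plain k x k 2D torus by breadth-first search. It deliberately models only the ordinary torus, not RDT itself; the additional links that RDT uses to shorten paths are not described here and are not represented in the code.

```python
from collections import deque


def torus_diameter(k: int) -> int:
    """Diameter (in hops) of a plain k x k 2D torus.

    A torus looks the same from every node, so the largest BFS distance
    from node (0, 0) equals the diameter of the whole network.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = ((x + dx) % k, (y + dy) % k)   # wrap-around links
            if nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return max(dist.values())


if __name__ == "__main__":
    for k in (4, 8, 16, 32):
        print(f"{k * k:5d} nodes ({k} x {k} torus): diameter {torus_diameter(k)}")
```

The diameter of the plain torus grows linearly with k, that is, with the square root of the node count, which is why a network for thousands of nodes needs extra structure to keep worst-case distances short.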

Implementation of JUMP-1

MBP-light (MBP=Memory Based Processor)

MBP-light is an ASIC with a 4-stage pipelined 16-bit RISC core. It is connected to the cluster bus and to the interconnection network, RDT. MBP-light is the heart of JUMP-1: it manages the distributed shared memory, I/O (STAFF-Link), system monitoring (MBIF) and so on.

Overview of MBP-light
352-pin TBGA / 0.4um embedded array
Random logic: 106,905 gates
Internal memory: 44,848 bits

RDT (Recursive Diagonal Torus)

The RDT router was developed in our laboratory. It has both CMOS and ECL logic to drive RDT directly.

Overview of RDT Router Chip
299-pin CMOS SOG, 0.5um
125k gates, ECL device
Bi-CMOS gate 0.11ns typ.
CMOS gate 0.06ns typ.

JUMP-1 Group
Last modified: Mon Oct 15 14:18:09 JST 2001