HPCN at ENEA

PQE-1 - Operating System

The operating system is based on SUN Solaris 2.5™ technology, customized to match the characteristics of the PQE1.
Its main features are:

Parallel File System (PFS);
Parallel I/O (PI/O);
Resource Management (scheduling and load balancing in multiuser operation);
Single System Image;
File System Availability;
Node Status Monitor;
Communication Network.


Parallel File System (PFS) and Parallel I/O (PI/O)
In the PFS, all operations on files are parallelized, from name management to reading and writing.
The PI/O subsystem ensures that input/output does not become a bottleneck for the operating system; in particular, it makes it possible to fully exploit the parallelism offered by the PFS.
A process checkpointing feature, integrated with the PFS and PI/O, is also provided.

Single System Image
The Single System Image feature presents the hybrid MIMD-SIMD system to both the user and the system manager as a single platform.

File System Availability
In PQE1 the filesystems are controlled by two different nodes. If one of them becomes unavailable, the other takes over for both, so the filesystems remain accessible and applications are not blocked (as used to happen on the CS2).
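
The takeover behaviour described above can be illustrated with a minimal sketch. It is not the PQE1 implementation: the class `DualControlledFileSystem`, its methods, and the node names are hypothetical, and the sketch only models the rule "serve from whichever controlling node is still available".

```python
class DualControlledFileSystem:
    """Hypothetical model of a filesystem controlled by two nodes."""

    def __init__(self, name, first_node, second_node):
        self.name = name
        # Order matters only as a preference: the first node serves when both are up.
        self.controllers = [first_node, second_node]
        self.available = {first_node: True, second_node: True}

    def mark_down(self, node):
        """Record that a controlling node has become unavailable."""
        self.available[node] = False

    def mark_up(self, node):
        """Record that a controlling node is available again."""
        self.available[node] = True

    def serving_node(self):
        """Return the node currently serving the filesystem."""
        for node in self.controllers:
            if self.available[node]:
                return node
        # Only reached when both controlling nodes are down.
        raise RuntimeError(f"{self.name}: no controlling node available")


fs = DualControlledFileSystem("/home", "node-a", "node-b")
before = fs.serving_node()   # first controller serves while both are up
fs.mark_down("node-a")
after = fs.serving_node()    # the surviving controller takes over
```

As long as one of the two controllers is up, `serving_node()` keeps returning a node, which is the property that keeps applications from blocking.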

Node Status Monitor
The Node Status Monitor can analyze the cause of a node failure and reboot the node.

Communication Network
The CS2’s interconnection network has multiple communication paths between nodes. Point-to-point communications can work effectively even when some links have problems, provided that the faulty links are identified and removed from the configuration. A problem remained, however, with broadcast communications, which use a common tree specified by a top switch. If the tree fails, a new top switch must be selected, and both the top switch selection and the map of the available nodes must be distributed to the different system kernels.
In PQE1, a network monitor automatically detects errors, and the intra-kernel communications have been extended to support the top-switch change.
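
The top-switch change described above can be sketched roughly as follows. This is an illustrative assumption, not the PQE1 network monitor: the function names, the "first healthy candidate" selection rule, and the dictionary layout of the update are all hypothetical. The sketch only shows the two pieces of information the text says must reach every kernel: the selected top switch and the map of available nodes.

```python
def select_top_switch(candidates, failed_switches):
    """Return the first candidate top switch that has not failed, or None."""
    for switch in candidates:
        if switch not in failed_switches:
            return switch
    return None


def build_broadcast_update(candidates, failed_switches, node_status):
    """Build the update that would be distributed to the system kernels.

    node_status maps a node name to True (available) or False (unavailable).
    """
    return {
        "top_switch": select_top_switch(candidates, failed_switches),
        "available_nodes": sorted(n for n, up in node_status.items() if up),
    }


# Hypothetical configuration: three candidate top switches, the first has failed.
update = build_broadcast_update(
    candidates=["sw0", "sw1", "sw2"],
    failed_switches={"sw0"},
    node_status={"n0": True, "n1": False, "n2": True},
)
```

In this sketch every kernel would receive the same `update` dictionary, so all of them agree on the broadcast tree root and on which nodes take part.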

Resource Management
The high-level scheduling subsystem partitions the processing nodes among the waiting programs, using a set of algorithms based on the length of the program queue. The scheduling algorithms also decide which partition of the SIMD nodes is associated with each MIMD node, on the basis of the computation each MIMD node is performing. To allow a set of nodes to be shared dynamically among several applications, the scheduling subsystem uses the checkpointing features provided by the PFS.
In PQE1 the Resource Manager, based on "Pandora™", is enriched with new features such as "gang schedule" and "restart".
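
The page does not specify the scheduling algorithms, so the following is only a sketch of one possible queue-length-based policy, written under stated assumptions: nodes are split as evenly as possible among the waiting programs, and a hypothetical `min_size` parameter limits how many programs can be served at once. The function name and its parameters are not taken from the PQE1 Resource Manager.

```python
def partition_nodes(nodes, queue, min_size=1):
    """Split nodes evenly among as many queued programs as can be served.

    nodes:    list of node identifiers
    queue:    list of waiting program names, in queue order
    min_size: smallest partition a program may receive (assumed parameter)
    """
    if not nodes or not queue:
        return {}
    # The longer the queue, the smaller each share, down to min_size nodes.
    served = min(len(queue), len(nodes) // min_size)
    if served == 0:
        return {}
    base, extra = divmod(len(nodes), served)
    partitions = {}
    start = 0
    for index, program in enumerate(queue[:served]):
        size = base + (1 if index < extra else 0)
        partitions[program] = nodes[start:start + size]
        start += size
    return partitions


short_queue = partition_nodes(list(range(8)), ["a", "b"])
long_queue = partition_nodes(list(range(8)), ["a", "b", "c"], min_size=4)
```

With eight nodes, a queue of two programs gives each program four nodes; with three programs and `min_size=4`, only the first two are served and the third keeps waiting. Re-partitioning running programs when the queue changes would rely on the checkpointing features mentioned above, which this sketch does not model.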

Parallel Debugger
The parallel debugger is based on "Totalview", a de facto standard among parallel debuggers.
It has been updated to work with the new version of the operating system.
 

For more details, see the Documentation and the References.
