SSP 1998 Project Summary:

A Middleware Environment for Parallel Java on the Hitachi SR2201

Student

Elson Mourao, University of Coimbra

Supervisor

Hon Yau, EPCC


1. Executive Summary

This document is a proposal for a summer student project within the Hitachi-EPCC collaborative Java project for the Hitachi SR2201 distributed memory multiprocessor high-performance computer. The aim is to produce middleware which will allow the SR2201 to operate as a compute-server in a commercial enterprise environment, to which clients may submit their Java codes for improved execution. The need for such an environment comes from the strong adoption of object-orientated technology (including Java-based technologies) and the increasing prevalence of commodity PC desktop clients. In such an environment, there is assumed to be an established set of Java codes which can be executed on desktops, but which would benefit from access to a remote Java engine. Java is particularly suited to this application as it already has much of the machinery for distributed object computing, including the means to send objects to remote recipients on the network. The successful completion of this project will provide the student with a broad understanding of the current state of the art in Java-based distributed computing, and the project partners with a means of integrating the parallel Java for SR2201 work into the commercial environment.

2. Overview of the Hitachi-EPCC Collaborative Java Project for the SR2201

In today's commercial enterprise organisations, one of the next major technological waves will come from the use of distributed objects across existing client-server networks. This observation comes from the solid adoption of object-orientated technology in the commercial environment, as exemplified by languages such as C++, Java and Smalltalk, and from the need to integrate the growing number of PC desktops with existing legacy systems. The ongoing joint Hitachi Europe Limited (HEL) and EPCC project is a direct reflection of this perceived direction in computing deployment; it is an attempt to allow the execution of multi-threaded Java programs on the Hitachi SR2201 distributed memory multiprocessor high-performance computer. For the successful completion of this project, the following technological components need to be implemented:
  1. A port of the Sun Microsystems Java Virtual Machine onto the SR2201.
  2. Bindings through which Java programs can access the inter-processor communications subsystem of the SR2201. In particular, this means the production of Java bindings to the existing 'C' MPI communications library on the SR2201.
  3. The definition and implementation of a Java API library whereby programmers may write code which can run as separate threads in a distributed memory fashion over the processors of the target machine. The aim is to have this library mask the use of MPI calls from the user, hence allowing the same Java code to be executed largely unchanged on a single-processor or even a shared-memory multiprocessor machine.
  4. The definition and implementation of a middleware runtime environment whereby the user can execute Java bytecodes on a remote SR2201 compute-server from a Java-compliant desktop client.
The first three items are currently in progress within EPCC, whilst the fourth will form the kernel of this proposed summer student project.

3. Functional Requirements of the Middleware Environment

This environment has two components, one executing on the client side and one on the server side. On the server side, a lightweight server process running on the SR2201 accepts and handles requests from a client. These requests may be communicated through either remote method invocation (RMI) or sockets; the former would be more elegant but may not be technologically feasible on the target platform, whilst the latter needs slightly more elaborate code but should be more portable. This compute-server executes compiled Java programs which are either stored as a set of class libraries on its local filesystem or received remotely from authenticated clients. The following subsections group the functionalities required into three sets, with a sketch of how they would be implemented on the client and server sides; the technologies concerned are further elaborated in Section 4 below.
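
To make the socket option concrete, the following is a minimal sketch of a one-request exchange between a client and a lightweight server process. It is an illustration only, written against a current Java release: the one-line text protocol, the STATUS request and the class and method names are assumptions made for this example and are not defined by the proposal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the wire format below is an assumption, not part of the proposal.
class SocketComputeServerSketch {

    // Maps one request line to one reply line.
    static String handle(String request) {
        if ("STATUS".equals(request)) {
            return "OK processors=" + Runtime.getRuntime().availableProcessors();
        }
        return "ERROR unknown request";
    }

    // Accepts exactly one client, answers its single request line, then returns.
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(handle(in.readLine()));
        }
    }

    // Client side: sends one request line and returns the reply line.
    static String request(InetAddress host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(request);
            return in.readLine();
        }
    }

    // Runs server and client in one process over the loopback interface.
    static String roundTrip(String request) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) { // port 0: any free port
            Thread serverThread = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();
            String reply = request(loopback, server.getLocalPort(), request);
            serverThread.join();
            return reply;
        }
    }
}
```

Calling SocketComputeServerSketch.roundTrip("STATUS") exercises both ends in one process. An RMI-based design would replace the request strings with methods on a remote interface, which is the more elegant route mentioned above if it proves feasible on the target platform.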

3.1 Services

These functions exchange information between the client and the server: they obtain the state of the compute-server and allow the client to interact with it.
  1. Query status of the target compute-server, in terms of processor load, available processors, memory, and temporary disk space.
    Server: Use of the Java Native Interface (JNI) to interface between 'C' system routines and the JVM.
    Client: AWT for the GUI display.
  2. Control of the compute-server, from interactions with the client-side GUI.
  3. Report the status of the executions on the compute-server back to the client.
    Server: A means of sending interrupt signals to the executing program on the compute-server.
    Client: Display status results, and also allow interruptions to the execution on the server.
  4. User authentication.
    Server: java.security package for public/private key authentication of the client.
    Client: java.security package for generating the client's authentication certificate.
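
One way the user authentication item could work is a challenge-response exchange, sketched below with the java.security package. This is an assumption-laden illustration rather than the project's design: the proposal names only the package, so the RSA key type, the SHA256withRSA signature algorithm, the challenge size and all class and method names here are choices made for the example.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;

// Hypothetical sketch: algorithm choices and names are assumptions, not from the proposal.
class ChallengeAuthSketch {

    static final String SIGNATURE_ALGORITHM = "SHA256withRSA";

    // Client side, done once: the public half would be registered with the server.
    static KeyPair newClientKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Server side: a fresh random challenge per connection prevents replay of old signatures.
    static byte[] newChallenge() {
        byte[] challenge = new byte[32];
        new SecureRandom().nextBytes(challenge);
        return challenge;
    }

    // Client side: proves possession of the private key by signing the challenge.
    static byte[] sign(PrivateKey key, byte[] challenge) throws GeneralSecurityException {
        Signature signer = Signature.getInstance(SIGNATURE_ALGORITHM);
        signer.initSign(key);
        signer.update(challenge);
        return signer.sign();
    }

    // Server side: checks the signature against the client's registered public key.
    static boolean verify(PublicKey key, byte[] challenge, byte[] signature)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance(SIGNATURE_ALGORITHM);
        verifier.initVerify(key);
        verifier.update(challenge);
        return verifier.verify(signature);
    }
}
```

The server would only accept class submissions or control requests on a connection after verify has returned true for the challenge it issued on that connection.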

3.2 Dynamic Class Loading

One of the most exciting aspects of Java technology is the ability to write agent programs which can be migrated from one computer to another. This is possible due to the ability for the Java virtual machine to load in arbitrary classes at runtime, to then instantiate them and execute the methods therein. There are a few possible routes through which these classes can be dynamically accessed, which the student should easily be able to implement incrementally:
  1. Load a class from the compute-server's local filesystem. This approach has the obvious benefit of being totally secure.
    Server: java.lang.ClassLoader for the dynamic loading of classes.
    Client: The appropriate interface from the GUI.
  2. Load a class from a network resource, via socket connection.
    Server: java.lang.ClassLoader.
    Client: The appropriate interface from the GUI.
  3. Load a class from a URL resource. Note that this is essentially what happens when an applet is downloaded into a desktop browser, and should be a simple variation of the socket connection case.
    Server: java.lang.ClassLoader.
    Client: The appropriate interface from the GUI.
  4. Load a class via RMIClassLoader. Here, RMI is used instead of byte streams for loading in the remote class. The feasibility of this approach should be investigated, and it can perhaps be implemented independently of the use of RMI for the client-server communications mechanism.
    Server: java.rmi.server.RMIClassLoader.
    Client: Necessary to write client code for this use of RMI?
  5. Load a class via the transmission of an intermediate Java archive (Jar) file. This has the advantage of allowing several classes to be transmitted at once, but needs deflation and inflation methods to be written at the client and server ends.
    Server: Inflation of received Jar files, as well as java.lang.ClassLoader.
    Client: Deflation of Jar file, and GUI interface to transmit file to the server.
  6. Verification of the loaded class. It should not, for example, override any of the java.* classes held in the compute-server.
    Server: Appropriate methods to check the namespace of the loaded class is acceptable.
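
The loading routes above differ mainly in how the class bytes reach the server, so they can share one loader. The following is a minimal sketch of a java.lang.ClassLoader subclass that defines a class from raw bytes and applies the namespace check of item 6 first. The class and method names are hypothetical. Note that ClassLoader.defineClass itself throws a SecurityException for names beginning with "java.", so the explicit check mainly gives an earlier and clearer rejection, and a place to add further rules.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: a loader that accepts class bytes from any transport.
class CheckedByteClassLoader extends ClassLoader {

    CheckedByteClassLoader(ClassLoader parent) {
        super(parent);
    }

    // Namespace rule from item 6: submitted classes must stay out of java.*.
    static boolean isAcceptableName(String className) {
        return className != null
                && !className.isEmpty()
                && !className.startsWith("java.");
    }

    private static void requireAcceptable(String className) {
        if (!isAcceptableName(className)) {
            throw new SecurityException(
                    "Refusing to define class in protected namespace: " + className);
        }
    }

    // Socket, URL and Jar routes: the bytes have already been received by the server.
    Class<?> define(String className, byte[] classBytes) {
        requireAcceptable(className);
        return defineClass(className, classBytes, 0, classBytes.length);
    }

    // Local filesystem route: reads classDir/com/example/Job.class for "com.example.Job".
    Class<?> defineFromDirectory(Path classDir, String className) throws IOException {
        requireAcceptable(className);
        Path classFile = classDir.resolve(className.replace('.', '/') + ".class");
        return define(className, Files.readAllBytes(classFile));
    }
}
```

Because the check runs before any bytes are read or defined, a hostile name is rejected regardless of which route delivered the class.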

3.3 Invocation of Dynamically Loaded Classes for Serial Execution

Having dynamically loaded the appropriate classes, the compute-server now invokes the methods therein. This is possible if the target classes implement a pre-agreed interface. This interface will eventually also reflect the parallel execution of the classes, but to start with it will simply be the existing java.lang.Runnable interface for the execution of serial programs. In terms of implementation, this will require the use of the instantiation methods in the java.lang.Class and java.lang.reflect.Constructor classes.
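
As a minimal sketch of this step, the code below looks a class up by name, checks that it implements java.lang.Runnable, and creates an instance through java.lang.reflect.Constructor. The class RunnableInvokerSketch and the nested CountingJob, which stands in for a class submitted by a client, are hypothetical names used only for illustration.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch of serial invocation through the pre-agreed Runnable interface.
class RunnableInvokerSketch {

    // Stand-in for a client-supplied class; it only counts how often it has run.
    static class CountingJob implements Runnable {
        static int runs = 0;

        public CountingJob() {
        }

        @Override
        public void run() {
            runs++;
        }
    }

    // Loads the named class through the given loader and instantiates it
    // via its no-argument constructor.
    static Runnable instantiate(ClassLoader loader, String className)
            throws ReflectiveOperationException {
        Class<?> loaded = Class.forName(className, true, loader);
        if (!Runnable.class.isAssignableFrom(loaded)) {
            throw new IllegalArgumentException(
                    className + " does not implement java.lang.Runnable");
        }
        Constructor<? extends Runnable> constructor =
                loaded.asSubclass(Runnable.class).getDeclaredConstructor();
        return constructor.newInstance();
    }
}
```

The server would then call run() on the returned object, either directly or on a new Thread. Checking the interface before instantiation means a class that does not honour the agreement is rejected without any of its constructors being executed.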

3.4 Invocation of Dynamically Loaded Classes for Parallel Execution

This will depend on the final definition of the distributed parallel threads library.

4. Required Technologies for the Middleware Environment

This project will require the investigation and use of many technologies, some of which are still in the process of being revised (notably by Sun Microsystems' Javasoft division). However, the following technological pieces will almost certainly be used in this project:

  1. java.lang.ClassLoader for the dynamic loading of Java classes.
  2. Java Remote Method Invocation (RMI), for querying and reporting back on the status of the SR2201 compute-server.
  3. Java sockets and networking, should RMI prove unfeasible for the above role.
  4. Java Native Interface (JNI), for accessing the 'C' native functions on the SR2201. Specifically, this would be for discovering the state of the machine.
  5. Java Beans. The expected form for the Java objects which will be exported to the compute server is as a Java Bean. This has the attractive quality of a predefined interface through which the middleware environment can query the state and methods of the object.
  6. The Java archive format ('Jar' files), a means of packaging Java classes into one bundle which can then be sent between machines for remote execution.
  7. Java reflection APIs to determine the properties of a Java class, including how to create one of its instances.
  8. Java object serialization. This is the mechanism whereby the state of an object (as defined by its non-transient field variables) is packaged up to be sent to the remote server. Unfortunately, it currently cannot be used to export entire objects (i.e., Java Bytecodes) and requires the serialized-class to be within the Class search-path of the compute-server.
  9. Implementation of the launching of single and multiple Java Virtual Machines from the SR2201, onto other processors.
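
The behaviour described for object serialization in item 8 can be seen in a small round trip. The sketch below writes an object's state to a byte array and reads it back; the JobState class and its fields are hypothetical. Only the non-transient field survives, and the receiving side must already be able to load the JobState class, since the byte stream carries state rather than bytecode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: JobState stands in for the state a client might export.
class SerializationSketch {

    static class JobState implements Serializable {
        private static final long serialVersionUID = 1L;

        int iterations;               // part of the serialized state
        transient String scratchPath; // skipped by serialization

        JobState(int iterations, String scratchPath) {
            this.iterations = iterations;
            this.scratchPath = scratchPath;
        }
    }

    // Client side: package the object's state for transmission.
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    // Server side: rebuild the object; fails with ClassNotFoundException
    // if the object's class is not on the server's class search path.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

After a round trip of new JobState(42, "/tmp/scratch"), the copy has iterations equal to 42 and a null scratchPath, which is why the class files themselves need a separate route such as a Jar file.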

5. Technical Questions Remaining

Java is a complex technology, in terms of the number of class libraries available to the programmer. Hence there are as yet some unanswered questions which will have to be determined during the earlier phases of this summer's project. Some of these involve technical investigations of the Java APIs, but others involve the basic design of the proposed middleware.
  1. The choice between Java RMI and sockets. Extensive testing of the Java RMI object request broker has not yet been performed for the target compute-server.
  2. From where should the remote classes be accessed? The technologies listed in 3.2 contain options which may be redundant; that is, not all of them need to be implemented.
  3. Definition of the parallel distributed threads library will need to be finished prior to its invocation from this middleware.
  4. Determining how to package Java classes at runtime into Jar format, for distribution between machines. The author has verified that unpacking Jar archive files can be done using the existing 'java.util.zip.*' package APIs.
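
On the last question, one possible answer on a current Java release is the java.util.jar package, whose JarOutputStream and JarInputStream classes extend the java.util.zip stream classes mentioned above. The sketch below packs named byte arrays into a Jar held in memory and unpacks them again. It is an illustration under that assumption, not a finding of the project, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical sketch of runtime Jar packaging (client) and unpacking (server).
class JarPackagingSketch {

    // Client side: entry name (e.g. "com/example/Job.class") -> class file bytes.
    // Expects at least one entry.
    static byte[] pack(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
                jar.putNextEntry(new JarEntry(entry.getKey()));
                jar.write(entry.getValue()); // compressed (deflated) by the stream
                jar.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    // Server side: recovers every entry, in archive order.
    static Map<String, byte[]> unpack(byte[] jarBytes) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            byte[] chunk = new byte[4096];
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int read;
                while ((read = jar.read(chunk)) != -1) { // -1 marks the end of this entry
                    content.write(chunk, 0, read);
                }
                entries.put(entry.getName(), content.toByteArray());
            }
        }
        return entries;
    }
}
```

The byte arrays returned by unpack could be handed directly to a class loader on the server, and the array returned by pack could be written to a socket by the client.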

6. Workplan for the Middleware Environment

It should first be stated that not all of the functionalities described in Section 3 are likely to be delivered in the first iteration of the middleware environment. Instead, the architecture of this middleware will be developed as the project progresses and as a greater understanding of the capabilities and limitations of the Java programming environment is gained. In particular, the dynamic loading of Java classes will be tackled first.

In terms of the time available to the summer student (10 weeks), the chronological steps for the writing of the middleware environment may be listed as follows:

  1. (6-10th July) EPCC HPC courses, and familiarisation with the aims of the project.
  2. Design for the compute-server, in particular whether it should use RMI and/or network sockets for the client-server communications.
  3. Write a server which is able to dynamically load, instantiate and invoke objects stored on the server's local filesystem, including limited security protection against hostile classes. Extend the server's classloader to obtain classes from remote clients. Extend the server to read in Jar files, and to unpack and execute the classes stored therein. These Jar files should also contain a means of defining the runtime environment, via serialized state information.
  4. Ascertain and implement available system calls to determine the state of the compute-server.
  5. Design client functionalities, including the GUI.
  6. Write the client for interacting with the compute-server. Extend the client to include the packaging and sending of Jar files to the server. Bind interactions between the client and the compute-server, via the client's GUI.
  7. Implement security authentication mechanisms between the client and the compute-server.
  8. Extend the server for multiprocessor applications, as the master to a set of slaves which is able to receive and handle their states, as well as send signals to them.
  9. (7-11th September) Write report.

Steps (1) and (9) above are each fixed to one week's duration. The steps in between are deliberately ambitious to accommodate the skills of the summer student, and written in rough order of importance to the goals of the joint EPCC-HEL project. How steps (2) to (8) are fitted into the available eight weeks of the programme will be dependent on the experiences of the student.

