JavaGene EPCC Summer Scholarship Programme

	SSP 1998 Project Summary:


JavaGene: Java Middleware to Access Distributed Molecular Biology Databases

Student

Scott Trudeau, Oakland University, USA

Supervisors

Martin Simmen, Institute for Cell and Molecular Biology, University of Edinburgh
Hon W Yau, EPCC, University of Edinburgh

Modern biology is entering the era of "big science". With projects underway to obtain complete DNA and protein sequence information for humans and other medically relevant organisms, the community databases that store sequence information (eg the EMBL database in Cambridge, UK) are showing massive growth. For researchers to fully exploit this data, flexible software that supports complex database queries, data retrieval, and comparison of users' local sequences against the database is vital. Many such utilities are currently available through servers on the WWW, for example the Sequence Retrieval System (SRS, http://srs.ebi.ac.uk:5000/). Although SRS is widely used, it is forms-based, so users are restricted in their ability to customise it to their own specialised needs.

This project will explore integrated access to these distributed databases through a middleware application written in Java. In this three-tier architecture, the JavaGene server will sit between the client and data-server layers, coordinating access to the data sources and the output of results. Java has a particularly rich set of APIs for such network-centric applications (networking, JDBC database access and remote method invocation); indeed, this is one of the major application areas in which enterprise users are currently targeting their Java investment.
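The three-tier idea can be sketched roughly as follows. This is a minimal illustration, not JavaGene's design. It assumes a hypothetical `SequenceSource` interface for the data-server tier, with in-memory maps standing in for a remote database and the local relational store, and the accession numbers are made up. The client only ever talks to the broker in the middle tier, which decides which back end to query.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Data-server tier (hypothetical): anything that can look up a sequence by accession. */
interface SequenceSource {
    String name();

    Optional<String> fetch(String accession);
}

/** In-memory stand-in for a remote database or the local relational store. */
class InMemorySource implements SequenceSource {
    private final String name;
    private final Map<String, String> records;

    InMemorySource(String name, Map<String, String> records) {
        this.name = name;
        this.records = records;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public Optional<String> fetch(String accession) {
        return Optional.ofNullable(records.get(accession));
    }
}

/** Middle tier: clients talk only to the broker, which chooses the back end. */
class JavaGeneBroker {
    // LinkedHashMap keeps sources in registration order, so earlier sources take priority.
    private final Map<String, SequenceSource> sources = new LinkedHashMap<>();

    void register(SequenceSource source) {
        sources.put(source.name(), source);
    }

    /** Queries each source in order and reports which one answered. */
    String lookup(String accession) {
        for (SequenceSource source : sources.values()) {
            Optional<String> hit = source.fetch(accession);
            if (hit.isPresent()) {
                return source.name() + ": " + hit.get();
            }
        }
        return "not found: " + accession;
    }

    public static void main(String[] args) {
        JavaGeneBroker broker = new JavaGeneBroker();
        broker.register(new InMemorySource("local", Map.of("LOC001", "ATGGCC")));
        broker.register(new InMemorySource("remote", Map.of("X00001", "GATTACA")));
        System.out.println(broker.lookup("X00001")); // remote: GATTACA
        System.out.println(broker.lookup("Z99999")); // not found: Z99999
    }
}
```

The back ends sit behind one interface. As a result, another database (or the Oracle store) could be added by registering one more `SequenceSource`, without any change on the client side.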

The intended users are molecular biology researchers who already access these databases over the Internet from their desktop browsers, typically running on an Apple Macintosh. The project aims to extend what these users can do with the databases, and to integrate that access with their local filesystem and with an Oracle database server at EPCC.

An interesting contribution towards enriching browser functionality for the molecular biology community has been the Colour INteractive Editor for Multiple Alignments (CINEMA) project (http://www.biochem.ucl.ac.uk/bsm/dbbrowser/CINEMA2.1/) from University College London, UK. CINEMA is a Java applet that runs in the client's browser and allows users to manipulate and align multiple gene sequences graphically. Whilst CINEMA can access remote databases, it has a two-tier design and is therefore subject to all the well-known security restrictions on Java applets (for example, an unsigned applet may normally open network connections only to the host it was loaded from).

Of particular relevance to this project, CINEMA also publishes a Java interface that lets users write so-called pluglets. Another goal of this project is therefore to exploit this existing GUI infrastructure by developing a pluglet that interfaces to JavaGene.

The JavaGene server's functionality can be broadly split into three groups, reflecting the server-side (eg the databases), middle-layer (ie where JavaGene executes), and client-side (eg desktop browsers) layers of the proposed architecture. The functions to be implemented, in rough order of priority, are:

SERVER-SIDE:

Access to remote database resources, and in particular molecular biology search engine services.
Use of the Oracle database server at EPCC for storing local gene sequences in a relational database. JavaGene could access this server either through a pure-Java JDBC driver invoked directly from JavaGene, or through a native JDBC driver running on the Oracle server. The latter route, although more efficient, would require writing remote method invocation (RMI) client and server code. In addition, JavaGene could then act as a conduit through which local web servers access this database.
Invocation of application servers running on remote machines, for computations that are best done by dedicated (and perhaps commercial) packages. This would require writing small server programs on the remote machines that invoke these packages on receiving the appropriate request from JavaGene, in much the same way as web servers invoke cgi-bin executables.
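For the pure-Java JDBC route, storing a user's local sequence might look roughly like the sketch below. The table `local_sequences`, its columns, the connection URL and the credentials are all hypothetical placeholders. Running `main` also requires a JDBC driver for the target database on the classpath. Using a `PreparedStatement` with bind parameters keeps user-supplied sequence text out of the SQL string itself.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Stores users' local sequences through a JDBC connection.
 * Assumes a hypothetical table: local_sequences(accession, description, residues).
 */
class SequenceStore {
    static final String INSERT_SQL =
            "INSERT INTO local_sequences (accession, description, residues) VALUES (?, ?, ?)";

    private final Connection conn;

    SequenceStore(Connection conn) {
        this.conn = conn;
    }

    /** Inserts one sequence using bind parameters; returns the number of rows affected. */
    int store(String accession, String description, String residues) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, accession);
            ps.setString(2, description);
            ps.setString(3, residues);
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials; pass a real JDBC URL as the first argument.
        String url = args.length > 0 ? args[0] : "jdbc:oracle:thin:@dbhost:1521:ORCL";
        try (Connection conn = DriverManager.getConnection(url, "javagene", "changeme")) {
            int rows = new SequenceStore(conn).store("LOC001", "user clone 1", "ATGGCCATTGTAATG");
            System.out.println(rows + " row(s) inserted");
        }
    }
}
```

`SequenceStore` accepts any `Connection`, so the same class would work whichever driver produced the connection.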

MIDDLE-LAYER:

Secure access to the filesystem. This would allow users to import and export their JavaGene data to and from office productivity packages.
Report generator, for producing output in a form that the packages mentioned above can use.
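Secure filesystem access and the report generator could be combined roughly as in the following sketch. It makes two assumptions: that exports are confined to a single hypothetical export directory, and that tab-separated text is an acceptable interchange format for spreadsheet packages. `resolveSafely` rejects paths that would escape that directory (for example via `../` or an absolute path). It does not resolve symbolic links, which a real implementation would also need to handle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/** Writes tab-separated reports, but only inside one (hypothetical) export directory. */
class ReportExporter {
    private final Path root;

    ReportExporter(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    /** Resolves a user-supplied path under the root, rejecting "../" escapes and absolute paths. */
    Path resolveSafely(String userPath) {
        Path candidate = root.resolve(userPath).normalize();
        if (!candidate.startsWith(root)) {
            throw new SecurityException("path escapes export directory: " + userPath);
        }
        return candidate;
    }

    /** Formats a header and rows as tab-separated text that spreadsheet packages can import. */
    static String toTsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder(line(header));
        for (List<String> row : rows) {
            sb.append(line(row));
        }
        return sb.toString();
    }

    // Tabs and line breaks inside a field would break the layout, so replace them with spaces.
    private static String line(List<String> fields) {
        return fields.stream()
                .map(f -> f.replace('\t', ' ').replace('\r', ' ').replace('\n', ' '))
                .collect(Collectors.joining("\t")) + "\n";
    }

    /** Writes the report to a validated location and returns the path written. */
    Path export(String userPath, List<String> header, List<List<String>> rows) throws IOException {
        Path target = resolveSafely(userPath);
        Files.createDirectories(target.getParent());
        Files.write(target, toTsv(header, rows).getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        ReportExporter exporter = new ReportExporter(Files.createTempDirectory("javagene-exports"));
        Path written = exporter.export("run1/summary.tsv",
                List.of("accession", "length"),
                List.of(List.of("LOC001", "6"), List.of("LOC002", "15")));
        System.out.println("wrote " + written);
    }
}
```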

CLIENT-SIDE:

Security mechanism. Different levels of access are anticipated, so a reliable means of authenticating users will be very important.
HTML output through which the user can navigate amongst the different data sources and perform the operations valid for each (e.g., view the contents of a directory, or send a set of sequences to be aligned by a designated package), preferably all from within a browser environment.
Plain HTML output for vanilla browsers.
CINEMA pluglet interface. This will also require writing a pluglet with which JavaGene communicates.
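Access levels and plain-HTML navigation could work together as in the following sketch. The `AccessLevel` values, the `Operation` class and the link targets are hypothetical. Authentication (deciding which level a given user holds) is assumed to happen elsewhere. Only the operations that the user's level permits are rendered, and all text is HTML-escaped so that sequence descriptions cannot inject markup.

```java
import java.util.List;

/** Hypothetical access levels, ordered from least to most privileged. */
enum AccessLevel { GUEST, RESEARCHER, ADMIN }

/** An operation offered on a data source, with the minimum level needed to see it. */
class Operation {
    final String label;
    final String href;
    final AccessLevel required;

    Operation(String label, String href, AccessLevel required) {
        this.label = label;
        this.href = href;
        this.required = required;
    }
}

class HtmlNavigator {
    /** Escapes the characters that are significant in HTML text and double-quoted attributes. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Renders plain HTML listing only the operations the user's level permits. */
    static String render(String title, List<Operation> ops, AccessLevel user) {
        StringBuilder sb = new StringBuilder();
        sb.append("<h2>").append(escape(title)).append("</h2>\n<ul>\n");
        for (Operation op : ops) {
            // Enum constants compare by declaration order, so ADMIN >= RESEARCHER >= GUEST.
            if (user.compareTo(op.required) >= 0) {
                sb.append("<li><a href=\"").append(escape(op.href)).append("\">")
                  .append(escape(op.label)).append("</a></li>\n");
            }
        }
        return sb.append("</ul>\n").toString();
    }

    public static void main(String[] args) {
        List<Operation> ops = List.of(
                new Operation("List directory", "list?dir=home", AccessLevel.GUEST),
                new Operation("Align selected sequences", "align?set=1&pkg=default", AccessLevel.RESEARCHER));
        System.out.print(render("Local sequences", ops, AccessLevel.GUEST));
    }
}
```

The sketch emits only basic list and anchor markup. Because of this, the same output would also serve the plain HTML needed by vanilla browsers.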

In terms of Java technology, this project is expected to make use of JDBC, remote method invocation (RMI), and Java networking. The experience gained from this project should allow end users to implement a customised data access environment for their own departments.
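For the native-driver route mentioned under SERVER-SIDE, JavaGene would call a small server process running beside the database through RMI. A minimal sketch follows. The `SequenceService` interface and the binding name are hypothetical, and the in-memory map stands in for queries made through the native JDBC driver. In a real deployment, the remote interface would normally be a public type in its own source file, shared by client and server.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical remote interface offered by a server process next to the database. */
interface SequenceService extends Remote {
    String fetchResidues(String accession) throws RemoteException;
}

/** Server-side implementation; the map stands in for queries through a native JDBC driver. */
class SequenceServiceImpl implements SequenceService {
    // Static references keep the exported object and registry reachable while the server runs.
    private static SequenceServiceImpl server;
    private static Registry registry;

    private final Map<String, String> store = new HashMap<>();

    SequenceServiceImpl() {
        store.put("LOC001", "ATGGCC");
    }

    @Override
    public String fetchResidues(String accession) throws RemoteException {
        String residues = store.get(accession);
        if (residues == null) {
            throw new RemoteException("unknown accession: " + accession);
        }
        return residues;
    }

    public static void main(String[] args) throws Exception {
        server = new SequenceServiceImpl();
        // Export on an anonymous port and publish the stub in a registry on the default port 1099.
        SequenceService stub = (SequenceService) UnicastRemoteObject.exportObject(server, 0);
        registry = LocateRegistry.createRegistry(1099);
        registry.rebind("JavaGeneSequenceService", stub);
        System.out.println("SequenceService bound");
        // Client side (in JavaGene), with "dbhost" as a placeholder host name:
        //   Registry r = LocateRegistry.getRegistry("dbhost", 1099);
        //   SequenceService s = (SequenceService) r.lookup("JavaGeneSequenceService");
        //   String residues = s.fetchResidues("LOC001");
    }
}
```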

H W Yau <hwyau@epcc.ed.ac.uk>
Last modified: 29th July 1998

The final report for this project is available here.

Webpage maintained by mario@epcc.ed.ac.uk