SSP 1998 Project Summary:
JavaGene: Java Middleware to Access Distributed Molecular Biology Databases
Student
Scott Trudeau, Oakland University, USA
Supervisors
Martin Simmen, Institute for Cell and Molecular Biology, University
of Edinburgh
Hon W Yau, EPCC, University of Edinburgh
Modern biology is entering the era of "big science". With projects
underway to obtain complete DNA and protein sequence information for humans
and other medically relevant organisms, the community databases that store
sequence information (e.g. the EMBL database in Cambridge, UK) are showing
massive growth. For researchers to fully exploit this data, they need flexible
software that supports complex querying of the databases, data retrieval, and
comparison of users' local sequences against the database. Many
such utilities are currently available through servers on the WWW, for
example the Sequence Retrieval System (http://srs.ebi.ac.uk:5000/). Although
SRS is widely used, it is forms-based, so users are restricted in their
ability to customise it to their own specialised needs.
This project will explore integrated access to these distributed
databases through a middleware application written in Java.
In this three-tier architecture, the JavaGene server will sit between the
client and data-server layers, orchestrating the access and output of information.
Java has a particularly rich set of API libraries which can be exploited
for such network-centric applications; indeed this is one of the major
application areas in which enterprise users are currently targeting their
Java investment.
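As a rough illustration of the three-tier idea, the sketch below shows a middle layer that accepts a request from a client and routes it to a named data source. It is not the JavaGene design: the names SequenceSource, JavaGeneRouter, and ThreeTierDemo are hypothetical, and an in-memory map stands in for a remote database.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Data-server layer: anything that can return a sequence for an identifier (hypothetical). */
interface SequenceSource {
    Optional<String> fetch(String accession);
}

/** Middle layer: routes client requests to a registered data source (hypothetical). */
class JavaGeneRouter {
    private final Map<String, SequenceSource> sources = new LinkedHashMap<>();

    void register(String name, SequenceSource source) {
        sources.put(name, source);
    }

    /** Returns the sequence, or a short error message that the client layer can display. */
    String handle(String sourceName, String accession) {
        SequenceSource source = sources.get(sourceName);
        if (source == null) {
            return "ERROR: unknown source " + sourceName;
        }
        return source.fetch(accession)
                     .orElse("ERROR: " + accession + " not found in " + sourceName);
    }
}

class ThreeTierDemo {
    public static void main(String[] args) {
        // Stand-in for a remote database; a real source would make a network call.
        Map<String, String> local = new LinkedHashMap<>();
        local.put("X00001", "ATGGCGTTA");

        JavaGeneRouter router = new JavaGeneRouter();
        router.register("local", acc -> Optional.ofNullable(local.get(acc)));

        // Client layer: here just the console.
        System.out.println(router.handle("local", "X00001"));
        System.out.println(router.handle("embl", "X00001"));
    }
}
```

Because the client only ever talks to the router, new data sources can be registered in the middle layer without changing client code.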
The users for this work are molecular biology researchers, who are already
familiar with accessing the aforementioned databases on the Internet via
their desktop browsers, typically running on an Apple Macintosh. This
project aims to extend the access these users have to these databases,
and to integrate it with their local filesystem and an
Oracle database server at EPCC.
An interesting contribution towards enriching browser functionality
for the molecular biology community has been the Colour INteractive
Editor for Multiple Alignments (CINEMA) project (http://www.biochem.ucl.ac.uk/bsm/dbbrowser/CINEMA2.1/)
from University College London, UK. CINEMA is a Java applet that runs in
the client browser and allows users to manipulate and align multiple gene
sequences graphically. Whilst CINEMA can access remote databases, it is a
2-tier design and hence is limited by the security restrictions of Java
applets; for example, an untrusted applet may normally open network
connections only to the host it was loaded from.
Of particular relevance to this project, CINEMA also includes a published
Java interface that allows users to write so-called pluglets; hence another
goal of this project is to exploit this existing GUI infrastructure by
developing a pluglet which interfaces to JavaGene.
The functionality of the JavaGene server falls broadly into three
groups, reflecting the server-side (e.g. the databases), middle-layer
(i.e. where JavaGene executes), and client-side (e.g. desktop browsers) layers
of the proposed architecture. The functionality to be implemented, in
rough order of priority, is:
-
SERVER-SIDE:
-
Access to remote database resources, and in particular molecular biology
search engine services.
-
Use of the Oracle database server at EPCC, for storing local gene sequences
in a relational database. This server would be accessed either via
a pure-Java JDBC driver invoked from JavaGene, or via a native JDBC driver
on the Oracle server. The latter route, although more efficient, will require
writing remote method invocation client and server code. In addition,
JavaGene can then be used as a conduit for local web-servers to access
this local database.
-
Invocation of application servers running on remote machines, for performing
computations which are best done by dedicated (and perhaps commercial)
packages. This would require writing small server programs on the remote
machines which invoke these packages upon receiving the appropriate request
from JavaGene, in much the same way as cgi-bin executables are invoked
on web-servers.
-
MIDDLE-LAYER:
-
Secure access to the filesystem. This would allow users to exchange
their JavaGene data with office productivity packages.
-
Report generator, for producing output in a form which can be used by
the above-mentioned packages.
-
CLIENT-SIDE:
-
Security mechanism. Different levels of access are anticipated, so
a reliable means of authenticating users will be very important.
-
HTML output through which the user can navigate amongst the different data
sources and perform the valid operations on them (e.g., see the
contents of a directory, send a set of sequences to be aligned by a designated
package); all preferably from within a browser environment.
-
Plain HTML output for vanilla browsers.
-
CINEMA pluglet interface. This will also require the writing of a pluglet
with which JavaGene communicates.
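To make the report generator and plain HTML items in the list above more concrete, here is a minimal sketch that renders the same records both as tab-delimited text (for spreadsheet-style packages) and as a script-free HTML table (for vanilla browsers). The class names, field names, and column choices are assumptions made for illustration; the proposal does not specify an output format.

```java
import java.util.ArrayList;
import java.util.List;

/** One stored sequence; the field names are illustrative only. */
class SequenceRecord {
    final String id;
    final String description;
    final String residues;

    SequenceRecord(String id, String description, String residues) {
        this.id = id;
        this.description = description;
        this.residues = residues;
    }
}

/** Hypothetical report generator producing two output forms from the same data. */
class ReportGenerator {
    /** Tab-delimited text with a header row, importable by spreadsheet packages. */
    static String toTabDelimited(List<SequenceRecord> records) {
        StringBuilder out = new StringBuilder("id\tdescription\tlength\n");
        for (SequenceRecord r : records) {
            out.append(r.id).append('\t')
               .append(r.description.replace('\t', ' ')).append('\t')
               .append(r.residues.length()).append('\n');
        }
        return out.toString();
    }

    /** Plain HTML table with no scripting, suitable for any browser. */
    static String toPlainHtml(List<SequenceRecord> records) {
        StringBuilder out = new StringBuilder(
            "<table>\n<tr><th>ID</th><th>Description</th><th>Length</th></tr>\n");
        for (SequenceRecord r : records) {
            out.append("<tr><td>").append(escape(r.id)).append("</td>")
               .append("<td>").append(escape(r.description)).append("</td>")
               .append("<td>").append(r.residues.length()).append("</td></tr>\n");
        }
        out.append("</table>\n");
        return out.toString();
    }

    /** Escapes the characters that are significant in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

class ReportDemo {
    public static void main(String[] args) {
        List<SequenceRecord> records = new ArrayList<>();
        records.add(new SequenceRecord("LOCAL001", "example fragment", "ATGGCGTTA"));
        System.out.print(ReportGenerator.toTabDelimited(records));
        System.out.print(ReportGenerator.toPlainHtml(records));
    }
}
```

Keeping both renderers behind one class means the middle layer can choose the output form per client without touching the data-access code.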
In terms of Java technology, this project is expected to make use of JDBC,
remote method invocation, and Java networking. The experience gained from
this project will allow the end user to implement a customised data access
environment for their department.
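For readers unfamiliar with remote method invocation, the following sketch shows the general shape of an RMI remote interface, a server-side implementation, and a call made through an exported stub. The interface name SequenceStore and its method are hypothetical, and an in-memory map stands in for the JDBC query that a real server would run against Oracle; it illustrates the mechanism only.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.HashMap;
import java.util.Map;

/** Remote interface the middle layer would call (hypothetical name and method). */
interface SequenceStore extends Remote {
    String fetchSequence(String id) throws RemoteException;
}

/** Server-side implementation; the map is a stand-in for a JDBC lookup. */
class InMemorySequenceStore implements SequenceStore {
    private final Map<String, String> rows = new HashMap<>();

    InMemorySequenceStore() {
        rows.put("LOCAL001", "ATGGCGTTA");
    }

    @Override
    public String fetchSequence(String id) throws RemoteException {
        String sequence = rows.get(id);
        if (sequence == null) {
            throw new RemoteException("no such sequence: " + id);
        }
        return sequence;
    }
}

class RmiDemo {
    public static void main(String[] args) throws Exception {
        InMemorySequenceStore impl = new InMemorySequenceStore();
        // Export on an anonymous port; the returned stub is what a client would hold.
        SequenceStore stub = (SequenceStore) UnicastRemoteObject.exportObject(impl, 0);
        try {
            // This call travels through the RMI transport, even within one JVM.
            System.out.println(stub.fetchSequence("LOCAL001"));
        } finally {
            // Unexport so the RMI threads do not keep the JVM alive.
            UnicastRemoteObject.unexportObject(impl, true);
        }
    }
}
```

In a deployed system the stub would normally be obtained by the client from an RMI registry on the server machine rather than created in the same process.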
H W Yau <hwyau@epcc.ed.ac.uk>
Last modified: 29th July 1998
The final report for this project is available here.