SSP 1998 Project Summary:

JavaGene: Java Middleware to Access Distributed Molecular Biology Databases

Student

Scott Trudeau, Oakland University, USA

Supervisors

Martin Simmen, Institute for Cell and Molecular Biology, University of Edinburgh
Hon W Yau, EPCC, University of Edinburgh
 
Modern biology is entering the era of "big science". With projects underway to obtain complete DNA and protein sequence information for humans and other medically relevant organisms, the community databases which store sequence information (e.g. the EMBL database in Cambridge, UK) are showing massive growth. For researchers to fully exploit this data, flexible software solutions are vital: they must allow complex querying of the databases, data retrieval, and comparison of users' local sequences against the database. Many such utilities are currently available through servers on the WWW, for example the Sequence Retrieval System (http://srs.ebi.ac.uk:5000/). Although SRS is widely used, it is forms-based, so users are restricted in their ability to customise it to their own specialised needs.

This project will explore integrated access to these distributed databases through a middleware application written in Java. In this three-tier architecture, the JavaGene server will sit between the client and data-server layers, orchestrating the access and output of information. Java has a particularly rich set of API libraries which can be exploited for such network-centric applications; indeed this is one of the major application areas in which Enterprise users are currently targeting their Java investment.
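As an illustration of the three-tier idea, the following is a minimal sketch and not the JavaGene design itself. All names (SequenceSource, InMemorySource, MiddleTier) are hypothetical, and the in-memory maps merely stand in for real data servers such as a remote sequence database or a local Oracle instance. It shows a middle layer that accepts one client query, forwards it to every registered data source, and returns the results grouped by source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Data tier (hypothetical interface): anything that can answer a keyword query.
interface SequenceSource {
    String name();
    List<String> search(String keyword);
}

// Stand-in for a real database: maps accession IDs to free-text descriptions.
class InMemorySource implements SequenceSource {
    private final String name;
    private final Map<String, String> entries;

    InMemorySource(String name, Map<String, String> entries) {
        this.name = name;
        this.entries = new LinkedHashMap<>(entries); // defensive copy, keeps order
    }

    public String name() {
        return name;
    }

    // Returns the accession IDs whose description contains the keyword,
    // ignoring case.
    public List<String> search(String keyword) {
        String needle = keyword.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            if (e.getValue().toLowerCase().contains(needle)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}

// Middle tier: the client talks only to this class, never to the sources.
class MiddleTier {
    private final List<SequenceSource> sources = new ArrayList<>();

    void register(SequenceSource source) {
        sources.add(source);
    }

    // Forwards the query to every source and groups the hits by source name.
    Map<String, List<String>> query(String keyword) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (SequenceSource source : sources) {
            merged.put(source.name(), source.search(keyword));
        }
        return merged;
    }
}
```

Because the client depends only on the middle tier, a new data source could in principle be added by registering another SequenceSource implementation, without changing client code.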

The users for this work are molecular biology researchers, who are already familiar with accessing the aforementioned databases on the Internet via their desktop browsers, typically running on an Apple Macintosh. This project aims to extend the functionality available to these users when accessing the databases, and to integrate that access with their local filesystem and an Oracle database server at the EPCC.

An interesting contribution towards enriching browser functionalities for the molecular biology community has been the Colour INteractive Editor for Multiple Alignments (CINEMA) project (http://www.biochem.ucl.ac.uk/bsm/dbbrowser/CINEMA2.1/) from University College London, UK. CINEMA is a Java Applet running on the client browser, and allows users to manipulate and align multiple gene sequences in a graphical manner. Whilst CINEMA does have functionalities to access remote databases, it is a 2-tier design and hence is limited by all the well-known security restrictions of Java Applets.

Of particular relevance to this project, CINEMA also includes a published Java interface which allows users to write so-called pluglets; another goal of this project is therefore to exploit this existing GUI infrastructure by developing a pluglet which interfaces to JavaGene.

The JavaGene server can be broadly split into three groups of functionalities, reflecting the server-side (e.g. the databases), middle-layer (i.e. where JavaGene executes), and client-side (e.g. desktop browsers) layers of the proposed architecture. The functionalities to be implemented, in rough order of priority, are:

In terms of Java technology, this project is expected to make use of JDBC, Remote Method Invocation (RMI), and Java networking. The experience gained from this project will allow the end-user to implement a customised data access environment for their department.
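To make the RMI part concrete, here is a minimal sketch of how a client-facing service might be declared as a remote interface. The names SequenceService, InMemorySequenceService and RmiSketch, and the accession "DEMO1", are assumptions made for illustration and do not come from the project. The main method exports the object on an anonymous port and calls it through the returned stub; a real deployment would also bind the stub in an RMI registry so that clients on other machines could look it up.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Remote contract (hypothetical): every remotely callable method
// must declare RemoteException.
interface SequenceService extends Remote {
    String fetchSequence(String accession) throws RemoteException;
}

// Server-side implementation backed by an in-memory map, standing in
// for a lookup against a real sequence database.
class InMemorySequenceService implements SequenceService {
    private final Map<String, String> store = new HashMap<>();

    InMemorySequenceService(Map<String, String> initial) {
        store.putAll(initial);
    }

    // The implementation may drop the throws clause; transport failures
    // are raised by the stub on the client side, not here.
    public String fetchSequence(String accession) {
        String sequence = store.get(accession);
        if (sequence == null) {
            throw new NoSuchElementException("unknown accession: " + accession);
        }
        return sequence;
    }
}

class RmiSketch {
    public static void main(String[] args) throws Exception {
        Map<String, String> data = new HashMap<>();
        data.put("DEMO1", "ACGT"); // made-up record for demonstration only
        InMemorySequenceService impl = new InMemorySequenceService(data);

        // Port 0 asks the RMI runtime to choose any free port.
        SequenceService stub =
                (SequenceService) UnicastRemoteObject.exportObject(impl, 0);
        try {
            // The call travels through the RMI transport even though both
            // ends live in the same JVM.
            System.out.println(stub.fetchSequence("DEMO1"));
        } finally {
            // Unexport so the JVM can exit cleanly.
            UnicastRemoteObject.unexportObject(impl, true);
        }
    }
}
```

Keeping the contract in an interface means the same client code could later be pointed at a different implementation, for example one that reads from a database through JDBC.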



H W Yau <hwyau@epcc.ed.ac.uk>
Last modified: 29th July 1998

The final report for this project is available here.