SSP Project Summary:
Simulating Web Caching on a LAN

Student

Hernani Pedroso, University of Coimbra, Portugal

Supervisor

Martin Westhead, EPCC


With the growth in size and importance of the Internet for academia, industry and commerce, the problem of making the most efficient use of the available bandwidth becomes more acute. The problems of Internet congestion are difficult to understand because of the large and highly unstructured topology of the network, and the dynamic nature of its operation.

EPCC is interested in applying the resources of High Performance Computing to the task of simulating Internet traffic. This project involves building a simulation model of a local area network and investigating different possible caching schemes for users of workstations on the network. It will use an existing simulation package called HASE, built by the Department of Computer Science (DCS), which provides a simulation engine, graphical model building facilities, and data recording and visualisation facilities. HASE runs on Sun workstations and has reportedly been ported to the T3D.

The Hierarchical computer Architecture design and Simulation Environment (HASE) allows for the rapid development and exploration of computer architectures at multiple levels of abstraction, encompassing both hardware and software. A computer network in these terms is considered a generalisation of an architecture. Within HASE, simulation objects are called entities. The package includes graphical entity design and edit facilities, entity library creation and retrieval mechanisms, an animator, and statistical analysis and experimentation tools for deriving system performance metrics.

The LAN chosen for study is the DCS network, which consists of interconnected ethernets linking over 200 machines. This network is in turn connected to the FDDI ring of EdLan, the University network.

The work will involve the following steps:

The intention is for the student to construct the simulation model incrementally. The modelling language HASE++ is part of the HASE package and is specifically tailored to discrete event simulation. The student will start by building a simple ping-pong model with a ping class that generates packets, a pong class that receives packets and bounces them back, and a network package that connects the two. In the first instance the network package will be a vanilla network that passes the message along with just a short delay. The ping-pong exercise can then be built on to construct a task farm model and a multiple-client, single-server model.
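As an illustration of the ping-pong exercise, the following is a minimal discrete event sketch written in Python rather than HASE++. It does not use the HASE API; the function name `ping_pong` and its parameters (`num_packets`, `link_delay`, `send_interval`) are hypothetical, and the vanilla network is assumed to be nothing more than a fixed delay in each direction.

```python
import heapq


def ping_pong(num_packets=3, link_delay=0.5, send_interval=1.0):
    """Ping sends packets over a fixed-delay link; pong bounces them back.

    Returns a list of (packet_id, round_trip_time) in order of completion.
    All names and parameters here are illustrative assumptions.
    """
    # Event queue ordered by time; the sequence number breaks ties.
    events = []  # entries: (time, sequence, destination, packet_id, sent_at)
    seq = 0

    # Ping generates packets at regular intervals; each one reaches
    # pong after the link delay of the vanilla network.
    for pid in range(num_packets):
        sent_at = pid * send_interval
        heapq.heappush(events, (sent_at + link_delay, seq, "pong", pid, sent_at))
        seq += 1

    round_trips = []
    while events:
        now, _, dest, pid, sent_at = heapq.heappop(events)
        if dest == "pong":
            # Pong bounces the packet straight back across the same link.
            heapq.heappush(events, (now + link_delay, seq, "ping", pid, sent_at))
            seq += 1
        else:
            # The packet is back at ping: record its round-trip time.
            round_trips.append((pid, now - sent_at))
    return round_trips


if __name__ == "__main__":
    for pid, rtt in ping_pong():
        print(f"packet {pid}: round trip {rtt}")
```

Keeping the network as a separate scheduling step (here, the `link_delay` added on each push) is what would let the same ping and pong logic run unchanged once the delay is replaced by a more detailed network model.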

With some familiarity with the modelling language, the student should be ready to develop the vanilla network model into an ethernet model which incorporates contention and bridges. The earlier message passing examples should also work on this network description. The next stage would be to develop the client/server model constructed earlier to model browser requests and web server replies. A hybrid of the browser and server descriptions can then be used to construct a web cache description. These stages will involve a number of decisions about which aspects are the most important to model.
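One possible way to represent contention on a shared ethernet segment is a slotted model with binary exponential backoff. The sketch below is a deliberate simplification and an assumption on our part: it omits carrier sensing, propagation delay, frame lengths and bridges, and the function name `simulate_contention` and its parameters are hypothetical.

```python
import random


def simulate_contention(num_stations, frames_per_station, seed=0, max_slots=100_000):
    """Slotted shared-medium model with binary exponential backoff.

    In each slot, every station with a pending frame whose backoff has
    expired transmits. Exactly one transmitter means success; more than
    one means a collision, after which each collider waits a random
    number of slots drawn from a window that doubles per collision.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    pending = [frames_per_station] * num_stations
    next_attempt = [0] * num_stations  # earliest slot each station may use
    attempts = [0] * num_stations      # collisions suffered by the current frame
    delivered = 0
    collisions = 0
    slot = 0

    while any(pending) and slot < max_slots:
        ready = [s for s in range(num_stations)
                 if pending[s] and next_attempt[s] <= slot]
        if len(ready) == 1:
            # Sole transmitter: the frame gets through.
            s = ready[0]
            pending[s] -= 1
            attempts[s] = 0
            delivered += 1
            next_attempt[s] = slot + 1
        elif len(ready) > 1:
            # Collision: every station involved backs off independently.
            collisions += 1
            for s in ready:
                attempts[s] += 1
                window = 2 ** min(attempts[s], 10)
                next_attempt[s] = slot + 1 + rng.randrange(window)
        slot += 1

    return {"delivered": delivered, "collisions": collisions, "slots": slot}


if __name__ == "__main__":
    print(simulate_contention(num_stations=4, frames_per_station=5, seed=1))
```

Counting collisions alongside delivered frames gives quantities that can be compared with the collision counts measured on a real network.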

Once the model descriptions of the network and the packet generators and sinks are in place, the next stage is to look at the particular network to be modelled. The DCS network has been chosen because it is large enough to have a non-trivial topology, but small enough and accessible enough that a reasonably realistic model of it should be realisable in a short period of time. Constructing a model that reflects the topology can be done using the graphical front end to HASE. However, topology is not the only aspect that is important; it would also be useful to model the background activity. This can be measured by collecting logs of the distribution of ethernet packets, the number of collisions, and so on. Once built, the simulation model then needs to be run and validated against the real data collected from the network.

The final part of the project is to carry out an experimental investigation, placing one or more Web caches at different points in the network and comparing their relative performance under different conditions.
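The kind of comparison involved can be sketched with a toy experiment that measures hit ratio for two cache placements: one shared cache at a gateway, or one cache per network segment. Everything in this sketch is an assumption made for illustration: the least-recently-used replacement policy, the names `LRUCache` and `hit_ratio`, and the request trace format of `(segment, url)` pairs. It also ignores network delay, which the full simulation would account for.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used URL."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def request(self, url):
        """Return True on a hit; on a miss, fetch and store the URL."""
        if url in self.store:
            self.store.move_to_end(url)  # mark as most recently used
            return True
        self.store[url] = True
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the oldest entry
        return False


def hit_ratio(requests, shared, capacity):
    """Fraction of requests served from cache.

    requests: list of (segment, url) pairs in arrival order.
    shared:   True places a single cache at the gateway;
              False places one cache of the same capacity on each segment.
    """
    if not requests:
        return 0.0
    caches = {}
    hits = 0
    for segment, url in requests:
        key = "gateway" if shared else segment
        if key not in caches:
            caches[key] = LRUCache(capacity)
        if caches[key].request(url):
            hits += 1
    return hits / len(requests)


if __name__ == "__main__":
    trace = [("A", "/x"), ("B", "/x"), ("A", "/x"), ("B", "/y"), ("A", "/y")]
    print("shared cache:     ", hit_ratio(trace, shared=True, capacity=2))
    print("per-segment cache:", hit_ratio(trace, shared=False, capacity=2))
```

On this small hand-made trace the shared cache scores higher because segments request the same URLs, but hit ratio is only one metric; a cache nearer the client may still give lower latency, which is the sort of trade-off the experimental study is meant to explore.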

There are a number of possible extensions to the project, including a comparison of the new HTTP standard with the old (under some circumstances a twofold increase in speed is claimed). The network model could also be extended to take in part of the FDDI ring.

The project report would examine the modelling challenges and trade-offs that needed to be made, and present the conclusions of the experimental study.


The final report for this project is available here.
Webpage maintained by mario@epcc.ed.ac.uk