SSP 1995 project summary:


OCHRE-P: Optical Character Recognition in Parallel

This project involved the development of a fast and accurate prototype OCR system to meet the needs of high-volume document processing applications such as forms processing.

OCR is not yet in widespread use in such applications because of the heavy computational demand of accurate OCR. Parallel computing can help to change this by:

increasing raw throughput.
making more accurate but more computationally intensive algorithms practical.

A number of parallelisation strategies were evaluated, and a prototype OCR system called OCHRE-P, modelled on a task farming paradigm, was implemented. The parallel framework used lines of text as the unit of decomposition.
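As a rough illustration of task farming with lines of text as the unit of work, the sketch below has a master place line-sized tasks on a shared queue while a pool of workers pulls them off as they become free. It is not the OCHRE-P implementation: Python threads stand in for the separate processors, and `recognise_line` and `task_farm` are hypothetical names with a placeholder where the real recognition step would go.

```python
import queue
import threading


def recognise_line(line):
    # Hypothetical stand-in for recognising one line of text.
    # Here it just upper-cases a string so the sketch is runnable.
    return line.upper()


def task_farm(lines, num_workers=4):
    """Farm out one task per line and return results in page order."""
    tasks = queue.Queue()
    results = [None] * len(lines)

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                break
            index, line = item
            # Each worker writes to its own slot, so no lock is needed.
            results[index] = recognise_line(line)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()

    # The master hands out (index, line) pairs; idle workers take the next one.
    for item in enumerate(lines):
        tasks.put(item)
    # One sentinel per worker tells each of them to stop.
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
    return results
```

Because workers take a new line only when they finish the previous one, long and short lines balance out automatically, which is the usual attraction of a task farm for this kind of workload.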

A feed-forward neural network with a single hidden layer was used for pattern classification. Using a novel boundary scanning technique along with a scaled bitmap to provide the input feature vector, a recognition accuracy of around 97% was obtained for correctly segmented upper-case letters from a variety of fonts, including some that did not form part of the training set.
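The forward pass of a feed-forward network with a single hidden layer can be sketched as follows. This is a minimal illustration only: the sigmoid activation, the layer sizes, and the names `layer` and `classify` are assumptions, and the boundary scanning features are not reproduced because the summary does not describe them. The input is simply treated as an already prepared feature vector.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    # One fully connected layer: a weighted sum per unit, then a sigmoid.
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def classify(features, w_hidden, b_hidden, w_out, b_out, labels):
    """Return the label of the strongest output unit and all output values."""
    hidden = layer(features, w_hidden, b_hidden)
    outputs = layer(hidden, w_out, b_out)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best], outputs


# Toy, hand-picked weights (not trained, not from the project):
# two input features, two hidden units, two output classes.
W_HIDDEN = [[5.0, -5.0], [-5.0, 5.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [[5.0, -5.0], [-5.0, 5.0]]
B_OUT = [0.0, 0.0]

label, outputs = classify([1.0, 0.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT, "AB")
```

In a real classifier the weights would come from training on labelled character images, and there would be one output unit per character class.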

Parallel performance on a cluster of seven Ethernet workstations was compared with that on a Transputer Computing Surface: the shared-bus architecture of the cluster delivered near-optimal scalability (see Figure) and would seem to be better suited to this problem.

[IMAGE: Graph showing linear speedup on a Sun Workstation cluster (2352 bytes)]

This suggests that a high performance OCR system could be designed relatively cheaply using industry standard PCs and an inexpensive local area network.
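Scalability of this kind is usually quantified as speedup (serial time divided by parallel time) and efficiency (speedup divided by the number of workers). Linear speedup corresponds to an efficiency close to 1. The helper names below are illustrative and no timings from the project are assumed.

```python
def speedup(serial_time, parallel_time):
    """Speedup S = T_serial / T_parallel."""
    if serial_time <= 0 or parallel_time <= 0:
        raise ValueError("timings must be positive")
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, workers):
    """Efficiency E = S / workers; 1.0 means ideal linear scaling."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(serial_time, parallel_time) / workers
```

For example, a job that ran exactly seven times faster on seven workstations would have a speedup of 7 and an efficiency of 1.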

Possible extensions to the work were also considered.

Mark Forbes worked on this project.

Compressed PostScript of the project's final report is available here (179184 bytes).

Webpage maintained by mario@epcc.ed.ac.uk