ACCESS TO WIMP INTERFACES FOR THE BLIND

R. Whitrow, K. Pugh, N. Sherkat (Nottingham Trent University, Burton Street, Nottingham, UK)

SUMMARY:
The graphical nature of WIMP interfaces has made access difficult for visually impaired users, both those with total and those with partial loss of sight.
This paper describes the features of a special WIMP interface accessible to blind and partially sighted users, and presents the results of recent experiments carried out in the Windows environment.
ABSTRACT:
The development of WIMP interfaces has reduced the accessibility of modern software systems to blind and partially sighted users. The graphical nature of the WIMP-based interface means that aids devised for text-based systems are difficult or impossible for these users to use. The complexity of navigating a multiple-window screen makes use impossible for those who previously used text-based systems. Graphics and character recognition are essential if modern software is to be both accessible and usable by the blind. A system is described which uses a specialised interface to the standard WIMP input/output system to provide the real-time monitoring needed for interactive operation. A series of experiments was performed within Microsoft Windows and the results are presented. Contextual information such as layout is used to increase recognition capability.

Introduction

This paper is concerned with the problem of locating, identifying and recognising objects in the WIMP interface. The WIMP interface is recognised as a major advance in simplicity of operation for the sighted user of computer systems. For the blind and partially sighted the converse is the case. The WIMP interface is becoming ever more common and thus there is a growing need to provide access for the blind. Effective access means spatial awareness, icon manipulation and the ability to interact with the application. Text-based access by the blind is relatively straightforward in that only screen OCR with appropriate auditory output is required. Where text interacts with graphics, the objects involved must also be identified.

On-line interface constraints

Two problems must be solved for the blind to be able to navigate a WIMP environment. The first is time. The system is interactive and thus screen analysis and the tracking of changes must be performed in real time and unobtrusively. The second is the requirement to present results in a simple and understandable fashion. This is done by auditory output and by the use of special sliders attached to the keyboard rather than by use of the mouse.
A two-stage approach to screen analysis has been adopted, involving location and subsequent identification of objects. Location is achieved by producing an auditory map which permits user navigation. Identification is achieved by making use of the object's position and class.

Object classification

Objects are classified by identifying common characteristics such as size and location with respect to other objects. The characteristics chosen are those common to many window systems, in order to avoid system dependence. A window may be of any size but generally has one or two title bars and several control buttons, and is likely to be larger than 50 x 50 pixels. Icons are small and usually located within windows and directly above text. They are surrounded by a border of non-white pixels. Control buttons have a specific shape and are frequently located at the ends of title bars.
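As an illustration only, the size test described above might be sketched as follows in Python. The function name, the rectangle layout (top, left, bottom, right) and the 40-pixel icon limit are assumptions made for this example; only the 50 x 50 pixel figure for windows comes from the text.

```python
def classify_by_size(rect, max_icon_side=40, min_window_side=50):
    """Guess an object's class from its enclosing rectangle.

    rect is (top, left, bottom, right) in inclusive pixel coordinates.
    max_icon_side is a hypothetical threshold chosen for illustration.
    """
    top, left, bottom, right = rect
    height = bottom - top + 1
    width = right - left + 1
    if width > min_window_side and height > min_window_side:
        return "window"
    if width <= max_icon_side and height <= max_icon_side:
        return "icon"
    return "unknown"  # e.g. a title bar or a line of text


print(classify_by_size((0, 0, 199, 299)))   # large rectangle
print(classify_by_size((10, 10, 41, 41)))   # 32 x 32 rectangle
```

A fuller classifier would also test location, for example whether a small rectangle lies inside a window and directly above text, as the description requires.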

Object location

There are two main routines for finding objects: an edge finder and an object finder. The edge finder uses a standard four-way search which continues until it returns to its starting point. The search allows us to construct an object rectangle which totally encloses the object and is used as an aid to recognition. Objects enclosed in this way include text, icons and windows.
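The construction of an object rectangle can be illustrated with a short Python sketch. It does not reproduce the edge finder described above: for brevity it uses a flood fill over the four neighbours of each pixel rather than an edge follower, and it assumes the screen is a list of rows in which 0 stands for a white pixel. The function name is hypothetical.

```python
from collections import deque

WHITE = 0  # assumption: 0 is background, any other value is an object pixel


def object_rectangle(screen, start_row, start_col):
    """Return (top, left, bottom, right) enclosing the 4-connected
    non-white region that contains the start pixel."""
    rows, cols = len(screen), len(screen[0])
    top = bottom = start_row
    left = right = start_col
    seen = {(start_row, start_col)}
    queue = deque([(start_row, start_col)])
    while queue:
        r, c = queue.popleft()
        top, bottom = min(top, r), max(bottom, r)
        left, right = min(left, c), max(right, c)
        # Four-way search: up, down, left, right.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen
                    and screen[nr][nc] != WHITE):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return top, left, bottom, right


screen = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(object_rectangle(screen, 1, 1))
```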

Screen analysis

The two routines make it possible to locate and classify all screen objects. Scanning a multi window view takes place from the top left of a screen until a non white pixel is found. The object rectangle is found, stored and further scanning performed to find other windows. The whole screen is scanned and windows may be separate or overlapping.
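The window pass described above might look roughly like the following Python sketch. It is a simplification under stated assumptions: 0 stands for a white pixel, every window is drawn with a continuous rectangular border, and windows do not overlap (the system described handles overlapping windows, which this sketch does not). All names are hypothetical.

```python
WHITE = 0  # assumption: 0 is background


def border_rectangle(screen, row, col):
    """Stand-in rectangle finder: walk right along the top border and
    down along the left border from the top-left corner (row, col)."""
    right = col
    while right + 1 < len(screen[0]) and screen[row][right + 1] != WHITE:
        right += 1
    bottom = row
    while bottom + 1 < len(screen) and screen[bottom + 1][col] != WHITE:
        bottom += 1
    return row, col, bottom, right


def inside(rect, row, col):
    top, left, bottom, right = rect
    return top <= row <= bottom and left <= col <= right


def scan_screen(screen, find_rectangle=border_rectangle):
    """Scan from the top left; each non-white pixel that is not inside
    an already stored rectangle starts a new object rectangle."""
    found = []
    for row in range(len(screen)):
        for col in range(len(screen[0])):
            if screen[row][col] == WHITE:
                continue
            if any(inside(rect, row, col) for rect in found):
                continue
            found.append(find_rectangle(screen, row, col))
    return found


screen = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(scan_screen(screen))
```

Passing the rectangle finder as a parameter keeps the scan independent of how an individual object is traced.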
After window detection we move to icon and text detection, which is performed on each window in turn. Scanning is performed using the coordinates of each window already found. When an icon is found, a search for text underneath it also takes place. The process is repeated until the whole window has been searched, at which point all icon-sized objects have been stored. Sufficient information now exists for navigation of the screen, although unique identification has not yet been performed. Typical analysis time is about 3 seconds.

Icon identification and decoding

Icons come in almost infinite variety and are thus difficult to identify. We hold a database of pre-stored icon encodings for the WIMP interface and use this as the basis for identification. Since the same icon can be used to identify a number of different facilities, shape alone is not sufficient. For each row of the icon a count of the number of non-white pixels is made. A similar count is made for each column and also in the textual region. The series of numbers is stored in a database.
Identification takes place by comparing with the stored database of encoded icons. When a match is found the user is informed by auditory command.
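The row and column counts lend themselves to a compact illustration. The Python sketch below assumes an icon is a list of pixel rows with 0 for white, and uses a dictionary as the database; both are assumptions made for the example, as are the function names. The counts over the textual region are omitted; the same function could be applied to the text rectangle beneath the icon.

```python
WHITE = 0  # assumption: 0 is a white pixel


def encode_icon(pixels):
    """Encode an icon as (row counts, column counts) of non-white pixels."""
    row_counts = tuple(sum(1 for p in row if p != WHITE) for row in pixels)
    col_counts = tuple(sum(1 for p in col if p != WHITE)
                       for col in zip(*pixels))
    return row_counts, col_counts


def identify(pixels, database):
    """Return the stored name for an exact encoding match, else None."""
    return database.get(encode_icon(pixels))


cross = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
box = [[1, 1, 1],
       [1, 0, 1],
       [1, 1, 1]]

# Hypothetical database built from two toy icons.
database = {encode_icon(cross): "cross", encode_icon(box): "box"}
print(identify(box, database))
```

Exact matching of this kind is fast but treats two different icons with identical counts as the same object, which is one reason the counts over the text label matter.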

Effectiveness of approach

Screen analysis takes about 3 seconds on a 33 MHz 486 computer. Although the icon encoding and recognition is simplistic, it has proved to be 100% effective for Microsoft Windows systems. Windows are also recognised by encoding the title bar name.

Aiding the identification process

To aid identification a database of known objects was established. This is a file containing:
- names of all windows in the target WIMP system
- for each named window, a related file of all installed icons.
At present this data has been manually crafted. The database allows the user to discover what windows are installed on a system and what icons are available. It can also be used to resolve ambiguity where recognition is indecisive.
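A minimal Python sketch of how such a database could be queried is given below. It holds the data in a dictionary rather than in the files described above, and the window and icon names are illustrative placeholders, not the contents of the actual database. The function names are hypothetical.

```python
# Hypothetical contents: window and icon names are illustrative only.
known_objects = {
    "Main": ["File Manager", "Control Panel", "Print Manager"],
    "Accessories": ["Write", "Paintbrush", "Notepad"],
}


def installed_windows(database):
    """List the windows known to the system."""
    return sorted(database)


def resolve(window_name, candidates, database):
    """Keep only the candidate icon names installed in the given window."""
    installed = database.get(window_name, [])
    return [name for name in candidates if name in installed]


print(installed_windows(known_objects))
print(resolve("Accessories", ["Notepad", "Print Manager"], known_objects))
```

When recognition returns more than one candidate for an icon, restricting the candidates to those installed in the enclosing window is one simple way to resolve the ambiguity.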

Optical character recognition

Standard text-based systems enable easy recognition by directing the ASCII version of the screen map to a speech output system. In the WIMP environment there is no such memory map of the character codes. Identification must be made from the pixels, which may represent varying fonts (1).

Character extraction and recognition

The object and edge finding routines were adapted for this purpose.
File Edit Search Layout Mark Tools Font Graphics Help
WIMP Screen Recognition Analysis of the WIMP environment can be split into two main stages. The first stage to determine the type of the object on the screen. (I.e Icon, Text, Window, Menu, etc) and their location. The next to uniquely identify the object. A prime consideration is speed, since the WIMP screen recognition is to be performed on-line in real time. Current research has concentrated on developing fast algorithms that are as accurate as possible. Splitting analysis into the above two stages fragments the analysis which makes it less noticeable to the use. The first stage is undertaken on initial screen display. The unique identification is performed as and when the user requires it. In order to perform the first stage of identification, (identifying objects as to type) it is necessary to discover characteristics common to each class of objects. The two most useful characteristics for this purpose have proved to be an objects size and it's location in respect to other objects. For instance:
Fig. 1 - Actual screen
There were a few cases where, with some fonts, adjacent characters touched (ff, tt, xt, etc.). These were recognised as a single entity and separated after the pixel recognition.
Characters could have been encoded in many ways (2), but in fact we used the same approach as for icon recognition. We have also introduced run-length encoding for recognition. The method is simple and fast but does limit us to about six fonts. Provided the user specifies which six fonts are required, this is not seen as a major restriction.
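Run-length encoding of a character's pixel rows can be sketched in a few lines of Python. How the run lengths are combined with the row and column counts is not specified here, so the sketch shows only the encoding step; the glyph and the function name are invented for illustration, with 0 standing for a white pixel.

```python
from itertools import groupby


def run_lengths(row):
    """Encode one pixel row as a list of (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


# A toy 3 x 6 glyph fragment; 1 is an ink pixel, 0 is white.
glyph = [
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
for row in glyph:
    print(run_lengths(row))
```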
Where pattern recognition has proved imperfect we have used a lexical recogniser to improve results (3). Fig. 1 shows an actual screen. Fig. 2 shows the attempt at recognition.
File Edit *earch Layout Mark Tools Font Graphics Help
WIMP Scre*n *ecogn***on Analys*s of t*e WIMP env*ronment can be split into two ma*n stages. The **r*t stage to determine th* type of the objects on t*e sc*een. (*.e *con, Text, Window, Menu, etc) and their loc*tion. The *ext to uniquely iden*ify the objects. A p*ime consideration is speed, sin*e *** W*mp s**een rec*gni**on is to be performed on-line *n r*al time. Curr*nt research has c**centrated on develop*ng f*st algor*thms **at are *s *ccurate as possible. *plitting analysis into the ab**e two stages *ragments the analysis t wh*** ma**s it less notice*ble to the user. The first stage is undert***n on in*tial screen display. The un**ue identification *s perform*d a* and when the us*r re*uires it. *n order to perform the fir** stage of iden******n, (identifying objects as t* ty**) *t *s necess*ry to d*scover characteri*tics comm*n to **ch class of object. The two most useful character*sti** for ***s purp*se have proved to be an objects size and i*'s location *n respect to other object*. For instance:
Fig. 2 - Recognition from screen - no lexicon
This shows that with a simple procedure we are able to recognise those characters which are 'clean' data. The lexical recogniser is proving most valuable here. We use it to make a 'best guess' from the characters that have been found at the pattern level.
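The idea of a lexical 'best guess' can be illustrated with a small Python sketch. It is not the fast dictionary look-up of reference (3): it simply scans a word list for entries of the same length that agree with every recognised character, treating '*' as one unrecognised character as in Fig. 2. The word lists and the function name are assumptions made for the example.

```python
def candidates(pattern, lexicon, unknown="*"):
    """Return lexicon words matching the pattern, where each occurrence
    of `unknown` stands for exactly one unrecognised character."""
    return [
        word for word in lexicon
        if len(word) == len(pattern)
        and all(p == unknown or p == w for p, w in zip(pattern, word))
    ]


lexicon = ["screen", "scream", "speed", "stage", "shape", "state"]
print(candidates("scre*n", lexicon))
print(candidates("s*a*e", lexicon))
```

The second query returns several words, which shows why a recogniser needs some further criterion, such as context or word frequency, to choose a single best guess.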

Conclusions

We have presented a method which allows reliable navigation and recognition around a windows environment. Windows may be overlapping and recognition applies to menus and dialogue boxes. Specialised OCR for a limited number of fonts has been developed and needs to be extended (4).
Software optimisation will be performed to further reduce process times.

REFERENCES

1) Rosenbaum W., 'Multifont OCR postprocessing system', IBM Journal of Research and Development, 19, 398-421, July 1975
2) Freeman H., 'On the encoding of arbitrary geometric configurations', IRE Trans. on Elec. Comp., 260-268, June 1961
3) Wells C., Evett L., Whitrow R., 'Fast dictionary look-up for contextual word recognition', Pattern Recognition, 23(5), 501-508, 1990
4) 'Electronic Documents', Learned Information Ltd, (Special Issue on OCR), Vol. 1 (5), May 1992
