R. Whitrow, K. Pugh, N. Sherkat (Nottingham Trent University, Burton Street,
Nottingham, UK)
SUMMARY:
The graphical nature of WIMP interfaces has made access difficult
for visually impaired users, whether totally blind or partially sighted.
This paper describes the features of a special WIMP interface
accessible to blind and partially sighted users; it also presents
the results of recent experiments conducted in the Windows environment.
ABSTRACT:
The development of WIMP interfaces has led to a
reduction in the accessibility of modern software systems to blind and
partially sighted users. The graphical nature of the WIMP based interface means
that aids devised for text based systems are difficult or impossible for
blind and partially sighted users to use. The complexity of navigation around
a multiple window screen makes use impossible for those who previously
had used text based systems.
Graphics and character recognition are essential if modern software
is to be both accessible and usable by the blind. A system is described
which uses a specialised interface to the standard WIMP input/output
system to provide the real-time monitoring needed for interactive
operation. A series of experiments was
performed within Microsoft Windows and results are presented. Contextual
information such as layout is used to increase recognition capability.
Introduction
This paper is concerned with the problem of locating, identifying and
recognising objects in the WIMP interface. The WIMP interface is recognised
as a major advance to aid simplicity of operation for the sighted user of computer
systems. For the blind and partially sighted the converse is the case. The WIMP
interface is becoming ever more common and thus there is a growing need to
provide access for the blind. Effective access means spatial awareness, icon
manipulation and ability to interact with the application.
Text based access by the blind is relatively straightforward in that screen
OCR is required with appropriate auditory output. The interaction of text with
graphics requires identification.
On-line interface constraints
Two problems require a solution for the blind to be able to navigate a WIMP
environment. The first is time. The system is interactive and thus screen analysis
and changes must be performed in real time and unobtrusively. The second is the
requirement to present results in an understandable and simple fashion. This is
done through auditory output and through special sliders attached to the keyboard
rather than through the mouse.
A two stage process to screen analysis has been adopted involving location
and subsequent identification of objects. Location is achieved by producing an
auditory map which permits user navigation. Identification makes use
of the object's position and class.
Object classification
Object classification is achieved by identifying common characteristics such as
size and location with respect to other objects. The characteristics that are
common to many window systems are chosen in order to avoid system
dependence. A window may be of any size but generally has one or two title
bars and several control buttons, and is likely to be larger than 50x50 pixels.
Icons are small and usually located within windows and directly above text.
They are surrounded by a border of non white pixels.
Control buttons have a specific shape and are frequently located at the ends
of title bars.
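
The size and location rules above can be illustrated with a minimal sketch. It covers only windows and icons; the `Rect` type, the `classify` function and the 48-pixel icon bound are hypothetical names and values introduced for illustration, and only the 50x50 window threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

MIN_WINDOW_SIDE = 50   # from the text: windows are likely larger than 50x50 pixels
MAX_ICON_SIDE = 48     # assumption: illustrative upper bound for an icon


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int    # inclusive
    bottom: int   # inclusive

    @property
    def width(self) -> int:
        return self.right - self.left + 1

    @property
    def height(self) -> int:
        return self.bottom - self.top + 1

    def contains(self, other: "Rect") -> bool:
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def classify(rect: Rect, parent: Optional[Rect] = None,
             has_text_below: bool = False) -> str:
    """Guess an object class from its size and its relation to other objects."""
    if rect.width > MIN_WINDOW_SIDE and rect.height > MIN_WINDOW_SIDE:
        return "window"
    small = rect.width <= MAX_ICON_SIDE and rect.height <= MAX_ICON_SIDE
    # Icons are small, sit inside a window and have text directly below them.
    if small and has_text_below and parent is not None and parent.contains(rect):
        return "icon"
    return "unknown"
```

Control buttons could be handled in the same way by testing whether a small rectangle lies at either end of a title bar.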
Object location
There are two main routines for finding objects: an edge finder and an object
finder. The edge finder uses a standard four-way search until it returns to
the start of the search. The search allows us to construct an object rectangle
which totally encloses the object and is used as an aid to recognition. Rectangle
cases include text, icons and windows.
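
As a rough illustration of how an enclosing rectangle can be built from a four-way search, the sketch below visits the up, down, left and right neighbours of each object pixel and tracks the extremes reached. It is a plain 4-connected flood fill rather than the boundary-following edge finder described above, and it assumes the screen is a list of rows in which 0 means white; the function name is hypothetical.

```python
from collections import deque

WHITE = 0  # assumption: 0 is a white pixel, anything else belongs to an object


def object_rectangle(screen, start_row, start_col):
    """Return (top, left, bottom, right) enclosing the object at the start pixel."""
    rows, cols = len(screen), len(screen[0])
    seen = {(start_row, start_col)}
    queue = deque([(start_row, start_col)])
    top = bottom = start_row
    left = right = start_col
    while queue:
        r, c = queue.popleft()
        top, bottom = min(top, r), max(bottom, r)
        left, right = min(left, c), max(right, c)
        # Four-way search: up, down, left, right.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and screen[nr][nc] != WHITE and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return top, left, bottom, right
```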
Screen analysis
The two routines make it possible to locate and classify all screen objects.
Scanning a multi window view takes place from the top left of a screen until a
non white pixel is found. The object rectangle is found, stored and further
scanning performed to find other windows. The whole screen is scanned and
windows may be separate or overlapping.
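
The top-left scan can be sketched as follows, again assuming a list-of-rows screen in which 0 means white. The helper `_enclosing_rectangle` is a hypothetical stand-in for the object rectangle routine, and this simplified version treats outlines that touch as a single object, so it does not separate overlapping windows.

```python
WHITE = 0  # assumption: 0 is a white pixel


def _enclosing_rectangle(screen, r0, c0):
    """Hypothetical helper: rectangle around the 4-connected object at (r0, c0)."""
    stack, seen = [(r0, c0)], {(r0, c0)}
    top = bottom = r0
    left = right = c0
    while stack:
        r, c = stack.pop()
        top, bottom = min(top, r), max(bottom, r)
        left, right = min(left, c), max(right, c)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < len(screen) and 0 <= nc < len(screen[0])
                    and screen[nr][nc] != WHITE and (nr, nc) not in seen):
                seen.add((nr, nc))
                stack.append((nr, nc))
    return top, left, bottom, right


def scan_screen(screen):
    """Scan from the top left and store one rectangle per object found."""
    found = []
    for r, row in enumerate(screen):
        for c, pixel in enumerate(row):
            if pixel == WHITE:
                continue
            # Skip pixels that fall inside a rectangle that is already stored.
            if any(t <= r <= b and l <= c <= rt for t, l, b, rt in found):
                continue
            found.append(_enclosing_rectangle(screen, r, c))
    return found
```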
After window detection we move to icon and text detection which is
performed on each window in turn. Scanning is performed using the coordinates
of each window already found. When an icon is found a search for text
underneath it also takes place. The process is repeated until the whole window
has been searched, at which point all icon sizes are stored. Sufficient information now
exists for navigation of the screen although unique identification has not yet been
performed. Typical analysis time is about 3 seconds.
Icon identification and decoding
Icons come in an almost infinite variety and are thus difficult to identify. We have a
database of pre-stored icon encodings within the WIMP interface. We use this
as a basis for identification. Since the same icon can be used to identify a number
of different facilities, shape alone is not sufficient for operation.
For each row of the icon a count of the number of non-white pixels is made.
A similar count is made for each column and also in the textual region. The series
of numbers is stored in a database.
Identification takes place by comparing with the stored database of encoded
icons. When a match is found the user is informed by auditory command.
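
A minimal sketch of this row and column count encoding is given below. It assumes pixel regions are lists of rows in which 0 means white, that the icon and its text region are encoded the same way and concatenated, and that the database is a dictionary keyed by the resulting series of numbers. The function names and the icon label are illustrative only.

```python
WHITE = 0  # assumption: 0 is a white pixel


def encode_region(pixels):
    """Count non-white pixels in each row, then in each column."""
    row_counts = [sum(1 for p in row if p != WHITE) for row in pixels]
    col_counts = [sum(1 for p in col if p != WHITE) for col in zip(*pixels)]
    return tuple(row_counts + col_counts)


def encode_icon(icon_pixels, text_pixels):
    """Series of numbers for an icon together with its textual region."""
    return encode_region(icon_pixels) + encode_region(text_pixels)


def identify(icon_pixels, text_pixels, database):
    """Return the stored name for an exact match, or None if there is none."""
    return database.get(encode_icon(icon_pixels, text_pixels))
```

Because the text region is part of the key, two icons with the same shape but different captions receive different encodings.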
Effectiveness of approach
The screen analysis takes about 3 seconds on a 33 MHz 486 computer. Although
the icon encoding and recognition is simplistic, it has proved to be 100%
effective for Microsoft Windows systems. Windows are also recognised by
encoding the title bar name.
Aiding the identification process
To aid identification a database of known objects was established. This is a
file containing:
- names of all windows in target WIMP system
- for each named window a related file of all installed icons.
At present this data has been manually crafted. The database allows the user
to discover what windows are installed on a system and what icons are available.
It can also be used to resolve ambiguity where recognition is indecisive.
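
One possible in-memory form of such a database is sketched below, assuming a mapping from window names to the icons installed in each. The window and icon names, and the `resolve` helper, are illustrative and are not taken from the actual file.

```python
# Illustrative data only: window name -> icons installed in that window.
KNOWN_OBJECTS = {
    "Main": ["File Manager", "Control Panel", "Print Manager"],
    "Accessories": ["Write", "Paintbrush", "Calculator"],
}


def installed_windows(db):
    """List the windows known to be installed on the system."""
    return sorted(db)


def resolve(window, candidates, db):
    """Keep the one candidate installed in this window, if recognition was indecisive."""
    installed = set(db.get(window, ()))
    matches = [name for name in candidates if name in installed]
    return matches[0] if len(matches) == 1 else None
```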
Optical character recognition
Standard text-based systems enable easy recognition by directing the ASCII
version of the screen map to a speech output system. In the WIMP environment
there is no such memory map of the character codes. Identification must be from
the pixels, which may represent varying fonts (1).
Character extraction and recognition
The object and edge finding routines were adapted for this purpose.
File Edit Search Layout Mark Tools Font Graphics Help
WIMP Screen Recognition
Analysis of the WIMP environment can be split into two
main stages. The first stage to determine the type of the
object on the screen. (I.e Icon, Text, Window, Menu, etc)
and their location. The next to uniquely identify the object.
A prime consideration is speed, since the WIMP screen
recognition is to be performed on-line in real time. Current
research has concentrated on developing fast algorithms
that are as accurate as possible. Splitting analysis into the
above two stages fragments the analysis which makes
it less noticeable to the use. The first stage is undertaken
on initial screen display. The unique identification
is performed as and when the user requires it.
In order to perform the first stage of identification, (identifying
objects as to type) it is necessary to discover characteristics
common to each class of objects. The two most useful
characteristics for this purpose have proved to be an
objects size and it's location in respect to other objects.
For instance:
Fig. 1 - Actual screen
There were a few cases where characters touched with some fonts (ff, tt, xt,
etc.). These were recognised as a complete entity and separated after the pixel
recognition.
Characters could have been encoded in many ways (2), but in fact we used
the same approach as for icon recognition. We have also introduced run length
encoding for recognition. The method is simple and fast but does limit us to about
six fonts. Provided the user specifies which six fonts are required this is not seen
as a major restriction.
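
The text does not give the exact form of the run length encoding, so the following is only a generic sketch. It assumes a character is a list of pixel rows in which 0 means white, and it encodes each row as runs of background and ink; the function names are hypothetical.

```python
from itertools import groupby

WHITE = 0  # assumption: 0 is a white pixel


def run_lengths(row):
    """Encode one pixel row as (is_ink, length) runs, left to right."""
    return [(ink, sum(1 for _ in group))
            for ink, group in groupby(p != WHITE for p in row)]


def encode_character(pixels):
    """Run-length encode every row of a character's pixel rectangle."""
    return [run_lengths(row) for row in pixels]
```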
Where pattern recognition has proved imperfect we have used a lexical
recogniser to improve results (3).
Fig. 1 shows an actual screen and Fig. 2 shows the attempt at recognition.
File Edit *earch Layout Mark Tools Font Graphics Help
WIMP Scre*n *ecogn***on Analys*s of t*e WIMP
env*ronment can be split into two ma*n stages. The
**r*t stage to determine th* type of the objects on t*e
sc*een. (*.e *con, Text, Window, Menu, etc) and their
loc*tion. The *ext to uniquely iden*ify the objects.
A p*ime consideration is speed, sin*e *** W*mp
s**een rec*gni**on is to be performed on-line *n r*al
time. Curr*nt research has c**centrated on develop*ng
f*st algor*thms **at are *s *ccurate as possible. *plitting
analysis into the ab**e two stages *ragments the analysis
t wh*** ma**s it less notice*ble to the user. The first
stage is undert***n on in*tial screen display. The un**ue
identification *s perform*d a* and when the us*r re*uires
it. *n order to perform the fir** stage of iden******n,
(identifying objects as t* ty**) *t *s necess*ry to d*scover
characteri*tics comm*n to **ch class of object. The two
most useful character*sti** for ***s purp*se have proved
to be an objects size and i*'s location *n respect to other
object*. For instance:
Fig. 2 - Recognition from screen -no lexicon
This shows that with a simple procedure we are able to recognise those
characters which are 'clean' data. The use of the lexical recogniser is here
proving to be most valuable. We use it to make a 'best guess' from the
characters that have been found at the pattern level.
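
The idea of a 'best guess' can be illustrated with a small sketch in which each `*` in the recognised text stands for one unrecognised character and candidate words are drawn from a word list. This is a simple pattern match for illustration, not the fast dictionary look-up of reference (3); the function name and the word list are assumptions.

```python
import re


def best_guesses(pattern, lexicon):
    """Return lexicon words that fit a pattern where '*' is one unknown character."""
    regex = re.compile(
        "".join("." if ch == "*" else re.escape(ch) for ch in pattern),
        re.IGNORECASE,
    )
    return [word for word in lexicon if regex.fullmatch(word)]
```

When more than one word fits, further context such as neighbouring words would be needed to choose between them.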
Conclusions
We have presented a method which allows reliable navigation and recognition
around a windows environment. Windows may be overlapping and recognition
applies to menus and dialogue boxes. Specialised OCR for a limited number of
fonts has been developed and needs to be extended (4).
Software optimisation will be performed to further reduce process times.
REFERENCES
1) Rosenbaum W., 'Multifont OCR postprocessing system', IBM Journal
of Research and Development, 19, 398-421, July 1975
2) Freeman H., 'On the encoding of arbitrary geometric configurations',
IRE Trans. on Elec. Comp., 260-268, June 1961
3) Wells C., Evett L., Whitrow R., 'Fast dictionary look-up for
contextual word recognition', Pattern Recognition, 23(5), 501-508, 1990
4) 'Electronic Documents', Learned Information Ltd, (Special Issue on
OCR), Vol. 1 (5), May 1992