TEDUB: AUTOMATIC INTERPRETATION AND PRESENTATION OF TECHNICAL DIAGRAMS FOR BLIND PEOPLE



M. Horstmann (TZI, University of Bremen, Germany), C. Hagen (University of Bamberg, Germany), A. King (Department of Computation, UMIST, Manchester, UK), S. Dijkstra (FNB, Amsterdam, Netherlands), D. Crombie (FNB, Amsterdam, Netherlands), D. G. Evans (Department of Computation, UMIST, Manchester, UK), G. T. Ioannidis (TZI, University of Bremen, Germany), P. Blenkhorn (Department of Computation, UMIST, Manchester, UK), O. Herzog (TZI, University of Bremen, Germany), Ch. Schlieder (University of Bamberg, Germany)

Contact: Mirko Horstmann, Universitaetsallee 21-23, 28359 Bremen, Germany, Fax: +49 421 218 7196, email: mir@tzi.de


Abstract: This paper describes the advances of the software developed in the context of the TeDUB project ("Technical Drawings Understanding for the Blind"), which aims at providing blind computer users with an accessible representation of technical diagrams. The TeDUB system consists of two separate parts: one for the (semi-)automatic analysis of images containing diagrams from a number of formally defined domains and one for the representation of previously analysed material to blind people.

1. Introduction


One of the problems for blind and visually impaired people is making use of the information contained in graphics. While a number of mature techniques (e.g., optical character recognition, screen readers or braille devices) allow them to access text from documents, even if these originate from printed material, the content of informational graphics like technical diagrams typically remains inaccessible. The usual approaches to address these difficulties are tactile diagrams and the use of manually created meta-data like textual descriptions (see Kurze 1995). However, both approaches necessitate active human intervention. For tactile diagrams, the existing data has to be carefully redesigned (see Levi and Amick 1982). Other possible solutions rely on specialised (and often expensive) hardware like touch tablets and combine them with sound (e.g., the TACIS system, Gallagher and Frasch 1998).


The TeDUB system makes graphical information accessible using semi-automatic and automatic analysis of graphical content and the import of file formats that contain semantic information, and it presents the information to the blind user through a specialised navigation interface. It is intended to handle technical drawings (diagrams that conform to certain standards) from arbitrary domains and demonstrates this for three domains: analogue and digital electronic circuits, certain UML (Unified Modelling Language) diagrams and architectural floor plans. It consists of several parts for the interpretation and presentation of such diagrams. Since first results were published by Petrie et al. (2002) and Födisch et al. (2002), several prototypes have been developed. The system's architecture has received fundamental revisions and is now able to process three types of diagram input: bitmap graphics, vector graphics and file formats containing semantic information. The navigation interface has matured considerably based on the results of intensive user evaluations of the first prototypes.

2. System Architecture


The TeDUB system consists of two main parts, DiagramInterpreter and DiagramNavigator. DiagramInterpreter analyses existing diagrams and converts them into a representation that can be used by DiagramNavigator, which provides blind users with an interface to navigate and annotate these diagrams through a number of input and output devices.

2.1 DiagramInterpreter


The TeDUB system is able to handle diagrams at different levels of abstraction: bitmap graphics, such as those acquired through standard scanner hardware or found on web pages; vector graphics, as typically produced with graphics programs like Corel Draw; and file formats with semantic content, a category into which the XMI format (XML Metadata Interchange) and formats specific to CAD and other modelling software fall.



Figure 1: Architecture of DiagramInterpreter


DiagramInterpreter's core is the knowledge processing unit. It operates on a network of hypotheses and processes them incrementally until a semantic description of the whole diagram is found. The image processing unit analyses bitmap images and generates a first set of hypotheses based on the geometric information therein. Vector graphics files, which already contain explicit information about geometric primitives, can be used via DiagramInterpreter's SVG (Scalable Vector Graphics) import functionality. The Annotator allows a sighted user to interact with the interpretation process by inserting hypotheses manually, thus improving the quality of the interpretation as well as adding useful information not contained in the original diagram. All domain-dependent aspects of DiagramInterpreter are externalised as formalised knowledge. Therefore, the system is designed to minimise the effort needed to incorporate a new type of diagram.

2.2 DiagramNavigator


DiagramNavigator is the user interface component of the system. It presents the diagram content obtained by DiagramInterpreter to the user. It also performs an XSL transformation of XMI-format UML diagrams exported from UML design tools like Rational Rose or ArgoUML into the same TeDUB form, presented in the same user interface (Figure 2). The great advantage of this latter approach is that the information contained in the diagram is converted losslessly into the TeDUB format: the variable results of image analysis of bitmaps are avoided.


In both cases the information is modelled as a set of nodes which can be navigated either hierarchically or as a collection of connected graphs. Output is screen-reader independent and utilises 2D and 3D sound. Input is via the keyboard or an optional tactile tablet. An inexpensive commercial games force feedback joystick is used as a simple tactile input and output device. With the exception of the tactile tablet, the interface is designed to use inexpensive and commercially available devices: this is important if the system is to have any real application in future.



Figure 2: The User Interface

2.3 The TeDUB Representation Format for Diagrams


Analysed data is exchanged between DiagramInterpreter and DiagramNavigator in an XML-based format that contains the semantic information from the original image. As described in section 3, DiagramNavigator is able to communicate hierarchical and spatial information, which is represented by two types of edges: part-whole relationships and relative positions for defined pairs of objects. Figure 3(a) and Figure 3(b) show parts of the representation of an example diagram from the architectural domain.



Figure 3(a): Representation of an architectural diagram: a section of the hierarchy



Figure 3(b): Representation of an architectural diagram: a section of the connectivity presentation. The network of connected nodes does not necessarily respect the hierarchy levels.


An advantage of this representation is that information is only included if it is relevant to the user: in a UML class diagram, e.g., the exact geometric path of an association between class nodes does not help in the understanding of the diagram. On the other hand, the combination of the hierarchical representation of floor plans (see section 3) and the spatial layout of the rooms makes it easier for a blind user to find their way through the depicted building. By conveying the semantic hierarchical structures of the diagram, the user does not have to build up these structures through painstaking synthesis of the simple nodes making up the diagram.
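To make the two edge types concrete, the following is a minimal Python sketch of such a node structure with part-whole edges and relative positions for defined pairs of objects. Class names, attributes and the example fragment are illustrative assumptions, not the actual TeDUB schema.

```python
# Illustrative sketch of a TeDUB-style diagram representation.
# Class names and attributes are hypothetical, not the real schema.

class Node:
    def __init__(self, name, node_type):
        self.name = name
        self.node_type = node_type
        self.children = []      # part-whole edges (hierarchy)
        self.relations = []     # (direction, other_node) spatial pairs

    def add_part(self, child):
        self.children.append(child)
        return child

# A fragment of an architectural diagram:
building = Node("Building", "root")
floor = building.add_part(Node("Ground floor", "floor"))
kitchen = floor.add_part(Node("Kitchen", "room"))
hall = floor.add_part(Node("Hall", "room"))

# A spatial relation between a defined pair of objects:
kitchen.relations.append(("east-of", hall))

# Hierarchical traversal, as a navigator might enumerate it:
def walk(node, depth=0):
    yield depth, node.name
    for child in node.children:
        yield from walk(child, depth + 1)

print([name for _, name in walk(building)])
# ['Building', 'Ground floor', 'Kitchen', 'Hall']
```

Note how the spatial relation is stored independently of the hierarchy: the two views can be navigated separately, as the figures illustrate.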

3. Automatic Interpretation of Images


The main goal of DiagramInterpreter is to build an interpreted representation of a given diagram. As noted above, the TeDUB system can process three different types of input data. Of these, bitmap and vector graphics have to be interpreted in order to be presented to blind users in a meaningful way.

3.1 Knowledge Modelling


Automatic interpretation is performed by processing components of the diagram in a partonomic hierarchy of different abstraction levels, from lowest (geometric primitives like "straight line", "curve" or "rectangle") to highest (functional units like "room" in the architectural domain or "full adder" in an electronic circuit). In a partonomic hierarchy, two elements are related if one is a part of the other. In the architectural domain, e.g., several hypothesised lines or arcs may be parts of a door or a window, while several windows, doors and surrounding walls may be parts of a room. The knowledge management unit uses a data- and model-driven aggregation process which creates a complete interpretation of a diagram by stepwise inferring new parts from existing ones.


Diagrams in bitmap or vector graphics format may be of varying quality. This is especially true for graphics from scanned documents, where noise and other distortions can lead to missing or ambiguous information. But information may also be ambiguous in vector graphics formats, e.g., if a text annotation must be assigned to one of two nearby objects. The inference mechanism deals with this uncertainty by treating elements from the diagram as hypotheses about its parts. Each hypothesis is assigned a value that represents the confidence in its correctness.
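The stepwise aggregation of hypotheses can be sketched as follows. The combination rule (taking the minimum of the parts' confidences) and all names are illustrative assumptions; the paper does not specify how the actual inference engine combines confidence values.

```python
# Hedged sketch: aggregating hypotheses with confidence values.
# The combination rule and names are illustrative, not the actual
# TeDUB inference mechanism.

from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    concept: str
    confidence: float           # 0.0 .. 1.0
    parts: list = field(default_factory=list)

def aggregate(concept, parts):
    """Infer a higher-level hypothesis from its hypothesised parts.
    Simple rule: the aggregate is no more certain than its weakest part."""
    conf = min(p.confidence for p in parts)
    return Hypothesis(concept, conf, parts)

# Architectural domain: lines and arcs may be parts of a door.
arc = Hypothesis("arc", 0.9)
line = Hypothesis("straight line", 0.7)
door = aggregate("door", [arc, line])
print(door.concept, door.confidence)   # door 0.7
```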


The TeDUB system aims at being domain independent. Therefore, all domain-specific aspects are externalised as formalised knowledge (ontologies), and new types of diagrams are made accessible to the system by specifying the corresponding ontologies. The core of the formal language consists of aggregation rules for the definition of concepts. Obviously, concepts on lower levels of abstraction are the least domain-dependent and are suited to be modelled in a reusable way. Concepts on the lowest level must also be pre-defined in order for the several modules of DiagramInterpreter to communicate with each other.

3.2 Image Processing


In the case of bitmap graphics (the lowest level of input to the TeDUB system), images are first analysed by the image processing module, which provides the means for an extraction of image features. The module follows the usual image processing pipeline of pre-processing, segmentation and feature extraction (see Abmeyr 1994), with an emphasis on the extraction of lines. The goal is to obtain an initial set of simple hypotheses describing geometric properties of the image that serve as input to the knowledge management module.


The positions of lines in the image are determined using a skeletonisation approach, which determines the approximated centre lines of all components (an approach also used by Dosch et al. 2000). In a next step, these lines are transferred into graphs of connected components. The nodes of these graphs define crossings, end points and corners of lines, while the graphs' edges contain information about the line segments: their thickness, curvature and other properties necessary for the subsequent classification.


Currently, elements from the input graphics are classified by the image processing module as one of four concepts, which also constitute the pre-defined set of hypotheses necessary for the communication between modules: straight line, arc, line graph and textbox. A line graph contains connected line segments such as adjoining walls in a floor plan. A textbox describes the position and content of text lines in the diagram. The actual recognition of the contained text is done by an external OCR engine.
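A toy version of the straight-line/arc distinction can be sketched from the segment properties mentioned above. The feature used here (path length versus endpoint distance) and the threshold are assumptions for illustration only; the actual module's classifier is not described in this paper.

```python
# Hedged sketch of classifying a skeleton segment as one of the
# pre-defined concepts. Feature and threshold are illustrative.

import math

def classify_segment(points):
    """Classify a polyline (list of (x, y) points): a nearly
    straight path has path length close to its endpoint distance."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        return "line graph"            # closed contour
    return "straight line" if path / chord < 1.05 else "arc"

straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
curved = [(0, 0), (1, 1), (2, 1.4), (3, 1)]
print(classify_segment(straight))  # straight line
print(classify_segment(curved))    # arc
```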

4. User Interface for Navigation


The TeDUB system is designed to communicate semantic information to the user, rather than precise component orientation and spatial position. The diagram content is formed into a connected network of nodes. There is also a compositional hierarchy, so a node may be a high-level aggregation of basic components or a low-level component. Figure 3(a) shows an example for an architectural diagram. The user navigates starting from the root, top node of the diagram, and so encounters the semantic structures before the simple components. This is intended to allow blind users to access the important high-level information as immediately and quickly as possible. The actual implementation of this hierarchical navigation is modelled upon Microsoft Windows Explorer: the user can move around all the diagram contents using the cursor keys, a mechanism prompted by the observation in early evaluation studies that users were familiar and comfortable with such an interface. It utilises simple earcons (non-speech sounds) in the style of Brewster (1998) as context and feedback sounds to supplement the text-based user interface, such as a tone to indicate the end of a list or the lack of child nodes of the current node. A miscellany of functions supports common navigation and communication tasks, for example allowing annotation to be applied to any node, the ability to retrace one's steps with a back function like that in a web browser, a search function for finding nodes by content or type, the ability to hide or show different types of nodes, and simple editing abilities.
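The Explorer-style navigation and the browser-like back function described above can be sketched in a few lines. All names are illustrative, and the node structure is a plain dictionary; the real DiagramNavigator is not reproduced here.

```python
# Minimal sketch of cursor-key navigation over the node hierarchy
# with a browser-like back function. Names are illustrative.

class Navigator:
    def __init__(self, root):
        self.current = root
        self.history = []           # for the "back" function

    def _move(self, node):
        if node is not None:
            self.history.append(self.current)
            self.current = node
        # else: an earcon would signal that there is no such node

    def down_into(self):            # e.g. right arrow: first child
        kids = self.current.get("children", [])
        self._move(kids[0] if kids else None)

    def up_to_parent(self):         # e.g. left arrow
        self._move(self.current.get("parent"))

    def back(self):                 # retrace one's steps
        if self.history:
            self.current = self.history.pop()

root = {"name": "Circuit", "children": []}
adder = {"name": "Full adder", "parent": root, "children": []}
root["children"].append(adder)

nav = Navigator(root)
nav.down_into()
print(nav.current["name"])   # Full adder
nav.back()
print(nav.current["name"])   # Circuit
```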


Spatial and connection information (which may or may not be important, depending on the diagram domain and the task undertaken by the user) is orthogonal to the hierarchical information, connecting nodes within levels (and possibly between them). An example is given in Figure 3(b).


The presentation of this connectivity and spatial information presents more problems than the text-based hierarchical information. The interface therefore includes a number of different functions driven by an inexpensive commercial games joystick as an unsophisticated tactile device. A map function allows the user to locate nodes within the diagram space by directly associating a joystick position with the corresponding point on the diagram: as the user moves the joystick, they hear the names of the nodes encountered at that location. The user can also use the joystick to explore the connections of one particular node: when the joystick is pointed in the direction of a neighbour, its name is given. In both of these functions spatialised 3D audio is used to reinforce and confirm the tactile and text output. Standard computer sound cards are now capable of complex 3D effects, so no specialised equipment is required. Further user interface functions have been developed for communicating architectural diagrams, including an attempt to use the force feedback abilities of the joystick to delineate the shape of a room (spatial rather than connectivity information), and these will be evaluated with users in the next round of evaluation.
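The "explore connections" function above amounts to mapping a joystick direction to the neighbour whose bearing best matches it. The following sketch shows one way this could work; node positions, the angular tolerance and all names are assumptions, not the actual implementation.

```python
# Hedged sketch of mapping a joystick direction to a neighbour.
# Positions, tolerance and names are illustrative assumptions.

import math

def neighbour_in_direction(node_pos, neighbours, stick_angle, tol=45):
    """Return the neighbour whose bearing from the current node is
    closest to the joystick angle (degrees), within a tolerance."""
    best, best_diff = None, tol
    for name, (x, y) in neighbours.items():
        bearing = math.degrees(math.atan2(y - node_pos[1], x - node_pos[0]))
        diff = abs((bearing - stick_angle + 180) % 360 - 180)
        if diff < best_diff:
            best, best_diff = name, diff
    return best

# Two rooms connected to the current room at the origin:
rooms = {"Kitchen": (10, 0), "Hall": (0, 10)}
print(neighbour_in_direction((0, 0), rooms, stick_angle=0))    # Kitchen
print(neighbour_in_direction((0, 0), rooms, stick_angle=90))   # Hall
```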


The user interface is designed to be screen-reader independent: it is built with standard Microsoft Windows controls (such as text boxes and buttons) and complies with Microsoft accessibility and user interface design guidelines. This allows users to use their familiar and reliable screen reader to access the diagram information, although it restricts the interface design (for example, making it impossible to vary speech location or voice to communicate different spatial information). Expert diagram users, such as blind software engineers, can be expected to be screen reader experts and quite possibly Braille users: it is sensible to allow them to utilise their strong screen reader skills rather than presenting them with a dedicated but perhaps less useful self-voicing speech interface.


To support navigation through UML diagrams, a generic tactile overlay and touch tablet will be used to provide connection information between nodes in a similar way to that suggested by Blenkhorn and Evans (1998).

5. User Evaluation of the First Prototype


This section presents the results of the evaluation of the first TeDUB prototype software. The evaluation focused on the different user interfaces and how participants navigated through the diagrams. The test group consisted of eleven visually impaired participants. Each participant familiarised themselves with the software remotely. This was followed by a face-to-face interview using a semi-structured question schedule. Three had previous knowledge of the domain being evaluated (electronic circuits) and eight had no previous knowledge of this field.


During this evaluation, participants were asked to explore diagrams, to search for information and to calculate the output of a diagram given the input (as in the domain of digital diagrams). Following these tasks, the interviewer administered the question schedule, examining the participants' experience of the system and looking in detail at the specific components: the different interfaces and the ease with which participants could solve the initial tasks and find and understand the appropriate information.

5.1 Comparison of Interfaces


The first part of the study concerned the use of the different interfaces, namely computer keyboard, screen reader, 2D sound, 3D sound and joystick.


In general, users of the system experienced no difficulties in working with the keyboard, the screen reader and the joystick. It was important that, as far as possible, the keyboard functions followed the keyboard commands with which users were already familiar. There were no problems encountered in the use of screen readers and the system performed well in this respect. The use of 2D sounds like warning and error signals was highly appreciated. The support of 3D sounds for the spatialisation of the diagrams seemed less effective. Where users could read the same information without supporting 3D sounds, they preferred to do so. In addition, the 3D sound system is not very easy to use in office or educational environments. It was found that participants had to get used to the joystick, but after some practice it was felt to be quite a natural way to explore the diagrams.


By using a combination of these interfaces, all the participants were able to build up a coherent representation of the diagrams. Participants demonstrated an ability to find the relevant information in the most efficient manner and indicated that any kind of redundant information should be skipped. One new functionality was recommended: that the system should give more feedback about where the user is located within the system (i.e., current item and current level). In this respect, a function that warns if all the components on a level have been visited could help to reduce any uncertainty on the part of the user.

5.2 Navigation through Hierarchical Information


For the second part of the evaluation, we investigated whether or not users were able to understand the way information is structured in the TeDUB diagrams created by the prototype software.


The key result from the evaluation concerns the manner in which participants build up their mental representation of the diagrams being examined. Existing research (see Miller 1956) reports that human short-term memory is able to process seven plus or minus two items simultaneously (although later research suggests that the number is smaller). These items can be solitary objects (such as numbers, words, or pictures) but also composite objects (i.e., related pieces of information that together form larger, meaningful objects), thereby stretching the total amount of items that can be processed. This was found to be particularly important for the way in which information is presented in the TeDUB system when dealing with more complex diagrams.



The results from those participants who performed the tasks with the more complex hierarchical diagrams confirmed the hypothesis that clustering information can extend the total amount of information that can be processed. In these diagrams, the information was clustered into a smaller number of composite items as the user navigated up to a higher level. Where it was possible to group related items into meaningful composite objects, users found it much easier to gain an overview of larger and more complex diagrams. This strategy will be taken into account for the presentation of information in the other domains being examined.

6. Conclusion


This paper provides an overview of the TeDUB project, its technical modules and the evaluation of its current state. The evaluations of the user interface are promising: the user interface features and functions are effective and well accepted by the users, and the concept of providing information at different levels of abstraction has turned out to be very useful. Also, first results of the automatic interpretation of diagrams from the investigated domains are encouraging. The current TeDUB system is a good basis for further development in the project.

Acknowledgement


The work presented here is funded by the Information Society Technologies Programme of the European Commission under the project "TeDUB: Technical Drawings Understanding for the Blind" and the contract number IST-2001-32366.

References


Abmeyr, W. (1994), "Einführung in die digitale Bildverarbeitung", B. G. Teubner, Stuttgart.

Blenkhorn, P. and G. Evans (1998), "Using speech and touch to enable blind people to access schematic diagrams", in Journal of Network and Computer Applications, vol. 21, 1998, pp. 17-29.

Brewster, S. (1998), "Using Non-speech Sounds to Provide Navigation Cues", in ACM Transactions on Computer-Human Interaction, 5(2), 1998, pp. 26-29.

Dosch, Ph., K. Tombre, C. Ah-Soon, and G. Masini (2000), "A Complete System for the Analysis of Architectural Drawings", in International Journal on Document Analysis and Recognition, 3(2), pp. 102-116, Dec. 2000.

Födisch, M., D. Crombie and G. Ioannidis (2002), "TeDUB: Providing access to technical drawings for print impaired people", in Proceedings Conference and Workshop on Assistive Technologies for Vision and Hearing Impairment: Accessibility, Mobility and Social Integration, 2002.

Gallagher, B. and W. Frasch (1998), "Tactile Acoustic Computer Interaction System (TACIS): A new type of Graphic Access for the Blind", in Technology for Inclusive Design and Equality: Improving the Quality of Life for the European Citizen, Proceedings of the 3rd TIDE Congress, 23-25 June 1998, Helsinki, Finland.

Kurze, M. (1995), "Giving Blind People Access to Graphics (Example: Business Graphics)", in Proc. Software-Ergonomie '95 Workshop "Nicht-visuelle graphische Benutzungsoberflächen", Darmstadt, 22 Feb. 1995.

Levi, J. M. and N. S. Amick (1982), "Tangible graphics: producers' views", in Tactual Perception, editors W. Schiff and E. Foulkes, Cambridge University Press, Cambridge, UK, 1982, pp. 417-429.

Miller, G. A. (1956), "The magical number seven, plus or minus two: Some limits on our capacity for processing information", in Psychological Review, 63, pp. 81-97.

Petrie, H., Ch. Schlieder, P. Blenkhorn, D. G. Evans, A. King, A.-M. O'Neill, G. Ioannidis, B. Gallagher, D. Crombie, R. Mager and M. Alafaci (2002), "TeDUB: A System for Presenting and Exploring Technical Drawings for Blind People", in 8th International Conference on Computers Helping People with Special Needs, 2002, pp. 537-539.