
Voice Activated Function Keyboard (VAFK) with a USB Interface

Final Report

Fall Semester 1999




by




Jeff Boody

Greg Milburn

Mark Jurjovec






Prepared to partially fulfill the requirements for

EE401






Department of Electrical Engineering

Colorado State University

Fort Collins, Colorado 80523







Report Approved _______________________________

Project Advisor




Abstract


The computer industry revolves around fast-changing technology that strives to improve performance, reliability and usability. This project combines several new technologies to address two major problems in the computer industry. The first is implementing an effective speech recognition system on personal computers: current speech recognition systems typically use large software programs which can consume significant system resources to get acceptable results. The second is the connection of peripheral devices to personal computers: current peripheral connection methods are difficult to configure and also difficult to design.

We are attempting to solve these problems by using new technologies from the speech recognition and communications industries. Our approach to replacing a complex software speech recognition system is to use a generic speech recognition chip (the Sensory Voice Direct chip) to recognize words. Our current design is suitable for implementing a small set of commands (~15). Our approach to solving the peripheral communication problem relies on a new communication protocol called the Universal Serial Bus (USB). We will be using a generic USB chip (Cypress CY7C63000A) which can be programmed easily using assembly code. These two new technologies allow us to design a powerful peripheral device that is simple and easy to use.

These speech recognition and USB technologies are proving to be powerful tools in the computer industry. The use of speech recognition in computers has the potential for significant productivity gains. Computers will require speech recognition systems that are speaker independent and support large vocabularies. Speech recognition chip technology is available to implement this type of system on computers. The USB is proving to be an excellent alternative for many peripheral devices. Our project shows how USB technology can be used to simplify a design and make our product easier to use.




TABLE OF CONTENTS


      Title
      Abstract
      Contents
      List of Figures
I.    Introduction
II.   Background of Speech Recognition Systems
III.  Background of the Universal Serial Bus (USB)
IV.   Speech Recognition Module
V.    Voice Direct/USB Circuit Interface
VI.   USB Circuit Results
VII.  Conclusion
VIII. References

List of Figures and Tables


Figure 1    VAFK Commands
Figure 2    USB Topology
Figure 3    Speech Recognition Pin Diagram
Figure 4    Voice Direct/USB Circuit Interface Signals
Figure 5    Block Diagram of Voice Direct/USB Circuit Interface
Figure 6    USB Scan Codes for VAFK Commands
Figure 7    State Diagram for Interrupt Signal
Figure 8    USB Chip Comparison
Figure 9    USB Firmware Flowchart

Introduction: Voice Activated Function Keyboard (VAFK) with a USB Interface


Bill Gates of Microsoft recently said that "speech is not just the future of Windows, it's the future of computers." The Voice Activated Function Keyboard is a step in that direction. The VAFK will allow users to navigate Windows applications through various predetermined voice-activated commands, as opposed to using the traditional confusing menu bars or complex keystrokes. Hence, by utilizing the power of speech recognition, the VAFK makes Windows applications quicker and easier to navigate. The Voice Activated Function Keyboard is then, in essence, a glimpse into the future, providing Windows users the convenience of speech recognition today.

The VAFK is designed to be a stand-alone peripheral that takes advantage of the voice-user interface as a way to make software more usable. The VAFK is a speaker-dependent device, meaning that it will only recognize the speaker who programmed it. In order to prevent spurious speech or background noise from generating unwanted functions, the VAFK employs a push-to-talk key. The push-to-talk key, when depressed, activates the system, which then prompts the user for a command. This type of implementation has two advantages over continuous-listening systems. The first advantage is that it ensures the device only listens to what the user intends. The second is that the VAFK does not require valuable processing power when it is not being used. Initially the VAFK will be designed to recognize only a limited number of commands. The prototype commands are listed in Figure 1 below along with their corresponding key equivalents.




Windows NT Command    Key Equivalent
Close                 Alt-F4
Task Switch           Alt-Tab
Copy                  Ctrl-c
Paste                 Ctrl-v
Select All            Ctrl-a
Start Menu            Ctrl-Esc
Security Menu         Ctrl-Alt-Esc
Help                  F1

Figure 1: VAFK Commands



The VAFK is realized in hardware with three distinct stages: speech recognition, the speech circuit to USB circuit interface, and the USB. The first stage consists of Sensory's Voice Direct speech recognition module. The Voice Direct module maps spoken commands to system control functions. The second stage then translates these control functions to their appropriate USB keyboard scan codes, which are then passed to the USB circuit. By utilizing USB technology the VAFK provides the convenience of being a plug-and-play peripheral. Most PCs, including notebooks, on the market today are fully USB-ready. The USB circuit is primarily firmware located on a USB chip (Cypress CY7C63000A) with a minimal amount of additional hardware. The USB circuit is responsible for configuring the chip as a keyboard and transmitting/receiving data. Detailed descriptions and implementations of each stage follow in subsequent chapters.

The Voice Activated Function Keyboard thus adds these capabilities to one's PC instantly. There have been other attempts to add speech recognition to PCs; however, these attempts have had limited success. Recently, Apple Computer, Inc. released their new operating system, Mac OS 9, which supports voice control. The Apple system provides many of the same features as the VAFK, including the push-to-talk key option. However, since the Mac OS 9 speech recognition features are embedded within the operating system itself, these features can only be used on Apple Computer hardware. One disadvantage of this speech recognition method is that, since it is implemented in software, it may consume significant system resources. The VAFK, on the other hand, can be installed on any PC which supports USB peripherals, including the Mac OS 9, and it will not consume system resources.



Chapter 2: Background of Speech Recognition Systems


Speech recognition is the process by which a computer maps an acoustic speech signal to text or to other signals that are understood by the computer. Some of the current uses of speech recognition include dictation machines and command input devices. Speech recognition systems vary in several ways, each of which affects their complexity.

One variation is the type of speaker the system is designed to understand. A speaker-dependent system is developed to operate for a single user. This method generally requires the user to "train" the system by repeating words, which the system then stores in a database. The advantages of this approach are that it is easier to develop, cheaper to buy, and more accurate. The disadvantage is that it is not as flexible as speaker-adaptive and speaker-independent systems. Speaker-independent systems are developed to operate for any speaker of a particular type (e.g. American English). However, this type of system is the most difficult to develop, is more expensive, and has a lower accuracy than speaker-dependent systems. A speaker-adaptive system is designed to adapt its operation to the characteristics of new speakers, and its design difficulty lies between the speaker-dependent and independent systems.

Another variation is the size of the vocabulary. Vocabulary sizes range from "small" (tens of words) to "medium" (hundreds of words) to "large" (thousands of words) and "very large" (tens of thousands of words). The size of the vocabulary can significantly affect the complexity, processing requirements and accuracy of the speech recognition system. One application of a small-vocabulary system is a toy (e.g. a robot) containing a speech recognition chip (these are available commercially from Sensory, Inc.); the robot can accept commands such as "Go," "Turn" and "Stop." A typical application of a very large vocabulary is a dictation machine which "types as you speak."

The final variation is the manner in which words are received from the user. Some systems allow continuous speech while others only allow isolated words. An isolated-word recognition system operates on a single word at a time, requiring a pause between each word. This is the simplest method of recognition to implement because it is easier to find the end points of a word, and there is less interference caused by other words spoken in the same sentence. A continuous speech recognition system is designed to operate when words are spoken together. Continuous speech is more difficult to handle for several reasons. When words are spoken together it is more difficult to determine where the first word ends and the second word begins. Another problem, called "coarticulation," also occurs when words are spoken together: the phonemes (basic sounds of spoken languages) of one word are affected by the surrounding words.

In summary, the main variations in speech recognition systems are the type of speaker they are designed for (speaker-dependent, speaker-independent and adaptive systems), the size of the vocabulary, and whether they allow continuous speech or only isolated-word speech. Speech recognition is currently a technology that is developing rapidly.


Chapter 3: Background of the Universal Serial Bus (USB)


The Universal Serial Bus (USB) technology is a relatively new communication standard for computer peripherals. The USB standard was designed to handle increasing demands for easy-to-use products which could be implemented in a flexible and cost-effective manner. The USB handles these demands through plug-and-play, a standard device framework, automatic error handling, standardized hardware and a wide variety of data transfer types.

Plug-and-play allows a peripheral device to be configured easily and automatically. When the USB host detects the attachment or detachment of a peripheral, it loads the appropriate software drivers and determines the bus protocol that the device will be using. Since these steps happen automatically, computer users no longer need to worry about selecting the right serial port, installing expansion cards, or the technical headaches of DIP switches, jumpers, software drivers, IRQ settings, DMA channels and I/O addresses. Operating systems which support the USB have many standard software drivers built in for USB devices. Since the drivers are standardized, users will not need to install software when they purchase new hardware. Another plug-and-play feature is that the USB can distribute power on the bus, so many peripheral products no longer require separate power supplies. Plug and play is an industry-wide specification that makes it easy to expand PC functionality.

The USB bus is organized in a "tiered star topology," which means that some USB devices, called USB "hubs," can serve as connection ports for other USB peripherals. Only one device needs to be plugged into the PC; other devices can then be plugged into the hub (see Figure 2 below).

Figure 2: USB Topology (figure from USB Specification)

Hubs play an integral role in expanding the world of the PC user. Since hubs can be embedded in devices such as keyboards and monitors, users don't have to worry about expanding their computers. A hub consists of two portions: the Hub Controller and the Hub Repeater. The Hub Repeater is a protocol-controlled switch between the upstream port and downstream ports. The Hub Controller provides the interface registers that allow communication to and from the host. Hub-specific status and control commands permit the host to configure a hub and to monitor and control its ports. The USB host consists of specialized hardware located on a computer motherboard and software which is usually provided by the operating system. The host is responsible for detecting the attachment and removal of USB devices, managing control flow between the host and USB devices, managing data flow between the host and USB devices, and providing power to attached USB devices. Another component of the bus topology is the cable, which connects the devices using four wires (VDD, GND, D+, D-). These wires carry power, ground, and the differential data signals. To minimize end-user termination problems, USB uses a "keyed connector" protocol. The physical difference between the Series "A" and "B" connectors ensures proper end-user connectivity. The "A" connector is designed to connect to upstream hubs, and all USB devices must have an "A" connector. The "B" connector connects to the USB device, which allows vendors to provide detachable cables. There are two types of connectors to eliminate illegal loopback connections at hubs.

The device framework of the USB can be broken down into several categories: the device "enumeration" process (configuration), request processing, and the bus protocol. The enumeration process begins as soon as you plug the USB device into a hub. The device must go through several stages before it is enabled for use. When a hub detects that a new device has been attached, it sends a reset signal to the device and requests a unique address from the host. When the reset is complete, the device is accessible by sending data to the default address. At this point the device is also powered by the USB bus, although it can only draw up to 100 mA. Next, the host assigns the device a new unique address, which frees the default address for the next USB device to be attached. In the final step, the host reads the configuration information (device descriptors) which is stored in the device's ROM.
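
As an illustration of what the host reads during enumeration, the sketch below lays out the standard 18-byte USB device descriptor as a C structure. The field names follow the USB 1.1 specification; the vendor and product IDs shown are placeholders, not values from our design.

#include <stdint.h>

/* Standard USB device descriptor (USB 1.1, 18 bytes).
   Packed so the struct layout matches the bytes sent over the bus. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  bLength;            /* size of this descriptor: 18 */
    uint8_t  bDescriptorType;    /* 0x01 = DEVICE */
    uint16_t bcdUSB;             /* spec release, e.g. 0x0110 for USB 1.1 */
    uint8_t  bDeviceClass;       /* 0x00: class defined at the interface level */
    uint8_t  bDeviceSubClass;
    uint8_t  bDeviceProtocol;
    uint8_t  bMaxPacketSize0;    /* max packet size for endpoint 0, e.g. 8 */
    uint16_t idVendor;           /* placeholder vendor ID */
    uint16_t idProduct;          /* placeholder product ID */
    uint16_t bcdDevice;          /* device release number */
    uint8_t  iManufacturer;      /* index of manufacturer string */
    uint8_t  iProduct;           /* index of product string */
    uint8_t  iSerialNumber;      /* index of serial number string */
    uint8_t  bNumConfigurations; /* number of configurations: 1 */
} usb_device_descriptor_t;
#pragma pack(pop)

/* Hypothetical descriptor for a keyboard-like device such as the VAFK. */
static const usb_device_descriptor_t vafk_device_descriptor = {
    18, 0x01, 0x0110, 0x00, 0x00, 0x00, 8,
    0x0000, 0x0000,              /* idVendor / idProduct left as placeholders */
    0x0001, 0, 0, 0, 1
};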

The USB bus has a standardized protocol which enables devices to communicate with the host. The protocol specifies how "packets" of data are transferred between the computer and the device. There are three main types of packets: Token, Data and Handshake. Token packets are used to indicate IN (device to host), OUT (host to device) and SETUP (configures the device) transactions. Each Token packet also contains address and endpoint fields. Handshake packets are used to indicate ACK (the receiver accepts an error-free data packet), NAK (the device cannot receive or transmit data) and STALL (the device has been halted or the request is not supported). Each packet also contains error-checking fields. The packet ID (PID) carries its own check field: the 4-bit PID value is transmitted together with its complement (e.g. ACK = 0010B, and the full PID byte is 11010010B). If an error is found in the packet ID, the rest of the packet is ignored and no handshake is returned. The address and endpoint portions of the packet are protected by their own CRC: these fields contain 11 bits plus 5 bits of CRC, which detects up to 2 bit errors in the address and endpoint. Finally, there is also a CRC field for the actual data being sent. The Data packet can contain from 0 to 1023 bytes, which requires a CRC field of 16 bits. The protocol also specifies the types of data transfers that may be used: Control, Interrupt, Isochronous and Bulk transactions. Control transfers are used to configure a device at attach time and can be used for other device-specific purposes, including control of other pipes on the device. Bulk data transfers are used for relatively large and bursty quantities of data and have flexible transmission constraints (e.g. saving a large file to a USB hard drive as a background process). Interrupt data transfers are used for input devices which only send a small amount of data at a relatively low frequency (e.g. keyboards). Isochronous data transfers are used for devices which require data transfers with minimal delays (also called streaming real-time transfers).
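
To make the token-packet CRC concrete, here is a small sketch of the 5-bit CRC mentioned above, assuming the parameters given in the USB specification (generator polynomial x^5 + x^2 + 1, bits fed LSB first, result complemented). It is an illustrative reference implementation, not code from our firmware.

#include <stdint.h>

/* CRC5 over the 11-bit ADDR+ENDP field of a token packet.
   Shift register seeded with all ones; polynomial x^5 + x^2 + 1;
   input bits are processed LSB first; the transmitted CRC is the
   one's complement of the residue. */
static uint8_t usb_crc5(uint16_t data, int nbits)
{
    uint8_t crc = 0x1F;
    for (int i = 0; i < nbits; i++) {
        uint8_t bit = (data >> i) & 1;        /* LSB-first bit order */
        uint8_t msb = (crc >> 4) & 1;
        crc = (uint8_t)((crc << 1) & 0x1F);
        if (bit ^ msb)
            crc ^= 0x05;                      /* taps for x^2 + 1 */
    }
    return (uint8_t)(~crc) & 0x1F;
}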

Before the USB, communication with peripheral devices was much more complicated and less reliable. There were several methods developers could use to communicate with peripheral devices, including the parallel port, serial ports and custom internal cards. One problem with the parallel port and serial ports is that only one device may use them at a time. The serial ports (COM ports) must also be "declared" as one of four COM ports, which may already have been claimed by other devices such as internal modems, the mouse and others. If the developer designed a custom internal card, they would experience some of the same problems. Additional problems with custom internal cards include the fact that the user may have already used all of their expansion slots, and more technical support is necessary because the devices are harder to install. Each of these devices must also have custom software drivers installed before the device can be used.



Chapter 4: Speech Recognition Module



The Voice Activated Function Keyboard realizes speech recognition through a technology designed by Sensory Inc., the Voice Direct Module. The Voice Direct Module is an integrated circuit designed to provide product developers with an easy-to-use device for adding speech recognition to virtually any consumer product. Developers can configure the module in the manner which best suits the needs of the product it is intended for. The pin diagram for the Voice Direct Module is shown below, and the complete associated module pin description table is available in Appendix A.

Figure 3: Speech Recognition Pin Diagram

The Voice Direct Module itself is available in a kit, which includes all of the external devices necessary to perform speech recognition. Items included in the kit are the microphone, speaker, oscillator, and external memory. Voice Direct's ease of use and relatively quick setup time made this technology ideal for the VAFK.

The Voice Direct Module can be configured in one of two operating modes: external host-controlled or pin-configurable stand-alone. The stand-alone operating mode is designed to provide a complete recognition system using only the Voice Direct Module and the items included in the associated kit previously mentioned. The VAFK employs the Voice Direct Module in the stand-alone mode. In order to operate the Voice Direct Module in this mode, one must resistively pull the MODE signal (Module Pin: JP3-13) to GND. Once in stand-alone mode, the functional capabilities of the Voice Direct Module are determined by the configuration of a number of I/O pins. Configuration settings in the design of the VAFK included the training and recognition sensitivity levels. Both the training and recognition operations have two associated levels of sensitivity, relaxed and strict. The VAFK has Voice Direct configured in the relaxed mode for both training and recognition purposes. This is accomplished by leaving the TRAIN signal (Module Pin: JP2-11) and the RECOG signal (Module Pin: JP2-10) open circuit upon powering up the module. The Voice Direct was configured in relaxed mode to make training easier and reduce the number of recognition errors.


Voice Direct is a speaker-dependent device, which must be trained by the designated user before recognition can begin. Training the Voice Direct Module is a simple, user-friendly process guided by Voice Direct's pre-programmed speech prompts. To start the training process, the TRAIN signal (Module Pin: JP2-11) must be pulled to GND for at least 100 ms. Voice Direct will then prompt the user to say a word or phrase, which must be shorter than 3.2 seconds. Voice Direct will then prompt the user to repeat the word or phrase and then calculate an average of the two speech patterns. The speech patterns Voice Direct generates are based on a digital reconstruction of the spoken voice command. During training, several errors may occur, causing Voice Direct to terminate the training process. Some common conditions that may cause errors include the speaker not being consistent or clear, too much background noise, or a similar word or phrase having already been recorded. If no errors occurred during training, the new speech template is added to the existing word set in the 8 Kbyte serial EEPROM. New words or phrases can be added to the set at any time up to a maximum of 15. Once trained, the Voice Direct Module is ready to begin speech recognition.


The speech recognition process is just as user-friendly as the training process and uses similar pre-programmed speech prompts to guide the user. Recognition begins when the RECOG signal (Module Pin: JP2-10) is pulled to GND for at least 100 ms. Voice Direct will then prompt the user to say a word or phrase. This in turn produces a new speech pattern, which Voice Direct compares with the stored templates to determine which word has been spoken. If no match is found, Voice Direct will prompt "Word not recognized" and exit recognition mode. When a trained word is successfully recognized, the associated output pins pulse high for the duration of one second. These outputs are then used by the next stage of the VAFK's design, which interfaces the Voice Direct Module with the USB circuit.



Chapter 5: Voice Direct/USB Circuit Interface


The Voice Direct/USB circuit interface enables the Voice Direct Module to communicate with the USB circuit. The interface utilizes handshaking signals between the two in order to translate the outputs of the Voice Direct Module into the appropriate USB keyboard scan codes. This was one of the major design areas of the VAFK project. Before any interface design could be started, certain specifications needed to be clarified in terms of how the signals between the two circuits would relate to one another. The following block diagram depicts the signals required for communication between the interface and the USB circuit.



[Block diagram: Interface <-> USB Circuit]
    Key Bus   P0.0:7
    INT       P1.0
    Write     P1.1
    ACK       P1.2

Figure 4: Voice Direct/USB Circuit Interface Signals

The VAFK will be designed with a system of asynchronous signals which transfer the function code generated by the Voice Direct Module to the USB circuit. The signal protocol established for the interface is as follows (a C sketch of this handshake appears after the list):

A. IDLE STATE: INT = Low; Write = Low; ACK = Low;
B. "Talk Key Pressed," generate Interrupt: INT = High; Write = Low; ACK = Low;
C. When data is ready, assert Write: INT = High; Write = High; ACK = Low;
D. Hold Write High until ACK goes High: INT = High; Write = High; ACK = High;
E. De-assert Write and prepare next key: INT = High; Write = Low; ACK = High;
F. Wait until ACK goes Low: INT = High; Write = Low; ACK = Low;
G. Repeat steps C to F for each key's scan code;
H. Return to IDLE STATE: INT = Low; Write = Low; ACK = Low;
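
As a minimal sketch of steps A through H seen from the USB-circuit side, assuming hypothetical helper functions for reading and driving the P0/P1 pins (the names below are illustrative, not taken from the Cypress firmware):

#include <stdint.h>

extern uint8_t read_int_pin(void);          /* P1.0 */
extern uint8_t read_write_pin(void);        /* P1.1 */
extern void    set_ack_pin(uint8_t level);  /* P1.2 */
extern uint8_t read_key_bus(void);          /* P0.0:7 */
extern void    send_byte_to_host(uint8_t key);

/* Receive one command (modifier byte then key byte) using the
   Write/ACK handshake described in steps C through F above. */
void receive_command(void)
{
    while (!read_int_pin())
        ;                              /* step B: wait for the interrupt */
    while (read_int_pin()) {           /* loop until step H drops INT */
        while (!read_write_pin()) {    /* step C: wait for Write to go high */
            if (!read_int_pin())
                return;                /* interface went idle: command done */
        }
        uint8_t key = read_key_bus();  /* byte is valid while Write is high */
        set_ack_pin(1);                /* step D: acknowledge the byte */
        send_byte_to_host(key);
        while (read_write_pin())       /* step E: interface de-asserts Write */
            ;
        set_ack_pin(0);                /* step F: complete the handshake */
    }
}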


Once the specifications for the signal protocol were established, we were able to begin designing the Voice Direct/USB circuit interface. The design of the interface could have been implemented in numerous ways; however, only two main options were seriously considered. The first option entailed designing the entire interface circuit as a finite state machine using logic gates and flip-flops. While this implementation seemed possible, we determined that it would be far too complicated and would provide little flexibility for future changes. In addition, the immense complexity associated with this design method would no doubt lead to numerous errors and bugs. For these reasons we chose to implement the interface mostly in microcode, which in turn simplifies the design significantly while offering greater flexibility. With the microcode implementation, most of the "handshaking" signals and the data that needs to be transferred from one circuit to another can be encoded into ROMs. Additionally, by using microcode, the words and functions that have been defined for this project can be easily changed or adapted to meet future needs.


The interface itself consists of three vital parts. The first part decodes the output of the Voice Direct Module and determines which function should be generated. The second part of the interface provides the appropriate keyboard scan codes to the USB controller. The USB controller will then use these scan codes to perform the desired Windows NT function on the host computer as if they came directly from a traditional keyboard. The final part of the interface deals with asserting and de-asserting the interrupt signal to the USB circuit in accordance with the aforementioned protocol. The block diagram of the Voice Direct/USB circuit interface is depicted below.

Figure 5: Block Diagram of Voice Direct/USB Circuit Interface

The Voice Direct Module has eight output pins. These outputs need to be decoded in order to determine which function must be generated. The data transferred to the USB controller depends upon which instruction we are trying to execute. Our interface will be designed to implement the instructions explained earlier. When an instruction is recognized by the Voice Direct Module, a certain pattern of bits is asserted on its output pins. Because the output for each recognized word is unique, the outputs of the speech recognition module can be used to directly address an 8-bit ROM. Each unique location in the 8-bit ROM then contains the address of that instruction's scan codes in the 16-bit ROM. Hence, the 8-bit ROM performs the necessary decoding of the Voice Direct Module outputs.

A second, 16-bit ROM and a counter are needed to satisfy a specification of the USB controller: the USB controller can only accept one byte of data at a time. The instructions to be implemented are more complex than simple keyboard letters and numbers. For example, the instruction "Copy" involves two bytes of data. The first byte of data sent to the USB controller is called a modifier byte; it tells the controller that function keys such as the Control or Alt keys have been pressed. The second byte of data tells the controller what other keys have been pressed along with the function keys. In the case of the "Copy" instruction, the first byte of data sent to the USB controller will be the modifier byte corresponding to the Control key being pressed, and the second byte will be the keyboard scan code for the "C" key being pressed. These two bytes of data need to be sent to the USB controller sequentially according to the interface handshaking signals. The following table lists the modifier bytes and keyboard codes for each of the commands the VAFK will implement.

Windows NT Command    Key Equivalent    Modifier Byte    Key Byte
Close                 Alt-F4            00000100         00111101
Task Switch           Alt-Tab           00000100         00101011
Copy                  Ctrl-c            00000001         00000110
Paste                 Ctrl-v            00000001         00011001
Select All            Ctrl-a            00000001         00000100
Start Menu            Ctrl-Esc          00000001         00101001
Security Menu         Ctrl-Alt-Esc      00000101         00101001
Help                  F1                00000000         00111010

Figure 6: USB Scan Codes for VAFK commands
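
As an illustrative companion to Figure 6, the same mapping can be written as a lookup table in C. This is a sketch of the data only (the VAFK stores these bytes in ROM rather than in firmware), and the struct and array names are ours, not part of the design.

#include <stdint.h>

/* One VAFK command: HID modifier byte plus HID key usage code. */
typedef struct {
    const char *command;   /* spoken Windows NT command */
    uint8_t     modifier;  /* e.g. 0x01 = Left Ctrl, 0x04 = Left Alt */
    uint8_t     key;       /* HID usage ID of the non-modifier key */
} vafk_command_t;

static const vafk_command_t vafk_commands[] = {
    { "Close",         0x04, 0x3D },  /* Alt-F4       */
    { "Task Switch",   0x04, 0x2B },  /* Alt-Tab      */
    { "Copy",          0x01, 0x06 },  /* Ctrl-c       */
    { "Paste",         0x01, 0x19 },  /* Ctrl-v       */
    { "Select All",    0x01, 0x04 },  /* Ctrl-a       */
    { "Start Menu",    0x01, 0x29 },  /* Ctrl-Esc     */
    { "Security Menu", 0x05, 0x29 },  /* Ctrl-Alt-Esc */
    { "Help",          0x00, 0x3A },  /* F1           */
};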

The modifier byte and scan code for each instruction will be stored in two consecutive ROM addresses in the 16-bit ROM. The location of the modifier byte for each instruction is stored in the 8-bit ROM. When the speech recognition circuit recognizes that "Copy" was said, it asserts the appropriate outputs. These outputs are fed into the 8-bit ROM, and the address of the modifier byte of the "Copy" command appears on the ROM outputs. The ROM outputs are then loaded into a counter. When the counter is loaded and enabled, its outputs assert the address in the 16-bit ROM that corresponds to the "Copy" modifier byte. The "Copy" modifier byte is now on the outputs of the 16-bit ROM, which are input into the USB controller. The data is now set up, and a write signal needs to be sent to the USB controller. This signal will be bit 15 of the 16-bit ROM's outputs, meaning that the write signal is sent to the USB controller at the same time as the modifier byte. When the USB controller accepts the modifier byte, it sends back the Acknowledge signal. This signal stays high until the write signal is de-asserted. The Acknowledge signal is used to disable the outputs of the 16-bit ROM, which automatically de-asserts the write signal. The USB controller will then be waiting for the next byte of data. The Acknowledge signal is also fed into the clock input of the counter, so when Acknowledge is asserted it causes the counter to increment. When the counter increments, the second byte of data appears on the outputs of the 16-bit ROM. The USB controller will now have both bytes of data needed to complete the "Copy" command. At this point, the Interrupt to the USB controller can be de-asserted to tell the USB controller that no further data is coming.
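
For clarity, the ROM-and-counter sequencing can be modeled in software. The sketch below walks the same steps the hardware performs, under our own assumed data layout (bit 15 of each 16-bit ROM word used as the write flag, as described above); it is a behavioral model, not part of the VAFK firmware.

#include <stdint.h>
#include <stdio.h>

/* 8-bit ROM: Voice Direct output pattern -> start address in the 16-bit ROM. */
static uint8_t  decode_rom[256];

/* 16-bit ROM: consecutive words hold {write flag (bit 15) | data byte}
   for the modifier byte and then the key byte of each command. */
static uint16_t scan_rom[256];

/* Behavioral model of one recognition: look up the two bytes for the
   recognized word and "send" them, as the counter-driven hardware would. */
void emit_command(uint8_t voice_direct_outputs)
{
    uint8_t counter = decode_rom[voice_direct_outputs];  /* load the counter */
    for (int byte = 0; byte < 2; byte++) {                /* modifier, then key */
        uint16_t word = scan_rom[counter];
        uint8_t  data = (uint8_t)(word & 0xFF);           /* byte on the Key Bus */
        if (word & 0x8000)                                /* bit 15: Write signal */
            printf("write byte 0x%02X to USB controller\n", data);
        counter++;                                        /* ACK clocks the counter */
    }
}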


The Interrupt signal is the only signal that cannot be easily implemented using microcode. This signal will be generated by a simple sequence detector, which is a simple finite state machine. It has two inputs, the Recognize signal and the Acknowledge signal. The state diagram for this state machine is as follows.

[State diagram; inputs/outputs labeled Rec, Ack / Int]

Figure 7: State Diagram for Interrupt Signal

The operation of this state machine is simple. Pushing the Recognize button triggers the interrupt to the USB controller. (The Recognize button is the button that is pressed to activate the speech recognition circuit.) The state machine then waits for the Acknowledge signal to be asserted twice. When it has been asserted twice, two bytes of data have been sent to the USB controller. The state machine turns the interrupt off at this point, and the USB controller performs the command that was spoken into the speech recognition circuit.
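
A compact way to express this sequence detector is as a three-state machine in C. This sketch is our own rendering of the behavior described above (raise INT on Recognize, drop it after two Acknowledge pulses); the enum and function names are illustrative.

#include <stdbool.h>

/* States: idle, waiting for the first ACK, waiting for the second ACK. */
typedef enum { IDLE, WAIT_ACK1, WAIT_ACK2 } int_state_t;

static int_state_t state = IDLE;

/* Called on each rising edge of Recognize or Acknowledge.
   Returns the level to drive on the INT line. */
bool interrupt_fsm(bool recognize_edge, bool ack_edge)
{
    switch (state) {
    case IDLE:
        if (recognize_edge) state = WAIT_ACK1;  /* talk key pressed: raise INT */
        break;
    case WAIT_ACK1:
        if (ack_edge) state = WAIT_ACK2;        /* modifier byte accepted */
        break;
    case WAIT_ACK2:
        if (ack_edge) state = IDLE;             /* key byte accepted: drop INT */
        break;
    }
    return state != IDLE;                       /* INT is high outside IDLE */
}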


Chapter 6: USB Circuit Results


The design process for the USB circuit has been primarily high-level design with extensive research. Research has been done on various USB chips, the USB specification and hardware issues. The high-level design includes the USB protocol and firmware. For a more complete description of the USB, see Appendix A: Background of the USB.

During the course of this project, we have considered using several USB chips, including the Intel 8x931AA, the Cypress CY7C63000A and others. The purpose of a USB chip is to simplify the design process. These chips are actually microcontrollers with a "USB Engine" built on top. The "USB Engine" is designed to abstract away the details of how the communication protocol works; for example, it allows you to send data in parallel to the "USB Engine," which converts it into the serial packets sent over the bus. Each of the above chips has advantages and disadvantages. Here is a quick overview of their capabilities.

                       Intel 8x931                      Cypress CY7C63000A
Built-in Hub           Optional                         No
# of external ports    4 x 8 bits                       1 x 8 bits, 1 x 4 bits
# of instructions      51                               35
External ROM           Yes                              No
Internal ROM           Optional (Factory Programmed)    Yes (User Programmed)
Amount of ROM/RAM      64K Bytes / 256 Bytes            2K Bytes / 128 Bytes
# of pins              64                               20

Figure 8: USB Chip Comparison


The Intel 8x931 chip is a more advanced chip with more features. The extra ROM and RAM allow designers to create more complex devices. The extra instructions can make it easier to program the firmware and allow for more complex designs. The extra external ports are also useful in creating more complex designs; however, two of the ports are required if external ROM is used. The other ports can be programmed to function as a "keyboard scan utility," external interrupts, timers and a serial I/O port. Each of these utilities is defined and documented by Intel so that other developers can easily integrate these features into their products. The built-in hub allows designers to add up to 4 additional downstream ports to their device. This could be used on a computer monitor to provide a central point to which additional USB devices could easily be attached. The hub is essentially a "repeater" which sends the commands from the host to the downstream devices. The Cypress CY7C63000A chip, on the other hand, is designed for less complex designs. Less complex designs will not require a hub, will not need as many external ports, won't need a complex instruction set, and won't need as much memory. The Cypress chip is ideal for such peripheral devices as keyboards, mice, joysticks and others. The Cypress chip has another advantage which makes it ideal for peripheral devices: since the ROM is all internal, there is less external circuitry, which reduces the cost and the complexity. The final difference between these two chips is in the starter kits which are provided. The Intel kit is about $200 while the Cypress kit is only $100. The Intel kit and the Cypress kit both come with all necessary circuitry attached (USB port, memory, clock oscillator, …) and both come with the software necessary for compiling the assembly code. (Note: the Cypress kit also contains two backup USB chips.) The Cypress kit additionally comes with some very useful development tools. It includes a complete sample application (a thermometer) which is very useful in understanding some of the implementation details. Finally, the kit also contains an EPROM writer with software that can write the firmware programs into the chip. We have decided to use the Cypress kit because it contains a sample application, it is easier to program, and less external hardware is required for our application.

About 90% of the USB design will be writing firmware routines to control the USB chip. The firmware routines are written in assembly code and compiled into machine code by an assembler. This machine code can be downloaded directly to the chip using the EPROM writer. The assembly code can be broken down into two major functions: the USB protocol and the keyboard interface. A flowchart for the firmware routines is given below.

[Flowchart: Enter Main Routine from Device Reset -> Initialize Chip (Registers, Ports, …) -> Perform Enumeration by Host Computer -> Poll for Interrupt -> Read Keyboard Data -> Send Data to Host]

Figure 9: USB Firmware Flowchart

The code for the USB protocol will be very similar to the sample application provided in the kit. The USB protocol code is responsible for handling standard device requests, performing enumeration (configuring the chip as a keyboard) and initializing the device.
The USB protocol function is called when the host sends a command to the device, causing a function interrupt. The enumeration process of the USB chip has some additional constraints placed on it by the USB specification. Until the enumeration process is completed, the device may only draw up to 100 mA. The host reads the configuration descriptor, which tells it the maximum power required by the device; when the enumeration process is complete, the device may consume up to this maximum power. The device must also respond to the default address (0) until a new unique address is assigned by the host. There are also some keyboard-specific functions which are responsible for initializing the speech circuit, determining when an interrupt occurs, reading which key or keys are being pressed, and sending the keys to the computer. In order to initialize the speech circuit, the firmware just has to enable power to the speech circuit. Power to the speech circuit is initially disabled because an unconfigured USB device may only draw up to 100 mA according to the USB specification; after the device has been enumerated, it may draw the amount of power negotiated in the enumeration phase. In order to determine when an interrupt has occurred, the "main" function loops continuously, polling the interrupt pin to determine when it is active. When the interrupt is detected, a function is called to read in the data. After all the data is read in, another function transmits the data to the host.
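
The flowchart and the paragraph above can be summarized as a main loop. The sketch below is written in C for readability even though the actual firmware is Cypress assembly; all of the routine names are placeholders for the functions described in the text.

/* Simplified model of the VAFK firmware main loop (the real code is
   CY7C63000A assembly; these routine names are placeholders). */
extern void initialize_chip(void);        /* set up registers and ports */
extern void wait_for_enumeration(void);   /* answer host setup requests */
extern void enable_speech_circuit(void);  /* allowed once power is negotiated */
extern int  interrupt_pin_active(void);   /* poll INT from the interface */
extern void read_keyboard_data(void);     /* read modifier and key bytes */
extern void send_data_to_host(void);      /* interrupt-IN report to the host */

void main_routine(void)                   /* entered from device reset */
{
    initialize_chip();
    wait_for_enumeration();               /* enumeration is driven by the host */
    enable_speech_circuit();
    for (;;) {                            /* poll for interrupts forever */
        if (interrupt_pin_active()) {
            read_keyboard_data();
            send_data_to_host();
        }
    }
}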


The Universal Serial Bus technology has been very useful in the design of the
VAFK for several reasons. The USB protocol simplified the transmission of data from
the keyboard to the computer, it simplified the hardware design and i
t was more flexible
then traditional methods.



CONCLUSION


Speech recognition is no longer a technology restricted to science fiction. Today, most major corporations recognize the tremendous potential of such technology and are beginning to incorporate it into everyday products. Kim Silverman, manager of spoken language technologies at Apple, said, "Apple's design philosophy is that people should just be able to take the machine out of the box and command it by voice." The goal of speech recognition technology is to mimic the fundamental human ability of listening, thus freeing the hands and eyes and requiring less training to use a computer.

With this in mind, the VAFK has been designed to take advantage of new cutting-edge technologies such as speech recognition and the USB protocol to create a revolutionary product. The VAFK takes advantage of a high-performance speech recognition chip to simplify the design process and to reduce the cost of development. The chip provides high-accuracy (99%) recognition in a self-contained module with a user-friendly interface. The Voice Direct Module was ideal to interface with the USB bus due to its simple output format. This was critical for our design, which was based on providing users with plug-n-play, something easily realized by the USB protocol. The plug-n-play concept is to make computers as easy to use as possible. The VAFK is an alternative computer interface which makes computers much easier to use through speech recognition and the USB.

Next semester, we plan to take the VAFK from the design stage to the hardware implementation stage. The hardware implementation stage includes programming ROMs, hardwiring circuits, and constructing a case. We also plan on replacing the independent power source of the speech recognition circuit with the USB's built-in power supply. We will look into enhancing the VAFK by using the Voice Direct's external host mode. The external host mode will allow us to increase the number of functions recognized and to add multiple users. We may also use a more advanced speech recognition chip, such as one with continuous recognition, speaker-independent recognition, speaker verification and speech synthesis.



References


[1] Cypress Semiconductor Corporation. "CY7C63000A Universal Serial Bus Microcontroller," San Jose: Cypress Semiconductor Corporation, 1998.

[2] Intel Corporation. "8x931AA, 8x931HA Universal Serial Bus Peripheral Controller User's Manual," Mt. Prospect: Intel Corporation, 1997.

[3] Sensory Inc. "Voice Direct Data Book, Interactive Speech," Sensory Inc., 1998.

[4] Sensory Inc. "Voice Direct Speech Recognition Kit," Sensory Inc., 1999.

[5] "Universal Serial Bus Specification, Revision 1.1," Compaq Computer Corporation, Intel Corporation, Microsoft Corporation, NEC Corporation, 1998.

[6] "USB Device Class Definition for Human Interface Devices (HID), Version 1.1," USB Implementers' Forum, 1999.

[7] "USB HID Usage Tables, Version 1.1," USB Implementers' Forum, 1999.

[8] William S. Meisel, "Voice Control on PC's," Speech Recognition Update, vol. 76, pp. 4-5, Oct. 1999.