Smart Cameras: A Review

Yu Shi, Serge Lichman
Interfaces, Machines And Graphical ENvironments (IMAGEN)
National Information and Communications Technology Australia (NICTA)
Australian Technology Park, Bay 15 Locomotive Workshop
Eveleigh, NSW 1430, Australia
*Corresponding Author. Tel.: +61 2 8374 5565; Fax: +61 2 8374 5527.
E-mail Addresses:,

Smart cameras are cameras that can perform tasks far beyond simply taking photos and recording videos. Thanks to
the purposely built-in intelligent image processing and pattern recognition algorithms, smart cameras can detect
motion, measure objects, read vehicle number plates, and even recognize human behaviors. They are essential
components for building active and automated control systems for many applications, and they will play a significant role in
our daily life in the near future. This paper aims to provide a first comprehensive review of smart camera technologies
and applications. We analyze the reasons behind the recent rapid growth of smart cameras, discuss their different
categories and review their system architectures. We also examine their intelligent algorithms, features and
applications. Finally we conclude with a discussion on design issues, challenges and future technological directions.
Keywords: smart cameras, pattern recognition, machine vision, computer vision, video surveillance, embedded
1 Introduction
What is a smart camera? Different researchers and camera manufacturers offer different definitions.
There does not seem to be a well-established and agreed-upon definition in either the video surveillance
or machine vision industries, probably the two most active and advanced applications for smart cameras
at present. For the purpose of this paper, we define a smart camera as a vision system in which the
primary function is to produce a high-level understanding of the imaged scene and generate application-
specific data to be used in an autonomous and intelligent system. The idea of smart cameras is to convert
data to knowledge by processing information where it becomes available, and transmit only results that
are at a higher level of abstraction. A smart camera is ‘smart’ because it performs application specific
information processing (ASIP), the goal of which is usually not to provide better quality images for
human viewing but to understand and describe what is happening in the images for the purpose of better
decision-making in an automated control system. For example, a motion-triggered surveillance camera
captures video of a scene, detects motion in the region of interest, and raises an alarm when the detected
motion satisfies certain criteria. In this case, the ASIP is motion detection and alarm generation.
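The motion-triggered camera example can be sketched in a few lines. The following is a minimal illustration only, assuming simple frame differencing on grayscale frames; the function names and both thresholds are hypothetical choices, not taken from any particular camera.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities


def motion_fraction(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose intensity changed by more than pixel_threshold."""
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def alarm(prev: Frame, curr: Frame, area_threshold: float = 0.05) -> bool:
    """Raise an alarm when enough of the region of interest has changed."""
    return motion_fraction(prev, curr) > area_threshold


# Toy 4x4 region of interest: a bright 2x2 object appears in the second frame.
background = [[10] * 4 for _ in range(4)]
with_object = [row[:] for row in background]
for y in (1, 2):
    for x in (1, 2):
        with_object[y][x] = 200

print(alarm(background, background))   # False
print(alarm(background, with_object))  # True
```

Only the Boolean result needs to leave the camera; the frames themselves can stay on the device.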
The important differences between a smart camera and “normal” cameras, such as consumer digital
cameras and camcorders, lie in two aspects. The first is in camera system architecture. A smart camera
usually has a special image processing unit containing one or more high performance microprocessors to
run intelligent ASIP algorithms, whose primary objective is not to improve image quality but to
extract information and knowledge from images. The image processing hardware in normal cameras is
usually simpler and less powerful with the main aim being to achieve good visual image quality. The
other main difference is in the primary camera output. A smart camera outputs either the features
extracted from the captured images or a high-level description of the scene, which is fed into an
automated control system, while for normal cameras the primary output is the processed version of the
captured images for human consumption. For this reason, normal video cameras have large output
bandwidth requirements (in direct proportion to the resolution of the image sensor used), while a smart
camera can have very low output bandwidth requirements (it can be just one bit in the simplest
case, with ‘1’ meaning ‘there is motion’ and ‘0’ meaning ‘there is no motion’, for example). These
differences are illustrated in figure 1.
[Figure 1 diagram labels: (a) image/video output generation, with video sent to a TV display or digital display for human consumption; (b) application-specific data generation, with metadata sent to an automated control system for decision making.]

Figure 1: Differences between a normal camera (a) and a smart camera (b).
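To put the bandwidth contrast in numbers, the following back-of-the-envelope sketch compares an uncompressed video stream with a one-bit-per-frame motion flag. The sensor format (640x480, 8 bits per pixel, 25 frames per second) is an assumed example and is not a figure from this paper.

```python
def raw_video_bits_per_second(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Output bandwidth of a normal camera streaming uncompressed frames."""
    return width * height * bits_per_pixel * fps


def result_bits_per_second(fps: int, bits_per_result: int = 1) -> int:
    """Output bandwidth of a smart camera sending one small result per frame."""
    return bits_per_result * fps


# Assumed example format: VGA, 8-bit grayscale, 25 frames per second.
raw = raw_video_bits_per_second(640, 480, 8, 25)
smart = result_bits_per_second(25)

print(raw)           # 61440000 bits/s
print(smart)         # 25 bits/s
print(raw // smart)  # 2457600
```

Under these assumptions the raw stream needs roughly 61 Mbit/s, while the motion flag needs 25 bit/s.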
Smart cameras can turn up where one would not expect to find a camera. A good example is the ubiquitous optical
mouse for PCs. Most optical mice contain a miniature digital video camera inside the mouse casing. They
work by shining a bright light onto the surface below, then using the camera to take up to 1 500 pictures a
second of that surface. An intelligent image processing circuit inside the mouse performs image
enhancement and calculates the mouse motion based on image difference between successive frames.
This difference is then used to displace the mouse cursor on the screen. The optical mouse is a good
example of a smart camera in three respects: firstly it’s a stand-alone camera with camera and processing
in a single embedded device; secondly the camera is used not to take pictures or video for human
consumption, but to produce a feature vector (a motion vector in the x and y directions) to represent the
object (the mouse in this case) displacement; thirdly it shows that smart cameras are not restricted to a
niche market, but can be adopted ubiquitously.
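The idea of deriving a motion vector from successive frames can be illustrated with a brute-force search over small shifts. This is a sketch under stated assumptions: the matching criterion (mean absolute difference), the search range and the synthetic surface texture are all illustrative choices, and real mouse sensors may use a different method.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale surface image as rows of intensities


def shift_frame(frame: Frame, dx: int, dy: int, fill: int = 0) -> Frame:
    """Return a copy of frame with its content moved by (dx, dy) pixels."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = frame[src_y][src_x]
    return out


def estimate_shift(prev: Frame, curr: Frame, max_shift: int = 2) -> Tuple[int, int]:
    """Find the (dx, dy) that best maps prev onto curr.

    Every candidate shift is scored by the mean absolute difference
    over the region where the two frames overlap; the lowest score wins.
    """
    h, w = len(prev), len(prev[0])
    best_score, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(curr[y + dy][x + dx] - prev[y][x])
                    count += 1
            score = total / count
            if best_score is None or score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift


# Synthetic textured surface, then the same surface moved 1 px right, 2 px down.
surface = [[3 * x + 5 * y + x * y for x in range(8)] for y in range(8)]
moved = shift_frame(surface, dx=1, dy=2)
print(estimate_shift(surface, moved))  # (1, 2)
```

The two integers returned are the entire output of this "camera". How they are mapped to cursor motion, including the sign convention, is left open here.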
Strictly speaking, a smart camera is a stand-alone, self-contained device that integrates image sensing,
ASIP and communications in a single box. It is designed for a specific type of application (for example,
surveillance or industrial machine vision). However, there are other types of vision systems that are
often referred to as smart cameras as well, such as PC-based smart cameras. We’ll analyze these different
types of smart cameras in section 3. The term ‘smart camera’ in this paper covers both stand-alone smart
cameras and other types of smart cameras, as described in section 3.1, unless specified otherwise.
The advent of smart cameras can be traced back to the early 1990s when PCs became popular and video
frame grabbers became available. Early solid state CCD (Charge-Coupled Device) cameras of the mid-
1970s were analog cameras. Later digital signal processing (DSP) technologies pushed analog CCD
cameras into the digital era with enhanced image quality, but the output of most of these cameras was still
analog (e.g. NTSC/PAL signals). Frame grabbers allowed CCD cameras with analog output to be
connected to computers and digitized for versatile processing by computers. This marked the beginning of
smart camera systems, with the camera performing image capture and the computer carrying out intelligent
processing tasks such as motion detection and shape recognition. The first applications were in the area of
industrial machine vision and surveillance.
Real interest in, and growth of, smart cameras started in the late 1990s and early 2000s, spurred by
factors such as technological advancements in chip manufacturing, embedded system design, the coming-
of-age of CMOS (Complementary Metal Oxide Semiconductor) image sensors and so on. Market
demands from surveillance and machine vision also played significant roles. Advanced smart camera
systems often integrate the latest technologies in image sensors, optics, imaging systems, embedded
systems, computer vision, video analysis and communication, and networking.
The heart of smart cameras is the intelligent ASIP algorithms and the hardware that runs them. Image
feature extraction and pattern recognition are probably among the most widely used algorithms in smart
cameras. In a way, a smart camera can be thought of as an image feature extractor or a visual pattern
recognizer. Research in computer vision, image understanding and pattern recognition has yielded many
algorithms and solutions that can be used by smart cameras. However, the performance and robustness of
the ASIP algorithms when deployed into cameras operating under real-world conditions are among the
most important issues facing the development and commercialization of new smart cameras.
In the remainder of this paper, we analyze the main reasons behind the rapid growth of smart cameras
(section 2), review system architectures of different smart cameras (section 3), review the state-of-the-art
smart camera systems and ASIP algorithms for some applications (section 4), and finally discuss some
design issues and conclude with some thoughts about technical challenges and future technological
directions (section 5).
2 The Rapid Growth of Smart Cameras
2.1 Coming of Age of CMOS Image Sensors
The advent of CMOS image sensors (CIS) in the late 1990s played an important role in the development of
smart camera technology and systems, and has the potential to make smart cameras smaller, cheaper and more
pervasive. Compared to CCDs, CIS have several advantages which make them excellent candidates for
smart camera front-ends. These include smaller size, lower manufacturing cost, lower power
consumption, the ability to build a camera-on-a-chip, the ability to integrate intelligent processing circuits
onto the sensor chip, and significantly simplified camera system design.
Most CIS are manufactured using the same process by which semiconductor chips (CPUs, memories,
etc.) are made. This means that many semiconductor manufacturers can make CIS, which drives up
competition and reduces cost. CCD sensors, by contrast, are made using a special chip manufacturing
process and there are only a few manufacturers in the world, mostly in Japan. CCD-based camera chip-
sets usually include at least three or four chips: a CCD pixel array, CDS (Correlated Double Sampling), a
timing generator, and ADC (Analog-to-Digital Converter). In the case of CIS, all these functions can be
integrated onto a single chip, making it a true camera-on-a-chip with light in and pixels out. This greatly
simplifies camera system design and reduces cost. Compared with the CCD chip-set, there are many more
sources from which a CIS can be purchased, even a single item at a time, which is very difficult to
achieve in the case of CCD. All this makes it much easier for more researchers, students, and camera
manufacturers alike to develop smart cameras of their own.
Probably the most important advantage of CIS over CCD lies in the ability to place the image sensor array
and intelligent image processing circuits side by side on the same chip. This makes a single-chip smart
camera possible. One example is a vision-based single-chip fingerprint reader with an on-chip CIS,
processing circuitry performing pattern matching and a memory storing templates of one or several user
fingerprints for real-time comparison and identification [1].
A recent market survey by Gartner Dataquest [2] estimated that there are about 40 suppliers of CIS
world-wide, and that the global CIS market would increase from $3.2 billion in 2005 to $5.6 billion by
2008. The survey showed that automobile, medical imaging and surveillance applications are among the
emerging markets for CIS products.
2.2 Research in Computer Vision and Pattern Recognition
What makes a camera smart is the intelligent ASIP, the application-specific information processing
built into the camera system. The advancement of academic and industrial research in real-time image
processing and understanding, pattern recognition, machine learning, computer vision and video
communication continues to provide a large library of intelligent algorithms for use by smart cameras for
different applications. As an example, Intel’s OpenCV (Open Source Computer Vision) Library [3] has
been very popular with academic researchers and students working on smart camera projects. Every year,
numerous international journals, conferences and workshops give researchers world-wide forums to
present their innovative work in areas such as computer vision and pattern recognition. A lot of the work
presented can be seen as embryos of future smart cameras. Recently, the first international conferences
and workshops focusing on the design of embedded vision systems have been held.
2.3 Embedded System Technologies
A stand-alone smart camera is essentially an embedded vision system. Compared with PC-based
systems, an embedded system is usually subject to many constraints on the design, implementation and
production of the device which encapsulates it, such as low power, limited resources, real-time processing
and low cost. An embedded vision system is even more challenging to design due to video processing’s
insatiable demand for computing power and memory resources. In the last decade, embedded vision
systems have made great progress thanks to the increasing affordability of powerful processors and
memory chips, availability of real-time operating systems, low complexity intelligent algorithms and the
coming-of-age of system development software and tools.
Functional integration seems to be a trend in consumer electronics and ICT (Information and
Communications Technology). For example, many cellular phones now come with a camera and can play
music and receive radio. Some webcams have built-in intelligence such as face tracking. Functional
integration can seemingly make a normal camera become smart. For example, a camera with an
integrated voice/sound detection component can take a picture of the surrounding area when a human
voice is detected, or it can take a picture in a direction from which a gun-shot has been detected [4].
2.4 Socio-Economic Drivers
Thanks to Moore’s law, semiconductor chips and computer hardware continue to shrink in size, reduce
in cost and gain in performance. This has driven the prices of cameras, frame grabbers and computers
down and made smart camera systems, especially PC-based systems, more affordable to research and
development on one hand and to the market and end-users on the other. As hardware constraints (cost-
wise) are lifted, software developers have more freedom to write "smarter" algorithms.
One of the most significant developments in surveillance and security industries in the last several years
has been the wide use of CCTV (Closed Circuit Television) cameras and their impact on crime, terrorist
attacks, and on the general public. It is noticeable that after the 9/11 event in the US, video surveillance
has received more attention not only from the academic community, but also from industry and
governments. The recent terrorist attacks in the London Underground in mid-2005 and the successful use
of CCTV by police in identification of perpetrators have intensified the talk about a new generation of
intelligent video surveillance systems based on smart cameras. In fact, surveillance and security demands
are an important driving force behind the ever-increasing scale of academic and industrial research in
advanced vision algorithms such as object tracking and identification, and human behavior analysis.
2.5 Market Demands and Analysis
2.5.1 Digital Video Surveillance
The first generation of CCTV cameras (1980s-1990s) consisted mostly of analog cameras with limited
functionality and high cost. Digital CCTV cameras and the use of DVRs (Digital Video Recorders)
represent the second generation (2G, 1990s-now). Digital CCTV cameras built using CCD and CMOS
image sensors provide better video quality, some intelligent functions such as motion detection, electronic
PTZ (Pan-Tilt-Zoom), and networking. The 2G CCTV systems have become mass-market products,
fuelled by improved affordability and society’s increasing concerns over safety and security. According
to estimates made in 2004 by market research firm Datamonitor [5], digital video surveillance is a high-
growth segment within the overall surveillance market estimated at 55% CAGR (Compound Annual
Growth Rate) between 2003 and 2007. In dollar terms, between 2003 and 2007 the market will grow from
US$1.3bn to US$7.4bn globally.
However, the 2G CCTV systems are not “smart” enough to help prevent crimes or terror attacks, even
though they proved very useful in post-event identification of crime perpetrators. The 2G CCTV systems
are mostly not automated systems and rely strongly on trained security personnel to perform image
analysis, object tracking and identification. The increasing number of cameras makes real-time
analysis by security personnel difficult. Network bandwidth is another important issue affecting the
real-time processing needed for crime prevention. The intelligent video surveillance system (IVSS), also
called the third-generation CCTV system, will try to provide solutions to these problems. Smart cameras
will be one of the fundamental building blocks of the IVSS, making it possible to build and deploy
automated, distributed and intelligent multi-sensory surveillance systems capable of tracking humans and
suspect objects, analyzing human behaviors, and so on. Many market research firms have predicted
significant growth in intelligent video systems and smart cameras. For example, the market researcher
Frost & Sullivan [6] has forecast that the US$153.7 million video surveillance software market is
expected to witness a healthy CAGR of 23.4% from 2004 to 2011 to reach US$670.7 million.
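The growth rates quoted in this section can be cross-checked against their start and end values using the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below does this for the two forecasts cited above; the function name is an illustrative choice.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Digital video surveillance [5]: US$1.3bn (2003) to US$7.4bn (2007).
digital = cagr(1.3, 7.4, 2007 - 2003)

# Video surveillance software [6]: US$153.7m (2004) to US$670.7m (2011).
software = cagr(153.7, 670.7, 2011 - 2004)

print(f"{digital:.1%}")
print(f"{software:.1%}")
```

The endpoints imply roughly 54.5% and 23.4% per year, which is consistent with the quoted figures of 55% and 23.4%.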
2.5.2 Industrial Machine Vision
Industrial machine vision is probably the birth place of smart cameras, at least in terms of the
systematic use of commercial smart cameras. It is also one of their most active playgrounds. Most
machine vision smart cameras are stand-alone cameras. The demand for these cameras has been steadily
increasing over the years. The major end-user industries are robotics, semiconductor, electronics,
pharmaceutical, manufacturing, food, plastics and printing. The tasks these smart cameras usually
perform include bar-code reading, part inspection, flaw detection, surface inspection, dimensional
measurement, assembly verification, print verification, object sorting, OCR (optical character
recognition) and maintenance. A recent survey of machine vision products by the Europe-based market
research firm IMS Research [7] found that smart cameras are rapidly accounting for a greater
share of machine vision market revenue. Demand for smart cameras is primarily driven by the
increasing need for better production efficiency and quality control in industries such as manufacturing
and medicine/pharmaceuticals. The survey revealed that whilst sales of more traditional PC-based
products (cameras and frame grabbers) have fallen, sales of smart cameras and compact vision systems
have continued to grow. The survey predicts that the machine vision market in Europe will grow at an
average rate of 11.6% each year to 2006. The highest levels of growth, approaching 20%, are forecast for
the smart sensor and camera product groups, which would more than double in value in dollar terms. The
same trend has also been forecast by the same company for the Asia-Pacific market [8]. An estimate
provided by the annual market study by the AIA (Automated Imaging Association) for the 2003 North
American machine vision smart camera market is about US$57 million, with growth of 15% per
year in terms of revenues and 20% per year in terms of units [9].
2.5.3 Other Significant Markets
Other important markets for smart cameras include ITS (Intelligent Transport Systems), automobiles, HCI
(Human Computer Interface), medical/healthcare, games, toys, video conferencing and biometrics.
3 Review of Smart Camera System Architectures
In recent years, smart cameras have attracted considerable attention from academic and industrial
research and development (R&D) organizations. However, to the best of the authors’ knowledge, a
systematic approach to analyzing smart cameras has yet to be agreed upon. In this section we first
present one approach to classifying smart camera systems and provide an analysis of their system
architectures, followed by a review of some R&D activities on the design of smart cameras as embedded
systems.
3.1 Classification of Smart Cameras
Smart cameras can come in different system and physical configurations. Figure 2 shows one proposed
classification of different types of vision systems and smart cameras.
[Figure 2 diagram labels: Vision Systems; PC based Vision Systems; Network based Vision Systems; Smart Cameras; Non Stand-alone Smart Cameras; Single Chip Smart Cameras; Other types of Smart Cameras.]

Figure 2: One proposed classification of vision systems and smart cameras.
As shown in Figure 2, stand-alone smart cameras are a subset of embedded vision systems. Non-stand-
alone embedded smart cameras are sometimes called compact vision systems. Compact vision systems
are usually composed of general purpose cameras connected to an external embedded processing unit in a
separate box to provide ASIP and communication/networking functionality. Single-chip smart cameras
can be thought of as a special case of smart cameras because they require special system design
considerations and are usually used in carefully targeted applications. Non-stand-alone smart cameras can
be thought of as virtual smart cameras because, from the user’s point of view, the cameras are smart even
though the ASIP that makes them smart may be performed by an external unit, such as a hardware
accelerator board, a local PC or a networked PC. PC-based smart cameras, consisting of a general purpose
video camera, a frame grabber of some sort and a PC whose CPU performs the ASIP, are a very
common and inexpensive platform for researchers, academics and students to conduct research on smart
cameras. Sometimes a normal camera is connected to a PCI (Peripheral Component Interconnect)
processing board within a PC. In this case, the PCI board may perform most of the ASIP and output
generation, while the PC provides a flexible operator interface or additional processing power. This kind
of system is a special case of a compact vision system and a PC-based system. A digital CCTV
surveillance system with intelligent features is an example of a network-based smart camera system, and
the next generation of distributed intelligent video surveillance systems will be the exciting test ground
for smart cameras, especially stand-alone smart cameras. Hybrid vision systems may give rise to some
special types of smart cameras. This category may also include smart camera systems that may need some
kind of human intervention to help provide high accuracy data output.
3.2 Analysis of Different Types of Smart Cameras
3.2.1 Common Characteristics
The common basic components of a normal digital video camera (consumer, professional or industrial)
include optics, solid-state image sensor (CCD or CMOS), image processor(s) and supporting hardware,
output generator, and communication ports. The main tasks performed by the image processor(s) are to
provide color interpolation, color correction or saturation, gamma correction, image enhancement and
camera control such as white balance and exposure control. The output generator can be an NTSC/PAL
encoder to provide standard TV-compatible output, or a video compression engine to provide compressed
video streams for communication over network, or digital video output generator such as a Firewire
encoder. Communication ports, such as Ethernet or RS232 provide the basis for networked camera
functionality or camera configuration and firmware upgrading through a PC respectively.
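As an illustration of the kind of per-pixel work a normal camera's image processor performs, the sketch below applies gamma correction through a lookup table. The gamma value of 2.2 and the 8-bit range are assumed example settings, and the function names are hypothetical.

```python
from typing import List


def gamma_lut(gamma: float = 2.2, levels: int = 256) -> List[int]:
    """Precompute output = top * (input / top) ** (1 / gamma) for every input level."""
    top = levels - 1
    return [round(top * (v / top) ** (1.0 / gamma)) for v in range(levels)]


def gamma_correct(image: List[List[int]], lut: List[int]) -> List[List[int]]:
    """Apply the lookup table to every pixel of a grayscale image."""
    return [[lut[p] for p in row] for row in image]


lut = gamma_lut()
corrected = gamma_correct([[0, 64, 255]], lut)
print(corrected)
```

Black and white stay where they are, while mid-tones are brightened. The goal is a better-looking picture rather than any understanding of the scene.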
A smart camera typically includes all of the above components of a
normal camera, with the following differences:
• A smart camera has a distinct and powerful signal processing unit to perform image feature
extraction and/or pattern analysis based on application-specific requirements; and
• A smart camera has an output generator to produce a coded representation of the image features
and/or results from the pattern matching, or in some cases, control signals for other devices (e.g.
alarm triggering signal) or actions (e.g. sending a picture of the number plate of a car which is
speeding to police).
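The second difference, an output generator that emits coded results rather than pictures, can be sketched as follows for the speeding-car example. The message fields, the JSON encoding and all names are illustrative assumptions; no particular camera protocol is implied.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SpeedingEvent:
    """Hypothetical application-specific output of a traffic smart camera."""
    camera_id: str
    plate: str
    speed_kmh: float
    limit_kmh: float


def encode_event(event: SpeedingEvent) -> bytes:
    """Serialize the event into a compact message for the control system."""
    return json.dumps(asdict(event), separators=(",", ":")).encode("utf-8")


def check_vehicle(camera_id: str, plate: str, speed_kmh: float,
                  limit_kmh: float) -> Optional[bytes]:
    """Emit a message only when the measured speed exceeds the limit."""
    if speed_kmh > limit_kmh:
        return encode_event(SpeedingEvent(camera_id, plate, speed_kmh, limit_kmh))
    return None


print(check_vehicle("cam-1", "ABC123", 50.0, 60.0))  # None
message = check_vehicle("cam-1", "ABC123", 72.0, 60.0)
print(len(message), "bytes")
```

Nothing is transmitted for vehicles within the limit, and a violation produces a message of well under a hundred bytes.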
System architecture design for smart cameras often involves significant system engineering effort.
Clear application requirements and specifications are crucial to the successful design. Software
architecture, hardware architecture, and (for network-based systems) network architecture need to be
jointly designed to maximize resource usage and efficiency, and to reduce cost and time-to-completion.
More detailed design considerations are discussed in section 5.1.
3.2.2 Stand-alone Smart Cameras
A stand-alone smart camera integrates image capture, ASIP and application specific output generation
into a single device casing. A stand-alone smart camera may look very much like a normal industrial
camera or a CCTV camera. While the primary function of a normal camera is to provide raw video for
monitoring and recording, a smart camera is usually designed to perform specific, repetitive, high-speed
and high-accuracy tasks in industries such as machine vision and surveillance. Most industrial
machine vision cameras are stand-alone smart cameras. While a normal video camera may cost
anywhere between US$50 and US$2 000, a machine vision smart camera can cost between US$1 000 and
$6 000 per unit [10] and beyond, depending on the functionality and level of customization.
Many pattern recognition techniques involve two types of processing tasks, data-intensive tasks such as
image enhancement and feature extraction, and math-intensive tasks such as statistical pattern matching.
While data-intensive tasks require high speed hardware to deal with high pixel volume and high frame
rate, math-intensive tasks often require high performance processors to deal with issues such as pipelining
and floating-point arithmetic. For demanding applications, the camera hardware architecture may be based on
a heterogeneous, multi-processor platform, with one or more processors capable of
parallel processing (e.g. an FPGA, Field Programmable Gate Array) performing
data-intensive tasks, and a DSP and/or a RISC (Reduced Instruction Set Computer) processor performing
math-intensive tasks. A smart camera built for face detection and recognition application by Broers et al.
[11] is such an example. The system employs an FPGA and a parallel processor Xetal working in SIMD
(Single Instruction Multiple Data) mode, to perform data-intensive operations such as face detection. A
high-performance DSP, TriMedia, with a VLIW (Very Long Instruction Word) core is used to run
high-level programs such as face recognition. The system architecture is shown in Figure 3.
[Figure 3 diagram labels: image sensor and AFE/ADC blocks; data-intensive processing; math-intensive processing; system data bus; network interfaces.]

Figure 3: A stand-alone smart camera system architecture for face recognition [11].
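The split between data-intensive and math-intensive processing can be mimicked in software as a two-stage pipeline. The sketch below is purely illustrative and is not the system of [11]: the first stage applies the same simple operation to every pixel, the second does a small amount of floating-point matching on the resulting feature vector. The features, templates and names are all assumptions.

```python
import math
from typing import Dict, List, Tuple

Features = Tuple[float, float, float]


def extract_features(image: List[List[int]], threshold: int = 128) -> Features:
    """Data-intensive stage: one identical, simple test applied to every pixel.

    Returns (bright area fraction, centroid x, centroid y).
    """
    n = sum_x = sum_y = 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel >= threshold:
                n += 1
                sum_x += x
                sum_y += y
    if n == 0:
        return (0.0, 0.0, 0.0)
    area = n / (len(image) * len(image[0]))
    return (area, sum_x / n, sum_y / n)


def classify(features: Features, templates: Dict[str, Features]) -> str:
    """Math-intensive stage: floating-point nearest-template matching."""
    return min(templates, key=lambda name: math.dist(features, templates[name]))


image = [
    [200, 200, 0, 0],
    [200, 200, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
templates = {
    "top_left": (0.25, 0.5, 0.5),
    "bottom_right": (0.25, 2.5, 2.5),
}
features = extract_features(image)
print(features)                       # (0.25, 0.5, 0.5)
print(classify(features, templates))  # top_left
```

The first stage touches every pixel but does almost no arithmetic, while the second handles only three numbers but uses floating point. That asymmetry is what motivates pairing a parallel device with a DSP or RISC processor.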
3.2.3 Single-Chip Smart Cameras
Single-board or single-chip smart cameras are a special kind of stand-alone smart camera. Single-chip
smart cameras take advantage of the integration capability of CMOS image sensors by building intelligent
ASIP circuits onto the image sensor chip, potentially relieving the host computer of cumbersome pixel
processing tasks and minimizing the data transfer between camera and computer. In some cases, pixel-
level ADC and processing can be achieved [12], which can lead to a brand new level of signal and image
processing methodologies. Single-chip smart cameras make it possible to design very efficient, very
small, low-power and low-cost cameras (when produced in large volumes). For example, the VISoc
single-chip smart camera [13] integrates a 320x256 pixel CMOS image sensor, a RISC processor, a vision
co-processor and I/O onto a single chip, which has been fabricated in a 0.35 µm process on an area of
about 36 mm², with a typical power dissipation of about 1 W at 3.3 V and 60 MHz. Moorhead et al. [14]
designed a smart CMOS camera chip which integrates an edge detection mechanism directly into the
sensor array. Lee et al. [15] reported the design of a 30 frames/second VGA-format CMOS image sensor
with an embedded massively parallel processor, for real-time skin-tone detection.
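Edge detection is a typical example of a neighborhood operation that lends itself to integration next to the pixel array, because each output depends only on a small window of nearby pixels. The sketch below uses the standard 3x3 Sobel operator in software as an illustration; it is not a description of the circuit in [14], and the threshold is an assumed value.

```python
from typing import List


def sobel_edges(image: List[List[int]], threshold: int = 100) -> List[List[int]]:
    """Mark pixels whose Sobel gradient magnitude (|gx| + |gy|) exceeds threshold.

    Border pixels are left at 0 because their 3x3 window is incomplete.
    """
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges


# Test pattern: dark left half, bright right half (a vertical step edge).
step = [[0, 0, 0, 200, 200, 200] for _ in range(5)]
for row in sobel_edges(step):
    print(row)
```

Only the two columns on either side of the step are marked. Every output pixel can be computed independently of the others, which is what makes such operations suitable for parallel hardware.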
In some applications single-chip smart cameras can bring distinct advantages. For example, the authors
of [1] argue that, compared with conventional multi-chip fingerprint readers, a fingerprint reader based on a
single-chip smart camera can be much smaller, much simpler to integrate into mobile devices such as
mobile phones, lower in cost, and more secure.
The main disadvantage of the single-chip smart camera lies in the cost of chip design and
manufacturing, unless a large volume of units can be produced to justify the initial capital investment.
Nevertheless, a single-chip smart camera is a smart sensor that has the potential to make vision systems
pervasive, especially when connected to wireless sensor networks.
3.2.4 Embedded System based Smart Cameras
This category of smart cameras most often consists of a camera (usually a general purpose one) and an
external embedded processing unit connected to it. For example, an embedded system based smart
camera could be a general-purpose camera connected to a high performance video processing board,
which itself is connected to a PC, either through a PCI slot or through an RS232 port. This kind of
configuration is not too different from a PC-based system. Many 2G digital CCTV systems with some
intelligent features belong to this category.
A dedicated embedded processing unit is needed in this type of smart camera because
a PC, while flexible and versatile, is far from adequate for intensive image and
video processing and pattern recognition tasks, particularly when high-resolution, high frame rate and low
latency processing is required. Another advantage of this kind of system is that once proof-of-concept is
achieved and end-users are identified, it is easier for the system to be converted to a stand-alone smart
camera if required.
Smart cameras used in robotic and automobile applications can also be classified into this category.
These cameras may share computing resources such as a processor and memory with other embedded
devices in the robot and in the vehicle.
3.2.5 PC and Network based Smart Cameras
PC-based smart camera systems are probably the most popular within the academic research environment,
as a first step in conducting computer vision and pattern recognition research and building first prototypes
as proofs of concept. This is a very simple and inexpensive configuration, as the prices of general purpose
video cameras and PCs continue to fall. Most often, a general purpose camera is connected to a PC
through either a frame grabber or a communication port such as USB, Firewire, CameraLink, or Ethernet.
This type of system relies on the PC’s CPU to perform image analysis, feature extraction and pattern
recognition tasks. The availability of various vision processing libraries for PC platforms makes this kind
of system very popular. PCs also provide a more flexible environment for building user interfaces.
USB cameras, Firewire cameras and network cameras allow digital images to be transferred directly
from the camera to a PC or to embedded processing hardware, avoiding the signal integrity loss caused by DAC
(digital to analog conversion) inside many CCTV cameras and ADC by frame grabbers. For high-
resolution cameras, Firewire cameras are starting to become popular and affordable, but CameraLink
remains dominant, especially for high bandwidth and high performance applications.

The 2G CCTV system is a network based video surveillance system (NVSS). An NVSS with built-in
intelligent surveillance features can be loosely considered as a network of virtual smart cameras. An
NVSS is composed of four main layers: a CCTV camera (sensor) layer, a network layer, a central
computer layer and a trained security personnel layer (Figure 4). As discussed in section 2.5.1, in most of
the currently deployed NVSSes, ASIP tasks such as object tracking, identification and threat
detection are performed mostly by trained security personnel. However, human monitoring of
surveillance video is a very labor-intensive task. It is generally agreed that watching video feeds requires
a higher level of visual attention than most everyday tasks. Specifically, vigilance, the ability to hold
attention and to react to rarely occurring events, is extremely demanding and prone to error due to lapses
in attention. A recent study by the US National Institute of Justice found that, after only 20 minutes of
watching and evaluating monitor screens, the attention of most individuals will degenerate to well below
acceptable levels [16]. The next generation of video surveillance systems - intelligent video surveillance
systems (IVSS) – will try to solve these problems by providing automated video surveillance and crime
preemption abilities. The IVSS will seek a re-distribution of ASIP tasks among the four layers in the
NVSS system, notably shifting processing load from security personnel to central computers or DVRs (in
the short term) and, probably more importantly, to the surveillance cameras themselves: that is, the introduction of
(stand-alone) smart cameras to replace passive or dumb CCTV cameras (in the medium and long term). The use
of smart cameras would greatly reduce the bandwidth problem caused by the increasing number of
cameras present in the system and enhance surveillance system performance, as sending raw pixels over
the network is less efficient than sending the results of intermediate analysis. Smart cameras can
also help in decentralizing the overall surveillance system, which can lead to improved fault tolerance and
the realization of more surveillance tasks than with traditional cameras [17].
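As a rough illustration of the bandwidth argument, the following Python sketch compares the raw pixel rate of a single camera with the rate needed to send only analysis results. The resolution, frame rate, message size and message rate are illustrative assumptions, not figures taken from any of the cited systems.

```python
def raw_video_bits_per_second(width, height, bits_per_pixel, fps):
    """Bit rate of uncompressed video from one camera."""
    return width * height * bits_per_pixel * fps


def metadata_bits_per_second(bytes_per_message, messages_per_second):
    """Bit rate when the camera sends only analysis results."""
    return bytes_per_message * 8 * messages_per_second


# Assumed values, for illustration only.
raw = raw_video_bits_per_second(width=640, height=480, bits_per_pixel=16, fps=25)
meta = metadata_bits_per_second(bytes_per_message=200, messages_per_second=25)

print(f"raw video: {raw / 1e6:.1f} Mbit/s")
print(f"metadata:  {meta / 1e3:.1f} kbit/s")
print(f"reduction: {raw / meta:.0f}x")
```

Video compression would narrow this gap considerably, but under these assumptions the analysis results remain orders of magnitude smaller than the pixel stream, and the saving grows with the number of cameras.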

[Figure 4 diagram: Camera 1, Camera 2, ..., Camera N (sensor layer); network layer; central computer (server) layer; security personnel layer.]

Figure 4: Four layers of a network based video surveillance system (NVSS).
3.3 Research in Smart Cameras as Embedded Systems

Video processing is notoriously hungry for computation horsepower, memory and other resources.
Smart cameras as embedded systems have to meet the insatiable demands of video processing on the one
hand, and the challenging demands of embedded systems, such as real-time operation, robustness and
reliability under real-world conditions, on the other. This has made smart cameras a leading-edge
application for embedded systems research [18]. Recently there has been a significant increase in research
in building smart cameras as embedded systems. The first IEEE workshop on Embedded Computer
Vision (ECV’05) was held in June 2005 [19]. The workshop addressed issues such as how to design
smart algorithms to efficiently utilize embedded hardware, how to meet real-time constraints in an embedded
environment, and verification methods for mission-critical embedded vision systems. In particular, the
workshop discussed the suitability of FPGAs for embedded vision systems.
Apart from numerous research groups working on developing smart cameras for video surveillance,
there are a number of academic research groups in the world dedicated to research into building smart
cameras as embedded systems. One prominent group is the Embedded Systems Group in Princeton
University’s Department of Electrical Engineering [18]. This group has developed an embedded smart
camera system that can detect people and analyze their movement in real time. They are also working on
a VLSI (Very Large Scale Integration) smart camera. An interesting research activity involving the design
of stand-alone smart cameras is the SmartCam project at University of Technology Eindhoven [20]. This
project investigates multi-processor based smart camera system architectures and addresses the critical
issue of determining correct camera architectural parameters for a given application domain. Another
project bearing the same name is being undertaken by the University of Technology in Graz, Austria [17].
The project aims to develop distributed smart cameras for traffic surveillance applications. They also
investigate various issues involved in making smart cameras as embedded systems, such as resource-
aware dynamic task allocation systems to support real-time requirements.
Many industry research groups and companies are involved in smart camera research for machine
vision, especially in Germany, Japan and the US. There exist some very informative and useful journals
and web portals for the machine vision world, such as IEEE Transactions on Pattern Analysis and
Machine Intelligence, Advanced Imaging Magazine [21], Machine Vision Resources [22] and Machine
Vision Online [23].
A search on the USPTO (US Patent and Trademark Office) website reveals many patents filed or issued
in relation to the concept and embodiment of smart cameras as embedded systems. For example, patent
#6 985 780, filed in Aug 2004 under the title of “Smart Camera” [24], made claims about a camera system
that includes an image sensor and a processing module at the imaging location that processes the captured
images prior to sending the results to a host computer. The processing module can perform tasks such as
image feature extraction and filtering, convolution and deconvolution methods, correction of parallax and
perspective image error and image compression.
4 Review of ASIP Algorithms for Smart Cameras and State-of-the-Art Systems
If cameras are extensions of the human eye, smart cameras are pushing the boundary of possibilities to
become extensions of the human brain as well. What makes a camera smart are the intelligent, application-specific
information processing (ASIP) algorithms that are built into the software architecture of the
camera system. In this section we first explore some common characteristics of intelligent algorithms
for smart cameras. We then review several categories of algorithms as applied to machine vision,
surveillance and other prominent applications, and some state-of-the-art smart camera systems in use in
these applications areas.
4.1 Common Characteristics of Algorithms for Smart Cameras
The primary function of a smart camera is to conduct autonomous analysis of the content of an image
or video and achieve a high-level understanding of what is happening in the scene. One of the most
commonly adopted approaches is image processing-based pattern recognition, which is a branch of
artificial intelligence. Pattern recognition assumes that the image may contain one or more objects and
that each object belongs to one of several predetermined types or classes. Given a digitized image
containing several objects, the pattern recognition process consists of three main phases, each including
several processing tasks:
• Signal level processing – image enhancement, image segmentation;
• Feature level processing – feature extraction, feature measurements and tracking; and
• Object level processing – object classification and estimation.
This is illustrated in figure 5. Also shown in figure 5 is a semantic-level processing component, which is
central to the output or action side of smart cameras. The main tasks at this level include possible joint
analysis of inputs from additional cameras and from other sensory and database inputs, data fusion, event
description, and control signal generation. It should be noted that some tasks at different levels or phases may
intersect each other during processing.
[Figure 5 diagram: signal level (enhancement and segmentation), feature level (extraction and tracking), object level (classification and estimation), semantic level (person, behavior and event description; control signal generation), with other camera and/or sensory and/or database inputs.]

Figure 5: General processing flow of algorithms for pattern recognition and smart cameras.
Image segmentation at signal level is essential to all subsequent processing tasks, aiming at dividing an
image into distinct parts, each having a common characteristic. Image segmentation can be based on
color, texture, shape and motion. Feature extraction is crucial to pattern recognition. This is where the
segmented regions or objects are measured. A measurement is the value of some quantitative property of
an object. A feature is a function of one or more measurements, computed so that it quantifies some
significant characteristic of the object. This drastically reduced amount of information (compared to the
original image) represents all the knowledge upon which the subsequent classification decision must be
based. Object classification outputs a decision regarding the class to which each object belongs. Each
object is recognized as being of one particular type, and the recognition is implemented as a classification
process [25].
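The three phases described above can be sketched in a few lines of Python. This is a toy illustration only: the thresholding rule, the two features and the hand-written decision rule are assumptions chosen for brevity, the sketch assumes a single object in the image, and a real system would use a trained classifier.

```python
def segment(image, threshold):
    """Signal level: mark pixels brighter than the threshold as foreground."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def extract_features(mask):
    """Feature level: measure area and bounding box, then derive two features."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1   # measurement
    width = max(cols) - min(cols) + 1    # measurement
    return {"area": len(coords), "aspect_ratio": width / height}


def classify(features):
    """Object level: a hand-written rule standing in for a trained classifier."""
    if features is None:
        return "none"
    return "wide" if features["aspect_ratio"] > 1.5 else "compact"


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 9, 0],
    [0, 9, 9, 9, 9, 0],
    [0, 0, 0, 0, 0, 0],
]
mask = segment(image, threshold=5)
features = extract_features(mask)
label = classify(features)
print(features, label)
```

Note how the 24 pixel values are reduced to two numbers before the classification decision is made, which is the information reduction described above.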
For simple applications, not all these levels and tasks are required to be implemented. For example, the
camera in an optical mouse only performs signal- and feature-level processing tasks. On the other hand,
for a particular processing task, different applications can have quite different requirements on the
camera’s performance, robustness and reliability. For example, the requirements for robustness of
processing tasks at all levels are much higher for video surveillance monitoring human movement and
behaviors than for industry machine vision cameras performing parts inspection or sorting.
Tasks at signal- and feature-levels are usually data-intensive and are well suited for hardware-based
implementation to meet speed demands. Tasks at the object level can be math-intensive and may need
high performance processor(s) to complete. Stand-alone smart cameras built on a multi-processor
architecture would have one processor, such as a DSP or an FPGA, to perform tasks at signal- and
feature-levels, and have a high performance DSP or RISC microprocessor to perform statistical object
classification.
When designing smart cameras as embedded systems for demanding applications such as surveillance
and automobiles, there are several important and challenging issues that need to be addressed, such as the
development of low-complexity, low-cost algorithms suitable for hardware implementation, and software
and hardware co-design, in order to map algorithmic requirements to hardware resources. These issues
will be further discussed in section 5.1.
4.2 Application: Intelligent Video Surveillance Systems (IVSS)
4.2.1 Current Research in Algorithms for IVSS
Video surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most
active research topics in computer vision and pattern recognition. The IEEE and IEE have organized
many workshops and conferences on intelligent visual surveillance in the last several years and have
published special journal issues that focus solely on visual surveillance or on human motion/behavior
analysis. Hu et al. [26] and Valera et al. [27] recently conducted excellent surveys on various algorithms
and techniques under research and development for video surveillance. They also reviewed some high
profile IVSS systems. Some comments in this section are derived from their papers.
For video surveillance, image segmentation most often starts with motion detection, which aims at
segmenting regions corresponding to moving objects from the rest of an image. Background modeling is
indispensable to motion detection. 3-D models can provide more realistic background descriptions but are
more costly. 2-D models have more applications currently due to their simplicity. However, all modeling
techniques need to find ways to reduce the effect of unfavorable factors such as illumination variation,
moving shadows and so on. Promising techniques for motion segmentation include simple background
subtraction, temporal differencing, and more complex optical flow methods. Skin-color based
segmentation can be very useful when human objects are close enough to the camera and lighting is
consistent. Once segmentation has provided isolated objects, feature extraction and measurements can be
performed on each object. Simple algorithms for feature extraction include image moments, which can
provide geometrical features of the objects. For gesture and behavior recognition, promising algorithms
for feature extraction include MEF (Most Expressive Features), extracted by Karhunen-Loeve projection,
and MDF (Most Discriminative Features), extracted by multivariate discriminant analysis [28]. Since
sometimes it is not easy to specify features explicitly, in some applications when the image size is small
enough, the whole image or transformed image is taken as the feature vector. Examples of algorithms for
object classification are shape-based classification and motion-based classification. After motion
detection and object classification, video surveillance systems generally track moving objects from one
frame to another. Promising algorithms for object tracking can be classified into four categories: region-
based tracking, active contour based tracking, feature based tracking, and model based tracking. Particle
filters have recently become a major way of tracking moving objects.
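Two of the motion segmentation techniques named above, together with the simplest image moments, can be sketched as follows. The fixed background image, the threshold and the tiny synthetic frames are assumptions for illustration; practical systems use adaptive background models to cope with illumination changes and shadows.

```python
def background_subtraction(frame, background, threshold):
    """Foreground mask: pixels that differ from a fixed background image."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


def temporal_difference(frame, previous, threshold):
    """Motion mask: pixels that changed since the previous frame."""
    return background_subtraction(frame, previous, threshold)


def centroid(mask):
    """Centroid from the zeroth- and first-order moments of a binary mask."""
    m00 = sum(v for row in mask for v in row)
    if m00 == 0:
        return None
    m10 = sum(c * v for row in mask for c, v in enumerate(row))  # sum of x
    m01 = sum(r * v for r, row in enumerate(mask) for v in row)  # sum of y
    return (m10 / m00, m01 / m00)


def make_frame(object_columns):
    """Synthetic 3x5 frame: dark background, bright object on the middle row."""
    frame = [[10] * 5 for _ in range(3)]
    for c in object_columns:
        frame[1][c] = 200
    return frame


background = make_frame([])
previous = make_frame([1, 2])   # object covers columns 1 and 2
current = make_frame([2, 3])    # object has moved one pixel to the right

foreground = background_subtraction(current, background, threshold=30)
moving = temporal_difference(current, previous, threshold=30)
print(foreground[1], centroid(foreground))
print(moving[1], centroid(moving))
```

In this example background subtraction recovers the whole object, while temporal differencing marks only the pixels at its trailing and leading edges, which is one reason the two techniques are often combined.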
Human behavior understanding and personal identification are among the most challenging tasks facing
IVSS systems for high-end security applications. Behavior understanding involves the analysis and
recognition of motion patterns, and the production of high-level description of actions and interactions.
Promising approaches and algorithms for behavior understanding include dynamic time warping, finite-state
machines, HMMs (Hidden Markov Models) and time-delay neural networks. Personal identification is of
increasing importance for many security applications. The human face and gait are now regarded as the
main biometric features that can be used for personal identification in video surveillance systems. While
face recognition research and development has made a lot of progress in recent years, current research on
gait recognition is still in its infancy.
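Of the approaches listed above, dynamic time warping is the simplest to show compactly. The sketch below matches an observed one-dimensional motion feature (for example, the height of a tracked hand over time) against stored templates. The template names and values are hypothetical, and the absolute difference is just one possible local cost.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats
                                 cost[i][j - 1],      # b[j-1] repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical motion templates and an observation performed at half speed.
templates = {
    "raise_and_lower": [0, 1, 2, 1, 0],
    "hold": [2, 2, 2, 2, 2],
}
observed = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]

label = min(templates, key=lambda name: dtw_distance(templates[name], observed))
print(label)
```

Because the alignment may stretch either sequence in time, the slower observation still matches its template with zero cost, which is what makes the technique attractive for motion patterns performed at varying speeds.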
4.2.2 State-of-the-Art IVSSes
A number of high-profile IVSSes have been reported in recent years. These systems, some deployed in
real-world applications, applied various pattern recognition techniques described in previous sections and
provided features such as people tracking, behavior recognition, detection of unattended objects and so
on. Examples are the real-time visual surveillance system W4 [29], the Pfinder system developed by
Wren et al. [30], the single-person tracking system, TI, developed by Olsen et al. [31], and a system at
CMU (Carnegie Mellon University) [32] that can monitor activities over a large area using multiple
cameras connected by a network.
A few IVSSes based on the use of stand-alone smart cameras have also been reported. The V2 system
developed by Christensen and Alblas [33] is a surveillance system that avoids the disadvantages of the
centralized computer server, and moves many of the processing tasks directly to the camera, making the
system a group of smart cameras connected across the network. The event detection and storage of event
video can be performed autonomously by the camera. Thus, normally, it is only necessary to
communicate with a central point when significant events occur. The VSAM project described by Collins
[34, 35] is a multi-camera surveillance system composed of a network of ‘smart’ sensors that are
independent and autonomous vision modules. These vision sensors are capable of detecting and tracking
objects, classifying the moving objects into semantic categories such as ‘human’ or ‘vehicle’ and
identifying simple human movements such as walking. Desurmont et al. [36] developed a smart network
camera system with three smart cameras to perform people tracking and counting in shopping malls.
Their system uses web services standards and XML-based meta data to implement inter-camera and
camera-to-host coordination. Fleck et al. [37] designed a smart camera that contains an FPGA and a
PowerPC processor to perform face tracking and people tracking, using particle filters on HSV (Hue,
Saturation, Value) color distributions. The camera outputs the approximated PDF (probability distribution
function) of the target state to a host computer.
4.3 Application: Industry Machine Vision
While advanced algorithms for smart cameras for surveillance applications are mostly still in the
research and development stage, owing to their high complexity and the high level of robustness required for
real-world applications, smart cameras for industry machine vision have long established their place in
the market as mature players. Most machine vision cameras are stand-alone and autonomous smart
cameras, where communication with a PC or other central control unit is needed only for camera
configuration, firmware upgrades or, in some cases, output data collection. Most algorithms implemented
in these cameras follow a processing flow similar to that described in figure 5. One important reason for the
relative maturity of machine vision smart cameras, compared with smart cameras for surveillance, is that
the application requirements for machine vision cameras are much less restrictive compared with those
for surveillance cameras. In other words, many pattern recognition algorithms or techniques have a much
better chance of performing with satisfactory robustness and reliability for machine vision than for
surveillance applications. This is because machine vision cameras mainly deal with conditions such as:
• indoor use, thus good and consistent lighting conditions can be more easily guaranteed;
• minimum problems of occlusion;
• static and known background, thus unusual feature detection is simpler;
• limited object patterns to be recognized; and
• no human movement tracking and recognition is necessary.
There are many proven software packages on the market that can be customized or directly
implemented for programmable machine vision cameras. Most of these packages are for special industry
sectors, but some are general purpose packages, including a few powerful up-market libraries such as
the Halcon library [38]. The Halcon library provides algorithms that include shape-based matching to find
objects based on ROI (region of interest) modeling, blob analysis, metrology (both 1D and 3D), edge
detection, edge and line extraction, contour processing, template matching, and color processing.
Thanks to the advancements in embedded system technologies and improved affordability of
processing power, there is a migration of the functionality of what were once only PC-based systems
down to the smart camera level. Artificial intelligence is one of these functionalities. Pulnix America’s
ZiCAM camera, for example, makes use of a hardware neural network to eliminate the need for
programming to execute image-understanding algorithms [39]. It can learn what is required for a machine
vision application, and once taught, operates as a stand-alone smart camera. Wintriss Engineering
manufactured a smart camera which sports a microprocessor, DSP and multiple FPGAs with up to
130,000 gates [40]. The company offers both area- and line-scan versions of its smart cameras, with
the line-scan version able to perform imaging-related processes on 5 150 pixel lines at 40 MHz. One
such camera uses an FPGA for image sensor control and pixel correction, and uses the combined
compute power in the camera head for real-time digital filtering, lighting correction, streak correction
and input/output. Ultimately, flaws that manifest themselves geometrically or photometrically are discriminated
by connectivity analysis, all performed within the camera.
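Connectivity analysis of this kind can be sketched generically as connected-component labeling on a binary defect mask. The code below is not the vendor's implementation; the 4-connectivity rule, the example mask and the minimum-area criterion are assumptions for illustration.

```python
from collections import deque


def label_components(mask):
    """Return the pixel lists of the 4-connected foreground regions of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill from an unvisited foreground pixel.
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


defect_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
components = label_components(defect_mask)
sizes = [len(pixels) for pixels in components]

MIN_FLAW_AREA = 2  # assumed: smaller regions are treated as noise
flaws = [pixels for pixels in components if len(pixels) >= MIN_FLAW_AREA]
print(sizes, len(flaws))
```

Grouping pixels into regions makes it possible to judge each candidate flaw by geometric properties such as area or extent rather than pixel by pixel.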
4.4 Application: Intelligent Transport Systems and Automobiles
4.4.1 ITS Applications
There is growing awareness of and interest in using smart cameras in the Intelligent Transport Systems (ITS)
and automobile industries. The IEEE organized an international workshop on
Machine Vision for Intelligent Vehicles in June 2005 [41]. Generally speaking, the application and algorithmic
requirements for ITS are quite similar to those of IVSS. These requirements can be quite different for
automobile applications, however, where high-speed imaging and processing are often needed, imposing
a higher level of demand on both hardware and software. Increased robustness is also required for
car-mounted cameras to deal with varying weather conditions, speeds, road conditions and car vibrations. CMOS
image sensors can overcome problems such as large intensity contrasts due to weather conditions or road
lights, as well as blooming, which is an inherent weakness of existing CCD image sensors [42].
There have been a number of successful applications of smart camera systems for ITS reported in the
literature. The VIEWS system at the University of Reading [43] is a 3D model-based vehicle tracking
system. Kumar et al. [44] described a real-time rule-based behavior-recognition system for traffic videos.
By detecting and signaling improper behaviors, this system can support better traffic rule enforcement;
it is also capable of detecting potential accident situations and is designed for existing camera
setups on road networks. Beymer et al. [45] presented a smart camera-based monitoring system for
measuring traffic parameters. The aim of the system is to capture video from cameras that are placed on
poles or other structures looking down at traffic. Once the video is captured, digitized and processed by
an onsite smart camera, it is transmitted in summary form to a transportation management centre for
computing multi-site statistics like travel times. Bramberger et al. [42] described an embedded smart
camera for stationary vehicle detection. They discussed the mapping of high-level algorithms to
embedded system components. Dimitropoulos et al. [46] described a network of smart cameras deployed
at an airport to detect and track aircraft; each camera can autonomously detect aircraft traffic in multiple
locations within its field of view. A camera data fusion module combines data from multiple
cameras to determine the location and size of the aircraft. Other applications of smart cameras for ITS
include monitoring vehicle behavior in parking lots, vision-based vehicle speed measurement, detection of red-light intrusion at
traffic lights, and vehicle number plate recognition. Some authors have expressed the need to integrate smart
traffic surveillance systems with existing traffic control systems to develop the next generation of
advanced traffic control and management systems [47].
4.4.2 Automobile Applications
Intelligent vehicles will form an integral aspect of the next generation technology of ITS. Smart
camera-powered intelligent vehicles will have the comprehensive capability of monitoring the vehicle
environment including the driver’s state and attention inside of the vehicle as well as detecting roads and
obstacles outside the vehicle, so as to provide assistance to drivers and avoid accidents in emergencies.
However, building and integrating smart cameras into vehicles is not an easy task. On the one hand, the
algorithms require considerable computing power to work reliably in real time and under a wide range of
lighting conditions. On the other hand, the cost must be kept low, the package size must be small and the
power consumption must be low [48]. Applications of smart cameras in intelligent vehicles include lane
departure detection, cruise control, parking assistance, blind-spot warning, driver fatigue detection,
occupant classification and identification, obstacle and pedestrian detection, intersection-collision
warning, and overtaking vehicle detection. Below are a few examples.
Stein [49] described a single smart camera-based adaptive cruise control system for intelligent vehicles.
In a paper on obstacle detection using stereo vision, Ruichek [68] focused on a multilevel- and neural-
network-based stereo-matching method for real-time road obstacle detection with linear cameras for use
in vehicles. Xu et al. [50] addressed the problem of pedestrian detection and tracking with night vision
using a single infrared video camera installed on the vehicle. The EyeQ is a single chip smart camera
processor developed by Mobileye [51]. It has been fabricated using 0.18µm CMOS technology, operating
at 120 MHz. It integrates two 32 bit RISC ARM946E CPUs, four Vision Computing Engines, a multi-
channel DMA (Direct Memory Access) and several peripherals and is designed for computationally
intensive applications for real-time visual recognition and scene interpretation for use in intelligent
vehicle systems.
4.5 Other Application Areas
Other important applications for smart cameras include HCI (human-computer interaction), medical imaging, robotics, games and
toys. Optical mice are a widely used example. Smart cameras performing gesture recognition will play an important
role in the development of multimodal user interfaces. Bonato et al. [52] presented an FPGA-based smart
vision system for mobile robots capable of performing real-time human gesture recognition. The RVT
system developed by Leeser et al. [53] and based on FPGA processing allows surgeons to see live retinal
images with vasculature highlighted in real time during surgery.
5 Smart Camera Design Considerations and Future Directions
In this final section we discuss design considerations for smart cameras as embedded systems, identify
several key issues that need to be addressed by the design and research community, and speculate on the
future directions of smart camera research and development.
5.1 Design Considerations
5.1.1 Design and Development Process
Figure 6 shows a typical design and development process for smart cameras as embedded systems
(excluding single-chip smart cameras). As shown in figure 6, the process can be iterative, especially if the
initial application specification was not complete from the end user point of view.
[Figure 6 diagram: process stages, including Proof of Concept, Embedded System Integration and Field Test, and Engineering Prototyping.]

Figure 6: Design and development process for smart cameras as embedded systems.
The system architecture design stage decides on software and hardware architectures, based on
performance, deadline and cost criteria. Algorithmic design and timing design suitable for the targeted
hardware platform also need to be defined. The mapping between algorithm requirements and hardware
resources is an important issue. The proof-of-concept stage may use a PC platform for research and
algorithm development. Usually a COTS (Commercial Off-The-Shelf) general purpose camera is used at
this stage. Hardware components need to be acquired, integrated and tested. However, this is not needed
if, during the architecture design stage, a third party camera development platform or hardware
accelerator unit for video processing is identified as an appropriate hardware platform (see
section 5.1.6 for examples of smart camera development platforms). The algorithm conversion stage
includes tasks such as converting floating-point arithmetic to fixed-point arithmetic, considering low-power and
low-complexity versions, and implementation using an HDL (Hardware Description Language). The
Embedded System Integration stage will result in a prototype smart camera using an embedded hardware
platform running embedded versions of algorithms.
5.1.2 System Architecture and Design Methodology
System architecture design depends on application requirements, which can range from very simple
(e.g. an optical mouse) to very complex (e.g. face recognition). System architecture design has to
consider many factors such as the hardware platform, cost, time to market, flexibility, and so on.
Generally speaking, a heterogeneous, multiple-processor architecture can be ideal for smart camera
development. For example, such an architecture may consist of an FPGA or a DSP as a data processor to
tackle image segmentation and feature extraction, and a high-performance DSP or media processor to
tackle math-intensive tasks such as statistical pattern classification. This kind of system can allow better
exploitation of pipelining and parallel processing, which are essential to achieve high frame rates and low
latency. Some authors have reported work on the impact of hardware system architecture on the level of
implementable pipelining and parallel processing for smart cameras [54, 55]. Some initial work has been
reported on design methodology for embedded vision systems [56, 57].
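The effect of pipelining and parallel processing on frame rate and latency can be illustrated with simple arithmetic. The sketch below is an idealized model: the per-stage times are assumed values, and inter-processor transfer and synchronization overheads are ignored.

```python
def sequential_performance(stage_ms):
    """One processor runs every stage of a frame before starting the next frame."""
    latency = sum(stage_ms)
    return {"latency_ms": latency, "fps": 1000.0 / latency}


def pipelined_performance(stage_ms):
    """Each stage has its own processor, so successive frames overlap."""
    latency = sum(stage_ms)
    # Throughput is limited by the slowest stage, not by the sum of stages.
    return {"latency_ms": latency, "fps": 1000.0 / max(stage_ms)}


def split_stage(stage_ms, index, ways):
    """Idealized data parallelism: divide one stage evenly across `ways` units."""
    times = list(stage_ms)
    times[index] = times[index] / ways
    return times


# Assumed per-frame times: segmentation, feature extraction, classification.
stages = [20, 10, 10]

sequential = sequential_performance(stages)
pipelined = pipelined_performance(stages)
pipelined_parallel = pipelined_performance(split_stage(stages, index=0, ways=2))

print(sequential)
print(pipelined)
print(pipelined_parallel)
```

In this model pipelining alone raises the frame rate without changing latency, while splitting the slowest stage across parallel units improves both, which is why the two techniques are usually considered together.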
5.1.3 Embedded Processors
There are generally four main families of embedded processors that can be used for smart cameras:
microcontrollers, ASICs (Application-Specific Integrated Circuits), DSPs (Digital Signal Processors)
and PLDs (Programmable Logic Devices) such as the FPGA. Microcontrollers are cheap but have limited
processing power and are generally not suited for building demanding smart cameras. ASICs are powerful
and power-efficient processors, but the design cost and risk are high and they are viable solutions only
when volume is high and time-to-market is well-timed. DSPs are relatively cheap and powerful in
performing image and video processing, but for demanding applications usually more than one DSP
would be needed. DSP-based solutions can be cost-effective for medium-volume production. Recently a
new class of DSP processors, called media processors, has come into the vision market. Media processors
try to provide a good trade-off between flexibility and cost-effectiveness. They typically have a high-end
DSP core employing SIMD (Single Instruction Multiple Data) and VLIW (Very Long Instruction Word) architectures, married on-chip
with some typical multimedia peripherals such as video ports, networking support, and other fast data
ports [58]. Examples of media processors are Philips’ TriMedia, TI’s DM64x and ADI’s (Analog Devices,
Inc.) Blackfin.
The FPGA has recently emerged as a very good hardware platform candidate for embedded vision
systems such as smart cameras. One of the most important advantages of the FPGA is the ability to
exploit the inherently parallel nature of many vision algorithms. FPGAs used to be mainly employed as
glue logic between processors and peripherals, but the introduction of on-chip hardware multipliers and
dual-port memory has made FPGAs excellent options for DSP applications. The integration of
microprocessors into FPGA chips (such as Xilinx’ Virtex-II Pro and Virtex-4 chips) made them true
system-on-a-chip solutions. These features, together with the continuous improvements in cost and
maturity of design tools, have made FPGAs very competitive against DSPs and media processors for
many types of embedded vision system designs. In fact, an increasing number of publications on smart
cameras as embedded systems have employed FPGAs as the sole processor or as a data-intensive
processor before a DSP or a media processor, in a powerful heterogeneous multi-processor architecture
[59]. Sen et al. [56] have recently proposed a design methodology for effectively and efficiently
implementing computer vision algorithms on FPGAs to build smart cameras. A study comparing the
relative performance of various image processing routines running on a DSP, a PowerPC, an Intel Pentium 4
and an FPGA was published on Alacron’s web site [60]; it found that the FPGA solution had a
distinct advantage. However, a more standardized performance evaluation mechanism to help processor
selection is much needed.
How should one choose between DSPs, media processors, ASICs and FPGAs? Kisacanin [58] proposed
a practical way to help processor selection based on intended production volume, cost and development
flexibility. He argued that ASICs may be suitable for high volumes of over 1 000 000 units, DSPs or media
processors for medium volumes between 10 000 and 100 000 units, while for low volumes of under
10 000 units, FPGAs can be a viable candidate.
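The volume-based rule of thumb above can be written down as a small lookup. The function name is hypothetical, and the thresholds are exactly those quoted in the text; no guidance is given there for volumes between 100 000 and 1 000 000 units, so the sketch returns None for that range rather than guessing.

```python
def suggest_processor(units):
    """Map an intended production volume to the processor family suggested in the text."""
    if units > 1_000_000:
        return "ASIC"
    if 10_000 <= units <= 100_000:
        return "DSP or media processor"
    if units < 10_000:
        return "FPGA"
    return None  # the text gives no guidance between 100 000 and 1 000 000 units


for volume in (5_000, 50_000, 500_000, 2_000_000):
    print(volume, suggest_processor(volume))
```

In practice volume is only one input; cost and development flexibility, which the same proposal also considers, would need to be weighed alongside it.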
5.1.4 Algorithms Development and Conversion
Algorithm development for embedded systems is quite different from that for PC-based platforms.
Basically it can be a lot more demanding and challenging, especially if FPGA or ASIC processors are
targeted. Usually when designing applications for ASIC or FPGA, one has to understand chip architecture
so that algorithms can be executed efficiently and effectively. Nowadays behavior synthesizers or
algorithmic synthesizers do exist to help designers to forget about the device architecture and focus on
functionality, but they come at the cost of efficiency in terms of chip area or gate counts and power
consumption. Therefore, it is always important to gain an intimate knowledge of the device architecture
of whichever of the ASIC, FPGA or DSP is targeted. This intimate knowledge can also help in designing
parallel and pipelined processing, which can be very important and effective techniques for video
processing. Converting floating-point arithmetic to fixed-point and eliminating divisions as
much as possible (by using hardware multipliers and look-up tables, for example) are other design
considerations for algorithm conversion.
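The two conversion steps named above can be sketched as follows. This is an illustrative example under stated assumptions, not a method from the cited works: the Q15 format, the luma weights 0.299/0.587/0.114, the 16-bit reciprocal table and all function names are chosen for the example. The first part replaces a floating-point weighted sum with integer multiplies and a shift; the second replaces an integer division with a table look-up, a multiply and a shift.

```python
FRAC_BITS = 15            # assumed Q15 format: 15 fractional bits
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Quantize a real coefficient to a Q15 integer."""
    return int(round(x * ONE))


# Floating-point weights turned into integer constants once, at design time.
KR, KG, KB = to_fixed(0.299), to_fixed(0.587), to_fixed(0.114)


def luma_fixed(r: int, g: int, b: int) -> int:
    """Weighted sum of 8-bit RGB using integer multiplies, with rounding."""
    return (KR * r + KG * g + KB * b + (ONE >> 1)) >> FRAC_BITS


RECIP_BITS = 16
# RECIP_LUT[d] = ceil(2**16 / d) for d = 1..255; index 0 is unused.
RECIP_LUT = [0] + [((1 << RECIP_BITS) + d - 1) // d for d in range(1, 256)]


def div_by_lut(n: int, d: int) -> int:
    """Compute n // d for 0 <= n <= 255 and 1 <= d <= 255 without a divider:
    one table look-up, one multiply and one shift."""
    return (n * RECIP_LUT[d]) >> RECIP_BITS


if __name__ == "__main__":
    print(luma_fixed(200, 100, 50), 0.299 * 200 + 0.587 * 100 + 0.114 * 50)
    print(div_by_lut(200, 7), 200 // 7)
```

The table size and word widths trade memory for accuracy; with wider operands than the 8-bit range assumed here, the reciprocal would need more bits to stay exact.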
5.1.5 Other Factors
Memory System - Smart cameras need flexible memory models to meet requirements such as scalable
frame buffers to cope with increasing image sensor resolutions. As the smart camera may integrate
different types of processors, the memory system should support potentially complex processing pipeline
and parallelism in order to meet the application’s real-time requirements. For single chip smart cameras,
care needs to be taken at design stage to conserve memory [54].
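As a rough illustration of how frame buffer requirements scale with sensor resolution, the sketch below computes buffer size and raw pixel data rate. The function names, the default of two buffers (double buffering) and the rounding of pixel depth up to whole bytes are assumptions made for the example; real designs may pack pixels differently.

```python
def frame_buffer_bytes(width: int, height: int,
                       bits_per_pixel: int = 8, num_buffers: int = 2) -> int:
    """Memory needed to hold num_buffers full frames, with each pixel
    stored in a whole number of bytes."""
    bytes_per_pixel = (bits_per_pixel + 7) // 8
    return width * height * bytes_per_pixel * num_buffers


def pixel_rate_bytes_per_s(width: int, height: int, fps: int,
                           bits_per_pixel: int = 8) -> int:
    """Raw data rate the memory system must sustain for one video stream."""
    bytes_per_pixel = (bits_per_pixel + 7) // 8
    return width * height * bytes_per_pixel * fps


if __name__ == "__main__":
    print(frame_buffer_bytes(640, 480))            # VGA, 8-bit, double-buffered
    print(frame_buffer_bytes(1280, 1024))          # SXGA, 8-bit, double-buffered
    print(pixel_rate_bytes_per_s(1280, 1024, 15))  # SXGA at 15 frames per second
```

Going from VGA to SXGA at the same pixel depth multiplies the buffer size by more than four, which is the kind of growth a scalable memory model has to absorb.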
Communication Protocols - There are currently too many data output protocols for cameras, such as
Firewire, CameraLink, GigE and USB. Firewire is maturing, but CameraLink remains the bandwidth leader
and is very popular with machine vision users. Unfortunately, the variety of digital interfaces adds to
the confusion in the market and puts pressure on camera vendors to support multiple versions of
cameras with different interfaces.
5.1.6 Smart Camera Development Platforms
There have been a number of commercially available programmable smart camera platforms for
developers to design and prototype smart cameras for applications such as machine vision, biometrics,
HCI and surveillance. Philips has introduced the INCA (INtelligent CAmera) series of programmable
cameras [61] which integrate CMOS image sensors of various resolutions and a highly flexible dual-core
processing unit which includes a Xetal processor for computation intensive signal processing such as
feature extraction, together with a high performance TriMedia DSP core for math-intensive processing
tasks such as pattern recognition. The camera comes with an application development kit allowing for fast
prototyping. One application has been designed for face recognition [62], in which the Xetal is used for
face detection and TriMedia for face recognition. Sony has recently released a smart camera development
system XCI-SX1 that integrates an SXGA CCD image sensor (15 frames per second, 34 fps at 640x480
resolution) and an AMD GeodeGX533 400 MHz processor running the MontaVista Linux operating system
[63]. The camera platform is designed to provide OEMs, systems integrators and vision tool
manufacturers a rugged, robust component, combining the imager, intelligence and interface in a single
plug-in module that is simple to set up and easy to integrate. The IQeye3 IP camera from IQinvision Inc,
powered by a 250 MIPS PowerPC CPU, is a platform for smart IP network camera development [64].
Some signal processing tool development companies provide multi-processor development systems that
can serve as excellent development platforms for smart cameras. For example, Hunt Engineering [65]
provides a development platform HERON based on a Xilinx FPGA and a TI (Texas Instruments) DSP.
They also provide expansion capabilities to integrate video capture, IPs, more DSPs and/or FPGAs for
creating scalable smart camera architectures. Lyrtech also provides similar development systems in its
SignalMaster series of products [66]. These systems generally provide flexible communication ports and
5.2 Key Issues or Challenges
System Design – The proprietary nature of smart cameras can limit choices of hardware, like imagers,
I/O, lighting, lens and the communications format. As a result, smart cameras may lack the expandability
and flexibility of PC-based systems. In addition, smart cameras do not have as many software applications
and libraries as already exist for PC/frame grabber-based systems. In terms of design methodology, the easy
integration of intellectual property in the design tool and flow can help foster product differentiation.
Other important system-level issues include smart camera operating systems and development tools.
CMOS Image Sensors – Dynamic range is still one of the key aspects where CMOS image sensors lag
behind CCD. Improvement in this area can lead to more low-cost smart cameras using CMOS image
sensors for machine vision and surveillance applications.
Algorithm Development – Many intelligent pattern recognition algorithms work well in laboratory
conditions but fail when deployed and implemented in real-world conditions (occlusion, lighting
condition changes, unfavourable weather conditions), and embedded system environments (scant
resources, low power, low cost). Robustness and low complexity are among key issues facing researchers
developing algorithms for smart cameras in surveillance, ITS and automobile applications.
Performance Evaluation - This is a very significant challenge in smart surveillance systems. Evaluating
the performance of video analysis systems requires significant amounts of annotated data. Typically,
annotation is a very expensive and tedious process. Additionally, there can be significant errors in
annotation. All of these issues make performance evaluation a significant challenge [16].
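To make the dependence on annotated data concrete, the following minimal sketch scores a detector against ground-truth annotations using precision and recall. The choice of these two metrics, the representation of events as sets of frame indices and the function name are assumptions for the example; the text does not prescribe a particular evaluation measure.

```python
def precision_recall(detected: set, annotated: set) -> tuple:
    """Compare detected event frames with manually annotated event frames.

    precision = fraction of detections that match an annotation
    recall    = fraction of annotations that were detected
    """
    true_positives = len(detected & annotated)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


if __name__ == "__main__":
    annotated = {2, 3, 4, 5, 6}   # hypothetical ground-truth frame indices
    detected = {1, 2, 3, 4}       # hypothetical detector output
    print(precision_recall(detected, annotated))
```

Every entry in the annotated set has to be produced by a human annotator, and an error there changes both scores, which is why annotation cost and annotation quality dominate the evaluation effort.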
Standards Development – There is a need for the development of smart camera standards. In fact,
the European Machine Vision Association (EMVA, [67]) has recently launched an initiative (EMVA
1288 Standard) to define a unified method to measure, compute and present specification parameters for
smart cameras and image sensors used for machine vision applications. More needs to be done in this area.
Single Chip Smart Cameras – Single-chip smart cameras are an attractive concept, but the
manufacturing cost for the single-chip smart cameras can be high because the feature size for making
digital processors and memory is often different from the one used to make image sensors, which may
require relatively large pixels to efficiently collect light. Therefore, for applications where physical space
and power consumption are not extremely restrictive, it probably still makes sense to design the smart
camera using a multi-chip approach with a separate image sensor chip. Separating the sensor and the
processor also makes sense at the architectural level, given the well-understood and simple interface
between the sensor and the computation engine [54].
5.3 Future Directions
The demand for smart cameras will steadily increase in traditional industries such as surveillance and
industrial machine vision, and may also come from new industry and market segments such as healthcare,
entertainment and education. Research interest and economic and social factors will drive continuous
technological and product development. Based on the discussions above, we can discern the following
future directions for smart camera systems and technologies.
• At the system design level, continuous effort will be made in the development of a research
strategy or design methodology for smart cameras as embedded systems. The same applies to the
development of libraries and tools that facilitate algorithm implementation in DSPs and FPGAs.
Research on general and ‘optimal’ architectures for smart cameras and on real-time
operating systems for smart cameras will be undertaken, and the issue of too many digital
interfaces (Firewire, CameraLink, etc.) for cameras will be addressed.
• At the ASIP algorithm development level, in order to improve performance and robustness of
existing techniques, research should address issues such as occlusion handling, fusion of 2D and
3D tracking, anomaly detection and behavior prediction, combination of video surveillance and
biometric personal identification, and multi-sensory data fusion [26].
• Multi-modal, multi-sensory augmented video surveillance systems have the potential to provide
improved performance and robustness. Such systems should be adaptable enough to adjust
automatically and cope with changes in the environment like lighting, scene geometry or scene
• Work on distributed (or networked) IVSS should not be limited to the territory of computer
vision laboratories, but should involve telecommunication companies and network service
providers, and should take into account system engineering issues.
• In the machine vision arena, smart cameras will offer more and more functionality. The trend of
distributing machine vision across the entire production line at points before value is added will
continue. Neural network techniques seem to have become a key paradigm in machine vision,
used either to segment an image correctly under a wide variety of operating conditions or
to classify the detected object. Stereo and 3D-vision applications are also increasingly
widespread. Another trend is to utilize machine vision in the non-visible spectrum.
• New product developments will introduce smart camera-based digital imaging systems into
existing consumer and industry products, to increase their value and create new products.
• Standards development. One area which may need standardization is the metadata format that
facilitates integration and communication between different cameras, sensors and modules in a
distributed and augmented video surveillance system. New communication protocols may be
needed for better communication between different smart camera products.
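As an illustration of what a shared metadata format might carry, the sketch below defines a per-frame record and serializes it to JSON so that it could be exchanged between cameras and other modules. Everything here is hypothetical: the field names, the bounding-box convention and the use of JSON are assumptions made for the example and do not correspond to any existing or proposed standard.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class DetectedObject:
    label: str            # hypothetical class name, e.g. "person"
    bbox: List[int]       # [x, y, width, height] in pixels (assumed convention)
    confidence: float     # detector score in the range 0..1


@dataclass
class FrameMetadata:
    camera_id: str
    timestamp_ms: int
    objects: List[DetectedObject] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record so another module can consume it."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "FrameMetadata":
        """Rebuild the record on the receiving side."""
        raw = json.loads(text)
        objects = [DetectedObject(**obj) for obj in raw["objects"]]
        return FrameMetadata(raw["camera_id"], raw["timestamp_ms"], objects)


if __name__ == "__main__":
    record = FrameMetadata("cam-01", 1000,
                           [DetectedObject("person", [10, 20, 40, 80], 0.9)])
    message = record.to_json()
    assert FrameMetadata.from_json(message) == record
    print(message)
```

Sending such high-level records instead of raw video is what allows heterogeneous cameras and sensors to be fused in one system, provided all parties agree on the schema.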
The authors would like to thank Dr. Xing Zhang from ST Microelectronics and Dr. Julien Epps of
National ICT Australia for their many valuable comments and corrections of parts of this paper.
[1] S. Shigematsu, H. Morimura: A Single-Chip Fingerprint Sensor and Identifier. IEEE Journal of Solid-State
Circuits, Vol. 34, No. 12, December 1999. pp.1852-1859.
[2] M. LaPedus: CMOS Image Sensors Market Consolidates.
[3] Intel Open Source Computer Vision Library.
[4] Chicago Pairing Surveillance Cameras with Gunshot Recognition Systems.
[5] Global digital video surveillance markets.
[6] Frost & Sullivan: Video Surveillance Software Emerges as Key Weapon in Fight Against Terrorism.
[7] Smart Products Can See The Future.
[8] Smart Cameras Drive Machine Vision Growth. Advanced Imaging Journal. October 2005. page 8.
[9] Machine Vision Online: JAI PULNiX Forms New “Smart Camera” Business Unit.
[10] W. Hardin, Smart Cameras: The Last Step in Machine Vision Evolution?
[11] H. Broers, R. Kleihorst, M. Reuvers and B. Krose: Face Detection and Recognition On A Smart Camera.
Proceedings of ACIVS 2004, Brussels, Belgium, Aug.31- Sept.3, 2004.
[12] Pixim Digital Pixel System Technology Backgrounder.
[13] L. Albani, P. Chiesa, D. Covi, G. Pedegani, A. Sartori, M. Vatteroni: VISoc : A Smart Camera SoC. Proceedings
of the 28th European Solid-State Circuits Conference, pp.367-370, Firenze, Italy, September 2002.
[14] T.W.J. Moorhead, T.D.Binnie: Smart CMOS Camera For Machine Vision Applications. Image Processing and
Its Applications, Conference Publication No.465. IEE 1999. pp.865-869.
[15] M.S. Lee, R. Kleihorst, A. Abbo, E. Cohen-Solal: Real-time Skin-tone Detection with A Single-chip Digital
Camera. Proc. of 2001 Int’l Conference on Image Processing. Volume 3, 7-10 Oct. 2001. Page(s):306 – 309.
[16] A. Hampapur, L. Brown, J. Connel, S. Pankanti, A. Senior, Y. Tian: Smart Surveillance: Applications,
Technologies and Implications. 4th IEEE Pacific-Rim Conference On Multimedia. 15-18 December 2003, Singapore.
[17] SmartCam - Design and Implementation of an Embedded Smart Camera:
[18] W. Wolf, B. Ozer, T. Lu: Smart Cameras As Embedded Systems. IEEE Computer, 35(9):48–53, Sep 2002.
[19] The First IEEE Workshop on Embedded Computer Vision:
[20] SmartCam: Devices for Embedded Intelligent Cameras.
[21] Advanced Imaging.
[22] Machine Vision Resources.
[23] Machine Vision Online:
[24] USPTO.
[25] K. R. Castleman: Digital Image Processing. 1st edition, Prentice Hall, New Jersey, 1996.
[26] W. Hu, T. Tan, L. Wang and S. Maybank: A Survey on Visual Surveillance of Object Motion and Behaviors.
IEEE Transactions on Systems, Man and Cybernetics. Vol. 34, No. 3, August 2004. 334-352.
[27] M. Valera and S.A. Velastin: Intelligent distributed surveillance systems: A review. IEE Proc.-Vis. Image Signal
Process. Vol. 152, No. 2, April 2005. 192-204.
[28] Y. Wu, T.S. Huang: Vision-Based Gesture Recognition: A Review. Lecture Notes in Computer Science. Volume
1739, 1999. pp.103-114.
[29] I. Haritaoglu, D. Harwood, and L. S. Davis: Real-time surveillance of people and their activities. IEEE Trans.
Pattern Anal. Machine Intell., vol. 22, pp. 809–830, Aug. 2000.
[30] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland: Pfinder: real-time tracking of the human body.
IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 780–785, July 1997.
[31] T. Olson and F. Brill: Moving object detection and event recognition algorithms for smart cameras. In Proc.
DARPA Image Understanding Workshop, 1997, pp. 159–175.
[32] A. J. Lipton, H. Fujiyoshi, and R. S. Patil: Moving target classification and tracking from real-time video. In
Proc. IEEE Workshop Applications of Computer Vision, 1998, pp. 8–14.
[33] M. Christensen, R. Alblas: V2- design issues in distributed video surveillance systems. Denmark, 2000, pp.1–86.
[34] R.T. Collins, A.J. Lipton, H. Fujiyoshi, and T. Kanade: Algorithms for cooperative multisensor surveillance,
Proc. IEEE, 89, (10), 2001, pp. 1456–1475.
[35] R. T. Collins, A. J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O.
Hasegawa, P. Burt, and L.Wixson: A system for video surveillance and monitoring. Carnegie Mellon Univ.,
Pittsburgh, PA, Tech. Rep., CMU-RI-TR-00-12, 2000.
[36] X. Desurmont, B. Lienard, J. Meessen, J.F. Delaigle: Real-Time Optimization For Integrated Network Camera.
Proc. of SPIE - Real Time Imaging 2005, San Jose, CA, January 2005.
[37] S. Fleck, W. Strasser: Adaptive Probabilistic Tracking Embedded in a Smart Camera. Proc. Of the 2005
Computer Society Conference on CVPR.
[38] Halcon.
[39] Machine Vision Online: JAI PULNiX Forms New “Smart Camera” Business Unit.
[40] Are Smart Cameras Smart Enough?
[41] MVIV’05:
[42] M. Bramberger, R. P. Pflugfelder, A. Maier, B. Rinner, B. Strobl, H. Schwabach: A Smart Camera For Traffic
Surveillance. Proceedings of the first Workshop on Intelligent Solutions in Embedded Systems (WISES). June 2003.
[43] T. N. Tan, G. D. Sullivan, and K. D. Baker: Model-based localization and recognition of road vehicles. Int. J.
Comput. Vis., vol. 29, no. 1, pp. 22–25, 1998.
[44] P. Kumar, S. Ranganath, H. Weimin, K. Sengupta: Framework for Real-Time Behavior Interpretation From
Traffic Video. IEEE Transaction on ITS, March 2005. Volume 6, No.1. pp. 43-54.
[45] D. Beymer, P. McLauchlan, B. Coifman, and J. Malik: A real-time computer vision system for measuring traffic
parameters. Proc. IEEE Conf. on Computer Vision and Pattern Recognition. pp. 495–502.
[46] K. Dimitropoulos, N. Grammalidis, D. Simitopoulos, N. Pavlidou, M. Strintzis: Aircraft Detection and Tracking
Using Intelligent Cameras. IEEE Int’l Conference on Image Processing. Vol 2, 11-14 Sept. 2005 Page(s):594 – 597.
[47] C. Nwagboso: User focused surveillance systems integration for intelligent transport systems. In Regazzoni,
C.S., Fabri, G., and Vernazza, G. (Eds.): ‘Advanced Video-based Surveillance Systems’ (Kluwer Academic
Publishers, Boston, 1998), Chapter 1.1, pp. 8–12.
[48] G. Stein: A Computer Vision System on a Chip: A case study from the automobile domain. First IEEE
Workshop on Embedded Computer Vision. June 2005.
[49] G. Stein, O. Mano and A. Shashua: Vision-based ACC with a Single Camera: Bounds on the Range and Range
Rate Accuracy, IEEE Intelligent Vehicles Symposium, June 2003, Columbus, OH.
[50] F. Xu, X. Liu and K. Fujimura: Pedestrian Detection and Tracking With Night Vision. IEEE Transaction on ITS,
March 2005, Vol.6 No.1. 63-71.
[51] EyeQ: System-on-a-chip.
[52] V. Bonato, A.K. Sanches, M.M. Fernandes, J.M.P. Cardoso, E.D.V. Simoes, E. Marques: A Real Time Gesture
Recognition System for Mobile Robots. In International Conference on Informatics in Control, Automation, and
Robotics, August 25-28, Setúbal, Portugal, 2004, INSTICC, pp. 207-214.
[53] M. Leeser, S. Miller, H. Yu: Smart Camera Based on Reconfigurable Hardware Enables Diverse Real-time
Applications. Proc. of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[54] W. Wolf, B. Ozer, T. Lu: VLSI Systems for Embedded Video. Proc. IEEE Computer Society Annual
Symposium on VLSI, 2002.
[55] W. Wolf, T. Lv, B. Ozer: An Architecture Design Study for a High Speed Smart Camera. Proceedings of the 4th
Workshop on Media and Streaming Processors. Istanbul, Turkey, 2002.
[56] M. Sen, I. Corretjer, F. Haim, S. Saha, J. Schlessman, S. S. Bhattacharyya, W. Wolf: Computer Vision on
FPGAs: Design Methodology and Its Application To Gesture Recognition. Proc. of the 2005 IEEE CVPR.
[57] W. Caarls, P. Jonker, H. Corporaal: Benchmarks For SmartCam development. Proceedings of ACIVS 2003
(Advanced Concepts for Intelligent Vision Systems), Ghent, Belgium, September 2-5, 2003
[58] B. Kisacanin: Examples of Low-Level Computer Vision on Media Processors. Proceedings of the 2005 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition.
[59] W.J. MacLean: An Evaluation of the Suitability of FPGAs for Embedded Vision Systems. Proceedings of the
2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[60] The Future of High Performance Machine Vision.
[61] Philips Industrial Vision Products.
[62] R. Kleihorst, M. Reuvers, B. Krose and H. Broers: A Smart Camera for Face Recognition. Proc. 2004
International Conference on Image Processing (ICIP’04). Pp. 2849-2852.
[63] Sony introduces first in smart camera series.
[64] IQinvision Smart Camera Systems. IQeye300 Series.

[67] Standard for Measurement and Presentation of Specifications for Machine Vision Sensors and Cameras.
[68] Y. Ruicheck: Multilevel- and Neural-network-Based Stereo-Matching Method for Real-Time Obstacle Detection
Using Linear Cameras. IEEE Transactions on ITS, March 2005, Vol.6 No.1. 54-62.

About the Authors – YU SHI is a Senior Researcher with National ICT Australia in Sydney, Australia. He was
granted his B.Eng in 1982 by the National University of Defense Technology in Changsha, Hunan, China. He later
obtained his M.Eng and PhD in signal processing and biomedical engineering in 1988 and 1992 respectively in
Toulouse, France. He also completed post-doctoral research at Oxford Brookes University in England in the late
1990s. His main research interests are in embedded vision systems, FPGA-based design and applications, multimodal
user interfaces and web services.
SERGE LICHMAN is a Senior Research Engineer with National ICT Australia in Sydney, Australia. He received
his M.Eng in Electrical Engineering in 1988 from the Odessa State Polytechnic University in Ukraine. His 12 years of
experience in the area of image and signal processing for commercial software and hardware gave him practical skills
in full product development life cycles, from research to deployment. His work has led to several publications.