Microsoft Kinect Development



Richard Isely III

Software Engineering

University of Wisconsin Platteville

iselyr@uwplatt.edu



Abstract


This paper covers the creation of the Kinect and the process by which it works. The core idea
for the Kinect was developed in 2005-2006 by the founders of the company PrimeSense, who
found an inexpensive way to build a device for motion tracking and depth sensing. Their
reference device went on to become the base design for the Kinect. Microsoft started its project
in 2007 and had many issues to overcome, including an unforeseen one after release:
developers around the world attempted to create their own drivers and libraries for running the
Kinect on platforms other than the Xbox. The success of these developers, along with the
release of libraries by PrimeSense, led Microsoft to release its own SDK. That release included
the libraries Microsoft had built on top of the hardware created by PrimeSense and allowed
commercial development with the Kinect to begin.






What is the Kinect?


The Kinect is a device designed by Microsoft to use a person's body, movement, and voice as
the controller for a video game. Microsoft's original intent was for this device to be the future of
video games. It was designed to work with all Xbox 360 consoles, even those made before the
Kinect's release, both in games and on the "dashboard," or home screen, of the Xbox. Before
the launch, Microsoft released a commercial showing users controlling in-game avatars with
their movements, video chatting with friends, and controlling all of the Xbox's media features by
voice or hand. This was Microsoft's attempt to challenge the Wii console on the motion-gaming
front. The components used in the Kinect to accomplish this have made it a bigger hit than
Microsoft could have imagined, though not in the form that was expected.

[4]



History of the Kinect


The history of the Kinect starts with the release of the Nintendo Wii in 2006. With that release,
Peter Moore, head of the Xbox division at the time, demanded work be done to compete with
the Wii. The plan was for two separate teams to come up with an idea for a "Wii killer," as
Moore put it. One of the teams met with PrimeSense, the company that would eventually
supply the Kinect's depth-sensing technology. However, the project lost some of its momentum
when Moore left Microsoft in 2007 to work for EA Sports.

[1]



Project Natal


The project got back on track in 2008 when it was approved to work with the technology
developed by PrimeSense. Alex Kipman was put in charge of the project, which was given the
code name Project Natal. The goal was to design a device that included motion tracking, voice
recognition, facial recognition, and depth recognition. Microsoft has traditionally named large
projects after cities, and Kipman chose this code name because Natal is his hometown in
Brazil. Kipman and his team were tasked with building a prototype of the device, using a
reference device from PrimeSense, to demo for Microsoft's executives. After the demo on
August 18, 2008, Kipman was given a launch date of Christmas 2010 for the device.

[1]



Motion Tracking Issue


The project hit a few snags along the way, the biggest being the motion-tracking solution. The
problems included requiring the user to stand in a T-shape for the device to discover them, the
device losing the player during certain motions and forcing them to be re-discovered, and the
device working only with certain body types (the body types the prototype was designed for:
the executives). With the help of the Microsoft Research (MSR) department, the team found a
way around these issues. The idea was to break the player's depth image, which was already
being generated, into distinguishable body parts. The MSR team developed an algorithm that
breaks the single image into 31 distinguishable body parts. From these body parts, the final
product is able to identify 48 joints and generate a skeletal image. To get around the initial
T-pose for identifying the user, MSR proposed using machine learning to understand the
human body. The team gathered video of people performing everyday movements around the
living room, as well as various active movements. The data was used to build a decision tree
that lets the device take any player depth pixel and assign it to one of the 31 body parts.

[1]
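
To make the classification step concrete, the sketch below shows one way such a per-pixel
classifier can be structured. It is an illustration in the spirit of the published research behind the
tracker, not Microsoft's actual code; the class layout, field names, and offset scaling are
assumptions made for the sketch. Each tree node compares the depth at two probe offsets
around the pixel, with the offsets divided by the pixel's own depth so the feature reads the same
whether the player stands near or far, and the leaf reached labels the pixel with one of the 31
body parts.

    // Illustrative sketch only (C#): per-pixel body-part classification with a
    // depth-comparison decision tree. The tree itself would be learned from the
    // training videos described above; here it is assumed to already exist.
    class SplitNode
    {
        public int Ux, Uy, Vx, Vy;     // probe offsets, learned in training
        public float Threshold;        // split threshold, learned in training
        public SplitNode Left, Right;  // children; null at a leaf
        public int BodyPart;           // leaf label: one of the 31 parts
    }

    static class BodyPartClassifier
    {
        // Probes outside the image or on background read as "very far away".
        static float Probe(ushort[,] depth, int x, int y)
        {
            if (x < 0 || y < 0 || y >= depth.GetLength(0) || x >= depth.GetLength(1))
                return float.MaxValue;
            return depth[y, x] == 0 ? float.MaxValue : depth[y, x];
        }

        // Classify one player pixel by walking the tree to a leaf.
        public static int Classify(SplitNode node, ushort[,] depth, int x, int y)
        {
            float d = depth[y, x];     // depth at the pixel, in millimeters
            while (node.Left != null)
            {
                // Depth-difference feature; offsets shrink with distance so the
                // same feature fires on near and far players alike.
                float f = Probe(depth, x + (int)(node.Ux / d), y + (int)(node.Uy / d))
                        - Probe(depth, x + (int)(node.Vx / d), y + (int)(node.Vy / d));
                node = f < node.Threshold ? node.Left : node.Right;
            }
            return node.BodyPart;
        }
    }

In the shipped system this runs for every player pixel in every frame, so the test at each node
has to be this cheap: two array reads and a subtraction.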



Microphone Array Issue


The other major problem was voice recognition with the microphone array. The challenge was
filtering out background noise, since the device would most likely sit closer to the TV's speakers
than to the user making the commands. Another MSR team helped develop echo-cancellation
and noise-suppression techniques that made the audio processing far better than the standard
at the time. The last step was to build an acoustical model of the variations in American accents
and various acoustical properties into the microphone array. The model was completed at the
end of September 2010, and the Kinect was released on November 4, 2010.

[1]



The Kinect


The Kinect is a motion-sensing device that layers regular two-dimensional video with 3D
imaging and depth sensing. It isn't the only device of this kind on the market, but the cost
difference between the Kinect and other motion-sensing devices is significant: one can
purchase a Kinect for around $130, where other motion-sensing devices and software run
upwards of $1,000. Shortly after Microsoft released the Kinect, people were working on ways to
hack the device to use it with their computers rather than an Xbox. Once this was
accomplished, people were able to write motion-sensing programs far more cheaply than ever
before. This has led to the creation of many open-source SDKs that let users create apps and
programs using the Kinect with their computer.

[4]



Figure 1: Internal components of the Kinect



Hardware


When the project idea was first announced at Microsoft, it was agreed that the project would
work with the company PrimeSense. PrimeSense is an Israeli company, founded in 2005 by
Aviad Maizels, Alexander Shpunt, Ophir Sharon, Tamir Berliner, and Dima Rais, that developed
the depth-sensing technology used in the Kinect. [6] PrimeSense developed a reference device
for Microsoft in early 2008, consisting of an RGB camera, an infrared sensor, and an infrared
light source. Microsoft went on to license the device and the PS1080 chip included with it. The
depth sensor PrimeSense developed for Microsoft was unlike any previous depth-sensing
device; the main difference was the way it calculated and identified the depth of objects.

[1]



Video
Camera


The video camera on the Kinect is an RGB color VGA video camera. RGB stands for red,
green, and blue, a color model that combines various amounts of red, green, and blue to
generate or represent different colors. This kind of camera detects the amount of each color in
the image it captures so the image can be represented digitally. VGA stands for video graphics
array and captures images at a standard resolution of 640 x 480 pixels. The Kinect can take still
pictures in addition to video, and this is the resolution of those captures. The main job of the
camera is to support facial recognition: Microsoft put this ability into the device so the Xbox can
associate a player with an Xbox account. When facial recognition is set up, the system
automatically signs in the account associated with that user. The video camera captures at a
frame rate of 30 fps.

[2]



Depth Sensor


The Kinect has an infrared projector and sensor. The projector casts a grid across the room in
which the Kinect is located, resulting in thousands, if not millions, of little dots spread across the
surfaces of the room. The naked eye cannot see any of this as it occurs. Using an infrared
camera to photograph a room in which a Kinect is running produces an image like the one that
follows.

[1]



Figure 2: Infrared projection on a room


This image shows what the Kinect's infrared camera captures. Each dot represents a beam of
infrared light leaving the Kinect and reflecting off a surface. The IR sensor in this device uses a
different method than previous devices. Before PrimeSense created the reference design for
Microsoft, depth-sensing devices used the time-of-flight method to calculate depth: for each
beam generated by the infrared light source, the device would calculate the distance to each
object from how long it took the beam to be reflected back. This is a very expensive form of
depth calculation. PrimeSense instead created a device that uses a known pattern in the
projected infrared light and measures the distances between the dots. Microsoft initializes each
Kinect as it is made by placing it exactly 10 feet from a wall; the Kinect stores this data as a
baseline and scales later measurements against the initial calibration.

[3]
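
For intuition about why time of flight is computationally expensive, note that the distance comes
from timing a round trip at the speed of light, so the sensor must resolve intervals of
nanoseconds. A minimal sketch (the numbers are illustrative, not Kinect specifications):

    // Illustrative sketch (C#): time-of-flight depth from round-trip timing.
    static class TimeOfFlight
    {
        const double SpeedOfLight = 299792458.0;   // meters per second

        // Halve the round trip: the light travels out and back.
        public static double Distance(double roundTripSeconds)
            => SpeedOfLight * roundTripSeconds / 2.0;
    }

    // A pulse returning after 20 nanoseconds puts the surface at about
    // TimeOfFlight.Distance(20e-9) = 3.0 meters; distinguishing 3.00 m from
    // 3.05 m means resolving roughly a third of a nanosecond.

The pattern-displacement approach PrimeSense used instead needs only an ordinary image
sensor and geometry, which is part of what made the Kinect cheap.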



Data Collecting


Figure 2 can also be used to see the range at which the Kinect can collect data. The areas of
the image that fade from dark to the white dots mark the edge of the Kinect's capture range.
The Kinect can capture images and data from as close as three feet to as far as eleven feet.
PrimeSense has said that the hardware's range is much larger, but Microsoft has declared this
range optimal for gaming.

[2]



Why Infrared?


Microsoft decided to use infrared light for motion detection and depth sensing for two major
reasons. First, infrared light does not depend on the amount of visible light in the room: the
infrared camera can pick up and identify the infrared projections in both bright and dark
conditions. Second, an RGB image can cause difficulties if objects around the room are the
same or similar in color to the user's clothes. The infrared projector-and-camera method
measures distance using a preset pattern of infrared projections. [3] Because of this, the
objects in a room could all be the same color and the depth sensor would still work normally,
avoiding the color problem an RGB camera source would have. Figure 3 below shows a depth
image generated by the Kinect. The software behind the Kinect is what actually generates
these images; more detail on that software appears later in the paper.

[1]



Figure 3: Depth image created by the Kinect



Multi-array Microphone


The Kinect has four separately placed microphones that make up the multi-array microphone.
Three microphones sit on the bottom left side of the device, and the fourth on the bottom right.
The microphone array can cancel out ambient noise, which aids voice commands, and can
pinpoint the location of the person talking in the room. The array allows the device to replace
the headset once used for in-game communication, and it can detect multiple voices located
close to the device. The main purpose of the microphones is to let users perform voice
commands.

[1]
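
For developers, the Kinect for Windows SDK exposes this array through the sensor's audio
source. Below is a minimal sketch assuming the SDK 1.x audio API (the property, enum, and
event names are from that release; error handling is omitted):

    using System;
    using System.IO;
    using Microsoft.Kinect;

    // Minimal sketch, assuming the Kinect for Windows SDK 1.x audio API:
    // start the microphone array with echo cancellation and noise suppression,
    // and report the estimated direction of the active speaker.
    static class KinectAudio
    {
        public static Stream StartAudio(KinectSensor sensor)
        {
            KinectAudioSource audio = sensor.AudioSource;
            audio.EchoCancellationMode = EchoCancellationMode.CancellationAndSuppression;
            audio.NoiseSuppression = true;

            // The array triangulates the sound source across its four microphones.
            audio.SoundSourceAngleChanged += (s, e) =>
                Console.WriteLine("Speaker at {0:F0} degrees, confidence {1:F2}",
                                  e.Angle, e.ConfidenceLevel);

            return audio.Start();   // 16 kHz, 16-bit mono PCM stream
        }
    }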



Software


The hardware components of the Kinect are layered with software to produce uniform output
and data for secondary devices. The microphone array, camera, and depth sensor process and
pass on raw data; the software layered over this hardware is what drives the Kinect and its
ability to perform many tasks.



Depth Calculations


The depth image is generated by the IR camera and the PS1080 chip connected to it; the data
is then treated as input to the next component. [3] The depth data for a given pixel is passed as
a 16-bit number. The first three bits hold the index of the player the pixel belongs to. The
software can track up to six players in the image, so the player index is a value between one
and six, or zero if the pixel doesn't correspond to any player. Bits 3-15 hold the depth at that
pixel; with bit manipulation, the distance from the sensor to the object at that pixel can be
calculated. If the depth can't be determined, the value is zero. The image below shows a
sample of data for a given pixel, and a sketch of the unpacking follows the figure.

[1]



Figure 4: Sample 16-bit depth data
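
A minimal sketch of that bit manipulation, assuming the layout described above (which matches
the player-index bitmask constants the SDK 1.x exposes on its depth frames):

    // Minimal sketch (C#): unpack one 16-bit Kinect depth pixel. Low three
    // bits: player index (0 = none, 1-6 = tracked player). Bits 3-15:
    // distance in millimeters (0 = depth unknown).
    static class DepthPixel
    {
        const int PlayerIndexBitmask = 0x0007;       // binary 0000 0000 0000 0111
        const int PlayerIndexBitmaskWidth = 3;

        public static void Unpack(short pixel, out int player, out int depthMm)
        {
            player = pixel & PlayerIndexBitmask;                  // bits 0-2
            depthMm = (ushort)pixel >> PlayerIndexBitmaskWidth;   // bits 3-15
        }
    }

    // Example: pixel 0x31FB gives player 3 (0x31FB & 0x7) and a depth of
    // 0x31FB >> 3 = 1599 mm, roughly 5.2 feet from the sensor.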



Skeletal Viewing


Microsoft layered software over the depth-sensing hardware to produce a skeletal view of the
users. The process starts with the depth image produced by the IR camera. Using the depth
image along with the large amounts of data gathered during its studies, the software breaks the
single blob of the depth image into 31 distinguishable body parts (shown in Appendix A,
Figures 1 and 2). From this image, the Natal team was able to use the body parts to identify a
total of 48 joints; Figure 5 shows the 20 joints that the Microsoft SDK exposes. During a demo
of the Natal project, developer and team lead Kudo Tsunoda claimed that the Kinect can
identify the joints and motions of the user's fingers.

[2]



Figure 5: Skeletal view of Kinect data processing
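
The SDK 1.x surfaces these joints through an event-driven skeleton stream. A minimal sketch,
assuming that release's API (error handling kept to a minimum):

    using System;
    using Microsoft.Kinect;

    // Minimal sketch, assuming the Kinect for Windows SDK 1.x skeleton API:
    // enable the skeleton stream and print the head position of each tracked
    // player on every frame.
    static class SkeletonDemo
    {
        public static void TrackSkeletons(KinectSensor sensor)
        {
            sensor.SkeletonStream.Enable();
            var skeletons = new Skeleton[sensor.SkeletonStream.FrameSkeletonArrayLength];

            sensor.SkeletonFrameReady += (s, e) =>
            {
                using (SkeletonFrame frame = e.OpenSkeletonFrame())
                {
                    if (frame == null) return;        // frame already retired
                    frame.CopySkeletonDataTo(skeletons);

                    foreach (Skeleton skeleton in skeletons)
                    {
                        if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
                            continue;                 // skip empty slots

                        SkeletonPoint head = skeleton.Joints[JointType.Head].Position;
                        Console.WriteLine("Head at ({0:F2}, {1:F2}, {2:F2}) m",
                                          head.X, head.Y, head.Z);
                    }
                }
            };

            sensor.Start();
        }
    }

Each Skeleton carries the 20 exposed joints, indexed by the JointType enumeration (Head,
ShoulderCenter, HandLeft, and so on), with positions in meters relative to the sensor.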



Kinect USB Drivers & Libraries


Shortly after the release of the Kinect for Xbox, many people were working on ways to use the
Kinect for application development. The Kinect's cable plugs directly into a USB port, but the
drivers and libraries for it had to be created. This was done by companies like Microsoft and
PrimeSense, as well as everyday developers.

[3]



Hacking the Kinect


The initial purpose of the Kinect was solely gaming. Shortly after its release, developers saw
the potential of the Kinect as a development tool: as a rather cheap depth-sensing and
motion-tracking device, it would let developers create motion-tracking programs at a low price.
Johnny Chung Lee, a developer on the Natal project, wanted a public driver that would let the
Kinect work over a computer's USB port. Upset with the lack of work by Microsoft, he contacted
Adafruit, and the two set up a prize for the first person to develop a driver that could read the
data produced by the Kinect. The final bounty was raised to $3,000.

[1]



OpenKinect


The solution was cracked seven days after the release of the Kinect. This led to OpenKinect,
the first open-source library to make the Kinect work with Windows, Linux, and Mac. Shortly
after the creation of OpenKinect, PrimeSense released its own drivers and libraries for the
Kinect and for processing its data. [7] This was a great step for developing with the Kinect;
however, these developers ran into many of the same issues the Natal team had dealt with. It
took Microsoft six months to release its own SDK.

[5]




Microsoft Kinect SDK


The first release of the Microsoft Kinect SDK took place on June 17, 2011. This release
included the skeletal- and voice-recognition solutions developed by the MSR teams. It was
under a non-commercial license, so developers were not able to sell any programs or solutions
they developed using the Kinect and this SDK. The SDK allows developers to create programs
across a variety of Microsoft platforms using the Kinect as input. Using Microsoft Visual Studio,
one can set up the Kinect as an input device for a program and use the data it passes. The
developer can choose one or all of the inputs the Kinect provides: the video camera, the
microphone, or the depth sensor. The latest release of the Microsoft Kinect SDK, along with the
Kinect for Windows device, allows developers to use it for commercial deployments. The
presentation I will be giving will go more in-depth on how to use the Microsoft SDK to develop a
program utilizing the Kinect.

[4]



Developing with the Kinect SDK


The first step in developing with the Kinect SDK is to make sure the Kinect can be powered by
the computer over USB. Older Kinects released with the Xbox 360 console require a power
adapter; newer Kinects purchased directly for programming can be powered over USB. Next,
make sure both Microsoft Visual Studio and the Kinect SDK are downloaded and installed.
Then decide what language to develop in; the SDK works with C++, C#, and VB. Once a
project is created in one of these languages, the Kinect must be set up as a reference. This is
done by adding a reference in the Solution Explorer; the component to reference is
Microsoft.Kinect. The next step is to add an existing project: the example programs included
with the Microsoft Kinect SDK. Installing those samples and referencing them in your program
gives you access to them, and with that, to all of the Kinect libraries. Figure 1 in Appendix B
shows the Kinect sensor listed in auto-complete form. The sensor can then be set up as an
object in the program, and the object can be used to enable data collection from the sensor
(shown in Figure 2 of Appendix B; a sketch also appears below). There are a lot of tools that
can be used with the Kinect and a lot of ways to get at them; the best way to get familiar with
them is to work through some examples and get used to working with Kinect data.

[8]
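
As a concrete version of that setup, here is a minimal sketch assuming the SDK 1.x API; it
mirrors the kind of initialization shown in Appendix B, with error handling kept to the bare
minimum:

    using System;
    using System.Linq;
    using Microsoft.Kinect;

    // Minimal sketch, assuming the Kinect for Windows SDK 1.x: find a
    // connected sensor, enable its data streams, and start collection.
    class Program
    {
        static void Main()
        {
            // KinectSensors lists every attached sensor; take the first one
            // that reports itself as connected.
            KinectSensor sensor = KinectSensor.KinectSensors
                .FirstOrDefault(s => s.Status == KinectStatus.Connected);
            if (sensor == null)
            {
                Console.WriteLine("No Kinect connected.");
                return;
            }

            sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
            sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);

            sensor.Start();                  // frames begin arriving via events
            Console.WriteLine("Kinect running; press Enter to stop.");
            Console.ReadLine();
            sensor.Stop();
        }
    }

From here, frame-ready events (as in the skeleton example earlier) deliver the camera, depth,
and skeleton data the program has enabled.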



Conclusion


The Kinect was a project put together by Microsoft to rival Nintendo's Wii game console.
Microsoft did not anticipate the Kinect's use in application programming, as shown by it being
the last to release a driver and libraries for the Kinect. With the Kinect currently being the
cheapest device that offers motion detection and depth sensing, along with the newly released
commercial version of the device and SDK package, it is uncertain how far Kinect application
development will go.



References


[1] Ashley, James, and Jarrett Webb. Beginning Kinect Programming with the Microsoft Kinect
SDK. Apress, 2012. [eBook]

[2] Hall, Jonathan, Sean Kean, and Phoenix Perry. Meet the Kinect: An Introduction to
Programming Natural User Interfaces. Apress, 2011. [eBook]

[3] Borenstein, Greg. Making Things See: 3D Vision with Kinect, Processing, Arduino, and
MakerBot. Make, 2012. [eBook]

[4] "Kinect for Windows." Microsoft Support. Accessed 10 Mar. 2012.
<http://support.xbox.com/en-US/kinect-for-windows/kinect-for-windows-info>

[5] "OpenNI." PrimeSense. Accessed 11 Mar. 2012.
<http://75.98.78.94/default.aspx>

[6] "About PrimeSense." PrimeSense. Accessed 11 Mar. 2012.
<http://www.primesense.com/en/company-profile>

[7] "OpenKinect: About." OpenKinect. Accessed 17 Mar. 2012.
<http://openkinect.org/wiki/Main_Page>

[8] "Kinect for Windows Quickstart Series." By Dan Fernandez. Channel 9. Accessed 17 Mar.
2012. <http://channel9.msdn.com/Series/KinectQuickstart>




Appendix A




Figure 1: Depth image of user (blob)




Figure 2: Depth image of user with associated body parts (31 total)









Appendix B




Figure 1: Kinect sensor shown in the Visual Studio library




Figure 2: Kinect sensor data options






Figure 3: Code example for initializing the Kinect sensor for use