
Creating Dynamic Social Network Models from Sensor Data

Tanzeem Choudhury (Intel Research / Affiliate Faculty, CSE)
Dieter Fox (CSE)
Henry Kautz (CSE)
James Kitts (Sociology)


What are we doing?
Why are we doing it?
How are we doing it?

Social Network Analysis

Work across the social and physical sciences is increasingly studying the structure of human interaction:

o 1967: Stanley Milgram, six degrees of separation
o 1973: Mark Granovetter, the strength of weak ties
o 1977: International Network for Social Network Analysis
o 1992: Ronald Burt, structural holes: the social structure of competition
o 1998: Watts & Strogatz, small-world graphs

Social Networks

Social networks are naturally represented and analyzed as graphs.

Example Network Properties

Degree of a node

Eigenvector centrality
o global importance of a node

Average clustering coefficient
o degree to which the graph decomposes into cliques

Structural holes
o opportunities for gain by bridging disconnected subgraphs
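These properties can be computed directly from a graph. Below is a minimal pure-Python sketch on a small hypothetical graph (two triangles bridged by a single edge, so the bridge endpoints sit next to a structural hole); the graph and node names are illustration only.

```python
# Sketch: example network properties on a toy graph.
# Two triangles (A,B,C) and (D,E,F) bridged by the edge C-D.
from collections import defaultdict

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Degree of a node: number of incident edges.
degree = {n: len(nbrs) for n, nbrs in adj.items()}

# Eigenvector centrality by power iteration: a node is globally
# important if its neighbors are important.
x = {n: 1.0 for n in adj}
for _ in range(100):
    x_new = {n: sum(x[m] for m in adj[n]) for n in adj}
    norm = max(x_new.values())
    x = {n: v / norm for n, v in x_new.items()}

# Clustering coefficient: how often a node's neighbors are
# themselves connected (cliquishness).
def clustering(n):
    nbrs = list(adj[n])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

avg_clustering = sum(clustering(n) for n in adj) / len(adj)
print(degree["D"], round(avg_clustering, 2))   # 3 0.78
```

The bridge nodes C and D get the highest eigenvector centrality here, matching the intuition that bridging a structural hole confers importance.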

Applications

Many practical applications:

o Business: discovering organizational bottlenecks
o Health: modeling the spread of communicable diseases
o Architecture & urban planning: designing spaces that support human interaction
o Education: understanding the impact of peer groups on educational advancement

Much recent theory addresses finding random graph models that fit empirical data.

The Data Problem

Traditionally, data comes from manual surveys of people's recollections:

o Very hard to gather
o Questionable accuracy
o Few published data sets
o Almost no longitudinal (dynamic) data

Since the 1990s, social network studies have also been based on electronic communication, e.g., social network analysis of email (Science, 6 Jan 2006).

Limits of E-Data

Email data is cheap and accurate, but it misses:

o Face-to-face speech: the vast majority of human interaction, especially complex communication
o The physical context of communication: e-data is useless for studying the relationship between environment and interaction

[Chart: proportion of contacts by medium (face-to-face, telephone, high-complexity information) within a floor, within a building, within a site, and between sites]

Can we gather data on face-to-face communication automatically?

Research Goal

Demonstrate that we can:

Model social network dynamics by automatically gathering large amounts of rich face-to-face interaction data
o using wearable sensors
o combined with statistical machine learning techniques

Find simple and robust measures derived from the sensor data
o that are indicative of people's roles and relationships
o that capture the connections between the physical environment and network dynamics


Questions we want to investigate:

Changes in social networks over time:
o How do interaction patterns dynamically relate to structural position in the network?
o Why do people sharing relationships tend to be similar?
o Can one predict the formation or break-up of communities?

Effect of location on social networks:
o What are the spatio-temporal distributions of interactions?
o How do locations serve as hubs and bridges?
o Can we predict the popularity of a particular location?

Support

Human and Social Dynamics: one of five new priority areas for NSF
o $800K award to the UW / Intel / Georgia Tech team
o Intel participating at no cost

Intel Research is donating hardware and internships.

Leveraging work on sensors and localization from other NSF and DARPA projects.

Procedure

Test group
o 32 first-year incoming CSE graduate students
o Units worn 5 working days each month
o Data collected over one year

Units record
o Wi-Fi signal strength, to determine location
o Audio features adequate to determine when conversation is occurring

Subjects answer a short monthly survey
o Selective ground truth on the number of interactions
o Research interests

All data is stored securely
o Indexed by a code number assigned to each subject


Privacy

The UW Human Subjects Division approved our procedures after 6 months of review and revisions.

The major concern was privacy, addressed by:
o A procedure for recording audio features without recording conversational content
o Procedures for handling the data afterwards

Data Collection

[Diagram: the Intel multi-modal sensor board performs real-time audio feature extraction; the extracted audio features and WiFi signal strengths, keyed by a code identifier, flow into a coded database]

Data Collection

The multi-sensor board sends its sensor data stream to an iPAQ.

The iPAQ computes audio features as well as WiFi node identifiers and signal strengths.

The iPAQ writes the audio and WiFi features to an SD card.

Each day, the subject uploads the data under his or her code number to the coded database.

Older Procedure

Because the real-time feature extraction software was not finished in time, the Autumn 2005 data collections used a different (also approved) process:

o Raw data was encrypted on the SD card
o The upload program simultaneously decrypted the data and extracted features
o Only the features were uploaded


Speech Detection

From the audio signal, we want to extract features that can be used to determine:

o Speech segments
o The number of different participants (but not their identities)
o Turn-taking style
o Rate of conversation (fast versus slow speech)

But the features must not allow the audio to be reconstructed!

Speech Production

The source-filter model: an excitation source is shaped by the vocal tract, which acts as a filter.

The fundamental frequency (F0, or pitch) and the formant frequencies (F1, F2, ...) are the most important components for speech synthesis.

Speech Production

Voiced sounds: fundamental frequency present (i.e., harmonic structure), with energy in the lower frequencies.

Unvoiced sounds: no fundamental frequency, with energy focused in the higher frequencies.

Our approach: detect speech by reliably detecting voiced regions.

We do not extract or store any formant information; at least three formants are required to produce intelligible speech.*

* 1. Donovan, R. (1996). Trainable Speech Synthesis. PhD thesis, Cambridge University.
  2. O'Shaughnessy, D. (1987). Speech Communication: Human and Machine. Addison-Wesley.

Goal: Reliably Detect Voiced Chunks in the Audio Stream

Speech Features Computed

1. Spectral entropy
2. Relative spectral entropy
3. Total energy
4. Energy below 2 kHz (low frequencies)
5. Autocorrelation peak values and number of peaks
6. High-order MEL-frequency cepstral coefficients
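As a rough illustration of how the spectral-entropy and low-frequency-energy features (1, 3, 4) separate voiced from unvoiced frames, here is a sketch on synthetic frames; the sample rate, frame length, and test signals are assumptions for illustration, not the project's actual parameters.

```python
# Sketch: spectral entropy and low-frequency energy on synthetic
# "voiced" (harmonic tone) and "unvoiced" (white noise) frames.
import numpy as np

sr, n = 8000, 256                               # assumed sample rate / frame length
t = np.arange(n) / sr
voiced = np.sin(2 * np.pi * 200 * t)            # harmonic, voiced-like frame
unvoiced = np.random.default_rng(0).standard_normal(n)  # noise-like frame

def spectral_entropy(frame):
    """Entropy (bits) of the normalized FFT magnitude spectrum."""
    mag = np.abs(np.fft.rfft(frame))
    p = mag / mag.sum()
    return float(-np.sum(p * np.log2(p + 1e-12)))

def low_freq_energy_ratio(frame, cutoff=2000):
    """Fraction of spectral energy below `cutoff` Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    return float(np.sum(mag[freqs < cutoff] ** 2) / np.sum(mag ** 2))

# Voiced frames are structured: lower entropy, energy below 2 kHz.
print(spectral_entropy(voiced) < spectral_entropy(unvoiced))   # True
print(low_freq_energy_ratio(voiced) > low_freq_energy_ratio(unvoiced))  # True
```

Total energy (feature 3) is simply `np.sum(frame ** 2)`; relative spectral entropy compares a frame's spectrum against a longer-term average.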

Features used: Autocorrelation

Autocorrelation of (a) an unvoiced frame and (b) a voiced frame.

Voiced chunks have a higher non-initial autocorrelation peak and fewer peaks.
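This autocorrelation cue can be demonstrated on synthetic frames; the sample rate, pitch, and frame length below are illustrative assumptions.

```python
# Sketch: a periodic (voiced-like) frame has a strong non-initial
# autocorrelation peak at the pitch period; a noise frame does not.
import numpy as np

def max_noninitial_autocorr(frame, min_lag=20):
    """Largest normalized autocorrelation value past the initial peak."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]                      # normalize so lag 0 equals 1
    return float(ac[min_lag:].max())

sr, n = 8000, 400
t = np.arange(n) / sr
voiced = np.sin(2 * np.pi * 200 * t)     # 200 Hz tone: period = 40 samples
unvoiced = np.random.default_rng(1).standard_normal(n)

print(max_noninitial_autocorr(voiced) > max_noninitial_autocorr(unvoiced))  # True
```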

Features used: Spectral Entropy

FFT magnitude of (a) an unvoiced frame (spectral entropy: 4.21) and (b) a voiced frame (spectral entropy: 3.74).

Voiced chunks have lower entropy than unvoiced chunks, because voiced chunks have more structure.


Features used: Energy

Energy in voiced chunks is concentrated in the lower frequencies.

Higher-order MEL cepstral coefficients contain pitch (F0) information; the lower-order coefficients are NOT stored.

Segmenting Speech Regions

Attributes Useful for Inferring Interaction

Attributes that can be reliably extracted from the sensors:

o Total number of interactions between people
o Conversation styles, e.g., turn-taking and energy level
o Locations where interactions take place, e.g., office, lobby, etc.
o Daily schedules of individuals, e.g., early birds, late-nighters


Locations

Wi-Fi signal strength can be used to determine the approximate location of each speech event:

o 5-meter accuracy
o Location computation is done off-line

Raw locations are converted to nodes in a coarse topological map before further analysis.
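One common way to localize from Wi-Fi signal strength is fingerprinting: match an observed signal-strength vector against pre-surveyed fingerprints of known places. The sketch below is a minimal nearest-neighbor version of that idea; the access point names, RSSI values, and area labels are hypothetical, and the project's actual algorithm is not specified here.

```python
# Sketch: nearest-neighbor Wi-Fi fingerprint localization.
# Fingerprints map a surveyed area to typical RSSI (dBm) per access point.
fingerprints = {
    "student office": {"ap1": -40, "ap2": -70, "ap3": -85},
    "breakout area":  {"ap1": -65, "ap2": -45, "ap3": -60},
    "hallway":        {"ap1": -80, "ap2": -55, "ap3": -50},
}

def locate(observation):
    """Return the surveyed area whose fingerprint is closest (squared L2)."""
    def dist(fp):
        aps = set(fp) | set(observation)
        # An unheard access point is treated as a very weak signal (-100 dBm).
        return sum((fp.get(a, -100) - observation.get(a, -100)) ** 2
                   for a in aps)
    return min(fingerprints, key=lambda area: dist(fingerprints[area]))

print(locate({"ap1": -42, "ap2": -68}))   # "student office"
```

The raw nearest-fingerprint output would then be mapped onto a node of the coarse topological map before analysis.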

Topological Location Map

Nodes in the map are identified by area type:

o Hallway
o Breakout area
o Meeting room
o Faculty office
o Student office

Detected conversations are associated with their area type.

Social Network Model

Nodes
o Subjects (wearing sensors, having given consent)
o Public places (e.g., a particular breakout area)
o Regions of private locations (e.g., the hallway of faculty offices)
o Instances of conversations

Edges
o Between subjects and conversations
o Between places or regions and conversations
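One way to realize this model is as an affiliation-style graph: conversation instances are nodes, linked to the subjects and places involved. The sketch below uses hypothetical coded identifiers (the actual database indexes subjects by code number).

```python
# Sketch: conversations as nodes, edges to subjects and places.
from collections import defaultdict

edges = [
    ("conv-1", "subject-017"), ("conv-1", "subject-023"),
    ("conv-1", "breakout-area-2"),
    ("conv-2", "subject-017"), ("conv-2", "faculty-hallway"),
]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def co_participants(node):
    """Subjects and places sharing a conversation with `node`."""
    return {other for conv in adj[node] for other in adj[conv]} - {node}

print(sorted(co_participants("subject-017")))
# ['breakout-area-2', 'faculty-hallway', 'subject-023']
```

Projecting this bipartite structure onto the subject/place side recovers a conventional social network for computing measures like the clustering coefficient.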

Non-instrumented Subjects

We may recruit additional subjects who do not wear sensors.

Such subjects would allow us to infer information about their behavior indirectly (e.g., based on their particular office locations) and to appear, coded, as nodes in our network model.

Only people who have provided written consent appear as entities in our network models.



Disabling Sensor Units

As a courtesy, subjects will disable their units in particular classrooms or offices.


Access to the Data

Publications about this project will include summary statistics about the social network, e.g.:

o Clustering coefficient
o Motifs (temporal patterns)

We will not release the actual graph
o This is prohibited by our HSD approval

We welcome additional collaborators.