Authors: Le Gruenwald, Hamed Chok, Mazen Aboukhamis
Seventh IEEE International Conference on Data Mining Workshops
Presented by: Xiaowen Wu
Introduction
Wireless sensor networks
Composed of a large number of sensor nodes
A sensor node consists of sensing, data processing, and communicating components
Usually low-cost, low-power, and limited in computational power
Can be error-prone
The positions of sensor nodes need not be engineered or predetermined
Sensor network protocols and algorithms must possess self-organizing capabilities
Wide usage
Military applications
Environmental applications
Traffic surveillance
And a lot more ...
Motivation
Missing data in sensor networks
Due to: sensor failure, power outage at the sensor node, random occurrences of local interference, and a higher bit error rate of wireless radio transmissions compared with wired communications
The need to estimate missing data stream values
Re-querying data is expensive and cannot guarantee to recover the original data
Estimation of missing data is crucial for efficient query processing
Related work
The problem of estimating missing values has been studied extensively in statistics
Mean substitution, expectation maximization, maximum likelihood, Bayesian estimation, regression...
NASA/JPL Sensor Webs project (2003)
If one sensor fails, its neighboring sensors compensate for the lost data by increasing their sampling rates
SPIRIT (2005)
Autoregression as its basic forecasting model
TinyDB (2005)
Takes the average of all values in the current round
WARM (2005)
Uses association rule mining and a sliding window
Freshness Association Rule Mining (FARM)
Currently FARM assumes:
A centralized sensor network
Sensor data is categorical
Uses a data freshness framework on top of an association rule method
Estimation based on a weighted average
Weight derived from the strength of the corresponding sensor association
Advantages
Incorporates the temporal aspect into association rules and estimation
Compacts data streams and allows a large history to appropriately influence sensor rules
Capable of retrieving original data from its compact form
FARM: consider freshness of data in estimation
Each round in the sensor stream is assigned a different round weight based on its recency
The more recent the data, the higher the weight
Round weights satisfy the recursive relation:
w(1) = 1; w(n) = p * w(n - 1), where p >= 1
An example of traffic sensors (L = Light, M = Moderate, H = Heavy, C = Congestion)
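The recursive round weights above can be sketched as follows; the damping factor value used here is only illustrative, the paper does not fix a particular p.

```python
# Sketch of FARM's recursive round weights: w(1) = 1, w(n) = p * w(n - 1).
# With p > 1, later (more recent) rounds receive exponentially higher weights.
def round_weights(n_rounds, p):
    """Return the list [w(1), ..., w(n_rounds)]."""
    weights = [1.0]
    for _ in range(n_rounds - 1):
        weights.append(weights[-1] * p)
    return weights

weights = round_weights(4, p=2.0)  # [1.0, 2.0, 4.0, 8.0]
```

Setting p = 1 recovers an unweighted history, which is the natural choice when recency does not matter.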
FARM: find association rules among sensors
Apriori-based
Sensors as items
Defines the frequency of a sensor with respect to a certain state
The maximum order of frequent itemsets is limited to two
Incremental online update of frequent itemsets of any order is impossible
Saves storage
Incorporates the freshness factor
For two sensors A and B w.r.t. a state e:
Actual weight support = (sum of the round weights in which sensors A and B both report state e) / (sum of all round weights)
Actual weight confidence = (sum of the round weights in which sensors A and B both report state e) / (sum of the round weights in which state e is reported by sensor A)
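A minimal sketch of these two measures, assuming each round is a dict mapping sensor names to reported states and the round weights are given:

```python
# Actual weight support/confidence for a sensor pair (A, B) w.r.t. state e.
def actual_weight_support(rounds, weights, a, b, e):
    """Round weights where both a and b report e, over all round weights."""
    both = sum(w for r, w in zip(rounds, weights)
               if r.get(a) == e and r.get(b) == e)
    return both / sum(weights)

def actual_weight_confidence(rounds, weights, a, b, e):
    """Round weights where both report e, over weights where a reports e."""
    both = sum(w for r, w in zip(rounds, weights)
               if r.get(a) == e and r.get(b) == e)
    a_reports = sum(w for r, w in zip(rounds, weights) if r.get(a) == e)
    return both / a_reports if a_reports else 0.0

# Example: 3 rounds with weights 1, 2, 4 (p = 2)
rounds = [{"A": "H", "B": "H"}, {"A": "H", "B": "L"}, {"A": "H", "B": "H"}]
weights = [1.0, 2.0, 4.0]
# support = (1 + 4) / 7; confidence = (1 + 4) / (1 + 2 + 4)
```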
FARM: data structures
Buffer
1D array
Reset with new round data after each new sample, with a special value for missing values
2D ragged array
Viewed as the upper triangular part of a matrix where each of the column and row sets consists of the entire set of sensors
An element in the 2D ragged array is an object corresponding to a particular pair of sensors
An object holds the history of round info for the pair
Object[Si][Sj] (i > j) contains a 1D array of s entries, where s is the number of sensor states
Each array entry in [Si][Sj] stores the sum of all round weights in which both sensors report the same state
The state is indicated by the array entry index
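The ragged pair table described above can be sketched with nested lists; this is an illustrative layout, not the paper's exact implementation.

```python
# 2D ragged array: for each sensor pair (i, j) with i > j, keep a 1D array of
# s per-state weight sums (s = number of sensor states).
def make_pair_table(n_sensors, n_states):
    # table[i][j][state] = sum of round weights in which sensors i and j
    # both reported that state; only the i > j entries are stored.
    return [[[0.0] * n_states for j in range(i)] for i in range(n_sensors)]

table = make_pair_table(n_sensors=4, n_states=4)
# Record that sensors 2 and 0 both reported state 1 in a round of weight 2.0:
table[2][0][1] += 2.0
```

Storing only the upper triangle halves the space, since the pair relation is symmetric.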
FARM: algorithm
3 main algorithms
checkBuffer
Main routine
Checks for any missing values and directs them to estimateValue
Finally calls update
update
Traverses the buffer to check whether any two sensors report the same state in the current round
If yes, sets the current common report and increments the appropriate weight sum by the current round weight
O(d^2)
estimateValue
Finds all sensor associations with the missing sensor (MS) and uses them to compute the weighted average for the missing value
O(ds)
FARM: estimateValue
Step 1: Determine eligible states for estimation
State actual support > minSup
Step 2: Create a temporary data structure StateSet for each eligible state
Group sensors reporting the same state in the current round
Step 3: Test for potential sensor associations
Identify all sensors of each StateSet that pair with MS
For sensor Si, compare actWeightSup with minSup
Delete Si if not larger
If larger, then compare actWeightConf with minConf
Consider Si an eligible sensor if larger; otherwise delete Si
Step 4: Compute the contribution weight of each eligible state
Based on the supports between MS and each of the eligible sensors reporting that particular eligible state
Step 5: Calculate the missing value and round it to the closest state value
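Steps 4 and 5 amount to a support-weighted average rounded back to a state; the sketch below assumes the eligible states and their aggregated pair supports with MS have already been computed in steps 1-3, and the numeric state values are hypothetical.

```python
# Weighted-average estimate for the missing sensor (steps 4-5).
def estimate_missing(eligible, state_values):
    """eligible maps each eligible state to the total support between MS and
    the eligible sensors reporting that state; return the support-weighted
    average of the state values, rounded to the closest state."""
    total = sum(eligible.values())
    avg = sum(state_values[s] * w for s, w in eligible.items()) / total
    return min(state_values, key=lambda s: abs(state_values[s] - avg))

# Hypothetical numeric values for the traffic states:
state_values = {"L": 1, "M": 2, "H": 3, "C": 4}
# Heavy has three times the support of Moderate, so the estimate leans Heavy:
estimate_missing({"H": 3.0, "M": 1.0}, state_values)  # -> "H"
```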
Experimental evaluation
Compare FARM with
Simple Linear Regression (SLR) approach
Multiple Linear Regression (MLR) approach
Curve Estimation (CE) approach
Estimation by average (Avg)
WARM
SPIRIT
TinyDB
Experimental evaluation
Conduct simulation experiments on
Two real data sets: air temperature sensor data collected from the NASA/JPL Sensor Webs Project in 2006 and traffic data collected by the Department of Transportation in 2000
Time
The average time per round of FARM is longer than that of most other methods by less than one millisecond
Negligible for most sensor network applications
Estimation accuracy
Evaluated using the normalized root mean square error (RMSE)
FARM's average RMSE is the best on both datasets
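For reference, a minimal sketch of a normalized RMSE; the slides do not state the normalization used, so dividing by the range of the actual values is an assumed convention here.

```python
import math

def normalized_rmse(actual, estimated):
    """RMSE of the estimates, normalized by the range of the actual values
    (one common convention; the paper's exact normalization may differ)."""
    mse = sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)
    return math.sqrt(mse) / (max(actual) - min(actual))

err = normalized_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```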
Conclusion
FARM uses association rule mining to estimate missing sensor data
FARM uses a data freshness framework to assimilate the temporal element as well as to compact the data stream for improved efficiency
FARM outperforms a pool of statistical and algorithmic methods on the test data
FARM can be useful for datasets containing a temporal element
FARM can be adjusted via the damping factor for datasets where recency is not important