Authors: Le Gruenwald, Hamed Chok, Mazen Aboukhamis
Seventh IEEE International Conference on Data Mining - Workshops




Presented by: Xiaowen Wu

Introduction


Wireless Sensor networks


Composed of a large number of sensor nodes


Each sensor node consists of sensing, data processing, and communicating components

Usually low-cost, low-power, and limited in computational power

Can be error-prone

The position of sensor nodes need not be engineered or pre-determined.

Sensor network protocols and algorithms must possess self-organizing capabilities.


Wide Usages


Military applications


Environmental applications


Traffic Surveillance


And a lot more ...

Motivation


Missing data in sensor networks


Due to: sensor failure, power outage at the sensor node, random occurrences of local interference, and a higher bit error rate of wireless radio transmissions compared with wired communications


The need to estimate missing data stream values


Re-querying data is expensive and cannot guarantee that the original data will be recovered

Estimation of missing data is crucial for efficient query processing

Related work


The problem of estimating missing values has been studied extensively in statistics

Mean substitution, expectation maximization, maximum likelihood, Bayesian estimation, regression, ...


NASA/JPL Sensor Webs project (2003)


If one sensor fails, its neighboring sensors compensate for the lost data by increasing their sampling rate

SPIRIT (2005)

Uses auto-regression as its basic forecasting model

TinyDB (2005)

Takes the average of all values in the current round

WARM (2005)

Uses association rule mining with a sliding window


Freshness Association Rule Mining (FARM)


Currently FARM assumes:


A centralized sensor network


Sensor data is categorical


Uses a data freshness framework on top of an association rule method

Estimation is based on a weighted average

Weights are derived from the strength of the corresponding sensor association


Advantages


Incorporates the temporal aspect into association rules and estimation

Compacts data streams and allows a large history to appropriately influence sensor rules

Capable of retrieving original data from its compact form


FARM: consider freshness of data in estimation


Each round in the sensor stream is assigned a different round weight based on its recency


The more recent the data, the higher the weight


Round weights satisfy the recursive relation:

$w(1) = 1$; $w(n) = p \cdot w(n-1)$, where $p \ge 1$
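The recursion above can be unrolled into a list of weights, one per round. The following is a minimal sketch, not the authors' code: the function name `round_weights` is hypothetical, and it assumes the factor satisfies p >= 1 so that later rounds never weigh less than earlier ones.

```python
def round_weights(num_rounds: int, p: float) -> list[float]:
    """Return [w(1), ..., w(num_rounds)] with w(1) = 1 and w(n) = p * w(n-1).

    Assumption: p >= 1, so more recent rounds get weights at least as
    large as older ones. With p == 1 every round counts equally.
    """
    if num_rounds < 1:
        raise ValueError("num_rounds must be at least 1")
    if p < 1:
        raise ValueError("this sketch assumes p >= 1")
    weights = [1.0]
    for _ in range(num_rounds - 1):
        # Each new round is p times heavier than the previous one.
        weights.append(p * weights[-1])
    return weights
```

For example, `round_weights(4, 2.0)` gives `[1.0, 2.0, 4.0, 8.0]`, while `p = 1.0` gives equal weights, which corresponds to ignoring recency.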


Example: traffic sensors (L = Light, M = Moderate, H = Heavy, C = Congestion)

FARM: find association rules among sensors


Apriori-based


Sensors as items


Define frequency of a sensor with respect to a certain state


Maximum order of frequent itemsets is limited to two

Incremental online update of frequent itemsets of arbitrary order is not possible


Save storage


Incorporate the freshness factor


Rule: Sensor A → Sensor B w.r.t. state e

Actual weight support

(Sum of the round weights of the rounds in which sensors A and B both report state e) / (Sum of all round weights)

Actual weight confidence

(Sum of the round weights of the rounds in which sensors A and B both report state e) / (Sum of the round weights of the rounds in which sensor A reports state e)
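To make the two ratios concrete, here is a small sketch that computes them directly from two full report histories. It is an illustration under assumptions: the function name `weighted_support_confidence` is hypothetical, the histories are plain lists with one state per round and no missing values, and round weights follow w(1) = 1, w(n) = p * w(n-1). FARM itself keeps compact running sums instead of the raw history.

```python
def weighted_support_confidence(reports_a, reports_b, state, p):
    """Return (support, confidence) for the rule A -> B w.r.t. `state`.

    reports_a, reports_b: per-round states of sensors A and B, oldest first.
    p: multiplicative factor of the round weights (assumed >= 1).
    """
    weight = 1.0          # w(1) = 1
    both = 0.0            # weights of rounds where A and B both report `state`
    a_reports = 0.0       # weights of rounds where A reports `state`
    total = 0.0           # sum of all round weights
    for a, b in zip(reports_a, reports_b):
        total += weight
        if a == state:
            a_reports += weight
            if b == state:
                both += weight
        weight *= p       # w(n) = p * w(n-1)
    support = both / total if total else 0.0
    confidence = both / a_reports if a_reports else 0.0
    return support, confidence
```

With `reports_a = ["L", "M", "M", "H"]`, `reports_b = ["L", "M", "H", "H"]`, state `"M"` and `p = 2.0`, the round weights are 1, 2, 4, 8, so the support is 2/15 and the confidence is 2/6.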



FARM: data structures


buffer


1D array


Reset with new round data after each new sample, with a special value for missing values


2D ragged array


Viewed as the upper triangular part of a matrix where each of the column and row sets consists of the entire set of sensors

An element in the 2D ragged array is an object corresponding to a particular pair of sensors

An object holds the history of round info for the pair


Object[$S_i$][$S_j$] ($i > j$) contains a 1D array of $s$ entries, where $s$ is the number of sensor states

Each array entry in [$S_i$][$S_j$] stores the sum of all round weights in which both sensors report the same state


State is indicated by the array entry index
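The two structures described above could be laid out as follows. This is a sketch under assumptions, not the original implementation: sensors and states are represented by integer indices, `None` stands in for the special missing value, and the helper names (`make_buffer`, `make_pair_table`, `pair_entry`) are hypothetical.

```python
MISSING = None  # assumed sentinel for a missing reading


def make_buffer(num_sensors):
    """1D buffer holding the current round: one state index per sensor."""
    return [MISSING] * num_sensors


def make_pair_table(num_sensors, num_states):
    """Ragged 2D array: row i has one entry per sensor j < i.

    table[i][j][e] is the running sum of round weights of the rounds in
    which sensors i and j both reported state e.
    """
    return [[[0.0] * num_states for _ in range(i)] for i in range(num_sensors)]


def pair_entry(table, a, b):
    """Return the per-state weight sums for the unordered pair (a, b)."""
    if a == b:
        raise ValueError("a pair needs two distinct sensors")
    i, j = max(a, b), min(a, b)  # enforce i > j as in the slide
    return table[i][j]
```

Storing only one entry per unordered pair halves the storage of a full matrix, and the per-state sums are what make it unnecessary to keep the raw stream.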

FARM: algorithm


3 main algorithms


checkBuffer


Main routine


Checks for any missing values and passes them to estimateValue

Finally calls update


update


Traverses the buffer to check whether any two sensors report the same state in the current round

If so, records the current common report and increments the appropriate weight sum by the current round weight

$O(d^2)$


estimateValue


Finds all sensor associations with the missing sensor (MS) and uses them to compute the weighted average for the missing value

$O(ds)$
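The control flow of checkBuffer and update might look like the sketch below. Several details are assumptions rather than facts from the slides: states are integer indices, `None` marks a missing reading, the pair table is a ragged list indexed as `pair_sums[i][j][state]` with i > j, the estimator is passed in as a callable, and estimated values are written back into the buffer before update runs.

```python
MISSING = None  # assumed sentinel for a missing reading


def update(buffer, pair_sums, round_weight):
    """Add the round weight to every pair that reports a common state.

    The double loop over sensors gives the quadratic cost noted above.
    """
    for i in range(len(buffer)):
        if buffer[i] is MISSING:
            continue
        for j in range(i):
            if buffer[j] == buffer[i]:
                pair_sums[i][j][buffer[i]] += round_weight


def check_buffer(buffer, pair_sums, round_weight, estimate_value):
    """Main routine: estimate missing readings, then update the sums.

    estimate_value(sensor, buffer, pair_sums) is a hypothetical callable
    returning a state index, or MISSING when no estimate is possible.
    """
    missing = [k for k, state in enumerate(buffer) if state is MISSING]
    # Estimate from the readings as received, before filling anything in.
    estimates = {k: estimate_value(k, buffer, pair_sums) for k in missing}
    for k, state in estimates.items():
        buffer[k] = state
    update(buffer, pair_sums, round_weight)
    return buffer
```

Whether estimated values should feed back into the weight sums is a design choice the slides do not settle; skipping the write-back step would keep the sums based on observed readings only.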


FARM: estimateValue


Step 1: Determine eligible states for estimation


State actual support > minSup

Step 2: Create a temporary data structure StateSet for each eligible state


Group sensors reporting the same state in the current round


Step 3: Test for potential sensor associations


Identify all sensors of each StateSet that pair with MS

For sensor $S_i$, compare actWeightSup with minSup

Delete $S_i$ if actWeightSup is NOT larger than minSup

If it is larger, compare actWeightConf with minConf

Consider $S_i$ an eligible sensor if actWeightConf is larger than minConf; otherwise delete $S_i$

Step 4: Compare the contribution weight of each eligible state


Based on supports between MS and each of the eligible sensors reporting that particular eligible state

Step 5: Calculate the missing value and round it to the closest state value
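Steps 2 to 5 can be illustrated with a compact sketch. It is one possible reading of the slide, not the authors' procedure: states are ordered integer indices (for example 0 = L, 1 = M, 2 = H, 3 = C), the supports and confidences between MS and each other sensor are assumed to be precomputed and passed in as dictionaries, the contribution weight of a state is taken to be the sum of the supports of its eligible sensors, and the state-level filter of Step 1 is omitted because the slides do not define it precisely. The name `estimate_value` and its parameters are hypothetical.

```python
import math


def estimate_value(readings, support, confidence, min_sup, min_conf):
    """Estimate the state of the missing sensor (MS) for the current round.

    readings:   {sensor: state} for sensors that reported this round.
    support:    {(sensor, state): actWeightSup between MS and sensor}.
    confidence: {(sensor, state): actWeightConf between MS and sensor}.
    Returns a state index, or None if no sensor passes both thresholds.
    """
    state_weight = {}  # contribution weight per eligible state (assumed: sum of supports)
    for sensor, state in readings.items():
        sup = support.get((sensor, state), 0.0)
        conf = confidence.get((sensor, state), 0.0)
        # Step 3: keep the sensor only if both thresholds are exceeded.
        if sup > min_sup and conf > min_conf:
            state_weight[state] = state_weight.get(state, 0.0) + sup
    total = sum(state_weight.values())
    if total == 0.0:
        return None
    # Steps 4 and 5: weighted average of states, rounded to the nearest state.
    average = sum(state * w for state, w in state_weight.items()) / total
    return math.floor(average + 0.5)
```

As a usage example, with readings `{"B": 1, "C": 1, "D": 3}`, supports 0.4, 0.05 and 0.2, confidences 0.8, 0.9 and 0.7, `min_sup = 0.1` and `min_conf = 0.5`, sensor C is dropped, the weighted average is about 1.67, and the estimate is state 2.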


Experimental evaluation


Compare FARM with


Simple Linear Regression (SLR) approach


Multiple Linear Regression (MLR) approach


Curve Regression (CE) approach


Estimation by average (Avg)


WARM


SPIRIT


TinyDB


Experimental evaluation


Conduct simulation experiments on


Two real data sets: the air temperature sensor data collected by the NASA/JPL Sensor Webs project in 2006 and traffic data collected by the Department of Transportation in 2000


Time


Average time per round of FARM is longer than that of most other methods, but by less than one millisecond


Negligible for most sensor network applications


Estimation accuracy


Evaluated using normalized root mean square error (RMSE)


FARM’s average RMSE is the best on both datasets
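For reference, a normalized RMSE can be computed as sketched below. The slides do not say how the error is normalized, so dividing by the range of the actual values is an assumption made here for illustration; the function name `normalized_rmse` is hypothetical.

```python
import math


def normalized_rmse(actual, estimated):
    """Root mean square error divided by the range of the actual values.

    Assumption: range normalization; other choices (such as dividing by
    the mean) are equally plausible readings of "normalized".
    """
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values have zero range")
    return math.sqrt(mse) / value_range
```

For categorical states encoded as ordered indices, the same function applies to the index values.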



Conclusion


FARM uses association rule mining to estimate missing sensor data

FARM uses a data freshness framework to assimilate the temporal element as well as to compact the data stream for improved efficiency

FARM outperforms a pool of statistical and algorithmic methods on the test data

FARM can be useful for datasets containing a temporal element

FARM can be adjusted through the damping factor for datasets where recency is not important