AJCN Online Supplement: Meal pattern analysis using supervised data mining techniques



Authors:

Áine P. Hearty

Michael J. Gibney

Institute of Food & Health, University College Dublin, Ireland.



METHODS

Food coding system

Each row of the original Irish NSIFCS database (222,404 records relating to >3,000 unique food codes) represents a single food per eating occasion per day per subject, together with the nutrient intake associated with the amount of the food consumed. For ease of analysis, each of the unique food codes consumed by the respondents (>3,000 codes) was re-coded into one of 62 food groups (Table A).



Creation of ‘first order’ meal codes

The food consumption dataset was aggregated from the food group code level to a new ‘meal code’ level, and this meal code became the new unique identifier of each eating occasion. Using the data mining software package SPSS Clementine v.9.0 (SPSS Inc., Chicago, IL), the data were manipulated to replace the food group code field with one containing a comma-separated list of all food groups consumed at that eating occasion. A total of 49,671 eating occasions were identified and, based on the 62 food groups, a total of 17,928 distinct meals were observed. These were represented by unique comma-separated codes called ‘1st-order’ meal codes.


For example, consider the foods consumed at a single breakfast meal by one subject and the food group codes to which they are assigned: breakfast cereal (08), low fat milk (12), wholemeal bread (06), butter & fat spreads (20), preserves (47), fruit juice (29). This breakfast meal can then be represented by a single comma-separated code in which all information on the food groups consumed is retained, i.e. (08,12,06,20,47,29). Similar codes were created for all meals in the dataset, and these are referred to as ‘1st-order’ codes.
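As an illustration of this aggregation step outside SPSS Clementine, the same ‘1st-order’ coding can be sketched in a few lines of pandas; the column names below are hypothetical stand-ins for the actual NSIFCS fields:

```python
import pandas as pd

# One row per food per eating occasion per day per subject
# (column names are illustrative, not the actual NSIFCS field names).
records = pd.DataFrame({
    "subject_id":      [1, 1, 1, 1, 1, 1],
    "day":             [1, 1, 1, 1, 1, 1],
    "eating_occasion": ["breakfast"] * 6,
    "food_group":      ["08", "12", "06", "20", "47", "29"],
})

# Aggregate to the meal-code level: the comma-separated list of food
# groups consumed at an eating occasion becomes its '1st-order' code.
meal_codes = (
    records
    .groupby(["subject_id", "day", "eating_occasion"])["food_group"]
    .agg(",".join)
    .rename("first_order_code")
    .reset_index()
)

print(meal_codes["first_order_code"].iloc[0])  # -> 08,12,06,20,47,29
```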

Creation of ‘second order’ meal codes



Meal pattern analysis is made difficult by the very large number of ‘1st-order’ meal codes and by the lack of homogeneity in their structure, which ranged from string variables consisting of just one food group to very long strings consisting of as many as 25 food groups. Therefore the data were simplified and given a more homogeneous structure through the creation of so-called ‘2nd-order’ meal codes. For this purpose, four meal types were used: breakfast, light meal, main meal and snacks. Within each of the meal types, ‘2nd-order’ meal codes were created to represent the main food components per meal and to accommodate as much variability as possible.

For breakfast, 25 ‘2nd-order’ meal codes were created; 47 were created for light meals, 35 for main meals and 27 for snacks. A code for each meal type was also created to represent non-consumption of that meal type.
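The supplement does not list the individual rules mapping ‘1st-order’ codes to the 25/47/35/27 ‘2nd-order’ codes, so the following sketch uses invented, purely illustrative rules to show the general shape of such a meal-type-specific re-coding:

```python
# Illustrative only: these rules are NOT the authors' actual coding scheme.
def second_order_code(first_order_code: str, meal_type: str) -> str:
    groups = set(first_order_code.split(",")) if first_order_code else set()
    if not groups:
        # A dedicated code marks non-consumption of this meal type.
        return f"{meal_type}:not-consumed"
    if meal_type == "breakfast":
        if "08" in groups and groups & {"11", "12"}:
            return "breakfast:cereal+milk"   # breakfast cereal with milk
        if groups & {"05", "06", "07"}:
            return "breakfast:bread-based"   # any bread group present
    return f"{meal_type}:other"

print(second_order_code("08,12,06,20,47,29", "breakfast"))  # breakfast:cereal+milk
print(second_order_code("", "breakfast"))                   # breakfast:not-consumed
```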


Supervised learning techniques

The goal in supervised learning is to predict one or more target values from one or more input variables. Supervised learning is a form of regression that relies on both the input and output variables of the data (1). Two well-known and recognised supervised data mining techniques are artificial neural networks (ANNs) and decision trees.


Neural Networks

Artificial intelligence (AI) techniques, including ANNs, have been attracting growing interest over the last few years. They have been used successfully to analyse problems that conventional rule-based techniques have difficulty dealing with (2). ANNs are complex mathematical models that are distantly based on the human neuronal structure (3). They offer a means of efficiently modelling large and complex problems in which there may be hundreds of predictor variables with many interactions.

The architecture (or topology) of an ANN is the number of nodes and hidden layers and how they are connected (4).


The ANN starts with an input layer, where each node corresponds to a predictor variable. By far the most popular architecture is the multilayer perceptron (MLP), which can be trained by feed-forward back-error propagation or by other training methods (5, 6). The MLP is typically organised as a set of interconnected layers of artificial neurons (Figure A). Input neurons are connected to each of a number of nodes in a hidden layer, which are in turn connected to an output layer. Each interconnection has an associated weight (a numerical value) that uniquely identifies it. ‘Feed-forward’ means that the value of the output is calculated from the input node values and a set of initial weights. ‘Back-propagation’ means that the error in the output is computed by finding the difference between the calculated and the desired output; the error from the output is then assigned to the hidden layer nodes proportionally to their weights; finally, the error at each of the hidden and output nodes is used by the algorithm to adjust the weights coming into that node so as to reduce the error (4, 7). This process is repeated for each row in the training set, and the training set is used repeatedly until the error no longer decreases. At that point the ANN is considered trained and ready to find the pattern in the test set. Most software programs automate the entire training process, i.e. to avoid over- or under-training the data (8). Overtraining is the process whereby the ANN has simply memorised the trivial details of the training samples: it performs perfectly on the training data but fails to deliver useful predictions with an independent test set (3).
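A minimal sketch of this train-until-the-error-stops-decreasing procedure, using scikit-learn's MLP instead of SPSS Clementine and synthetic data in place of the meal-code inputs:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real inputs: 10 predictor variables,
# one binary target.
rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 8 nodes; early_stopping holds out part of the
# training data and stops when the validation error no longer decreases,
# i.e. the automated guard against overtraining described above.
mlp = MLPClassifier(hidden_layer_sizes=(8,), early_stopping=True,
                    max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

print("accuracy on independent test set:", mlp.score(X_test, y_test))
```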

ANNs operate successfully as decision support tools because they simultaneously take into account many factors, combining the input variables in different ways for tasks such as classification, prediction and diagnosis (6). They have the ability to detect all interactions between predictor variables owing to the presence of the hidden layer (9). An understanding of the underlying mathematical equations that comprise algorithms such as back-propagation is not required (8). Recently there has been widespread interest in using ANNs for a wide range of problems in domains as diverse as finance, physics, engineering and geology, and their use has become particularly well accepted in medical applications (10, 11).


Decision Trees

A decision tree is a predictive model that, as its name implies, can be viewed as a tree. For many classification problems in which large datasets are used and the information contained is complex, decision trees can provide a useful solution. Decision trees are grown through an iterative splitting of the data into discrete groups, where the goal is to maximise the ‘distance’ between groups at each split. The first component is the top decision node, or root node, which specifies a test to be carried out. Depending on the algorithm, each node may have two or more branches. Each branch leads either to another decision node or to the bottom of the tree, called a leaf node. Each node uses the data from the case to choose the appropriate branch (4).

A number of different algorithms may be used for building decision trees, including CHAID (Chi-squared Automatic Interaction Detection) (12), CART (Classification and Regression Trees) (13) and C4.5 (14). The two most frequently reported algorithms in the literature are CART and C5.0 (a variant of C4.5). The algorithms differ in the criterion used to drive the splitting: for building C5 models an information-theory-based measure (information gain) is used, while for CART models a dispersion measure (the Gini index) is used. The two algorithms may therefore not produce identical trees for the same data. Another difference is that the C5 algorithm can represent solutions as decision trees or in rule-set form, while CART produces only decision trees. A decision tree produces a unique classification for each data record, whereas more than one rule in a rule set may apply to a data record, which adds complexity but from which a prediction can still be made.
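As a concrete illustration, scikit-learn's DecisionTreeClassifier implements an optimised CART-style algorithm whose default splitting criterion is the Gini dispersion measure (it is not C5.0, and "entropy" would instead use the information-gain measure); the data below are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data in place of the meal-code inputs.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# criterion="gini" is the CART-style dispersion measure; criterion="entropy"
# would use the information-gain measure associated with C4.5/C5.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# Each internal node tests one variable; each leaf assigns a class,
# so every record receives a unique classification.
print(export_text(tree, feature_names=[f"x{i}" for i in range(5)]))
```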

Decision trees also have many advantages for prediction. For example, they produce mutually exclusive rules that are collectively exhaustive with respect to the training database, and they automatically eliminate any fields that are not important in making the prediction. Properly deployed, classification trees are not a ‘black box’ (15): users can trace back through the model and either accept or reject the proposed suggestion. Given this attractiveness, decision tree algorithms have been widely applied in different fields of research, such as medicine (16-21), finance (15) and engineering (22).

However, as with all statistical techniques, decision trees have several limitations, such as the risk of over-fitting, where the algorithm tries to fit every object in the dataset, leading to large decision trees with some meaningless branches (16). Tree size can be controlled by stopping rules that limit growth. An alternative to stopping rules is pruning, where the tree is cut back to the smallest size that does not compromise accuracy (4). Another common criticism of decision trees is that they choose each split using a ‘greedy’ algorithm, in which the decision on which variable to split does not take into account the effect on future splits (4).
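A hedged sketch of the prune-back-to-the-smallest-accurate-size idea, using CART's cost-complexity pruning as implemented in scikit-learn, again with synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Grow the full tree once to obtain the candidate pruning strengths.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_tr, y_tr)

# Refit at each strength and keep the tree with the best held-out
# accuracy, i.e. prune back without compromising accuracy.
best = max(
    (DecisionTreeClassifier(ccp_alpha=a, random_state=1).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_te, y_te),
)
print("leaves:", best.get_n_leaves(), "held-out accuracy:", best.score(X_te, y_te))
```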




REFERENCES

1. Agatonovic-Kustrin S, Beresford R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. Journal of Pharmaceutical & Biomedical Analysis 2000;22:717-27.

2. Cacciafesta M, Campana F, Trani I, et al. A neural network study of the correlation between metabolic-cardiovascular diseases and disability in elderly people. Archives of Gerontology & Geriatrics 2000;31:257-66.

3. Wei JT, Zhang Z, Barnhill SD, Madyastha KR, Zhang H, Oesterling JE. Understanding artificial neural networks and exploring their potential applications for the practicing urologist. Urology 1998;52:161-72.

4. Two Crows Corp. Introduction to data mining and knowledge discovery. 1999. http://www.twocrows.com/intro-dm.pdf

5. Cross SS, Harrison RF, Kennedy RL. Introduction to neural networks. Lancet 1995;346:1075-9.

6. Dayhoff JE, DeLeo JM. Artificial neural networks: opening the black box. Cancer 2001;91:1615-35.

7. Duh MS, Walker AM, Ayanian JZ. Epidemiologic interpretation of artificial neural networks. American Journal of Epidemiology 1998;147:1112-22.

8. Brickley MR, Shepherd JP, Armstrong RA. Neural networks: a new technique for development of decision support systems in dentistry. Journal of Dentistry 1998;26:305-9.

9. Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of Clinical Epidemiology 1996;49:1225-31.

10. Ramesh AN, Kambhampati C, Monson JR, Drew PJ. Artificial intelligence in medicine. Annals of the Royal College of Surgeons of England 2004;86:334-8.

11. Lisboa PJ, Taktak AF. The use of artificial neural networks in decision support in cancer: a systematic review. Neural Networks 2006;19:408-15.

12. Kass GV. An exploratory technique for investigating large quantities of categorical data. Journal of Applied Statistics 1980;29:119-27.

13. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Book & Software, 1984.

14. Quinlan J. C4.5: programs for machine learning. San Mateo, CA: Morgan Kaufmann, 1993.

15. Sorensen EH, Miller KL, Ooi CK. The decision tree approach to stock selection. Journal of Portfolio Management 2000;27:42-52.

16. Zorman M, Stiglic MM, Kokol P, Malcic I. The limitations of decision trees and automatic learning in real world medical decision making. Journal of Medical Systems 1997;21:403-15.

17. Mair J, Smidt J, Lechleitner P, Dienstl F, Puschendorf B. A decision tree for the early diagnosis of acute myocardial infarction in nontraumatic chest pain patients at hospital admission. Chest 1995;108:1502-9.

18. Markey MK, Tourassi GD, Floyd CE Jr. Decision tree classification of proteins identified by mass spectrometry of blood serum samples from people with and without lung cancer. Proteomics 2003;3:1678-9.

19. Pavlopoulos SA, Stasis AC, Loukis EN. A decision tree-based method for the differential diagnosis of Aortic Stenosis from Mitral Regurgitation using heart sounds. Biomedical Engineering Online 2004;3:21.

20. Ozsen S, Kara S, Latifoglu F, Gunes S. A new supervised classification algorithm in artificial immune systems with its application to carotid artery Doppler signals to diagnose atherosclerosis. Computer Methods & Programs in Biomedicine 2007;88:246-55.

21. Kunene KN, Weistroffer HR. An approach for predicting and describing patient outcome using multicriteria decision analysis and decision rules. European Journal of Operational Research 2008;185:984-97.

22. Bevilacqua M, Braglia M, Montanari R. The classification and regression tree approach to pump failure rate analysis. Reliability Engineering and System Safety 2003;79:59-67.









Table A. Consumers of each of the 62 food groups used in the data mining analyses

Group  Data mining food group code                     n      %
 1     Rice & Rice Dishes                             522   37.9
 2     Pasta, Noodles & Dishes                        523   37.9
 3     Flours, Grains & Starches                       29    2.1
 4     Pizza & Savoury Dishes                         618   44.8
 5     White Bread & Rolls                           1298   94.1
 6     Wholemeal Bread & Rolls                       1001   72.6
 7     Other Breads                                   707   51.3
 8     Breakfast Cereals                             1008   73.1
 9     Biscuits                                      1054   76.4
10     Cakes, Buns & Pastries                         915   66.4
11     Whole Milk                                    1011   73.3
12     Low Fat Milk                                   624   45.3
13     Other Milks                                     41    3.0
14     Creams                                         323   23.4
15     Cheeses                                       1023   74.2
16     Yogurts                                        425   30.8
17     Ice-cream                                      503   36.5
18     Puddings & Desserts                            567   41.1
19     Egg & Egg Dishes                               944   68.5
20     Butter & Fat Spreads                          1203   87.2
21     Low Fat Spreads                                385   27.9
22     Oils & Hard Fats                               202   14.6
23     Potatoes & Potato Dishes                      1344   97.5
24     Chips & Processed Potato Products             1073   77.8
25     Vegetable Dishes                               680   49.3
26     Peas, Beans & Lentils                         1067   77.4
27     Green Vegetables, Carrots & Others            1326   96.2
28     Salad Vegetables                              1051   76.2
29     Fruit Juice                                    567   41.1
30     Fresh Fruit                                   1041   75.5
31     Dried Fruit                                     61    4.4
32     Tinned Fruit                                   199   14.4
33     Nuts & Seeds                                   162   11.7
34     Herbs & Spices                                  80    5.8
35     Fish & Fish Products                           915   66.4
36     Fish Dishes                                     85    6.2
37     Bacon & Ham                                   1199   86.9
38     Beef, Veal, Lamb & Pork                       1075   78.0
39     Poultry                                       1015   73.6
40     Offal & Dishes                                  56    4.1
41     Meat Dishes                                    748   54.2
42     Poultry Dishes                                 625   45.3
43     Burgers                                        433   31.4
44     Meat Products                                  685   49.7
45     Alcoholic Beverages                            896   65.0
46     Sugar & Sweeteners                             806   58.4
47     Preserves, Syrups & Spreads                    776   56.3
48     Chocolate Confectionery                        867   62.9
49     Non-Chocolate Confectionery                    327   23.7
50     Savoury Snacks                                 669   48.5
51     Soups                                          625   45.3
52     Sauces, Dressings & Dips                      1228   89.1
53     Sausages                                       812   58.9
54     Nutritional Supplements                        344   24.9
55     Tea                                           1242   90.1
56     Herbal Tea                                      46    3.3
57     Coffee                                         757   54.9
58     Drinking Chocolate, Cocoa & Malted Drinks       79    5.7
59     Water                                         1041   75.5
60     Carbonated Beverages                           783   56.8
61     Diet Carbonated Beverages                      288   20.9
62     Squashes & Cordials                            293   21.2

n and % denote the number and percentage of respondents consuming each food group.

















FIGURE A

Graphical representation of the basic architecture of an ANN. The ANN starts with an input layer, where each node corresponds to a predictor variable. These input nodes are connected to a number of nodes in a hidden layer. In the hidden layer, each incoming value is multiplied by a connection weight, the weighted values are summed and transformed according to a given function, and the result is passed to the nodes in the next layer and finally to the output layer. The output layer consists of one or more response variables.