Machine Learning for Control
of Morphing Air Vehicles
John Valasek, Amanda Lampton, Adam Niksch
Aerospace Engineering Department
Texas A&M University
SAE Aerospace Control and Guidance Systems Committee
Meeting 99
1 March 2007
Boulder, CO
Valasek, Lampton, Niksch

Student Research Team
2006-2007
Briefing Agenda
Shape Changing / Morphing for Micro Air Vehicles
Learning
Control
Learning and Control
– Adaptive-Reinforcement Learning Control
Some Results
Some Conclusions
Research Issues
Extensions
Biomimetic Example 1
300-million-year-old design
97% success rate in capturing prey
Flight relies entirely on 3-D unsteady aerodynamics
Unstable in all axes
Can hover as well as fly from a forward speed of +100 body lengths/sec (~60 mph) to -3 bl/sec within 5 bl
Can alter heading by 90 degrees in less than 0.1 seconds
Dragonflies and other flying insects have the fastest visual processing system in the animal kingdom
Biomimetic Example 2
These feats are accomplished with the neuro-circuitry of a brain smaller than a sesame seed.
Andrew Meade, Rice University
Biomimetic Example 3
Courtesy Graham Taylor, Oxford University
Cossack
Cossack sysid using OKID
Kinematic Engineering Model of Eagle Wing
Reconfigurable; shape changing (morphing); rigid or elastic body plants
Multi-Input Multi-Output (MIMO) systems
Distributed sensing, actuation, and propulsion
– Multiple, possibly non co-located sensors
– Can have large number of actuators: control allocation problem
– High bandwidth actuators: gust alleviation
Robust Adaptive Control
– Flow control
– Exogenous inputs (turbulence and gusts)
– Damage and fault tolerance
Distributed and limited processing capability
– Volumetric limit
– Mass limit
– Power limit
– Real-time operation
Integrated guidance, navigation, and control
– Self learning, self adapting, self tuning
Biomimetic Flight Summary
Information processing system
A Multi-Role Platform that:
Changes its state substantially to adapt to changing mission environments.
Provides superior system capability not possible without reconfiguration.
Uses a design that integrates innovative combinations of advanced materials, actuators, flow controllers, and mechanisms to achieve the state change.
Morphing Aircraft
Aerospace America, Feb 2004
DARPA's definition
Changes on the order of 50% more wing area or wing span and chord
Morphing for Mission Adaptation
– Large scale, relatively slow, in-flight shape change to enable a single vehicle to perform multiple diverse mission profiles
or:
Morphing for Control
– In-flight physical or virtual shape change to achieve multiple control objectives (maneuvering, flutter suppression, load alleviation, active separation control)
Mission Adaptation
Control
John Davidson, NASA Langley, AFRL Morphing Controls Workshop, Feb 2004
Which Morphing?
Lockheed Martin
NextGen
&
Barron Associates
Large Morphing Air Vehicle Models
Small Morphing Air Vehicle Models
Cornell (collaborator): Garcia & Lipson
– Morphing dynamical model and simulation that incorporates aerodynamic and structural effects
– Validated with experimental data
– Basic morphing parameters (incidence angle, dihedral angle)
Maryland: Hubbard
– Actuating a flapping wing structure with SMAs
– Structure, material, distribution of actuators
Florida: Lind
– Morphing flight demonstrator vehicle
– Basic morphing parameters (dihedral angle, sweep angle)
– H-infinity control
Lots of modeling, very little control
• Use biologically inspired approach to understand and control the:
  • morphing (shape changing),
  • flight control (stability, flight path, gust tolerance, stall)
  • maneuvering (perching and hovering)
  of multi-mission micro air vehicles.
• Control the physics, in concert with nonlinearities.
• Not as a robustness bandaid for aerodynamic and structural uncertainties & lack of understanding
• Shape change the entire vehicle, not a component on the vehicle
• DARPA definition
Technical Approach and Scope
• Address:
  • WHEN to morph, perch, etc.
  • HOW to morph, perch, etc.
  • LEARNING to morph, perch, etc.
All while keeping the (pointy?) end facing forward.
Big Picture Research Goals
Common Mathematical Domain Helps To Avoid Ad-hoc Approaches
?
?
Machine Learning
Reconfiguration Policy
State based methods
learning
Adaptive Control
Parameters in a Known
Functional Relationship
State based methods
adapting
The Mathematical Domain Issue
Adaptive-Reinforcement Learning Control (A-RLC)
Conceptual Control Architecture for Reconfigurable or Morphing Aircraft
SAMI: Structured Adaptive Model Inversion (Traditional Control)
Flight controller to handle wide variation in dynamic properties due to shape change
ML: Machine Learning (Intelligent Control)
Learn the morphing dynamics and the optimal shape at every flight condition in real-time
Synthetic Jets for Virtual Shaping and Separation Control
MultiSensor MEMS Arrays for Flow Control
Feedback
Adaptive Controller
Sensed Information Aggregation
Control Information Distribution
Reconfiguration Command Generation
System Performance Evaluation
Knowledge Base
Environment
Control Architecture
2-D Plate, Rectangular Block, Ellipsoid, Delta Wing, Final Objective
2004, 2005, 2006, 2007
Morphing Air Vehicle Evolution
Morphing Air Vehicle Model - TiiMY
Shape
Ellipsoidal shape with varying axis dimensions.
Constant volume (V) during morphing
2 independent variables: y and z, dependent dimension
Morphing Dynamics
Smart material: shape memory alloy (SMA)
Morphing Dynamics: Simple Nonlinear Differential Equations
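The constant-volume constraint above can be illustrated with a minimal sketch. It assumes that x, y, and z are the full axis lengths of the ellipsoid, so that V = (pi/6) x y z, and that x is the dependent dimension; the slide does not state either convention, and the function name is hypothetical.

```python
import math


def dependent_axis(volume, y, z):
    """Return the x-axis length of a constant-volume ellipsoid.

    Assumption (not from the slides): x, y, z are full axis lengths,
    so V = (pi / 6) * x * y * z.
    """
    if volume <= 0 or y <= 0 or z <= 0:
        raise ValueError("volume and axis lengths must be positive")
    return 6.0 * volume / (math.pi * y * z)


# Reference shape with axes 2 x 1 x 1 fixes the volume.
volume = math.pi / 6.0 * 2.0 * 1.0 * 1.0

# Morph y and z; x follows from the volume constraint.
x_morphed = dependent_axis(volume, y=2.0, z=0.5)
```

With this convention, stretching y while shrinking z by the same factor leaves x unchanged, which is one way to sanity-check a morphing simulation.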
Morphing Dynamics
Y-morphing
Z-morphing
Shape Morphing Simulation
TiiMY
Optimal Shapes at Various Flight Conditions
Optimality is defined by identifying a cost function.
J = J(current shape, flight condition)
6-DOF Mathematical Model for Dynamic Behavior
Variables
Nonlinear 6-DOF Equations
– Kinematic level:
– Acceleration level:
Drag Force
– Function of air density, square of velocity along axis, and projected area of the vehicle perpendicular to the axis
Additional dynamics due to morphing
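The drag description above (air density, squared axis velocity, projected area) can be sketched as follows. The drag coefficient `cd`, the full-axis-length convention, and both function names are assumptions for illustration; the slide gives only the functional dependence, not the formula used in the simulation.

```python
import math


def ellipsoid_projected_areas(x, y, z):
    """Projected areas perpendicular to the x, y, and z body axes.

    Assumes x, y, z are full axis lengths, so each projection is an
    ellipse whose semi-axes are half of the other two lengths.
    """
    return (
        math.pi * y * z / 4.0,
        math.pi * x * z / 4.0,
        math.pi * x * y / 4.0,
    )


def axis_drag(rho, velocity, area, cd=1.0):
    """Drag force along one axis, opposing the velocity along that axis.

    Standard quadratic drag: 0.5 * rho * cd * area * v^2, with the sign
    chosen opposite to v. cd = 1.0 is a placeholder value.
    """
    return -0.5 * rho * cd * area * velocity * abs(velocity)


# Example: sea-level density, 10 m/s along x, ellipsoid with axes 2 x 1 x 1.
areas = ellipsoid_projected_areas(2.0, 1.0, 1.0)
drag_x = axis_drag(rho=1.225, velocity=10.0, area=areas[0])
```

Because the projected areas depend on the current axis lengths, morphing the shape changes the drag on every axis, which is the coupling the cost function can exploit.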
Learning
– Machine Learning
• Learn the optimal control mechanism by wind-tunnel experiments & flight tests
• Possible learning algorithms include Artificial Neural Networks (ANN), Explanation-Based Learning (EBL), and Reinforcement Learning (RL)
Candidates to develop inference mechanism
– Rules-Based Expert System
• Model the knowledge of human experts
• Imitate the natural behaviour of birds
Question: How Many Control Theorists Does It Take To Change A [image] Using Artificial Neural Networks?
Knowledge Based Control
– Biologically inspired control process
• Mimic the behaviour of birds
Question: How Many Control Theorists Does It Take To Change A [image] Using Reinforcement Learning?
Simple Example
Reinforcement Learning - 1
Supervised or unsupervised learning? Sequential decision making.
– Knowledge is based on experience and interaction with the environment, not on input-output data supplied by an external supervisor
Achieves a specific goal by learning from interactions with the environment.
– Considers state information (s)
– Performs sequences of actions (a), observing the consequences
– Attempts to maximize rewards (r) over time
• These specify what is to be achieved, not how to achieve it
– Constructs a state value function (V)
• Learns an optimal control policy
Memory is contained in the state value function
Reinforcement Learning - 2
Learning is done repetitively, by subjecting the agent to different scenarios
Learning is cumulative and lifelong
Formulations are generally based on Finite Markov Decision Processes (MDP)
• 3 major candidate algorithms:
– Dynamic programming
– Monte Carlo methods
– Temporal Difference Learning
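Of the three candidates listed above, temporal difference learning is the easiest to show in a few lines. The sketch below is the textbook tabular TD(0) update of a state value function, not the authors' implementation; the state names, step size `alpha`, and discount `gamma` are illustrative assumptions.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular TD(0) update to values[state] in place.

    The TD error compares the current estimate V(s) with the one-step
    target r + gamma * V(s'). Returns the TD error.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Two hypothetical states; the agent moves from s0 to s1 and receives r = 0.5.
values = {"s0": 0.0, "s1": 1.0}
delta = td0_update(values, "s0", reward=0.5, next_state="s1")
```

Unlike Monte Carlo methods, the update uses the current estimate of the next state's value, so learning can proceed after every step rather than only at the end of an episode.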
1. Actor takes action based upon states and preference function
2. Critic updates state value function, and evaluates action
3. Actor updates preference function
Learning is done repetitively, by subjecting the agent to different scenarios
Reinforcement Learning
self training
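The three actor-critic steps above can be sketched with tabular value and preference functions. This follows the standard textbook actor-critic form (softmax over preferences, TD error as the critic's evaluation); the step sizes, state and action names, and function names are assumptions, not details taken from the slides.

```python
import math
import random


def softmax_action(preferences, state, actions, rng):
    """Step 1: the actor samples an action from a softmax over preferences."""
    prefs = [preferences.get((state, a), 0.0) for a in actions]
    top = max(prefs)  # subtract the max for numerical stability
    weights = [math.exp(p - top) for p in prefs]
    return rng.choices(actions, weights=weights, k=1)[0]


def actor_critic_step(values, preferences, state, action, reward, next_state,
                      alpha=0.1, beta=0.1, gamma=0.9):
    """Steps 2 and 3: critic update, then actor preference update."""
    # Step 2: the critic evaluates the action through the TD error
    # and updates the state value function.
    td_error = (reward + gamma * values.get(next_state, 0.0)
                - values.get(state, 0.0))
    values[state] = values.get(state, 0.0) + alpha * td_error
    # Step 3: the actor strengthens or weakens its preference for the action.
    key = (state, action)
    preferences[key] = preferences.get(key, 0.0) + beta * td_error
    return td_error


rng = random.Random(0)
values, preferences = {}, {}
actions = ["increase", "decrease"]
chosen = softmax_action(preferences, "s0", actions, rng)
delta = actor_critic_step(values, preferences, "s0", chosen, 1.0, "s1")
```

A positive TD error raises the preference for the action just taken, so it becomes more likely in that state on later passes.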
Two Illustrative Examples
State:
– Gain for δ_a = K_φ (φ_cmd - φ)
Actions:
– Increase gain by small amount
– Decrease gain by small amount
Constraints/Boundaries
– Max overshoot
– Rise time
– Settling time
Interesting Features
– ε-greedy policy incorporated
– Upward annealing of γ incorporated
max os = 2%
T_r = 8 s
T_s = 10 s
Cessna 208B Super Cargomaster
Matlab: ~250 sec real-time for 1000 learning episodes
familiar example
Reinforcement Learning
Start
Finish
Smart Block Demo 1
aerial obstacle course
Smart Block: First Try
Smart Block: Second Try
Smart Block: New Course
Adaptive-Reinforcement Learning Control
Valasek, John, Tandale, Monish D., and Rong, Jie, "A Reinforcement Learning Adaptive Control Architecture for Morphing," Journal of Aerospace Computing, Information, and Communication, Volume 2, Number 4, pp. 174-195, April 2005.
Valasek, John, Doebbler, James, Tandale, Monish D., and Meade, Andrew J., "Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles," Journal of Aerospace Computing, Information, and Communication (in review).
Tandale, Monish D., Rong, Jie, and Valasek, John, "Preliminary Results of Adaptive-Reinforcement Learning Control for Morphing Aircraft," AIAA-2004-5358, Proceedings of the AIAA Guidance, Navigation, and Control Conference, Providence, RI, 16-19 August 2004.
A-RLC Architecture
Air Vehicle Example
Objective
– Demonstrate optimal shape morphing for multiple specified flight conditions
Method
– For every flight condition, learn optimal policy that commands voltage producing the optimal shape
– Minimize total cost over the entire flight trajectory
– Evaluate the learning performance after 200 learning episodes
Air Vehicle Example
RL Module is Completely Ignorant of Optimality Relations and Morphing Control Functions: It Must Learn On Its Own, From Scratch
Agent: Morphing Air Vehicle Reinforcement Learning Module
Environment: Various flight conditions
Goal: Fly in optimal shape that minimizes cost
States: Flight condition; shape of vehicle
Actions: Discrete voltages applied to change shape of vehicle
– Action set:
Rewards: Determined by cost functions
Optimal control policy: Mapping of the state to the voltage leading to the optimal shape
Timmy Demo
reinforcement learning definitions
Episodic Learning
Unsupervised learning episode
Single pass through 100 meter flight path in 200 seconds
Reference trajectory is generated arbitrarily
The flight condition changes twice during each episode
Shape change iteration after every 1 second
Exploration-exploitation dilemma:
– Explorative early, exploitative later
– ε-greedy policy with decreasing ε
Limited training examples
– Only 6 discrete flight conditions: 2000 samples for KNNPI
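The "explorative early, exploitative later" idea above is commonly implemented as an ε-greedy policy whose ε shrinks across episodes. The sketch below assumes a linear schedule from 1.0 down to 0.05; the slides do not give the schedule or its end points, and the function names are hypothetical.

```python
import random


def epsilon_for_episode(episode, total_episodes, eps_start=1.0, eps_end=0.05):
    """Linearly decrease epsilon from eps_start to eps_end over the episodes.

    The linear form and the end-point values are assumptions for illustration.
    """
    frac = min(max(episode / max(total_episodes - 1, 1), 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # hypothetical action values for one state
early = epsilon_for_episode(0, 200)    # first of 200 learning episodes
late = epsilon_for_episode(199, 200)   # last of 200 learning episodes
greedy_choice = epsilon_greedy(q_values, 0.0, rng)
```

With ε near 1 the agent samples voltages almost uniformly; as ε falls, it increasingly commits to the action with the highest learned value.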
Demo
Comparison of True Optimal Shape and Learned Shape
KNN learns poorly for several flight conditions
Function Approximation
– Errors remained which could not be eliminated with additional training.
– Use Galerkin-based Sequential Function Approximation (SFA) to approximate the action-value function Q(s,a)
What Happened?
New SFA approach learns optimal shape well
Comparison of True Optimal
Shape and Learned Shape
Comparison: Normalized RMS error
       Y dimension     Z dimension
KNN    1.42            0.821
SFA    1.27            0.661
       10% reduction   20% reduction
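The quoted reductions can be checked directly from the tabulated normalized RMS errors. This is simple arithmetic on the slide's own numbers; only the helper name is introduced here.

```python
def percent_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Normalized RMS errors from the comparison above (KNN vs. SFA).
y_reduction = percent_reduction(1.42, 1.27)    # Y dimension
z_reduction = percent_reduction(0.821, 0.661)  # Z dimension
```

The results are roughly 10.6% for Y and 19.5% for Z, consistent with the rounded 10% and 20% figures.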
Reinforcement Learning successfully learns the optimal control policy that results in the optimal shape at every flight condition.
– Can function in real-time, leading to better performance as system operates over the long term.
Adaptive-Reinforcement Learning Control is a promising candidate for control of Mission Morphing.
– Maintains asymptotic tracking in the presence of parametric uncertainties and initial condition errors.
Shape Changes for "Mission Morphing" can be treated as piecewise constant parameter changes
– SAMI is a favorable method for trajectory tracking control
"Morphing for Control" will require different control strategy
– Piecewise constant approximation no longer valid
What Does This Show?
Issues & Future Directions
Realistic structural response effects
– Aeroelastic behaviour
– SMA models and hysteretic behaviour
• Preisach model is algebraic, only has major hysteresis loops
• Solution: roll your own with R-L
Issues & Future Directions
Time scale problem: control methodologies to handle faster shape changes
– Hovakimyan's Adaptive Control
– Linear Parameter Varying (LPV) control
Novel distributed sensing and distributed actuation on a large(!) scale
– Insect and avian inspired sensing
Learning on a continuous domain
– Continuous versus discrete
Modify the simulation to include a more advanced aircraft model
– Wing-Body, Wing-Body-Empennage, etc.
Build and fly R/C class morphing demonstrator UAV
Issues & Future Directions
Incorporate aerodynamic and structural effects due to large shape changing
Cost Function
– Potential components
• Specified C_L
• Minimum drag
• Minimum peak stress
Degrees of Freedom
– Thickness
• 6% to 24%
– Camber
• 0% to 10%
– Max camber location
• 0.2c to 0.8c
– Chord
• 1 unit to ? units
– Angle-of-attack
• -5° to 10°
– Within linear range
R-L For Morphing Airfoils & Wings
Morphing Airfoil Demonstration
Questions?
John Valasek
Aerospace Engineering Department
Texas A&M University
3141 TAMU
College Station, TX 77843-3141
(979) 845-1685
valasek@aero.tamu.edu
FSL Web Page: http://flutie.tamu.edu/~fsl