Machine Learning for Control of Morphing Air Vehicles

John Valasek, Amanda Lampton, Adam Niksch


Aerospace Engineering Department

Texas A&M University

SAE Aerospace Control and Guidance Systems Committee
Meeting 99

1 March 2007

Boulder, CO

Valasek, Lampton, Niksch
-

2

Student Research Team, 2006-2007

Briefing Agenda


Shape Changing / Morphing for Micro Air Vehicles


Learning


Control


Learning and Control


Adaptive-Reinforcement Learning Control


Some Results


Some Conclusions


Research Issues


Extensions




Biomimetic Example 1



300 million year old design


97% success rate in capturing prey


Flight relies entirely on 3-D unsteady aerodynamics

Unstable in all axes

Can hover, and can go from a forward speed of +100 body lengths/sec (~60 mph) to -3 bl/sec within 5 bl

Can alter heading by 90 degrees in less than 0.1 seconds

Dragonflies and other flying insects have the fastest visual processing system in the animal kingdom

Biomimetic Example 2

These feats are accomplished with the neuro-circuitry of a brain smaller than a sesame seed.

Andrew Meade, Rice University


Biomimetic Example 3

Courtesy Graham Taylor, Oxford University

Cossack


Biomimetic Example 3

Courtesy Graham Taylor, Oxford University


Biomimetic Example 3

Cossack sysid using OKID


Biomimetic Example 3

Courtesy Graham Taylor, Oxford University

Kinematic Engineering Model of Eagle Wing



Reconfigurable; shape changing (morphing); rigid or elastic body plants


Multi-Input Multi-Output (MIMO) systems

Distributed sensing, actuation, and propulsion

Multiple, possibly non-co-located sensors

Can have large number of actuators: control allocation problem

High bandwidth actuators: gust alleviation


Robust Adaptive Control


Flow control


Exogenous inputs (turbulence and gusts)


Damage and fault tolerance


Distributed and limited processing capability


Volumetric limit


Mass limit


Power limit


Real-time operation


Integrated guidance, navigation, and control


Self learning, self adapting, self tuning


Biomimetic Flight Summary

Information processing system


Morphing Aircraft

DARPA's definition (Aerospace America, Feb 2004)

A Multi-Role Platform that:

Changes its state substantially to adapt to changing mission environments.

Provides superior system capability not possible without reconfiguration.

Uses a design that integrates innovative combinations of advanced materials, actuators, flow controllers, and mechanisms to achieve the state change.

Changes on the order of 50% more wing area or wing span and chord


Which Morphing?

Morphing for Mission Adaptation

Large scale, relatively slow, in-flight shape change to enable a single vehicle to perform multiple diverse mission profiles

or:

Morphing for Control

In-flight physical or virtual shape change to achieve multiple control objectives (maneuvering, flutter suppression, load alleviation, active separation control)

John Davidson, NASA Langley, AFRL Morphing Controls Workshop, Feb 2004

Large Morphing Air Vehicle Models

Lockheed Martin

NextGen & Barron Associates

Small Morphing Air Vehicle Models

Cornell (collaborator)


Garcia & Lipson


Morphing dynamical model and simulation that incorporates
aerodynamic and structural effects


Validated with experimental data


Basic morphing parameters (incidence angle, dihedral angle)


Maryland


Hubbard


Actuating a flapping wing structure with shape memory alloys (SMAs)


Structure, material, distribution of actuators


Florida


Lind


Morphing flight demonstrator vehicle


Basic morphing parameters (dihedral angle, sweep angle)


H-infinity control

lots of modeling, very little control


Technical Approach and Scope

Use biologically inspired approach to understand and control the:

morphing (shape changing),

flight control (stability, flight path, gust tolerance, stall)

maneuvering (perching and hovering)

of multi-mission micro air vehicles.

Control the physics, in concert with nonlinearities.

Not as a robustness bandaid for aerodynamic and structural uncertainties & lack of understanding

Shape change the entire vehicle, not a component on the vehicle

DARPA definition



Big Picture Research Goals

Address:

WHEN to morph, perch, etc.

HOW to morph, perch, etc.

LEARNING to morph, perch, etc.

All while keeping the (pointy?) end facing forward.

The Mathematical Domain Issue

Common Mathematical Domain Helps To Avoid Ad-hoc Approaches

Machine Learning: Reconfiguration Policy (state based methods, learning)

Adaptive Control: Parameters in a Known Functional Relationship (state based methods, adapting)

Adaptive-Reinforcement Learning Control (A-RLC)

Conceptual Control Architecture for Reconfigurable or Morphing Aircraft

SAMI: Structured Adaptive Model Inversion (Traditional Control)

Flight controller to handle wide variation in dynamic properties due to shape change

ML: Machine Learning (Intelligent Control)

Learn the morphing dynamics and the optimal shape at every flight condition in real-time

Control Architecture

[Block diagram. Blocks: Synthetic Jets for Virtual Shaping and Separation Control; MultiSensor MEMS Arrays for Flow Control Feedback; Adaptive Controller; Sensed Information Aggregation; Control Information Distribution; Reconfiguration Command Generation; System Performance Evaluation; Knowledge Base; Environment]


Morphing Air Vehicle Evolution

Stages: 2-D Plate, Rectangular Block, Ellipsoid, Delta Wing, Final Objective

Years: 2004, 2005, 2006, 2007

Morphing Air Vehicle Model - TiiMY

Shape

Ellipsoidal shape with varying axis dimensions.

Constant volume (V) during morphing

2 independent variables: y and z, with x the dependent dimension

Morphing Dynamics

Smart material: shape memory alloy (SMA)

Morphing Dynamics: Simple Nonlinear Differential Equations
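The constant-volume constraint can be sketched as follows. This is a minimal illustration, assuming y and z are semi-axis lengths and the volume is V = (4/3)·π·x·y·z; the slides do not state the parameterization, and the function name and sample numbers are hypothetical.

```python
import math


def dependent_semi_axis(volume, y, z):
    """Return the x semi-axis that keeps the ellipsoid volume fixed.

    Assumes V = (4/3) * pi * x * y * z with x, y, z as semi-axes
    (an assumption; the briefing does not give the formula).
    """
    if y <= 0 or z <= 0:
        raise ValueError("semi-axes must be positive")
    return 3.0 * volume / (4.0 * math.pi * y * z)


# Start from a unit sphere (x = y = z = 1), so V = 4*pi/3.
V = 4.0 * math.pi / 3.0

# Stretch y and squeeze z so that y*z stays 1: x is unchanged.
x_same = dependent_semi_axis(V, y=2.0, z=0.5)

# Squeeze z only: x must grow to preserve the volume.
x_longer = dependent_semi_axis(V, y=1.0, z=0.5)
```

With only y and z commanded, x follows from the constraint, which is why the shape has two independent morphing variables.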

Morphing Dynamics

[Plots: Y-morphing; Z-morphing]

Shape Morphing Simulation - TiiMY

Optimal Shapes at Various Flight Conditions

Optimality is defined by identifying a cost function.

J = J(Current shape, Flight condition)

6-DOF Mathematical Model for Dynamic Behavior

Variables

Nonlinear 6-DOF Equations

Kinematic level:

Acceleration level:

additional dynamics due to morphing

Drag Force

Function of air density, square of velocity along axis, and projected area of the vehicle perpendicular to the axis
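The drag dependence described above (air density, squared axis velocity, projected area) can be sketched with the standard drag relation. This is a hedged illustration: the drag coefficient `c_d`, the function names, and the numbers are assumptions, and the projected area along x is taken as the ellipse spanned by the y and z semi-axes.

```python
import math


def ellipse_area(a, b):
    """Area of an ellipse with semi-axes a and b."""
    return math.pi * a * b


def axis_drag(rho, v_axis, projected_area, c_d=1.0):
    """Drag magnitude along one body axis: 0.5 * rho * v^2 * C_d * A.

    c_d is a hypothetical drag coefficient, not a value from the briefing.
    """
    return 0.5 * rho * v_axis ** 2 * c_d * projected_area


# Drag along x for an ellipsoid with semi-axes y = 0.5 and z = 0.2:
# the area perpendicular to x is the y-z ellipse.
drag_x = axis_drag(rho=1.225, v_axis=10.0,
                   projected_area=ellipse_area(0.5, 0.2))
```

Because the projected area changes as y and z morph, the drag along each axis changes with shape even at a fixed speed.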

Learning



Knowledge Based Control

Biologically inspired control process

Mimic the behaviour of birds

Machine Learning

Learn the optimal control mechanism by wind-tunnel experiments & flight tests

Possible learning algorithms include Artificial Neural Networks (ANN), Explanation-Based Learning (EBL), and Reinforcement Learning (RL)

Candidates to develop inference mechanism

Rules-Based Expert System

Model the knowledge of human experts

Imitate the natural behaviour of birds

Question: How Many Control Theorists Does It Take To Change A [image] Using Artificial Neural Networks?

Question: How Many Control Theorists Does It Take To Change A [image] Using Reinforcement Learning?

Simple Example


Reinforcement Learning - 1

Supervised or unsupervised learning? Sequential decision making.

Knowledge is based on experience and interaction with the environment, not on input-output data supplied by an external supervisor

Achieves a specific goal by learning from interactions with the environment.

Considers state information (s)

Performs sequences of actions (a), observing the consequences

Attempts to maximize rewards (r) over time

These specify what is to be achieved, not how to achieve it

Constructs a state value function (V)

Learns an optimal control policy

Memory is contained in the state value function
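"Maximize rewards over time" is usually made precise as a discounted return. The sketch below assumes the common form G = r_0 + γ·r_1 + γ²·r_2 + ..., which the slides do not spell out; the function name and reward values are illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each discounted by gamma per time step.

    Works backwards so that G_t = r_t + gamma * G_{t+1}.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25.
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A γ close to 0 makes the agent short-sighted, while a γ close to 1 weights later rewards almost as heavily as immediate ones.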

Reinforcement Learning - 2

Learning is done repetitively, by subjecting the agent to different scenarios

Learning is cumulative and lifelong

Formulations are generally based on Finite Markov Decision Processes (MDP)

3 major candidate algorithms:

Dynamic programming

Monte Carlo methods

Temporal Difference Learning
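Of the three candidates, temporal difference learning is the easiest to show in a few lines. The following is a minimal sketch of a tabular TD(0) state-value update, with hypothetical state labels and step sizes; it is not taken from the briefing.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular TD(0) update of the state value function V.

    V is a dict mapping state -> value estimate.
    Returns the TD error r + gamma * V[s_next] - V[s].
    """
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error


# Two hypothetical states with zero initial value estimates.
V = {"A": 0.0, "B": 0.0}

# Observe a transition A -> B with reward 1.
delta = td0_update(V, "A", 1.0, "B")
```

Unlike Monte Carlo methods, the update uses the current estimate of the next state's value, so it does not have to wait for the end of an episode.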











Reinforcement Learning: self training

1. Actor takes action based upon states and preference function

2. Critic updates state value function, and evaluates action

3. Actor updates preference function

Learning is done repetitively, by subjecting the agent to different scenarios
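The three actor-critic steps can be sketched in tabular form. This is an illustration under stated assumptions: the actor turns preferences into action probabilities with a softmax, the critic uses a TD error as its evaluation, and the step sizes `alpha` and `beta` are hypothetical.

```python
import math


def softmax_probs(prefs):
    """Turn a list of action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(V, prefs, s, a, r, s_next,
                      alpha=0.1, beta=0.1, gamma=0.9):
    """Update critic and actor after taking action a in state s."""
    delta = r + gamma * V[s_next] - V[s]  # critic evaluates the action
    V[s] += alpha * delta                 # critic updates state value
    prefs[s][a] += beta * delta           # actor updates its preference
    return delta


# Two states, two actions, everything initialised to zero.
V = [0.0, 0.0]
prefs = [[0.0, 0.0], [0.0, 0.0]]

# Action 1 in state 0 earned reward 1 and led to state 1.
delta = actor_critic_step(V, prefs, s=0, a=1, r=1.0, s_next=1)
probs = softmax_probs(prefs[0])
```

After a positive evaluation the preference for the chosen action rises, so the actor becomes more likely to pick it again in that state.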

Two Illustrative Examples



Reinforcement Learning: familiar example

Cessna 208B Super Cargomaster

State:

Gain K_φ for δ_a = K_φ (φ_cmd - φ)

Actions:

Increase gain by small amount

Decrease gain by small amount

Constraints/Boundaries

Max overshoot: max os = 2%

Rise time: T_r = 8 s

Settling time: T_s = 10 s

Interesting Features

ε-greedy policy incorporated

Upward annealing of γ incorporated

Matlab: ~250 sec real-time for 1000 learning episodes
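The gain-tuning setup above can be sketched as follows. This is a hedged illustration: the step size, the made-up action values, and the function names are assumptions, and the original work was done in Matlab rather than Python.

```python
import random

GAIN_STEP = 0.05                    # hypothetical "small amount"
ACTIONS = (+GAIN_STEP, -GAIN_STEP)  # increase gain, decrease gain


def aileron_command(k_phi, phi_cmd, phi):
    """Proportional roll law from the slide: delta_a = K_phi * (phi_cmd - phi)."""
    return k_phi * (phi_cmd - phi)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


rng = random.Random(0)
k_phi = 1.0
q_for_current_gain = [0.3, -0.1]    # made-up action values for this gain

# With epsilon = 0 the choice is purely greedy: "increase gain".
action = epsilon_greedy(q_for_current_gain, epsilon=0.0, rng=rng)
k_phi += ACTIONS[action]
```

In a full learner the overshoot, rise time, and settling time limits would enter through the reward, for example by penalising any step response that violates them.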


Smart Block Demo 1: aerial obstacle course

[Figure: obstacle course with Start and Finish markers]

Smart Block: First Try

Smart Block: Second Try

Smart Block: New Course

Adaptive-Reinforcement Learning Control

Valasek, John, Tandale, Monish D., and Rong, Jie, "A Reinforcement Learning Adaptive Control Architecture for Morphing," Journal of Aerospace Computing, Information, and Communication, Volume 2, Number 4, pp. 174-195, April 2005.

Valasek, John, Doebbler, James, Tandale, Monish D., and Meade, Andrew J., "Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles," Journal of Aerospace Computing, Information, and Communication (in review).

Tandale, Monish D., Rong, Jie, and Valasek, John, "Preliminary Results of Adaptive-Reinforcement Learning Control for Morphing Aircraft," AIAA-2004-5358, Proceedings of the AIAA Guidance, Navigation, and Control Conference, Providence, RI, 16-19 August 2004.

A-RLC Architecture

Air Vehicle Example


Objective

Demonstrate optimal shape morphing for multiple specified flight conditions

Method

For every flight condition, learn optimal policy that commands voltage producing the optimal shape

Minimize total cost over the entire flight trajectory

Evaluate the learning performance after 200 learning episodes

RL Module is Completely Ignorant of Optimality Relations and Morphing Control Functions: It Must Learn On Its Own, From Scratch


Timmy Demo: reinforcement learning definitions

Agent: Morphing Air Vehicle Reinforcement Learning Module

Environment: Various flight conditions

Goal: Fly in optimal shape that minimizes cost

States: Flight condition; shape of vehicle

Actions: Discrete voltages applied to change shape of vehicle

Action set:

Rewards: Determined by cost functions

Optimal control policy: Mapping of the state to the voltage leading to the optimal shape
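These definitions map onto a tabular learner as sketched below. Note the substitutions: this uses a plain Q-learning update rather than the KNN policy iteration or SFA methods named in the briefing, the voltage set is hypothetical (the slide's action set did not survive extraction), and the reward is assumed to be the negative of the cost.

```python
from collections import defaultdict

VOLTAGES = (-1.0, 0.0, 1.0)   # hypothetical discrete action set


def q_update(Q, state, action, cost, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning update; the reward is taken as the negative cost."""
    reward = -cost
    best_next = max(Q[(next_state, a)] for a in VOLTAGES)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])


def greedy_voltage(Q, state):
    """Policy: the voltage with the highest learned value in this state."""
    return max(VOLTAGES, key=lambda a: Q[(state, a)])


Q = defaultdict(float)
s = ("flight_condition_1", 2)        # (flight condition, shape index)
s_next = ("flight_condition_1", 3)

q_update(Q, s, 1.0, cost=0.2, next_state=s_next)    # mildly costly
q_update(Q, s, -1.0, cost=2.0, next_state=s_next)   # very costly
best = greedy_voltage(Q, s)
```

The state is a (flight condition, shape) pair, so the learned policy is exactly the mapping from state to voltage described in the last definition.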

Episodic Learning


Unsupervised learning episode


Single pass through 100 meter flight path in 200 seconds


Reference trajectory is generated arbitrarily


The flight condition changes twice during each episode


Shape change iteration after every 1 second


Exploration-exploitation dilemma:

Explorative early, exploitative later

ε-greedy policy with decreasing ε




Limited training examples


Only 6 discrete flight conditions:


2000 samples for KNNPI
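"Explorative early, exploitative later" can be sketched as an exploration rate that shrinks with the episode count. The linear form and the start and end values are assumptions; only the 200-episode horizon comes from the briefing.

```python
def epsilon_schedule(episode, n_episodes=200, eps_start=1.0, eps_end=0.05):
    """Linearly decay the exploration rate from eps_start to eps_end.

    episode is zero-based; values outside the range are clamped.
    """
    frac = min(max(episode / (n_episodes - 1), 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


# Exploration rate at the first, middle, and last learning episode.
eps_first = epsilon_schedule(0)
eps_mid = epsilon_schedule(100)
eps_last = epsilon_schedule(199)
```

Early episodes then try voltages almost at random, while late episodes mostly follow the learned policy.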




Demo


Comparison of True Optimal Shape and Learned Shape

KNN learns poorly for several flight conditions


What Happened?

Function Approximation

Errors remained which could not be eliminated with additional training.

Use Galerkin-based Sequential Function Approximation (SFA) to approximate the action-value function Q(s,a)


Comparison of True Optimal Shape and Learned Shape

New SFA approach learns optimal shape well

Comparison: Normalized RMS error

Method       Y dimension    Z dimension
KNN          1.42           0.821
SFA          1.27           0.661
Reduction    10%            20%
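The reduction figures can be checked directly from the tabulated errors. A short worked calculation, with a hypothetical helper name:

```python
def percent_reduction(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return 100.0 * (before - after) / before


# Normalized RMS errors from the comparison: KNN first, then SFA.
y_reduction = percent_reduction(1.42, 1.27)
z_reduction = percent_reduction(0.821, 0.661)
```

This gives roughly 10.6% for the Y dimension and 19.5% for the Z dimension, consistent with the quoted 10% and 20%.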


What Does This Show?

Reinforcement Learning successfully learns the optimal control policy that results in the optimal shape at every flight condition.

Can function in real-time, leading to better performance as system operates over the long term.

Adaptive-Reinforcement Learning Control is a promising candidate for control of Mission Morphing.

Maintains asymptotic tracking in the presence of parametric uncertainties and initial condition errors.

Shape Changes for "Mission Morphing" can be treated as piecewise constant parameter changes

SAMI is a favorable method for trajectory tracking control

"Morphing for Control" will require different control strategy

Piecewise constant approximation no longer valid

Issues & Future Directions



Realistic structural response effects


Aeroelastic behaviour


SMA models and hysteretic behaviour


Preisach model is algebraic, only has major hysteresis loops

Solution: roll your own with R-L

Issues & Future Directions



Time scale problem: control methodologies to handle faster shape changes


Hovakimyan’s Adaptive Control


Linear Parameter Varying (LPV) control



Novel distributed sensing and distributed actuation on a large(!) scale


Insect and avian inspired sensing



Learning on a continuous domain


Continuous versus discrete



Modify the simulation to include a more advanced aircraft model


Wing-Body, Wing-Body-Empennage, etc.




Build and fly R/C class morphing demonstrator UAV




Issues & Future Directions


R-L For Morphing Airfoils & Wings

Incorporate aerodynamic and structural effects due to large shape changing

Cost Function

Potential components

Specified C_L

Minimum drag

Minimum peak stress

Degrees of Freedom

Thickness: 6% to 24%

Camber: 0% to 10%

Max camber location: 0.2c to 0.8c

Chord: 1 unit to ? units

Angle-of-attack: -5° to 10°

Within linear range
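The degrees of freedom and cost components listed above could be encoded as sketched below. This is only an illustration: the weighted-sum form of the cost, the weights, and all names are assumptions, and chord is left out because its upper bound is not given.

```python
# Bounds from the slide, as fractions of chord (angle in degrees).
BOUNDS = {
    "thickness": (0.06, 0.24),
    "camber": (0.00, 0.10),
    "max_camber_loc": (0.2, 0.8),
    "alpha_deg": (-5.0, 10.0),
}


def clamp_shape(shape):
    """Keep every morphing parameter inside its allowed range."""
    return {k: min(max(shape[k], lo), hi) for k, (lo, hi) in BOUNDS.items()}


def shape_cost(c_l, c_l_target, drag, peak_stress,
               w_lift=1.0, w_drag=1.0, w_stress=1.0):
    """Hypothetical weighted cost: lift error, drag, and peak stress."""
    return (w_lift * (c_l - c_l_target) ** 2
            + w_drag * drag
            + w_stress * peak_stress)


requested = {"thickness": 0.30, "camber": -0.1,
             "max_camber_loc": 0.5, "alpha_deg": 12.0}
feasible = clamp_shape(requested)
cost = shape_cost(c_l=0.5, c_l_target=0.5, drag=0.02, peak_stress=0.0)
```

A reinforcement learner would then receive the negative of this cost as its reward after each shape change.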

Morphing Airfoil Demonstration


Questions?


John Valasek

Aerospace Engineering Department

Texas A&M University

3141 TAMU

College Station, TX 77843-3141

(979) 845-1685

valasek@aero.tamu.edu



FSL Web Page


http://flutie.tamu.edu/~fsl