C28x Core - Hostindiaevents

louisianabody, Electronics - Devices

21 Nov 2013






Embedded Processing Portfolio

Microcontroller (MCU)
  MSP430 MCU: 16-bit ultra-low-power and Value Line MCUs
    Applications: measurement, sensing, general purpose, consumer, medical
  C2000 MCU: 32-bit real-time MCUs
    Delfino, Piccolo single-core MCU; Concerto C28x + ARM Cortex™-M
    Applications: motor control, digital power, lighting, renewable energy, smart grid
  Stellaris® ARM MCU & Hercules Safety ARM MCU: 32-bit MCUs
    ARM Cortex™-M; ARM Cortex™-R
    Applications: motion control, human machine interface, industrial automation, smart grid, safety, transportation, industrial & medical

ARM®
  Sitara ARM MPU: 32-bit microprocessors
    ARM Cortex-A8; ARM9™
    Applications: industrial automation, point-of-service, human machine interface, portable navigation

Digital Signal Processor (DSP)
  Single-core DSPs, 16/32-bit: C6000 & C5000 single-core DSP
    C6000 high-performance fixed/floating-point DSP; C5000 ultra-low-power fixed-point DSP
    Applications: connected audio/voice, video, fingerprint biometrics, portable medical, sensors
  Multicore DSPs, 32-bit: C6000-based multicore DSP
    Fixed/floating-point: DSP + ARM; C66x multicore DSP; DaVinci video processors
    Applications: high-performance real-time computing, video security and analytics, video communications, multimedia infrastructure

Software, Tools, Kits & Boards: MCU; DSP & ARM® MPU

C67x Architecture and Features



[Block diagram: ’C62x fixed-point CPU core (C6x VLIW CPU core). Program fetch, instruction dispatch, instruction decode. Data path 1: A register file with units L1, S1, M1, D1. Data path 2: B register file with units L2, S2, M2, D2. Control registers, control logic, interrupts, emulation, test.]


DSP architecture challenge:
  DSP algorithms have a high degree of parallelism
  Cost-effective control of parallelism is difficult

VLIW architecture solution:
  Provides simple, cost-effective control of parallelism
    fetches 8 instructions/cycle
    executes 1-8 instructions/cycle, reducing code size, program fetches, and power consumption
  Can support high-performance compilers
    3x improvement in efficiency based on DSP benchmark suite
  Can scale to support architectural enhancements

C67x Floating-Point Core

Performance (Comm/Ind)
  IEEE floating-point format
    Double precision
    Single precision
  668 multiplies & accumulates, single precision
    2 multipliers (334 MFLOPS)
    2 ALUs (334 MFLOPS)
  420 MFLOPS, double precision
  250 multiplies & accumulates, double precision
    1 result/4 cycles (83.5 MFLOPS)
    1 result/2 cycles (167 MFLOPS)
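
The single- and double-precision figures on this slide can be reproduced with simple arithmetic. The sketch below is only an illustration under stated assumptions: a 167 MHz clock (the C6701B rate quoted in the comparison slide), and two units of each kind producing one result every 1, 2, or 4 cycles. The function name `mflops` is hypothetical.

```python
CLOCK_MHZ = 167  # assumption: C6701B clock rate from the comparison slide


def mflops(units, cycles_per_result, clock_mhz=CLOCK_MHZ):
    """Peak results per microsecond for `units` identical functional units."""
    return units * clock_mhz / cycles_per_result


# Single precision: one result per cycle per unit (assumption).
sp_multipliers = mflops(units=2, cycles_per_result=1)
sp_alus = mflops(units=2, cycles_per_result=1)

# Double precision: one result every 4 cycles (multiply) or 2 cycles (add).
dp_multipliers = mflops(units=2, cycles_per_result=4)
dp_alus = mflops(units=2, cycles_per_result=2)

print("SP multiply + accumulate:", sp_multipliers + sp_alus)
print("DP multiply + accumulate:", dp_multipliers + dp_alus)
```

Under these assumptions the single-precision sum matches the 668 figure, and the double-precision sum (250.5) matches the slide's rounded 250.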


VelociTI™: Speed with efficiency

Execute:
  CPU executes 1 to 8 instructions/cycle
  As a result, fetch packets can contain multiple execute packets
  Parallelism is determined at compile/assembly time and can be:
    Fully serial
    Serial/parallel
    Fully parallel
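
How a fetch packet breaks down into execute packets can be sketched in a few lines. The slide does not say how the grouping is encoded. The sketch assumes the C6000 convention of a per-instruction parallel bit (p-bit), where a p-bit of 1 means the next instruction runs in the same cycle. The function name and the mnemonics are placeholders.

```python
def execute_packets(fetch_packet):
    """Split a fetch packet into execute packets.

    fetch_packet: list of (mnemonic, p_bit) pairs, at most 8 entries.
    A p_bit of 1 chains the next instruction into the same execute packet.
    """
    packets, current = [], []
    for mnemonic, p_bit in fetch_packet:
        current.append(mnemonic)
        if p_bit == 0:  # chain ends here; the next instruction starts a new cycle
            packets.append(current)
            current = []
    if current:  # tolerate a trailing chain in this simplified model
        packets.append(current)
    return packets


names = list("ABCDEFGH")
fully_serial = execute_packets([(n, 0) for n in names])
fully_parallel = execute_packets([(n, 1) for n in names[:-1]] + [("H", 0)])
mixed = execute_packets(list(zip(names, [1, 1, 0, 0, 1, 0, 1, 0])))

print(len(fully_serial), len(fully_parallel), mixed)
```

The three calls correspond to the fully serial, fully parallel, and serial/parallel cases named on the slide.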


Floating-Point DSP Comparison

Feature | C6701B (167 MHz) | C6713B (200 MHz) | C6727 (250 MHz)
MIPS | 167 x 8 = 1336 | 1600 | 2000
MFLOPS | 1000 | 1200 | 1500
Architecture | C67x | C67x | C67x+
Memory | 64KB data memory, 64KB program memory | 4KB L1-P, 4KB L1-D, 256KB L2 cache/SRAM | 32KB L1-P, 256KB L2 SRAM, 384KB ROM
HPI | HPI-16 | 1 32/16-bit | 1 UHPI 32/16-bit
EMIF | 100MHz 32-bit (SDRAM) | 100MHz 32-bit (SDRAM) | 100MHz 32-bit (SDRAM)
DMA | 4-ch DMA | 16-ch EDMA | 16-ch dMAX
McBSP | 2 | 2 | 0
McASP | 0 | 2 | 3
I2C | 0 | 2 | 3
SPI | 0 | 0 | 2 (10MHz)
Package | 429-pin ceramic BGA (27mm, 1.27mm); 352-pin plastic BGA (35.2mm, 1.27mm) | 272-pin PBGA (27x27mm, 1.27mm) | 256-pin PBGA (16x16mm, 1.0mm); ceramic package TBD

Software Compatible
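
The MIPS and MFLOPS rows scale linearly with clock rate. The sketch below reproduces them, assuming 8 instructions per cycle for MIPS (the slide's own "167 x 8") and 6 floating-point results per cycle for MFLOPS. The factor of 6 is an inference from the table values, not something the slide states, and the function names are hypothetical.

```python
INSTRUCTIONS_PER_CYCLE = 8  # from the slide: 167 x 8 = 1336
FP_RESULTS_PER_CYCLE = 6    # assumption inferred from the MFLOPS row


def peak_mips(clock_mhz):
    return clock_mhz * INSTRUCTIONS_PER_CYCLE


def peak_mflops(clock_mhz):
    return clock_mhz * FP_RESULTS_PER_CYCLE


for device, clock_mhz in [("C6701B", 167), ("C6713B", 200), ("C6727", 250)]:
    print(device, peak_mips(clock_mhz), "MIPS,", peak_mflops(clock_mhz), "MFLOPS")
```

At 167 MHz this gives 1002 MFLOPS, which the slide rounds to 1000.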


TMS320C672x Device Overview

Large on-chip memory
  384KB on-chip ROM
  256KB on-chip RAM
  32KB instruction cache (internal memory + EMIF)
  EMIF for expansion

Enhanced audio I/O
  16 serial data pins
  Up to 6 different clock rates

dMAX
  Support for DMA, circular and multi-tap memory delay (for reverb)

HPI supports mux A/D and non-mux A/D

300 MHz DSP core
  300 MHz 67x+™ core
  64 registers + additional FP instructions
  Code compatible with 6713 devices

[Block diagram: TMS320C672x floating-point DSP. C67x+™ DSP core; instruction cache (32K bytes); ROM (768K bytes); SRAM (256K bytes); memory controller; switch; dMAX (config, control); EMIF; HPI; peripherals: McASP 0, McASP 1, McASP 2, SPI 0, SPI 1, IIC 0, IIC 1, RTI timer.]


Memory Architecture

New memory architecture
  Improved instruction cache
    Size increased from 4KB to 32KB
    Cache miss penalty to internal memory reduced by 40%
    Supports internal RAM/ROM and EMIF
  Direct single-level flat memory for data, single-cycle access (ROM and RAM)
  All RAM and ROM is accessible as program or data (like C6713)


Changes in 67x+
  All changes are backwards compatible with the 67x CPU (C6713)
  General-purpose registers increased from 32 to 64
  New MPYSPDP instruction
    SP x DP into DP
  New MPYSP2DP instruction
    SP x SP into DP
  Additional ADDSP, ADDDP, SUBSP, SUBDP in S unit
    Now have 4 floating-point adds or subtracts in parallel
  Execute packets can span fetch packets (64x feature)
    Code size reduction (5 to 10%) since no padding with NOPs
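
The code-size effect of letting execute packets span fetch packets can be illustrated by counting instruction slots. This is a simplified sketch that assumes 8-slot fetch packets and pads with NOPs whenever an execute packet would cross a fetch-packet boundary. The function names and the example packet sizes are invented for illustration.

```python
FETCH_PACKET_SLOTS = 8


def slots_without_spanning(execute_packet_sizes):
    """Instruction slots used when an execute packet may not cross a boundary."""
    used = 0
    for size in execute_packet_sizes:
        free = FETCH_PACKET_SLOTS - used % FETCH_PACKET_SLOTS
        if size > free:
            used += free  # pad the rest of this fetch packet with NOPs
        used += size
    return used


def slots_with_spanning(execute_packet_sizes):
    """Instruction slots used when execute packets may span fetch packets."""
    return sum(execute_packet_sizes)


sizes = [5, 4, 6, 3]  # hypothetical execute packet sizes, each at most 8
padded = slots_without_spanning(sizes)
dense = slots_with_spanning(sizes)
print(padded, dense, padded - dense)
```

The example is deliberately padding-heavy. The 5 to 10% reduction quoted on the slide reflects real code, where most execute packets already fit.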

Enhancements


DP, Code Density
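
One reason an "SP x SP into DP" multiply is useful: the product of two single-precision values generally needs more significand bits than single precision can hold, but it always fits in double precision. The sketch below shows this numerically using only the Python standard library. It illustrates the arithmetic and is not a model of the MPYSP2DP hardware. The helper `to_float32` is hypothetical.

```python
import struct
from fractions import Fraction


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a = to_float32(1.1)
b = to_float32(3.3)

product_dp = a * b                   # SP x SP kept in double precision
product_sp = to_float32(product_dp)  # the same product rounded back to SP

# Compare against exact rational arithmetic.
exact = Fraction(a) * Fraction(b)
dp_is_exact = Fraction(product_dp) == exact
sp_is_exact = Fraction(product_sp) == exact

print(product_dp, product_sp, dp_is_exact, sp_is_exact)
```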

Benchmark Performance

Performance: The BDTImark™
(™ Berkeley Design Technology, Inc., Berkeley, CA)

Real block FIR filter
Complex block FIR filter
Single-sample LMS-adaptive FIR filter
Single-sample real FIR filter
Single-sample IIR filter
Vector dot product
Vector add
Vector maximum
IS-54 convolutional encoder
Finite state machine
256-point FFT

’C67x: Floating-point performance*

BDTImark™: A DSP Speed Metric

[Bar chart: BDTImark scores for TI TMS320C67x (1 GFLOPS), TI TMS320C4X (25 MIPS, 60 MFLOPS), TI TMS320C3X (30 MIPS, 80 MFLOPS), ADI ADSP-2106x (60 MIPS), and Intel Pentium (200 MHz). Bar values as extracted: 23, 17, 9, 7, 65.]

*Commercial Temp

Source www.BDTI.com. ©1999 BDTI

’C67x: Benchmark performance*

Floating-Point Performance: execution time (in Sec)

[Chart: kernels are Matrix Vector Multiply, Convolution, Block FIR, and Complex Radix-4 FFT; devices compared are a typical floating-point DSP (60 MFLOPS) and the TI TMS320C6701 (1 GFLOPS). Values as extracted: 108.33, 0.420, 0.828, 13.296 and 149, 16.6, 1.25, 1,672.]

*Commercial Temp

C28x Digital Signal Controller: TMS320F2812

[Block diagram: 150 MIPS C28x™ 32-bit DSP core with 32x32-bit multiplier, R-M-W atomic ALU, 32-bit register file, 32-bit timers (3), real-time JTAG, and interrupt management. Memory bus: 128Kw flash + 1Kw OTP, 18Kw RAM, 4Kw boot ROM, XINTF. Peripheral bus: Event Mgr A, Event Mgr B, 12-bit ADC, watchdog, GPIO, McBSP, CAN 2.0B, SCI/UART-A, SCI/UART-B, SPI.]

TMS320F2812 Features and Benefits

Feature: 150-MHz C28x 32-bit DSP core
Benefit: The C28x 32-bit DSP core enables high-speed execution of control algorithms. Faster control code execution gives headroom for advanced control techniques, enabling great efficiency and cutting-edge features.

Feature: Unique control peripherals
Benefit: The 12-bit high-speed dual sample-and-hold ADC allows simultaneous sampling of power system currents and voltages; Event Manager modules provide a hardware interface for sensored or sensorless three-phase inverter control.

Feature: On-chip communication peripherals
Benefit: CAN, I2C, SPI, UART, and external memory interface allow for a full system implementation.

C28x CPU

32-bit fixed-point DSP
  RISC instruction set
  8-stage protected pipeline
  32x32-bit fixed-point MAC for single-cycle 32-bit multiply
  Dual 16x16-bit fixed-point MACs
  Single-cycle instruction execution

Modified Harvard bus architecture
  Separate data and instruction buses
  Two data buses: one for read, one for write
  Enables fetch, read, and write in a single cycle
  Essential to maximizing single-cycle MAC

Emulation logic
  Real-time emulation allows interrupt servicing even when main program is halted
  Debug host has direct access to registers and memory
  Multiple hardware debug events and breakpoints

C28x Core: Bus Structure

[Block diagram: Program Address Bus (22), Program Data Bus (32), Program Write Bus (32), Data Address Bus (32), Data Data Bus (32), Data Write Bus (32), Register Bus. Registers: XAR0 to XAR7, SP, DP @X, ARAU. Execution: MPY32x32, XT, P, ACC, ALU, R-M-W atomic ALU. Debug: real-time emulation & test engine, JTAG. Memory: program (4 M * 16), data (4 G * 16). Standard peripherals; external interfaces.]

The C28x multiple bus architecture makes better use of the
processor cycles: Instruction fetch, decode and execute can
happen on the same clock cycle
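
The bus widths in the diagram determine the address space sizes it quotes. The short calculation below makes that explicit, assuming each address selects one 16-bit word (as the "* 16" in the diagram suggests) and that "Kw" on the F2812 slide means 1024 words. The function names are hypothetical.

```python
WORD_BITS = 16  # assumption: each address selects one 16-bit word


def addressable_words(address_bits):
    return 2 ** address_bits


def size_in_bytes(words):
    return words * WORD_BITS // 8


program_words = addressable_words(22)  # Program Address Bus (22)
data_words = addressable_words(32)     # Data Address Bus (32)
flash_bytes = size_in_bytes(128 * 1024)  # F2812: 128Kw flash

print(program_words // 2**20, "M words of program space")
print(data_words // 2**30, "G words of data space")
print(flash_bytes // 1024, "KB of flash")
```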

C28x Core: Protected Pipeline


Protected Pipeline
  Order of results is as written in source code
  Programmer need not worry about the pipeline

Writes: ?

are “free”

8-stage pipeline
  F1: Instruction address
  F2: Instruction content
  D1: Decode instruction
  D2: Resolve operand address
  R1: Operand address
  R2: Get operand
  X: CPU doing "real" work
  W: Store content to memory

[Pipeline diagram: instructions A through G enter the pipeline on successive cycles and overlap across stages F1, F2, D1, D2, R1, R2, X, W. E & G access same address, so the read stages of G are held back until the write of E has completed.]
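
The effect of pipeline protection on the "E & G access same address" case can be sketched with a toy scheduler. This is a simplified model, not the actual C28x interlock logic. It assumes one instruction enters per cycle, that reads happen from R1 onward and writes in W, and that a dependent read is delayed until the cycle after the earlier write. All names are hypothetical.

```python
STAGES = ["F1", "F2", "D1", "D2", "R1", "R2", "X", "W"]
R1 = STAGES.index("R1")


def schedule(instructions):
    """instructions: list of (name, reads, writes) with sets of addresses.

    Returns {name: {stage: cycle}} with stalls inserted before R1 whenever
    an earlier instruction has not yet written an address that is read.
    """
    timeline = {}
    pending_writes = {}  # address -> cycle of the W stage that writes it
    delay = 0            # stall cycles accumulated by earlier instructions
    for index, (name, reads, writes) in enumerate(instructions):
        start = index + delay
        blockers = [pending_writes[a] for a in reads if a in pending_writes]
        stall = max(0, max(blockers) + 1 - (start + R1)) if blockers else 0
        cycles = {}
        for offset, stage in enumerate(STAGES):
            cycles[stage] = start + offset + (stall if offset >= R1 else 0)
        for address in writes:
            pending_writes[address] = cycles["W"]
        timeline[name] = cycles
        delay += stall
    return timeline


program = [(name, set(), set()) for name in "ABCD"]
program += [("E", set(), {0x100}), ("F", set(), set()), ("G", {0x100}, set())]
timeline = schedule(program)
for name, cycles in timeline.items():
    print(name, cycles)
```

In this model G reaches D2 on schedule but its R1 stage is pushed back two cycles, so the result order matches the source order without any action by the programmer.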

Many MCUs
  Shared bus for program and data address and content
  Typically results in only one instruction in 4 cycles

C28x Core: Instruction set for Control

Read/Modify/Write and Atomic Operation
Offers sufficient hardware resources to efficiently handle control algorithms

[Diagram: CPU (registers, ALU / MPY) and memory. A read-modify-write is a LOAD (READ from memory), a modify in the CPU, and a STORE (WRITE to memory).]

RISC Read/Modify/Write (6 words / 5 cycles)
    SETC INTM               ; mask interrupts
    MOV AL,*XAR2            ; read
    AND AL,#1234h           ; modify
    MOV *XAR2,AL            ; write
    CLRC INTM               ; unmask interrupts

DSP Read/Modify/Write (5 words / 4 cycles)
    SETC INTM               ; mask interrupts
    AND AL,*XAR2,#1234h     ; read and modify
    MOV *XAR2,AL            ; write
    CLRC INTM               ; unmask interrupts

C28x Atomic Operation (2 words / 1 cycle)
    AND *XAR2,#1234h        ; read, modify, and write in one instruction

Atomic instructions benefits:
  Simpler programming
  Smaller, faster code
  Non-interruptible operations
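
Why a read-modify-write must be either guarded (SETC INTM / CLRC INTM) or atomic can be shown with a small simulation. The sketch below models memory as a dictionary and an interrupt as a callback. It is a conceptual illustration, not C28x code, and all names in it are hypothetical.

```python
ADDRESS = 0x0010
MASK = 0x1234


def unguarded_rmw(memory, address, mask, interrupt=None):
    """Separate read, modify, write steps with interrupts left enabled."""
    value = memory[address]         # read
    if interrupt:
        interrupt(memory)           # ISR fires between the read and the write
    memory[address] = value & mask  # write back a stale value


def atomic_rmw(memory, address, mask, interrupt=None):
    """Read, modify, and write as one non-interruptible step."""
    memory[address] &= mask
    if interrupt:
        interrupt(memory)           # ISR can only run before or after


def isr_clear_bit2(memory):
    memory[ADDRESS] &= ~0x0004      # the ISR clears bit 2 of the same word


racy = {ADDRESS: 0xFFFF}
unguarded_rmw(racy, ADDRESS, MASK, isr_clear_bit2)

safe = {ADDRESS: 0xFFFF}
atomic_rmw(safe, ADDRESS, MASK, isr_clear_bit2)

print(hex(racy[ADDRESS]), hex(safe[ADDRESS]))
```

In the unguarded case the ISR's update is lost (bit 2 ends up set again). In the atomic case both updates survive.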

PIE: Peripheral Interrupt Expansion

[Block diagram labels. Internal sources: EV and non-EV peripherals (EV, ADC, SPI, SCI, McBSP, CAN); TINT0, TINT1, TINT2. External sources: XINT1, XINT2, PDPINTx, RS, XNMI_XINT13. PIE (Peripheral Interrupt Expansion). C28x core inputs: INT1, INT2, INT3, ... INT12, INT13, INT14, NMI, RS.]

C28x Core: Fast Interrupt Response

[Timing diagram labels: INTx; vector fetch; auto context save; 8; decode 1st ISR instruction; latency.]


Latency: time between when an interrupt occurs and decoding (D2) of the first ISR instruction

Minimum latency:
  Internal peripherals: 10-14 cycles (100 ns @ 100 MHz)
  External signals: 11 cycles (110 ns @ 100 MHz)

Maximum latency: depends on wait states, ready, INTM, etc.
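
The cycle counts above convert to time by dividing by the CPU clock. A minimal sketch follows, with a hypothetical helper name. The 150 MHz case uses the F2812 clock quoted earlier in the deck and is shown only as an extra data point.

```python
def latency_ns(cycles, clock_mhz):
    """Convert a latency in CPU cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


print(latency_ns(10, 100))            # internal peripherals, lower bound
print(latency_ns(11, 100))            # external signals
print(round(latency_ns(14, 150), 1))  # 14 cycles at 150 MHz
```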

[Timing diagram labels. Internal signal: sync, set PIEIFR (1), PIE HW, set IFR (1), interrupt jammed into pipeline. External signal: sync interrupt signal (2).]

Latency is Minimized

C2000 real-time controllers software

Software Highlights

ControlSuite™ Software
  Software infrastructure and tools for every stage of development and evaluation
  Allows customers to focus on differentiation, not basics
  Key functional areas:
    Device Support (bit fields, API drivers, examples)
    Library Repository (math library, DSP library, application library, utilities)
    Development Kits (hardware package, software examples, complete system framework, graphical user interfaces)
    Debug and Software Tools (IDE, RTOS, emulation)

Integrated Development Environment (IDE)
  Eclipse-based Code Composer Studio™ IDE supports all

Application-Specific Software:
  Motor Control Software Library
    Supports multiple motor types and control techniques (e.g., FOC (sensored and sensorless) for ACI, PMSM)
  Digital Power Software Library
    Library for both C28x core and CLA

[Diagram labels: ControlSuite; Tools/Reference Designs; ControlSticks; ControlCards; Evaluation Kits; Application Notes; Users Guide; Getting Started.]

Development Tools

Tools

Code Composer is an Integrated Development Environment (IDE) similar to MS Visual C++ and built specifically for DSP

DSP/BIOS is a library of scheduling, instrumentation, and communications functions that provides real-time analysis and RTDX™ (Real-Time Data Exchange)

Hardware emulation and evaluation tools allow code debug on actual silicon and low-cost analysis of performance in early stages of the development cycle

Code Composer Studio provides an extensible tool plug-in and seamless integration between the host and target DSP tools

10/19/11

CCSv4/v5
  Tabbed editor windows
  Tab data displays together to save space
  Fast view windows don’t display until you click on them
  Perspectives contain separate window arrangements depending on what you are doing
  Customize toolbars & menus

Code Composer Studio v5

CCSv5 is split into two phases
  5.0
    Not a replacement for CCSv4
    Targeted at users who are using devices running Linux & multi-core C6000
    Addresses a need (Linux debug) that is not supported by CCSv4
    Available today
  5.1
    Replacement for CCSv4, targeted at all users
    Available fall 2011

Supports both Windows & Linux
  Note that not all emulators will be supported on Linux
    SD DSK/EVM onboard emulators, XDS560 PCI are not supported
  Most USB/LAN emulators will be supported
    XDS100, SD 510USB/USB+, 560v2, BH 560m/bp/lan
  http://processors.wiki.ti.com/index.php/Linux_Host_Support



Code Composer Studio v4
  Easy to use, Eclipse-based IDE: compiler, linker, more
  Supports all MSP430 MCUs
  Enhancements since CCE v3:
    Speed
    Code size improvements
    Auto-updating
  $495 for CCS v4 MCU Edition
  Free for apps <16KB
  Identical look and feel as Code Composer Essentials

http://wiki.msp430.com/wiki/index.php?title=Category:Code_Composer_Studio_v4

Graphical Signal Analysis

Analyze: Visualize data
  View signals in native format
  Change variables on the fly and see their effects
  Numerous application-specific graphical plots
    FFT waterfall
    Eye diagram
    Constellation plot
    Image displays & more
  Requires no additional code

BACKUP

C6701 DSP Block Diagram


C672x DSP Block Diagram



THANK YOU
