Introduction to the TMS320C5402 Processors


Presentation: "Introduction to the TMS320C5402 Processors". Transcript of the presentation:

1 Introduction to the TMS320C5402 Processors
University of Padova, Department of Information Engineering. C5402: Introduction to the TMS320C5402 Processors

2 Why go “digital”?
Digital signal processing techniques are now so powerful that it is often extremely difficult, if not impossible, to obtain the same results with classical analog processing techniques. Examples: linear-phase FIR filters, adaptive filters.

3 Why go “digital”?
Analog signal processing relies on “analog” components such as resistors, capacitors and inductors. The precision and stability of an analog circuit can be compromised by several factors: component tolerances, temperature, voltage changes and mechanical vibration.

4 Why go “digital”?
With a DSP it is very simple to: implement and/or modify a computation algorithm; process acquired signals; interface with computers. In addition, a DSP reduces: sensitivity to electromagnetic interference (EMI); the number of ICs in a system; the development time of a system; costs; power consumption (e.g. CMOS technology).

5 Why not go “digital”?
Sometimes high-frequency signals cannot be processed digitally, for two reasons: analog-to-digital converters (ADCs) cannot handle wide-bandwidth signals while maintaining adequate resolution; the application may be so complex that it cannot be implemented in real time.

6 Working in real time
The definition of real time depends on the application. Whenever an algorithm is interfaced with the outside world, it must work in real time. Example: a 100-coefficient FIR filter runs in real time if the DSP can complete the following operations in the interval between two samples:

7 Working in real time
Timing diagram: the sampling interval from nT to (n+1)T is split into processing time and waiting (idle) time. An application can run in real time if the processing time is shorter than the sampling period, that is, if the CPU's waiting time is non-zero (this time can be used for secondary tasks).
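The real-time condition above can be checked with a simple cycle budget. The following is a minimal sketch, assuming one cycle per filter tap and a fixed per-sample overhead; the function names, the 48 kHz sampling rate and the 20-cycle overhead are illustrative assumptions, not figures from the slides.

```python
def cycles_per_sample(cpu_hz, sample_rate_hz):
    """CPU cycles available between two consecutive samples."""
    return cpu_hz / sample_rate_hz


def fir_cycles(n_taps, cycles_per_tap=1, overhead_cycles=0):
    """Cycles spent producing one FIR output (assumed cost model)."""
    return n_taps * cycles_per_tap + overhead_cycles


def is_real_time(n_taps, cpu_hz, sample_rate_hz, cycles_per_tap=1, overhead_cycles=0):
    """True when processing finishes strictly before the next sample arrives."""
    used = fir_cycles(n_taps, cycles_per_tap, overhead_cycles)
    return used < cycles_per_sample(cpu_hz, sample_rate_hz)


def idle_fraction(n_taps, cpu_hz, sample_rate_hz, cycles_per_tap=1, overhead_cycles=0):
    """Share of the sampling interval left over for secondary tasks."""
    used = fir_cycles(n_taps, cycles_per_tap, overhead_cycles)
    return 1.0 - used / cycles_per_sample(cpu_hz, sample_rate_hz)


if __name__ == "__main__":
    # 100-tap FIR on a 100 MHz CPU at an assumed 48 kHz sampling rate.
    print(is_real_time(100, 100_000_000, 48_000, overhead_cycles=20))
    print(round(idle_fraction(100, 100_000_000, 48_000, overhead_cycles=20), 3))
```

With these assumptions the filter uses 120 of roughly 2083 available cycles, so most of each sampling interval is waiting time.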

8 Why use a DSP processor
Why not use a general-purpose processor (GPP), such as the Pentium, instead of a DSP? Compare the power consumption of a Pentium and of a DSP. Compare the cost of a Pentium and of a DSP.

9 Why use a DSP processor
A DSP is the better choice when the design needs: lower cost; small board area; low power consumption; real-time processing of several “high”-frequency signals. A GPP is the better choice when the design needs: large amounts of memory; advanced operating systems.

10 What are the typical algorithms for a DSP
The sum of products (SOP) is the key element of most DSP algorithms:
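A sum of products can be sketched in a few lines. The example below is illustrative only: the function names are hypothetical, and the FIR filter is shown as one common algorithm built on the SOP, assuming zero input before the first sample.

```python
def sum_of_products(coeffs, samples):
    """Return sum(a_k * x_k) over two equal-length sequences."""
    if len(coeffs) != len(samples):
        raise ValueError("coeffs and samples must have the same length")
    acc = 0
    for a, x in zip(coeffs, samples):
        acc += a * x  # one multiply-accumulate per term
    return acc


def fir_filter(coeffs, signal):
    """y[n] = sum_k coeffs[k] * signal[n - k], with signal[m] = 0 for m < 0."""
    output = []
    for n in range(len(signal)):
        acc = 0
        for k, a in enumerate(coeffs):
            if n - k >= 0:
                acc += a * signal[n - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    print(sum_of_products([1, 2, 3], [4, 5, 6]))
    # Feeding a unit impulse returns the coefficients themselves.
    print(fir_filter([3, 2, 1], [1, 0, 0, 0]))
```

Each output sample costs one multiply and one add per coefficient, which is why the speed of that pair of operations dominates DSP performance.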

11 Hardware multiplication
DSP processors are optimized for addition and multiplication. Multiplication in parallel with addition is implemented in hardware (the MAC unit). Execution time: one machine cycle.

12 Fixed-point and floating-point DSPs
Applications that require high precision, wide dynamic range, high signal-to-noise ratio and ease of use need a floating-point processor. Drawbacks of floating-point processors: high power consumption; high cost.

13 Fixed-point and floating-point DSPs
The application dictates which type of device/platform to use in order to get the best performance at the lowest cost. For teaching purposes, a fixed-point DSP (the C5402) is used.
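To make the fixed-point trade-off concrete, here is a minimal sketch of 16-bit fractional arithmetic. It assumes the common Q15 convention (1 sign bit, 15 fractional bits), which the slides do not name explicitly; all function names are illustrative.

```python
Q15_SCALE = 1 << 15            # 32768: weight of the value 1.0 in Q15
Q15_MIN, Q15_MAX = -32768, 32767


def to_q15(value):
    """Quantize a float in [-1, 1) to a Q15 integer, saturating at the limits."""
    scaled = int(round(value * Q15_SCALE))
    return max(Q15_MIN, min(Q15_MAX, scaled))


def from_q15(raw):
    """Convert a Q15 integer back to a float."""
    return raw / Q15_SCALE


def q15_mul(a, b):
    """Multiply two Q15 values and rescale the product to Q15 (truncating).

    Corner case: -1.0 * -1.0 is not representable and yields 32768 here;
    real code would saturate it.
    """
    return (a * b) >> 15


if __name__ == "__main__":
    half = to_q15(0.5)
    print(half, q15_mul(half, half), from_q15(q15_mul(half, half)))
    # Quantization error of a value that is not exactly representable.
    print(abs(from_q15(to_q15(0.1)) - 0.1))
```

The representable range is fixed at [-1, 1) with a step of 2^-15, so the programmer must scale signals by hand; a floating-point device removes that burden at the price of power and cost.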

14 The Texas Instruments TMS320 family of processors
Different subfamilies exist to cover different markets.
C2000 (low cost): control systems, motor control, storage, digital controllers.
C5000 (efficiency, high MIPS per watt / dollar / size): wireless telephony, Internet audio players, digital still cameras, modems, telephony, VoIP.
C6000 (performance and ease of use): multi-channel and multi-function applications, common infrastructure, wireless base stations, DSL, image processing, multimedia servers.

15 DSP block diagram
This is the general architecture of most microprocessors. DSPs are based on a Harvard architecture, in which program and data memories are separate and have separate access buses.

16 C54x Block Diagram
C54x block diagram (C5416 example).
CPU: 17x17 MAC unit (17 x 17 multiplier, 40-bit adder, RND and SAT); saturation and rounding hardware; two 40-bit accumulators (ACC A, ACC B); 40-bit ALU with CMPS operator (Viterbi); 40-bit barrel shifter (-16, 31); temporary register; exponent encoder; program and data address generation units (2 addressing units, 8 auxiliary registers); compare, select and store unit; 4 internal bus pairs; external interface.
Memory: program/data SRAM, 128K words; program/data ROM, 16K words.
Peripherals (on the peripheral bus): DMA (Ch 0 to Ch 5); 8/16-bit Host Port Interface (HPI); Multichannel Buffered Serial Port (McBSP); timer; muxed GP I/O; PLL clock generator; S/W wait-state generator; power management; JTAG test/emulation control.
External buses: D(15-0), A(23-0), program/data buses.

17 Central Processing Unit
17x17 MAC unit; saturation and rounding hardware; two 40-bit accumulators; 40-bit ALU; 40-bit barrel shifter; temporary register; exponent encoder; program and data address generation units; compare, select and store unit; 4 internal bus pairs; external interface.
Presenter notes: Some people just can’t live without seeing the entire block diagram of the part. Of course it’s utterly confusing without a map to guide you. Switch back to the previous slide if you must. Identify the parts that we covered. Show the extra parts that we did not, such as Viterbi, shifters, etc. When you see Eric and me present, circle the areas we cover in your hardcopy. Do not go into much detail: 3 minutes max on this foil.
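The role of the 40-bit accumulators can be illustrated with a small model. This is a simplified sketch, not the device's exact behavior: it assumes 16-bit signed operands, treats the accumulator as a 40-bit two's-complement register (32 bits plus 8 guard bits), and models saturation as a separate clamp to 32 bits. All names are hypothetical.

```python
ACC_BITS = 40


def wrap_signed(value, bits):
    """Interpret an integer as a two's-complement number of the given width."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def mac(acc, a, x):
    """One multiply-accumulate step into a 40-bit accumulator."""
    return wrap_signed(acc + a * x, ACC_BITS)


def saturate32(acc):
    """Clamp an accumulator value to the signed 32-bit range."""
    return max(-(1 << 31), min((1 << 31) - 1, acc))


if __name__ == "__main__":
    acc = 0
    for _ in range(4):
        acc = mac(acc, 32767, 32767)  # largest positive 16-bit product
    # The sum exceeds 32 bits but still fits thanks to the guard bits.
    print(acc, saturate32(acc))
```

The extra 8 bits let a long sum of products grow past the 32-bit range without wrapping, so overflow handling can be deferred to the end of the loop.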

18 What Problems Are We Trying To Solve?
High-speed multiply-accumulate (MAC unit, accumulator A): $y_0 = \sum_{n=0}^{3} a_n x_n$, using the single-cycle instruction MAC *AR2+, *AR3+, A
General-purpose math (ALU, accumulator B): $z = x_2 + x_4 + x_3 + x_1$, using the single-cycle instruction ADD @x2, B ...
Figure: samples x0 to x4 plotted as amplitude versus time; data read buses feeding the MAC and the ALU.
Presenter notes: Here are two of the things people buy digital signal processors for: high-speed multiply-accumulates and general-purpose math. What we’re trying to do here is build up the need for this kind of architecture. High-speed math is the primary driver for everything in this part. Note that the MAC instruction is dual-operand and needs to access two data operands per clock cycle. The A and B accumulators provide an extra level of flexibility.
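The two instructions on this slide can be modeled in software to show what each one does per cycle. This is a simplified sketch: memory is a Python list, the registers are a dictionary, and status bits, the T register and fractional mode are ignored. The helper names and the memory layout are assumptions made for illustration.

```python
def mac_postinc(memory, regs, acc):
    """Model of `MAC *AR2+, *AR3+, A`: multiply the words addressed by AR2
    and AR3, add the product to the accumulator, post-increment both pointers."""
    acc += memory[regs["AR2"]] * memory[regs["AR3"]]
    regs["AR2"] += 1
    regs["AR3"] += 1
    return acc


def add_direct(memory, symbols, name, acc):
    """Model of `ADD @name, B`: add a directly addressed word to the accumulator."""
    return acc + memory[symbols[name]]


def dot_product(memory, coeff_addr, sample_addr, count):
    """Run `count` MAC steps; returns accumulator A and the final pointers."""
    regs = {"AR2": coeff_addr, "AR3": sample_addr}
    acc_a = 0
    for _ in range(count):
        acc_a = mac_postinc(memory, regs, acc_a)
    return acc_a, regs


if __name__ == "__main__":
    # Assumed layout: a0..a3 at addresses 0..3, x0..x4 at addresses 4..8.
    memory = [1, 2, 3, 4, 10, 20, 30, 40, 50]
    symbols = {"x1": 5, "x2": 6, "x3": 7, "x4": 8}
    y0, regs = dot_product(memory, 0, 4, 4)
    acc_b = 0
    for name in ("x2", "x4", "x3", "x1"):
        acc_b = add_direct(memory, symbols, name, acc_b)
    print(y0, acc_b, regs)
```

Each MAC step reads two operands, which is why the architecture needs two data read buses to sustain one MAC per cycle.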

19 C5402 Architecture
Figure labels: Program A/D Bus (P), Data Read A/D Bus (C), Data Read A/D Bus (D), Data Write A/D Bus (E); PC, Decode, Addr Gen (DP, AR0-7), MAC, ALU, accumulators A and B. Example code: MAC *AR2+, *AR3+, A and ADD @x2, B ...
Presenter notes: Work your way from right to left. MACs generate results into the accumulators. MACs need operands. To read operands you need addresses. To generate addresses, the part needs to understand what instruction you’re trying to run. To fetch instructions you need to generate program addresses. Finally you need to store results. To store results, you need addresses. Watch out here for going into too much detail on addressing modes. That’s not the point of the foil. Work backwards from the MAC and ALU, showing the need for operands, addresses, etc.

20 C5402 Internal Memory and Buses
5402 DSK memory resources:
4Kx16 0-wait ROM (internal)
2x8Kx16 0-wait DARAM (internal)
64Kx16 1-wait SRAM (external)
256Kx16 7-wait flash (external)
Wait states are shown for a 100 MHz clock.
Buses: the P, C, D and E buses connect the CPU to internal memory and, through the external memory interface (address A, data D), to external memory.
Accesses: ROM, 1 access per block per cycle; DARAM, 2 accesses per block per cycle; external, 1 access every other cycle.
Presenter notes: Look at the previous slide, the fully loaded pipe. How are you going to support a program access, two data reads and maybe a data write in a single cycle? With terrific internal memory resources. Spread your memory usage out to limit access to 2x per cycle per DARAM block. Here we show that the CPU has access to both internal and external memory. Internal memory is 0-wait-state, power efficient and can be accessed multiple times per cycle. External memory can only be accessed once per cycle and will almost certainly be subject to wait states. Internal memory is broken up into blocks, which lets the user place code and data intelligently for maximum performance. The “buses” going into the ROM and DARAM are address and data buses. You might think that a bus shouldn’t go INTO the ROM (as if you’re writing to the ROM, which is impossible). The arrows going INTO the ROM signify the ADDRESS buses.
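The per-block access limits suggest a simple way to reason about memory conflicts. The sketch below is a rough model under stated assumptions: it uses only the limits listed on the slide (ROM 1 access per cycle, DARAM 2 per cycle, external 1 every other cycle), ignores wait states and the device's real arbitration, and uses hypothetical block labels.

```python
# (accesses allowed per slot, cycles per slot) for each kind of memory block.
LIMITS = {"ROM": (1, 1), "DARAM": (2, 1), "EXT": (1, 2)}


def cycles_needed(accesses):
    """Estimate the cycles needed to serve a set of simultaneous accesses.

    `accesses` is a list of (kind, block_index) tuples, one per bus access
    requested in the same pipeline slot.
    """
    counts = {}
    for block in accesses:
        counts[block] = counts.get(block, 0) + 1
    worst = 1
    for (kind, _index), count in counts.items():
        per_slot, slot_cycles = LIMITS[kind]
        slots = -(-count // per_slot)  # ceiling division
        worst = max(worst, slots * slot_cycles)
    return worst


if __name__ == "__main__":
    # Program fetch from ROM, two reads and one write all in DARAM block 0.
    print(cycles_needed([("ROM", 0), ("DARAM", 0), ("DARAM", 0), ("DARAM", 0)]))
    # Same work with the write moved to DARAM block 1.
    print(cycles_needed([("ROM", 0), ("DARAM", 0), ("DARAM", 0), ("DARAM", 1)]))
```

In this model, moving one operand to the other DARAM block removes the conflict, which is the placement advice the slide gives.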

21 C5402 DSK Data Memory
The C5402 can access 64Kx16 of data memory; it can also access 64Kx16 of I/O.
Data memory map (hex start address, contents):
0000  MMRs (memory-mapped registers)
0060  SPRAM (32 words)
0080  DARAM block 1 (~8Kx16)
2000  DARAM block 2 (8Kx16)
4000  External 1-wait SRAM (48Kx16), up to FFFF
All internal accesses are 0-wait. Most CPU registers can be accessed via memory-mapped locations (MMRs). The user should partition algorithm resources to avoid memory access conflicts.
Presenter notes: All C54 data access is limited to 64K. This is a primary limitation of the hardware: the address registers, data read buses, data address generation hardware, etc. are all limited to 16 bits. Internal DARAM is broken up into two main blocks so that the user can allocate memory accesses without encountering conflicts. SPRAM is a small area (32 words) accessible by memory-mapped addressing or other means. Memory-mapped registers take up the region below 60 hex. Any external memory mapped into the area below 4000 hex in this part will not be visible: accesses below 4000 hex are automatically made inside the part, and accesses at or above 4000 hex are made to external memory.
What internal peripherals are on the C5402?
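The data memory map can be expressed as a small lookup table. This is a sketch built only from the boundaries quoted on the slide (0000, 0060, 0080, 2000, 4000, FFFF); the region labels and function names are illustrative.

```python
# (first address, last address, region name), taken from the slide's map.
DATA_MAP = [
    (0x0000, 0x005F, "MMR"),
    (0x0060, 0x007F, "SPRAM"),
    (0x0080, 0x1FFF, "DARAM block 1"),
    (0x2000, 0x3FFF, "DARAM block 2"),
    (0x4000, 0xFFFF, "external SRAM"),
]


def data_region(addr):
    """Return the name of the region that a 16-bit data address falls in."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("data addresses are limited to 16 bits")
    for first, last, name in DATA_MAP:
        if first <= addr <= last:
            return name
    raise AssertionError("map should cover the whole 64K space")


def is_internal(addr):
    """Accesses below 4000 hex are served inside the part (0 wait states)."""
    return data_region(addr) != "external SRAM"


if __name__ == "__main__":
    for addr in (0x0010, 0x0060, 0x1FFF, 0x2000, 0x4000):
        print(hex(addr), data_region(addr), is_internal(addr))
```

A table like this is handy when deciding where to place buffers, since two operands read in the same cycle are best kept in different DARAM blocks.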

22 C5402 DSK Program Memory
The C5402 can address up to 1Mx16 of program memory. Only 256K of that 1M address reach is physically implemented on the DSK, as flash on pages 0 to 3 (ending at 00 FFFF through 03 FFFF).
Program memory map, per page: the lower 16K is DARAM (when OVLY=1); the upper 48K is that page's flash.
OVLY bit = 0 (the reset value): all program memory is external.
OVLY bit = 1: the 16K DARAM is mapped into ALL program memory pages (accessible as data or program), which gives code access to 0-wait memory.
Presenter notes: Program addressing in the 5402 offers two flavors: OVLY=0, where every page of program memory is separate, and OVLY=1, where internal memory is mapped to the lower portion of every program page. This allows the user to load high-speed code into internal memory and run it in zero-wait-state memory. OVLY=1 is the common usage.
Let’s take a closer look at the data memory resources...
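The effect of the OVLY bit on program addresses can be sketched as an address decoder. This is an illustrative model only: it assumes a 20-bit program address split into a 4-bit page and a 16-bit offset, a 16K overlay at the bottom of every page, and four implemented flash pages on the DSK; the function name and return format are hypothetical.

```python
PROGRAM_REACH = 1 << 20   # 1M words of program address space
DARAM_WORDS = 0x4000      # 16K words overlaid when OVLY = 1
DSK_PAGES = 4             # 256K words implemented on the DSK (assumed pages 0..3)


def program_target(addr, ovly):
    """Return (memory, page, offset) for a program address under a given OVLY bit."""
    if not 0 <= addr < PROGRAM_REACH:
        raise ValueError("program addresses are limited to 20 bits")
    page, offset = addr >> 16, addr & 0xFFFF
    if ovly and offset < DARAM_WORDS:
        # Same internal memory regardless of the page number.
        return ("internal DARAM", None, offset)
    if page >= DSK_PAGES:
        return ("not implemented on the DSK", page, offset)
    return ("external flash", page, offset)


if __name__ == "__main__":
    print(program_target(0x31000, ovly=True))
    print(program_target(0x31000, ovly=False))
    print(program_target(0x0FFFF, ovly=True))
```

With OVLY=1, code placed in the lower 16K is reachable from every page, so a far call into another page still finds the same zero-wait routines.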

23 Pipeline Drives Single-Cycle Performance
Pipeline phases:
P - generate program address
F - get opcode from program memory
D - decode instruction
A - generate data read address
R - get operands from data memory
X - execute instruction
Full pipeline: successive instructions each go through P F D A R X, staggered by one cycle. Pipeline phases maximize hardware usage, and one instruction is retired EVERY cycle. Dedicated loop control instructions (RPT and RPTB) are available to reduce pipeline flushing.
Presenter notes: Many people will not be familiar with a pipelined architecture. It takes no fewer than six cycles to execute any instruction. Once the pipeline is full, though, an instruction can be retired every cycle. Obviously you want to avoid discontinuities in program execution. Branching to a new location will cause the pipeline to flush, costing cycles. Modern DSPs offer features that remove the need for branching in high-speed code. Note that this pipeline breaks up both program memory and data memory accesses into two cycles each: P/F and A/R. This gives extra time for external memory to respond and reduces cost. The major advantage of a pipelined machine is speed. There is an additional advantage of power efficiency: every part of the processor is operating every single cycle.
Technical Training Organization (TTO)
How does the architecture support pipelining?
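The claim that one instruction retires per cycle once the pipeline is full can be checked with a small model. This is a minimal sketch assuming an ideal six-stage pipeline with no stalls, flushes or wait states; the function names are illustrative.

```python
STAGES = "PFDARX"  # prefetch, fetch, decode, access, read, execute


def pipeline_cycles(n_instructions):
    """Total cycles to run n instructions through an ideal, stall-free pipeline."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1


def pipeline_diagram(n_instructions):
    """One row per instruction, one column per cycle; '.' marks an idle slot."""
    rows = []
    for i in range(n_instructions):
        rows.append("." * i + STAGES + "." * (n_instructions - 1 - i))
    return rows


if __name__ == "__main__":
    for row in pipeline_diagram(4):
        print(row)
    print(pipeline_cycles(1), pipeline_cycles(100))
```

A single instruction takes six cycles, but 100 back-to-back instructions take 105, so the fill cost is paid once; a branch that flushes the pipeline pays it again.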

24 Peripheral Overview
C54x CPU plus the following C5402 peripherals:
McBSP: 2 multi-channel buffered serial ports; each offers up to 128-channel receive/transmit.
DMA: 6 channels; facilitates transfers without CPU intervention.
EHPI: host port interface; 8-bit interface to a host processor.
Boot: boot loader; multiple ways to load a program into volatile memory.
Timers: two 20-bit timers; can generate time-based interrupts.
GPIO: general-purpose I/O; 4 dedicated and 16 multipurpose pins.
PLL: phase-locked loop; software programmable.
Pwr Down: idle modes; power-saving modes and features.
Presenter notes: This is a very short and sweet overview of the peripherals that the 5402 has. The multi-channel buffered serial port allows access to any 32 of these 128 channels. Don’t go into too much detail; more will be covered later. In module 2, we show the I/O gateways as serial port and DMA. Also included is a slide or two on the specifics of the DMA and McBSP. If the students want more details on these two peripherals, tell them to wait until module 2.
What does the DSK look like?

25 C5402 McBSP
Block diagram labels: receive path DR, RSR, RBR, DRR; transmit path DXR, XSR, DX; interrupt and event outputs (RINT, XINT, Event) toward the CPU and DMA; clock and frame control (CLKR, CLKX, FSR, FSX); multi-channel control; data bus.
Full-duplex direct interface to codecs and other serial devices.
Max bit rate: 1/2 CPU clock rate.
Word length: 8, 12, 16, 20, 24 or 32 bits.
Multi-channel operation supports up to 128 channels.
Built-in support for μ-law/A-law companding.
What is the CPU busy doing?
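The bit-rate limit gives a quick feasibility check for a serial link. The sketch below assumes the only constraint is the stated maximum of half the CPU clock rate and ignores frame-sync and other protocol overhead; the function names and the example sampling rates are illustrative assumptions.

```python
WORD_LENGTHS = (8, 12, 16, 20, 24, 32)  # word lengths listed on the slide
MAX_CHANNELS = 128


def mcbsp_max_bit_rate(cpu_hz):
    """Upper bound on the serial bit rate: half the CPU clock rate."""
    return cpu_hz // 2


def required_bit_rate(sample_rate_hz, word_bits, channels):
    """Bits per second needed to carry every channel once per sample period."""
    if word_bits not in WORD_LENGTHS:
        raise ValueError("unsupported word length")
    if not 1 <= channels <= MAX_CHANNELS:
        raise ValueError("channel count must be between 1 and 128")
    return sample_rate_hz * word_bits * channels


def link_fits(cpu_hz, sample_rate_hz, word_bits, channels):
    """True when the required bit rate does not exceed the maximum."""
    return required_bit_rate(sample_rate_hz, word_bits, channels) <= mcbsp_max_bit_rate(cpu_hz)


if __name__ == "__main__":
    # 128 channels of 16-bit samples at an assumed 8 kHz on a 100 MHz CPU.
    print(required_bit_rate(8_000, 16, 128), link_fits(100_000_000, 8_000, 16, 128))
    print(link_fits(100_000_000, 48_000, 32, 128))
```

Under these assumptions, telephony-rate multichannel traffic fits comfortably, while 128 channels of 32-bit audio at 48 kHz would exceed the limit.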

