lecture asip 5

Upload: neha-pachauri

Post on 03-Apr-2018


TRANSCRIPT

  • 7/28/2019 Lecture ASIP 5

    1/59

    VLSI Architecture :: MEL G642

    MEL G642

    Dr. A. Amalin Prince

    BITS Pilani K.K. Birla Goa Campus

    Department of Electrical, Electronics and Instrumentation Engineering

  • 7/28/2019 Lecture ASIP 5

    2/59

    System on a chip

    What is a DSP core?

    What is a DSP processor?

    What is a DSP subsystem?

    The MCU is the task controller: it executes tasks without real-time requirements.

    [Figure: SoC design hierarchy. The MCU subsystem (MCU core) and the DSP subsystem share a bus with arbitration and the main memories. The DSP subsystem contains the DSP processor core (RF, control path, ALU, MAC, AGU), DM, PM, interrupt controller, timer, DMA, MMU, and customer interfaces.]

  • 7/28/2019 Lecture ASIP 5

    3/59

    [Figure: DSP subsystem architecture. Nested levels: DSP core, DSP processor, DSP subsystem. Blocks: RF, control path, ADG, ALU, MAC, accelerator, DM, DM, PM, DMA, MMU, interrupt, timer, other peripherals, chip interface, bus and arbitration, main memories.]

  • 7/28/2019 Lecture ASIP 5

    4/59

    Architecture and microarchitecture

    The processor architecture is the hardware organization of the core and its peripherals, including the memory bus architecture. Architecture represents the relations between modules.

    The microarchitecture design is the specification of the functional modules.

    ASIP microarchitecture design is the implementation of an ISA specification into hardware modules.

  • 7/28/2019 Lecture ASIP 5

    5/59

    Inside a core

    The core can be divided into three parts: the datapath, the control path, and the address generation unit (AGU).

    The core components are organized around two data buses:

    The memory bus is distributed between the core and the memory subsystem.

    The register bus connects the register file to all units in the core.

  • 7/28/2019 Lecture ASIP 5

    6/59

    Memory subsystem in a DSP subsystem

    The memory subsystem consists of data memories (DM), program (code) memory (PM), AGU, DMA, and MMU.

  • 7/28/2019 Lecture ASIP 5

    7/59

    Peripherals in a DSP subsystem

    Timers for counting clock cycles and events

    Interrupt controller for handling interrupts

    DMA (Direct Memory Access) controller for handling data transfers to/from main memory and between other memories/ports

    MMU (Memory Management Unit) for reliable and efficient memory (address space) usage

  • 7/28/2019 Lecture ASIP 5

    8/59

    DSP memory architecture

    MEL G642

  • 7/28/2019 Lecture ASIP 5

    9/59

    History of DSP memory architectures

    [Figure: (a) Von Neumann architecture: one memory shared by the control unit and the arithmetic unit, with in-out. (b) Harvard architecture: separate program memory and data memory for the control unit and the arithmetic unit, with in-out.]

  • 7/28/2019 Lecture ASIP 5

    10/59

    History of DSP memory architectures

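    [Figure: three memory architectures, (a), (b), and (c). Each has a datapath (DP) with a data memory (DM) and a control path (CP) with a program memory (PM); (b) and (c) add a MUX.]

    One tap of convolution requires multiple clock cycles.

    Fetch coefficients instead of instructions during CONV.

    Dual-port/multi-port memory required.

    Used up to the 1980s.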

  • 7/28/2019 Lecture ASIP 5

    11/59

  • 7/28/2019 Lecture ASIP 5

    12/59

    A typical DSP bus architecture

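    [Figure: a typical DSP bus architecture. The datapath (register file, ALU, MAC, with operands OPA and OPB) sits on the register bus. The control path (CP) and addressing path (AGU) connect to PM, DM1, and DM2 through the PM bus and the memory bus. Signals: P-address, program, D1-address, D1-data, D2-address, D2-data.]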

  • 7/28/2019 Lecture ASIP 5

    13/59

    Control flow of DSP ASIP

    [Figure: Instruction flow FSM. States and their signals:]

    Calculate PC (entered on reset)

    Request an instruction: send PC to PM

    Receive an instruction: get code from PM

    Decode the instruction and send control to DP: control signals to DP

    Generate operand addresses: generated address to storage units

    Receive states from DP: flags from DP

  • 7/28/2019 Lecture ASIP 5

    14/59

    Data flow of DSP ASIP

    [Figure: Data flow FSM. States and their signals:]

    Receive instruction: from PM

    Receive operand address: from address generator

    Fetch operands: send address to storage HW

    Execute instruction

    Store result: send result to storage HW

    Return states: flags to PC FSM

  • 7/28/2019 Lecture ASIP 5

    15/59

    [Figure: A complete view of a DSP processor. Blocks: PC FSM, PM, instruction decoder, RF, execution unit (ALU/MAC), operand and result control, configuration and status. Signals: program address, instruction, program flow control, operation ctrl, MEM ctrl, results. Legend: data bus, memory bus, control signals, internal signals in control path.]

  • 7/28/2019 Lecture ASIP 5

    16/59

    Modules in a core

    MEL G642

  • 7/28/2019 Lecture ASIP 5

    17/59

    Modules in a DSP core

    Datapath: register file, ALU, MAC

    AGU

    Control path

  • 7/28/2019 Lecture ASIP 5

    18/59

    Differences between design of DSP and MPU

    MPU designers think of ultimate performance and ultimate flexibility, as well as a compiler-friendly instruction set.

    ASIP DSP designers think of application and cost first, and the challenge is to be efficient.

    The goal of an ASIP design is to reach the highest performance over silicon, the highest performance over power consumption, and the highest performance over design cost.

  • 7/28/2019 Lecture ASIP 5

    19/59

    Is DSP CISC or RISC

    A DSP, like a RISC:

    More general-purpose registers.

    Most instructions are simple instructions.

    Instruction decoding by a decoding logic circuit instead of microcode.

    Regular instruction pipelining.

    A DSP, like a CISC:

    One execution cycle for ALU and multiple cycles for iteration.

    Complicated data memory addressing modes and circuits.

    Special-purpose registers (accumulator registers).

    Strong instructions for accelerating certain tasks.

  • 7/28/2019 Lecture ASIP 5

    20/59

    Is DSP CISC or RISC

    DSP | RISC | CISC
    Emphasis on hardware and software | Emphasis on software | Emphasis on hardware
    Single and multiclock complex instructions | Single-clock, reduced instructions only | Includes multiclock complex instructions
    Operands from registers; operands also from data memories | Operands only from registers; LOAD and STORE are used to load and store register variables | Arithmetic computing based on memory-to-memory
    Small code size | Large code size | Small code size
    Most silicon area used for program and data storing | Most silicon area used for program and data storing | Silicon might be used for storing complex instructions (microcode)

  • 7/28/2019 Lecture ASIP 5

    21/59

    Design instruction set

    MEL G642

  • 7/28/2019 Lecture ASIP 5

    22/59

    [Figure: Instruction set design flow]

    Source code profiling: coverage and 10-90% locality

    Design of general RISC instructions

    Design of CISC accelerated instructions

    Design of miscellaneous instructions

    Instruction set simulator and assembler

    Benchmarking performance and coverage

    Satisfied? No: iterate the design. Yes: continue.

    Instruction coding and release manual

    Release the instruction set architecture

  • 7/28/2019 Lecture ASIP 5

    23/59

    Release an instruction set

    [Figure: iteration loop of application profiling, design of the assembly instruction set, and instruction set benchmarking.]

    When: the benchmarking result is equivalent to the requirements.

  • 7/28/2019 Lecture ASIP 5

    24/59

    We need to identify problems

    How is an instruction set designed, and why is it designed in that way?

    In which circumstances should a function be implemented using an instruction instead of a subroutine?

    Why are ASIP DSP instructions not really RISC?

    Why is my benchmarking not satisfactory?

  • 7/28/2019 Lecture ASIP 5

    25/59

    What is the starting point

    Let us start from the point of implementing C functions in an assembly instruction set.

    A typical architecture with two DM in parallel.

    Instructions including move-load-store, ALU/MAC, and program flow control.

  • 7/28/2019 Lecture ASIP 5

    26/59

    Classify the Instruction set

    Instruction group/type | Operands | Operations | Mathematical description | Flags | CC
    Load, store, and move | Register name and memory addressing | Data transfer and addressing modes | DST <- (ADR) | |

  • 7/28/2019 Lecture ASIP 5

    27/59

    Move-load-store instructions

    The RISC processor architecture is simple.

    Data and parameters of a subroutine are loaded to the register file first.

    Operands are from the register file or immediate data carried by an instruction.

    Results in the register file need to be moved back to the data memory.

  • 7/28/2019 Lecture ASIP 5

    28/59

    Move-load-store instructions

    Mnem | Operand | Description | Operation | CC
    Load | Rd, DA | Load data from memory 0/1 | Rd <- DM(DA) | 1
    Store | DA, Rs | Store data to memory 0/1 | DM(DA) <- Rs | 1
    move | Rd, Rs | Move between two registers | Rd <- Rs | 1
    move | Rd, K | Move immediate data to a register | Rd <- immediate | 1
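    The four move-load-store operations can be sketched as a tiny interpreter. This is only an illustration under assumptions: the tuple encoding, the register names, direct addressing only, and the mnemonic "movei" (used here to tell the immediate form of move apart) are all hypothetical and not part of the lecture.

```python
# Minimal sketch of the move-load-store group (hypothetical encoding).
def run(program, regs, dm):
    """Execute (mnemonic, dst, src) tuples; return the cycle count."""
    cycles = 0
    for mnem, dst, src in program:
        if mnem == "load":        # Rd <- DM(DA)
            regs[dst] = dm[src]
        elif mnem == "store":     # DM(DA) <- Rs
            dm[dst] = regs[src]
        elif mnem == "move":      # Rd <- Rs
            regs[dst] = regs[src]
        elif mnem == "movei":     # Rd <- immediate K (hypothetical name)
            regs[dst] = src
        else:
            raise ValueError(f"unknown mnemonic: {mnem}")
        cycles += 1               # each of these instructions costs 1 cycle
    return cycles

regs = {"r0": 0, "r1": 0}
dm = [0] * 8
dm[3] = 42
program = [
    ("load", "r0", 3),      # r0 <- DM(3)
    ("move", "r1", "r0"),   # r1 <- r0
    ("store", 5, "r1"),     # DM(5) <- r1
    ("movei", "r0", 7),     # r0 <- 7
]
cycles = run(program, regs, dm)
```

    Every operand passes through the register file, which is the load-store discipline described on the previous slide.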

  • 7/28/2019 Lecture ASIP 5

    29/59

    Addressing for data memory access

    Memory addressing is the addressing algorithm carried by an assembly instruction.

    It specifies the way to calculate the unique location of data in a data memory for a read or a write.

    The addressing algorithm is implicit in C and explicit in ASM.

  • 7/28/2019 Lecture ASIP 5

    30/59

    Addressing for data memory access

    Name | DA | DA code cost (b) | Memory | Algorithm | CC
    Direct | D | 16 | DM0/1 | 16-bit constant as the direct memory address | 1
    Register indirect | R | 5 | DM0/1 | A register containing the memory address | 1
    Register incremental | [...] | [...] | [...] | [...] addressing | [...]
    Register decrement | --R | 5 | DM0/1 | R = R - 1 before addressing; R gives the address | 1
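    As a rough sketch, the addressing modes that are legible in the table (direct, register indirect, register decrement) can be written as one address-calculation function. The function name, the mode strings, and the 16-bit address wrap-around are assumptions made for illustration.

```python
# Sketch of an AGU address calculation (names and 16-bit wrap are assumptions).
ADDRESS_MASK = 0xFFFF

def effective_address(mode, regs, operand):
    """Return the data memory address for one access."""
    if mode == "direct":            # operand is a 16-bit constant
        return operand & ADDRESS_MASK
    if mode == "indirect":          # operand names a register holding the address
        return regs[operand]
    if mode == "pre_decrement":     # --R: R = R - 1 first, then R gives the address
        regs[operand] = (regs[operand] - 1) & ADDRESS_MASK
        return regs[operand]
    raise ValueError(f"unknown addressing mode: {mode}")

agu_regs = {"r2": 0x0010}
direct_addr = effective_address("direct", agu_regs, 0x1234)
indirect_addr = effective_address("indirect", agu_regs, "r2")
decrement_addr = effective_address("pre_decrement", agu_regs, "r2")
```

    The register modes need only a 5-bit register field in the instruction word, whereas direct addressing carries a full 16-bit constant; this is the code cost column of the table.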

  • 7/28/2019 Lecture ASIP 5

    31/59

    Arithmetic logic instructions

    Basic arithmetic operations in C are +, -, *, /, and %.

    The modulo operation % is not used very often in DSP arithmetic computing, so it can be implemented using a subroutine.

    The division operation / is not easy to implement in hardware.

  • 7/28/2019 Lecture ASIP 5

    32/59

    Basic Arithmetic Instructions

    Mnem | Operand | Description | Operation | Flags | CC
    ADD | Rd, Rr | Add | Rd <- Ra + Rb | Z,N,V | 1
    SUB | Rd, Rr | Subtract | Rd <- Ra - Rb | Z,N,V | 1
    ABS | Rd, Rr | Absolute operation | Rd <- ABS(Ra) | Z,N,V | 1
    INC | Rd | Increment | Rd <- Ra + 1 | Z,N,V | 1
    DEC | Rd | Decrement | Rd <- Ra - 1 | Z,N,V | 1
    MPL | A, Rd, Rr | Multiplication | A <- Ra * Rb | Z,N,V | 1
    MAC | A, Rd, Rr | Multiplication and accumulation | A <- A + Ra * Rb | Z,N,V | 2
    RND | Rd, A | Round, saturate, and truncate | Rd <- Saturate(Round(A)) | Z,N,V | 1
    CAC | A | Clear an accumulator | A <- 0 | Z,N,V | 1
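    The accumulate-then-round pattern (CAC, MAC, RND) can be sketched as follows. The slides do not state the number format, so this assumes 16-bit Q15 operands, a wide accumulator holding Q30 products, and round-half-up before truncation; the function names are illustrative only.

```python
# Sketch of MAC followed by RND, assuming Q15 operands (an assumption).
INT16_MIN, INT16_MAX = -32768, 32767

def mac(acc, ra, rb):
    """A <- A + Ra * Rb, kept in a wide accumulator."""
    return acc + ra * rb

def rnd(acc, frac_bits=15):
    """Rd <- Saturate(Round(A)): round to nearest, truncate, clamp to 16 bits."""
    rounded = (acc + (1 << (frac_bits - 1))) >> frac_bits
    return max(INT16_MIN, min(INT16_MAX, rounded))

acc = 0                               # CAC: A <- 0
acc = mac(acc, 16384, 16384)          # 0.5 * 0.5 in Q15
quarter = rnd(acc)                    # 0.25 in Q15

overflow = rnd(mac(0, -32768, -32768))  # (-1) * (-1) does not fit in Q15
```

    Here `quarter` is 8192 (0.25 in Q15), and `overflow` is clamped to 32767 because +1.0 cannot be represented in Q15.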

  • 7/28/2019 Lecture ASIP 5

    33/59

    Logic and Shift Operations

    Logic and shift operations in C:

    & (and), | (or), ~ (not), ^ (xor), << (left shift), >> (right shift).

    Here "and" operates on each bit of operands A and B; that is, C[0]=A[0] & B[0], C[1]=A[1] & B[1], ..., C[15]=A[15] & B[15].

  • 7/28/2019 Lecture ASIP 5

    34/59

    Logic and Shift Operations

    Mnem | Operand | Description | Operation | Flags | CC
    AND | Ra, Rb | A logic-and B | Rd <- Ra and Rb | C, Z | 1
    OR | Ra, Rb | A logic-or B | Rd <- Ra or Rb | C, Z | 1
    NOT | Ra, Rb | Invert A | Rd <- INV(Ra) | C, Z | 1
    XOR | Ra, Rb | A logic-xor B | Rd <- Ra xor Rb | C, Z | 1
    LS | Ra, Rb | Logic left shift | Rd <- Ra left shifted by Rb[3:0] | C, Z | 1
    RS | Ra, Rb | Logic right shift | Rd <- Ra right shifted by Rb[3:0] | C, Z | 1
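    A small sketch of the two shift instructions shows why only Rb[3:0] is used: a 16-bit word can be shifted by at most 15 positions. The 16-bit register width follows the bit indices 0 to 15 used above; the function names are illustrative and the carry flag is left out.

```python
# Sketch of LS and RS on 16-bit registers (carry flag omitted).
MASK16 = 0xFFFF

def ls(ra, rb):
    """Rd <- Ra logically shifted left by Rb[3:0]."""
    amount = rb & 0xF                 # only the low four bits of Rb are used
    return (ra << amount) & MASK16

def rs(ra, rb):
    """Rd <- Ra logically shifted right by Rb[3:0]; zeros enter from the left."""
    amount = rb & 0xF
    return (ra & MASK16) >> amount

shifted_left = ls(0x8001, 1)      # the top bit falls out of the 16-bit word
shifted_right = rs(0x8000, 15)
masked_amount = ls(0x0001, 0x13)  # 0x13 & 0xF == 3, so this shifts by 3
```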

  • 7/28/2019 Lecture ASIP 5

    35/59

    Logic Operators in C

    Condition symbol    Conditions
    <                   Less than
    >=                  Greater than or equal to
    >                   Greater than
    !=                  Not equal to
    &&                  Boolean AND
    ||                  Boolean OR
    !                   Boolean NOT

  • 7/28/2019 Lecture ASIP 5

    36/59

    Program flow control in C

    Conditional and unconditional controls in C:

    Unconditional: GOTO operations.

    Conditional: the condition test and the jump are integrated in C, for example, if A then B else C.

    In an assembly language, the condition test and the conditional jump are separated:

    the first instruction performs the test and the flag computation

    the second instruction is the conditional jump

  • 7/28/2019 Lecture ASIP 5

    37/59

    Program flow control instructions

    Mnem | Description | Conditions | Flags meet | CC
    JLT | Jump when less than | < | N=1 | 3/1
    JLE | Jump when less than or equal to | <= | N=1 or Z=1 | 3/1
    JNE | Jump when not equal to | != | Z=0 | 3/1
    JUMP | Unconditional jump | | | 3
    CALL | Jump, push return address onto the stack | | | 3
    Return | Return to the stacked address | | | 3
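    The separation of flag computation and conditional jump can be sketched in two steps. This assumes the flags come from the subtraction Ra - Rb and ignores overflow; the 3-cycle cost for a taken jump and 1 cycle otherwise follows the CC column. The function names and the dictionary layout are hypothetical.

```python
# Sketch: first instruction computes flags, second instruction tests them.
def compare(ra, rb):
    """Flag computation from Ra - Rb (overflow ignored in this sketch)."""
    diff = ra - rb
    return {"Z": int(diff == 0), "N": int(diff < 0)}

CONDITIONS = {
    "JLT": lambda flags: flags["N"] == 1,   # less than
    "JNE": lambda flags: flags["Z"] == 0,   # not equal
}

def jump_cost(mnem, flags):
    """Return (taken, cycles): 3 cycles when the jump is taken, else 1."""
    taken = CONDITIONS[mnem](flags)
    return taken, (3 if taken else 1)

taken_case = jump_cost("JLT", compare(2, 5))      # 2 < 5, so the jump is taken
not_taken_case = jump_cost("JNE", compare(4, 4))  # equal, so execution falls through
```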

  • 7/28/2019 Lecture ASIP 5

    38/59

    Target addressing for jumping

    TA | Algorithm
    Absolute | 16 bits constant
    Relative | In a general register

  • 7/28/2019 Lecture ASIP 5

    39/59

    [Table: Assembly instruction set summary]

  • 7/28/2019 Lecture ASIP 5

    40/59

    Benchmarking the instruction set

  • 7/28/2019 Lecture ASIP 5

    41/59

    What is a benchmark?

    DSP benchmarking measures the cycle cost and code size used by a DSP algorithm with single-precision data.

    Convention of DSP benchmarking: rounding is required before moving long data from an accumulator register to a general register.

  • 7/28/2019 Lecture ASIP 5

    42/59

    How to benchmark

  • 7/28/2019 Lecture ASIP 5

    43/59

    How to benchmark

    BDTI benchmarking convention

    It measures the execution time (cycle cost), the code size (program memory cost), and the cost of data memories.

    The cycle cost = prologue + kernel + epilogue

    Prologue: preparing for running a program

    Kernel: the core part of the algorithm

    Epilogue: terminating the program

  • 7/28/2019 Lecture ASIP 5

    44/59

    Assumption in this discussion

    Data frame size: 40 samples.

    The number of FIR taps = 16.

    The cycle cost = 1 cycle per normal instruction, 3 cycles for a jump taken.

    MAC takes one cycle if the following instruction does not use the data in an accumulator register.

    TSMD: a typical single-MAC DSP processor available as a COTS (commercial off-the-shelf) product.

  • 7/28/2019 Lecture ASIP 5

    45/59

    Example: Block Transfer

    C-code: DM1 (SEG: 0 to 39) -> DM1 (SEG: 0 to 39)

    Assembly code

    MEL G642


  • 7/28/2019 Lecture ASIP 5

    46/59

    Example: Block Transfer

    Processor | Algorithm | Total cycle cost | Pro-epilogue cycle cost | Kernel cycle cost | Total code cost | Code for pro-epilogue | DM cost
    Basic (ours) | BT | 242 | 4 | 238 | 8 | 4 | 84
    TSMD | BT | 47 | 4 | 43 | 7 | 4 | 84

    The loop: the extra cost of each jump taken plus the DEC of the loop counter is four clock cycles. A HW loop may eliminate this cost.

    Load and store can be merged into a memory-to-memory move instruction.
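    The cycle cost of the block transfer loop can be estimated with a short model. The loop body used here (load, store, DEC of the loop counter, conditional jump) is an assumption, since the assembly listing is not reproduced in this transcript; it is chosen because, with 1 cycle per normal instruction and 3 cycles per taken jump, it appears consistent with the Basic row above.

```python
# Sketch of the cycle-cost model: cycle cost = prologue + kernel + epilogue.
NORMAL_CYCLES = 1       # cost of a normal instruction
JUMP_TAKEN_CYCLES = 3   # cost of a taken jump

def block_transfer_cycles(n_samples, pro_epilogue_cycles=4):
    """Return (kernel, total) cycles for an assumed load/store/DEC/jump loop."""
    body = 3 * NORMAL_CYCLES                       # load, store, DEC
    taken = (n_samples - 1) * (body + JUMP_TAKEN_CYCLES)
    last = body + NORMAL_CYCLES                    # final jump falls through
    kernel = taken + last
    return kernel, kernel + pro_epilogue_cycles

kernel_cycles, total_cycles = block_transfer_cycles(40)
```

    For the 40-sample frame this gives 238 kernel cycles and 242 in total. Four of the six cycles per iteration are loop overhead (DEC plus the taken jump), which is the cost a hardware loop would remove.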

  • 7/28/2019 Lecture ASIP 5

    47/59

    Example: Single sample FIR

    Modulo addressing

    FIFO emulated in a data memory

    Can be hardware-accelerated memory addressing (for accelerated instructions)
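    A FIFO emulated in data memory with modulo addressing can be sketched as a circular buffer. The direction in which the write pointer moves, the buffer length being equal to the number of taps, and all names are assumptions for illustration; on a DSP the modulo step would be done by the AGU rather than by the % operator.

```python
# Sketch of a single-sample FIR on a circular buffer (software-emulated FIFO).
def fir_sample(buffer, head, coeffs, new_sample):
    """Insert one sample, return (output, new head). len(buffer) == len(coeffs)."""
    size = len(buffer)
    head = (head - 1) % size          # modulo addressing: step back and wrap
    buffer[head] = new_sample         # the newest sample overwrites the oldest
    acc = 0
    for k, coeff in enumerate(coeffs):
        acc += coeff * buffer[(head + k) % size]   # coeff[k] * x(n - k)
    return acc, head

coeffs = [1, 2, 3]
buffer = [0, 0, 0]
head = 0
outputs = []
for sample in [1, 0, 0, 0]:           # unit impulse input
    y, head = fir_sample(buffer, head, coeffs, sample)
    outputs.append(y)
```

    Feeding a unit impulse returns the coefficients in order, followed by zero, which is a quick check that the wrap-around indexing is right. No sample is ever moved in memory; only the pointer changes.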

  • 7/28/2019 Lecture ASIP 5

    48/59

    Example: consider 7 tap FIR Filter

    MEL G642

  • 7/28/2019 Lecture ASIP 5

    49/59

  • 7/28/2019 Lecture ASIP 5

    50/59

    Example: Single sample FIR

    Assembly code

    MEL G642

  • 7/28/2019 Lecture ASIP 5

    51/59

    Example: Single sample FIR: FIFO behavior

    [Figure: The procedure of a FIFO getting a new data sample. The data memory space runs from the MIN address to the MAX address; the FIFO buffer inside it is bounded by BAR and TAR (addresses BAR + 0 to BAR + 4) and holds X(n) to X(n-4). Steps 0 to 3 show the buffer contents and the DAR pointer before getting new data and after getting new data 1, 2, and 3.]

  • 7/28/2019 Lecture ASIP 5

    52/59

    Example: N sample FIR (single sample in loop)

    MEL G642

  • 7/28/2019 Lecture ASIP 5

    53/59

    Example: Single sample FIR

    Processor | Algorithm | Total cycle cost | Kernel cycle cost | Total code cost
    Basic | 16-tap FIR | 192 | 173 | 26
    TSMD | 16-tap FIR | 31 | 16 | 15

    The cycle cost of the basic instruction set is several times higher than the benchmark of a TSMD (192 versus 31 cycles in total). Opportunities for improvement are:

    The cost of the SW-emulated circular buffer and modulo addressing is high.

    o HW circular buffer and modulo addressing is essential.

    Data and coefficient loading, MAC, and loop control can be merged into one instruction, convolution, which is one of the most frequently used instructions in DSP.

    CONV N DM0(AP0++M) DM1(AP1++)

  • 7/28/2019 Lecture ASIP 5

    54/59

    Example:

    FIR filtering

    Autocorrelation

    Autocorrelation is used for finding regularities or periodic features of a signal.

    Cross-correlation

    Cross-correlation is used for measuring the similarity of a signal with a known signal pattern.

    What is the difference?
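    One way to see the difference is that both are the same multiply-accumulate loop: cross-correlation takes its two operands from different signals, while autocorrelation takes both from the same signal. The sketch below uses an unnormalized sum over the overlapping samples; the function names and the lag convention are assumptions for illustration.

```python
# Sketch: autocorrelation is cross-correlation of a signal with itself.
def cross_correlation(x, y, lag):
    """Sum of x[n] * y[n + lag] over the samples where both exist."""
    return sum(
        x[n] * y[n + lag]
        for n in range(len(x))
        if 0 <= n + lag < len(y)
    )

def autocorrelation(x, lag):
    """Same loop, with both operands taken from the same signal."""
    return cross_correlation(x, x, lag)

periodic = [1, -1, 1, -1]             # period of two samples
auto_lag0 = autocorrelation(periodic, 0)
auto_lag1 = autocorrelation(periodic, 1)
auto_lag2 = autocorrelation(periodic, 2)
cross = cross_correlation([1, 2], [0, 1, 2], 1)
```

    The autocorrelation of the periodic signal is positive at lag 2 (one full period) and negative at lag 1, which is how regularities show up. Like FIR filtering, both reduce to a chain of MAC operations over two address streams.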

  • 7/28/2019 Lecture ASIP 5

    55/59

    Analyses on identified problems

    MEL G642

  • 7/28/2019 Lecture ASIP 5

    56/59

    Lessons Learned

    C does not express parallel features.

    Convolution is one of the most used DSP operations. Very high efficiency is reached by having the memory addressing, arithmetic computing, result store, and program flow control carried out in parallel in one instruction.

    It is possible because the parallel hardware can be organized in a pipeline.

    Other frequently used iterative DSP operations can also be specified as one instruction.

    Research work: Why?

    Identify the requirement and benchmark it.

  • 7/28/2019 Lecture ASIP 5

    57/59

    Conclusion

    An assembly language instruction set must be more efficient.

    Accelerations are implemented at arithmetic and algorithmic levels.

    Addressing and memory accesses should be executed in parallel with arithmetic computing.

    Program flow control, such as loops or conditional execution, shall also be accelerated.

  • 7/28/2019 Lecture ASIP 5

    58/59

    ASIP microarchitecture design flow

    [Figure: design flow. Inputs: proposed assembly language manual and proposed pipeline steps.]

    Further expose all micro operations of each assembly instruction

    Partition micro operations into DP, CP, and AP

    Schedule micro operations into each pipeline step

    Design for HW multiplexing in DP and AP

    Specify microarchitecture and micro operations for CP

    Release microarchitecture documents

    The End :: Thank you for your attention

  • 7/28/2019 Lecture ASIP 5

    59/59

    Questions?

    MEL G642