DSP-processors: purpose and features

source: https://innovas-services.fr/solving-business-problems/

: (general-purpose, x86) , , , . , : , Digital Signal Processors DSP.

DSP , ( ) , . ARM-, DSP.

DSP, , , .

DSP 1970- . - , , ( - ). , ( MOSFET) (, ) , .. , , . , (time-to-market), , . .

Fig. 1. DSP's first major success: the Speak & Spell (Texas Instruments, 1978)

Fig. 2. Since the advent of the GSM standard, DSPs have been an indispensable component of mobile networks.

Fig. 3. Image processing in cameras (debayering, noise removal, filtering) is also done on DSPs (source: https://snapshot.canon-asia.com/india/article/en/5-things-made-possible-with-digic-image-processor)

- DSP . - , .. , .

DSP

DSP , Intel Xeon Cortex-A, ? Intel.

Fig. 4. Intel Skylake (source: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client))

, , , (out-of-order speculative execution) (scheduling). , “” , .. , , 1%:

While a simple arithmetic operation requires around 0.5–20 pJ, modern cores spend about 2000 pJ to schedule it.

Conventional multicore processors consume 157–707 times more energy than customized hardware designs.

(from “The Rise and Fall of Dark Silicon”).

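For example, compare an Intel server processor with a Texas Instruments DSP (a Skylake Xeon Platinum 8180M versus a TMS320C6713BZDP300):

	CPU (Intel)	DSP (TI)
Clock frequency	2.5 GHz	500 MHz
Cores	28	1
Peak performance	560 GIPS	1.8 GIPS
Power	205 W	1 W
Architecture	Out-of-order	
Price	$13K	$35
Target applications		
Performance / core / W	0.097 GIPS/core/W	1.7 GIPS/core/W (17 times Intel's figure)
Performance / core / W / $	0.0075 MIPS/core/W/$	0.051 GIPS/core/W/$ (7000 times Intel's figure)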
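The two derived rows of the comparison can be checked with simple arithmetic. The sketch below assumes that the rows are peak performance in GIPS, core count, power in watts and price in dollars; the function names are hypothetical and only serve the illustration.

```c
/* Peak performance per core per watt, in GIPS/core/W. */
double gips_per_core_per_watt(double peak_gips, int cores, double watts)
{
    return peak_gips / cores / watts;
}

/* The same figure divided by the unit price, in GIPS/core/W/$. */
double gips_per_core_per_watt_per_dollar(double peak_gips, int cores,
                                         double watts, double price_usd)
{
    return gips_per_core_per_watt(peak_gips, cores, watts) / price_usd;
}
```

With the Intel inputs (560 GIPS, 28 cores, 205 W, $13K) this gives about 0.0976 GIPS/core/W and about 0.0075 MIPS/core/W/$, which matches the quoted values. With the DSP inputs (1.8 GIPS, 1 core, 1 W, $35) it gives 1.8 GIPS/core/W and about 0.051 GIPS/core/W/$; the quoted 1.7 is slightly lower, possibly because the source used less rounded inputs.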

, DSP , 1 4 (!) . : DSP : , , .. ( , ) .

DSP

As Jennifer Eyre of BDTI put it, “Architecture of DSP is molded by algorithms” (“Evolution of DSP Processors”). These algorithms call for:

Instruction-level parallelism (ILP)
(, , )
,

, .

ILP :

Vector instructions (SIMD, Single Instruction Multiple Data)
Complex instructions (CISC, Complex Instruction Set Computer):
1. ( , , , )
2. ( )
3. ( , - -, .)
4. - ( , , - )
5. ( IP- Ceva Tensillica)
( scatter/gather)
, :
1. ( )
2. ( )
4. Hardware loops (zero-overhead loops)
- ( , , QR-, )
(DMA), 2D/3D-

( 20- 40- )
In-order execution, without speculation or out-of-order execution
Very long instruction word (VLIW)
Exposed pipeline
1. DRAM-
2. instruction- data- ( 1 ) SRAM-, .. scratchpad Tightly Coupled Memory,
3. – , TCM,
(branch target buffer, BTB) DSP ( , , )
(.. , )

, DSP .

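An example of assembly code for a Texas Instruments DSP:

; Loop body: adds two arrays element by element
L2:
LDW .D1T1 *A4++, A3        ; load a word from the first array
|| LDW .D2T2 *B4++, B5     ; load a word from the second array, in parallel
NOP 3                      ; wait for the loaded data to arrive
BDEC .S2 L2, B0            ; branch and decrement (compare + decrement + goto)
ADD .L1X B5, A3, A3        ; branch delay slot 1
STW .D1T1 A3, *A5++        ; branch delay slot 2

BDEC branches to the label and decrements the counter register as long as the counter is greater than or equal to 0.
|| marks an instruction that is issued in parallel with the previous one.
NOP inserts idle cycles while the result of an earlier instruction is still in flight.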
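For readers less familiar with this assembly syntax, the loop appears to compute an element-wise sum of two word arrays. A rough C equivalent is sketched below; the function and parameter names are hypothetical, and the mapping of lines to instructions is an interpretation of the listing rather than compiler output.

```c
#include <stddef.h>

/* Element-wise sum of two word arrays, mirroring the assembly loop. */
void add_arrays(const int *a, const int *b, int *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int x = a[i];       /* LDW *A4++, A3 */
        int y = b[i];       /* LDW *B4++, B5, issued in parallel */
        out[i] = x + y;     /* ADD B5, A3, A3 followed by STW A3, *A5++ */
    }
}
```

On the DSP the loop control costs a single BDEC instruction, and the add and store are placed in its delay slots so that no issue slot is wasted on the branch.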

DSP

DSP . , ( , ), , . , , , DSP.

, DSP : LTO (link-time optimization) PGO/FDO (profile-guided/feedback-driven optimization). , restrict/noalias, .

- (intrinsic), .. , , . , ( ).

, , , - : - - .

, , : , . ( Nvidia Nsight ). .

, DSP, .

DSP (, , , , IDE) :

( )
host- intrinsic-, ( , - – “”)
(“linear assembler”), ( , )
, .. source-level ( )
boilerplate- ( RPC- DSP )

DSP

DSP . – –O2 –O0 - (.. out-of-order ). DSP . , , performance-critical .

DSP open-source :

Open64 ( Ceva Cadence/Tensilica)
GCC (Texas Instruments Qualcomm)
LLVM ( Ceva Qualcomm, Cadence/Tensilica)

C++, , OpenCL, OpenMP/OpenACC Halide.

DSP , .. (, DSP Hexagon AMDGPU , AArch64). , VLIW : , NOP’ . Intel Itanium, DSP VLIW, , “-” (“heroic compiler”, , ). , DSP ( Partitioned Boolean Quadratic Programming).

, , . , :

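// Tell the compiler the loop trip count:
// minimum, maximum, and a value the count is a multiple of
#pragma MUST_ITERATE(min, max, multiple)
for (i = x; i < y; ++i)
  ...

// foo makes no assignments to global variables
#pragma FUNC_NO_GLOBAL_ASG(foo)
extern void foo();

// bar makes no assignments through pointers
#pragma FUNC_NO_IND_ASG(bar)
extern void bar();

// i is a multiple of 8
_nassert((i & 7) == 0);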

, , ( __builtin_prefetch

GCC) , ( , -).

, ( ) , restrict . :

( -Msafeptr=all PGI).
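As an illustration of the restrict qualifier mentioned above, here is a minimal sketch in standard C99. The function name and signature are hypothetical. The qualifier is a promise from the programmer that the two arrays do not overlap; the compiler does not check it.

```c
#include <stddef.h>

/* dst[i] += k * src[i]; restrict promises that dst and src never alias. */
void scale_add(int *restrict dst, const int *restrict src, int k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

Without restrict the compiler has to assume that a store to dst[i] may change a later src[j], which prevents it from hoisting loads above stores, vectorizing, or pipelining the loop. Calling such a function with overlapping arrays is undefined behavior.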

, . :

int32x4_t acc = 0;
int *p = ..., coeff = ...;
for (i = 0; i < N; i += 4) {
  int32x4_t x = vload(&p[i]);
  acc = vmac(acc, coeff, x);
  // instead of acc += coeff * x;
}

intrinsic- . ( , .. ).
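To make the vload/vmac pseudocode above concrete, the sketch below emulates a 4-lane vector in portable C. The int32x4_t struct and both helpers are stand-ins written for this example, not the intrinsics of any real DSP toolchain, and the element count is assumed to be a multiple of 4.

```c
#include <stddef.h>

/* Stand-in for a 4-lane vector register (hypothetical type). */
typedef struct { int lane[4]; } int32x4_t;

/* Emulated vector load of four consecutive ints. */
static int32x4_t vload(const int *p)
{
    int32x4_t v;
    for (int k = 0; k < 4; ++k)
        v.lane[k] = p[k];
    return v;
}

/* Emulated multiply-accumulate: acc[k] += coeff * x[k] in every lane. */
static int32x4_t vmac(int32x4_t acc, int coeff, int32x4_t x)
{
    for (int k = 0; k < 4; ++k)
        acc.lane[k] += coeff * x.lane[k];
    return acc;
}

/* Sum of coeff * p[i] over n elements; n must be a multiple of 4. */
int scaled_sum(const int *p, int coeff, size_t n)
{
    int32x4_t acc = {{0, 0, 0, 0}};
    for (size_t i = 0; i < n; i += 4)
        acc = vmac(acc, coeff, vload(&p[i]));
    return acc.lane[0] + acc.lane[1] + acc.lane[2] + acc.lane[3];
}
```

On real hardware each helper would map to one instruction, so the loop body becomes a load and a multiply-accumulate per four elements, with a single horizontal reduction after the loop.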

DSP ( , , , .) :

(unrolling) (software pipelining)
(if-conversion)
(induction variable renaming)

, .

// Original loop
for (i = 0; i < N; ++i)
  a[i] = b[i] * 3;

// Assembly for the loop body
ld (a0)+, a2
nop 3
mul a2, a3, a4
st  a4, (a1)+

// Software-pipelined loop
tmp2 = b[0] * 3; tmp1 = b[1];
for (i = 0; i < N - 2; ++i) {
  a[i] = tmp2;
  tmp2 = tmp1 * 3;
  tmp1 = b[i + 2];
}
a[N - 2] = tmp2; a[N - 1] = tmp1 * 3;

// Assembly for the pipelined loop body
st a4, (a1)+ || ld (a0)+, a2 || mul a2, a3, a4
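The pipelined loop above can be written as a complete function and compared with the straightforward version. This is a sketch under the assumption that short arrays (fewer than two elements) simply fall back to the plain loop; the function names are hypothetical.

```c
#include <stddef.h>

/* Reference version: one load, one multiply, one store per iteration. */
void scale3_plain(int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] * 3;
}

/* Software-pipelined version: each iteration stores element i,
   multiplies element i + 1 and loads element i + 2. */
void scale3_pipelined(int *a, const int *b, size_t n)
{
    if (n < 2) {                 /* too short to fill the pipeline */
        scale3_plain(a, b, n);
        return;
    }
    int tmp2 = b[0] * 3;         /* prologue: multiply stage for element 0 */
    int tmp1 = b[1];             /* prologue: load stage for element 1 */
    for (size_t i = 0; i + 2 < n; ++i) {
        a[i] = tmp2;             /* store stage */
        tmp2 = tmp1 * 3;         /* multiply stage */
        tmp1 = b[i + 2];         /* load stage */
    }
    a[n - 2] = tmp2;             /* epilogue: drain the pipeline */
    a[n - 1] = tmp1 * 3;
}
```

The three statements in the loop body belong to three different iterations of the original loop and do not depend on each other, which is what lets a VLIW core issue the store, the multiply and the load in the same cycle.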

, .

// Original loop with a branch
for (i = ...) {
  if (p[i])
    x[i] = a * y;
  else
    x[i] = b * z;
}

// After if-conversion
for (i = ...) {
  bool predicate = p[i];
  tmp1 = predicate ? a * y : 0;
  tmp2 = !predicate ? b * z : 0;
  x[i] = predicate ? tmp1 : tmp2;
}

, ,

for (i = ...) {
  bool predicate = p[i];
  tmp1 = a * y;
  tmp2 = b * z;
  x[i] = predicate ? tmp1 : tmp2;
}

cmp a7, 0, p1
mul a0, a1, a2, p1
mul a3, a4, a2, !p1

, , p1

. “” (- ), ( ).
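The if-conversion shown above can be tried out in plain C. The sketch below keeps the scalar operands of the original example and uses hypothetical function names; whether a compiler emits a select or predicated instructions for the second version depends on the target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Branchy version: control flow depends on p[i]. */
void select_branchy(int *x, const bool *p, size_t n,
                    int a, int y, int b, int z)
{
    for (size_t i = 0; i < n; ++i) {
        if (p[i])
            x[i] = a * y;
        else
            x[i] = b * z;
    }
}

/* If-converted version: both products are computed unconditionally
   and the predicate only selects which result is stored. */
void select_if_converted(int *x, const bool *p, size_t n,
                         int a, int y, int b, int z)
{
    for (size_t i = 0; i < n; ++i) {
        bool predicate = p[i];
        int tmp1 = a * y;
        int tmp2 = b * z;
        x[i] = predicate ? tmp1 : tmp2;
    }
}
```

The second form has a single basic block in the loop body, which makes it a candidate for software pipelining. It is only a valid transformation when both sides can be evaluated without side effects or faults, since the untaken side is now always executed.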

, , .

// Unroll 2
for (i = ...) {
  z[i] = a * x[i];
  ++i;
  z[i] = a * x[i];
  ++i;
}

// Dependencies removed
for (i1, i2 = ...) {
  z[i1] = a * x[i1];
  i1 += 2;
  z[i2] = a * x[i2];
  i2 += 2;
}

, i1

i2

, .
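The renaming of the induction variable can be written out as complete functions. This is a sketch with hypothetical names; the unrolled version assumes an even element count, while the renamed version adds a tail step for odd counts.

```c
#include <stddef.h>

/* Unrolled by 2: the second statement has to wait for ++i. n must be even. */
void scale_unrolled(int *z, const int *x, int a, size_t n)
{
    for (size_t i = 0; i < n; ) {
        z[i] = a * x[i];
        ++i;
        z[i] = a * x[i];
        ++i;
    }
}

/* Induction variable renamed: i1 and i2 advance independently,
   so the two statements no longer depend on each other. */
void scale_renamed(int *z, const int *x, int a, size_t n)
{
    for (size_t i1 = 0, i2 = 1; i2 < n; i1 += 2, i2 += 2) {
        z[i1] = a * x[i1];
        z[i2] = a * x[i2];
    }
    if (n % 2)                       /* leftover element for odd n */
        z[n - 1] = a * x[n - 1];
}
```

With separate counters the two halves of the body form independent dependency chains, which gives the scheduler freedom to place them in parallel issue slots.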

Fifty years of signal processing
DSPs for Mobile Communication

The Rise and Fall of Dark Silicon
Understanding sources of inefficiency in general-purpose chips

VLIW

J.A. Fisher et al. “Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools” (J.A. Fisher is the creator of VLIW)
“Mill Computing” channel on YouTube

VLIW

Texas Instruments documentation (user and programmer guides, application reports)
The Making of a Compiler for the Intel Itanium Processor
The Multiflow Trace Scheduling Compiler
DSP-C Specification Embedded-C extensions

DSP

CEVA Launches Machine Learning DSP Solution: CEVA-XM6 (anandtech)
CEVA-XC12 The world's most advanced communications DSP

DSP ( E. Belaish, CEVA)

Combining C code with assembly code in DSP applications
Architecture Oriented C Optimizations
Compiler optimization for DSP applications

, Senior Engineer, System-on-Chip SW Team, Samsung

Samsung: , .

Samsung Exynos DSP NPU. , , . Samsung state-of-the-art , . : -. .
