Espressif DSP Library Benchmarks

The table below contains benchmarks of the functions provided by the ESP-DSP library. Each value is the number of CPU cycles taken to execute the function. Values in the “O2” columns were measured with the compiler optimizing for speed, and values in the “Os” columns with the compiler optimizing for size. The “ESP32” and “ESP32S3” columns give results for the optimized (assembly) implementations, and the “ANSI” columns give results for the non-optimized (ANSI C) implementations.
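To make the “ANSI” columns more concrete, here is a minimal sketch of a non-optimized, plain C dot product: one multiply-accumulate per element, with no SIMD instructions or loop unrolling. The names `dotprod_f32_ref` and `dotprod_strided_f32_ref` are hypothetical, and this is not the library's source. The strided variant only illustrates what a “step” means in the `dsps_dotprode_f32` row, assuming a step of 1 selects contiguous elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain C reference dot product (not the ESP-DSP source):
 * one multiply-accumulate per element, no SIMD, no unrolling. */
float dotprod_f32_ref(const float *a, const float *b, int len)
{
    assert(a != NULL && b != NULL && len >= 0);
    float acc = 0.0f;
    for (int i = 0; i < len; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Hypothetical strided variant: uses every step_a-th element of a and every
 * step_b-th element of b. With both steps equal to 1 it matches dotprod_f32_ref. */
float dotprod_strided_f32_ref(const float *a, const float *b, int len,
                              int step_a, int step_b)
{
    assert(a != NULL && b != NULL && len >= 0 && step_a > 0 && step_b > 0);
    float acc = 0.0f;
    for (int i = 0; i < len; i++) {
        acc += a[i * step_a] * b[i * step_b];
    }
    return acc;
}
```

Comparing a row's “ANSI” value with its “ESP32” or “ESP32S3” value shows how many cycles the assembly version saves over a straightforward loop of this kind.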

Function | ESP32 O2 | ESP32S3 O2 | ANSI O2 | ESP32 Os | ESP32S3 Os | ANSI Os

Dot Product
dsps_dotprod_f32 for N=256 points | 1058 | 589 | 1325 | 1058 | 448 | 4129
dsps_dotprode_f32 for N=256 points with step 1 | 1317 | 1325 | 2853 | 1317 | 1325 | 3621
dsps_dotprod_s16 for N=256 points | 447 | 325 | 3647 | 447 | 323 | 6466

FIR Filters
dsps_fir_f32 for 1024 input samples and 256 coefficients | 1079312 | 1074223 | 2150685 | 1079599 | 1074222 | 5147785
dsps_fird_f32 for 1024 input samples, 256 coefficients and decimation 4 | 350915 | 347436 | 614234 | 350520 | 347607 | 1317367

FFTs Radix-2 32-bit Floating Point
dsps_fft2r_fc32 for 64 complex points | 6079 | 5142 | 7037 | 5452 | 5142 | 8333
dsps_fft2r_fc32 for 128 complex points | 13031 | 11707 | 15907 | 12399 | 11707 | 19035
dsps_fft2r_fc32 for 256 complex points | 27828 | 26303 | 35562 | 27828 | 26303 | 42922
dsps_fft2r_fc32 for 512 complex points | 61753 | 58435 | 78705 | 61753 | 58435 | 95673
dsps_fft2r_fc32 for 1024 complex points | 135742 | 128586 | 172664 | 135742 | 128585 | 211252

FFTs Radix-4 32-bit Floating Point
dsps_fft4r_fc32 for 64 complex points | 3125 | 3262 | 5185 | 3247 | 3174 | 5631
dsps_fft4r_fc32 for 256 complex points | 15551 | 15450 | 26115 | 16056 | 15789 | 28397
dsps_fft4r_fc32 for 1024 complex points | 75547 | 75185 | 127669 | 77587 | 76548 | 138522

FFTs 16-bit Fixed Point
dsps_fft2r_sc16 for 64 complex points | 8786 | 933 | 14575 | 8786 | 793 | 15861
dsps_fft2r_sc16 for 128 complex points | 20214 | 1734 | 33121 | 20214 | 1627 | 36238
dsps_fft2r_sc16 for 256 complex points | 45755 | 3507 | 74290 | 45755 | 3430 | 81638
dsps_fft2r_sc16 for 512 complex points | 102208 | 7312 | 164803 | 102416 | 7311 | 181758
dsps_fft2r_sc16 for 1024 complex points | 225862 | 15641 | 362193 | 225861 | 15640 | 400853

IIR Filters
dsps_biquad_f32 - biquad filter for 1024 input samples | 17450 | 17458 | 24613 | 17451 | 17459 | 36895

Matrix Multiplication
dspm_mult_f32 - C[16;16] = A[16;16]*B[16;16] | 24669 | 6298 | 51502 | 24670 | 6529 | 78197
dspm_mult_s16 - C[16;16] = A[16;16]*B[16;16] | 24707 | 1847 | 83699 | 24707 | 2047 | 99353
dspm_mult_3x3x1_f32 - C[3;1] = A[3;3]*B[3;1] | 79 | 88 | 226 | 80 | 86 | 271
dspm_mult_3x3x3_f32 - C[3;3] = A[3;3]*B[3;3] | 211 | 217 | 492 | 210 | 217 | 611
dspm_mult_4x4x1_f32 - C[4;1] = A[4;4]*B[4;1] | 112 | 160 | 334 | 113 | 121 | 425
dspm_mult_4x4x4_f32 - C[4;4] = A[4;4]*B[4;4] | 405 | 191 | 1008 | 404 | 331 | 1335

Image processing prototypes
dspi_dotprod_s8/u8 - dot product of two images 16x16 | 3827 | 179 | 3828 | 4010 | 178 | 4011
dspi_dotprod_off_s8/u8 - dot product of two images 16x16 | 4142 | 243 | 4142 | 4772 | 244 | 4774
dspi_dotprod_s8/u8 - dot product of two images 64x64 | 58069 | 704 | 58068 | 58825 | 706 | 58826
dspi_dotprod_off_s8/u8 - dot product of two images 64x64 | 62365 | 1010 | 62366 | 71233 | 1010 | 71062
dspi_dotprod_s16/u16 - dot product of two images 8x8 | 1455 | 162 | 1453 | 1804 | 302 | 1806
dspi_dotprod_off_s16/u16 - dot product of two images 8x8 | 1529 | 429 | 1531 | 2074 | 363 | 2074
dspi_dotprod_s16 - dot product of two images 32x32 | 20190 | 424 | 20029 | 25300 | 425 | 25301
dspi_dotprod_off_s16/u16 - dot product of two images 32x32 | 21090 | 578 | 21089 | 29432 | 576 | 29432
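Raw cycle counts are easier to compare once they are converted into time, speedup, or cost per element. The sketch below shows that arithmetic. The helper names are hypothetical, and the 240 MHz clock in the comments is an assumption: it is the maximum CPU frequency of both ESP32 and ESP32-S3, but the clock used for these measurements is not stated here.

```c
#include <assert.h>

/* Cycles to microseconds: 1 MHz is one cycle per microsecond, so us = cycles / MHz.
 * Assuming a 240 MHz clock, cycles_to_us(1079312, 240.0) is about 4497 us. */
double cycles_to_us(unsigned long cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;
}

/* How many times fewer cycles the optimized version needs than the reference one,
 * e.g. an ANSI value divided by an ESP32 or ESP32S3 value from the same row. */
double speedup(unsigned long reference_cycles, unsigned long optimized_cycles)
{
    assert(optimized_cycles > 0);
    return (double)reference_cycles / (double)optimized_cycles;
}

/* Average cycles per input element (point, sample, ...),
 * e.g. cycles_per_element(17450, 1024) is about 17 cycles per biquad sample. */
double cycles_per_element(unsigned long cycles, unsigned long elements)
{
    assert(elements > 0);
    return (double)cycles / (double)elements;
}
```

For example, `speedup(362193, 15641)` compares the ANSI O2 and ESP32S3 O2 values for `dsps_fft2r_sc16` with 1024 points and gives roughly 23.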

The benchmarks can be reproduced by running the test cases in test/test_dsp.c.
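For a quick measurement outside the test suite, a harness can read a free-running counter before and after the call under test. This is a sketch under stated assumptions, not the code in test/test_dsp.c: `measure_min_count`, `dot_job`, `dot_job_run` and `host_counter` are hypothetical names, and `host_counter` uses the C standard `clock()` only so the sketch builds on a desktop machine (its ticks are far coarser than CPU cycles). On an ESP32 or ESP32-S3 target, a real cycle counter would be wrapped in a function of the same shape; assuming ESP-IDF 5.x, `esp_cpu_get_cycle_count()` from `esp_cpu.h` is one option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: any free-running 32-bit counter, e.g. a CPU cycle counter. */
typedef uint32_t (*counter_fn)(void);

/* Hypothetical: the code under test, with its arguments packed into ctx. */
typedef void (*workload_fn)(void *ctx);

/* Calls work(ctx) `repeats` times and returns the smallest elapsed count.
 * Unsigned 32-bit subtraction stays correct across one counter wrap-around
 * (a 32-bit counter at 240 MHz wraps about every 17.9 s). */
uint32_t measure_min_count(counter_fn now, workload_fn work, void *ctx, int repeats)
{
    assert(now != NULL && work != NULL && repeats > 0);
    uint32_t best = UINT32_MAX;
    for (int i = 0; i < repeats; i++) {
        uint32_t start = now();
        work(ctx);
        uint32_t elapsed = now() - start;
        if (elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}

/* Hypothetical example workload: a plain C dot product. */
typedef struct {
    const float *a;
    const float *b;
    int len;
    volatile float result; /* volatile: keeps the compiler from discarding the work */
} dot_job;

static void dot_job_run(void *ctx)
{
    dot_job *job = ctx;
    float acc = 0.0f;
    for (int i = 0; i < job->len; i++) {
        acc += job->a[i] * job->b[i];
    }
    job->result = acc;
}

/* Host-only stand-in for a cycle counter: processor time in clock() ticks. */
static uint32_t host_counter(void)
{
    return (uint32_t)clock();
}
```

Reporting the minimum of several runs is a design choice in this sketch: it reduces the influence of runs disturbed by interrupts or a cold cache, but it reports a best case, so its numbers may not match the table exactly.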