Espressif DSP Library Benchmarks
The table below lists benchmarks for the functions provided by the ESP-DSP library. Each value is the number of CPU cycles taken to execute the function. Values in the “O2” columns were measured with the compiler optimizing for speed, and values in the “Os” columns with the compiler optimizing for size. The “ESP32” and “ESP32S3” columns give results for the optimized (assembly) implementations, and the “ANSI” columns give results for the non-optimized ANSI C implementations.
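The table reports cycles, not time. Converting a count to wall-clock time requires the core clock frequency, which the table does not state. The sketch below takes the frequency as an input you supply (the example assumes 240 MHz, the maximum CPU clock of both ESP32 and ESP32-S3). It also computes the speedup of an optimized column over the matching ANSI column. Both helper names are hypothetical and not part of ESP-DSP.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a CPU cycle count to microseconds for an assumed core clock.
 * cpu_mhz is supplied by the caller; the benchmark table gives cycles only. */
static double cycles_to_us(uint32_t cycles, uint32_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    return (double)cycles / (double)cpu_mhz;  /* cycles / (cycles per microsecond) */
}

/* Speedup of an optimized implementation over a reference one
 * (for example, the ANSI column versus the ESP32S3 column). */
static double speedup(uint32_t reference_cycles, uint32_t optimized_cycles)
{
    assert(optimized_cycles > 0);
    return (double)reference_cycles / (double)optimized_cycles;
}
```

For example, dsps_fft2r_fc32 with 1024 complex points takes 135742 cycles on ESP32 at O2, so `cycles_to_us(135742, 240)` gives about 565.6 µs if the core runs at 240 MHz. Similarly, `speedup(362193, 15641)` gives about 23.2 for dsps_fft2r_sc16 with 1024 points on ESP32S3 relative to ANSI at O2.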
Function | ESP32 (O2) | ESP32S3 (O2) | ANSI (O2) | ESP32 (Os) | ESP32S3 (Os) | ANSI (Os) |
---|---|---|---|---|---|---|
Dot Product | ||||||
dsps_dotprod_f32 for N=256 points | 1058 | 589 | 1325 | 1058 | 448 | 4129 |
dsps_dotprode_f32 for N=256 points with step 1 | 1317 | 1325 | 2853 | 1317 | 1325 | 3621 |
dsps_dotprod_s16 for N=256 points | 447 | 325 | 3647 | 447 | 323 | 6466 |
FIR Filters | ||||||
dsps_fir_f32 for 1024 input samples and 256 coefficients | 1079312 | 1074223 | 2150685 | 1079599 | 1074222 | 5147785 |
dsps_fird_f32 for 1024 input samples, 256 coefficients, and decimation 4 | 350915 | 347436 | 614234 | 350520 | 347607 | 1317367 |
Radix-2 FFTs, 32-bit Floating Point | ||||||
dsps_fft2r_fc32 for 64 complex points | 6079 | 5142 | 7037 | 5452 | 5142 | 8333 |
dsps_fft2r_fc32 for 128 complex points | 13031 | 11707 | 15907 | 12399 | 11707 | 19035 |
dsps_fft2r_fc32 for 256 complex points | 27828 | 26303 | 35562 | 27828 | 26303 | 42922 |
dsps_fft2r_fc32 for 512 complex points | 61753 | 58435 | 78705 | 61753 | 58435 | 95673 |
dsps_fft2r_fc32 for 1024 complex points | 135742 | 128586 | 172664 | 135742 | 128585 | 211252 |
Radix-4 FFTs, 32-bit Floating Point | ||||||
dsps_fft4r_fc32 for 64 complex points | 3125 | 3262 | 5185 | 3247 | 3174 | 5631 |
dsps_fft4r_fc32 for 256 complex points | 15551 | 15450 | 26115 | 16056 | 15789 | 28397 |
dsps_fft4r_fc32 for 1024 complex points | 75547 | 75185 | 127669 | 77587 | 76548 | 138522 |
FFTs, 16-bit Fixed Point | ||||||
dsps_fft2r_sc16 for 64 complex points | 8786 | 933 | 14575 | 8786 | 793 | 15861 |
dsps_fft2r_sc16 for 128 complex points | 20214 | 1734 | 33121 | 20214 | 1627 | 36238 |
dsps_fft2r_sc16 for 256 complex points | 45755 | 3507 | 74290 | 45755 | 3430 | 81638 |
dsps_fft2r_sc16 for 512 complex points | 102208 | 7312 | 164803 | 102416 | 7311 | 181758 |
dsps_fft2r_sc16 for 1024 complex points | 225862 | 15641 | 362193 | 225861 | 15640 | 400853 |
IIR Filters | ||||||
dsps_biquad_f32 - biquad filter for 1024 input samples | 17450 | 17458 | 24613 | 17451 | 17459 | 36895 |
Matrix Multiplication | ||||||
dspm_mult_f32 - C[16;16] = A[16;16]*B[16;16] | 24669 | 6298 | 51502 | 24670 | 6529 | 78197 |
dspm_mult_s16 - C[16;16] = A[16;16]*B[16;16] | 24707 | 1847 | 83699 | 24707 | 2047 | 99353 |
dspm_mult_3x3x1_f32 - C[3;1] = A[3;3]*B[3;1] | 79 | 88 | 226 | 80 | 86 | 271 |
dspm_mult_3x3x3_f32 - C[3;3] = A[3;3]*B[3;3] | 211 | 217 | 492 | 210 | 217 | 611 |
dspm_mult_4x4x1_f32 - C[4;1] = A[4;4]*B[4;1] | 112 | 160 | 334 | 113 | 121 | 425 |
dspm_mult_4x4x4_f32 - C[4;4] = A[4;4]*B[4;4] | 405 | 191 | 1008 | 404 | 331 | 1335 |
Image processing prototypes | ||||||
dspi_dotprod_s8/u8 - dot product of two 16x16 images | 3827 | 179 | 3828 | 4010 | 178 | 4011 |
dspi_dotprod_off_s8/u8 - dot product of two 16x16 images | 4142 | 243 | 4142 | 4772 | 244 | 4774 |
dspi_dotprod_s8/u8 - dot product of two 64x64 images | 58069 | 704 | 58068 | 58825 | 706 | 58826 |
dspi_dotprod_off_s8/u8 - dot product of two 64x64 images | 62365 | 1010 | 62366 | 71233 | 1010 | 71062 |
dspi_dotprod_s16/u16 - dot product of two 8x8 images | 1455 | 162 | 1453 | 1804 | 302 | 1806 |
dspi_dotprod_off_s16/u16 - dot product of two 8x8 images | 1529 | 429 | 1531 | 2074 | 363 | 2074 |
dspi_dotprod_s16 - dot product of two 32x32 images | 20190 | 424 | 20029 | 25300 | 425 | 25301 |
dspi_dotprod_off_s16/u16 - dot product of two 32x32 images | 21090 | 578 | 21089 | 29432 | 576 | 29432 |
The benchmarks can be reproduced by running the test cases in test/test_dsp.c.
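The test cases in test/test_dsp.c are the authoritative way to reproduce these numbers. The sketch below only illustrates the general pattern behind a cycle measurement: make a warm-up call, read a free-running counter before and after the call, and subtract the cost of an empty measurement. It is not taken from the repository. The counter is injected as a function pointer (the `cycle_counter_fn` type and all function names here are hypothetical). On an ESP-IDF target you would wrap the CPU cycle counter, which recent IDF releases expose as `esp_cpu_get_cycle_count()` (check the IDF version you build with). The `clock()`-based fallback only lets the code run on a host and measures processor-time ticks, not cycles. `dotprod_ref` is a plain C dot product standing in for an ANSI-style reference, not the ESP-DSP source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical hook for a free-running counter; on target, wrap the CPU cycle counter. */
typedef uint32_t (*cycle_counter_fn)(void);

/* Host fallback: processor-time ticks from clock(), far coarser than CPU cycles. */
static uint32_t host_counter(void)
{
    return (uint32_t)clock();
}

/* Straightforward C dot product used as the function under test (illustrative only). */
static float dotprod_ref(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Count the ticks of one dotprod_ref call after a warm-up call,
 * minus the overhead of reading the counter. Stores the result in *result. */
static uint32_t measure_dotprod_cycles(cycle_counter_fn now,
                                       const float *a, const float *b,
                                       size_t n, float *result)
{
    assert(now != NULL && result != NULL);

    *result = dotprod_ref(a, b, n);      /* warm-up call (caches, branch state) */

    uint32_t t0 = now();
    uint32_t overhead = now() - t0;      /* cost of an empty start/stop pair */

    uint32_t start = now();
    *result = dotprod_ref(a, b, n);
    uint32_t elapsed = now() - start;    /* unsigned subtraction tolerates wrap-around */

    return elapsed > overhead ? elapsed - overhead : 0u;
}
```

Subtracting the empty-measurement overhead matters most for short functions such as the 3x3 and 4x4 matrix products, where the counter reads are a noticeable fraction of the total.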