Image Processing with Intel’s SSE SIMD instructions for 12-bit images

Over the last few days I have been implementing some low-level 12-bit image processing functions using SSE, Intel’s SIMD instruction set. The aim is to cut processing time as much as possible. Initial results are very encouraging: those 128-bit registers really get things to scream along!

It has been a while since I have done such work on Intel devices (ARM, and hence ARM NEON, is more common on IoT projects). The last time was before MMX ‘intrinsics’ were available and involved hand-coding MMX instructions with supporting assembly language. Intrinsics really make coding this stuff so much simpler!
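To give a flavour of what intrinsics-based code looks like, here is a minimal sketch. It is not the code from this project. It assumes 12-bit pixels stored one per `uint16_t` (values 0 to 4095) and uses only SSE2 intrinsics. The function name `add_offset_12bit` and the brightness-offset operation are hypothetical, chosen purely for illustration. Each 128-bit register holds eight such pixels, so the main loop handles eight pixels per iteration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add `offset` to every pixel and clamp to the 12-bit
 * maximum (4095). Assumes each input pixel is already in the range 0..4095. */
static void add_offset_12bit(uint16_t *pixels, size_t count, uint16_t offset)
{
    /* Clamp the offset so pixel + offset stays below 32768; this keeps the
     * signed 16-bit min below valid. */
    if (offset > 4095) {
        offset = 4095;
    }

    const __m128i voffset = _mm_set1_epi16((short)offset);
    const __m128i vmax    = _mm_set1_epi16(4095);
    size_t i = 0;

    /* Main loop: eight 16-bit pixels per 128-bit register. */
    for (; i + 8 <= count; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(pixels + i)); /* unaligned load */
        v = _mm_add_epi16(v, voffset);  /* at most 8190, so no wrap-around */
        v = _mm_min_epi16(v, vmax);     /* clamp to 12-bit maximum */
        _mm_storeu_si128((__m128i *)(pixels + i), v);               /* unaligned store */
    }

    /* Scalar tail for the remaining 0..7 pixels. */
    for (; i < count; ++i) {
        uint32_t sum = (uint32_t)pixels[i] + offset;
        pixels[i] = (uint16_t)(sum > 4095 ? 4095 : sum);
    }
}
```

The unaligned load and store are used so the sketch works on any buffer. If the image rows are known to be 16-byte aligned, the aligned variants `_mm_load_si128` and `_mm_store_si128` could be used instead.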

Very often it doesn’t pay for a software engineer to hand-code and optimise image processing algorithms, but when maximum performance is required, huge speed gains can be made with some careful coding on standard hardware!