November 12, 2011

Parallel programming Optimization tricks

Comparison of two __m128i values:
  • SSE 4.1:
#define iszero(a) _mm_testz_si128(a, a)
/* gAllOnes is assumed to be a global __m128i with all bits set, e.g. _mm_set1_epi32(-1) */
#define isequal(a, b) _mm_testc_si128(_mm_cmpeq_epi32(a, b), gAllOnes)
  • Others:
#define iszero(a) (_mm_movemask_epi8(_mm_cmpeq_epi32(a, _mm_setzero_si128())) == 0xffff)
#define isequal(a, b) (_mm_movemask_epi8(_mm_cmpeq_epi32(a, b)) == 0xffff)
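
A minimal usage sketch of the SSE2 fallback macros, assuming an x86 target with SSE2 enabled. The helper names blocks_equal and block_is_zero are hypothetical and only serve as illustration; unaligned loads are used so the callers need no special alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* SSE2 fallback macros from the list above. */
#define iszero(a) (_mm_movemask_epi8(_mm_cmpeq_epi32(a, _mm_setzero_si128())) == 0xffff)
#define isequal(a, b) (_mm_movemask_epi8(_mm_cmpeq_epi32(a, b)) == 0xffff)

/* Hypothetical helper: 1 if the two 16-byte blocks hold identical bits. */
static int blocks_equal(const void *p, const void *q)
{
    __m128i a = _mm_loadu_si128((const __m128i *)p);
    __m128i b = _mm_loadu_si128((const __m128i *)q);
    return isequal(a, b);
}

/* Hypothetical helper: 1 if all 128 bits of the block are zero. */
static int block_is_zero(const void *p)
{
    __m128i a = _mm_loadu_si128((const __m128i *)p);
    return iszero(a);
}
```

_mm_cmpeq_epi32 sets each matching 32-bit lane to all ones, so the byte mask from _mm_movemask_epi8 is 0xffff only when all four lanes matched.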


Number of trailing zeros of an int (the result is undefined when the input is 0):
  • intrinsic: _bit_scan_forward()
  • corresponding asm instruction: bsfl
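
A small sketch of a zero-safe wrapper around the intrinsic, assuming GCC or Clang on x86, where _bit_scan_forward() is available through <x86intrin.h>. The name trailing_zeros and the choice to return 32 for a zero input are assumptions made here for illustration.

```c
#include <x86intrin.h>  /* _bit_scan_forward() on GCC/Clang, x86 only */

/* Hypothetical wrapper: number of trailing zero bits in x.
   Returns 32 for x == 0, since the intrinsic's result is undefined there. */
static int trailing_zeros(unsigned int x)
{
    if (x == 0)
        return 32;
    return _bit_scan_forward((int)x);
}
```

This pairs naturally with _mm_movemask_epi8: the trailing zero count of the mask is the index of the first byte whose mask bit is set.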
