path: root/src/viterbi.c
* core/conv: do not mix up AVX and SSE code (Vadim Yanitskiy, 2017-05-29; 1 file changed, +84/-63 lines)
  According to GCC's wiki:

      If you specify command-line switches such as -msse, the compiler
      could use the extended instruction sets even if the built-ins are
      not used explicitly in the program. For this reason, applications
      that perform run-time CPU detection must compile separate files
      for each supported architecture, using the appropriate flags. In
      particular, the file containing the CPU detection code should be
      compiled without these options.

  So, this change introduces a separate Viterbi implementation, which
  is almost the same as the previous one, but is compiled with -mavx2.
  It will only be used by CPUs that support both SSE and AVX:

      SSE3 and AVX2: viterbi_sse_avx.c
      SSE3 only:     viterbi_sse.c
      Generic:       viterbi_generic.c

  Change-Id: I042cc76258df7e4c6c90a73af3d0a6e75999b2b0
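  For illustration, a minimal sketch of the dispatch pattern this
  commit describes: the detection code lives in a file compiled
  without SIMD flags, and each kernel is built in its own file with
  its own flags. All function names here are hypothetical, not actual
  libosmocore symbols.

      /* viterbi.c -- compiled WITHOUT -msse3/-mavx2, so GCC cannot
       * emit extended instructions into the detection code itself. */
      #include <stddef.h>
      #include <stdint.h>

      /* Each kernel lives in its own translation unit:
       *   viterbi_sse_avx.c  built with -msse3 -mavx2
       *   viterbi_sse.c      built with -msse3
       *   viterbi_generic.c  built with no SIMD flags */
      int vitdec_sse_avx(const int8_t *in, uint8_t *out, size_t len);
      int vitdec_sse(const int8_t *in, uint8_t *out, size_t len);
      int vitdec_generic(const int8_t *in, uint8_t *out, size_t len);

      static int (*vitdec)(const int8_t *, uint8_t *, size_t);

      /* Resolve the best available kernel once, at startup. */
      static void vitdec_init(void)
      {
              __builtin_cpu_init();
              if (__builtin_cpu_supports("avx2") &&
                  __builtin_cpu_supports("sse3"))
                      vitdec = vitdec_sse_avx;
              else if (__builtin_cpu_supports("sse3"))
                      vitdec = vitdec_sse;
              else
                      vitdec = vitdec_generic;
      }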
* core/conv: add x86 SSE support for Viterbi decoder (Tom Tsou, 2017-05-24; 1 file changed, +111/-10 lines)
  Fast convolutional decoding is provided through x86 intrinsic based
  SSE operations. SSE3, found on virtually all modern x86 processors,
  is the minimal requirement. SSE4.1 and AVX2 are used if available.

  Also, the original code was extended with runtime SIMD detection, so
  only extensions supported by the target CPU will be used. This makes
  the library more portable, which is very important for binary package
  distribution. Runtime SIMD detection is currently implemented through
  the __builtin_cpu_supports call.

  Change-Id: I1da6d71ed0564f1d684f3a836e998d09de5f0351
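  As a hedged sketch of the kind of intrinsic-based operation such a
  decoder builds on, consider eight 16-bit path metrics updated in one
  vectorized add-compare-select step. The function and its data layout
  are assumptions for illustration, not the actual libosmocore kernel;
  _mm_adds_epi16 and _mm_min_epi16 are SSE2 intrinsics, so they are
  available on every SSE3-capable CPU.

      #include <emmintrin.h>

      /* Update eight path metrics at once: add the branch metrics for
       * both hypothesized input bits, with saturation so the metrics
       * cannot wrap, then keep the smaller (better) metric per lane. */
      static inline __m128i acs8(__m128i metrics, __m128i bm_pos,
                                 __m128i bm_neg)
      {
              __m128i cand0 = _mm_adds_epi16(metrics, bm_pos);
              __m128i cand1 = _mm_adds_epi16(metrics, bm_neg);
              return _mm_min_epi16(cand0, cand1);
      }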
* core/conv: strip unused memalign() call (Vadim Yanitskiy, 2017-05-07; 1 file changed, +1/-10 lines)
  Aligned memory allocation is only required for SSE, which is
  currently unsupported. Moreover, it's better to use the dedicated
  _mm_malloc() and _mm_free() from xmmintrin.h instead, which were
  introduced by Intel specifically for SIMD computations.

  Change-Id: Ide764d1c643527323334ef14335be7f8915f7622
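  A short sketch of the allocation pair the message recommends; the
  buffer contents and the alloc_metrics()/free_metrics() helpers are
  hypothetical, for illustration only.

      #include <stddef.h>
      #include <stdint.h>
      #include <xmmintrin.h>  /* _mm_malloc() / _mm_free() */

      /* Allocate a 16-byte aligned path metric buffer; aligned SSE
       * loads/stores such as _mm_load_si128() require this alignment. */
      static int16_t *alloc_metrics(size_t num_states)
      {
              return _mm_malloc(num_states * sizeof(int16_t), 16);
      }

      static void free_metrics(int16_t *metrics)
      {
              /* Must pair with _mm_malloc(), not free(). */
              _mm_free(metrics);
      }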
* core/conv: implement optimized Viterbi decoder (Tom Tsou, 2017-04-11; 1 file changed, +602/-0 lines)
  Add a separate, faster convolutional decoding implementation for
  rates up to N=4 and constraint lengths of K=5 and K=7, which covers
  most GSM code uses. The decoding algorithm exploits the symmetric
  structure of the Viterbi add-compare-select (ACS) operation, commonly
  known as the ACS butterfly. This shift-register optimization can be
  found in the well-known text by Dave Forney.

  Forney, G.D., "The Viterbi Algorithm," Proc. of the IEEE, March 1973.

  The implementation is not architecture-specific and improves
  performance on x86 as well as ARM processors. The existing API is
  unchanged, with the optimized code being called internally for
  supported codes.

  The original code was relicensed under GPLv2-or-later with permission
  of the copyright holder, Tom Tsou.

  Change-Id: I74d355274b4176a7d924f91ef3c96912ce338fb2
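  To make the butterfly concrete, here is a minimal generic-C sketch of
  one ACS step, under the usual assumption that lower metrics are
  better and that the code's symmetry makes the two branch metrics of a
  butterfly differ only in sign. The function and its parameters are
  illustrative, not the actual libosmocore API.

      #include <stdint.h>

      /* One ACS butterfly for a code with S = 2^(K-1) states: previous
       * states i and i + S/2 both feed next states 2i and 2i + 1, and a
       * single branch metric bm serves all four transitions because the
       * metrics are antisymmetric across the butterfly. half = S/2. */
      static void acs_butterfly(unsigned i, unsigned half,
                                const int16_t *prev, int16_t bm,
                                int16_t *next)
      {
              int16_t a = prev[i] + bm;
              int16_t b = prev[i + half] - bm;
              int16_t c = prev[i] - bm;
              int16_t d = prev[i + half] + bm;

              next[2 * i]     = a < b ? a : b;  /* survivor for state 2i */
              next[2 * i + 1] = c < d ? c : d;  /* survivor for state 2i + 1 */
      }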