|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to GCC's wiki:
If you specify command-line switches such as -msse, the compiler
could use the extended instruction sets even if the built-ins are
not used explicitly in the program. For this reason, applications
that perform run-time CPU detection must compile separate files
for each supported architecture, using the appropriate flags. In
particular, the file containing the CPU detection code should be
compiled without these options.
So, this change introduces a separate Viterbi implementation,
which is almost the same as previous one, but is being compiled
with -mavx2. This implementation will be only used by CPUs with
both SSE and AVX support:
SSE3 and AVX2: viterbi_sse_avx.c
SSE3 only: viterbi_sse.c
Generic: viterbi_generic.c
Change-Id: I042cc76258df7e4c6c90a73af3d0a6e75999b2b0
|
|
Fast convolutional decoding is provided through x86 intrinsic based
SSE operations. SSE3, found on virtually all modern x86 processors,
is the minimal requirement. SSE4.1 and AVX2 are used if available.
Also, the original code was extended with runtime SIMD detection,
so only supported extensions will be used by target CPU. It makes
the library more partable, what is very important for binary
packages distribution. Runtime SIMD detection is currently
implemented through the __builtin_cpu_supports call.
Change-Id: I1da6d71ed0564f1d684f3a836e998d09de5f0351
|