.TH SIMD-VITERBI 3
.SH NAME
create_viterbi27, set_viterbi27_polynomial, init_viterbi27, update_viterbi27_blk,
chainback_viterbi27, delete_viterbi27,
create_viterbi29, set_viterbi29_polynomial, init_viterbi29, update_viterbi29_blk,
chainback_viterbi29, delete_viterbi29,
create_viterbi39, set_viterbi39_polynomial, init_viterbi39, update_viterbi39_blk,
chainback_viterbi39, delete_viterbi39,
create_viterbi615, set_viterbi615_polynomial, init_viterbi615, update_viterbi615_blk,
chainback_viterbi615, delete_viterbi615 \- IA32 SIMD-assisted Viterbi decoders
.SH SYNOPSIS
.nf
.ft B
#include "fec.h"
void *create_viterbi27(int blocklen);
void set_viterbi27_polynomial(int polys[2]);
int init_viterbi27(void *vp,int starting_state);
int update_viterbi27_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi27(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi27(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi29(int blocklen);
void set_viterbi29_polynomial(int polys[2]);
int init_viterbi29(void *vp,int starting_state);
int update_viterbi29_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi29(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi29(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi39(int blocklen);
void set_viterbi39_polynomial(int polys[3]);
int init_viterbi39(void *vp,int starting_state);
int update_viterbi39_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi39(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi39(void *vp);
.fi
.sp
.nf
.ft B
void *create_viterbi615(int blocklen);
void set_viterbi615_polynomial(int polys[6]);
int init_viterbi615(void *vp,int starting_state);
int update_viterbi615_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi615(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi615(void *vp);
.fi
.SH DESCRIPTION
These functions implement high performance Viterbi decoders for four
convolutional codes: a rate 1/2 constraint length 7 (k=7) code
("viterbi27"), a rate 1/2 k=9 code ("viterbi29"),
a rate 1/3 k=9 code ("viterbi39") and a rate 1/6 k=15 code ("viterbi615").
The decoders use the Intel IA32 or PowerPC SIMD instruction sets, if available, to improve
decoding speed.

On the IA32 there are three different SIMD instruction sets. The first
and most common is MMX, introduced on later Intel Pentiums and then on
the Intel Pentium II and most Intel clones (AMD K6, Transmeta Crusoe,
etc). SSE was introduced on the Pentium III and later implemented in
the AMD Athlon 4 (AMD calls it "3D Now! Professional"). Most
recently, SSE2 was introduced in the Intel Pentium 4, and has been
adopted by more recent AMD CPUs. The presence of SSE2 implies the
existence of SSE, which in turn implies MMX.

Altivec is the PowerPC SIMD instruction set. It is roughly comparable
to SSE2. Altivec was introduced to the general public in the Apple
Macintosh G4; it is also present in the G5. Altivec is actually a
Motorola trademark; Apple calls it "Velocity Engine" and IBM calls it
"VMX". All refer to the same thing.

When built for the IA32 or PPC architectures, the functions
automatically use the most powerful SIMD instruction set available. If
no SIMD instructions are available, or if the library is built for a
non-IA32, non-PPC machine, a portable C version is executed
instead.
.SH USAGE
Four versions of each function are provided, one for each code.
In the following discussion, change "viterbi" to "viterbi27", "viterbi29", "viterbi39"
or "viterbi615" as desired.

Before Viterbi decoding can begin, an instance must first be created with
\fBcreate_viterbi()\fR. This function creates and returns a pointer to
an internal control structure
containing the path metrics and the branch
decisions. \fBcreate_viterbi()\fR takes one argument that gives the
length of the data block in bits. You \fImust not\fR attempt to
decode a block longer than the length given to \fBcreate_viterbi()\fR.

Before decoding a new frame,
\fBinit_viterbi()\fR must be called to reset the decoder state.
It accepts the instance pointer returned by
\fBcreate_viterbi()\fR and the initial starting state of the
convolutional encoder (usually 0). If the initial starting state is unknown or
incorrect, the decoder will still function but the decoded data may be
incorrect at the start of the block.

Blocks of received symbols are processed with calls to
\fBupdate_viterbi_blk()\fR. The \fBnbits\fR parameter specifies the
number of \fIdata bits\fR (not channel symbols) represented by the
\fBsyms\fR buffer. (For rate 1/2 codes, the number of symbols in
\fBsyms\fR is twice \fBnbits\fR; for the rate 1/3 code it is three times
\fBnbits\fR, and for the rate 1/6 code six times.)
Each symbol is expected to range
from 0 through 255, with 0 corresponding to a "strong 0" and 255
corresponding to a "strong 1". The caller is responsible for
determining the proper pairing of input symbols (commonly known as
decoder symbol phasing).
.sp
At the end of the block, the data is recovered with a call to
\fBchainback_viterbi()\fR. The arguments are the pointer to the
decoder instance, a pointer to a user-supplied buffer into which the
decoded data is to be written, the number of data bits (not bytes)
that are to be decoded, and the terminal state of the convolutional
encoder at the end of the frame (usually 0). If the terminal state is
incorrect or unknown, the decoded data bits at the end of the frame
may be unreliable. The decoded data is written in big-endian order,
i.e., the first bit in the frame is written into the high order bit of
the first byte in the buffer. If the frame is not an integral number
of bytes long, the low order bits of the last byte in the frame will
be unused.
.sp
Note that the decoders assume the use of a tail, i.e., the encoding
and transmission of a sufficient number of padding bits beyond the end
of the user data to force the convolutional encoder into the known
terminal state given to \fBchainback_viterbi()\fR. The tail is
always one bit less than the constraint length of the code, so the k=7
code uses 6 tail bits (12 tail symbols), the rate 1/2 k=9 code uses 8 tail bits
(16 tail symbols), the rate 1/3 k=9 code uses 8 tail bits (24 tail symbols)
and the k=15 code uses 14 tail bits (84 tail symbols).

The tail bits are not included in the length arguments to
\fBcreate_viterbi()\fR and \fBchainback_viterbi()\fR. For example, if
the block contains 1000 user bits, then this would be the length
parameter given to \fBcreate_viterbi27()\fR and
\fBchainback_viterbi27()\fR, and \fBupdate_viterbi27_blk()\fR would be called
with a total of 2012 symbols (1006 bits), the last 12 encoded symbols
representing the tail bits.
.sp
After the call to \fBchainback_viterbi()\fR, the decoder may be reset
with a call to \fBinit_viterbi()\fR and another block can be decoded.
Alternatively, \fBdelete_viterbi()\fR can be called to free all resources
used by the Viterbi decoder.

The \fBset_viterbi_polynomial()\fR function allows the use of code generator
polynomials other than the defaults. Although only one set of polynomials is generally
used with each code, there are different conventions as to their order and
symbol polarity, and these functions simplify their use.

The default polynomials for the viterbi27 routines
are those of the NASA-JPL convention \fIwithout\fR symbol inversion.
The NASA-JPL convention normally inverts the first symbol.
The CCSDS/NASA-GSFC convention swaps the two symbols and inverts the second.
.sp
To set the NASA-JPL convention with symbol inversion:
.sp
.nf
.ft B
int polys[2] = { -V27POLYA,V27POLYB };
set_viterbi27_polynomial(polys);
.ft R
.fi
.sp
To set the CCSDS convention with symbol inversion:
.sp
.nf
.ft B
int polys[2] = { V27POLYB,-V27POLYA };
set_viterbi27_polynomial(polys);
.ft R
.fi
.sp
The default polynomials for the viterbi615 routines
are those used by the Cassini spacecraft \fIwithout\fR
symbol inversion. Mars Pathfinder (MPF) and STEREO
swap the third and fourth polynomials.
Both conventions invert the
first, third and fifth symbols. Refer to fec.h for the polynomial constant definitions.
.sp
To set the Cassini convention with symbol inversion:
.sp
.nf
.ft B
int polys[6] = { -V615POLYA,V615POLYB,-V615POLYC,V615POLYD,-V615POLYE,V615POLYF };
set_viterbi615_polynomial(polys);
.ft R
.fi
.sp
To set the MPF/STEREO convention with symbol inversion:
.sp
.nf
.ft B
int polys[6] = { -V615POLYA,V615POLYB,-V615POLYD,V615POLYC,-V615POLYE,V615POLYF };
set_viterbi615_polynomial(polys);
.ft R
.fi

For performance reasons, calling this function changes the code
generator polynomials for \fIall\fR instances of the corresponding Viterbi decoder,
including those already created.
.SH ERROR PERFORMANCE
These decoders have all been extensively tested and found to provide
performance consistent with that expected for soft-decision Viterbi
decoding with 8-bit symbols.

Due to internal differences, the implementations
vary slightly in error performance. In
general, the portable C versions exhibit the best error performance
because they use full-sized branch metrics, and the MMX versions
exhibit the worst because they use 8-bit branch metrics with modulo
comparisons. The SSE, SSE2 and Altivec implementations of the r=1/2 k=7 and
r=1/2 k=9 codes use unsigned
8-bit branch metrics, and are almost as good as the C versions. The
r=1/3 k=9 and r=1/6 k=15 codes are implemented with 16-bit path metrics in all SIMD
versions.
.SH DIRECT ACCESS TO SPECIFIC FUNCTION VERSIONS
Calling the functions listed above automatically calls the appropriate
version of the function depending on the CPU type and available SIMD
instructions. A particular version can also be called directly by
appending the appropriate suffix to the function name. The available
suffixes are "_mmx", "_sse", "_sse2", "_av" and "_port", for the MMX,
SSE, SSE2, Altivec and portable versions, respectively. For example,
the SSE2 version of the \fBupdate_viterbi27_blk()\fR function can be invoked
as \fBupdate_viterbi27_blk_sse2()\fR.

Naturally, the _av functions are only available on the PowerPC and the
_mmx, _sse and _sse2 versions are only available on IA32. Calling
a SIMD-enabled function on a CPU that doesn't support the appropriate
set of instructions will result in an illegal instruction exception.
.SH RETURN VALUES
\fBcreate_viterbi()\fR returns a pointer to the structure containing
the decoder state.
The functions declared to return \fBint\fR return -1 on error, 0 otherwise.

.SH AUTHOR & COPYRIGHT
Phil Karn, KA9Q (karn@ka9q.net)

.SH LICENSE
This software may be used under the terms of the GNU Lesser General Public License (LGPL).