INTERNATIONAL TELECOMMUNICATION UNION
ITU-T G.729
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (03/96)
GENERAL ASPECTS OF DIGITAL TRANSMISSION SYSTEMS
CODING OF SPEECH AT 8 kbit/s USING CONJUGATE-STRUCTURE ALGEBRAIC-CODE-EXCITED LINEAR-PREDICTION (CS-ACELP)
ITU-T Recommendation G.729
(Previously “CCITT Recommendation”)
FOREWORD
The ITU-T (Telecommunication Standardization Sector) is a permanent organ of the International Telecommunication
Union (ITU). The ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations
on them with a view to standardizing telecommunications on a worldwide basis.
The World Telecommunication Standardization Conference (WTSC), which meets every four years, establishes the
topics for study by the ITU-T Study Groups which, in their turn, produce Recommendations on these topics.
The approval of Recommendations by the Members of the ITU-T is covered by the procedure laid down in WTSC
Resolution No. 1 (Helsinki, March 1-12, 1993).
ITU-T Recommendation G.729 was prepared by ITU-T Study Group 15 (1993-1996) and was approved under the
WTSC Resolution No. 1 procedure on the 19th of March 1996.
___________________
NOTE
In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication
administration and a recognized operating agency.
© ITU 1996
All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or
mechanical, including photocopying and microfilm, without permission in writing from the ITU.
Recommendation G.729 (03/96) i
CONTENTS
1 Introduction
2 General description of the coder
2.1 Encoder
2.2 Decoder
2.3 Delay
2.4 Speech coder description
2.5 Notational conventions
3 Functional description of the encoder
3.1 Pre-processing
3.2 Linear prediction analysis and quantization
3.3 Perceptual weighting
3.4 Open-loop pitch analysis
3.5 Computation of the impulse response
3.6 Computation of the target signal
3.7 Adaptive-codebook search
3.8 Fixed codebook – Structure and search
3.9 Quantization of the gains
3.10 Memory update
4 Functional description of the decoder
4.1 Parameter decoding procedure
4.2 Post-processing
4.3 Encoder and decoder initialization
4.4 Concealment of frame erasures
5 Bit-exact description of the CS-ACELP coder
5.1 Use of the simulation software
5.2 Organization of the simulation software
Recommendation G.729
CODING OF SPEECH AT 8 kbit/s USING CONJUGATE-STRUCTURE
ALGEBRAIC-CODE-EXCITED LINEAR-PREDICTION (CS-ACELP)
(Geneva, 1996)
1 Introduction
This Recommendation contains the description of an algorithm for the coding of speech signals at 8 kbit/s using
Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP).
This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering
(Recommendation G.712) of the analogue input signal, then sampling it at 8000 Hz, followed by conversion to 16-bit
linear PCM for the input to the encoder. The output of the decoder should be converted back to an analogue signal by
similar means. Other input/output characteristics, such as those specified by Recommendation G.711 for 64 kbit/s PCM
data, should be converted to 16-bit linear PCM before encoding, or from 16-bit linear PCM to the appropriate format
after decoding. The bitstream from the encoder to the decoder is defined within this Recommendation.
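The conversion from G.711 data to 16-bit linear PCM mentioned above can be sketched for μ-law input as follows. The sketch uses the widely used bias-and-shift expansion (bias 0x84); the function name ulaw_to_linear is hypothetical, A-law input needs a different expansion, and Recommendation G.711 remains the normative definition of both laws.

```c
#include <stdint.h>

/* Hypothetical helper: expand one G.711 mu-law byte to a 16-bit linear
 * PCM sample using the common bias-and-shift expansion. G.711 is normative. */
static int16_t ulaw_to_linear(uint8_t code)
{
    const int bias = 0x84;              /* 132: bias added during compression */
    code = (uint8_t)~code;              /* mu-law bytes are stored bit-inverted */
    int sign = code & 0x80;             /* bit 7: sign */
    int exponent = (code >> 4) & 0x07;  /* bits 6..4: segment number */
    int mantissa = code & 0x0F;         /* bits 3..0: step within the segment */
    int magnitude = (((mantissa << 3) + bias) << exponent) - bias;
    return (int16_t)(sign ? -magnitude : magnitude);
}
```

For example, the μ-law codes 0xFF and 0x00 expand to 0 and −32124, respectively.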
This Recommendation is organized as follows. Clause 2 gives a general outline of the CS-ACELP algorithm. Clauses 3
and 4 discuss the principles of the CS-ACELP encoder and decoder, respectively. Clause 5 describes the software that
defines this coder in 16-bit fixed-point arithmetic.
2 General description of the coder
The CS-ACELP coder is based on the Code-Excited Linear-Prediction (CELP) coding model. The coder operates on
speech frames of 10 ms, corresponding to 80 samples at a sampling rate of 8000 samples per second. For every 10 ms
frame, the speech signal is analysed to extract the parameters of the CELP model (linear-prediction filter coefficients,
adaptive and fixed-codebook indices and gains). These parameters are encoded and transmitted. The bit allocation of the
coder parameters is shown in Table 1. At the decoder, these parameters are used to retrieve the excitation and synthesis
filter parameters. The speech is reconstructed by filtering this excitation through the short-term synthesis filter, as
shown in Figure 1. The short-term synthesis filter is based on a 10th-order Linear Prediction (LP) filter. The long-term,
or pitch, synthesis filter is implemented using the so-called adaptive-codebook approach. The reconstructed speech is
then further enhanced by a postfilter.
TABLE 1/G.729
Bit allocation of the 8 kbit/s CS-ACELP algorithm (10 ms frame)
Parameter                  Codeword         Subframe 1   Subframe 2   Total per frame
Line spectrum pairs        L0, L1, L2, L3                             18
Adaptive-codebook delay    P1, P2           8            5            13
Pitch-delay parity         P0               1                         1
Fixed-codebook index       C1, C2           13           13           26
Fixed-codebook sign        S1, S2           4            4            8
Codebook gains (stage 1)   GA1, GA2         3            3            6
Codebook gains (stage 2)   GB1, GB2         4            4            8
Total                                                                 80
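The entries of Table 1 can be cross-checked against the nominal bit rate with a short calculation. The sketch below simply restates the table; the macro and function names are illustrative and are not taken from the clause 5 software.

```c
#include <stddef.h>

#define FS_HZ    8000                        /* sampling frequency */
#define FRAME_MS 10                          /* frame length */
#define L_FRAME  (FS_HZ * FRAME_MS / 1000)   /* 80 samples per frame */
#define L_SUBFR  (L_FRAME / 2)               /* 40 samples (5 ms) per subframe */

/* Bits per frame for each row of Table 1 (both subframes summed). */
static const int table1_bits[] = {
    18,       /* line spectrum pairs: L0, L1, L2, L3 */
    8 + 5,    /* adaptive-codebook delay: P1, P2 */
    1,        /* pitch-delay parity: P0 */
    13 + 13,  /* fixed-codebook index: C1, C2 */
    4 + 4,    /* fixed-codebook sign: S1, S2 */
    3 + 3,    /* codebook gains, stage 1: GA1, GA2 */
    4 + 4     /* codebook gains, stage 2: GB1, GB2 */
};

static int bits_per_frame(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof table1_bits / sizeof table1_bits[0]; i++)
        total += table1_bits[i];
    return total;
}

static int bit_rate_bps(void)
{
    return bits_per_frame() * (1000 / FRAME_MS);  /* 100 frames per second */
}
```

80 bits every 10 ms, i.e. 100 frames per second, gives 8000 bit/s. Per subframe, the fixed codebook uses 13 + 4 = 17 bits and the gains 3 + 4 = 7 bits.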
FIGURE 1/G.729
Block diagram of conceptual CELP synthesis model
[Figure: received bitstream → parameter decoding → excitation codebook → long-term synthesis filter → short-term synthesis filter → postfilter → output speech]
2.1 Encoder
The encoding principle is shown in Figure 2. The input signal is high-pass filtered and scaled in the pre-processing
block. The pre-processed signal serves as the input signal for all subsequent analysis. LP analysis is done once per 10 ms
frame to compute the LP filter coefficients. These coefficients are converted to Line Spectrum Pairs (LSP) and quantized
using predictive two-stage Vector Quantization (VQ) with 18 bits. The excitation signal is chosen by using an analysis-
by-synthesis search procedure in which the error between the original and reconstructed speech is minimized according
to a perceptually weighted distortion measure. This is done by filtering the error signal with a perceptual weighting filter,
whose coefficients are derived from the unquantized LP filter. The amount of perceptual weighting is made adaptive to
improve the performance for input signals with a flat frequency-response.
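The perceptual weighting filter is derived from the unquantized LP filter through the usual CELP bandwidth-expansion step: A(z) is evaluated at z/γ, which amounts to scaling each coefficient a_i by γ^i. In this Recommendation W(z) is formed from A(z/γ1) and A(z/γ2) (see Equation (27) and the γ1, γ2 entries of Table 5). The floating-point sketch below shows only this coefficient scaling; the function name weight_az is hypothetical, and the adaptation of γ1 and γ2 is not modelled.

```c
#define LP_ORDER 10  /* order of the LP filter */

/* Hypothetical helper: coefficients of A(z/gamma), i.e. ap[i] = a[i] * gamma^i
 * for i = 0..LP_ORDER. a[0] is expected to be 1.0 (see Table 4). */
static void weight_az(const double a[LP_ORDER + 1], double gamma,
                      double ap[LP_ORDER + 1])
{
    double g = 1.0;  /* running power gamma^i */
    for (int i = 0; i <= LP_ORDER; i++) {
        ap[i] = a[i] * g;
        g *= gamma;
    }
}
```

Calling it once with γ1 and once with γ2 yields the numerator and denominator polynomials of the weighting filter.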
The excitation parameters (fixed and adaptive-codebook parameters) are determined for each subframe of 5 ms
(40 samples). The quantized and unquantized LP filter coefficients are used for the second subframe, while in the first subframe
interpolated LP filter coefficients are used (both quantized and unquantized). An open-loop pitch delay is estimated once
per 10 ms frame based on the perceptually weighted speech signal. Then the following operations are repeated for each
subframe. The target signal x(n) is computed by filtering the LP residual through the weighted synthesis filter W(z)/Â(z).
The initial states of these filters are updated by filtering the error between LP residual and excitation. This is equivalent
to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech
signal. The impulse response h(n) of the weighted synthesis filter is computed. Closed-loop pitch analysis is then done
(to find the adaptive-codebook delay and gain), using the target x(n) and impulse response h(n), by searching around the
value of the open-loop pitch delay. A fractional pitch delay with 1/3 resolution is used. The pitch delay is encoded with
8 bits in the first subframe and differentially encoded with 5 bits in the second subframe. The target signal x(n) is
updated by subtracting the (filtered) adaptive-codebook contribution, and this new target, x′(n), is used in the fixed-
codebook search to find the optimum excitation. An algebraic codebook with 17 bits is used for the fixed-codebook
excitation. The gains of the adaptive and fixed-codebook contributions are vector quantized with 7 bits (with MA
prediction applied to the fixed-codebook gain). Finally, the filter memories are updated using the determined excitation
signal.
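The update of the target signal described above can be sketched as follows. Here y(n) is the zero-state convolution of the adaptive-codebook vector v(n) with the impulse response h(n) (Table 3), and the gain is the unconstrained least-squares value ⟨x,y⟩/⟨y,y⟩. That gain formula, and the absence of any range limit on it, are assumptions of this floating-point sketch (the search of clause 3.7 may restrict the gain); the function names are hypothetical.

```c
#define L_SUBFR 40  /* subframe length: 5 ms at 8 kHz */

/* Zero-state convolution y(n) = sum_{i=0}^{n} v(i) h(n-i), n = 0..L_SUBFR-1,
 * truncated to one subframe. */
static void convolve(const double v[], const double h[], double y[])
{
    for (int n = 0; n < L_SUBFR; n++) {
        double acc = 0.0;
        for (int i = 0; i <= n; i++)
            acc += v[i] * h[n - i];
        y[n] = acc;
    }
}

/* Hypothetical helper: y = v * h, g = <x,y>/<y,y>, and the second target
 * x2(n) = x(n) - g*y(n). Returns the gain g (0 if y is all zero). */
static double update_target(const double x[], const double v[],
                            const double h[], double x2[])
{
    double y[L_SUBFR];
    double xy = 0.0, yy = 0.0;

    convolve(v, h, y);
    for (int n = 0; n < L_SUBFR; n++) {
        xy += x[n] * y[n];
        yy += y[n] * y[n];
    }
    double g = (yy > 0.0) ? xy / yy : 0.0;
    for (int n = 0; n < L_SUBFR; n++)
        x2[n] = x[n] - g * y[n];
    return g;
}
```

With h(n) a unit impulse and x(n) = 2·v(n), the sketch returns g = 2 and a second target that is identically zero.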
FIGURE 2/G.729
Encoding principle of the CS-ACELP encoder
[Figure blocks: input speech → pre-processing; LP analysis, quantization, interpolation (LPC info); adaptive codebook (gain GP); fixed codebook (gain GC); synthesis filter; perceptual weighting; pitch analysis; fixed CB search; gain quantization; parameter encoding → transmitted bitstream]
2.2 Decoder
The decoder principle is shown in Figure 3. First, the parameter indices are extracted from the received bitstream.
These indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame. These parameters are
the LSP coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive and
fixed-codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe.
Then, for each 5 ms subframe, the following steps are performed:
• the excitation is constructed by adding the adaptive and fixed-codebook vectors scaled by their respective
gains;
• the speech is reconstructed by filtering the excitation through the LP synthesis filter;
• the reconstructed speech signal is passed through a post-processing stage, which includes an adaptive
postfilter based on the long-term and short-term synthesis filters, followed by a high-pass filter and
scaling operation.
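The first two decoder steps above can be sketched in floating point as follows: the excitation u(n) = gp·v(n) + gc·c(n) is formed and passed through the all-pole filter 1/Â(z) of Equation (2), whose memory is carried from one subframe to the next. The function name and array layout are assumptions of this sketch, the post-processing stage is not modelled, and bit-exact behaviour is defined only by the clause 5 code.

```c
#define LP_ORDER 10  /* LP filter order */
#define L_SUBFR  40  /* subframe length in samples */

/* Hypothetical helper: build u(n) = gp*v(n) + gc*c(n) and filter it through
 * 1/Â(z), i.e. s(n) = u(n) - sum_{i=1}^{10} a[i] * s(n-i).
 * a[0] (= 1.0) is not used. mem[] holds s(n-1)..s(n-10), newest first,
 * and is updated so the next subframe continues seamlessly. */
static void decode_subframe(const double v[L_SUBFR], const double c[L_SUBFR],
                            double gp, double gc,
                            const double a[LP_ORDER + 1],
                            double mem[LP_ORDER], double s[L_SUBFR])
{
    for (int n = 0; n < L_SUBFR; n++) {
        double acc = gp * v[n] + gc * c[n];   /* excitation u(n) */
        for (int i = 1; i <= LP_ORDER; i++)
            acc -= a[i] * mem[i - 1];         /* all-pole recursion */
        for (int i = LP_ORDER - 1; i > 0; i--)
            mem[i] = mem[i - 1];              /* age the filter memory */
        mem[0] = acc;                         /* newest output first */
        s[n] = acc;
    }
}
```

With â1 = −0.5 and all other coefficients zero, a unit pulse in both codebook vectors (gp = gc = 1) produces the decaying output 2, 1, 0.5, and so on, which continues across the subframe boundary through mem[].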
FIGURE 3/G.729
Principle of the CS-ACELP decoder
[Figure: adaptive codebook (gain GP) and fixed codebook (gain GC) → short-term filter → post-processing]
2.3 Delay
This coder encodes speech and other audio signals with 10 ms frames. In addition, there is a look-ahead of 5 ms,
resulting in a total algorithmic delay of 15 ms. All additional delays in a practical implementation of this coder are due
to:
• processing time needed for encoding and decoding operations;
• transmission time on the communication link;
• multiplexing delay when combining audio data with other data.
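Expressed in samples at the 8 kHz sampling rate, the algorithmic delay stated above corresponds to one 80-sample frame plus a 40-sample look-ahead:

$$D = \underbrace{10\ \text{ms}}_{\text{frame}} + \underbrace{5\ \text{ms}}_{\text{look-ahead}} = 15\ \text{ms}, \qquad 15\ \text{ms} \times 8\ \text{samples/ms} = 120\ \text{samples} = 80 + 40$$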
2.4 Speech coder description
The description of the speech coding algorithm of this Recommendation is made in terms of bit-exact, fixed-point
mathematical operations. The ANSI C code indicated in clause 5, which constitutes an integral part of this
Recommendation, reflects this bit-exact, fixed-point descriptive approach. The mathematical descriptions of the encoder
(clause 3) and decoder (clause 4) can be implemented in several other ways, possibly leading to a codec
implementation not complying with this Recommendation. Therefore, the algorithm description of the ANSI C code of
clause 5 shall take precedence over the mathematical descriptions of clauses 3 and 4 whenever discrepancies are found.
A non-exhaustive set of test signals, which can be used with the ANSI C code, is available from the ITU.
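To illustrate what a bit-exact, fixed-point description involves, the sketch below shows two saturating 16-bit operations of the kind such code is built from: an addition that clamps instead of wrapping around, and a Q15 multiplication. The names saturate16, sat_add16 and mult_q15 are hypothetical and are not the functions of the clause 5 software; only that ANSI C code is normative.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the int16_t range. */
static int16_t saturate16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Saturating 16-bit addition: overflow clamps instead of wrapping. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    return saturate16((int32_t)a + (int32_t)b);
}

/* Q15 multiplication: (a * b) >> 15, truncated, with saturation for the
 * single overflow case (-1.0) * (-1.0). Assumes >> on a negative int32_t
 * is an arithmetic shift, as on common compilers (implementation-defined in C). */
static int16_t mult_q15(int16_t a, int16_t b)
{
    return saturate16(((int32_t)a * (int32_t)b) >> 15);
}
```

Because every implementation must reproduce such clamping and truncation exactly, two conforming implementations produce identical bitstreams for the same input.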
2.5 Notational conventions
Throughout this Recommendation, the following notational conventions are maintained as far as possible:
• Codebooks are denoted by calligraphic characters.
• Time signals are denoted by their symbol and a sample index between parentheses [e.g. s(n)]. The symbol
n is used as sample index.
• Superscript indices between parentheses [e.g. g^(m)] are used to indicate time-dependency of variables. The
variable m refers, depending on the context, to either a frame or subframe index, and the variable n to a
sample index.
• Recursion indices are identified by a superscript between square brackets (e.g. E^[k]).
• Subscript indices identify a particular element in a coefficient array.
• The symbol ^ identifies a quantized version of a parameter (e.g. ĝc).
• Parameter ranges are given between square brackets, and include the boundaries (e.g. [0.6, 0.9]).
• The function log denotes a logarithm with base 10.
• The function int denotes truncation to its integer value.
• The decimal floating-point numbers used are rounded versions of the values used in the 16-bit fixed-point
ANSI C implementation.
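The relation between the decimal values printed in this Recommendation and 16-bit fixed-point values can be illustrated by converting a coefficient to a Q format (a 16-bit integer with a given number of fractional bits). The Q formats used below are chosen for illustration only, the formats actually used by the clause 5 code are not stated here, and the names to_q and from_q are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert x to a 16-bit value with frac_bits fractional
 * bits (Q format), rounding half away from zero and saturating to int16_t. */
static int16_t to_q(double x, int frac_bits)
{
    double scaled = x * (double)(1L << frac_bits);
    long v = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical helper: the decimal value represented by a Q-format integer. */
static double from_q(int16_t q, int frac_bits)
{
    return (double)q / (double)(1L << frac_bits);
}
```

For instance, to_q(0.46363718, 12) gives 1899, and from_q(1899, 12) ≈ 0.463623, which shows the rounding that separates a printed decimal from its fixed-point counterpart.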
Table 2 lists the most relevant symbols used throughout this Recommendation. A glossary of the most relevant signals is
given in Table 3. Table 4 summarizes relevant variables and their dimensions. Constant parameters are listed in Table 5.
The acronyms used in this Recommendation are summarized in Table 6.
TABLE 2/G.729
Glossary of most relevant symbols
Name      Reference       Description
1/Â(z)    Equation (2)    LP synthesis filter
Hh1(z)    Equation (1)    Input high-pass filter
Hp(z)     Equation (78)   Long-term postfilter
Hf(z)     Equation (84)   Short-term postfilter
Ht(z)     Equation (86)   Tilt-compensation filter
Hh2(z)    Equation (91)   Output high-pass filter
P(z)      Equation (46)   Pre-filter for fixed codebook
W(z)      Equation (27)   Weighting filter

TABLE 3/G.729
Glossary of most relevant signals
Name      Reference   Description
c(n)      3.8         Fixed-codebook contribution
d(n)      3.8.1       Correlation between target signal and h(n)
ew(n)     3.10        Error signal
h(n)      3.5         Impulse response of weighting and synthesis filters
r(n)      3.6         Residual signal
s(n)      3.1         Pre-processed speech signal
ŝ(n)      4.1.6       Reconstructed speech signal
s′(n)     3.2.1       Windowed speech signal
sf(n)     4.2         Postfiltered output
sf′(n)    4.2         Gain-scaled postfiltered output
sw(n)     3.6         Weighted speech signal
x(n)      3.6         Target signal
x′(n)     3.8.1       Second target signal
u(n)      3.10        Excitation to LP synthesis filter
v(n)      3.7.1       Adaptive-codebook contribution
y(n)      3.7.3       Convolution v(n) * h(n)
z(n)      3.9         Convolution c(n) * h(n)
TABLE 4/G.729
Glossary of most relevant variables
Name     Size  Description
gp       1     Adaptive-codebook gain
gc       1     Fixed-codebook gain
gl       1     Gain term for long-term postfilter
gf       1     Gain term for short-term postfilter
gt       1     Gain term for tilt postfilter
G        1     Gain for gain normalization
Top      1     Open-loop pitch delay
ai       11    LP coefficients (a0 = 1.0)
ki       10    Reflection coefficients
k′1      1     Reflection coefficient for tilt postfilter
oi       2     LAR coefficients
ωi       10    LSF normalized frequencies
p̂i,j     40    MA predictor for LSF quantization
qi       10    LSP coefficients
r(k)     11    Autocorrelation coefficients
r′(k)    11    Modified autocorrelation coefficients
wi       10    LSP weighting coefficients
l̂i       10    LSP quantizer output

TABLE 5/G.729
Glossary of most relevant constants
Name     Value            Description
fs       8000 Hz          Sampling frequency
f0       60 Hz            Bandwidth expansion
γ1       0.94/0.98        Weight factor perceptual weighting filter
γ2       0.60/[0.4, 0.7]  Weight factor perceptual weighting filter
γn       0.55             Weight factor postfilter
γd       0.70             Weight factor postfilter
γp       0.50             Weight factor pitch postfilter
γt       0.90/0.2         Weight factor tilt postfilter
         Table 7          Fixed (algebraic) codebook
L0       3.2.4            Moving-average predictor codebook
L1       3.2.4            First stage LSP codebook
L2       3.2.4            Second stage LSP codebook (low part)
L3       3.2.4            Second stage LSP codebook (high part)
         3.9              Gain codebook (first stage)
         3.9              Gain codebook (second stage)
wlag     Equation (6)     Correlation lag window
wlp      Equation (3)     LP analysis window
TABLE 6/G.729
Glossary of acronyms
3 Functional description of the encoder
In this clause the different functions of the encoder represented in the blocks of Figure 2 are described. A detailed signal
flow is shown in Figure 4.
3.1 Pre-processing
As stated in clause 1, the input to the speech encoder is assumed to be a 16-bit PCM signal. Two pre-processing
functions are applied before the encoding process:
1) signal scaling; and
2) high-pass filtering.
The scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the fixed-point
implementation. The high-pass filter serves as a precaution against undesired low-frequency components. A second-
order pole/zero filter with a cut-off frequency of 140 Hz is used. The scaling and high-pass filtering are combined by
dividing the coefficients of the numerator of this filter by 2. The resulting filter is given by:
$$H_{h1}(z) = \frac{0.46363718 - 0.92724705\,z^{-1} + 0.46363718\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}} \tag{1}$$
The input signal filtered through Hh1(z) is referred to as s(n), and will be used in all subsequent coder operations.
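Equation (1) can be sketched as a floating-point direct-form I biquad, as below. This is not the bit-exact clause 5 implementation; the struct and function names are hypothetical, and only the coefficients are taken from Equation (1).

```c
/* Filter state for Hh1(z): two past inputs and two past outputs. */
typedef struct {
    double x1, x2;  /* x(n-1), x(n-2) */
    double y1, y2;  /* y(n-1), y(n-2) */
} HpState;

static void hp_init(HpState *st)
{
    st->x1 = st->x2 = st->y1 = st->y2 = 0.0;
}

/* One sample of Equation (1), direct form I:
 * y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) + 1.9059465 y(n-1) - 0.9114024 y(n-2).
 * The numerator already includes the division by 2 (input scaling). */
static double hp_filter(HpState *st, double x)
{
    const double b0 = 0.46363718, b1 = -0.92724705, b2 = 0.46363718;
    const double a1 = 1.9059465, a2 = -0.9114024;  /* feedback terms */
    double y = b0 * x + b1 * st->x1 + b2 * st->x2 + a1 * st->y1 + a2 * st->y2;
    st->x2 = st->x1;
    st->x1 = x;
    st->y2 = st->y1;
    st->y1 = y;
    return y;
}
```

Because the rounded numerator coefficients do not sum exactly to zero, the residual DC gain is about 0.005 rather than zero; at the Nyquist frequency the gain is about 0.49, reflecting the built-in division by 2.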
3.2 Linear prediction analysis and quantization
The short-term analysis and synthesis filters are based on 10th order Linear Prediction (LP) filters.
The LP synthesis filter is defined as:
$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}} \tag{2}$$
where âi, i = 1,...,10, are the (quantized) Linear Prediction (LP) coefficients. Short-term prediction, or linear prediction
analysis is performed once per speech frame using the autocorr