
G.729 Audio Compression and Transmission Protocol (IP Telephony)

INTERNATIONAL TELECOMMUNICATION UNION

ITU-T G.729 (03/96)
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU

GENERAL ASPECTS OF DIGITAL TRANSMISSION SYSTEMS

CODING OF SPEECH AT 8 kbit/s USING CONJUGATE-STRUCTURE ALGEBRAIC-CODE-EXCITED LINEAR-PREDICTION (CS-ACELP)

ITU-T Recommendation G.729
(Previously “CCITT Recommendation”)

FOREWORD

The ITU-T (Telecommunication Standardization Sector) is a permanent organ of the International Telecommunication Union (ITU). The ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Conference (WTSC), which meets every four years, establishes the topics for study by the ITU-T Study Groups which, in their turn, produce Recommendations on these topics.

The approval of Recommendations by the Members of the ITU-T is covered by the procedure laid down in WTSC Resolution No. 1 (Helsinki, March 1-12, 1993).

ITU-T Recommendation G.729 was prepared by ITU-T Study Group 15 (1993-1996) and was approved under the WTSC Resolution No. 1 procedure on the 19th of March 1996.

NOTE

In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

© ITU 1996

All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the ITU.

CONTENTS

1 Introduction
2 General description of the coder
    2.1 Encoder
    2.2 Decoder
    2.3 Delay
    2.4 Speech coder description
    2.5 Notational conventions
3 Functional description of the encoder
    3.1 Pre-processing
    3.2 Linear prediction analysis and quantization
    3.3 Perceptual weighting
    3.4 Open-loop pitch analysis
    3.5 Computation of the impulse response
    3.6 Computation of the target signal
    3.7 Adaptive-codebook search
    3.8 Fixed codebook – Structure and search
    3.9 Quantization of the gains
    3.10 Memory update
4 Functional description of the decoder
    4.1 Parameter decoding procedure
    4.2 Post-processing
    4.3 Encoder and decoder initialization
    4.4 Concealment of frame erasures
5 Bit-exact description of the CS-ACELP coder
    5.1 Use of the simulation software
    5.2 Organization of the simulation software

Recommendation G.729

CODING OF SPEECH AT 8 kbit/s USING CONJUGATE-STRUCTURE ALGEBRAIC-CODE-EXCITED LINEAR-PREDICTION (CS-ACELP)

(Geneva, 1996)

1 Introduction

This Recommendation contains the description of an algorithm for the coding of speech signals at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP).
This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (Recommendation G.712) of the analogue input signal, then sampling it at 8000 Hz, followed by conversion to 16-bit linear PCM for the input to the encoder. The output of the decoder should be converted back to an analogue signal by similar means. Other input/output characteristics, such as those specified by Recommendation G.711 for 64 kbit/s PCM data, should be converted to 16-bit linear PCM before encoding, or from 16-bit linear PCM to the appropriate format after decoding. The bitstream from the encoder to the decoder is defined within this Recommendation.

This Recommendation is organized as follows: Clause 2 gives a general outline of the CS-ACELP algorithm. In clauses 3 and 4, the CS-ACELP encoder and decoder principles are discussed, respectively. Clause 5 describes the software that defines this coder in 16-bit fixed-point arithmetic.

2 General description of the coder

The CS-ACELP coder is based on the Code-Excited Linear-Prediction (CELP) coding model. The coder operates on speech frames of 10 ms corresponding to 80 samples at a sampling rate of 8000 samples per second. For every 10 ms frame, the speech signal is analysed to extract the parameters of the CELP model (linear-prediction filter coefficients, adaptive and fixed-codebook indices and gains). These parameters are encoded and transmitted. The bit allocation of the coder parameters is shown in Table 1. At the decoder, these parameters are used to retrieve the excitation and synthesis filter parameters. The speech is reconstructed by filtering this excitation through the short-term synthesis filter, as is shown in Figure 1. The short-term synthesis filter is based on a 10th order Linear Prediction (LP) filter. The long-term, or pitch synthesis filter is implemented using the so-called adaptive-codebook approach.
After the reconstructed speech is computed, it is further enhanced by a postfilter.

TABLE 1/G.729
Bit allocation of the 8 kbit/s CS-ACELP algorithm (10 ms frame)

Parameter                   Codeword          Subframe 1   Subframe 2   Total per frame
Line spectrum pairs         L0, L1, L2, L3    -            -            18
Adaptive-codebook delay     P1, P2            8            5            13
Pitch-delay parity          P0                1            -            1
Fixed-codebook index        C1, C2            13           13           26
Fixed-codebook sign         S1, S2            4            4            8
Codebook gains (stage 1)    GA1, GA2          3            3            6
Codebook gains (stage 2)    GB1, GB2          4            4            8
Total                                                                   80

FIGURE 1/G.729
Block diagram of conceptual CELP synthesis model (blocks: Received bitstream, Parameter decoding, Excitation codebook, Long-term synthesis filter, Short-term synthesis filter, Post filter, Output speech)

2.1 Encoder

The encoding principle is shown in Figure 2. The input signal is high-pass filtered and scaled in the pre-processing block. The pre-processed signal serves as the input signal for all subsequent analysis. LP analysis is done once per 10 ms frame to compute the LP filter coefficients. These coefficients are converted to Line Spectrum Pairs (LSP) and quantized using predictive two-stage Vector Quantization (VQ) with 18 bits.

The excitation signal is chosen by using an analysis-by-synthesis search procedure in which the error between the original and reconstructed speech is minimized according to a perceptually weighted distortion measure. This is done by filtering the error signal with a perceptual weighting filter, whose coefficients are derived from the unquantized LP filter. The amount of perceptual weighting is made adaptive to improve the performance for input signals with a flat frequency-response.

The excitation parameters (fixed and adaptive-codebook parameters) are determined per subframe of 5 ms (40 samples) each. The quantized and unquantized LP filter coefficients are used for the second subframe, while in the first subframe interpolated LP filter coefficients are used (both quantized and unquantized). An open-loop pitch delay is estimated once per 10 ms frame based on the perceptually weighted speech signal. Then the following operations are repeated for each subframe.

The target signal x(n) is computed by filtering the LP residual through the weighted synthesis filter W(z)/Â(z). The initial states of these filters are updated by filtering the error between LP residual and excitation. This is equivalent to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal. The impulse response h(n) of the weighted synthesis filter is computed.

Closed-loop pitch analysis is then done (to find the adaptive-codebook delay and gain), using the target x(n) and impulse response h(n), by searching around the value of the open-loop pitch delay. A fractional pitch delay with 1/3 resolution is used. The pitch delay is encoded with 8 bits in the first subframe and differentially encoded with 5 bits in the second subframe.

The target signal x(n) is updated by subtracting the (filtered) adaptive-codebook contribution, and this new target, x′(n), is used in the fixed-codebook search to find the optimum excitation. An algebraic codebook with 17 bits is used for the fixed-codebook excitation. The gains of the adaptive and fixed-codebook contributions are vector quantized with 7 bits (with MA prediction applied to the fixed-codebook gain). Finally, the filter memories are updated using the determined excitation signal.
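The bit budget described above can be cross-checked with a few lines of arithmetic. The sketch below simply transcribes Table 1 and confirms that the per-frame total of 80 bits every 10 ms corresponds to 8 kbit/s. The dictionary layout and the function names (bits_per_frame, subframe_bits, bit_rate_bps) are illustrative choices and are not part of the Recommendation; blank table cells are represented as 0.

```python
# Bit allocation transcribed from Table 1/G.729.
# Each entry: (bits in subframe 1, bits in subframe 2, total bits per frame).
# Blank cells in the table are stored as 0 (an assumption of this sketch).
BIT_ALLOCATION = {
    "Line spectrum pairs (L0, L1, L2, L3)": (0, 0, 18),
    "Adaptive-codebook delay (P1, P2)": (8, 5, 13),
    "Pitch-delay parity (P0)": (1, 0, 1),
    "Fixed-codebook index (C1, C2)": (13, 13, 26),
    "Fixed-codebook sign (S1, S2)": (4, 4, 8),
    "Codebook gains, stage 1 (GA1, GA2)": (3, 3, 6),
    "Codebook gains, stage 2 (GB1, GB2)": (4, 4, 8),
}

FRAME_MS = 10          # frame length stated in clause 2
SAMPLE_RATE_HZ = 8000  # sampling rate stated in clause 1


def bits_per_frame(table=BIT_ALLOCATION):
    """Sum the 'Total per frame' column."""
    return sum(total for _, _, total in table.values())


def subframe_bits(index, table=BIT_ALLOCATION):
    """Sum one subframe column (index 0 for subframe 1, 1 for subframe 2)."""
    return sum(row[index] for row in table.values())


def bit_rate_bps(table=BIT_ALLOCATION, frame_ms=FRAME_MS):
    """Bits per second implied by the per-frame total."""
    return bits_per_frame(table) * 1000 // frame_ms


def samples_per_frame(rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of input samples covered by one frame."""
    return rate_hz * frame_ms // 1000


if __name__ == "__main__":
    print(bits_per_frame(), bit_rate_bps(), samples_per_frame())
```

The same table also reproduces the figures quoted in the text: 13 index bits plus 4 sign bits give the 17-bit algebraic codebook per subframe, and the two gain stages (3 + 4) give the 7-bit gain quantizer.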
FIGURE 2/G.729
Encoding principle of the CS-ACELP encoder (blocks: Input speech, Pre-processing, LP analysis/quantization/interpolation, Synthesis filter, Perceptual weighting, Pitch analysis, Adaptive codebook, Fixed codebook, Fixed CB search, Gain quantization, Parameter encoding, Transmitted bitstream; signals: LPC info, GP, GC)

2.2 Decoder

The decoder principle is shown in Figure 3. First, the parameter indices are extracted from the received bitstream. These indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame. These parameters are the LSP coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive and fixed-codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe. Then, for each 5 ms subframe the following steps are done:
• the excitation is constructed by adding the adaptive and fixed-codebook vectors scaled by their respective gains;
• the speech is reconstructed by filtering the excitation through the LP synthesis filter;
• the reconstructed speech signal is passed through a post-processing stage, which includes an adaptive postfilter based on the long-term and short-term synthesis filters, followed by a high-pass filter and scaling operation.

FIGURE 3/G.729
Principle of the CS-ACELP decoder (blocks: Adaptive codebook, Fixed codebook, Short-term filter, Post-processing; signals: GP, GC)

2.3 Delay

This coder encodes speech and other audio signals with 10 ms frames. In addition, there is a look-ahead of 5 ms, resulting in a total algorithmic delay of 15 ms. All additional delays in a practical implementation of this coder are due to:
• processing time needed for encoding and decoding operations;
• transmission time on the communication link;
• multiplexing delay when combining audio data with other data.
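The first two per-subframe decoder steps listed in 2.2 (building the excitation, then filtering it through the LP synthesis filter) can be sketched as follows. This is a floating-point illustration only, not the bit-exact fixed-point procedure that clause 5 defines. The function names build_excitation and lp_synthesis are hypothetical, the post-processing stage is omitted, and the synthesis filter is assumed to be the all-pole filter 1/Â(z) with Â(z) = 1 + Σ âi z^(−i).

```python
def build_excitation(v, c, gp, gc):
    """Excitation for one subframe: u(n) = gp * v(n) + gc * c(n).

    v  -- adaptive-codebook vector
    c  -- fixed-codebook vector
    gp -- adaptive-codebook gain
    gc -- fixed-codebook gain
    """
    if len(v) != len(c):
        raise ValueError("codebook vectors must have the same length")
    return [gp * vn + gc * cn for vn, cn in zip(v, c)]


def lp_synthesis(u, a, memory):
    """Filter u(n) through the all-pole filter 1/A(z).

    a      -- coefficients a_1..a_p of A(z) = 1 + sum_i a_i z^-i
    memory -- the last p output samples, most recent first
    Returns (output samples, updated memory) so that consecutive
    subframes can be chained.
    """
    mem = list(memory)
    out = []
    for un in u:
        # s(n) = u(n) - sum_{i=1..p} a_i * s(n - i)
        sn = un - sum(ai * mi for ai, mi in zip(a, mem))
        out.append(sn)
        mem = ([sn] + mem)[:len(a)]
    return out, mem


if __name__ == "__main__":
    # Toy first-order example (not real G.729 coefficients).
    u = build_excitation([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], 1.0, 0.0)
    speech, state = lp_synthesis(u, [-0.5], [0.0])
    print(speech)
```

Returning the filter memory alongside the output keeps the filter state explicit, which mirrors the fact that the decoder carries the synthesis filter state from one 5 ms subframe to the next.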
2.4 Speech coder description

The description of the speech coding algorithm of this Recommendation is made in terms of bit-exact, fixed-point mathematical operations. The ANSI C code indicated in clause 5, which constitutes an integral part of this Recommendation, reflects this bit-exact, fixed-point descriptive approach. The mathematical descriptions of the encoder (clause 3) and decoder (clause 4) can be implemented in several other fashions, possibly leading to a codec implementation not complying with this Recommendation. Therefore, the algorithm description of the ANSI C code of clause 5 shall take precedence over the mathematical descriptions of clauses 3 and 4 whenever discrepancies are found. A non-exhaustive set of test signals, which can be used with the ANSI C code, is available from the ITU.

2.5 Notational conventions

Throughout this Recommendation, an attempt is made to maintain the following notational conventions:
• Codebooks are denoted by calligraphic characters.
• Time signals are denoted by their symbol and a sample index in parentheses [e.g. s(n)]. The symbol n is used as sample index.
• Superscript indices in parentheses [e.g. g(m), with (m) as a superscript] are used to indicate time-dependency of variables. The variable m refers, depending on the context, to either a frame or subframe index, and the variable n to a sample index.
• Recursion indices are identified by a superscript in square brackets (e.g. E[k], with [k] as a superscript).
• Subscript indices identify a particular element in a coefficient array.
• The symbol ^ identifies a quantized version of a parameter (e.g. ĝc).
• Parameter ranges are given in square brackets, and include the boundaries (e.g. [0.6, 0.9]).
• The function log denotes a logarithm with base 10.
• The function int denotes truncation to its integer value.
• The decimal floating-point numbers used are rounded versions of the values used in the 16-bit fixed-point ANSI C implementation.
Table 2 lists the most relevant symbols used throughout this Recommendation. A glossary of the most relevant signals is given in Table 3. Table 4 summarizes relevant variables and their dimension. Constant parameters are listed in Table 5. The acronyms used in this Recommendation are summarized in Table 6.

TABLE 2/G.729
Glossary of most relevant symbols

Name      Reference        Description
1/Â(z)    Equation (2)     LP synthesis filter
Hh1(z)    Equation (1)     Input high-pass filter
Hp(z)     Equation (78)    Long-term postfilter
Hf(z)     Equation (84)    Short-term postfilter
Ht(z)     Equation (86)    Tilt-compensation filter
Hh2(z)    Equation (91)    Output high-pass filter
P(z)      Equation (46)    Pre-filter for fixed codebook
W(z)      Equation (27)    Weighting filter

TABLE 3/G.729
Glossary of most relevant signals

Name      Reference   Description
c(n)      3.8         Fixed-codebook contribution
d(n)      3.8.1       Correlation between target signal and h(n)
ew(n)     3.10        Error signal
h(n)      3.5         Impulse response of weighting and synthesis filters
r(n)      3.6         Residual signal
s(n)      3.1         Pre-processed speech signal
ŝ(n)      4.1.6       Reconstructed speech signal
s′(n)     3.2.1       Windowed speech signal
sf(n)     4.2         Postfiltered output
sf′(n)    4.2         Gain-scaled postfiltered output
sw(n)     3.6         Weighted speech signal
x(n)      3.6         Target signal
x′(n)     3.8.1       Second target signal
u(n)      3.10        Excitation to LP synthesis filter
v(n)      3.7.1       Adaptive-codebook contribution
y(n)      3.7.3       Convolution v(n) * h(n)
z(n)      3.9         Convolution c(n) * h(n)

TABLE 4/G.729
Glossary of most relevant variables

Name      Size   Description
gp        1      Adaptive-codebook gain
gc        1      Fixed-codebook gain
gl        1      Gain term for long-term postfilter
gf        1      Gain term for short-term postfilter
gt        1      Gain term for tilt postfilter
G         1      Gain for gain normalization
Top       1      Open-loop pitch delay
ai        11     LP coefficients (a0 = 1.0)
ki        10     Reflection coefficients
k′1       1      Reflection coefficient for tilt postfilter
oi        2      LAR coefficients
ωi        10     LSF normalized frequencies
p̂i,j      40     MA predictor for LSF quantization
qi        10     LSP coefficients
r(k)      11     Auto-correlation coefficients
r′(k)     11     Modified auto-correlation coefficients
wi        10     LSP weighting coefficients
l̂i        10     LSP quantizer output

TABLE 5/G.729
Glossary of most relevant constants

Name      Value              Description
fs        8000               Sampling frequency
f0        60                 Bandwidth expansion
γ1        0.94/0.98          Weight factor perceptual weighting filter
γ2        0.60/[0.4 − 0.7]   Weight factor perceptual weighting filter
γn        0.55               Weight factor postfilter
γd        0.70               Weight factor postfilter
γp        0.50               Weight factor pitch postfilter
γt        0.90/0.2           Weight factor tilt postfilter
          Table 7            Fixed (algebraic) codebook
L0        3.2.4              Moving-average predictor codebook
L1        3.2.4              First stage LSP codebook
L2        3.2.4              Second stage LSP codebook (low part)
L3        3.2.4              Second stage LSP codebook (high part)
          3.9                Gain codebook (first stage)
          3.9                Gain codebook (second stage)
wlag      Equation (6)       Correlation lag window
wlp       Equation (3)       LP analysis window

TABLE 6/G.729
Glossary of acronyms

3 Functional description of the encoder

In this clause the different functions of the encoder represented in the blocks of Figure 2 are described. A detailed signal flow is shown in Figure 4.

3.1 Pre-processing

As stated in clause 2, the input to the speech encoder is assumed to be a 16-bit PCM signal. Two pre-processing functions are applied before the encoding process: 1) signal scaling; and 2) high-pass filtering.

The scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the fixed-point implementation. The high-pass filter serves as a precaution against undesired low-frequency components. A second-order pole/zero filter with a cut-off frequency of 140 Hz is used. Both the scaling and high-pass filtering are combined by dividing the coefficients at the numerator of this filter by 2.
The resulting filter is given by:

$$H_{h1}(z) = \frac{0.46363718 - 0.92724705\,z^{-1} + 0.46363718\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}} \tag{1}$$

The input signal filtered through Hh1(z) is referred to as s(n), and will be used in all subsequent coder operations.

3.2 Linear prediction analysis and quantization

The short-term analysis and synthesis filters are based on 10th order Linear Prediction (LP) filters. The LP synthesis filter is defined as:

$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}} \tag{2}$$

where âi, i = 1,...,10, are the (quantized) Linear Prediction (LP) coefficients. Short-term prediction, or linear prediction analysis is performed once per speech frame using the autocorr
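The pre-processing filter of Equation (1) can be written as a direct-form difference equation, s(n) = b0 x(n) + b1 x(n−1) + b2 x(n−2) − a1 s(n−1) − a2 s(n−2). The sketch below is a floating-point illustration under that assumption; it does not reproduce the 16-bit fixed-point arithmetic of the ANSI C code, and the function name preprocess and the state layout are hypothetical.

```python
# Coefficients of Hh1(z) from Equation (1); the division by 2 (input scaling)
# is already folded into the numerator.
B = (0.46363718, -0.92724705, 0.46363718)  # numerator: b0, b1, b2
A = (1.0, -1.9059465, 0.9114024)           # denominator: 1, a1, a2


def preprocess(x, state=(0.0, 0.0, 0.0, 0.0)):
    """High-pass filter and scale the input samples x.

    state -- (x(n-1), x(n-2), s(n-1), s(n-2)) carried over from the
             previous call; zeros for the very first frame.
    Returns (filtered samples, updated state).
    """
    x1, x2, y1, y2 = state
    out = []
    for xn in x:
        yn = B[0] * xn + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        out.append(yn)
        x2, x1 = x1, xn  # shift the input history
        y2, y1 = y1, yn  # shift the output history
    return out, (x1, x2, y1, y2)


if __name__ == "__main__":
    # A constant (DC) input is strongly attenuated once the filter settles.
    settled, _ = preprocess([1000.0] * 4000)
    print(settled[-1])
```

Passing the returned state into the next call lets the filter run continuously across successive 80-sample frames rather than restarting at each frame boundary.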