CARNet WebWorld - CUC 2004

Abstract

Compression has become a state-of-the-art technology in modern audio communications, both wide-band and voice-band. It underlies a great number of applications, such as mobile phones, radio and TV satellite networks, Internet audio, digital audio broadcasting below 30 MHz (DRM, Digital Radio Mondiale) and above 30 MHz (DAB, Digital Audio Broadcasting), DVD (Digital Versatile Disc), VoIP (Voice over Internet Protocol), and many more. Lowering data rates to a minimum, however, conflicts with the clarity and fidelity of sound. In a time of "all digital technology", sound quality and the intelligibility of speech have become important again.

However, the "old-fashioned" methods of measuring, i.e. objectively grading, an audio communication system are not satisfactory for such systems. The traditional methods deal with signal-to-noise ratio (S/N) and distortion (linear and nonlinear) measurements. Today's audio encoders that use compression, or more precisely data reduction, rely on psychoacoustic models of human hearing as the basis for bit-rate reduction, so judging these systems requires simulating the subjective evaluation of human listeners. The analysis is based on the most recent perceptual techniques, such as PEAQ (Perceptual Evaluation of Audio Quality), PESQ (Perceptual Evaluation of Speech Quality) and PSQM (Perceptual Speech Quality Measurement). Since the psychoacoustic models were developed from investigations with real signals, contemporary measurements have to use the same natural stimuli: human voice and music program material. Using such stimuli makes it possible to monitor quality during normal operation of the system under test. Because this approach measures the perceived audio quality instead of signal characteristics, it yields an objective metric that truly characterizes the quality of service (QoS) of a system or a network.

Due to the lack of international standards and recognized measurement procedures, listening tests were the only widely accepted assessment procedures for audio or speech codecs. The first methods for testing telephone-band speech signals were standardized in ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) Recommendation P.800 in 1993. In 1994, ITU recommended ITU-R BS.1116 (ITU-R: ITU Radiocommunication Sector), a test procedure for assessing wide-band audio codecs on the basis of subjective tests. The analysis of the results of a subjective listening test is based on the Subjective Difference Grade (SDG), defined as:
SDG = Grade(signal under test) - Grade(reference signal)
The values of the SDG range from 0 to –4, where 0 corresponds to an "imperceptible impairment" and –4 to a "very annoying impairment".
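The SDG definition above can be illustrated with a minimal Python sketch. It assumes the five-grade impairment scale of ITU-R BS.1116 (5.0 = imperceptible, 1.0 = very annoying); the listener scores and the function name `subjective_difference_grade` are hypothetical and serve only as an illustration.

```python
# Five-grade impairment scale assumed from ITU-R BS.1116.
IMPAIRMENT_SCALE = {
    5.0: "imperceptible",
    4.0: "perceptible, but not annoying",
    3.0: "slightly annoying",
    2.0: "annoying",
    1.0: "very annoying",
}


def subjective_difference_grade(grade_test, grade_reference):
    """SDG = grade of the signal under test minus grade of the reference."""
    return grade_test - grade_reference


# Hypothetical scores from four listeners (illustrative, not measured data).
test_grades = [4.2, 3.9, 4.4, 4.0]
reference_grades = [5.0, 5.0, 5.0, 5.0]

sdgs = [
    subjective_difference_grade(t, r)
    for t, r in zip(test_grades, reference_grades)
]
mean_sdg = sum(sdgs) / len(sdgs)
print(f"mean SDG = {mean_sdg:.3f}")
```

With a reference graded 5.0, the SDG falls between 0 and -4, which matches the range given above.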

Both methods described rely on listening tests, which are time-consuming, expensive and impractical for everyday use. Consequently, substituting objective, computer-based methods for subjective listening tests became the main goal. After a few years of research, ITU finalized Recommendation P.861 in 1996. It defines a method for the objective analysis of speech codecs that uses the PSQM algorithm as its cognitive perceptual model, and its results correlate up to 98% with the scores of subjective listening tests. For specific applications such as VoIP, the PSQM algorithm is not satisfactory, so in 2001 ITU finalized a refined method in Recommendation P.862, which uses the PESQ algorithm as its cognitive perceptual model. For wide-band audio codecs, ITU-R recommended the PEAQ algorithm, specified in Recommendation ITU-R BS.1387.

The comparison process, based on the cognitive models mentioned above, is divided into several phases. Each phase yields one or more Model Output Variables (MOVs), i.e. descriptors of various cognitive processes. The final quality figure takes all MOVs into account and is expressed as a single number, the Objective Difference Grade (ODG).

In this paper we show the results of tests made on various codecs* using the PEAQ measurement algorithm according to ITU-R BS.1387. We made the measurements with the Opera computer-based measuring system from Opticom. The measurement procedure was as follows: first we encoded a reference audio clip** with every codec at each of the most common bit rates. Then we decoded all of the resulting compressed clips and tested them in Opera against the uncompressed reference clip. At this stage, results are shown only for 100% compatible encoders and decoders, i.e. devices from the same manufacturer. Please note that all measurements were done in stereo, so the bit rates are stated accordingly (e.g. 128 kbps refers to two [left and right] audio channels encoded at 64 kbps each).
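The encode, decode and compare procedure described above can be sketched as a loop over a test matrix. This is only an illustration under stated assumptions: the bit-rate list is taken from Fig. 1, and `encode`, `decode` and `measure_odg` are hypothetical caller-supplied functions, not the interface of Opera or of any real codec.

```python
from itertools import product

CODECS = ["MP2", "MP3", "AAC", "OGG"]
STEREO_BITRATES_KBPS = [64, 128, 160, 192, 256]  # assumed from Fig. 1


def per_channel_kbps(stereo_kbps, channels=2):
    """Bit rate per channel, e.g. 128 kbps stereo is 64 kbps per channel."""
    return stereo_kbps / channels


def run_test_matrix(encode, decode, measure_odg, reference_clip):
    """Encode, decode and grade the clip for every codec and bit rate.

    encode, decode and measure_odg are hypothetical callables supplied by
    the caller; they stand in for the real codecs and the PEAQ measurement.
    """
    results = {}
    for codec, kbps in product(CODECS, STEREO_BITRATES_KBPS):
        compressed = encode(reference_clip, codec, kbps)
        decoded = decode(compressed, codec)
        results[(codec, kbps)] = measure_odg(reference_clip, decoded)
    return results


# Stand-in stubs so the sketch runs; they do no real coding or measurement.
def fake_encode(clip, codec, kbps):
    return (clip, codec, kbps)


def fake_decode(compressed, codec):
    return compressed[0]


def fake_measure_odg(reference, decoded):
    return 0.0 if reference == decoded else -4.0


results = run_test_matrix(fake_encode, fake_decode, fake_measure_odg, "clip.wav")
print(len(results), "codec/bit-rate combinations")
```

Passing the three steps in as functions keeps the loop independent of any particular encoder or measurement tool.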

For a brief overall comparison, the final ODG values for the different bit rates are shown in Fig. 1. For deeper analysis we used the MOVs obtained during the measurements.
ODG (vertical axis from -4 to 0) at 64, 128, 160, 192 and 256 kbps:
MP3: -3.19, -1.46, -0.4, -0.1, 0.01
AAC: -3.2, -0.71, -0.33, -0.15, -0.15
OGG: -3.1, -0.43, -0.31, -0.16, 0.01
MP2: -3.37, -2.32, -0.75, -0.33 (only four values are given)
Figure 1 – ODG values for 4 different codecs
All codecs behave similarly at higher bit rates (≥160 kbps): the differences are minimal, except for MP2. At lower bit rates (<160 kbps), on the other hand, the four codecs behave differently, especially at the most interesting rate of 128 kbps, where OGG Vorbis is the best and AAC comes close.
From these results we can conclude that, in terms of audio quality, picking the right codec is very important at lower bit rates and much less important at higher ones. However, because of the popularity of some codecs, e.g. MP3, the choice is often not based on quality alone, which makes it very hard to recommend any one codec in particular.
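The comparison can be retraced from the Fig. 1 values with a short Python sketch. It assumes that the values for each codec are ordered by bit rate (64 to 256 kbps) and leaves MP2 out, because only four MP2 values are given and their bit rates cannot be assigned with certainty. The helper names are hypothetical.

```python
BITRATES_KBPS = [64, 128, 160, 192, 256]

# ODG values read from Fig. 1, assumed to be ordered by bit rate.
ODG = {
    "MP3": [-3.19, -1.46, -0.4, -0.1, 0.01],
    "AAC": [-3.2, -0.71, -0.33, -0.15, -0.15],
    "OGG": [-3.1, -0.43, -0.31, -0.16, 0.01],
}


def best_codec(index):
    """Codec with the highest (least impaired) ODG at one bit rate."""
    return max(ODG, key=lambda codec: ODG[codec][index])


def spread(index):
    """Gap between the best and the worst ODG at one bit rate."""
    values = [grades[index] for grades in ODG.values()]
    return max(values) - min(values)


for i, kbps in enumerate(BITRATES_KBPS):
    print(f"{kbps:3d} kbps: best = {best_codec(i)}, spread = {spread(i):.2f}")
```

Among these three codecs the gap at 128 kbps is about one full grade, while at 160 kbps and above it stays below 0.2.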

* The codecs used are standard MP2 and MP3 (MPEG-1 Layer 2 and Layer 3, according to ISO/IEC 11172-3, 1992, where MPEG stands for Moving Picture Experts Group and ISO/IEC for International Organization for Standardization/International Electrotechnical Commission), AAC (Advanced Audio Coding, or MPEG-2 AAC, according to ISO/IEC 13818-7, 1997) and OGG Vorbis (a free codec from Xiph.org, independent of the MPEG-1 and MPEG-2 standards). All of them use psychoacoustic methods for lossy compression of the audio data: some data is discarded as irrelevant according to the codec's algorithm and cannot be restored, unlike lossless compression, where no data is lost. The codecs differ in their algorithms and in their complexity, so they do not behave identically, as the measurement results show.
** The audio clip used for encoding and testing was ripped from an audio CD to WAV format (16 bit, 44.1 kHz). It was 5 seconds long and was the same for all four codecs.
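To put the tested bit rates in perspective, the following Python sketch computes the uncompressed data rate of the reference clip and the resulting compression ratios. It assumes a stereo clip (all measurements were done in stereo), ignores the WAV header, and takes 1 kbps as 1000 bit/s; the function name `compression_ratio` is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2       # assumption: stereo clip
DURATION_S = 5

# Uncompressed PCM data rate in bit/s.
pcm_bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

# Size of the audio data in bytes, without the WAV header.
clip_bytes = pcm_bitrate_bps * DURATION_S // 8


def compression_ratio(coded_kbps):
    """Ratio of the PCM data rate to the coded stereo bit rate."""
    return pcm_bitrate_bps / (coded_kbps * 1000)


print(f"PCM rate: {pcm_bitrate_bps / 1000} kbps, clip size: {clip_bytes} bytes")
for kbps in (64, 128, 160, 192, 256):
    print(f"{kbps:3d} kbps -> about {compression_ratio(kbps):.1f}:1")
```

At 128 kbps the codecs therefore work with roughly one eleventh of the original data rate.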