Compression has become a state-of-the-art technology in modern audio
communications, both wide-band and voice-band, enabling a great number
of applications such as mobile phones, radio and TV satellite
networks, Internet audio, digital audio broadcasting below 30 MHz (DRM
- Digital Radio Mondiale) and above 30 MHz (DAB - Digital Audio Broadcasting),
DVD (Digital Versatile Disc), VoIP (Voice over Internet Protocol), and
many more. Lowering data rates to a minimum conflicts with clarity
and fidelity of sound. In a time of "all digital technology", sound
quality and the intelligibility of speech have become important again.
However, the "old fashioned" methods of measuring, i.e. objectively
grading, an audio communication system are not satisfactory for
these systems. The traditional methods deal with S/N and distortion
(linear and nonlinear) measurements. Today's audio encoders that use
compression, or more precisely data reduction, rely on psychoacoustic
models of human hearing as the basis for bit-rate reduction, so judging
these systems requires simulating the subjective evaluation made by
human listeners. The analysis is based on the most recent perceptual
techniques, such as PEAQ (Perceptual Evaluation of Audio Quality), PESQ
(Perceptual Evaluation of Speech Quality) and PSQM (Perceptual Speech
Quality Measurement). Since the psychoacoustic models were developed
from investigations with real signals, contemporary measurements
have to use the same natural stimuli: human voice and
music program material. Using such stimuli makes it possible
to monitor quality during normal operation of the system under test.
Because this approach measures the perceived audio
quality instead of signal characteristics, it yields an
objective metric that truly characterizes the quality of service
("QoS") of a system or a network.
Due to the lack of international standards and recognized measurement
procedures, listening tests were for a long time the only widely accepted
assessment procedures for audio or speech codecs. The first methods for
testing telephone-band speech signals were standardized in ITU-T (International
Telecommunication Union - Telecommunication Standardization Sector)
Recommendation P.800 in 1993. In 1994 the ITU issued Recommendation ITU-R
BS.1116 (ITU-R - ITU Radiocommunication Sector), a test procedure for
assessing wide-band audio codecs on the basis of subjective tests. The
analysis of the results from a subjective listening test is based on
the Subjective Difference Grade (SDG), defined as:
SDG = Grade_{signal under test} - Grade_{reference signal}
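As a small illustration of this definition, the following Python sketch computes an SDG from two listener grades and attaches a verbal label. The grade labels are those of the five-grade impairment scale of ITU-R BS.1116; the helper names and the example grades are hypothetical and are not data from our tests.

```python
# Five-grade impairment scale used in ITU-R BS.1116 listening tests.
IMPAIRMENT_SCALE = {
    5: "imperceptible",
    4: "perceptible, but not annoying",
    3: "slightly annoying",
    2: "annoying",
    1: "very annoying",
}


def subjective_difference_grade(grade_test: float, grade_reference: float) -> float:
    """SDG = grade of the signal under test minus grade of the reference."""
    return grade_test - grade_reference


def describe_sdg(sdg: float) -> str:
    """Map an SDG to the nearest impairment label (hypothetical helper).

    Assumes the reference was graded 5.0, so that an SDG of 0 corresponds
    to grade 5 and an SDG of -4 to grade 1.
    """
    nearest_grade = min(5, max(1, round(5 + sdg)))
    return IMPAIRMENT_SCALE[nearest_grade]


# Hypothetical grades given by one listener (illustration only).
sdg = subjective_difference_grade(grade_test=3.8, grade_reference=5.0)
print(f"SDG = {sdg:.1f} ({describe_sdg(sdg)})")
```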
The values of the SDG range from 0 to –4, where 0 corresponds to an
"imperceptible impairment" and –4 to a "very annoying impairment".
Both described methods rely on listening tests, which are time-consuming,
expensive and impractical for everyday use. Consequently, substituting
the subjective listening tests with objective, computer-based
methods became the main goal. After a few years of research, in 1996
the ITU finalized Recommendation P.861, which defines a method for the
objective analysis of speech codecs. It uses the PSQM algorithm as its
cognitive perceptual model, and its results correlate up to 98% with
the scores of subjective listening tests. For specific
applications, such as VoIP, the PSQM algorithm is not satisfactory,
so in 2001 the ITU finalized a refined method in Recommendation
P.862, which uses the PESQ algorithm as its cognitive perceptual model.
For wide-band audio codecs, ITU-R recommended the PEAQ algorithm,
specified in Recommendation ITU-R BS.1387.
The comparison process, based on the cognitive models mentioned above,
is divided into several phases. Each phase yields one
or more Model Output Variables (MOVs), i.e. descriptors of various cognitive
processes. The final quality figure takes all MOVs into account and
is expressed as a single number, the Objective Difference Grade (ODG).
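To make the idea of collapsing several MOVs into one grade concrete, here is a deliberately simplified Python sketch. BS.1387 performs this mapping with a trained neural network whose coefficients are given in the Recommendation; the weighted sum, the sigmoid, the MOV names and all numeric values below are assumptions chosen only for illustration and do not reproduce the standard.

```python
import math
from typing import Mapping


def combine_movs(
    movs: Mapping[str, float],
    weights: Mapping[str, float],
    bias: float = 0.0,
) -> float:
    """Collapse several MOVs into one ODG-like score in the open range (-4, 0).

    Toy model: weighted sum -> sigmoid -> linear rescaling. This is NOT
    the neural network of BS.1387, only an illustration of the principle.
    """
    activation = bias + sum(weights[name] * value for name, value in movs.items())
    squashed = 1.0 / (1.0 + math.exp(-activation))  # value in (0, 1)
    return -4.0 + 4.0 * squashed


# Hypothetical MOV values and weights (not taken from any measurement).
example_movs = {"mov_a": 0.4, "mov_b": 1.5}
example_weights = {"mov_a": -1.0, "mov_b": -0.5}  # more distortion -> lower score
print(round(combine_movs(example_movs, example_weights, bias=2.0), 2))
```

The negative weights encode the assumption that larger MOV values mean more audible distortion, so the score moves from 0 towards -4 as the MOVs grow.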
In this paper we present the results of tests made on various codecs* using
the PEAQ measurement algorithm according to ITU-R BS.1387. We made the
measurements using the computer-based measuring system Opera from Opticom.
The measurement was done in the following way: first we encoded the reference
audio clip** with every codec at each of the most common bit rates. Then we
decoded all of the resulting compressed clips and tested them in Opera,
comparing each with the uncompressed reference clip. In this phase
the results are shown only for 100% compatible encoders and decoders,
i.e. devices from the same manufacturer. Please note that all measurements
were done in stereo, so the bit rates are stated accordingly (e.g. 128
kbps refers to two [left and right] 64 kbps encoded audio channels).
For a brief overall comparison, the final ODG values for different bit
rates are shown in Fig. 1. For deeper analysis we used the MOVs obtained
during the measurements.
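The measurement procedure described above amounts to a loop over codecs and bit rates. The Python sketch below shows only that structure: `encode`, `decode` and `score` are hypothetical callables standing in for the real encoders, decoders and the PEAQ measurement, and nothing here represents the interface of Opera or of any particular codec.

```python
from typing import Callable, Dict, Iterable, Tuple

Encoder = Callable[[bytes, int], bytes]   # (reference PCM, bit rate in kbps) -> bitstream
Decoder = Callable[[bytes], bytes]        # bitstream -> decoded PCM
Scorer = Callable[[bytes, bytes], float]  # (reference PCM, decoded PCM) -> ODG


def measure_odg_grid(
    reference: bytes,
    codecs: Dict[str, Tuple[Encoder, Decoder]],
    bitrates_kbps: Iterable[int],
    score: Scorer,
) -> Dict[Tuple[str, int], float]:
    """Encode, decode and score the reference clip for every codec and bit rate.

    Each codec is given as a matched (encoder, decoder) pair, mirroring the
    restriction to encoders and decoders from the same manufacturer.
    """
    bitrates = list(bitrates_kbps)  # allow the iterable to be reused per codec
    results: Dict[Tuple[str, int], float] = {}
    for name, (encode, decode) in codecs.items():
        for kbps in bitrates:
            decoded = decode(encode(reference, kbps))
            # The decoded clip is always compared with the uncompressed reference.
            results[(name, kbps)] = score(reference, decoded)
    return results
```

Keeping the scorer as a parameter means the same loop could be driven by any objective measurement that returns one grade per comparison.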
Figure 1 – ODG values for 4 different codecs (horizontal axis: bit rate
in kbps; recoverable ticks: 128, 160, 192, 256)
All codecs behave similarly at higher bit rates (≥160 kbps): the
differences are minimal except for MP2. At lower bit rates
(<160 kbps), on the other hand, the four codecs behave differently,
especially at 128 kbps, the most interesting bit rate. The best is
OGG Vorbis, with AAC very close behind.
From these results we can conclude that, in terms of audio quality,
picking the right codec is very important at lower bit rates and much
less important at higher bit rates. However, due to the popularity
of some codecs, e.g. MP3, the choice is often not based on quality alone,
which is why it is very hard to recommend any one in particular.
* The codecs used are standard MP2 and MP3 (MPEG-1 Layer 2 and
Layer 3, according to ISO/IEC 11172-3, 1992, where MPEG stands for Moving
Picture Experts Group and ISO/IEC for International Organization for
Standardization/International Electrotechnical Commission), AAC (Advanced
Audio Coding or MPEG-2 AAC, according to ISO/IEC 13818-7, 1997) and
OGG Vorbis (a free codec from Xiph.org, independent of the MPEG-1 and
MPEG-2 standards). They all use psychoacoustic methods for lossy
compression of the audio data: some data is discarded as irrelevant
according to the codec's algorithm and cannot be restored, unlike
lossless compression, where no data is lost. The codecs differ in their
algorithms and complexity, so they do not behave identically, as the
measurement results show.
** The audio clip used for encoding and testing was ripped from an audio
CD to WAV format (16 bit, 44.1 kHz). It was 5 seconds long and
was the same for all four codecs.
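For orientation, the following Python sketch works out what these figures imply: the bit rate and size of the uncompressed clip, and, for each coded bit rate, the nominal per-channel rate, the reduction ratio and the approximate payload size. It assumes a stereo clip (as stated for the measurements), 1 kbps = 1000 bit/s, and ignores file headers and container overhead; the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2        # stereo, as used in the measurements
CLIP_SECONDS = 5

# Bit rate and size of the uncompressed PCM audio (WAV header not counted).
pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000
pcm_bytes = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS * CLIP_SECONDS


def summarize(bitrate_kbps: int) -> dict:
    """Nominal figures for one coded bit rate (container overhead ignored)."""
    return {
        "per_channel_kbps": bitrate_kbps / CHANNELS,
        "reduction_ratio": pcm_kbps / bitrate_kbps,
        "payload_bytes": bitrate_kbps * 1000 * CLIP_SECONDS // 8,
    }


print(f"PCM: {pcm_kbps} kbps, {pcm_bytes} bytes")
for kbps in (128, 160, 192, 256):
    info = summarize(kbps)
    print(
        f"{kbps} kbps: {info['per_channel_kbps']:.0f} kbps per channel, "
        f"ratio {info['reduction_ratio']:.1f}:1, {info['payload_bytes']} bytes"
    )
```

The per-channel figure is nominal only: a codec using joint stereo coding does not necessarily split its bits evenly between the left and right channels.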