August 23, 2006
Getting the hang of the music industry
Part 2: Understanding codecs and bandwidth issues

Part two in our music industry article series gives developers an introduction to data compression, compares MP3 and AAC efficiency, presents a technical overview of audio codecs including MP3, AAC, AAC+ and E-AAC+ and summarizes strategies for managing network bandwidth.

In case you missed it, part one in the series presents the intricate web of participants involved in the music business, from content creators to record labels, rights organizations, distributors, retailers, and mobile operators (Part 1: Understanding the players, market trends and service definitions).

Introduction
Data bandwidth in mobile networks is limited, ranging from around 80 kbit/s in GPRS to 384 kbit/s in UMTS. Uncompressed digital audio on a CD is recorded as linear PCM (Pulse Code Modulation) and has a bit rate of around 1400 kbit/s (a sampling rate of 44.1 kHz multiplied by a bit depth of 16 bits and doubled to account for both channels of a stereo recording). The files are also quite large, running around 10 megabytes for one minute of stereo audio. Both the bit rate and the file size make it unrealistic to transmit this type of file over a mobile network. To deliver music over these networks in a way that gives the user a satisfactory experience, the audio must be compressed.
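The PCM figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are made up for this example, and a megabyte is assumed to be 1,000,000 bytes.

```python
def pcm_bit_rate_kbits(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed linear PCM in kbit/s."""
    return sample_rate_hz * bit_depth * channels / 1000


def size_megabytes(bit_rate_kbits: float, seconds: float) -> float:
    """Size of a constant-bit-rate stream, assuming 1 MB = 1,000,000 bytes."""
    return bit_rate_kbits * 1000 * seconds / 8 / 1_000_000


# CD audio: 44.1 kHz sampling, 16 bits per sample, 2 channels (stereo).
cd_rate = pcm_bit_rate_kbits(44_100, 16, 2)
one_minute_mb = size_megabytes(cd_rate, 60)

print(f"CD bit rate: {cd_rate:.1f} kbit/s")         # about 1411 kbit/s
print(f"One minute of audio: {one_minute_mb:.2f} MB")  # about 10.6 MB
```

Both results agree with the rounded values in the text: roughly 1400 kbit/s and roughly 10 megabytes per minute.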

Data compression

Data compression for audio is accomplished through the use of a codec (encoder/decoder). There are two types of audio compression: lossless and lossy. Lossless codecs reduce the file size without discarding any audio data. They are primarily used to save space when archiving or storing audio for local playback and are not suitable for mobile transmission.

Research in psychoacoustics has revealed that not all data in an audio stream is perceived by the human ear. Codecs using lossy compression take advantage of this fact by discarding data that the listener would not perceive. Additionally, the bit depth of the stream is reduced through the use of noise shaping. This technique uses a perceptual model of human hearing to cut the number of bits used to encode frequencies that are harder for the human ear to hear. Lossy codecs can achieve significant reductions in file size and bit rate, making them well suited to mobile applications.

Because audio perception is highly subjective, purely technical measurements do not necessarily account for the way imperfections are perceived by the human ear. Results obtained under one set of conditions do not always apply to another. For example, two codecs might differ greatly in quality at low bitrates, but not so much at high bitrates. That is why codecs have been subject to extensive listening tests.

The main objective is to achieve the highest possible compression efficiency, i.e. a small size (a low bitrate in kbit/s) without compromising the quality (quality in equals quality out). There is always a trade-off between the amount of compression applied and the perceived quality of the sound. When sound is compressed with AAC or MP3 at around 96-128 kbit/s (the rates used in the efficiency comparison in this article), the quality perceived by the human ear is close to CD quality. Newer codecs are far more efficient: with AAC+ v1, the same quality can be achieved at about 64 kbit/s, and with AAC+ v2 at about 32-48 kbit/s.

MP3/AAC efficiency comparison

Currently, the two most widely used codecs for music are MP3 and AAC. MP3 was designed as part of the MPEG-1 standard used for Video CDs and the first commercial digital satellite networks. As such, it was designed primarily to support high-quality speech with background music, as in a film or television show, and is therefore not ideally suited to pure music encoding. AAC is a more recent development and was created with high-quality music encoding in mind. Both formats are supported in most handsets from major manufacturers.

The following compares the two codecs and the newer AAC variants:

  • AAC LC (low complexity) is approximately 25% more efficient than MP3
  • AAC+ is approximately 40% more efficient than AAC
  • E-AAC+ (enhanced AAC+) is approximately 40% more efficient than AAC+

This implies that:

  • AAC+ is approximately 55% more efficient than MP3
  • E-AAC+ is approximately 75% more efficient than MP3
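The implied figures follow if each percentage is read as a bit-rate saving at equal quality, so that each step applies to whatever the previous step left. That reading is an assumption made for this sketch (the article does not define "efficient"), and the names below are illustrative.

```python
# Savings quoted in the comparison, read as bit-rate reductions at equal quality.
AAC_VS_MP3 = 0.25
AACPLUS_VS_AAC = 0.40
EAACPLUS_VS_AACPLUS = 0.40


def combined_saving(*savings: float) -> float:
    """Overall reduction when each saving applies to what the previous step left."""
    remaining = 1.0
    for saving in savings:
        remaining *= 1.0 - saving
    return 1.0 - remaining


aacplus_vs_mp3 = combined_saving(AAC_VS_MP3, AACPLUS_VS_AAC)
eaacplus_vs_mp3 = combined_saving(AAC_VS_MP3, AACPLUS_VS_AAC, EAACPLUS_VS_AACPLUS)

print(f"AAC+ vs MP3:   {aacplus_vs_mp3:.0%}")   # 55%
print(f"E-AAC+ vs MP3: {eaacplus_vs_mp3:.0%}")  # 73%
```

Under this reading the AAC+ figure comes out at 55%, and the E-AAC+ figure at 73%, which the article rounds to approximately 75%.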

Sound quality comparable to that of a 4 MB MP3 track at 128 kbit/s (approximately 4 minutes) is achieved with a file of:

  • 3 MB (96 kbit/s) when encoded in AAC
  • 2 MB (56 kbit/s) when encoded in AAC+
  • 1 MB (36 kbit/s) when encoded in E-AAC+
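The file sizes above can be reproduced from the bit rates. The sketch below assumes a track of exactly 4 minutes, a constant bit rate, no container overhead, and 1 MB = 1,000,000 bytes; none of these details are stated in the article.

```python
TRACK_SECONDS = 4 * 60  # the "approximately 4 minutes" track from the text

# Bit rates in kbit/s taken from the comparison above.
BIT_RATES_KBITS = {"MP3": 128, "AAC": 96, "AAC+": 56, "E-AAC+": 36}


def track_size_mb(bit_rate_kbits: int, seconds: int = TRACK_SECONDS) -> float:
    """File size in MB for a constant-bit-rate track, ignoring container overhead."""
    return bit_rate_kbits * 1000 * seconds / 8 / 1_000_000


sizes = {codec: track_size_mb(rate) for codec, rate in BIT_RATES_KBITS.items()}
for codec, size in sizes.items():
    print(f"{codec:7s} {size:.2f} MB")
```

The computed sizes are 3.84, 2.88, 1.68 and 1.08 MB, which round to the 4, 3, 2 and 1 MB quoted in the list.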

Audio codec overview

Below is a general overview of some of the audio codecs used in the industry. Sony Ericsson's intention is to support the standard music codecs that are most prevalent on the global market. A full list of the codecs supported on Sony Ericsson phones can be found in the publication "Developers' Guidelines, Music and Video in Sony Ericsson phones".

MP3

  • MP3 = MPEG-1/2/2.5, Layer 3
  • Specified by ISO
  • Bitrate: up to 320 kbit/s
  • Sampling rate: 8-48 kHz
  • Not referred to by any "mobile" standard, such as 3GPP

AAC

  • AAC = Advanced Audio Coding
  • Specified by ISO (ISO/IEC 14496-3:2001)
  • Bitrate: up to 256 kbit/s (Low Complexity)
  • Sampling rate: 8-48 kHz
  • The LC (Low Complexity) object type is included as a recommended codec for MMS, Streaming, Video Clips (3GP) in 3GPP Rel-4 and onwards

AAC+

  • Also known as:
      • HE-AAC (HE = High Efficiency), ISO's name
      • AAC+ v1 
  • Uses AAC LC as a basis
  • Adds SBR (Spectral Band Replication) tool
      • Uses the correlation between low and high frequencies to compute an "SBR signal". This signal uses a very small amount of data
      • The AAC encoding runs at half of the original sample rate (half bitrate, half spectrum)
      • The decoder uses the "SBR signal" to reconstruct the higher frequencies
  • Not referred to by any "mobile" standard, such as 3GPP
  • An AAC+ decoder can decode AAC and AAC+ encoded signals without compromising the quality

E-AAC+

  • E-AAC+ = Enhanced AAC+ (3GPP's name)
  • Also known as:
      • HE-AAC v2 (HE = High Efficiency), ISO's name
      • AAC+ v2, Coding Technologies brand name
  • Uses AAC+ as a basis
  • Adds PS (Parametric Stereo) tool
      • Uses the correlation between left and right channels to compute a "PS signal". This signal uses a very small amount of data
      • The mono signal (L+R) is AAC+ encoded (half bitrate)
      • The decoder uses the "PS signal" to reconstruct the stereo image
  • The PS tool can operate at any bitrate, but works best at low bitrates
  • Recommended codec for MMS, Streaming, Video Clips (3GP) in 3GPP Rel-6
  • Strong operator interest
  • An E-AAC+ decoder can decode AAC, AAC+ and E-AAC+ encoded signals without compromising the quality

WMA

  • WMA = Windows Media Audio
  • Specified by Microsoft
  • Bitrate: 5-320 kbit/s
  • Sampling rate: 8-48 kHz
  • Not referred to by any "mobile" standard, such as 3GPP
  • Used by a lot of music services such as Napster

AMR-NB

  • AMR-NB = Adaptive Multi-Rate, Narrow-Band
  • Often referred to as simply "AMR"
  • Specified by 3GPP (TS 26.071)
  • Optimized for speech
  • Bitrate: 4.75-12.20 kbit/s
  • Sampling rate: 8 kHz
  • Codec for MMS, Streaming, Video Clips (3GP) in 3GPP Rel-4 and onwards

AMR-WB

  • AMR-WB = Adaptive Multi-Rate, Wide Band
  • Specified by 3GPP (TS 26.171)
  • Also known as G.722.2 in ITU
  • Optimized for high-quality speech
  • Bitrate: 6.60-23.85 kbit/s
  • Sampling rate: 16 kHz
  • Optional codec for MMS, Streaming, Video Clips (3GP) in 3GPP Rel-5 and onwards
  • Not used much for multimedia (but will likely be for voice calls)

ADPCM, IMA 4

  • ADPCM = Adaptive Differential Pulse Code Modulation
  • IMA = Interactive Multimedia Association
  • Compresses 16-bit data to 4 bits/sample (4:1)
  • Bitrate: 32 kbit/s
  • Sample rate: 6-64 kHz
  • Neither compression nor quality is very good, but complexity is low
  • Mostly used for Java games

RealAudio

  • Specified by RealNetworks
  • Four codecs:
      • Sipro: ACELP voice codec, 5/6.5/8.5/16 kbit/s
      • Gecko: a.k.a. G2, 8-96 kbit/s
      • AAC: 96-320 kbit/s
      • RA Lossless: 700 kbit/s and above

Strategies for dealing with bandwidth

A good end-user experience is the key to a successful music application. Users expect an instant experience no matter the bandwidth conditions. Even on a 3G network, a full music download takes a significant amount of time: anywhere from 30 to 60 seconds, or even more, depending on file size and network conditions.
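To get a feel for these waiting times, the sketch below estimates an ideal transfer time from file size and link rate. It uses the nominal GPRS and UMTS rates quoted in the introduction and assumes 1 MB = 8,000 kbit; protocol overhead, latency and real-world throughput are ignored, so actual downloads will usually be slower. The function name is made up for this example.

```python
def download_seconds(file_megabytes: float, link_kbits: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead and latency."""
    return file_megabytes * 8_000 / link_kbits


# Nominal link rates in kbit/s, as quoted in the introduction.
LINKS = {"GPRS": 80, "UMTS": 384}

for size_mb in (4, 2, 1):
    for name, rate in LINKS.items():
        print(f"{size_mb} MB over {name}: {download_seconds(size_mb, rate):.0f} s")
```

Under these assumptions a 2 MB file needs about 42 seconds over UMTS, while a 4 MB file needs more than 80 seconds, so the choice of codec and bit rate has a direct effect on how long the user waits.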

Sony Ericsson mobile phones support several delivery methods, each of which copes with bandwidth conditions in a different way.

Streaming
Streaming is a method for making music, video, radio and other multimedia available in real-time or near real-time, over different types of networks. The data in the file is split into small packets that are sent in a continuous flow, or a "stream", to the end user's computer or mobile phone. The user can begin listening to the content in the first packets, while the rest are being transferred. There is a short delay at the start to allow the client to buffer a small amount of data. This buffer makes it possible for the client to play the stream without interruption, even if the rate of received data varies slightly. In the case of streaming, the audio file is not stored on the user's device, so in order to listen again, the user will have to reconnect to the streaming server and initiate (and pay for) another streaming session.
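The role of the start-up buffer can be illustrated with a toy simulation. This is only a sketch of the idea, not how any particular streaming client works: it steps in whole seconds, the arrival figures are invented for the example, and a stalled client simply waits for more data.

```python
def simulate_stream(received_kbits_per_s, bit_rate_kbits, prebuffer_kbits):
    """Return the seconds at which playback stalls (empty list = no interruption).

    received_kbits_per_s: data received in each one-second interval.
    Playback starts once the buffer first holds prebuffer_kbits, then tries to
    consume bit_rate_kbits per second.
    """
    buffered = 0.0
    playing = False
    stalls = []
    for second, received in enumerate(received_kbits_per_s):
        buffered += received
        if not playing and buffered >= prebuffer_kbits:
            playing = True
        if playing:
            if buffered >= bit_rate_kbits:
                buffered -= bit_rate_kbits  # one second of audio is played
            else:
                stalls.append(second)       # buffer underrun: playback pauses
    return stalls


# Invented arrival pattern (kbit received per second) for a 48 kbit/s stream.
arrivals = [60, 40, 50, 30, 70, 45, 55, 40, 60, 50]

print(simulate_stream(arrivals, bit_rate_kbits=48, prebuffer_kbits=96))  # []
print(simulate_stream(arrivals, bit_rate_kbits=48, prebuffer_kbits=0))   # [3]
```

With two seconds of audio buffered before playback starts, the dip in the fourth second is absorbed; with no pre-buffering, the same dip interrupts playback.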

Download
Whereas streaming media is aimed at making media available in real-time, downloaded media allows the user to enjoy the media over and over again. The media is stored locally on the mobile phone, in the built-in memory or on a removable Memory Stick. The drawback to this method is that downloading large media files can take quite a long time. To let the user start listening as soon as possible, progressive download may be used.

Progressive download
With progressive download, playback can start as soon as a certain amount of data has been buffered in the phone memory. This allows the user to listen to a song while it is still being downloaded. This technique is similar to the one used for streaming media, with the difference that the media is also saved in the phone when download is complete.
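How much has to be buffered before playback can start depends on how the link rate compares with the bit rate of the track. The sketch below estimates the shortest safe waiting time under simplifying assumptions that are not from the article: a constant bit rate, a constant link throughput, and no extra safety margin. The function name is hypothetical.

```python
def min_start_delay(duration_s: float, bit_rate_kbits: float, link_kbits: float) -> float:
    """Shortest wait (seconds) before playback so the download stays ahead of it.

    If the link is at least as fast as the track's bit rate, playback can in
    principle start at once. Otherwise the download must finish no later than
    playback does, so the wait is the download time minus the track duration.
    """
    download_time = duration_s * bit_rate_kbits / link_kbits
    return max(0.0, download_time - duration_s)


# A 4-minute track over a nominal 80 kbit/s GPRS link.
print(min_start_delay(240, 128, 80))  # 144.0 seconds for a 128 kbit/s file
print(min_start_delay(240, 36, 80))   # 0.0 seconds for a 36 kbit/s file
```

In practice a client would add a margin on top of this figure to absorb variations in throughput.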

Dual Delivery
Another delivery method, called Dual Delivery, allows for an instant experience by providing a highly compressed music file for downloading over-the-air (OTA), along with a higher-quality file that is downloaded to a PC for subsequent listening and for synchronization between the PC and a digital music player such as a Walkman phone. Dual Delivery can be implemented as a function of a music service such as an online music store, and does not require any special functionality in the phone.
