Opus is a lossy audio compression format that is particularly suited to interactive real-time transmission over the Internet.
It is specified as an open international standard in RFC 6716. Its basic methods are a frequency transform and linear predictive coding (LPC). It offers particularly high sound quality together with particularly low transmission delay.
Figure: Range of possible bit rates and algorithmic delays in comparison
Opus has a particularly low algorithmic delay, which keeps overall latency down in real-time applications, where the signal is typically captured immediately before it is compressed and transmitted. It works with block lengths of 2.5 to 20 ms (up to 60 ms) and can switch between block lengths dynamically without artifacts. Depending on the mode, a lookahead of 2.5 to 5 ms is added. It supports constant and variable bit rates over a very wide range, from 6 kbit/s to 510 kbit/s, and can reproduce the entire range of human hearing. Only two channels (stereo) can be coupled at a time. More channels can be carried by multiplexing several streams (independent or in coupled pairs) into a container file. EBU recommendation R 128 is supported for loudness adjustment on the receiving side.
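The figures above translate directly into packet sizes and delays. The following is a minimal sketch of that arithmetic, not part of any Opus API: the function names are hypothetical, and the delay formula assumes that the algorithmic delay is simply one block plus the lookahead quoted in the text.

```python
MIN_BITRATE = 6_000      # bit/s, lower end of the range given in the text
MAX_BITRATE = 510_000    # bit/s, upper end of the range given in the text


def payload_bytes(bitrate_bps: int, block_ms: float) -> float:
    """Average compressed payload per block at a constant bit rate."""
    if not MIN_BITRATE <= bitrate_bps <= MAX_BITRATE:
        raise ValueError("bit rate outside the 6-510 kbit/s range")
    # bits per block = bit rate * duration; divide by 8 for bytes
    return bitrate_bps * block_ms / 8000


def algorithmic_delay_ms(block_ms: float, lookahead_ms: float) -> float:
    """Assumed model: one full block plus the lookahead of the mode."""
    return block_ms + lookahead_ms


print(payload_bytes(6_000, 20))        # 15.0 bytes
print(payload_bytes(510_000, 20))      # 1275.0 bytes
print(algorithmic_delay_ms(2.5, 2.5))  # 5.0 ms
print(algorithmic_delay_ms(20, 5))     # 25 ms
```

Under this model the shortest block with the shortest lookahead gives a delay of 5 ms, while a 20 ms block with a 5 ms lookahead gives 25 ms.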
The format is openly documented as an open standard, and a reference implementation is published as source code. Parts of the method are covered by software patents, but the rights holders have agreed to unrestricted use of their patents for the purposes of the codec, including future versions of the standard. All of them, however, reserve the right to use their patents defensively against patent claims by third parties.
Opus is a hybrid of CELT and a heavily modified, incompatible version of SILK. The codec has three modes: two for pure speech signals and one for other material such as music. One of the speech modes is a hybrid mode that covers the entire range of human hearing: the CELT algorithms, which are essentially based on a frequency transform (MDCT), handle the upper frequency band from 8 kHz upward, while the SILK algorithms, which are essentially based on linear predictive coding (LPC), handle the lower band. At low bit rates (below about 30 kbit/s) the frequency range can be limited and the CELT layer omitted. For other signal types, the SILK layer, which specializes in speech signals, can be switched off so that only the general-purpose CELT is used. Since version 0.9.2 (March 2011), it has been possible to switch seamlessly between these modes during operation, and the encoder selects the mode automatically by default.
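The division of labour between the two layers can be summarized in a few lines. This is only an illustrative sketch of the description above; the enum and function names are hypothetical and do not correspond to identifiers in the reference implementation.

```python
from enum import Enum


class Mode(Enum):
    SILK_ONLY = "LPC-based speech layer only (limited frequency range)"
    HYBRID = "SILK for the lower band, CELT for the upper band"
    CELT_ONLY = "MDCT-based layer only"


HYBRID_SPLIT_HZ = 8_000  # split frequency named in the text


def layer_for(mode: Mode, freq_hz: float) -> str:
    """Return which layer codes a given frequency in a given mode."""
    if mode is Mode.SILK_ONLY:
        return "SILK"
    if mode is Mode.CELT_ONLY:
        return "CELT"
    # Hybrid: SILK covers the band below the split, CELT the band above it.
    return "SILK" if freq_hz < HYBRID_SPLIT_HZ else "CELT"


print(layer_for(Mode.HYBRID, 3_000))   # SILK
print(layer_for(Mode.HYBRID, 12_000))  # CELT
```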
Opus data can be encapsulated in Ogg containers. The content of such Ogg Opus streams is declared with the media type audio/ogg; codecs=opus, and the file name extension .opus is recommended for Ogg Opus files. Support for encapsulating Opus in the Matroska container format is being worked on.
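A program that has to decide whether to label a stream as audio/ogg; codecs=opus can inspect the first bytes of the data. The sketch below relies on two facts from the Ogg and Ogg Opus specifications rather than from the text above: every Ogg page begins with the bytes OggS followed by a fixed 27-byte header and a segment table, and the first packet of an Ogg Opus stream begins with OpusHead. The function name is hypothetical, and the check deliberately ignores checksums and everything after the first page.

```python
OGG_CAPTURE = b"OggS"       # capture pattern at the start of every Ogg page
OPUS_MAGIC = b"OpusHead"    # start of the Opus identification header
OGG_HEADER_LEN = 27         # fixed part of an Ogg page header, in bytes
OGG_OPUS_MEDIA_TYPE = "audio/ogg; codecs=opus"


def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristic check of the first Ogg page; no checksum validation."""
    if len(data) < OGG_HEADER_LEN or data[:4] != OGG_CAPTURE:
        return False
    n_segments = data[26]                       # last byte of the fixed header
    payload_start = OGG_HEADER_LEN + n_segments  # skip the segment table
    return data[payload_start:payload_start + 8] == OPUS_MAGIC
```

A caller could read the first few hundred bytes of a file and report OGG_OPUS_MEDIA_TYPE when the function returns True.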
Figure: Comparison of coding efficiency characteristics
In comparative listening tests, Opus at low bit rates delivers better quality than the HE-AAC codecs, which previously dominated this range thanks to their use of proprietary spectral band replication. At bit rates of 12 kbit/s and below, a version of the codec from mid-February 2011 was inferior for voice signals to the AMR codecs from GSM, which represent the state of the art in quality at these rates. Within Opus, the hybrid mode was superior for speech signals at bit rates between about 20 and 48 kbit/s; above that range the purely MDCT-based mode was better, and below it the purely LPC-based mode. Compared with other common transform codecs, strongly tonal signals are particularly difficult for Opus, whereas complex passages can be represented relatively economically.
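The bit-rate regions reported for speech can be written down as a simple lookup. This is only a restatement of the listening-test result, with the approximate thresholds taken from the text; it is not the decision logic of the reference encoder, and the function name is hypothetical.

```python
def preferred_speech_mode(bitrate_kbps: float) -> str:
    """Mode that performed best for speech in the tests described above."""
    if bitrate_kbps < 20:
        return "LPC-only"
    if bitrate_kbps <= 48:
        return "hybrid"
    return "MDCT-only"


for rate in (12, 32, 64):
    print(rate, "kbit/s ->", preferred_speech_mode(rate))
```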
Opus is a hybrid codec that combines two different, originally separate methods. A transform layer (originally CELT) is based on the modified discrete cosine transform (MDCT) and on ideas from CELP (a codebook for the excitation, but in the frequency domain). A layer specializing in speech signals (originally SILK) is based on linear predictive coding (LPC). The original SILK has been modified, among other things by adding support for 10-millisecond blocks. The range coding shared by the two parts of a hybrid data stream was taken over from CELT. The LPC part operates internally at a sampling frequency of 16 kHz, and the encoder has a built-in sample rate converter. Because the CELT signal has a shorter lookahead, it is delayed accordingly to keep the two layers aligned. SILK has a greater algorithmic delay in order to minimize the overhead generated by the transmission protocols in its typical application scenario. Lower latencies are therefore possible, for example, in pure CELT operation.
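The delay compensation between the two layers can be illustrated with a toy example. The sketch assumes that both layers are already available as sample lists at the same sampling rate and that their delays are known in samples; the function name and the numbers are hypothetical and do not reflect the reference implementation.

```python
def align_and_mix(silk, celt, silk_delay, celt_delay):
    """Delay the lower-latency CELT signal so both layers line up, then sum."""
    extra = silk_delay - celt_delay
    if extra < 0:
        raise ValueError("this sketch assumes SILK has the larger delay")
    delayed_celt = [0.0] * extra + list(celt)   # prepend silence as delay
    n = min(len(silk), len(delayed_celt))
    return [silk[i] + delayed_celt[i] for i in range(n)]


# The same impulse arrives after 3 samples via SILK and after 1 via CELT.
silk_out = [0.0, 0.0, 0.0, 1.0, 0.0]
celt_out = [0.0, 1.0, 0.0, 0.0, 0.0]
print(align_and_mix(silk_out, celt_out, silk_delay=3, celt_delay=1))
# [0.0, 0.0, 0.0, 2.0, 0.0]
```

Without the extra delay the two impulses would land two samples apart and the layers would no longer add up to one coherent signal.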
The transform layer of Opus works technically in largely the same way as the abandoned stand-alone CELT. However, it has been modified and further developed for integration with SILK. Support for 20-millisecond blocks has been added, as has the ability to signal deviations from the fixed allocation of available bits to the (Bark) bands by means of a so-called "allocation tilt" and a so-called "band boost".
Opus is recommended by the Internet Engineering Task Force (IETF) in Request for Comments (RFC) 6716 as an open international standard for lossy audio compression on the Internet. It was developed by the IETF codec working group with staff from, and on the basis of initially separate proposals by, the Xiph.Org Foundation and Skype Technologies S.A. (now Microsoft). The main developers are Jean-Marc Valin (Xiph.Org, Octasic, Mozilla Corporation), Koen Vos (Skype) and Timothy B. Terriberry (Mozilla Corporation). Also involved were Raymond Chen (Broadcom), Gregory Maxwell (Xiph.Org) and Christopher Montgomery (Xiph.Org).
Mozilla employs the main developer, Valin, and pays him for his development work on Opus. The browser manufacturer Opera Software also explicitly supports Opus as a new, open standard. Google Inc. is committed to establishing Opus as a license-free standard format on the Internet. Microsoft's Skype division continues to be actively involved in the standardisation process as its (co-)initiator. Juin-Hwey (Raymond) Chen from Broadcom contributed a pre- and post-filter for pitch prediction in CELT. Other participants in the standardisation process at the IETF were representatives of the Chair of Communication Networks at the University of Tübingen and its commercial spin-off Symonics, as well as Polycom and Cisco Systems.
Broadcom and the Xiph.Org Foundation hold patents relating to CELT; patents held by Skype/Microsoft are relevant to the SILK part. Alleged patent claims by Qualcomm and Huawei proved incorrect.
Meanwhile, work began in an experimental development branch of the reference encoder on a version 1.1 with significantly better sound quality. On December 21, 2012, after more than a year of development, the first alpha version of the 1.1 series was released. On July 11, 2013, the beta phase for version 1.1 began and version 1.0.3 was released, which fixed some bugs and adopted the surround sound API of the 1.1 series. The final version 1.1 was released on December 5, 2013.
Version 1.1 of the reference encoder has reportedly achieved significantly better sound quality and efficiency, especially with strongly tonal passages, by making fuller use of the format and by improving its coding decisions. Among other things, it exploits the format's support for variable bit rate (VBR) and dynamically distributes the available bits between frequency bands ("dynalloc", using "band boost" and "allocation tilt"). It has an unconstrained VBR mode and adapts the bit rate more aggressively to the complexity of the source material. The new VBR mode aims for constant quality across all files rather than for the specified target bit rate in each individual file. The encoder has been calibrated so that it approaches the specified target bit rate on average over a large quantity of encoded material with a wide mix of typical signals. Several new analysis steps examine signal characteristics and inform coding decisions. Among other things, the encoder now estimates tonality and raises the bit rate for particularly tonal passages, and it detects speech signals in order to switch automatically between the integrated LPC-based speech codec, MDCT-based coding and hybrid mode. For surround formats with more than two channels, the bit rate is now assigned dynamically to the individual channels, taking masking effects between the channels into account, and the quality of the LFE channel has been improved. Thanks to initial code optimizations, the reference codec as a whole is now significantly faster, especially on ARM devices.
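The difference between hitting the target in every file and hitting it only on average can be shown with a toy model. The sketch below is not the encoder's rate control: it assumes a single hypothetical complexity score per file and scales the target by that score relative to the mean of a calibration corpus.

```python
def unconstrained_vbr_rates(target_kbps, complexities, calibration_mean):
    """Toy model: scale the target by complexity relative to the corpus mean."""
    return [target_kbps * c / calibration_mean for c in complexities]


corpus = [0.5, 1.0, 1.5]                      # hypothetical complexity scores
calibration_mean = sum(corpus) / len(corpus)  # 1.0 for this corpus
rates = unconstrained_vbr_rates(64, corpus, calibration_mean)
print(rates)                    # [32.0, 64.0, 96.0]
print(sum(rates) / len(rates))  # 64.0
```

In this model a simple file receives half the target and a demanding file one and a half times the target, while the average over the corpus still equals the 64 kbit/s that was requested.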
Other new features include optional look-ahead analysis of signal characteristics for scenarios in which delay is not critical, more efficient representation of strongly correlated stereo signals, rejection of DC components (3 Hz high-pass filter), variation of the bit rate over time based on loudness, and a level limiter to prevent hard clipping.
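The DC rejection mentioned above can be illustrated with a generic first-order DC-blocking filter. This is a textbook filter and not the filter used in the reference encoder; the 48 kHz sampling rate and the function name are assumptions, and only the 3 Hz cutoff comes from the text.

```python
import math


def dc_block(samples, sample_rate=48_000, cutoff_hz=3.0):
    """First-order high-pass: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    # Pole radius chosen so that the cutoff is approximately cutoff_hz.
    r = math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# A constant offset decays towards zero within a couple of seconds.
offset = dc_block([1.0] * (2 * 48_000))
print(abs(offset[-1]) < 1e-6)  # True
```

Audible content is far above 3 Hz and passes through such a filter almost unchanged, while a constant offset in the input is removed.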
On November 26, 2015, version 1.1.1 was released with assembly optimizations for x86 (SSE, SSE2, SSE4.1), MIPS and ARM (NEON).