Ambisonics has the potential to reproduce the physical sound field surrounding a listener. This is the holy grail of spatial audio. If this goal were achieved over the full audible range, then our perception would be precisely the same as in the recording situation. However, it has been well known ever since the early days of ambisonics that the laws of physics and information theory conspire to ensure that this goal is only achievable in a small region of space. The size of this region depends on the wavelength of the sound and the number of recorded audio channels. With four channels, as in the original B-format, the region has a diameter of about half a wavelength, which limits its usefulness to signals with a bandwidth of less than about 800 Hz. One clearly needs to increase this frequency range, or increase the size of the reconstruction region, which are two sides of the same coin.
Several solutions and work-arounds have been proposed. The most successful approaches until now have been higher-order ambisonics (HOA) and shelf filtering, which may be applied separately or in combination. HOA basically solves the problem by using more than four audio channels. However, to double the diameter of the reconstruction region, one would need four times the number of channels. A 20 cm sweet zone, barely big enough for both ears, would require 1600 audio channels to span 20 kHz.
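The scaling rule above can be sketched numerically. This is a minimal sketch, assuming the channel count grows with the square of the highest frequency reconstructed in a fixed-size zone (the "four times the channels per doubling of diameter" rule, applied to frequency at fixed diameter), and taking four channels at roughly 1 kHz as the base point:

```python
# Hedged sketch of the channel-count scaling described above.
# Assumption: channels scale with the square of the target frequency
# for a fixed sweet-zone size, anchored at 4 channels / ~1 kHz.

def channels_needed(f_target_hz, f_base_hz=1000.0, base_channels=4):
    """Channels needed to keep the same sweet-zone size up to f_target_hz."""
    return base_channels * (f_target_hz / f_base_hz) ** 2

print(channels_needed(20000.0))  # -> 1600.0
```

With these base-point assumptions the sketch reproduces the 1600-channel figure for full 20 kHz bandwidth.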
Shelf filtering sidesteps the ultimate goal and aims to reproduce the auditory scene where one cannot reproduce the physical sound field. The auditory scene can be defined as the listener's perception of the sound field. However, the perceptual model that was used to arrive at the shelf filtering method was tailored to the decoder technology that was available at the time the method was conceived. The drawback of this method is that either the sweet spot is very small, or the localization is blurry, or both, particularly at high frequencies.
As more signal processing power has become available, more advanced methods have become feasible, allowing for more advanced perceptual models and more precise reproduction of the auditory scene.
The main advantage of parametric decoding is that it can determine the direction of each sound source with great accuracy, even when only four channels are available. These directions are parameters that are determined by the decoder, hence the name. Intuitively, one can imagine how the precise direction of any isolated sound source can be determined by studying the amplitude ratios coming out of four cardioid (slightly directive) microphones arranged in a tetrahedron. A B-format signal can easily be converted into this form, called the A-format, but it turns out that it is easier to carry out the necessary calculations directly in the B-format.
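One common way to turn those amplitude relationships into a direction estimate can be sketched as follows. This is a generic first-order intensity-vector estimator working directly on B-format, not Harpex's specific algorithm, and the encoding convention (W = s, X = s·cos az, Y = s·sin az, horizontal only) is an assumption for illustration:

```python
import math

# Hedged sketch: estimate the direction of arrival of a single plane
# wave from first-order B-format using the time-averaged intensity
# vector (sum of W*X, sum of W*Y).

def encode_bformat(signal, azimuth_rad):
    """Encode a horizontal plane wave into B-format (W, X, Y).
    Convention assumed here: W = s, X = s*cos(az), Y = s*sin(az)."""
    w = list(signal)
    x = [s * math.cos(azimuth_rad) for s in signal]
    y = [s * math.sin(azimuth_rad) for s in signal]
    return w, x, y

def estimate_azimuth(w, x, y):
    """Time-averaged intensity-vector direction estimate."""
    ix = sum(wi * xi for wi, xi in zip(w, x))
    iy = sum(wi * yi for wi, yi in zip(w, y))
    return math.atan2(iy, ix)

sig = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
w, x, y = encode_bformat(sig, math.radians(30.0))
print(math.degrees(estimate_azimuth(w, x, y)))  # ~ 30.0
```

For an isolated source the estimate recovers the encoded azimuth essentially exactly, which is the intuition behind parametric decoding.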
When sound sources are not isolated, one can use filters to separate them to some extent. This is why all parametric decoders comprise some form of filter bank. Knowing where the sound came from at the recording site, one can choose between a number of methods to create speaker feeds that create the illusion of sound coming from that direction at the playback site.
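As an illustration of that last step, here is one possible rule for turning an estimated direction into speaker feeds: tangent-law amplitude panning between a stereo pair at ±45 degrees. This is purely illustrative; a real parametric decoder would apply something like it per filter-bank band and for an arbitrary speaker layout:

```python
import math

# Hedged sketch of tangent-law stereo panning for a source whose
# direction has already been estimated. Speakers assumed at +/-45 deg.

def pan_tangent(azimuth_deg, base_deg=45.0):
    """Return (left, right) gains placing a source at azimuth_deg
    (positive = left) between speakers at +/-base_deg."""
    # Tangent law: (g_l - g_r) / (g_l + g_r) = tan(az) / tan(base)
    t = math.tan(math.radians(azimuth_deg)) / math.tan(math.radians(base_deg))
    g_l, g_r = (1.0 + t) / 2.0, (1.0 - t) / 2.0
    norm = math.hypot(g_l, g_r)  # constant-power normalisation
    return g_l / norm, g_r / norm

print(pan_tangent(0.0))  # equal gains, ~0.707 each
```

A centred source gets equal gains in both speakers; a source at +45 degrees goes entirely to the left speaker.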
But how about the cases where filters are not able to separate the sound sources? This is very often the case. In a musical ensemble, for instance, even if each instrument plays a different note, the notes are often harmonically related, which by definition means that some of their partials will share the same frequencies and cannot be separated by filters. Proponents of parametric decoding will claim that our auditory system will disregard those frequencies for the purpose of localization and localize each instrument based only on those partials where the instrument can be heard in isolation. After all, our ears also use filter banks to tease sound sources apart. There is undoubtedly some truth in this claim, but as is often the case in psychoacoustics, it is not as simple as that.
Sometimes, all partials of an instrument will overlap with another instrument. Consider a guitar and a snare drum, for instance. Since the spectrum of the drum is wider than the spectrum of the guitar, there is no way that a filter can isolate any part of the guitar sound, so the direction estimates for all partials of the guitar will be pulled in the direction of the snare drum on every beat. Will our auditory system ignore such an error? Experiments indicate that the answer is no. The result is a sound image which is crystal clear and perfectly sharp, but where sound sources seem to move in response to each other, especially when the auditory scene gets crowded.
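The pulling effect can be demonstrated with a small sketch. The conventions and amplitudes below are illustrative assumptions, not Harpex internals: two coherent plane waves at the same frequency are encoded to first-order B-format, and a single amplitude-based (intensity-vector) direction estimate is computed for the mixture:

```python
import math

# Hedged sketch: a single intensity-vector estimate for two sources
# sharing a frequency lands between them, weighted toward the louder.

def intensity_azimuth(sources, n=4800, f=440.0, sr=48000.0):
    """sources: list of (amplitude, azimuth_rad) in-phase plane waves.
    Returns the azimuth a first-order intensity-vector estimator
    reports for their mixture."""
    w = [0.0] * n
    x = [0.0] * n
    y = [0.0] * n
    for amp, az in sources:
        for i in range(n):
            s = amp * math.sin(2.0 * math.pi * f * i / sr)
            w[i] += s
            x[i] += s * math.cos(az)
            y[i] += s * math.sin(az)
    ix = sum(wi * xi for wi, xi in zip(w, x))
    iy = sum(wi * yi for wi, yi in zip(w, y))
    return math.atan2(iy, ix)

# A louder "snare drum" at +60 degrees drags the estimate for a quieter
# "guitar" partial at -20 degrees toward itself:
est = math.degrees(intensity_azimuth([(1.0, math.radians(-20.0)),
                                      (2.0, math.radians(60.0))]))
print(est)  # lands between the two sources, closer to the drum
```

The single estimate is neither source's true direction, which is exactly the kind of error described above.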
Harpex is a parametric decoder. What distinguishes it from previous parametric decoders is the way it calculates its direction estimates. Instead of looking only at the A-format amplitudes, it pays equal attention to the phases of the signals. This way, it is possible to figure out the precise direction of arrival of two sound waves (plane waves) in each frequency band. Therefore, even if the snare drum and guitar could not be separated through filtering, Harpex is able to perfectly separate and localize them both.
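A simple degrees-of-freedom count, consistent with the description above, suggests why two plane waves per band is exactly what four complex channels can support. This is a sketch; the first-order encoding convention shown is an assumption:

```latex
% Per frequency band, model the complex B-format vector as two plane waves:
\hat{B}(f) \;=\; a_1\,\mathbf{y}(\theta_1,\phi_1) \;+\; a_2\,\mathbf{y}(\theta_2,\phi_2),
\qquad
\mathbf{y}(\theta,\phi) \;=\;
\begin{pmatrix}
  1 \\ \cos\theta\cos\phi \\ \sin\theta\cos\phi \\ \sin\phi
\end{pmatrix}.
% Four complex equations supply eight real constraints, matching the
% eight real unknowns: two complex amplitudes a_1, a_2 (four numbers)
% plus two directions of two angles each (four numbers). The four
% amplitude magnitudes alone could not pin down all eight, which is
% why the phases carry the extra information.
```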
Of course, this only moves the localization errors one step away, and three overlapping sources will inevitably cause Harpex to produce inaccurate direction estimates. However, the human capability to detect such errors must have a limit somewhere, and it seems that Harpex is able to fly under that radar. The problem with unwanted movement is resolved, and the result is a sound image which is not only clear and sharp, but also stable.
© 2011–2017 Harpex Ltd