Advanced DSP Technologies from BIAS
BIAS's audio software products employ a number of powerful digital signal processing technologies created by the company. This paper discusses only a few of these technologies in order to convey the potency they exhibit as they are now used in BIAS products, as well as the enormous potential for applications going well beyond their current implementations. These technologies are: Partial Harmonic Audio Technology, Dither Cloning Audio Technology, Ultimate Sample Rate Conversion, and Corrective Linear-Phase Noise Reduction.
Complex digital signal processing always entails a tradeoff between quality and efficiency, but this choice involves considerably more than selecting a point on a "slider" between the two. While any approach taken to programming DSP algorithms must finesse this balance, some approaches are able to achieve higher levels of both qualities than others. In all cases, BIAS's emphasis in developing technologies has been to achieve the highest possible quality, while optimizing algorithms to realize the greatest efficiency attainable.
Partial Harmonic Audio Technology (PHAT) and the Power of Analysis/Resynthesis
PHAT is an analysis/resynthesis engine that uses highly developed additive synthesis techniques to enable a wide variety of sound modification capabilities. PHAT performs its work in the frequency domain, which allows possibilities not available in the time domain. However, realizing these possibilities depends on well-designed processing of the data after conversion into the frequency domain.
Performing a Fast Fourier Transform (FFT) on the signal puts the data into the frequency domain, but while the FFT provides an accurate spectral breakdown, the center frequencies of the FFT channels (also called "frequency bins") do not directly represent the harmonics of a pitched signal, such as a musical instrument sound. In order to extract the harmonics, PHAT subjects the FFT data to a sophisticated analysis that computes the trajectory of each harmonic partial. PHAT analysis has been optimized to allow it to work in real time, while producing high-quality output. The complexity of this analysis limits the current implementation of PHAT to working on monophonic sources only.
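The gap between bin centers and actual partial frequencies can be illustrated with a small sketch. It is not PHAT's analysis, which is not described in detail here. It assumes a single Hann-windowed sine and uses parabolic interpolation of log magnitudes, a common textbook refinement. All names (dft_magnitudes, refine_peak) are hypothetical.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of bins 0..N/2-1 of a plain DFT (O(N^2), adequate for a demo)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]

def refine_peak(mags, k, sample_rate, n):
    """Estimate the true peak frequency by fitting a parabola to log magnitudes."""
    a, b, c = math.log(mags[k - 1]), math.log(mags[k]), math.log(mags[k + 1])
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)  # fractional bin offset
    return (k + offset) * sample_rate / n

sample_rate, n, true_freq = 8000.0, 256, 450.0
# Hann-windowed sine whose frequency falls between two bin centers.
x = [0.5 * (1.0 - math.cos(2.0 * math.pi * t / n))
     * math.sin(2.0 * math.pi * true_freq * t / sample_rate) for t in range(n)]
mags = dft_magnitudes(x)
k_peak = max(range(1, n // 2 - 1), key=lambda k: mags[k])
bin_center = k_peak * sample_rate / n          # what the raw FFT bin reports
estimate = refine_peak(mags, k_peak, sample_rate, n)
print(f"bin center {bin_center:.2f} Hz, refined estimate {estimate:.2f} Hz")
```

With a bin spacing of 31.25 Hz, the nearest bin center is 12.5 Hz away from the 450 Hz partial, while the interpolated estimate lands much closer. Tracking such estimates from frame to frame is one way a partial's trajectory could be built up.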
Natural sounds, especially musical instrument sounds, typically involve a noise component (referred to in technical terminology as the "residual" signal), in addition to the harmonic components. This might be breath in a wind instrument sound, or a bow scraping a violin string as it sets the string into motion. Because it is based on an additive model, PHAT is able to separate the residual signal from the harmonic material.
With access to each individual partial and the residual signal, many useful processes can be performed on the signal data before it is resynthesized by PHAT's efficient additive synthesizer.
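As a rough illustration of additive resynthesis, the sketch below sums sinusoidal partials from per-sample amplitude and frequency tracks and then adds the residual back in. It is a minimal model under stated assumptions, not PHAT's synthesizer. The function name resynthesize and the data layout are hypothetical.

```python
import math

def resynthesize(partials, residual, sample_rate):
    """Sum sinusoidal partials and add the residual.

    partials: list of (amplitudes, frequencies_hz, start_phase), where the
    first two are per-sample lists as long as `residual`.
    """
    out = list(residual)
    for amplitudes, frequencies, phase in partials:
        for i in range(len(out)):
            out[i] += amplitudes[i] * math.sin(phase)
            # Accumulate phase so a changing frequency stays continuous.
            phase += 2.0 * math.pi * frequencies[i] / sample_rate
    return out

n = 8
sample_rate = 8000.0
partial = ([1.0] * n, [2000.0] * n, 0.0)   # one steady partial at fs/4
silence = [0.0] * n                        # empty residual
signal = resynthesize([partial], silence, sample_rate)
```

Because each partial and the residual are separate inputs, any of them can be scaled, retuned, or replaced before the sum is formed, which is the property the processes described here rely on.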
The Perpetual Looper: A PHAT-based Tool
The Perpetual Looper in Peak Pro provides an excellent example of the capabilities of PHAT. We will start with a look at the task of looping and the most common loop authoring method. Then we will see how PHAT takes an alternative approach that solves problems which were insurmountable using previous techniques.
One of the most common sound design jobs is creating loops in instrumental sounds for use in samplers. Typically, a sampler plays the attack portion of a sampled note, then loops the middle portion of the sound as long as the user desires, after which it plays the rest of the file, called the "release" portion.
Traditionally, looping of the middle section has been accomplished by the user placing loop start and end markers in a standard time-domain waveform view of the note, then moving these points through trial and error until a smooth loop is obtained. This process is difficult and time-consuming at best, and, at worst, entirely unsuccessful. The most common problem is a click at the loop point, usually due to a waveform discontinuity. However, clicks can still occur even when no discontinuity is visible in a time-domain waveform display. The waveform as a whole may appear continuous across the loop point while one or more of its harmonic partials is out of phase there. The amplitude of such a partial may be small enough that its discontinuity is invisible in the time-domain display, but the ear, which analyzes sound largely by frequency, is able to distinguish it.
There are a variety of tricks and techniques employed by sound designers, such as crossfade looping, which are used to enhance the looping process. Though these are often helpful, loop authoring remains a hit-or-miss process, filled with operational annoyances like having to zoom in on the waveform (to see it in sufficient detail to find loop points) so far that the entire loop cannot be onscreen at one time.
Peak Pro's Perpetual Looper uses PHAT to take the looping process into the frequency domain, where the characteristics of each partial can be treated separately.
With the harmonic trajectories being provided by PHAT, it is possible to essentially loop each partial independently, assuring continuity at the loop points of all partials. Nonetheless, there still exists the possibility of timbral artifacts from discontinuities in vibrato (periodic, low-frequency modulation of frequency and amplitude) or tremolo (periodic modulation of amplitude only). The Perpetual Looper is capable of smoothing these discontinuities for each harmonic, after which the amplitude, frequency, and phase of each harmonic all have been properly handled. Further, the user is provided with controls for varying amplitude and frequency modulation independently, enabling effects that don't occur in nature. For example, the Perpetual Looper will allow the transformation of an instrument sound with vibrato into one that has only frequency modulation and no amplitude modulation, or one that has tremolo instead of vibrato.
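One simple way to picture per-partial looping is sketched below. It is an illustration only, not the Perpetual Looper's algorithm: each steady partial is nudged to the nearest frequency that completes a whole number of cycles within the loop, so its phase at the loop end matches its phase at the loop start. The names loop_safe_frequency and render are hypothetical, and vibrato and tremolo smoothing are not modeled.

```python
import math

def loop_safe_frequency(freq_hz, loop_len, sample_rate):
    """Nearest frequency that completes a whole number of cycles per loop."""
    cycles = max(1, round(freq_hz * loop_len / sample_rate))
    return cycles * sample_rate / loop_len

def render(partials, num_samples, loop_len, sample_rate):
    """partials: list of (amplitude, freq_hz, phase). Steady-state only."""
    out = [0.0] * num_samples
    for amplitude, freq_hz, phase in partials:
        f = loop_safe_frequency(freq_hz, loop_len, sample_rate)
        for i in range(num_samples):
            out[i] += amplitude * math.sin(phase + 2.0 * math.pi * f * i / sample_rate)
    return out

sample_rate, loop_len = 8000.0, 1000
partials = [(1.0, 220.3, 0.0), (0.5, 441.1, 1.0), (0.05, 662.7, 2.0)]
# Render two passes of the loop and compare them sample by sample.
two_passes = render(partials, 2 * loop_len, loop_len, sample_rate)
worst_seam_error = max(abs(two_passes[i] - two_passes[i + loop_len])
                       for i in range(loop_len))
```

The second pass repeats the first to within rounding error, so every partial, including the quiet third one, is continuous at the seam. The frequency nudge shrinks as the loop gets longer.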
Additionally, the Perpetual Looper can smooth timbral discontinuities (pulsation artifacts) caused by the natural decay of high frequencies over the duration of the looped section.
But that is only for a single channel; what if the source is a stereo file? In traditional time-domain looping, stereo files just double the difficulty, as it is practically impossible to find a set of loop points that works for both channels. The Perpetual Looper is capable of performing all of its processes on each channel independently.
Note that all of this processing and loop calculation has been accomplished with no actions by the user beyond specifying desired loop start and end points.
Once all of these parameters have been calculated, PHAT resynthesizes the sound, giving the user control of the extracted residual signal and its gain in the loop. PHAT also automatically smooths the transitions from the noise level in the attack portion of the note into the noise level at the loop start, and from the noise level at the loop end into that of the release portion of the sound.
PHAT's capabilities offer the possibility of users dropping loop markers in a collection of files with no attempt to tune the loops, and then using the Perpetual Looper in a batch process to obtain smooth loops in every file.
Much More to Come
It is clear that PHAT, by working with an additive model in the frequency domain and applying highly-crafted, intelligent algorithms, enables the Perpetual Looper to radically increase the efficiency of the looping process, while producing a result superior to traditional methods, and, at the same time, offering new sound contouring options.
The Perpetual Looper in Peak Pro is only one of PHAT's potential applications. PHAT's highly developed analysis algorithms and refined additive synthesis make possible a multitude of processes, including timbral modification, alterations to harmonic structure, time scaling, pitch shifting, sound morphing, and so forth. PHAT is a flexible and potent engine that can be used for applications that were previously impossible and/or unimaginable.
Dither Cloning Audio Technology (DCAT): Color Picker for Dithering
DCAT is a flexible dither modeling system that lets users choose the most appropriate type of dither for their application. DCAT implements the most common form of dither, which has a triangular probability density function (TPDF), but its flexibility derives from its ability to produce dither having any noise-shaping curve.
Dithering Background
The use of dithering in digital audio is well established. Dithering deliberately adds a small amount of noise to a signal in order to reduce the distortion caused by quantization error, thus improving the linearity of low-level signal reproduction when a digital audio signal is converted to the analog domain. While this means that a little more hiss can be heard in the signal, hiss is less objectionable to the ear than the sound of quantization error.
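The effect of dither on low-level signals can be shown with a short sketch. It assumes plain TPDF dither of two LSBs peak to peak and a 16-bit step size. It is a generic textbook construction rather than DCAT's implementation, and the function names are hypothetical.

```python
import random

def quantize(x, step):
    """Round to the nearest multiple of `step` (one LSB)."""
    return step * round(x / step)

def tpdf_dither(step, rng):
    """Sum of two uniform values: triangular PDF spanning +/- one LSB."""
    return (rng.random() - 0.5) * step + (rng.random() - 0.5) * step

rng = random.Random(1234)
step = 1.0 / 32768.0          # 16-bit LSB for a signal in [-1, 1)
level = 0.3 * step            # a constant level well below one LSB
count = 20000

plain = [quantize(level, step) for _ in range(count)]
dithered = [quantize(level + tpdf_dither(step, rng), step) for _ in range(count)]

plain_mean = sum(plain) / count        # 0.0: the level is rounded away
dithered_mean = sum(dithered) / count  # close to `level`: preserved on average
```

Without dither the sub-LSB level disappears entirely. With dither it survives as the average of a noisy output, which is the trade of distortion for hiss described above.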
To minimize the intrusiveness of dither on a signal, noise shaping is used to move energy in the dithering signal to higher frequencies where the ear is less sensitive. The key question with noise shaping is how much energy to push higher in the spectrum. While it is possible to push a great deal of energy to higher frequencies and reduce the audibility of dither greatly, high-frequency headroom is correspondingly reduced, increasing the risk of time-domain amplitude overload (clipping) if further processing — even reverberation — is applied.
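Noise shaping is commonly implemented with error feedback. The sketch below is the simplest first-order case and is given only as an illustration: commercial curves, including those DCAT models, use higher-order filters whose coefficients are not given here. The function name shaped_requantize is hypothetical.

```python
import random

def shaped_requantize(signal, step, rng):
    """First-order error-feedback requantizer with TPDF dither."""
    out = []
    prev_error = 0.0
    for x in signal:
        target = x - prev_error      # feed back the previous total error
        dither = (rng.random() - 0.5) * step + (rng.random() - 0.5) * step
        y = step * round((target + dither) / step)
        prev_error = y - target      # error includes dither and rounding
        out.append(y)
    return out

rng = random.Random(7)
step = 1.0 / 128.0
signal = [0.25 + 0.001 * (i % 17) for i in range(5000)]
shaped = shaped_requantize(signal, step, rng)
# Output error is e[n] - e[n-1] (a first difference), so it sums to e[last].
accumulated_error = sum(y - x for x, y in zip(signal, shaped))
```

Because the output error is a first difference of the quantizer error, its low-frequency content is suppressed and its energy is pushed toward high frequencies. This is the mechanism behind both the reduced audibility and the reduced high-frequency headroom.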
This is why many products that add dither will offer two levels of dither, usually labeled something like "normal" and "ultra." "Ultra" usually means that it pushes much more energy to the high end of the spectrum, making it most suitable for final mastering, where no further processing is anticipated, while "Normal" is more conservative and leaves more headroom, thus allowing for additional production.
There are a variety of noise-shaping curves in use, each with its own effect on timbre. Choosing a noise-shaping curve for dither is a subjective choice that will vary with the source material.
Dither Modeling with DCAT
While most programs offer only a few choices of noise-shaping curves, DCAT can import and even edit noise-shaping curves, thereby offering the user a broad selection of options from which to choose. DCAT's dither can be further adapted using its noise-shaping tone control, which introduces a dip in the frequency response at a frequency and depth of the user's choice.
In Peak Pro, DCAT is available from the "Save As" dialog, where users currently have a choice of 10 different types of dither (including an implementation of POW-r dither, which was already in the program and is not modeled). This selection includes models based on the dither types that fared the best in The Great Dither Shootout.
While users generally will not want to directly edit noise-shaping curves in DCAT, the ability to do so enables developers to offer users curves designed to flatter specific types of audio sources, add new dither types that may be introduced into the market, or simply present users with as large a selection of dither types as seems desirable.
Ultimate Sample Rate Conversion (USRC) Brings High Accuracy to Sample Rate Conversion
USRC is a sample rate conversion algorithm with high efficiency and extreme accuracy. Sample rate conversion is a critically important process, especially since it is often the final step in production and delivery. While the methods for sample rate conversion are well documented, not all implementations produce the same level of audio quality. Sample rate conversion involves both upsampling and downsampling of the signal. Poor quality filtering in these processes can result in imaging or aliasing. Rounding errors and calculation jitter can also degrade audio quality. Clearly, great care must be taken with the algorithms used in the conversion process.
USRC utilizes supercomputer-level computation, and was engineered with meticulous attention to detail in order to minimize numerical error, and rigorous optimization to maintain efficient execution. In addition to maintaining high audio quality, USRC opens up new avenues of flexibility. For example, sample rates do not need to be integer or even fractional values; sample rates at any real number value are possible.
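Conversion by an arbitrary real-valued ratio can be sketched with windowed-sinc interpolation, a well-documented general method. The code below is a small illustration under that assumption and says nothing about how USRC itself is built. The kernel length, the Hann window, and the function names are choices made for this example.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def resample(x, ratio, half_width=16):
    """Resample by ratio = output_rate / input_rate (any positive real)."""
    cutoff = min(1.0, ratio)          # narrow the filter when downsampling
    reach = half_width / cutoff       # kernel half-length in input samples
    out = []
    for m in range(int(len(x) * ratio)):
        t = m / ratio                 # output instant, in input-sample units
        first = int(math.floor(t - reach)) + 1
        last = int(math.floor(t + reach))
        acc = 0.0
        for k in range(max(first, 0), min(last, len(x) - 1) + 1):
            d = t - k
            window = 0.5 * (1.0 + math.cos(math.pi * d / reach))  # Hann
            acc += x[k] * cutoff * sinc(cutoff * d) * window
        out.append(acc)
    return out

ratio = math.pi / 3.0                 # deliberately irrational
source = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(1000)]
converted = resample(source, ratio)
```

The kernel is evaluated directly at each output instant rather than read from a table of fixed phases, which is why the ratio does not have to be a ratio of integers. The cost is more arithmetic per output sample.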
USRC also has the ability to vary the sample rate continuously. In Peak Pro, this is currently used for scrubbing, but there are many other potential applications, one example being varispeed recording, which has been a standard production technique with analog tape since the Beatles popularized it in the '60s. Unlike most audio production tasks, varispeed became more difficult to achieve with digital audio technology. Using USRC, varispeed recording could resume its rightful place in the production toolbox.
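A continuously varying rate amounts to a read position that advances by a different amount at each output sample. The sketch below shows only that bookkeeping. It assumes non-negative speeds and uses linear interpolation as a stand-in for a proper band-limited kernel. The name varispeed_read is hypothetical.

```python
def varispeed_read(x, speed_at, num_out):
    """Read `x` with a playback speed that may change at every output sample."""
    out = []
    position = 0.0
    for m in range(num_out):
        index = int(position)
        if index + 1 >= len(x):
            break                              # ran off the end of the source
        frac = position - index
        # Linear interpolation keeps the sketch short; a production converter
        # would use a much better band-limited kernel here.
        out.append((1.0 - frac) * x[index] + frac * x[index + 1])
        position += speed_at(m)                # 1.0 = original speed
    return out

ramp = [float(i) for i in range(10)]
half_speed = varispeed_read(ramp, lambda m: 0.5, 5)
gliding = varispeed_read(ramp, lambda m: 0.5 + 0.25 * m, 5)
```

Reading a ramp makes the behavior easy to inspect, because each output value equals the read position. A scrub control or a varispeed knob would simply supply the speed function.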
Corrective Linear-Phase Noise Reduction (CLNR): A Comprehensive Solution for Removing Noise and Artifacts
Denoising software performs the complex task of reducing or removing undesirable noise and artifacts from audio, while affecting the program material as little as possible. CLNR is a patent-pending set of algorithms that provides a complete denoising toolkit that is highly effective, yet has a lower CPU cost than other solutions. CLNR is the basis for BIAS's SoundSoap 2 and SoundSoap Pro products, which provide consumers and professionals with high-performance, cost-effective denoising solutions.
Denoising is usually broken down into three tasks: click and crackle removal, hum and rumble removal, and broadband noise reduction. Each of these tasks is achieved through entirely different processes. The methods used and the order in which processes are executed are both determinants of the level of success a denoising tool can achieve.
Typically, removal of clicks and crackle is accomplished using a model that splits a signal into two parts: the tonal component and everything else that is not tonal. The non-tonal component of the signal is then examined, and crackle identified in it and removed. Many denoising systems use a process that fails to eliminate all of the crackle.
CLNR uses a more rigorous method of expunging crackle, ensuring maximum success in its removal. This type of processing can be very compute-intensive, but BIAS has developed a model that is particularly efficient given the high degree of effectiveness it exhibits.
Click removal looks for signals whose amplitude is out of proportion to the rest of the signal. CLNR has an intelligent envelope detector that yields excellent accuracy in click detection.
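The idea of flagging samples that are out of proportion to their surroundings can be sketched as follows. This is not CLNR's detector: a running median of the absolute value stands in for the envelope, and the ratio and window length are arbitrary assumptions chosen for the example.

```python
import math
import statistics

def detect_clicks(x, ratio=4.0, half_window=50, floor=1e-6):
    """Indices whose magnitude exceeds `ratio` times the local median level."""
    magnitudes = [abs(s) for s in x]
    clicks = []
    for i, mag in enumerate(magnitudes):
        lo = max(0, i - half_window)
        hi = min(len(x), i + half_window + 1)
        # The median is barely moved by a short click inside the window.
        local_level = statistics.median(magnitudes[lo:hi])
        if mag > ratio * max(local_level, floor):
            clicks.append(i)
    return clicks

signal = [0.1 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(1000)]
signal[500] = 0.9          # a single-sample click
found = detect_clicks(signal)
```

A median is used because a short click does not inflate it, so the click cannot hide itself by raising its own threshold. Once located, the damaged samples would be replaced by interpolation from their neighbors.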
Rumble removal, the simplest processing in the package, consists simply of a high-pass filter, but the filter applied by CLNR is of a careful, linear-phase design. The hum reduction is also accomplished with linear-phase filtering, but it attenuates at the frequency set by the user using a slider, and then also attenuates at integer multiples of that frequency. A bandwidth parameter allows the user to adjust for the level of "dirtiness" in the hum, and an attenuation parameter lets the user determine how heavily the process is applied.
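Two pieces of this description lend themselves to a short sketch: a linear-phase high-pass filter, and the list of frequencies a hum filter would target. The code assumes a Hamming-windowed-sinc design with spectral inversion, which is a standard textbook method and not necessarily the design CLNR uses. Function names and parameter values are hypothetical.

```python
import math

def linear_phase_highpass(cutoff_hz, sample_rate, num_taps):
    """Odd-length, symmetric FIR high-pass via spectral inversion."""
    assert num_taps % 2 == 1, "odd length keeps a single center tap"
    fc = cutoff_hz / sample_rate
    mid = num_taps // 2
    lowpass = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        hamming = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        lowpass.append(ideal * hamming)
    gain = sum(lowpass)
    taps = [-t / gain for t in lowpass]   # negate the unity-gain low-pass
    taps[mid] += 1.0                      # add a delayed impulse: delta - lowpass
    return taps

def hum_frequencies(base_hz, sample_rate):
    """Base hum frequency and its integer multiples below Nyquist."""
    count = int((sample_rate / 2.0 - 1e-9) // base_hz)
    return [base_hz * h for h in range(1, count + 1)]

taps = linear_phase_highpass(40.0, 8000.0, 801)
notch_centers = hum_frequencies(60.0, 8000.0)
```

The symmetry of the taps is what makes the filter linear-phase: every frequency is delayed by the same number of samples, so the waveform shape of the retained material is preserved.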
Broadband noise reduction is achieved through a well-known spectral subtraction technique: first, the profile of the noise floor in the signal is learned, then an expander applied to each FFT bin attenuates portions of the spectrum that are below the noise floor, thus pushing down the noise level. The thresholds are set initially as a result of the analysis, but CLNR provides 12 sliders that allow the thresholds to be modified by hand. The expansion ratios are also available for modification, in order to allow the processing to be adjusted for each application.
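The learn-then-expand process can be sketched on magnitude spectra alone. The code below assumes the FFT analysis has already been done and works on lists of bin magnitudes. The averaging used to learn the profile, the margin above it, and the expansion law are illustrative assumptions, not CLNR's actual parameters.

```python
def learn_noise_profile(noise_frames):
    """Mean magnitude per bin across frames known to contain only noise."""
    num_bins = len(noise_frames[0])
    return [sum(frame[b] for frame in noise_frames) / len(noise_frames)
            for b in range(num_bins)]

def expander_gain(magnitude, threshold, ratio):
    """Downward expander: unity above threshold, steeper attenuation below."""
    if threshold <= 0.0 or magnitude >= threshold:
        return 1.0
    if magnitude <= 0.0:
        return 0.0
    return (magnitude / threshold) ** (ratio - 1.0)

def denoise_frame(magnitudes, profile, ratio=2.0, margin=2.0):
    """Apply a per-bin expander with thresholds set from the noise profile."""
    return [m * expander_gain(m, margin * p, ratio)
            for m, p in zip(magnitudes, profile)]

profile = learn_noise_profile([[0.1, 0.2], [0.3, 0.2]])
cleaned = denoise_frame([1.0, 0.2], profile)
```

In this example the first bin is well above its threshold and passes unchanged, while the second sits at the noise level and is halved. Exposing the threshold and ratio per band corresponds to the kind of manual adjustment described above.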
The level of complexity in the user interface is scalable to a level appropriate to the user and application. SoundSoap Pro is aimed primarily at audio professionals, so it offers users access to a considerable number of processing parameters, including the gate thresholds in the broadband noise reduction section. SoundSoap 2 is targeted at a broader audience, so it presents a very stripped-down interface, with processing parameters like the gate thresholds being handled intelligently by the program "under the hood." The underlying technology is the same in both products.
Conclusion
BIAS has developed a body of technology underlying the company's digital audio products that offers broad possibilities for future applications of all types. Care has been taken in creating these technologies to ensure that they are of the highest audio quality and greatest flexibility, so that they can be leveraged across many applications.