from Michael Klingbeil:
I don't know how useful sinusoidal analysis/resynthesis would be for a kiosk-type project, but I thought I would say a few words about it since I've been messing around with it for some time.
Also, I would recommend checking out the following two projects (both of which I mention below):
Loris http://www.cerlsoundgroup.org/Loris/
ATS http://www.dxarts.washington.edu/ats/
Both of these are nicely packaged ways to do analysis/resynthesis, and might actually be better to use than my own program (which at the moment is pretty heavily GUI-oriented and will take work to decouple from the interface). ATS has a C library, a command-line program, and a GTK GUI. Loris has a C++ backend with C and Python APIs.
Existing work:
Robert McAulay and Thomas Quatieri
- 1986 - "MQ Technique"
Julius Smith and Xavier Serra
- 1987 - PARSHL
Robert Maher and James Beauchamp
- 1988 - MQAN
These approaches are essentially the same. McAulay and Quatieri figured out the cubic phase interpolation method, which maintains accurate phase under unmodified resynthesis. Maher and Beauchamp added the idea of a high-frequency emphasis curve, which helps in tracking soft high-frequency partials. Something along these lines was actually suggested in the PARSHL paper, where Smith and Serra propose some kind of adaptive equalization to compensate for "the asymptotic roll-off of all natural spectra." The emphasis really does seem to improve the results of the resynthesis and is an area I am continuing to tweak.
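To make the cubic phase interpolation idea a little more concrete, here is a minimal Python sketch of the McAulay-Quatieri phase polynomial between two analysis frames. The function names and the linear amplitude ramp are my own choices for illustration and are not taken from any of the programs mentioned here. Frequencies are assumed to be in radians per sample and T is the frame length in samples.

```python
import math

def cubic_phase_coeffs(theta0, omega0, theta1, omega1, T):
    """Coefficients for theta(t) = theta0 + omega0*t + alpha*t^2 + beta*t^3.

    The polynomial matches phase and frequency at t = 0 and t = T
    (the end phase is matched modulo 2*pi).
    """
    # Pick the unwrapping integer M that gives the smoothest phase track.
    M = round(((theta0 + omega0 * T - theta1)
               + (omega1 - omega0) * T / 2.0) / (2.0 * math.pi))
    d = theta1 - theta0 - omega0 * T + 2.0 * math.pi * M
    dw = omega1 - omega0
    alpha = 3.0 * d / T ** 2 - dw / T
    beta = -2.0 * d / T ** 3 + dw / T ** 2
    return alpha, beta, M

def phase_at(t, theta0, omega0, alpha, beta):
    """Evaluate the cubic phase polynomial at sample offset t."""
    return theta0 + omega0 * t + alpha * t * t + beta * t ** 3

def synth_frame(a0, a1, theta0, omega0, theta1, omega1, T):
    """Synthesize T samples of one partial (amplitude ramp is an assumption)."""
    alpha, beta, _ = cubic_phase_coeffs(theta0, omega0, theta1, omega1, T)
    return [(a0 + (a1 - a0) * t / T)
            * math.cos(phase_at(t, theta0, omega0, alpha, beta))
            for t in range(T)]
```

Because the polynomial hits the measured phase at both frame boundaries, an unmodified resynthesis stays aligned with the original waveform, which is what makes time-domain subtraction of the sinusoids possible later on.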
Xavier Serra
- 1989 - Spectral Modeling Synthesis (SMS)
The big breakthrough here is an attempt to model non-sinusoidal aspects of the signal (i.e. noise). After sinusoidal modeling is performed, the sinusoidal peaks are subtracted out of the spectrum (this can also be done in the time domain if phase-accurate synthesis is performed). The spectral envelope of the resulting residual can be used to filter broadband noise. Another approach is to store the exact residual signal and add it back in, but this isn't so useful if the model is to undergo extensive modifications.
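The time-domain version of the subtraction can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: the partials are stationary, their amplitude, frequency and phase are known exactly, and the "noise" is synthetic Gaussian noise. A real analysis would resynthesize time-varying partials from the tracked data. All names here are hypothetical.

```python
import math
import random

def synth_sinusoid(amp, omega, phase, n):
    """n samples of a stationary sinusoid (omega in radians per sample)."""
    return [amp * math.cos(omega * t + phase) for t in range(n)]

def residual(signal, partials):
    """Subtract each (amp, omega, phase) sinusoid from the signal."""
    out = list(signal)
    for amp, omega, phase in partials:
        for t, s in enumerate(synth_sinusoid(amp, omega, phase, len(out))):
            out[t] -= s
    return out

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

# Toy input: two sinusoids plus low-level noise.
rng = random.Random(0)
n = 2048
partials = [(0.8, 0.10, 0.0), (0.3, 0.31, 1.2)]
noise = [0.01 * rng.gauss(0.0, 1.0) for _ in range(n)]
signal = list(noise)
for amp, omega, phase in partials:
    for t, s in enumerate(synth_sinusoid(amp, omega, phase, n)):
        signal[t] += s

# With phase-accurate resynthesis, only the noise is left over.
res = residual(signal, partials)
```

If the resynthesis phase drifts even slightly, the subtraction leaves sinusoidal energy in the residual, which is why the phase-accurate synthesis matters for this step.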
Kelly Fitz and Lippold Haken
- 1994 - Lemur
- 2000 - Loris
Lemur was a really cool program that ran on the Mac. It is no longer maintained or supported. It performed basic sinusoidal modeling (MQAN style) and offered a graphical display of the analysis. It was possible to perform some modifications graphically -- you could select regions and you could label partials. The graphics display was pretty slow and you couldn't analyze very long sounds.
Ten years later, my program SPEAR attempts to pick up where Lemur left off. Computers are a lot faster, so you can analyze longer sounds, perform complex graphical selections, and resynthesize in real time (even while performing modifications). The other nice thing about SPEAR is that the data structures are not tied to the notion of distinct analysis "frames" (i.e. a quantized time grid), so you can really stretch, slide, cut, and paste partials in an arbitrary fashion.
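To show what "not tied to frames" can mean in practice, here is a small hypothetical Python sketch in which each partial carries its own list of time-stamped breakpoints. This is not SPEAR's actual data structure; the class and method names are invented for illustration, and breakpoints are assumed to be kept sorted by time with distinct time values.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Breakpoint:
    time: float  # seconds, any value (no shared frame grid)
    freq: float  # Hz
    amp: float   # linear amplitude

@dataclass
class Partial:
    points: list = field(default_factory=list)  # Breakpoints sorted by time

    def shift(self, dt):
        """Slide the whole partial in time by dt seconds."""
        for p in self.points:
            p.time += dt

    def stretch(self, factor):
        """Stretch the partial about its start time (factor > 0 assumed)."""
        t0 = self.points[0].time
        for p in self.points:
            p.time = t0 + (p.time - t0) * factor

    def freq_at(self, t):
        """Linearly interpolated frequency, or None outside the partial."""
        pts = self.points
        if not pts or t < pts[0].time or t > pts[-1].time:
            return None
        i = bisect_right([p.time for p in pts], t) - 1
        if i >= len(pts) - 1:
            return pts[-1].freq
        a, b = pts[i], pts[i + 1]
        frac = (t - a.time) / (b.time - a.time)
        return a.freq + frac * (b.freq - a.freq)
```

Since every breakpoint stores its own time, shifting or stretching one partial never forces the others onto a common grid; a synthesizer simply asks each partial for its interpolated values at the current output time.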
Loris is a pretty cool project which has a new way to deal with the problem of how to model the noisy part of a signal (noise-modulated partials). It also uses the method of time and frequency reassignment to deal better with transients, and its authors describe how this model is good for audio morphs.
Juan Pampin
- 2000 - ATS
Scott Levine
- 1998 - Multiresolution Sinusoidal Modeling
Sinusoidal modeling doesn't necessarily work very well with heavily polyphonic (wideband) inputs. The multiresolution approach is an attempt to deal with some of these problems.
If we are feeding an entire song or some kind of iPod input into a kiosk and we do sinusoidal analysis/resynthesis, the result isn't going to sound that much like the original -- the attack transients will be really smeared out, which makes any kind of drums sound really bad. However, sinusoidal modeling could be useful for purely analytical purposes. The results could feed some other kind of mashup algorithm.
Transient detection and preservation are possible. Having a robust transient detector would be pretty useful for almost any kind of mashup project.
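As a starting point, a very simple transient detector can just look for sudden jumps in short-time energy. The Python sketch below is one such baseline, not the method used by any of the programs above, and it is far from robust; the function names, frame sizes, and threshold are all assumptions chosen for illustration.

```python
import math

def frame_energies(x, frame, hop):
    """Sum of squared samples for each overlapping frame of the signal."""
    return [sum(v * v for v in x[i:i + frame])
            for i in range(0, len(x) - frame + 1, hop)]

def detect_onsets(x, frame=256, hop=128, ratio=4.0, floor=1e-6):
    """Return start indices of frames whose energy jumps by more than `ratio`.

    `floor` keeps near-silent frames from triggering on tiny fluctuations.
    """
    e = frame_energies(x, frame, hop)
    onsets = []
    for k in range(1, len(e)):
        if e[k] > floor and e[k] > ratio * max(e[k - 1], floor):
            onsets.append(k * hop)
    return onsets

def make_test_signal(n=4096, attack=2048):
    """Quiet steady tone with a loud decaying burst starting at `attack`."""
    x = [0.01 * math.sin(0.3 * t) for t in range(n)]
    for t in range(attack, n):
        x[t] += math.exp(-(t - attack) / 500.0) * math.sin(0.3 * t)
    return x

onsets = detect_onsets(make_test_signal())
```

The reported index is the start of the frame in which the jump was seen, so it can precede the true attack by up to one frame length. More robust detectors typically compare energy per frequency band or use spectral flux, but the overall structure stays the same.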
