Tutorial - Understanding spectrograms

Under construction

Example sounds
In this tutorial, several example sounds are shown repeatedly with variations of display settings. Follow these instructions to recreate those sounds.


 * Silence: Use the Silence generator to create silence of any given duration.
 * Impulses: To create the briefest possible click sound or "impulse," enable the Draw Tool, then zoom the view of a track until individual samples are visible. Hold  and drag one sample upward or downward as far as possible.
 * Chromatic Scale: Generate one second of silence and select it. Open the Nyquist Prompt effect, paste the following code into the text entry, and apply the effect. The result is an ascending chromatic scale of pure tones separated by rests. The notes begin at middle C and span four octaves. Each note is one-half second long with a tapering envelope.

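;; env: one sine cycle across the selection; squared below to taper each note
(let ((scale (s-rest 0))
      (env (hzosc (/ (get-duration 1)))))
  ;; 49 semitones, starting at step 60 (middle C)
  (dotimes (ii 49)
    (setq scale
          (sum scale
               (at ii (seq (prod (cue env)
                                 (cue env)
                                 (cue (osc (+ 60 ii) 0.5)))
                           (s-rest 0.5))))))
  (prod 0.99 scale))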


 * Overtone scale: The following Nyquist code generates the note two octaves below middle C, followed by successive whole-number multiples of that frequency. As with the chromatic scale, apply it to one second of selected silence.

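;; env: one sine cycle across the selection; squared below to taper each note
(let ((scale (s-rest 0))
      (env (hzosc (/ (get-duration 1)))))
  ;; 32 harmonics of 65.406 Hz, the C two octaves below middle C
  (dotimes (ii 32)
    (setq scale
          (sum scale
               (at ii (seq (prod (cue env)
                                 (cue env)
                                 (cue (osc (hz-to-step (* 65.406 (1+ ii))) 0.5)))
                           (s-rest 0.5))))))
  (prod 0.99 scale))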


 * Decaying tone: First generate exactly two minutes of silence, then select it and use the following Nyquist code to make a tone that decays at precisely 1 dB per second.

(let ((env (pwev 1.0 1.0 (db-to-linear -120.0))))
  (prod env (hzosc (/ *sound-srate* 256.0))))


 * Pluck: The Pluck generator simulates a plucked string.
 * Chirp: The Chirp generator creates a continuously varying tone, rising or falling linearly or exponentially in frequency.
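
The figures quoted for the decaying tone can be checked with a little arithmetic. The sketch below is Python rather than Nyquist, assumes a project rate of 44100 Hz (the tutorial does not fix one), and uses illustrative helper names that are not part of Audacity or Nyquist. It mirrors an exponential envelope running from 1.0 down to -120 dB across a 120 second selection, and a tone whose frequency is the sample rate divided by 256.

```python
import math


def envelope_db(t, duration=120.0, end_db=-120.0):
    """Level in dB at time t of an exponential envelope from 0 dB to end_db."""
    start = 1.0
    end = 10.0 ** (end_db / 20.0)
    # Exponential interpolation between the two breakpoints.
    value = start * (end / start) ** (t / duration)
    return 20.0 * math.log10(value)


def tone_frequency(sample_rate=44100.0, window_size=256):
    """Frequency whose period is exactly window_size samples."""
    return sample_rate / window_size


if __name__ == "__main__":
    for t in (0, 20, 60, 100, 120):
        print(f"{t:3d} s: {envelope_db(t):7.1f} dB")
    print(f"tone: {tone_frequency():.2f} Hz")
```

Because the tone completes exactly one cycle per 256-sample window, it sits on the centre of an analysis bin at the default window size.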

Default Preferences
Unless otherwise specified, all images use default values for Spectrogram Preferences. These are:
 * Window size: 256
 * Window type: Hanning
 * Zero Padding Factor: 1
 * Minimum Frequency: 0 Hz
 * Maximum Frequency: 8000 Hz
 * Gain: 20 dB
 * Range: 80 dB
 * Frequency gain: 0 dB/decade
 * Grayscale: off

Linear and logarithmic scales
See the linear view of a pluck, zoomed out to 22050 Hz (Shift + right-click in the vertical scale). The overtones appear evenly spaced; many naturally occurring sounds have a similar overtone series. In the logarithmic view of the same sound, the spacing between overtones grows ever smaller as frequency increases.

Compare linear and logarithmic views of a chromatic scale (from 200 to 4800 Hz, window size 4096). The logarithmic view spaces musical semitones evenly on the vertical scale, so that the sequence of notes falls on a straight line.
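
The two observations above can be verified numerically. This is a minimal Python sketch, assuming equal temperament with middle C at about 261.626 Hz and the 65.406 Hz fundamental used for the overtone scale; the function names are illustrative only. It compares the step between neighbouring notes in hertz (the linear scale) and in octaves (the logarithmic scale).

```python
import math

MIDDLE_C_HZ = 261.626  # equal-tempered middle C, assuming A4 = 440 Hz


def chromatic(n_notes=49, base=MIDDLE_C_HZ):
    """Frequencies of an ascending chromatic scale, one semitone apart."""
    return [base * 2.0 ** (i / 12.0) for i in range(n_notes)]


def harmonics(n=32, fundamental=65.406):
    """Whole-number multiples of a fundamental frequency."""
    return [fundamental * k for k in range(1, n + 1)]


def steps(values):
    """Differences between neighbouring values."""
    return [b - a for a, b in zip(values, values[1:])]


def log2_all(freqs):
    """Position of each frequency on a logarithmic (octave) axis."""
    return [math.log2(f) for f in freqs]


if __name__ == "__main__":
    scale, series = chromatic(), harmonics()
    print("semitone steps in Hz:     ", [round(s, 1) for s in steps(scale)[:4]])
    print("semitone steps in octaves:", [round(s, 4) for s in steps(log2_all(scale))[:4]])
    print("harmonic steps in Hz:     ", [round(s, 1) for s in steps(series)[:4]])
    print("harmonic steps in octaves:", [round(s, 4) for s in steps(log2_all(series))[:4]])
```

Semitone steps are constant only on the logarithmic axis, and harmonic steps are constant only on the linear axis, which is why each sound looks regular in one view and not the other.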

Maybe generate an overtone scale and make the same contrast.

Window size
Generate a chirp with these settings: Sine waveform, 20 to 2000 Hz, constant amplitude of 0.8, Linear, 1 s. (Just show a screenshot of those settings.)

Superpose an impulse on the chirp at approximately 0.5 seconds. All frequency components of an impulse have equal magnitude, so the impulse appears as a vertical band of color in the spectrogram.
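
The claim about equal magnitudes can be checked with a plain discrete Fourier transform. The following Python sketch uses only the standard library and hypothetical helper names; it transforms a single-sample impulse and, for contrast, a sine that fits a whole number of cycles into the frame.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each bin of a plain (unwindowed) discrete Fourier transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n)
    ]


def impulse(length, position):
    """A frame of zeros with a single sample set to 1.0."""
    frame = [0.0] * length
    frame[position] = 1.0
    return frame


def sine(length, cycles):
    """A sine completing the given number of cycles in the frame."""
    return [math.sin(2.0 * math.pi * cycles * i / length) for i in range(length)]


if __name__ == "__main__":
    print("impulse:", [round(m, 3) for m in dft_magnitudes(impulse(16, 5))])
    print("sine:   ", [round(m, 3) for m in dft_magnitudes(sine(16, 2))])
```

The impulse puts the same magnitude in every bin, whereas the sine concentrates its energy in one bin and its mirror image.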

View with linear spectrogram, scale from 0 to 3000 Hz, otherwise defaults.

View again with window size of 4096.

This demonstrates the tradeoff between time and frequency localization. With the shorter window of the first picture, the impulse looks narrower, but the white band of the chirp is wider. The longer window sharpens the chirp but smears the impulse over a longer time.
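
The tradeoff can be put into numbers. This small Python sketch assumes a sample rate of 44100 Hz, which the tutorial does not specify, and uses illustrative function names. It prints the time span of one analysis window and the spacing between adjacent frequency bins for the two window sizes used above.

```python
def window_duration_s(window_size, sample_rate=44100.0):
    """Time covered by one analysis window, in seconds."""
    return window_size / sample_rate


def bin_spacing_hz(window_size, sample_rate=44100.0):
    """Distance between adjacent frequency bins, in hertz."""
    return sample_rate / window_size


if __name__ == "__main__":
    for size in (256, 4096):
        print(f"window {size:5d}: {1000.0 * window_duration_s(size):6.2f} ms, "
              f"{bin_spacing_hz(size):7.2f} Hz per bin")
```

The product of the two quantities is always 1, so any gain in frequency detail is paid for with an equal loss of time detail.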

Perhaps it would also be useful to generate two impulses, say exactly 1024 samples apart, and show how a narrower window resolves them but a wide one does not. The impulse influences the colors over an interval of exactly the window size divided by the sample rate.
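
As a rough illustration of the two-impulse idea, the Python sketch below treats each analysis window as a run of consecutive samples and asks which windows contain an impulse. It assumes a window can start at any sample and a 44100 Hz sample rate; the sample positions and function names are hypothetical.

```python
def windows_containing(position, window_size):
    """Start indices of every window that includes the sample at position."""
    first = max(0, position - window_size + 1)
    return range(first, position + 1)


def resolved(pos_a, pos_b, window_size):
    """True if no single window can contain both impulses."""
    return abs(pos_a - pos_b) >= window_size


if __name__ == "__main__":
    sample_rate = 44100.0          # assumed project rate
    a, b = 22050, 22050 + 1024     # two impulses exactly 1024 samples apart
    for size in (256, 4096):
        span = len(windows_containing(a, size)) / sample_rate
        print(f"window {size:5d}: each impulse colors {1000.0 * span:6.2f} ms, "
              f"resolved = {resolved(a, b, size)}")
```

Under these assumptions the number of windows touched by one impulse equals the window size, which matches the interval of window size divided by sample rate.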

Perhaps a real-world example of voice or an instrument, showing that longer windows resolve the overtone series more distinctly, especially so when the fundamental is higher.

Zero padding factor
Increasing the zero padding factor produces a smoother image, especially for shorter window sizes, at the expense of increased computation time. The product of window size and zero padding factor is not allowed to exceed the maximum window size available without zero padding.

Reproduce the two examples of the previous section, but with the zero padding factor increased to 8. Note that there is no effect on the localization tradeoff. However, the images appear much smoother, more so for the shorter window size. The white band has straighter boundaries, and the "side lobes" of the window function's spectral leakage appear more distinctly as ripples.
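
Why zero padding smooths the picture without changing the localization tradeoff can be seen in a small experiment. This Python sketch is only an illustration with made-up names and a deliberately tiny 32-sample frame: it appends zeros to a windowed frame before transforming it, and compares the result with the unpadded spectrum.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def padded_spectrum(frame, pad_factor):
    """Magnitudes of the first half of the DFT of frame, zero-padded
    to pad_factor times its length."""
    n = len(frame) * pad_factor
    # The appended zeros add nothing to the sum, so only the frame is summed.
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def test_frame(size=32, cycles=5.3):
    """A Hann-windowed tone that falls between two bins."""
    w = hann(size)
    return [w[i] * math.sin(2.0 * math.pi * cycles * i / size) for i in range(size)]


if __name__ == "__main__":
    frame = test_frame()
    plain = padded_spectrum(frame, 1)
    fine = padded_spectrum(frame, 8)
    print("bins without padding:", len(plain))
    print("bins with factor 8:  ", len(fine))
    print("every 8th padded bin matches:",
          all(abs(fine[8 * k] - plain[k]) < 1e-9 for k in range(len(plain))))
```

The padded spectrum passes through every unpadded value and fills in the points between them, so the underlying curve, including the width of its main lobe, stays the same.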

Window type
I really don't know what is helpful to communicate to ordinary users, and I am not deeply versed in this stuff myself. Maybe one Hann vs. Rectangular contrast is more than enough.

Examples of pure tones and chirps may be useful.

The rectangular window can make the "main lobe," which appears in white, narrower than any other window type. Thus it can distinguish a mix of close frequencies better than other windows. However, its spectral leakage is also worse.
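
The contrast between the two windows can be measured directly. This is a sketch in standard-library Python with illustrative names, using a 64-sample window and a periodic Hann definition, both of which are assumptions rather than Audacity's exact settings. For each window it finds the first null of the spectrum, which gives half the main lobe width in bins, and the highest side lobe relative to the peak.

```python
import cmath
import math


def rectangular(n):
    """Rectangular window: every sample weighted equally."""
    return [1.0] * n


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def window_spectrum(window, pad_factor=16):
    """Magnitude spectrum of the window, finely sampled by zero padding."""
    n = len(window) * pad_factor
    return [
        abs(sum(w * cmath.exp(-2j * math.pi * k * i / n)
                for i, w in enumerate(window)))
        for k in range(n // 2)
    ]


def lobe_stats(window, pad_factor=16):
    """Return (half main lobe width in bins, peak side lobe in dB)."""
    mags = window_spectrum(window, pad_factor)
    # The first local minimum marks the edge of the main lobe.
    null = next(k for k in range(1, len(mags) - 1)
                if mags[k] < mags[k - 1] and mags[k] <= mags[k + 1])
    sidelobe_db = 20.0 * math.log10(max(mags[null:]) / mags[0])
    return null / pad_factor, sidelobe_db


if __name__ == "__main__":
    for name, make in (("rectangular", rectangular), ("hann", hann)):
        width, leak = lobe_stats(make(64))
        print(f"{name:12s} first null at {width:.2f} bins, "
              f"peak side lobe {leak:6.1f} dB")
```

The rectangular window reaches its first null in half the distance of the Hann window, but its strongest side lobe is far higher, which is the leakage described above.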

Maybe this is just too academic?

Minimum and Maximum Frequencies
These determine the lower and upper bounds of both linear and logarithmic frequency scales. However, you can use the vertical scale to override this preference during an editing session. If you click in Spectrogram Preferences, the scale is restored to the preference values.

Gain and Range
This example uses the decaying tone, with a precisely chosen frequency and window type that makes the spectrogram appear very sharp, and a precisely determined exponential decay that aligns the colors of the palette to certain times.

Generate the decaying tone. Use Rectangular window and maximum frequency of 400.

Observe that the colored band is indistinguishably white from 0 to 20 seconds, that is, from 0 to -20 dB. From -20 to -100 dB are gradations of color, with red at -40, magenta at -60, pale blue at -80, and gray at -100. From -100 to -120 dB is indistinguishably gray.

The Gain preference for spectrogram views defaults to 20 dB, and here determines the width of the white band. The Range preference defaults to 80 dB, and here determines the width of the varying colors. The sum of the two is 100 dB, and in this example, the colors have faded into the background gray at exactly 100 seconds.
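
The relationship between Gain, Range and the color bands can be sketched as a simple mapping. The Python below is an assumed linear-in-dB mapping that is consistent with the behavior described above; it is not taken from Audacity's source, and the function names are hypothetical. A result of 1.0 stands for white and 0.0 for the background gray.

```python
def level_at(seconds, decay_db_per_second=1.0):
    """Level in dB of the decaying tone at a given time."""
    return -decay_db_per_second * seconds


def palette_position(level_db, gain_db=20.0, range_db=80.0):
    """Position in the color palette: 1.0 is white, 0.0 is background gray.

    Assumed mapping: Gain shifts the level up, Range sets the span of
    levels spread across the palette, and the result is clamped.
    """
    value = (level_db + gain_db + range_db) / range_db
    return min(1.0, max(0.0, value))


if __name__ == "__main__":
    for t in (0, 20, 40, 60, 80, 100, 120):
        print(f"{t:3d} s ({level_at(t):6.1f} dB): "
              f"palette position {palette_position(level_at(t)):.2f}")
```

With the defaults, the position stays at 1.0 for the first 20 seconds and reaches 0.0 at 100 seconds. Calling `palette_position` with other `gain_db` and `range_db` values shows how the white band and the graded band would shift under this assumed mapping.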

Vary Gain and Range and observe how the display changes.