Spectrogram View

From Audacity Development Manual
The Spectrogram View of an audio track provides a visual indication of how the energy in different frequency bands changes over time. The Spectrogram can show the sudden onset of a sound, so it is often easier to see clicks and other glitches, or to line up beats, in this view than in one of the waveform views.
Tip Spectral selections, made in Spectrogram view, include a frequency range as well as a time range on tracks.
  • They can be used with special spectral editing effects to make changes to the frequency content of the selected audio.
  • Among other purposes, spectral selection and editing can be used for cleaning up unwanted sound, enhancing certain resonances, changing the quality of a voice or removing mouth sounds from voice work.
  • For full details, see Spectral Selection and Editing.
Note: The images on this page show spectrograms with a non-default maximum frequency of 8 kHz.

Contents

  1. Selecting Spectrogram View
  2. Comparing Waveform View to Spectrogram View
  3. Per Track Spectrogram Settings
  4. What the Colors Mean
  5. Time Smearing and Frequency Smearing
  6. Vertical Zooming
  7. Effect of Different Window Types
  8. Zero padding factor
  9. Different Spectrogram views
  10. Algorithm
  11. Example of choosing the right settings for the job
  12. Spectral selection
  13. Multi-view - Spectrogram and Waveform


Selecting Spectrogram View

To select Spectrogram view, click the track name (or the black triangle) in the Track Control Panel to open the Track Dropdown Menu, then select the required view.


Comparing Waveform View to Spectrogram View

Here is a mono music recording in waveform view, with the exact same audio switched to spectrogram view below:

SpectrogramView Intro 00.png

The Waveform view can be switched to a Spectrogram view (and vice versa) or you can have both simultaneously with Multi-view selected from the Track Control Panel dropdown menu.


Per Track Spectrogram Settings

You can temporarily change the Spectrogram settings for a particular Spectrogram track: open the Audio Track Dropdown Menu on that track, then choose Spectrogram Settings. This opens a dialog similar to Spectrograms Preferences, with the same settings available.

Spectrogram Track Settings.png

Changes you apply with the OK button persist for that track only while the project window is open, even if you save the project. To make permanent changes to the default Spectrogram settings with which new Spectrogram tracks open, use Spectrograms Preferences instead.

See Spectrogram Settings for more details.


What the Colors Mean

To demonstrate how the various settings affect the appearance of an audio track in spectrogram view, we will start with this artificially constructed test track. It consists of 10 segments of a sine wave tone at 2000 Hz, each 2 seconds long. The level of each segment in dB is indicated by the labels below the audio track.
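If you want to reproduce this test track for your own experiments, a Python sketch like the one below can synthesize it. It assumes the ten levels run from -10 dB down to -100 dB in 10 dB steps (the text names only some of them), that each level is a peak amplitude relative to full scale, and a 44100 Hz sample rate; `make_test_track` is a hypothetical helper, not part of Audacity.

```python
import math


def make_test_track(sample_rate=44100, freq_hz=2000.0, seg_seconds=2.0,
                    levels_db=tuple(range(-10, -110, -10))):
    """Return float samples: one sine segment per level in levels_db.

    Assumption: each level is a peak amplitude in dB relative to full scale,
    so -20 dB corresponds to a peak of 0.1.
    """
    samples = []
    seg_len = int(sample_rate * seg_seconds)
    for level_db in levels_db:
        amp = 10 ** (level_db / 20.0)  # convert dB to linear amplitude
        for n in range(seg_len):
            samples.append(amp * math.sin(2 * math.pi * freq_hz * n / sample_rate))
    return samples
```

If you save these samples to a file, use 32-bit float: in 16-bit PCM the -100 dB segment (peak amplitude 0.00001) is smaller than half a quantization step and would be rounded to silence.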

This is how the track appears in waveform dB view.

SpectrogramView 01.png

This is how the track appears in spectrogram view, using the default settings.

SpectrogramView 02.png

The default settings can be viewed in Spectrograms Preferences.

Frequency settings

The Minimum and Maximum Frequency settings determine the lowest and highest frequencies displayed, as shown on the track's vertical scale.

Gain

Gain can be thought of as increasing the "brightness" of the display: it amplifies the signal by the indicated amount before colors are assigned. With the default setting of 20 dB, any frequency band with an original level of -20 dB or greater (0 dB or greater after amplification) is displayed as white. Lower-level bands also "get brighter" by the same amount.

Color bands

There are five color bands in the default spectrogram color scheme: white, orange, magenta, dark blue, and black. The Range setting determines the spacing between colors.

Here is the previous image, zoomed in around the 2000 Hz mark to better show the spectrogram colors.

SpectrogramView 02a.png

With the default settings of Gain = 20 dB and Range = 80 dB, the colors correspond to the following levels:

  • anything above -20 dB is indistinguishably white (the tone at -10 dB in the image above is white)
  • levels from -20 dB to -40 dB transition from white to orange (the tone at -30 dB in the image above is light orange)
  • levels from -40 dB to -60 dB transition from orange to magenta (the tone at -60 dB in the image above is magenta)
  • levels from -60 dB to -80 dB transition from magenta to blue (the tone at -70 dB in the image above is purple)
  • levels from -80 dB to -100 dB transition from blue to black (the tone at -100 dB in the image above is black)
  • anything below -100 dB is black.
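Taken together, the Gain and Range rules above describe a linear mapping from band level to a position on the color scale. The sketch below assumes exactly that: the scale runs from (-Gain - Range) dB for black up to -Gain dB for white, with the five default colors evenly spaced in between. It reproduces the bullet points for Gain = 20 dB and Range = 80 dB, but the helper names are hypothetical and this is not Audacity's drawing code.

```python
# Default color scheme, from the lowest level (darkest) to the highest (brightest).
COLORS = ["black", "dark blue", "magenta", "orange", "white"]


def scale_position(level_db, gain_db=20.0, range_db=80.0):
    """Map a band level in dB to 0.0 (black) .. 1.0 (white).

    Gain shifts every level up; Range is the number of dB from black to white.
    """
    x = (level_db + gain_db + range_db) / range_db
    return min(1.0, max(0.0, x))  # clamp: -gain and above is white, -(gain + range) and below is black


def color_band(level_db, gain_db=20.0, range_db=80.0):
    """Return (lower color, upper color, fraction of the way from lower to upper)."""
    x = scale_position(level_db, gain_db, range_db) * (len(COLORS) - 1)
    i = min(int(x), len(COLORS) - 2)  # keep i + 1 a valid index when x is exactly 4
    return COLORS[i], COLORS[i + 1], x - i


for level in (-10, -30, -60, -70, -100):
    print(level, color_band(level))
```

For example, `color_band(-30)` returns a point halfway between orange and white (light orange) and `color_band(-70)` a point halfway between dark blue and magenta (purple), matching the image. Raising Gain shifts every level up by the same number of dB; widening Range spreads the colors further apart.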

Other color schemes

Here is the classic color scheme.

SpectrogramView 02b.png

Here is the inverse grayscale color scheme.

SpectrogramView 02c.png

Here is the grayscale color scheme.

SpectrogramView 02d.png

Time Smearing and Frequency Smearing

Spectrogram view uses the Fast Fourier Transform (FFT) to display the frequency information versus time. There is an inherent trade-off between frequency resolution and time resolution.
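In outline, the audio is cut into overlapping frames of Window Size samples, each frame is multiplied by a window function, and the magnitude of its Fourier transform gives one column of the spectrogram. The pure-Python sketch below assumes a Hann window, a hop of half a window, and a naive discrete Fourier transform in place of the FFT (same numbers, just slower); `spectrogram_db` is a hypothetical name, and Audacity's own implementation may differ in scaling and overlap.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window of length n_points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_points) for i in range(n_points)]


def spectrogram_db(samples, window_size=256, hop=None):
    """Return one column per frame; each column has window_size // 2 + 1 levels in dB."""
    hop = hop or window_size // 2          # assumed 50% overlap between frames
    win = hann(window_size)
    norm = sum(win) / 2                    # a unit sine centered on a bin reads 0 dB
    columns = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + window_size], win)]
        column = []
        for k in range(window_size // 2 + 1):   # bins from 0 Hz up to fs / 2
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n, x in enumerate(frame))
            column.append(20 * math.log10(max(abs(acc) / norm, 1e-12)))
        columns.append(column)
    return columns
```

Because the same `window_size` sets both how many samples each column spans and how many frequency bins it has, time and frequency resolution cannot be improved independently.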

The image below shows the spectrogram view of a pure 1000 Hz tone with two clicks very close together. With a window size of 256, we can see the two clicks.

SpectrogramView 03.png

Changing the Window Size to 2048 results in better frequency resolution (the white band is narrower). However, the time resolution is worse: the two clicks have been smeared together into one.

SpectrogramView 05.png

The image below shows the spectrogram view of a musical note with many overtones. With a window size of 256 the overtones are not clear.

SpectrogramView 03a.png

When we change the window size to 2048 we can see the overtones.

SpectrogramView 05a.png

When choosing which window size to use, the general rules are:

  • if you need good time resolution (for example to find clicks) use a smaller window size
  • if you need good frequency resolution (for example to find an annoying tone) use a larger window size.
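The rules follow from simple arithmetic: a window of N samples at sample rate fs spans N / fs seconds, and its FFT bins are fs / N Hz apart. A sketch, assuming a 44100 Hz sample rate (the sample rate of the example tracks is not stated):

```python
def resolution(window_size, sample_rate=44100):
    """Return (duration of one window in ms, spacing of FFT bins in Hz)."""
    duration_ms = 1000.0 * window_size / sample_rate
    bin_spacing_hz = sample_rate / window_size
    return duration_ms, bin_spacing_hz


for n in (256, 2048):
    ms, hz = resolution(n)
    print(f"window {n:4d}: {ms:5.1f} ms per column, {hz:6.1f} Hz per bin")
```

At 44100 Hz, a 256-sample window spans about 5.8 ms with bins about 172 Hz apart, while a 2048-sample window spans about 46 ms with bins about 21.5 Hz apart. Clicks that fall within one 46 ms window smear together, and partials closer than roughly 172 Hz can share a bin at the smaller size.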


Vertical Zooming

Magnifiers

You can zoom in on the vertical (frequency) axis by left-clicking in the Vertical Scale and using the magnifiers (when these are enabled in Tracks Behaviors Preferences).

In the image below we are about to zoom in on one overtone of the musical note.

SpectrogramView 06.png

After zooming in, the vertical ruler changes to allow greater precision of the scale.

SpectrogramView 07.png

Context menu

Alternatively you can right-click in the Vertical scale to bring up a dropdown context menu which has commands for vertical zooming:

VS spectrogram context menu - Simple mode.png


Effect of Different Window Types

The zoomed-in image of the musical note under Vertical Zooming uses the Hann window type.

Changing to the Blackman-Harris Window Type gets rid of much of the spectral leakage at the expense of lower frequency resolution (note that the red band near the 2.0k mark is wider).

SpectrogramView 08.png

Changing to a rectangular window causes the track to be redrawn a little faster at the expense of very bad spectral leakage. However, the frequency resolution is better (the red band near the 2.0k mark is narrower).

SpectrogramView 09.png

There is no "right" window type. When you are using spectrogram view to analyze audio, or to track down certain elements in a recording, use whichever window type best highlights the information you are trying to find.
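Leakage differences can be checked numerically. The sketch below places a tone halfway between two FFT bins, where a rectangular window leaks the most, and measures the level in a bin far from the tone relative to the peak, for a rectangular, a Hann and a Blackman-Harris window. It assumes a 64-sample frame and the common 4-term Blackman-Harris coefficients; Audacity's window definitions may differ in detail, and the helper names are hypothetical.

```python
import cmath
import math


def window(kind, n_points):
    """Periodic window: 'rectangular', 'hann' or 'blackman-harris' (4-term)."""
    w = []
    for n in range(n_points):
        t = 2 * math.pi * n / n_points
        if kind == "rectangular":
            w.append(1.0)
        elif kind == "hann":
            w.append(0.5 - 0.5 * math.cos(t))
        elif kind == "blackman-harris":
            w.append(0.35875 - 0.48829 * math.cos(t) + 0.14128 * math.cos(2 * t)
                     - 0.01168 * math.cos(3 * t))
        else:
            raise ValueError(f"unknown window: {kind}")
    return w


def leakage_db(kind, n_points=64, tone_bin=10.5, probe_bin=30):
    """Level at probe_bin relative to the spectrum peak, in dB."""
    w = window(kind, n_points)
    # A tone at a fractional bin position does not line up with any DFT bin.
    x = [math.sin(2 * math.pi * tone_bin * n / n_points) * w[n] for n in range(n_points)]
    mags = [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n, v in enumerate(x)))
            for k in range(n_points // 2 + 1)]
    return 20 * math.log10(mags[probe_bin] / max(mags))


for kind in ("rectangular", "hann", "blackman-harris"):
    print(kind, round(leakage_db(kind), 1))
```

With these parameters the rectangular window leaves roughly -30 dB of leakage 19.5 bins away from the tone, while both tapered windows push it more than 30 dB lower still.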


Zero padding factor

Larger values give finer interpolation of the colors along the vertical axis, at the expense of more computation time. This setting does not affect the time vs. frequency resolution tradeoff; in other words, it does not give better frequency resolution.
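Zero padding appends zeros to each windowed frame before the FFT, so the transform is evaluated at more, closer-spaced frequencies while the window, and therefore the width of each spectral peak, stays the same. The sketch below illustrates this with a naive DFT, assuming a 32-sample Hann-windowed frame of a 1000 Hz tone sampled at 8000 Hz (values chosen only for illustration):

```python
import cmath
import math


def padded_spectrum(frame, zero_padding_factor=1):
    """DFT magnitudes (0 Hz to Nyquist) of frame after appending zeros."""
    size = len(frame) * zero_padding_factor
    padded = list(frame) + [0.0] * (size - len(frame))
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                    for n, x in enumerate(padded)))
            for k in range(size // 2 + 1)]


fs, n_win = 8000, 32
frame = [math.sin(2 * math.pi * 1000 * n / fs)
         * (0.5 - 0.5 * math.cos(2 * math.pi * n / n_win))   # Hann window
         for n in range(n_win)]
plain = padded_spectrum(frame, 1)   # 17 bins, 250 Hz apart
fine = padded_spectrum(frame, 8)    # 129 bins, 31.25 Hz apart
# Every 8th padded bin lands exactly on an unpadded bin.
print(all(abs(fine[8 * k] - plain[k]) < 1e-9 for k in range(len(plain))))
```

The padded spectrum has eight times as many points, but every eighth one coincides with an unpadded bin: the extra points only interpolate the same curve, which is why zero padding smooths the display without separating frequencies that the window itself cannot resolve.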

Here is the musical note again, with a zero padding factor of 1:

SpectrogramView 07.png

Here is the same note, with a zero padding factor of 8:

SpectrogramView 15.png


Different Spectrogram views

Logarithmic Spectrogram View

Choosing Logarithmic from the Spectrogram Settings in the Track Control Panel dropdown menu will display a logarithmic vertical scale.

Here again is the musical note with overtones shown in Spectrogram view:

SpectrogramView 05a.png

Here is the same note, this time in Logarithmic Spectrogram view:

SpectrogramView 12.png

Musical overtones form a linear sequence and are generally best viewed in Linear Spectrogram view.

Here is a chromatic scale shown in Spectrogram view:

SpectrogramView 13.png

Here is the same scale, this time in Logarithmic Spectrogram view:

SpectrogramView 14.png

A musical scale is an exponential sequence, and is generally best viewed in Logarithmic Spectrogram view.
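Numerically: harmonics are integer multiples of the fundamental, so they are equally spaced in Hz, while each equal-tempered semitone multiplies the frequency by 2^(1/12), so semitones are equally spaced in log-frequency. A sketch, assuming A4 = 440 Hz as the example note:

```python
import math

f0 = 440.0                                             # example fundamental (A4)
harmonics = [f0 * n for n in range(1, 6)]              # 440, 880, 1320, 1760, 2200 Hz
semitones = [f0 * 2 ** (n / 12) for n in range(13)]    # A4 up to A5

# Harmonics: constant gap in Hz, i.e. even spacing on a linear scale.
print({round(b - a, 6) for a, b in zip(harmonics, harmonics[1:])})
# Semitones: constant gap in log2(Hz), i.e. even spacing on a logarithmic scale.
print({round(math.log2(b) - math.log2(a), 6) for a, b in zip(semitones, semitones[1:])})
```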

Mel, Bark and ERB Spectrogram views

There are three additional styles of Spectrogram view that can be selected from the Track Control Panel dropdown menu or from Preferences:

  • Mel: The name Mel comes from the word melody to indicate that the scale is based on pitch comparisons. See this Wikipedia page.
  • Bark: This is a psychoacoustical scale based on subjective measurements of loudness. It is related to, but somewhat less popular than, the Mel scale. See this Wikipedia page.
  • ERB: The Equivalent Rectangular Bandwidth scale or ERB is a measure used in psychoacoustics, which gives an approximation to the bandwidths of the filters in human hearing. It is implemented as a function ERBS(f) which returns the number of equivalent rectangular bandwidths below the given frequency f. See this Wikipedia page.
The above three scales are approximately linear at low frequencies and approximately logarithmic at high frequencies, thereby concentrating screen height in the middle to high frequencies.
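For reference, commonly published formulas for these scales are sketched below: the O'Shaughnessy Mel formula, Traunmüller's Bark approximation and the Glasberg and Moore ERB-number formula. These are assumptions for illustration; Audacity's exact constants may differ.

```python
import math


def hz_to_mel(f):
    """O'Shaughnessy Mel formula (one common variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def hz_to_bark(f):
    """Traunmüller (1990) approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53


def hz_to_erb_number(f):
    """Glasberg & Moore (1990) ERB-number: equivalent rectangular bandwidths below f."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)


for f in (100, 200, 1000, 4000, 8000, 16000):
    print(f"{f:6d} Hz  mel {hz_to_mel(f):7.1f}  bark {hz_to_bark(f):5.2f}  "
          f"erb {hz_to_erb_number(f):5.2f}")
```

Printing a few values shows the shape described above: the Mel value grows by a similar amount for each 100 Hz step at the low end and by a similar amount for each octave at the high end.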

These scales aid spectral editing because you can see down to 0 Hz without devoting too much screen height to the low frequencies. There, thumps might need treating with a highpass filter using the Spectral edit multi tool, and the position of the geometric mean frequency line is unimportant. At higher frequencies, by contrast, you often want to set a notch with the multi tool or use parametric equalization, drawing a spectral selection around an undesirable sound with the geometric mean line approximately centered in that selection.

Comparison of Mel, Logarithmic and Linear Spectrogram views:

The image below shows the scaling differences in different Spectrogram views of the same audio:

Mel-Log-Linear Spectrogram annotated.png

Period Spectrogram view

  • Period: This scale is the reciprocal of frequency (1/frequency) and attempts to visualize Enhanced Autocorrelation. It is therefore best used with the "Pitch (EAC)" algorithm, which is the same as the "Pitch (EAC)" View Mode choice in previous Audacity versions. To aid comparison with other scales, small period values (high frequencies) are plotted at the top. This scale tends to give the most screen real estate to plotted areas, but the Logarithmic scale gives a more accurate representation of pitch, because Equal Temperament divides the octave into 12 parts, all of which are equal on a logarithmic scale.
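The difference shows up in a short calculation: each octave up halves the period, so on the Period scale every octave gets half the height of the one below it, while on a logarithmic scale every octave (and so every equal-tempered semitone) gets the same height. A sketch with arbitrarily chosen octave boundaries; `octave_heights` is a hypothetical helper:

```python
import math


def octave_heights(octave_freqs_hz):
    """For consecutive octave boundaries, return (height on a period axis in ms,
    height on a log2 frequency axis)."""
    rows = []
    for lo, hi in zip(octave_freqs_hz, octave_freqs_hz[1:]):
        period_span_ms = 1000.0 / lo - 1000.0 / hi   # period = 1 / frequency
        log_span = math.log2(hi) - math.log2(lo)     # 1.0 for every octave
        rows.append((period_span_ms, log_span))
    return rows


for period_ms, log_span in octave_heights([110.0, 220.0, 440.0, 880.0, 1760.0]):
    print(f"period axis: {period_ms:6.3f} ms   log2 axis: {log_span:.1f}")
```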


Algorithm

  • Algorithm:
    • Frequencies (default): Audio frequency determines the pitch of a sound. Frequency is measured in Hz, and higher frequencies have higher pitch. See this Wikipedia article.
    • Reassignment: The method of reassignment sharpens blurry time-frequency data by relocating the data according to local estimates of instantaneous frequency and group delay. This mapping to reassigned time-frequency coordinates is very precise for signals that are separable in time and frequency with respect to the analysis window.
    • Pitch (EAC): Highlights the contour of the fundamental frequency (musical pitch) of the audio, using the Enhanced Autocorrelation (EAC) algorithm. The EAC Algorithm was developed to produce a mathematical representation of the changes of pitch in a piece of audio. The aim was to allow automated comparison of sound files so that two versions of the same tune could be recognized as being similar, even if played in different keys, or on different instruments.
  • Window Size: The dropdown menu lets you choose the size of the Fast Fourier Transform (FFT) window, which affects how much vertical (frequency) detail you see. Larger FFT window sizes give finer frequency resolution (most noticeable at low frequencies) but coarser time resolution, and are slower to compute.
  • Window type: Determines precisely how the spectrogram is computed. Hann is the default setting. 'Rectangular' is slightly faster than other methods, but introduces some artifacts. All methods give broadly similar results.
  • Zero padding factor: Larger values give finer interpolation of the colors along the vertical axis, at the expense of more computation time. Does not affect the time vs. frequency resolution tradeoff. This option has no effect and is grayed out when the Pitch (EAC) algorithm is selected.
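To illustrate the idea behind Pitch (EAC), the sketch below uses a simplified, time-domain form of enhanced autocorrelation: autocorrelate the signal, clip negative values to zero, subtract a copy stretched in time by a factor of two, and clip again. The subtraction removes the peak at twice the true period, so the strongest remaining peak gives the fundamental. It assumes a clean synthetic 200 Hz tone; the function names are hypothetical, and Audacity's implementation differs in detail.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum of x[n] * x[n + k], for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def enhanced_autocorrelation(x, max_lag):
    """Clip negatives, subtract a 2x time-stretched copy, then clip again."""
    r = [max(0.0, v) for v in autocorrelation(x, max_lag)]
    enhanced = []
    for k in range(max_lag + 1):
        # Stretched copy: value of r at lag k / 2, linearly interpolated for odd k.
        if k % 2 == 0:
            stretched = r[k // 2]
        else:
            stretched = 0.5 * (r[k // 2] + r[k // 2 + 1])
        enhanced.append(max(0.0, r[k] - stretched))
    return enhanced


def estimate_pitch_hz(x, sample_rate, max_lag):
    """Return sample_rate divided by the lag of the strongest enhanced peak."""
    eac = enhanced_autocorrelation(x, max_lag)
    best_lag = max(range(1, max_lag + 1), key=lambda k: eac[k])
    return sample_rate / best_lag


fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1000)]
print(estimate_pitch_hz(tone, fs, max_lag=500))
```

A 200 Hz tone sampled at 8000 Hz has a period of 40 samples, so the estimate is 8000 / 40 = 200 Hz.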


Example of choosing the right settings for the job

Default settings

Here is a music track displayed in Spectrogram view with the default settings of: Window size of 256, Window type of Hann, Minimum Frequency 0 and Maximum frequency 8000. This is not very useful for identifying the different musical elements:

SpectrogramView 10a.png

Logarithmic

Here is the same track displayed in Logarithmic Spectrogram view. This is still not very useful for identifying the different musical elements:

SpectrogramView 10.png

Custom settings

Different settings can improve the visibility of certain elements in the recording. In the image below the settings were:

  • Window size of 2048 (larger window size improves frequency resolution)
  • Window type of Hann (no change from previous)
  • Zero padding factor of 1 (no change from previous)
  • Minimum Frequency 20 (remove display of sub-sonic frequencies)
  • Maximum frequency 22000 (include display of higher frequencies).

SpectrogramView 11.png


Spectral selection

Spectral Selection is used to make selections that include a frequency range as well as a time range on tracks in Spectrogram view. Spectral Selection is used with special spectral editing effects to make changes to the frequency content of the selected audio. Among other purposes, spectral selection and editing can be used for cleaning up unwanted sound, enhancing certain resonances, changing the quality of a voice or removing mouth sounds from voice work. For full details, see Spectral Selection and Editing.

To define a time range combined with a spectral range, hover at the vertical position you want to be the approximate center frequency to act on, then click and drag horizontally to make a selection. A horizontal line, which defines the center frequency, appears beside the I-Beam mouse pointer.

Drag vertically, with or without continuing to drag horizontally, to define the range of frequencies to be acted on. A "box" containing a combined frequency and time range is now drawn in a colored tint as shown below (the exact color of the tint will depend on the version of Audacity and the settings of your monitor):

SpectrogramView Edit.png
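The center frequency line of a spectral selection is the geometric mean of its lower and upper frequencies, so it sits in the middle of the selection on a logarithmic scale rather than halfway in Hz. A small Python sketch, with an arbitrary 1000 Hz to 4000 Hz selection as the example; `center_and_width` is a hypothetical helper:

```python
import math


def center_and_width(f_low_hz, f_high_hz):
    """Return (geometric-mean center frequency in Hz, bandwidth in octaves)."""
    center = math.sqrt(f_low_hz * f_high_hz)
    octaves = math.log2(f_high_hz / f_low_hz)
    return center, octaves


print(center_and_width(1000.0, 4000.0))  # (2000.0, 2.0)
```

The arithmetic midpoint would be 2500 Hz; the geometric mean, 2000 Hz, lies one octave above the lower edge and one octave below the upper edge.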

The frequencies in the spectral selection can then be filtered in various ways, affecting their amplitude, using the special Spectral edit effects in the Effect Menu. This can be useful to remove unwanted extraneous noises from the audio or to apply very specific tone quality changes to it.

Tip In order to define a spectral selection you need to be in Spectrogram view.

You must also have Enable Spectral Selection checked, either in Spectrograms Preferences or in Spectrogram Settings (opened from the Track Control Panel dropdown menu).


Multi-view - Spectrogram and Waveform

It is also possible to work with a Spectrogram view and a Waveform view in the same track:

Multi-view mono default 50-50.png
Example of a mono audio track with a Multi-view split 50:50 Waveform/Spectrogram

To get a split Multi-view for a track select Multi-view from the track's Track Control Panel dropdown menu.

For details see Multi-view.