From Audacity Development Manual
This page gives very brief explanations of technical terms related to digital audio, with some links to Wikipedia for much more comprehensive explanations.

General Terms

Term Description
Wikipedia1.png ADC: Analog to digital converter. The part of an audio interface which records an analog, real world sound like a voice or guitar and converts it to a numerical representation of the audio that a computer can manipulate.
Wikipedia1.png Algorithm: A set of steps or a procedure that will produce a desired result.
Wikipedia1.png Aliasing: Aliasing is an effect that causes different audio signals to become indistinguishable (or aliases of one another) when sampled. It also refers to the distortion or artifact that results when the signal reconstructed from samples is different from the original continuous signal.
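As a rough illustration of the Aliasing entry, the sketch below estimates the frequency at which a pure tone would appear after sampling, assuming no anti-aliasing filter is applied. It uses the usual folding rule around half the sample rate. The function name `alias_frequency` is illustrative only and is not part of Audacity.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone once it has been sampled."""
    # Sampling cannot tell apart tones that differ by a whole multiple
    # of the sample rate, so first reduce the tone modulo the sample rate.
    folded = tone_hz % sample_rate_hz
    # Anything above half the sample rate mirrors back below it.
    return min(folded, sample_rate_hz - folded)


# A 30,000 Hz tone sampled at 44,100 Hz is indistinguishable from 14,100 Hz.
print(alias_frequency(30_000, 44_100))  # 14100
```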
Wikipedia1.png ALSA: A Linux kernel component for providing device drivers for audio interfaces. Known as an audio host in Audacity.
Wikipedia1.png Amplitude: The level or magnitude of a signal. Audio signals with a higher amplitude will sound louder.
Wikipedia1.png Artifact: Sonic material that is accidental or unwanted, resulting from the editing of another sound.
Wikipedia1.png ASIO: Audio Stream Input/Output (ASIO) is a computer audio interface driver protocol for digital audio on Windows, created by Steinberg. It provides a low-latency, multi-channel interface between a software application and the audio interface.
Audacity Project Format (.aup): The format in which Audacity 2.4.2 and earlier stored its projects. It consists of a reference file with the extension .aup and a large number of small audio files with the extension .AU. This structure makes it quicker for Audacity to move audio around, which is ideal for cutting and pasting audio in a project.
Audacity Unitary Project format (.aup3): The format in which Audacity 3.0.0 and later stores its projects. It consists of a SQLite database file with the extension .aup3.
Wikipedia1.png Audio CDs: CDs containing PCM audio data in accordance with the Red Book standard. They can be played on any standalone CD player as well as on computers.
Wikipedia1.png Band-pass filter: A band-pass filter is a filter that passes frequencies within a certain range and rejects (attenuates) frequencies outside that range. A band-boost filter is similar to a band-pass filter except that it amplifies frequencies within a certain range and passes frequencies outside that range untouched.
Wikipedia1.png Band-stop filter: A band-stop filter or band-rejection filter is a filter that passes most frequencies unaltered, but attenuates those in a specific range to very low levels. It is the opposite of a band-pass filter. A band-cut filter is a band-stop filter that attenuates the frequencies in a given frequency band by a specified amount. A notch filter is a band-stop filter with a narrow stopband (high Q factor).
Wikipedia1.png Batch Processing: Automation of a series of repetitive tasks on a computer so that the tasks run without manual intervention. In the early days of computers this was done by processing stacks of punch cards. In Audacity, repetitive tasks are handled by creating a Macro. The Macro can apply a predetermined sequence of effects to the current project, or can be run unattended to apply effects and/or format conversions to a batch of external audio files.
Wikipedia1.png Bit: A measure of quantity of data. A bit is one binary digit, a 0 or a 1.
Bit Rate: The number of computer bits conveyed or processed per unit of time. Normally expressed in kilobits per second (kbps). For an uncompressed PCM file, the bit rate in kbps is the sample rate multiplied by the number of bits per sample (the sample format) multiplied by the number of channels, divided by 1000. This gives 1411 kbps for Red Book WAV or AIFF. Rates are much lower for compressed or lossy formats like MP3. For MP3 at constant bit rate, reducing the sample rate does not reduce the bit rate and hence does not make the MP3 smaller, except for 11,025 Hz and below.
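The PCM calculation described in the Bit Rate entry can be checked with a few lines of arithmetic. This is only a sketch; the function name `pcm_bit_rate_kbps` is made up for the example.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Red Book audio: 44,100 Hz, 16-bit, stereo.
print(pcm_bit_rate_kbps(44_100, 16, 2))  # 1411.2
```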
Wikipedia1.png CBR: Constant Bit Rate - In this format, the rate at which audio uses its data bits does not vary. Silence uses as much disk space as audible sound.
Wikipedia1.png Cepstrum: The cepstrum of an audio signal is related to the spectrum, but presents the rate of change in the different spectrum bands. It's particularly useful for properties of vocal tracks and is used, for example, in software to identify speakers by their voice characteristics.
Clipping: Distortion to sound, usually due to the audio being too loud. Unless the original audio is in 32-bit float sample format, waveforms louder than 0 dB will have their tops lopped off (flattened) at 0 dB, rather than showing smooth curves. Clipping can also be an intentional distortion effect that lops off part of the waveform, reducing its amplitude and changing its frequency content.
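The "lopping off" described in the Clipping entry can be pictured as a simple clamp. The sketch below assumes samples are stored as floating-point numbers where +1.0 and -1.0 correspond to 0 dB; `hard_clip` is a hypothetical helper, not an Audacity function.

```python
def hard_clip(samples, limit=1.0):
    """Flatten every sample that exceeds +limit or -limit."""
    return [max(-limit, min(limit, s)) for s in samples]


# The second and third samples are louder than 0 dB and get flattened.
print(hard_clip([0.5, 1.3, -2.0]))  # [0.5, 1.0, -1.0]
```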
Wikipedia1.png Codec: A computer program capable of encoding and/or decoding a digital data stream. The term is a portmanteau (a blending of two or more words) of coder and decoder.
Wikipedia1.png Companding: Refers to the process of compressing the dynamic range of an audio signal before storage or transmission, then expanding the signal on retrieval or reception. The term is a portmanteau (a blending of two or more words) of compressing and expanding.
Wikipedia1.png Compressed Audio Format: Any format that will reduce the space required in storing or representing an audio signal. Space savings can be made for example by discarding certain frequency components which may be inaudible. MP3 takes this approach. Other formats such as FLAC compress without audio loss, but achieve lower compression rates.
Wikipedia1.png Compression: A process that tends to even out the overall volume level by increasing the level of softer passages and decreasing the level of louder passages. See also Compressed Audio Format.
Wikipedia1.png Cycle: An audio tone consists of an oscillating sound pressure on the ear. One cycle is one full transition of positive pressure through to negative pressure, back to positive pressure again.
Wikipedia1.png DAC: Digital to analog converter. The part of an audio interface which plays back a numerical representation of audio as an analog, real world sound like a voice or guitar.
Wikipedia1.png Data CDs: Data CDs contain data intended to be read directly by a computer. The data may include audio and any other types of file such as images and documents. Most standalone CD players will not play data CDs, but some DVD players will. Including compressed audio files on a data CD can greatly increase the playing time compared to audio CDs.
dB: Decibels. A logarithmic unit that describes the ratio of a quantity (typically sound pressure) to a reference level. The B is capitalized because the unit is named after Alexander Graham Bell.
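To make the logarithmic relationship in the dB entry concrete, the sketch below converts between an amplitude ratio and decibels. It assumes the ratio is of amplitudes (such as sound pressure or sample values), for which the factor is 20; power ratios use a factor of 10 instead. Both function names are illustrative.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Express an amplitude ratio (signal / reference) in decibels."""
    return 20 * math.log10(ratio)


def db_to_amplitude_ratio(db):
    """Inverse conversion: decibels back to an amplitude ratio."""
    return 10 ** (db / 20)


# Halving the amplitude is a change of about -6 dB.
print(round(amplitude_ratio_to_db(0.5), 2))  # -6.02
print(round(db_to_amplitude_ratio(-6), 3))   # 0.501
```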
Wikipedia1.png DC Offset: An offsetting of a signal from zero. A signal with DC Offset would appear in the Audacity Default Waveform view to be not centered on the 0.0 horizontal line. DC Offset results in reduced headroom and can cause clicks at the start and end or distortion after running effects. It can be corrected in Audacity by running Normalize.
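One simple way to picture DC offset removal is to subtract the average value of the samples so the waveform is centered on zero again. The sketch below shows that idea only; it is not a description of how the Normalize effect is implemented, and `remove_dc_offset` is a hypothetical name.

```python
def remove_dc_offset(samples):
    """Center a block of samples on zero by subtracting their mean."""
    if not samples:
        return []
    offset = sum(samples) / len(samples)
    return [s - offset for s in samples]


# A waveform swinging between 0.25 and 0.75 has an offset of 0.5.
print(remove_dc_offset([0.75, 0.25, 0.75, 0.25]))  # [0.25, -0.25, 0.25, -0.25]
```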
Dither: Intentional noise which is added to randomize the quantization errors (rounding errors) that occur when reducing the bit depth of an audio stream to a lower resolution than its current format.
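The sketch below shows one common way of applying the idea in the Dither entry: a small amount of triangular (TPDF) noise is added before rounding float samples to 16-bit integers. The choice of triangular noise, the scaling by 32767 and the name `quantize_with_dither` are all assumptions made for this example, not a statement of what any particular program does.

```python
import random


def quantize_with_dither(samples, bits=16, seed=None):
    """Round float samples (-1.0..1.0) to integers after adding dither."""
    rng = random.Random(seed)
    scale = 2 ** (bits - 1) - 1  # 32767 for 16-bit
    out = []
    for s in samples:
        # Sum of two uniform values gives triangular noise within +/-1 step.
        noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        value = round(s * scale + noise)
        # Keep the result inside the legal integer range.
        out.append(max(-scale - 1, min(scale, value)))
    return out


print(quantize_with_dither([0.0, 0.5, -0.5], seed=0))
```

Each output value lands within a couple of steps of the undithered result, but the rounding error no longer follows the signal in a fixed pattern.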
Wikipedia1.png Dynamic Range: The difference between the loudest and softest part in an audio recording, the maximum possible being determined by its sample format. For a device, the difference between its maximum possible undistorted signal and its Noise Floor.
Wikipedia1.png Dropout: A dropout is a momentary loss of signal in your recording. Dropouts may be caused by a disk drive that cannot keep up with the recording. This can happen, for example, with a slow USB or network drive, or if antivirus software is slowing writing to disk, or if other activity on the computer is slowing the computer down.
Wikipedia1.png Exponential: A non-linear relationship where a change in value is proportional to the current level. If you double the value in a time period, it doubles again in the next period; if you halve the level in a time period, it halves again in the next period. For an exponential fade in, the curve becomes "steeper" with time; an exponential fade out becomes "flatter" with time. See also Logarithmic.
FFT: Fast Fourier Transform. A method for performing Fourier transforms quickly.
Wikipedia1.png File name extension: A suffix of three or four characters added to a file name which defines the format of its contents. The suffix is separated from the file name by a dot (period), as in "song.mp3". The extension of common formats is often hidden on Windows, but can be turned on in the system's Folder Options.
Wikipedia1.png Filter: A sound effect that lets some frequencies through and suppresses others.
Wikipedia1.png Fourier Transform: A method for converting a waveform to a spectrum, and back.
Frequency: Audio frequency determines the pitch of a sound. Frequency is measured in Hz; higher frequencies have higher pitch.
Wikipedia1.png Gain: A measure of how much a signal is amplified. Usually expressed in dB, positive gain increases the amplitude of a signal, while negative gain reduces it.
Harmonics: Most sounds are made up of a mix of different frequencies. In musical sounds, the component frequencies are whole-number multiples of the lowest frequency, for example 100 Hz, 200 Hz, 300 Hz. These are called harmonics of the lowest frequency sound.
Headroom: The difference between the peak level of an audio track and the maximum level that can be achieved without clipping. Recording with peaks at about -6 dB (6 dB below maximum level) is a good compromise between getting far enough above the noise floor and having sufficient headroom to make edits that increase loudness.
High Pass Filter: A filter that lets high (treble) frequencies through.
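Headroom can be estimated from the peak sample value. The sketch below assumes float samples where an absolute value of 1.0 is the maximum level (0 dB); `headroom_db` is an illustrative name.

```python
import math


def headroom_db(samples):
    """Distance in dB between the peak sample and full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return math.inf  # pure silence: unlimited headroom
    return -20 * math.log10(peak)


# A track peaking at half of full scale has about 6 dB of headroom.
print(round(headroom_db([0.1, -0.5, 0.25]), 2))  # 6.02
```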
Wikipedia1.png Hz: Hertz. Measures a frequency event in number of cycles per second. See Frequency and Sample Rate, both of which are measured in Hz.
Wikipedia1.png ID3: ID3 is a metadata container most often used in conjunction with the MP3 audio file format. It allows information such as the title, artist, album, track number, and other information about the file to be stored in the file itself.
Interpolation: Completing waveform data by estimating missing values. The values are estimated as being between other known values. For example, converting a waveform recorded at 22,000 Hz (samples per second) to a higher rate such as 44,000 Hz requires interpolation.
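The simplest form of the estimation described in the Interpolation entry is linear: each missing value is placed halfway between its two neighbors. The sketch below doubles the number of samples this way. Real resamplers generally use more sophisticated methods, so treat `upsample_2x_linear` as a teaching example only.

```python
def upsample_2x_linear(samples):
    """Insert one linearly interpolated value between each pair of samples."""
    if not samples:
        return []
    out = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + following) / 2)  # estimated in-between value
    out.append(samples[-1])
    return out


print(upsample_2x_linear([0.0, 1.0, 0.0]))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```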
Wikipedia1.png IVR: Interactive Voice Response is a technology that allows a computer to interact with humans through the use of voice and DTMF tones input via keypad.
Wikipedia1.png kHz: One kilohertz (kHz) is 1000 Hz. For example, the common audio sample rate of 44,100 Hz can also be expressed as 44.1 kHz.
Wikipedia1.png LAME: A software library that converts audio to MP3 format.
Wikipedia1.png Latency: A short delay between an audio signal being sent and received. In computer audio this is due to analog-to-digital and digital-to-analog conversion. Most commonly refers to the delay between recording a sound and a) hearing its playthrough or b) laying it down on disk.
Wikipedia1.png Linear: A simple, directly proportional, one-to-one, "straight-line" relationship. This term is used to contrast with exponential, logarithmic, or other complex relationships.
Wikipedia1.png Logarithmic: A non-linear relationship where one item is proportional to the logarithm of the other item. So for a logarithmic fade in, the curve becomes "flatter" with time; a logarithmic fade-out becomes "steeper" with time. Some measures, such as dB, are logarithmic by definition. See also Exponential.
Wikipedia1.png Lossless: A format that does not lose any information. It may be either a size-compressing format like FLAC where the quality is exactly as good as before compression, or an uncompressed format like WAV.
Lossy: A format for size-compressing audio that may sacrifice a small amount of quality in order to reduce the file size more than lossless compression. Examples are MP3 and Ogg Vorbis.
Wikipedia1.png Low Pass Filter: A filter that lets low (bass) frequencies through.
Metadata: Digital audio files can be labeled with more information than the file name alone can contain. That descriptive information is called the audio tag or audio metadata. The metadata for compressed and uncompressed digital music is often encoded in the ID3 tag.
Wikipedia1.png MME: Multimedia Extensions to Windows 3 appeared in Autumn 1991 as the first standardized Windows interface to support audio interfaces. It is one of the "audio hosts" selectable in Device Toolbar. MME was superseded in 1995 by Windows DirectSound.
Wikipedia1.png MP3 CDs: A specific type of data CD containing only MP3 audio files. All computers can play them as can some DVD and portable MP3 players.
Wikipedia1.png Noise Floor: A level or amplitude representing the amount of near-continuous background noise present in the signal. A background hiss would raise the noise floor, and could prevent a faint signal (one below the noise floor) being heard at all. Unwanted sporadic noise such as a member of the audience coughing is noise, but it does not contribute to the noise floor.
Wikipedia1.png Notch filter: A notch filter is a band-stop filter with a narrow stopband (high Q factor).
Wikipedia1.png Oversampling: Oversampling is the process of sampling a signal with a sampling frequency significantly higher than the Nyquist rate. Oversampling improves resolution, reduces noise and helps avoid aliasing and phase distortion by relaxing anti-aliasing filter performance requirements.
Wikipedia1.png Pan: Panning is the spread of a sound signal (either monaural or stereophonic pairs) into a new stereo or multi-channel sound field.
Wikipedia1.png PCM: Pulse code modulation. A method of converting audio into binary numbers to represent it digitally, then back to audio. The waveform is measured at evenly spaced intervals and the amplitude of the waveform noted for each measurement.
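The measuring process in the PCM entry can be sketched by sampling a sine tone at evenly spaced intervals and storing each amplitude as a 16-bit integer. The scaling by 32767 and the name `sine_to_pcm16` are assumptions made for this example.

```python
import math


def sine_to_pcm16(freq_hz, sample_rate_hz, n_samples):
    """Measure a full-scale sine tone at regular intervals as 16-bit values."""
    values = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # time of this measurement
        amplitude = math.sin(2 * math.pi * freq_hz * t)
        values.append(round(amplitude * 32767))  # note the amplitude as an integer
    return values


# First four samples of a 1000 Hz tone sampled at 8000 Hz.
print(sine_to_pcm16(1000, 8000, 4))  # [0, 23170, 32767, 23170]
```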
Wikipedia1.png Pitch: Generally synonymous with the fundamental frequency of a note, but in music, often also taken to imply a perceived measurement that can be affected by overtones above the fundamental.
Wikipedia1.png Red Book: The most widely used standard for representing audio on CD, requiring stereo, 16-bit, 44,100 Hz.
Wikipedia1.png Resampling: Converting a sampled signal from one sample rate to another without changing the length of the audio (hence without changing the playback speed or pitch). This necessarily changes the number of samples that the audio contains. Resampling can also mean converting from one sample format to another which changes the precision of each sample but not the number of samples.
Wikipedia1.png RMS: Root Mean Square, sometimes also abbreviated in technical literature as "rms". A method of calculating a numerical value for the average sound level of a waveform. The RMS level (colored lighter blue in Audacity) equates very approximately to how loud the audio sounds.
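The name Root Mean Square describes the calculation in reverse order: square each sample, take the mean of the squares, then take the square root. A minimal sketch, with `rms` as an illustrative function name:

```python
import math


def rms(samples):
    """Root of the mean of the squared sample values."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square)


# A full-scale square wave has an RMS level equal to its peak.
print(rms([1.0, -1.0, 1.0, -1.0]))  # 1.0
```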
Wikipedia1.png Roll-off: A gradually reduced response at the upper or lower ends of the working frequency range.
Wikipedia1.png Sample: A discrete value at a point in a waveform representing the audio at that point. Also the act of taking a sequence of such values. All digital audio must be sampled at discrete points. By contrast, analog audio (such as the sound from a loudspeaker) is always a continuous signal.
Wikipedia1.png Sample Rate: Measured in Hz like frequency, this represents the number of digital samples captured per second in order to represent the waveform. See Sample Rates for more details.
Wikipedia1.png Sample Format: Also known as Bit Depth or Word Size. The number of computer bits present in each audio sample. Determines the dynamic range of the audio. See Sample Format - Bit Depth for more details.
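The link between bit depth and dynamic range can be estimated with a short calculation. The sketch below uses the common theoretical approximation for integer PCM (roughly 6 dB per bit); it does not apply to floating-point formats, and `integer_pcm_dynamic_range_db` is a made-up name.

```python
import math


def integer_pcm_dynamic_range_db(bits_per_sample):
    """Approximate theoretical dynamic range of integer PCM in dB."""
    # Ratio between the number of available levels and a single step.
    return 20 * math.log10(2 ** bits_per_sample)


print(round(integer_pcm_dynamic_range_db(16), 1))  # 96.3
print(round(integer_pcm_dynamic_range_db(24), 1))  # 144.5
```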
Wikipedia1.png Snapshot: A read-only copy of the project database frozen at a point in time enabling Recovery, following a crash, to recover to the last edit you made.
Wikipedia1.png Spectrum: Presentation of a sound in terms of its component frequencies.
Uncompressed Audio Format: An audio format in which every sample of sound is represented by a binary number. Examples are WAV or AIFF.
Wikipedia1.png VBR: Variable Bit Rate. A method for compressing audio which does not always use the same number of bits to record the same duration of sound.
Wikipedia1.png Waveform: A visual representation of an audio signal.
Wikipedia1.png Windows DirectSound: A Windows interface between applications (such as Audacity) and the audio interface driver. It is one of the "audio hosts" selectable in Device Toolbar. DirectSound was released in 1995 as a replacement for the older MME and has an option to bypass the kernel mixer and so reduce latency.
Wikipedia1.png Windows WASAPI: The most recent Windows interface between applications (such as Audacity) and the audio interface driver. It is one of the "audio hosts" selectable in Device Toolbar. WASAPI was first officially released in 2007.
Wikipedia1.png Zero Crossing: The point where a line joining the audio samples crosses the zero horizontal line.
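Zero crossings can be located by looking for neighboring samples with opposite signs. In this sketch a sample of exactly zero is counted as non-negative, which is an arbitrary choice for the example; `zero_crossings` is an illustrative name.

```python
def zero_crossings(samples):
    """Indexes of samples where the waveform has just crossed zero."""
    indexes = []
    for i in range(1, len(samples)):
        previous, current = samples[i - 1], samples[i]
        rising = previous < 0 <= current
        falling = previous >= 0 > current
        if rising or falling:
            indexes.append(i)
    return indexes


print(zero_crossings([-0.5, -0.1, 0.2, 0.4, -0.3]))  # [2, 4]
```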

Audio File Formats

There are numerous audio file formats for storing audio on a computer.
  • WAV format is widely used on Windows and is needed for creating an audio CD.
  • AIFF is widely used on Apple's operating systems.
  • Compressed formats (like MP3 and AAC) are used on portable music players.
Term Description
AAC: A lossy, size-compressed audio codec and its reference audio codec implementation. AAC files usually have M4A extension, with variants such as M4P (protected) and M4R (ringtones). Usually gives better quality for the same bit rate than the older MP3 format. It is the default audio format for Apple Music/iTunes®, iPhone®, iPad® and iPod® and Sony PlayStation 3.
AIFF: A container format, almost always used for lossless, uncompressed, PCM audio with similar file size to WAV. Although the classic AIFF format is in Apple's earlier big-endian byte order, Mac OS X/macOS has always written "AIFF-C/sowt" files. These have the same AIFF extension as classic AIFF and are identical to it except for being little-endian like WAV format. Rarely, files with AIFC extension can contain compressed formats.
Allegro: A text-based language for music representation. In common with MIDI it represents notes, tempo, and other commands that may instruct a synthesizer or sampler what to play. In Audacity, Allegro (.gro) files may be imported as Note tracks or exported from Note tracks.
Wikipedia1.png Apple Lossless: Also known as Apple Lossless Audio Codec (ALAC) or Apple Lossless Encoder (ALE), this is a lossless, size-compressed codec usually stored within an MP4 container format with M4A extension. ALAC is Apple's equivalent of FLAC (which is not officially supported by Apple).
AU: A container format, formerly used by Audacity (2.4.2 and earlier) for storage of lossless, uncompressed, PCM audio data. Not to be confused with Sun/NeXT AU files, which are usually U-Law encoded PCM files but may be headerless.
Wikipedia1.png CAF: A container format for storing audio, developed by Apple Inc. It is designed to overcome limitations of older digital audio formats. Unlike WAV and AIFF its size is virtually unlimited and can theoretically save hundreds of years of recorded audio due to its use of 64-bit file offsets.
FLAC: An Open Source lossless, size-compressed audio format.
GSM 6.10: Global System for Mobile communications (GSM) is a standard developed by the European Telecommunications Standards Institute (ETSI) to describe protocols for second-generation (2G) digital cellular networks used by mobile phones. GSM 6.10 is the lossy speech codec defined as part of that standard. By 2014 GSM had become the default global standard for mobile communications, with over 90% market share, operating in over 219 countries and territories.
Wikipedia1.png MIDI: MIDI is a small-sized file format which stores how to play notes, widely used for keyboard instruments. It is not an audio file format like WAV that uses thousands of samples to record the full sound of the notes actually being played.
MP2: A lossy, size-compressed audio format mainly used by the broadcast media.
MP3: A lossy, size-compressed audio format which is the main format for transmitting audio over the Internet.
Wikipedia1.png Opus: An Open Source size-compressed and lossy audio format developed for Internet streaming. It uses both SILK (used by Skype) and CELT (from Xiph.Org) codecs and supports variable bit rates from 6 kbps to 510 kbps.
Wikipedia1.png Ogg Vorbis: An Open Source lossy, size-compressed audio format, strictly speaking the Vorbis format in a container having OGG extension.
RAW: An audio file format for storing uncompressed audio in raw form. A RAW audio file is comparable to WAV or AIFF in size but does not include any header information (sampling rate, bit depth, endianness, or number of channels).
RF64: A container format based on the Microsoft RIFF/WAVE (WAV) format. It allows file sizes of more than 4 GB when needed (the maximum file size is approximately 16 exabytes, which is effectively unlimited).
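Because a RAW file carries no header, whoever reads it has to supply the missing details. The sketch below decodes raw bytes on the assumption that they hold 16-bit signed, little-endian PCM with a known channel count. Those parameters and the name `decode_raw_pcm16le` are assumptions for the example; a different guess would yield different (and probably noisy) audio from the same bytes.

```python
import struct


def decode_raw_pcm16le(data, channels=1):
    """Interpret headerless bytes as 16-bit little-endian PCM frames."""
    count = len(data) // 2                  # two bytes per sample
    values = struct.unpack("<%dh" % count, data[:count * 2])
    usable = count - count % channels       # drop any incomplete final frame
    return [values[i:i + channels] for i in range(0, usable, channels)]


raw = bytes.fromhex("0000e80318fcff7f")
print(decode_raw_pcm16le(raw, channels=2))  # [(0, 1000), (-1000, 32767)]
```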
Wikipedia1.png WAV: A container format, almost always used for lossless, uncompressed, PCM audio. The format is in Microsoft's Little-Endian byte order.
Wikipedia1.png WMA: A container format. Windows Media Audio is a lossy, size-compressed audio format developed by Microsoft. It is a proprietary technology that forms part of the Windows Media framework. WMA consists of four distinct codecs. The original WMA codec, known simply as WMA, was conceived as a competitor to the popular MP3 and RealAudio codecs.