Signal averaging is a signal processing technique that tries to remove unwanted random disturbances from a signal through the process of averaging.

  • Averaging often takes the form of summing a series of signal samples and then dividing that sum by the number of individual samples.

The following equation represents an N-point moving average filter, where x is the input array and y is the averaged output array:

$$ y(n)=\frac{1}{N}\sum_{k=0}^{N-1}x(n-k) $$

 

Implementing in Python:

### 1. Simple example
import numpy as np

values = np.array([3., 9., 3., 4., 5., 2., 1., 7., 9., 1., 3., 5., 4., 9., 0., 4., 2., 8., 9., 7.])
N = 3

averages = np.empty(len(values))
# Average each sample with its two immediate neighbours (a centred 3-point window)
for i in range(1, len(values)-1):
    averages[i] = (values[i-1]+values[i]+values[i+1])/N

# Preserve the edge values
averages[0] = values[0]
averages[len(values)-1] = values[len(values)-1]
### 2. Use numpy.convolve
window = np.ones(3)
window /= sum(window)
averages = np.convolve(values, window, mode='same')
### 3. Use scipy.ndimage.uniform_filter1d
from scipy.ndimage import uniform_filter1d
averages = uniform_filter1d(values, size=3)

 

Averaging low-pass filter

In signal processing, the moving average filter can be used as a simple low-pass filter. The moving average filter smooths out a signal, removing the high-frequency components from it, and this is exactly what a low-pass filter does!
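A quick way to see this low-pass behaviour (a minimal sketch, not part of the original notes; the window length and FFT size are arbitrary choices) is to look at the magnitude response of the averaging window with numpy's FFT:

import numpy as np

N = 3
window = np.ones(N) / N                        # 3-point moving-average coefficients (sum to 1)

n_fft = 512
response = np.abs(np.fft.rfft(window, n_fft))  # magnitude response on a dense frequency grid
freqs = np.fft.rfftfreq(n_fft, d=1.0)          # normalised frequency, 0 ... 0.5 cycles/sample

print(response[0])       # 1.0 at DC: low frequencies pass through unchanged
print(response.min())    # close to 0: higher frequencies are strongly attenuated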

 

FIR (Finite Impulse Response) filters

In signal processing, an FIR filter is a filter whose impulse response (or response to any finite-length input) is of finite duration, because it settles to zero in finite time. For a general N-tap FIR filter, the nth output sample is:

$$  y(n)=\sum_{k=0}^{N-1}h(k)x(n-k) $$

For the moving average filter, the coefficients are

$$ h(n)=\frac{1}{N},\qquad n=0,1,\ldots,N-1 $$

This formula has already been used above, since the moving average filter is a kind of FIR filter.

 

Implementing in Python:

import numpy as np
from thinkdsp import Wave, read_wave

# suppress scientific notation for small numbers
np.set_printoptions(precision=3, suppress=True)

# The wave to be filtered
my_sound = read_wave('../Audio/429671__violinsimma__violin-carnatic-phrase-am.wav')
my_sound.make_audio()

# Make a 5-tap FIR filter using the following coefficients: 0.1, 0.2, 0.2, 0.2, 0.1
window = np.array([0.1, 0.2, 0.2, 0.2, 0.1])

# Apply the window to the signal using np.convolve
filtered = np.convolve(my_sound.ys, window, mode='same')
filtered_violin = Wave(filtered, framerate=my_sound.framerate)
filtered_violin.make_audio()

 

LTI (Linear Time Invariant) systems

  • If a system happens to be an LTI system, we can represent its behaviour as a list of numbers known as an IMPULSE RESPONSE.

An impulse response is the response of an LTI system to the impulse signal.

  • An impulse is one single maximum amplitude sample.

Example of an impulse:

There is one stalk reaching up to 1.0, with every other sample at 0.

Example of an impulse response:

It is a set of stalks (a list of numbers) showing how the system responds over time.

Given an impulse response, we can easily process any signal with that system using convolution.
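To see why the impulse response characterises the system (a minimal sketch, illustrative only; the impulse response values are arbitrary), convolving a unit impulse with the impulse response simply returns the impulse response:

import numpy as np

impulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0])              # one single maximum-amplitude sample
impulse_response = np.array([0.0, 1.0, 0.75, 0.5, 0.25])   # an example system

output = np.convolve(impulse, impulse_response)
print(output)   # the first five values reproduce the impulse response, the rest are zero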

 

  • We can derive the output of a discrete linear system by adding together the system's response to each input sample separately. This operation is known as convolution.

$$ y[n]=x[n]*h[n]=\sum_{m=0}^{\infty }x[m]h[n-m] $$

※ Convolution is indicated by the '*' operator.

 

Three characteristics of LTI systems

Linear systems have very specific characteristics which enable us to do the convolution (a short numerical check follows this list):

  1. Homogeneity (or linear with respect to scale)
    : Multiply the signal by 0.5 (scale it by 0.5), pass both the original and the scaled signal through the system, and compare the outputs
    1) Convolve the signal with the system
    2) Receive the output
    → It doesn't matter if the signal is scaled, because we know that it will produce the same scaled output.
  2. Additivity (decompose)
    : Separately process simple signals and add the results together
  3. Shift invariance
    : Shift a signal across (e.g. delay by one unit)
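A minimal numerical check of the three properties (illustrative only; the signal x, the second signal x2, and the impulse response h are arbitrary values):

import numpy as np

x = np.array([1.0, 0.75, 0.5, 0.75, 1.0])
h = np.array([0.0, 1.0, 0.75, 0.5, 0.25])

# 1. Homogeneity: scaling the input scales the output by the same factor
print(np.allclose(np.convolve(0.5 * x, h), 0.5 * np.convolve(x, h)))                 # True

# 2. Additivity: the response to a sum of signals is the sum of the individual responses
x2 = np.array([0.2, 0.4, 0.6, 0.4, 0.2])
print(np.allclose(np.convolve(x + x2, h), np.convolve(x, h) + np.convolve(x2, h)))   # True

# 3. Shift invariance: delaying the input by one sample delays the output by one sample
delayed = np.concatenate(([0.0], x))
print(np.allclose(np.convolve(delayed, h)[1:], np.convolve(x, h)))                   # True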

Implement an impulse response by hand:

  • Signal = [1.0, 0.75, 0.5, 0.75, 1.0]
  • System = [0.0, 1.0, 0.75, 0.5, 0.25]
    • Decompose:
      • input = [0.0, 0.0, 0.0, 0.0, 0.0]
      • input = [0.0, 1.0, 0.0, 0.0, 0.0]
      • input = [0.0, 0.0, 0.75, 0.0, 0.0]
      • input = [0.0, 0.0, 0.0, 0.5, 0.0]
      • input = [0.0, 0.0, 0.0, 0.0, 0.25]
    • Scale:
      • output = [0.0, 0.0, 0.0, 0.0, 0.0]
      • output = [1.0, 0.75, 0.5, 0.75, 1.0]
      • output = [0.75, 0.5625, 0.375, 0.5625, 0.75]
      • output = [0.5, 0.375, 0.25, 0.375, 0.5]
      • output = [0.25, 0.1875, 0.125, 0.1875, 0.25]
    • Shift:
      • output = [0.0, 0.0, 0.0, 0.0, 0.0]
      • output = [0.0, 1.0, 0.75, 0.5, 0.75, 1.0] // delay by one unit
      • output = [0.0, 0.0, 0.75, 0.5625, 0.375, 0.5625, 0.75] // delay by two units
      • output = [0.0, 0.0, 0.0, 0.5, 0.375, 0.25, 0.375, 0.5] // delay by three units
      • output = [0.0, 0.0, 0.0, 0.0, 0.25, 0.1875, 0.125, 0.1875, 0.25] // delay by four units
    • Synthesise (add the components back together):
      • output (result) = [0.0, 1.0, 1.5, 1.5625, 1.75, 2.0, 1.25, 0.6875, 0.25]

Implement in Python:

import numpy as np

def convolve(signal, system):
    # The full convolution of two sequences has length len(signal) + len(system) - 1
    rst = np.zeros(len(signal) + len(system) - 1)
    for sig_idx, sig_val in enumerate(signal):
        for sys_idx, sys_val in enumerate(system):
            # Scale the system value by the input sample and add it at the shifted position
            rst[sig_idx + sys_idx] += sig_val * sys_val
    return rst
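For example (a usage check, not part of the original notes), applying this function to the signal and system used in the hand calculation reproduces the result above and agrees with numpy's built-in np.convolve:

signal = [1.0, 0.75, 0.5, 0.75, 1.0]
system = [0.0, 1.0, 0.75, 0.5, 0.25]

result = convolve(signal, system)
print(result)   # 0.0, 1.0, 1.5, 1.5625, 1.75, 2.0, 1.25, 0.6875, 0.25 — matches the hand-derived result
print(np.allclose(result, np.convolve(signal, system)))   # True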

 


Analogue to Digital Converter (ADC)

  1. The microphone (transducer) converts air pressure changes into an electrical signal.
  2. The electrical energy generated by a microphone is usually quite small, so we need a device called a preamplifier to boost the weak electrical signal from the microphone to a level strong enough to be digitised.
  3. The ADC samples the incoming analogue voltage at a specific rate and assigns a digital value to each sample. These digital values are then usable by digital devices.

The act of assigning an amplitude value to the sample is called quantising and the number of amplitude values available to the ADC is called the sample resolution.
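As a rough illustration of quantising (a sketch, not from the original notes; the quantise helper and the test values are made up for this example), a B-bit ADC has 2^B amplitude levels available, so each normalised sample in [-1, 1] is rounded to the nearest available level:

import numpy as np

def quantise(samples, bits):
    # With `bits` bits there are 2**bits amplitude levels across the range [-1, 1]
    step = 2.0 / (2 ** bits)
    return np.round(np.asarray(samples) / step) * step

samples = np.array([0.1234, -0.5678, 0.8])
print(quantise(samples, 3))    # only 8 levels available: rounds to 0.0, -0.5, 0.75 (coarse)
print(quantise(samples, 16))   # 65,536 levels: very close to the original values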

 

Once the audio has entered the digital domain, the possibilities for editing, processing, and mixing are nearly endless. When digital audio is played back, the signal is first sent through a DAC.

 

Digital to Analogue Converter (DAC)

In the opposite direction (playback),

  1. The DAC converts the digital signal back into an analogue electrical signal.
  2. An amplifier boosts the level of the signal and sends it to a speaker or headphones, which generate the sound wave.

We can perceive the sound wave as a sound. In the context of digital audio playback, the DAC is built into the audio output of the computer or into an audio interface. Some computer speakers connect directly to the computer via USB and therefore have DACs built into them.

 

Audio recording path summary

  1. Vibrations in the air are converted to an analogue electrical signal by a microphone.
  2. The microphone signal is increased by a preamplifier.
  3. The preamplifier signal is converted to a digital signal by an ADC.
  4. The digital signal is stored, edited, processed, mixed, and mastered in software.
  5. The digital signal is played back and converted to an analogue electrical signal by a DAC.
  6. The analogue electrical signal is made larger by an amplifier.
  7. The output of the amplifier is converted into vibrations in the air by a loudspeaker.

 

Sampling rate (frequency)

  • Each measurement of the waveform's amplitude is called a sample.
  • The number of measurements (samples) taken per second is called the sampling rate (Hz).

The faster we sample, the better the quality; but the more samples we take, the more memory we need.

 

The Nyquist-Shannon sampling theorem

The Nyquist theorem defines the minimum sample rate for the highest frequency that we want to measure. The Nyquist frequency, also called the Nyquist limit, is the sample rate divided by two.

  • This theorem says that signal content above the Nyquist frequency is not recorded properly by ADCs; instead it introduces artificial frequencies in a process called aliasing. If the Nyquist theorem is not obeyed, higher-frequency information is recorded at too low a sample rate, resulting in aliasing artefacts.
  • The sampling rate must be at least twice the frequency of the signal being sampled.

An anti-aliasing filter is a low-pass filter that eliminates frequencies above the Nyquist frequency before audio reaches the ADC.
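A minimal sketch of aliasing (illustrative; the frequencies and sampling rate are arbitrary choices): sampling a 7 kHz sine at 10 kHz (Nyquist frequency 5 kHz) produces exactly the same samples, up to a sign flip, as a 3 kHz sine, so the 7 kHz tone would be recorded as a 3 kHz alias:

import numpy as np

fs = 10_000                                      # sampling rate (Hz); Nyquist frequency = 5 kHz
t = np.arange(0, 0.01, 1 / fs)                   # 10 ms of sample times

above_nyquist = np.sin(2 * np.pi * 7_000 * t)    # 7 kHz tone: above the Nyquist frequency
alias = np.sin(2 * np.pi * 3_000 * t)            # 3 kHz = 10 kHz - 7 kHz

# At these sample times the two are indistinguishable (apart from the sign)
print(np.allclose(above_nyquist, -alias))        # True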

 

Bit depth

  • Bit depth, also known as sample width and quantisation level, is the number of bits used to record the amplitude measurements.
  • The more bits we use, the more accurately we can measure the analogue waveform, and the more hard disk space or memory we need.

Common bit widths used for digital sound representation are 8, 16, 24, and 32 bits.


For example, approximately how large is an uncompressed stereo audio file with a duration of one minute, a sampling frequency of 44.1 kHz, and a resolution of 16 bits? The answer is as follows:

44,100 samples/second * 16 bits * 60 seconds * 2 channels = 84,672,000 bits = 10.584 MB
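The same calculation in Python (a trivial sketch; 1 MB is taken as 10^6 bytes, as in the calculation above):

sample_rate = 44_100      # samples per second per channel
bit_depth = 16            # bits per sample
duration = 60             # seconds
channels = 2              # stereo

total_bits = sample_rate * bit_depth * duration * channels
print(total_bits)              # 84,672,000 bits
print(total_bits / 8 / 1e6)    # 10.584 MB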

 

Clipping

Clipping occurs in an ADC when the analogue input signal exceeds the converter's maximum capacity. This overload forces the ADC to assign either the maximum or minimum digital value to the affected samples, resulting in a flat-topped or flat-bottomed waveform. This distortion is undesirable and should be avoided. If the level meter reads 0 dBFS (or the clipping indicator turns red), the signal is clipping!
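A minimal sketch of what clipping does to a waveform (illustrative; the sine wave and the limits are arbitrary): samples beyond the converter's range are pinned at the maximum or minimum value, flattening the peaks:

import numpy as np

t = np.linspace(0, 1, 100, endpoint=False)
signal = 1.5 * np.sin(2 * np.pi * 2 * t)    # peaks near ±1.5, beyond the ±1.0 input range

clipped = np.clip(signal, -1.0, 1.0)        # everything outside [-1, 1] is pinned to ±1
print(signal.max(), clipped.max())          # ~1.5 before clipping vs exactly 1.0 after (flat-topped)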

 

Digital audio representation

All these processes generate an array of samples that we can use to create a new file, to process the audio in real time on the computer, to store the data on a CD, and so on. There are two ways of representing digital audio:

1. The time domain representation gives the amplitude of the signal at the instant of time at which it was sampled.

  • Time can be expressed in seconds (decimal format) or in terms of sample indices.
  • Amplitude is normalised to values between -1 and 1; some programs instead display the values in decibels or as raw sample values.

We can use decibels to represent the values of the samples, but that is not the same as dB SPL: dBFS stands for decibels relative to full scale (see the small sketch below).

  • For example, in Audacity, the meters are in decibels and go from zero down to minus infinity.
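A small conversion sketch (assumptions: 0 dBFS corresponds to a full-scale normalised amplitude of 1.0, and the to_dbfs helper is made up for this example):

import numpy as np

def to_dbfs(amplitude):
    # Decibels relative to full scale: 0 dBFS at amplitude 1.0, negative below that
    return 20 * np.log10(amplitude)

print(to_dbfs(1.0))     # 0.0 dBFS (full scale)
print(to_dbfs(0.5))     # about -6 dBFS
print(to_dbfs(0.001))   # -60 dBFS; amplitudes approaching 0 tend towards minus infinity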

2. The frequency domain representation gives us information about the frequencies of a sound (sounds are generally composed of many frequencies, not just one).

A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.

A spectrogram is very similar to the frequency domain representation, but it provides more information about the time-varying nature of vibration, while frequency domain analyses provide information at a specific moment or as an average over time.
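A minimal sketch of computing a spectrogram with scipy (illustrative; the chirp test signal and the parameters are arbitrary choices):

import numpy as np
from scipy import signal

fs = 8_000                                    # sampling rate (Hz)
t = np.arange(0, 2, 1 / fs)                   # two seconds of sample times
x = signal.chirp(t, f0=100, t1=2, f1=2_000)   # a tone sweeping from 100 Hz to 2 kHz

# Sxx holds the power at each (frequency bin, time frame) pair
f, frames, Sxx = signal.spectrogram(x, fs=fs, nperseg=256)
print(Sxx.shape)                              # (frequency bins, time frames)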
