Confusion matrix

                        Predicted Class
                        Positive                Negative
  Actual Class Positive True Positive (TP)      False Negative (FN)
               Negative False Positive (FP)     True Negative (TN)
  • Accuracy measures the proportion of correctly classified instances among the total number of instances:

$$  Accuracy=\frac{TP+TN}{TP+TN+FP+FN} $$

  • Precision measures the proportion of true positives among all positive predictions made by the model.
    • How many of the instances predicted as positive were actually positive?

$$  Precision=\frac{TP}{TP+FP} $$

  • Recall (also known as Sensitivity or True Positive Rate) measures how well the model captures actual positive cases.
    • How many of the actual positive instances did the model correctly identify?

$$  Recall=\frac{TP}{TP+FN} $$

  • F1-Measure (also known as F1-Score) is the harmonic mean of precision and recall, providing a single score that balances both metrics.
    • How well does the model balance both precision and recall?

$$ F_1=2\times \frac{Precision\times Recall}{Precision+Recall} $$

  • Dice Coefficient (also known as Dice Similarity Coefficient or DSC) measures the overlap between the predicted segmentation mask and the ground truth mask. It ranges from 0 (no overlap) to 1 (perfect overlap).

$$ Dice Coefficient=\frac{2\times \left | A\cap B\right |}{\left | A\right |+\left | B\right |}=\frac{2\times TP}{2\times TP+FP+FN} $$

  • Jaccard Index (also called Intersection over Union, IoU) measures the size of the intersection divided by the size of the union of the predicted and ground truth masks.

$$ Jaccard=\frac{\left | A\cap B\right |}{\left | A\cup B\right |}=\frac{TP}{TP+FP+FN} $$
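To make the formulas concrete, here is a small sketch (not part of the original notes) that computes each metric from raw confusion-matrix counts; the counts are invented for illustration:

# Hypothetical counts, for illustration only
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1   = 2 * precision * recall / (precision + recall)
dice = 2 * TP / (2 * TP + FP + FN)   # equals f1 in the binary case
iou  = TP / (TP + FP + FN)

print(accuracy, precision, recall, f1, dice, iou)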

 


 

pocketsphinx

  • Offline speech recognition in real time on mobile devices, ported from the CMUSphinx project

How to use pocketsphinx:

pip install pocketsphinx

 

# Pocketsphinx on live input
from pocketsphinx import LiveSpeech
for phrase in LiveSpeech(): print(phrase)

# Pocketsphinx for keywords
from pocketsphinx import LiveSpeech
speech = LiveSpeech(lm=False, keyphrase='move forward', kws_threshold=1e-20)
for phrase in speech: print(phrase.segments(detailed=True))

# Specify phrases in an external file
from pocketsphinx import LiveSpeech
speech = LiveSpeech(lm=False, kws='./kws.txt')
for phrase in speech: print(phrase.segments(detailed=True))
# File contents:
#   move forward /1e-40/
#   go backwards /1e-40/
#   turn left /1e-20/
#   turn right /1e-20/

# Pocketsphinx and audio files
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx()
ps.decode(audio_file='nines.wav')
ps.hypothesis()
ps.confidence()
ps.best(count=4)

# Keyword search on an audio file
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx(lm=False, kws='./kws.txt')
ps.decode(audio_file='nines.wav')
ps.hypothesis()

 

vosk

  • An easy-to-install API that can run efficient Kaldi models offline
  • A neat wrapper around Kaldi models
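A minimal usage sketch based on the vosk examples (the WAV file name and model path are placeholders; the audio is assumed to be 16 kHz mono PCM):

import wave
from vosk import Model, KaldiRecognizer

wf = wave.open('speech.wav', 'rb')
model = Model('model')                    # path to an unpacked vosk model
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)            # feed the audio in small chunks
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):          # True at the end of an utterance
        print(rec.Result())               # JSON with the recognised text
print(rec.FinalResult())                  # flush the last partial utterance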

kaldi

  • A large, open-source collection of components for constructing ASR systems based on finite-state transducers
    • Finite-state transducer (FST)
      : Intuitively, a simplified version of an HMM (Hidden Markov Model)
      → Tagging speed when using transducers is up to five times higher than when using the underlying HMMs. The main advantage of transforming an HMM is that the resulting transducer can be handled by finite-state calculus.

Mozilla DeepSpeech

  • An open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers
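A minimal sketch using the deepspeech 0.9.x Python package; the .pbmm and .scorer file names are placeholders for the released model files, and the WAV file is assumed to be 16-bit mono at 16 kHz:

import wave
import numpy as np
import deepspeech

model = deepspeech.Model('deepspeech-0.9.3-models.pbmm')
model.enableExternalScorer('deepspeech-0.9.3-models.scorer')

with wave.open('speech.wav', 'rb') as wf:
    audio = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)

print(model.stt(audio))                   # the recognised transcript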

 


The meaning of 'bio-inspired' is as follows:

  • Looking at the algorithms and functions in the natural world and thinking there must be something we can use computationally here

According to Wikipedia, bio-inspired computing (short for biologically inspired computing) is a field of study which seeks to solve computer science problems using models of biology.

  • It relates to connectionism, social behaviour, and emergence.

Within computer science, bio-inspired computing relates to artificial intelligence (AI) and machine learning (ML).

 

Bio-inspired computation has emerged as one of the most studied branches of AI over recent decades, according to the reference:

  • Del Ser, Javier, et al. "Bio-inspired computation: Where we stand and what's next." Swarm and Evolutionary Computation 48 (2019): 220-250.

Brief overview of bio-inspired computing:

  • 1950s: Cybernetics
    • "the science of control and communication, in the animal and the machine" - Weiner, 1948
    • "Co-ordination, regulation and control will be its themes, for these are of the greatest biological and practical interest" - Ashby, 1961
  • 1960s: Connectionism (neural networks)
    • "The perceptron is a minimally constrained 'nerve net' consisting of logically simplified neural elements, which has been shown to be capable of learning to discriminate and to recognise perceptual patterns" - F. Rosenblatt, 'Perceptron Simulation Experiments,' in Proceedings of the IRE, vol. 48, no. 3, pp. 301-309, March 1960, doi: 10.1109/JRPROC.1960.287598
  • 1970s: Genetic algorithms
    • The algorithms were introduced in the US by John Holland at the University of Michigan
  • 1980s: Artificial life
  • 1990s: Onwards and upwards!

Genetic algorithms (GAs) are a type of search algorithm inspired by natural selection. They mimic the process of evolution to find optimal solutions to problems:

  • GAs are probabilistic search procedures designed to work on large spaces involving states that can be represented by strings. Here, a space means the set of possible solutions to a problem.
    • Probabilistic search: selection and breeding
    • States: phenotype
    • Strings: genotype

 

The complete set of genetic material, including all chromosomes, is called a genome.

  • The genotype (the set of genes in the genome) encodes the range of characteristics of the individual.
    • DNA is passed from one generation to the next: heredity
  • The phenotype is the actual individual that manifests in the world.

 

Genotype is set at birth, determined by the DNA inherited from the parents. It's fixed at conception. Phenotype is influenced by both genotype and environment. While genotype provides the blueprint, the environment plays a role in shaping the phenotype. For example, even with a genotype for tallness, nutrition can affect how tall someone becomes.

  • Nature vs. Nurture:
    • DNA = Nature
    • Environment = Nurture, defining where in these ranges the characteristics actually fall

The environment plays a crucial role in shaping evolution. It exerts selective pressure on phenotypes, favouring those that are better suited to survive and reproduce.

  • These successful phenotypes, with their underlying genotypes, are more likely to be passed on to future generations (heredity).

 

GAs are inspired by the principles of Darwinian evolution. In GAs, we simulate a population of individuals with varying traits (analogous to phenotypes):

  1. Population and Variation
    : We start with a population of candidate solutions, each representing a potential answer to the problem. Each solution has its own unique set of characteristics (like genes in an organism).
  2. Selection
    : We then select solutions that perform well based on a defined fitness function (similar to how successful phenotypes survive in nature).
  3. Reproduction
    : These 'fit' solutions are then used to create new solutions (like breeding in evolution). Techniques like crossover (combining characteristics) and mutation (introducing variations) are used to mimic the processes of inheritance and random genetic changes. A minimal code sketch of this loop follows below.
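The following sketch (not part of the original notes) implements the loop just described: bit-string genotypes, a toy fitness function counting 1-bits, roulette-wheel selection, one-point crossover, and per-bit mutation. All parameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)
POP, LEN, GENS, MUT = 20, 16, 50, 0.01      # illustrative parameters

population = rng.integers(0, 2, size=(POP, LEN))   # random bit strings

def fitness(pop):
    return pop.sum(axis=1)                  # toy fitness: number of 1-bits

for _ in range(GENS):
    f = fitness(population).astype(float)
    probs = f / f.sum()                     # roulette wheel: slice size proportional to fitness
    parents = population[rng.choice(POP, size=(POP, 2), p=probs)]
    points = rng.integers(1, LEN, size=POP)
    mask = np.arange(LEN) < points[:, None]            # one-point crossover
    children = np.where(mask, parents[:, 0], parents[:, 1])
    flips = rng.random((POP, LEN)) < MUT               # random mutation
    population = np.where(flips, 1 - children, children)

print(fitness(population).max())            # approaches LEN as the GA converges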

 

Advantages of GAs:

  1. Finding optimal solutions
    : A key advantage of GAs is their ability to locate both local and global maxima (points of highest fitness) within a given search space. This makes them superior to older 'hill-climbing' algorithms that can get stuck at local maxima.
  2. Exploring combinations
    : GAs go beyond simply testing individual components. They employ a technique called hyperplane sampling. This essentially means they evaluate various different combinations of these components, mimicking how different genes interact in an organism. This allows GAs to explore a broader range of potential solutions and potentially discover more optimal combinations.

 

How GAs work:

  • Selection
    : Imagine a roulette wheel where each slice represents a member of the population. The size of a slice is determined by a 'fitness function' that evaluates how well that member solves the problem.
    • The fitter a member, the larger its slice, giving it a higher chance of being selected for reproduction. This mimics how natural selection favours organisms better suited to their environment.
  • Hyperplane sampling and schemata
    : GAs don't just evaluate individual components of a solution, like bricks in a wall. They can also test different combinations of these components (like building different wall structures).
    • This allows them to find better overall solutions by exploring how different components work together. The schema theorem is a complex concept that supports this ability of GAs.
  • Parallelism
    : GAs can leverage parallelism to speed up the search process. There are two main types:
    • Implicit parallelism: This uses the population model itself to explore multiple solutions simultaneously.
      → Imagine pairs competing in a tournament, with the winners progressing to the next round. In implicit parallelism, pairings of individuals are still evaluated in sequence, one at a time.
    • Computational parallelism: If you have a computer with multiple cores, you can use them to evaluate several combinations of individuals at the same time, significantly speeding up the search (a minimal sketch follows this list).
      → How many evaluations can run simultaneously depends on how many cores your processor has.
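A minimal sketch of computational parallelism (not part of the original notes): fitness for a whole population is evaluated across CPU cores using only the standard library. The fitness function is a toy placeholder.

from multiprocessing import Pool

def fitness(individual):
    return sum(individual)                  # toy fitness: number of 1-bits

if __name__ == '__main__':
    population = [[0, 1] * 8, [1] * 16, [0] * 16]
    with Pool() as pool:                    # one worker per core by default
        scores = pool.map(fitness, population)
    print(scores)                           # [8, 16, 0]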


A Fourier Transform (FT) is an integral transform that takes a function as input and outputs another function that describes the extent to which various frequencies are present in the original function.

 

Discrete Fourier Transform (DFT)

  • Since the real world deals with discrete data (samples), the DFT is a crucial tool. It's the discrete version of the FT, specifically designed to analyse finite sequences of data points like those captured by computers.
  • The DFT converts these samples (a finite sequence of equally spaced samples of a function) into a same-length sequence of equally spaced samples of the Discrete-Time Fourier Transform (DTFT), which is a complex-valued function of frequency.

Discrete Cosine Transform (DCT)

  • The DCT is closely related to the DFT. While the DFT uses both sines and cosines (complex exponentials), the DCT uses only cosine functions. This makes the DCT computationally simpler and often preferred for data with even symmetry (as in much audio processing).
  • The DCT expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.

Fast Fourier Transform (FFT)

  • The DFT is powerful, but calculating it directly can be computationally expensive for large datasets. This is where the Fast Fourier Transform (FFT) comes in: a highly optimised algorithm designed to compute the DFT efficiently, especially when the data length is a power of 2 (16, 32, 64, etc.).
  • The FFT is not a separate transform; it is just a fast algorithm for computing the DFT, or its inverse (the IDFT), when N = 2^k.
  • The IDFT is a Fourier series that uses the DTFT samples as coefficients of complex sinusoids at the corresponding DTFT frequencies.

 

The first step is to move from simple synthesis to complex synthesis.

 

What's wrong with the DCT?

The properties of sinusoids (sine waves): frequency, phase, and amplitude

We need a way to store phase and amplitude in the same place. That is what complex numbers are for.

  • A complex number is basically two numbers stuck together.
    • Think of them as a way of storing phase and amplitude in one place, with associated mathematics to work with the complex numbers similarly to how we work with 'simple' numbers.
      → Complex numbers have a real and an imaginary part (two numbers).

Complex numbers in Python:
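The notes leave this example blank, so here is a minimal sketch of Python's built-in complex numbers:

import cmath

z = 3 + 4j               # 'j' is the imaginary unit in Python
print(z.real, z.imag)    # 3.0 4.0 (the two numbers stuck together)
print(abs(z))            # 5.0 (amplitude / magnitude)
print(cmath.phase(z))    # 0.9272... (phase in radians)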

 

We need a way to compute waveforms from complex numbers: the exponential function

  • numpy.exp(x) is the exponential function, or
  • e^x (where e is Euler's number: approximately 2.718281)

→ numpy.exp(1j * x) converts x into an imaginary number and then applies exp; by Euler's formula, e^(ix) = cos(x) + i·sin(x), packing a sinusoid's amplitude and phase into one complex value!
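A quick check of Euler's formula (not part of the original notes):

import numpy as np
x = np.linspace(0, 2 * np.pi, 8)
print(np.allclose(np.exp(1j * x), np.cos(x) + 1j * np.sin(x)))  # True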

Simple synthesis:

import numpy as np

def synthesise_simple(amps, fs, ts):
    args = np.outer(ts, fs)        # every (time, frequency) combination
    M = np.cos(np.pi*2 * args)     # a cosine for each combination
    ys = np.dot(M, amps)           # weighted sum of the cosines
    return ys

Complex synthesis:

def synthesise_complex(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * np.pi*2 * args) # Swap out from np.cos(np.pi*2 * args)
    ys = np.dot(M, amps)
    return ys
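For example (assuming numpy is imported as above), synthesising two made-up components gives a complex-valued signal whose real part is the waveform:

fs = np.array([100, 200])            # component frequencies in Hz
amps = np.array([0.6, 0.4])          # their amplitudes
ts = np.arange(8000) / 8000          # one second of samples at 8 kHz
ys = synthesise_complex(amps, fs, ts)
print(ys.dtype)                      # complex128; ys.real is the waveform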

 

The complex analysis problem:

  • Given a signal and a set of frequencies, how can we find the amplitude and phase of each frequency component?

Simple analysis with numpy.linalg.solve and the DCT:

def analyse_simple(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(np.pi*2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
    
# More constrained version
def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    args = np.outer(ts, fs)
    M = np.cos(np.pi*2 * args)
    amps = np.dot(M, ys) / (N / 2)
    return amps

Complex analysis with np.linalg.solve:

def analyse_complex(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * np.pi*2 * args)
    amps = np.linalg.solve(M, ys)
    return amps

def analyse_nearly_dft(ys, fs, ts):
    N = len(fs)
    args = np.outer(ts, fs)
    M = np.exp(1j * np.pi*2 * args)
    amps = M.conj().transpose().dot(ys) / N # Swap out from 'amps = np.linalg.solve(M, ys)'
    return amps

Final steps for the actual DFT:

# Calculate the frequency and time matrix
def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * np.pi*2 * args)
    return M
    
# Transform
def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys) # No more '/ N'!
    return amps
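As a quick sanity check (assuming numpy is imported as above), this hand-built dft agrees with numpy's built-in FFT:

ys = np.random.random(8)
print(np.allclose(dft(ys), np.fft.fft(ys)))     # True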

 

Fast convolution with the DFT

  • Convolving signals in the time domain is equivalent to multiplying their Fourier transforms in the frequency domain.
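A minimal sketch of this idea (not part of the original notes): zero-pad both signals to the full linear-convolution length, multiply their DFTs, and transform back; the result matches direct convolution. The signals are invented for illustration.

import numpy as np

xs = np.array([1.0, 2.0, 3.0, 0.5])
hs = np.array([0.5, 0.25, 0.25])
n = len(xs) + len(hs) - 1                       # linear convolution length
fast = np.fft.ifft(np.fft.fft(xs, n) * np.fft.fft(hs, n)).real
print(np.allclose(fast, np.convolve(xs, hs)))   # True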

The inverse DFT

def idft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.dot(ys) / N
    return amps
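A quick round-trip check (assuming numpy is imported as above): idft undoes dft, and matches numpy's inverse FFT:

ys = np.random.random(8)
print(np.allclose(idft(dft(ys)), ys))           # True (round trip)
print(np.allclose(idft(ys), np.fft.ifft(ys)))   # True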

 
