Dice Coefficient (also known as Dice Similarity Coefficient or DSC) measures the overlap between the predicted segmentation mask and the ground truth mask. It ranges from 0 (no overlap) to 1 (perfect overlap).
Jaccard Index (also called Intersection over Union, IoU) measures the size of the intersection divided by the size of the union of the predicted and ground truth masks.
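Both metrics are simple to compute directly. A minimal sketch with NumPy, assuming pred and gt are binary masks of the same shape (the names are placeholders, and the degenerate case of two empty masks is ignored):
import numpy as np
def dice_and_iou(pred, gt):
    # pred and gt: boolean (or 0/1) arrays of the same shape
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2 * intersection / (pred.sum() + gt.sum())
    iou = intersection / union
    return dice, iou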
Offline speech recognition that runs in real time on mobile devices, ported from the CMUSphinx project
How to use pocketsphinx:
pip install pocketsphinx
# Pocketsphinx on live input
from pocketsphinx import LiveSpeech
for phrase in LiveSpeech(): print(phrase)
# Pocketsphinx for keywords
from pocketsphinx import LiveSpeech
speech = LiveSpeech(lm=False, keyphrase='move forward', kws_threshold=1e-20)
for phrase in speech: print(phrase.segments(detailed=True))
# Specify phrases in an external file
from pocketsphinx import LiveSpeech
speech = LiveSpeech(lm=False, kws='./kws.txt')
for phrase in speech: print(phrase.segments(detailed=True))
# File contents:
# move forward /1e-40/
# go backwards /1e-40/
# turn left /le-20/
# turn right /le-20/
# Pocketsphinx and audio files
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx()
ps.decode(audio_file='nines.wav')
ps.hypothesis()
ps.confidence()
ps.best(count=4)
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx(lm=False, kws='./kws.txt')
ps.decode(audio_file='nines.wav')
ps.hypothesis()
vosk
An easy-to-install API which can run efficient offline Kaldi models
A neat wrapper around Kaldi models
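A minimal sketch of file transcription with Vosk; the model path './model' and the file name 'nines.wav' are placeholders (a Vosk model must first be downloaded and unpacked, and the WAV should be mono 16-bit):
import json
import wave
from vosk import Model, KaldiRecognizer
model = Model('./model')  # path to an unpacked Vosk model directory
wf = wave.open('nines.wav', 'rb')
rec = KaldiRecognizer(model, wf.getframerate())
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)  # feed the audio chunk by chunk
print(json.loads(rec.FinalResult())['text'])  # final transcription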
kaldi
Large, open-source collection of components for constructing ASR systems based on finite-state transducers
Finite-state transducer: intuitively, a simplified version of an HMM (Hidden Markov Model) → The tagging speed when using transducers is up to five times higher than when using the underlying HMMs. The main advantage of transforming an HMM is that the resulting transducer can be handled by finite-state calculus.
Mozilla DeepSpeech
An open-source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers
Looking at the algorithms and functions in the natural world and thinking there must be something we can use computationally here
According to Wikipedia, bio-inspired computing (short for biologically inspired computing) is a field of study which seeks to solve computer science problems using models of biology.
It relates to connectionism, social behaviour, and emergence.
Within computer science, bio-inspired computing relates to artificial intelligence (AI) and machine learning (ML).
Bio-inspired computation has emerged as one of the most studied branches of AI during the last decades, according to the reference:
Del Ser, Javier, et al. "Bio-inspired computation: Where we stand and what's next." Swarm and Evolutionary Computation 48 (2019): 220-250.
Brief overview of bio-inspired computing:
1950s: Cybernetics
"the science of control and communication, in the animal and the machine" - Weiner, 1948
"Co-ordination, regulation and control will be its themes, for these are of the greatest biological and practical interest" - Ashby, 1961
1960s: Connectionism (neural networks)
"The perceptron is a minimally constrained 'nerve net' consisting of logically simplified neural elements, which has been shown to be capable of learning to discriminate and to recognise perceptual patterns" - F. Rosenblatt, 'Perceptron Simulation Experiments,' in Proceedings of the IRE, vol. 48, no. 3, pp. 301-309, March 1960, doi: 10.1109/JRPROC.1960.287598
1970s: Genetic algorithms
The algorithms were introduced in the US by John Holland at the University of Michigan
1980s: Artificial life
1990s: Onwards and upwards!
Genetic algorithms (GAs) are a type of search algorithm inspired by natural selection. They mimic the process of evolution to find optimal solutions to problems:
GAs are probabilistic search procedures designed to work on large spaces involving states that can be represented by strings. Here, the space is the space of possible solutions to the problem.
Probabilistic search: selection and breeding
States: phenotype
Strings: genotype
The complete set of genetic material, including all chromosomes, is called a genome.
The genotype encodes the range of characteristics of the individual: the DNA (the set of genes in the genome).
DNA is passed from one generation to the next: heredity
The phenotype is the actual individual that manifests in the world.
Genotype is set at birth, determined by the DNA inherited from the parents. It's fixed at conception. Phenotype is influenced by both genotype and environment. While genotype provides the blueprint, the environment plays a role in shaping the phenotype. For example, even with a genotype for tallness, nutrition can affect how tall someone becomes.
Nature vs. Nurture:
DNA = Nature
Environment = Nurture, defining where in these ranges the characteristics actually fall
The environment plays a crucial role in shaping evolution. It exerts selective pressure on phenotypes, favouring those that are better suited to survive and reproduce.
These successful phenotypes, with their underlying genotypes, are more likely to be passed on to future generations (heredity).
GAs are inspired by the principles of Darwinian evolution. In GAs, we simulate a population of individuals with varying traits (analogous to phenotypes):
Population and variation: We start with a population of candidate solutions, each representing a potential answer to the problem. Each solution has its own unique set of characteristics (like genes in an organism).
Selection: We then select solutions that perform well based on a defined fitness function (similar to how successful phenotypes survive in nature).
Reproduction: These 'fit' solutions are then used to create new solutions (like breeding in evolution). Techniques like crossover (combining characteristics) and mutation (introducing variations) are used to mimic the processes of inheritance and random genetic changes, as shown in the sketch below.
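To make this loop concrete, here is a minimal, hedged sketch of a GA in Python; the bit-string genomes, the 'one-max' fitness function (count of 1s), and all parameter values are illustrative choices, not a standard library:
import random
def run_ga(fitness, genome_length=20, pop_size=50, generations=100, mutation_rate=0.01):
    # Population and variation: random bit-string genotypes
    pop = [[random.randint(0, 1) for _ in range(genome_length)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        weights = [s + 1e-9 for s in scores]  # avoid an all-zero roulette wheel
        next_pop = []
        while len(next_pop) < pop_size:
            # Selection: fitness-proportionate ('roulette wheel')
            a = random.choices(pop, weights=weights, k=1)[0]
            b = random.choices(pop, weights=weights, k=1)[0]
            # Reproduction: single-point crossover plus per-bit mutation
            cut = random.randrange(1, genome_length)
            child = a[:cut] + b[cut:]
            child = [1 - bit if random.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)
best = run_ga(fitness=sum)  # 'one-max': fitness is the number of 1s
print(best, sum(best))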
Advantages of GAs:
Finding optimal solutions : A key advantage of GAs is their ability to locate both local and global maxima (points of highest fitness) within a given search space. This makes them superior to older 'hill-climbing' algorithms that can get stuck at local maxima.
Exploring combinations : GAs go beyond simply testing individual components. They employ a technique called hyperplane sampling. This essentially means they evaluate various different combinations of these components, mimicking how different genes interact in an organism. This allows GAs to explore a broader range of potential solutions and potentially discover more optimal combinations.
How GAs work:
Selection: Imagine a roulette wheel where each slice represents a member of the population. The size of a slice is determined by a 'fitness function' that evaluates how well that member solves the problem.
The fitter a member, the larger its slice, giving it a higher chance of being selected for reproduction. This mimics how natural selection favours organisms better suited to their environment.
Hyperplane sampling and schemata: GAs don't just evaluate individual components of a solution, like bricks in a wall. They can also test different combinations of these components (like building different wall structures).
This allows them to find better overall solutions by exploring how different components work together. The schema theorem is a complex concept that supports this ability of GAs.
Parallelism: GAs can leverage parallelism to speed up the search process. There are two main types:
Implicit parallelism: This uses the population model itself to explore multiple solutions simultaneously. → Imagine pairs competing in a tournament, with the winners progressing to the next round. With implicit parallelism alone, pairings of individuals are still evaluated in sequence, one at a time.
Computational parallelism: If you have a computer with multiple cores, you can use them to evaluate several combinations of individuals at the same time, significantly speeding up the search; how many you can evaluate in parallel depends on how many cores your processor has (see the sketch below).
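A small hedged sketch of the computational kind, using Python's standard multiprocessing module (the genomes and fitness function are toy placeholders, matching the GA sketch above):
from multiprocessing import Pool
def fitness(genome):
    return sum(genome)  # toy fitness, as in the GA sketch above
if __name__ == '__main__':
    population = [[i % 2] * 20 for i in range(50)]  # placeholder genomes
    with Pool() as pool:
        scores = pool.map(fitness, population)  # fitness evaluations spread across cores
    print(scores[:5])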
A Fourier Transform (FT) is an integral transform that takes a function as input and outputs another function that describes the extent to which various frequencies are present in the original function.
Discrete Fourier Transform (DFT)
Since the real world deals with discrete data (samples), the DFT is a crucial tool. It's the discrete version of the FT, specifically designed to analyse finite sequences of data points like those captured by computers.
The DFT converts a finite sequence of equally spaced samples of a function into a same-length sequence of equally spaced samples of the discrete-time Fourier transform (DTFT), which is a complex-valued function of frequency. In other words, the DFT's output is a set of samples of the DTFT.
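Written out, for an input sequence $x_0, \dots, x_{N-1}$, the DFT in the standard sign convention (the one the code at the end of this section implements) is:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k = 0, 1, \dots, N-1$$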
Discrete Cosine Transform (DCT)
The DCT is closely related to the DFT. While DFT uses both sines and cosines (complex functions), DCT focuses solely on cosine functions. This makes DCT computationally simpler and often preferred for tasks where the data has even symmetry (like audio signals).
DCT expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.
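As a quick hedged illustration (assuming SciPy is installed; scipy.fft.dct computes the common type-II DCT):
import numpy as np
from scipy.fft import dct
ys = np.array([1.0, 2.0, 3.0, 4.0])
amps = dct(ys, type=2, norm='ortho')  # cosine-only coefficients
print(amps)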
Fast Fourier Transform (FFT)
The DFT is powerful, but calculating it directly can be computationally expensive for large datasets. This is where the Fast Fourier Transform (FFT) comes in. It's a highly optimized algorithm specifically designed to compute the DFT efficiently, especially when the data length is a power of 2 (like 16, 32, 64 etc.).
The FFT is not a theoretical transform. It is just a fast algorithm to implement the transforms when N=2^k.
FFT is an algorithm that computes the Discrete Fourier Transform (DFT) of a sequence, or its inverse (IDFT). IDFT is a Fourier series, using the DTFT samples as coefficients of complex sinusoids at the corresponding DTFT frequencies.
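As a quick hedged check with NumPy: np.fft.fft should agree with a naive, O(N^2) evaluation of the DFT sum:
import numpy as np
ys = np.random.randn(8)  # length is a power of 2
N = len(ys)
direct = np.array([sum(ys[n] * np.exp(-2j * np.pi * k * n / N) for n in range(N)) for k in range(N)])
print(np.allclose(np.fft.fft(ys), direct))  # True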
The first step is to move from simple synthesis to complex synthesis.
What's wrong with the DCT?
The properties of sinusoids (sine waves): frequency, phase, and amplitude
We need a way to store phase and amplitude in the same place. That is called a complex number.
A complex number is basically two numbers stuck together.
Think of them as a way of storing phase and amplitude in one place, with associated mathematics to work with the complex numbers similarly to how we work with 'simple' numbers. → Complex numbers have a real and an imaginary part (two numbers).
Complex numbers in Python:
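For example (using only Python's built-in complex numbers, plus numpy.angle for the phase):
import numpy as np
z = 3 + 4j               # a complex number: real part 3, imaginary part 4
print(z.real, z.imag)    # 3.0 4.0
print(abs(z))            # amplitude (magnitude): 5.0
print(np.angle(z))       # phase in radians: ~0.927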
We need a way to compute waveforms from complex numbers: the exponential function
numpy.exp(x) is the exponential function, or
e^x (where e is Euler's number: approximately 2.718281)
→ numpy.exp(1j * x) converts x into a complex number then does exp!
import numpy as np

def analyse_nearly_dft(ys, fs, ts):
    N = len(fs)
    args = np.outer(ts, fs)
    M = np.exp(1j * np.pi * 2 * args)
    amps = M.conj().transpose().dot(ys) / N  # swapped in for 'amps = np.linalg.solve(M, ys)'
    return amps
Final steps for the actual DFT:
# Calculate the frequency and time matrix
def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * np.pi * 2 * args)
    return M
# Transform
def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys)  # No more '/ N'!
    return amps
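A quick hedged sanity check: this dft uses the same sign convention as NumPy's FFT, so the two should agree:
ys = np.random.randn(8)
print(np.allclose(dft(ys), np.fft.fft(ys)))  # True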
Fast convolution with the DFT
Convolving signals in the time domain is equivalent to multiplying their Fourier transforms in the frequency domain.
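A small hedged illustration with NumPy; note that for finite sequences the DFT convolution theorem gives circular convolution:
import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.5, 0.0, 0.0])
N = len(x)
fast = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real  # multiply spectra, then invert
direct = np.array([sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)])  # circular convolution
print(np.allclose(fast, direct))  # True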
The inverse DFT
def idft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.dot(ys) / N  # synthesis: no conjugate, and the '/ N' normalisation returns
    return amps
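A hedged round-trip check: applying idft after dft should recover the original signal up to floating-point error:
ys = np.random.randn(8)
print(np.allclose(idft(dft(ys)), ys))  # True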