pocketsphinx

  • Real-time offline speech recognition on mobile devices, ported from the CMU Sphinx project

How to use pocketsphinx:

pip install pocketsphinx

 

# Pocketsphinx on live input
from pocketsphinx import LiveSpeech
for phrase in LiveSpeech():
    print(phrase)

# Pocketsphinx for keywords
from pocketsphinx import LiveSpeech
speech = LiveSpeech(lm=False, keyphrase='move forward', kws_threshold=1e-20)
for phrase in speech:
    print(phrase.segments(detailed=True))

# Specify phrases in an external file
from pocketsphinx import LiveSpeech
speech = LiveSpeech(lm=False, kws='./kws.txt')
for phrase in speech:
    print(phrase.segments(detailed=True))

# File contents (kws.txt):
#	move forward /1e-40/
#	go backwards /1e-40/
#	turn left /1e-20/
#	turn right /1e-20/

# Pocketsphinx and audio files
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx()
ps.decode(audio_file='nines.wav')
print(ps.hypothesis())   # best transcription string
print(ps.confidence())   # confidence score for the hypothesis
print(ps.best(count=4))  # top four hypotheses with scores

# Keyword spotting on an audio file
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx(lm=False, kws='./kws.txt')
ps.decode(audio_file='nines.wav')
print(ps.hypothesis())

 

vosk

  • An easy-to-install API that can run efficient Kaldi models offline
  • A neat wrapper around Kaldi models
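
A minimal transcription sketch with vosk, assuming a model has been downloaded and unzipped to ./model and that test.wav is a 16 kHz mono 16-bit PCM file (both names are placeholders):

# Transcribe a mono 16-bit PCM WAV file with vosk (sketch)
import json
import wave
from vosk import Model, KaldiRecognizer

wf = wave.open('test.wav', 'rb')                 # placeholder: 16 kHz mono 16-bit PCM
model = Model('model')                           # placeholder: unzipped vosk model directory
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)                     # feed the audio chunk by chunk

print(json.loads(rec.FinalResult())['text'])     # final transcription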

kaldi

  • A large, open-source collection of components for constructing ASR systems based on finite-state transducers
    • Finite-state transducer
      : Intuitively, a simplified version of an HMM (Hidden Markov Model)
      → Tagging with transducers is up to five times faster than with the underlying HMMs; the main advantage of transforming an HMM is that the resulting transducer can be handled by finite-state calculus.

Mozilla DeepSpeech

  • An open-source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers
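
A minimal file-transcription sketch, assuming the deepspeech Python package and a released model file (the .pbmm name below is from the 0.9.3 release; adjust to whatever model you have, and use a 16 kHz mono 16-bit WAV):

# Transcribe a WAV file with DeepSpeech (sketch)
import wave
import numpy as np
import deepspeech

model = deepspeech.Model('deepspeech-0.9.3-models.pbmm')  # released model file (adjust path)
wf = wave.open('audio.wav', 'rb')                         # placeholder: 16 kHz mono 16-bit WAV
audio = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
print(model.stt(audio))                                   # transcription string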

 


Words with similar contexts have similar meanings

  • Zellig Harris (1954): 'If A and B have almost identical environments, we say that they are synonyms' (e.g. doctor and surgeon share contexts such as patient, hospital, and treatment)
    → This notion is referred to as the distributional hypothesis

Distributional hypothesis

  • Is concerned with the link between similar distributions and similar meanings
  • Assumes words with similar contexts have similar meanings

 

Distributional models are based on a co-occurrence matrix

  • Term-document matrix
  • Term-term matrix

Term-document matrix: the overall matrix is |V| by |D|

  • Similar documents have similar words: represented by the column vectors
  • Similar words occur in similar documents: represented by the row vectors

Term-term matrix: the overall matrix is |V| by |V|

  • Represents co-occurrences in some corpus
  • Context is usually restricted to a fixed window (e.g. +/- 4 words)
  • Similar terms have similar vectors
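
To make this concrete, a small sketch that builds a term-term co-occurrence matrix over a +/- 2 word window (the toy corpus below is made up):

# Build a term-term co-occurrence matrix over a fixed window (sketch)
from collections import defaultdict

corpus = [['the', 'food', 'is', 'bad'],
          ['the', 'meal', 'was', 'awful']]  # made-up toy corpus
window = 2  # look +/- 2 words around each target

counts = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for i, target in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[target][sent[j]] += 1

print(dict(counts['food']))  # context counts for the term 'food'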

Problems with term-term matrices:

  • Term-term matrices are sparse
    • Term vectors are long (length |V|)
    • Most entries are zero

Counts don't reflect underlying linguistic structure: 'food is bad' and 'meal was awful' mean much the same thing, yet they share no words, so their count vectors look unrelated.

 

Word embeddings

  • Represent words as low-dimensional vectors
  • Capture semantic and syntactic similarities between words (e.g. food|meal, bad|awful, etc)
  • Typically have 50-300 dimensions rather than |V| (the vocabulary size)
  • Most vector elements are non-zero

Benefits

  • Classifiers need to learn far fewer weights
    → significantly reduce the number of parameters for classifiers
  • Improve generalisation and prevent overfitting
  • Capture semantic relationships like synonymy

Word2Vec software package: static embeddings (unlike contextual embeddings such as BERT or ELMo)

  • Key idea
    • Predict rather than count
    • Binary prediction task: 'Is word x likely to co-occur with word y?'
    • Use the learned classifier weights
    • Running text is the training data
  • Basic algorithm (skip-gram with negative sampling)
    • Treat neighbouring context words as positive samples
    • Treat other random words in the vocabulary (V) as negative samples
      → negative samples are randomly selected words that did not appear in the context window
    • Train a logistic regression classifier to distinguish these classes
    • Use learned weights as embeddings
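
A minimal training sketch, assuming the gensim package (4.x API) and a made-up toy corpus; in gensim, sg=1 with negative > 0 selects skip-gram with negative sampling:

# Train skip-gram with negative sampling using gensim (sketch)
from gensim.models import Word2Vec

sentences = [['the', 'food', 'is', 'bad'],
             ['the', 'meal', 'was', 'awful'],
             ['the', 'food', 'was', 'awful']]  # made-up toy corpus

model = Word2Vec(sentences,
                 vector_size=50,  # embedding dimensionality
                 window=4,        # +/- 4 word context window
                 sg=1,            # skip-gram
                 negative=5,      # 5 negative samples per positive
                 min_count=1)

print(model.wv.most_similar('food'))  # nearest neighbours in embedding space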

The benefits of using word embeddings compared to traditional vector representations:

  1. They often offer better generalisation capabilities
  2. They are more compact

 


NLP tasks:

  1. Classification tasks (e.g. spam detection)
  2. Sequence tasks (e.g. text generation)
  3. Meaning tasks

A lexical database:

  • Nodes are synsets
  • Correspond to abstract concepts
  • Polyhierarchical structure
    • A polyhierarchical structure is one that allows multiple parents.

According to WordNet, a large lexical database of English, a synset (synonym set) is a set of one or more synonyms that are interchangeable in some context without changing the truth value of the proposition in which they are embedded.

 

Using WordNet, we can programmatically:

  • Identify hyponyms (child terms) and hypernyms (parent terms)
  • Measure semantic similarity
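
For example, a short sketch using NLTK's WordNet interface (assuming nltk is installed and the wordnet corpus has been downloaded):

# Explore hypernyms, hyponyms and similarity with NLTK's WordNet (sketch)
from nltk.corpus import wordnet as wn

dog = wn.synsets('dog')[0]       # first (most common) synset for 'dog'
print(dog.hypernyms())           # parent terms, e.g. canine
print(dog.hyponyms())            # child terms, e.g. puppy

cat = wn.synsets('cat')[0]
print(dog.path_similarity(cat))  # similarity score in (0, 1]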

The process of classifying words into their parts of speech and labelling them accordingly is known as parts-of-speech tagging, POS-tagging, or simply tagging. Parts-of-speech are also known as word classes or lexical categories.

  • POS-tagger processes a sequence of words, and attaches a part of speech tag to each word.
  • The collection of tags used for a particular task is known as a tagset.
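
A minimal tagging sketch with NLTK's default tagger (assuming nltk plus its punkt and averaged_perceptron_tagger data are downloaded); the tagset here is the Penn Treebank tagset:

# POS-tag a sentence with NLTK (sketch)
from nltk import pos_tag, word_tokenize

tokens = word_tokenize('They refuse to permit us to obtain the refuse permit')
print(pos_tag(tokens))  # list of (word, tag) pairs: 'refuse' appears as both VBP and NN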

 


The meaning of 'bio-inspired' is as follows:

  • Looking at the algorithms and functions in the natural world and thinking there must be something we can use computationally here

According to Wikipedia, bio-inspired computing (short for biologically inspired computing) is a field of study which seeks to solve computer science problems using models of biology.

  • It relates to connectionism, social behaviour, and emergence.

Within computer science, bio-inspired computing relates to artificial intelligence (AI) and machine learning (ML).

 

Bio-inspired computation has emerged as one of the most studied branches of AI during the last few decades, according to the reference:

  • Del Ser, Javier, et al. "Bio-inspired computation: Where we stand and what's next." Swarm and Evolutionary Computation 48 (2019): 220-250.

Brief overview of bio-inspired computing:

  • 1950s: Cybernetics
    • "the science of control and communication, in the animal and the machine" - Wiener, 1948
    • "Co-ordination, regulation and control will be its themes, for these are of the greatest biological and practical interest" - Ashby, 1961
  • 1960s: Connectionism (neural networks)
    • "The perceptron is a minimally constrained 'nerve net' consisting of logically simplified neural elements, which has been shown to be capable of learning to discriminate and to recognise perceptual patterns" - F. Rosenblatt, 'Perceptron Simulation Experiments,' in Proceedings of the IRE, vol. 48, no. 3, pp. 301-309, March 1960, doi: 10.1109/JRPROC.1960.287598
  • 1970s: Genetic algorithms
    • The algorithms were introduced in the US by John Holland at the University of Michigan
  • 1980s: Artificial life
  • 1990s: Onwards and upwards!

Genetic algorithms (GAs) are a type of search algorithm inspired by natural selection. They mimic the process of evolution to find optimal solutions to problems:

  • GAs are probabilistic search procedures designed to work on large spaces involving states that can be represented by strings; the space is the space of possible solutions to a problem.
    • Probabilistic search: selection and breeding
    • States: phenotype
    • Strings: genotype

 

The complete set of genetic material, including all chromosomes, is called a genome.

  • The genotype encodes the range of possible characteristics of the individual in its DNA (the set of genes in the genome).
    • DNA is passed from one generation to the next: heredity
  • The phenotype is the actual individual that manifests in the world.

 

Genotype is set at birth, determined by the DNA inherited from the parents. It's fixed at conception. Phenotype is influenced by both genotype and environment. While genotype provides the blueprint, the environment plays a role in shaping the phenotype. For example, even with a genotype for tallness, nutrition can affect how tall someone becomes.

  • Nature vs. Nurture:
    • DNA = Nature
    • Environment = Nurture, defining where in these ranges the characteristics actually fall

The environment plays a crucial role in shaping evolution. It exerts selective pressure on phenotypes, favouring those that are better suited to survive and reproduce.

  • These successful phenotypes, with their underlying genotypes, are more likely to be passed on to future generations (heredity).

 

GAs are inspired by the principles of Darwinian evolution. In GAs, we simulate a population of individuals with varying traits (analogous to phenotypes):

  1. Population and Variation
    : We start with a population of candidate solutions, each representing a potential answer to the problem. Each solution has its own unique set of characteristics (like genes in an organism).
  2. Selection
    : We then select solutions that perform well based on a defined fitness function (similar to how successful phenotypes survive in nature).
  3. Reproduction
    : These 'fit' solutions are then used to create new solutions (like breeding in evolution). Techniques like crossover (combining characteristics) and mutation (introducing variations) are used to mimic the processes of inheritance and random genetic changes.
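
A minimal GA sketch tying these three steps together on a toy problem (maximise the number of 1s in a bit string; all constants below are illustrative choices):

# Minimal genetic algorithm sketch: maximise the number of 1s in a bit string
import random

LENGTH, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 50, 0.01

def fitness(genotype):
    return sum(genotype)  # count of 1 bits: higher is fitter

def crossover(a, b):
    point = random.randrange(1, LENGTH)  # single-point crossover
    return a[:point] + b[point:]

def mutate(genotype):
    return [1 - g if random.random() < MUTATION_RATE else g for g in genotype]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    weights = [fitness(p) for p in population]  # roulette-wheel slice sizes
    parents = random.choices(population, weights=weights, k=2 * POP_SIZE)
    population = [mutate(crossover(parents[2 * i], parents[2 * i + 1]))
                  for i in range(POP_SIZE)]

print(max(population, key=fitness))  # best genotype found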

 

Advantages of GAs:

  1. Finding optimal solutions
    : A key advantage of GAs is their ability to locate both local and global maxima (points of highest fitness) within a given search space. This makes them superior to older 'hill-climbing' algorithms that can get stuck at local maxima.
  2. Exploring combinations
    : GAs go beyond simply testing individual components. They employ a technique called hyperplane sampling. This essentially means they evaluate various different combinations of these components, mimicking how different genes interact in an organism. This allows GAs to explore a broader range of potential solutions and potentially discover more optimal combinations.

 

How GAs work:

  • Selection
    : Imagine a roulette wheel where each slice represents a member of the population. The size of a slice is determined by a 'fitness function' that evaluates how well that member solves the problem.
    • The fitter a member, the larger its slice, giving it a higher chance of being selected for reproduction. This mimics how natural selection favours organisms better suited to their environment.
  • Hyperplane sampling and schemata
    : GAs don't just evaluate individual components of a solution, like bricks in a wall. They can also test different combinations of these components (like building different wall structures).
    • This allows them to find better overall solutions by exploring how different components work together. The schema theorem is a complex concept that supports this ability of GAs.
  • Parallelism
    : GAs can leverage parallelism to speed up the search process. There are two main types:
    • Implicit parallelism: This uses the population model itself to explore multiple solutions simultaneously.
      → Imagine pairs competing in a tournament, with the winners progressing to the next round. In implicit parallelism, you can only evaluate pairings of individuals in sequence, one at a time.
    • Computational parallelism: If you have a computer with multiple cores, you can use them to evaluate several combinations of individuals at the same time, significantly speeding up the search.
      → In computational parallelism, you can evaluate several combinations of individuals at the same time, depending on how many cores you have on your processor.
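
A sketch of computational parallelism using Python's multiprocessing module to evaluate fitness across cores (expensive_fitness is a made-up stand-in for a costly evaluation):

# Evaluate a population's fitness in parallel across CPU cores (sketch)
from multiprocessing import Pool
import random

def expensive_fitness(genotype):
    return sum(genotype)  # stand-in for a costly evaluation

if __name__ == '__main__':
    population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    with Pool() as pool:  # one worker process per CPU core by default
        scores = pool.map(expensive_fitness, population)  # evaluated in parallel
    print(scores)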
