Natural Language Processing (NLP) is informed by a number of perspectives, with several disciplines contributing to it:
- Computer/data science
- Theoretical foundation of computation and practical techniques for implementation
- Information science
- Analysis, classification, manipulation, retrieval and dissemination of information
- Computational Linguistics
- Use of computational techniques to study linguistic phenomena
- Cognitive science
- Study of human information processing (perception, language, reasoning, etc.)
NLP adopts multiple paradigms:
- Symbolic approaches
- Rule-based, hand coded (by linguists/subject matter experts)
- Knowledge-intensive
- Statistical approaches
- Distributional & neural approaches, supervised or unsupervised
- Data-intensive
NLP applications:
- Text categorisation
- Media monitoring
- Classify incoming news stories
- Search engines
- Classify query intent, e.g. search for 'LOG313'
- Spam detection
- Machine translation
- Fully automatic, e.g. Google translate
- Semi-automated
- Helping human translators
- Text summarisation
: to manage information overload, we need to abstract text down to its most important elements, i.e. summarise it
- Single-document vs. multi-document
- Search results
- Word processing
- Research/analysis tools
- Dialog systems
- Chatbots
- Smart speakers
- Smartphone assistants
- Call handling systems
- Travel
- Hospitality
- Banking
- Sentiment analysis
: identify and extract subjective information
- Several sub-tasks:
- Identify polarity
e.g. of movie reviews: positive, negative, or neutral
- Identify emotional states
e.g. angry, sad, happy, etc.
- Subjectivity/objectivity identification
e.g. distinguishing "fact" from opinion
- Feature/aspect-based sentiment analysis
: differentiate between specific features or aspects of entities
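As an illustration of polarity identification, a minimal lexicon-based sketch follows; the word lists are invented for the example rather than taken from any standard sentiment lexicon:

```python
# Minimal lexicon-based polarity scorer (illustrative word lists only)
POSITIVE = {"good", "great", "excellent", "enjoyable", "brilliant"}
NEGATIVE = {"bad", "boring", "awful", "terrible", "dull"}

def polarity(text: str) -> str:
    tokens = text.lower().split()
    # Count positive hits minus negative hits
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("A brilliant and enjoyable film"))  # positive
print(polarity("Boring plot and awful acting"))    # negative
```

Real systems go far beyond this, handling negation ("not good"), intensity, and context, typically with supervised or neural models.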
- Text mining
- Analogy with Data Mining
- Discover or infer new knowledge from unstructured text resources
- A<->B and B<->C
- Infer A<->C?
e.g. a link between migraine headaches and magnesium deficiency (a minimal sketch of this inference appears below)
- Applications in life sciences, media/publishing, counter terrorism and competitive intelligence
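A minimal sketch of this ABC inference pattern (Swanson-style literature-based discovery); the co-occurrence pairs below are invented for illustration:

```python
from itertools import combinations

# Term co-occurrence links as extracted from documents (illustrative data)
links = {("migraine", "serotonin"), ("serotonin", "magnesium"),
         ("migraine", "stress")}

def neighbours(term):
    """All terms directly linked to `term`."""
    return {b for a, b in links if a == term} | {a for a, b in links if b == term}

# ABC inference: A and C share a bridge term B but are not directly linked,
# so A<->C becomes a candidate hypothesis for a human expert to verify.
terms = {t for pair in links for t in pair}
for a, c in combinations(sorted(terms), 2):
    if (a, c) not in links and (c, a) not in links:
        bridges = neighbours(a) & neighbours(c)
        if bridges:
            print(f"candidate link: {a} <-> {c} via {bridges}")
```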
- Question answering
- Going beyond the document retrieval paradigm
: provide specific answers to specific questions
- Natural language generation
- Speech recognition & synthesis
…and lots more
History of NLP
- Foundational Insights: 1940s and 1950s
- Two foundational paradigms:
1. The automaton, which is the essential information processing unit
2. Probabilistic or information-theoretic models
- The automaton arose out of Turing’s (1936) model of algorithmic computation
- Chomsky (1956) considered finite state machines as a way to characterise a grammar
: he was one of the first people to use these ideas
- Shannon (1948) borrowed the concept of entropy from thermodynamics
: Entropy is a measure of uncertainty: the higher the entropy, the greater the uncertainty
- As a way of measuring the information content of a language
- Shannon measured the entropy of English using probabilistic techniques
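A minimal sketch of a unigram (single-character) entropy estimate in this spirit; Shannon's actual estimates of English used much richer n-gram statistics and human prediction experiments:

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """H = -sum_x p(x) * log2 p(x), in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Evenly mixed symbols give high entropy; repetitive text approaches 0 bits.
print(unigram_entropy("abab"))   # 1.0 bit/char (two equally likely symbols)
print(unigram_entropy("aaaa"))   # 0 bits/char (a single symbol carries no information)
print(unigram_entropy("the cat sat on the mat"))
```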
- Two camps: 1960s and 1970s
- Speech and language processing split into two paradigms:
1. Symbolic:
- Chomsky and others on parsing algorithms
- Artificial intelligence (1956) work on reasoning and logic
- Early natural language understanding (NLU) systems:
- Single-domain pattern matching
- Keyword search
- Heuristics for reasoning
2. Statistical (stochastic)
- Mosteller and Wallace (1964) applied Bayesian methods to the problem of authorship attribution of The Federalist Papers
- Early NLP systems
: ELIZA and SHRDLU were highly influential early NLP systems
- ELIZA
- Wiezenbaum 1966
- Pattern matching (ELIZA used elementary keyword spotting techniques)
- First chatbot
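A minimal sketch of ELIZA-style keyword spotting and response templates; the rules below are invented for illustration and are far simpler than Weizenbaum's DOCTOR script:

```python
import re

# ELIZA-style rules: a keyword pattern and a response template.
# These rules are illustrative, not Weizenbaum's originals.
RULES = [
    (re.compile(r"\bI am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.I), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me more about your {0}."),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Reuse the matched fragment in the canned response
            return template.format(*match.groups())
    return "Please go on."  # default when no keyword matches

print(respond("I am unhappy about my job"))  # Why do you say you are unhappy about my job?
print(respond("I feel tired"))               # How long have you felt tired?
```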
- SHRDLU
- Winograd 1972
- Natural language understanding
- Comprehensive grammar of English
SHRDLU simulated a robot embedded in an imaginary "blocks world" of toy objects; the user could interact with this world by asking questions and giving commands.
- Further developments in the 1960s
- First text corpora (corpora is plural of corpus)
- The Brown corpus: a one-million-word collection of samples from 500 written texts from different genres (newspaper, novels, non-fiction, academic, etc.), assembled at Brown University in 1963-64 (Kučera and Francis, 1967; Francis, 1979; Francis and Kučera, 1982), and William S. Y. Wang’s 1967 DOC (Dictionary on Computer)
- Empiricism: 1980s and 1990s
: The rise of the WWW emphasised the need for language-based information retrieval and information extraction.
- The return of two classes of models that had lost popularity:
1. Finite-state models:
- Finite-state morphology by Kaplan and Kay (1981) and models of syntax by Church (1980)
2. Probabilistic and data-driven approaches:
- From speech recognition to part-of-speech tagging, parsing and semantics
- Model evaluation
- Quantitative metrics, comparison of performance with previous published research
- Regular competitive evaluation exercises such as the Message Understanding Conferences (MUC)
- The rise of machine learning: 2000s
: Large amounts of spoken and written language data became available, including annotated collections
e.g. Penn Treebank (Marcus et al. 1993)
- Traditional NLP problems, such as parsing and semantic analysis, became problems for supervised learning
- Unsupervised statistical approaches began to receive renewed attention
- Statistical approaches to machine translation (Brown et al., 1990; Och and Ney, 2003) and topic modelling (Blei et al., 2003) demonstrated that effective applications could be constructed from systems trained on unannotated data
- Cost and difficulty of producing annotated corpora became a limiting factor for supervised approaches
- Ascendance of deep learning: 2010s onwards
- Deep learning methods have become pervasive in NLP and AI in general
- Advances in technology such as GPUs developed for gaming
- Plummeting costs of memory
- Wide availability of software platforms
- Classic ML methods require analysts to select features based on domain knowledge
- Deep learning introduced automated feature engineering: features are generated by the learning system itself
- Collobert et al. (2011) applied convolutional neural nets (CNNs) to POS tagging, chunking, named entity tagging and language modelling
- CNNs unable to handle long-distance contextual information
- Recurrent neural networks (RNNs) process items as a sequence with a "memory" of previous inputs
: This makes them well suited to sequence labelling tasks (a minimal sketch of an RNN cell appears after the list below).
- Applicable to many tasks such as:
- Word-level: named entity recognition, language modelling
- Sentence-level: sentiment analysis, selecting responses to messages
- Language generation for machine translation, image captioning, etc.
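A minimal numpy sketch of a vanilla RNN cell; the dimensions and random weights are illustrative stand-ins for what a trained network would learn:

```python
import numpy as np

rng = np.random.default_rng(0)

# A single vanilla RNN cell: the hidden state h carries a "memory"
# of everything seen so far in the sequence.
d_in, d_hid = 4, 8
W_xh = rng.normal(scale=0.1, size=(d_in, d_hid))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))  # hidden -> hidden (the recurrence)
b_h = np.zeros(d_hid)

def rnn_forward(inputs):
    h = np.zeros(d_hid)                # initial memory is empty
    states = []
    for x in inputs:                   # one step per sequence item
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        states.append(h)               # per-step states support sequence labelling
    return states                      # states[-1] summarises the whole sequence

sequence = [rng.normal(size=d_in) for _ in range(5)]  # e.g. 5 word embeddings
states = rnn_forward(sequence)
print(len(states), states[-1].shape)   # 5 (8,)
```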
- RNNs are supplemented with long short-term memory (LSTM) or gated recurrent unit (GRU) cells to mitigate the 'vanishing gradient problem' and improve training performance.