NLP tasks:

  1. Classification taks (e.g. spam detection)
  2. Sequence taks (e.g. text generation)
  3. Meaning tasks

A lexical database:

  • Nodes are synsets
  • Correspond to abstract concepts
  • Ployhierarchical structure
    • A polyhierarchical structure is one that allows multiple parents.

According to WordNet which is a large lexical database of English, a synset or synonym set is defined as a set of one or more synonyms that are interchangeable in some context without changing the truth value of the proposition in which they are embedded.

 

Using WordNet, we can programmatically:

  • Identify hyponyms (child terms) and hypernyms (parent terms)
  • Measure semantic similarity

The process of classifying words into their parts of speech and labelling them accordingly is known as parts-of-speech tagging, POS-tagging, or simply tagging. Parts-of-speech are also known as word classes or lexical categories.

  • POS-tagger processes a sequence of words, and attaches a part of speech tag to each word.
  • The collection of tags used for a particular task is known as a tagset.

 

'NaturalLanguageProcessing > Concept' 카테고리의 다른 글

(w08) Vector semantics and embeddings  (0) 2024.05.27
(w06) N-gram Language Models  (0) 2024.05.14
(w04) Regular expression  (0) 2024.04.30
(w03) Text processing fundamentals  (0) 2024.04.24
(w02) NLP evaluation -basic  (0) 2024.04.17

+ Recent posts