NLP tasks can be categorised by problem type:
- Classification
  - Sentiment classification
  - News categorisation
- Regression
  - Essay scoring
- Sequence labelling
  - Part-of-speech tagging, named entity recognition
How do we evaluate such models? Here is an example of a classification problem:
- Imagine we are building a spam classifier
  - Predict whether email messages will be filtered or not
- Input = feature matrix (email messages)
- Output = target vector (yes/no)
- Model could be Naive Bayes, k-nearest neighbour, etc.
- This is a binary classification problem
In this case, the goal is to predict 'spam' or 'not spam' for email messages. As before:
- Choose a class of model
- Set model hyperparameters
- Configure the data (X and y)
- Fit the model to the data
- Apply model to new (unseen) data
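A minimal sketch of these five steps, assuming scikit-learn and a tiny made-up set of email messages (the texts, labels, and the choice of `MultinomialNB` are illustrative, not part of the original notes):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up toy data: email messages and their spam labels
emails = ["win a free prize now", "meeting at 3pm tomorrow",
          "claim your free reward today", "project report attached"]
labels = ["spam", "not spam", "spam", "not spam"]  # target vector y

# 1-2. Choose a class of model and set its hyperparameters
model = MultinomialNB(alpha=1.0)

# 3. Configure the data: turn raw text into a feature matrix X
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# 4. Fit the model to the data
model.fit(X, labels)

# 5. Apply the model to new (unseen) data
new_email = ["free prize waiting for you"]
print(model.predict(vectorizer.transform(new_email)))  # e.g. ['spam']
```

Naive Bayes is only one of the options listed above; k-nearest neighbour or any other classifier with a `fit`/`predict` interface would slot into the same workflow.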
To measure performance, we should consider several factors, including:
- Metric(s)
  - These are quantitative measures that assess how well a model performs.
  - A common metric is accuracy, which is calculated as the number of correct predictions divided by the total number of predictions (n).
- Balance of the dataset
  - This refers to the distribution of classes within your data.
  - An imbalanced dataset can skew the performance metrics, so it's important to consider this factor as well: on an unbalanced dataset, we can achieve high accuracy simply by always predicting the majority class (see the sketch below).
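A quick numeric sketch of that last point (the 95/5 split is an assumed example, not from the notes): a classifier that always predicts the majority class already scores 95% accuracy while missing every positive case.

```python
from sklearn.metrics import accuracy_score

# Assumed example: 95 negative cases and 5 positive cases (imbalanced)
y_true = [0] * 95 + [1] * 5

# Trivial baseline: always predict the majority class (0)
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95, yet no positive case is ever found
```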
Another example of a classification problem:
- Imagine you work in a hospital
  - Predict whether a CT scan shows a tumour or not
  - The cost of missing a tumour is much higher than a 'false alarm'
- Tumours are rare events, so the classes are unbalanced
- Accuracy is not a good metric
In this case, a confusion matrix can be used to compare the predicted values with the actual values (ground truth):
| Outcome | Predicted | Actual |
|---|---|---|
| True Positive (TP) | Positive | Positive |
| False Positive (FP) | Positive | Negative |
| False Negative (FN) | Negative | Positive |
| True Negative (TN) | Negative | Negative |

| Confusion Matrix | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | TP | FP |
| Predicted Negative | FN | TN |
- Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Recall = TP / (TP + FN)
  - Recall is the proportion of actual positive values that are predicted positive
- Precision = TP / (TP + FP)
  - Precision is the proportion of predicted positive values that are actually positive
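These metrics can be checked directly against a confusion matrix; below is a short sketch using scikit-learn with made-up labels and predictions (1 = tumour, 0 = no tumour):

```python
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score

# Made-up ground truth and predictions (1 = tumour, 0 = no tumour)
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

# For binary labels, confusion_matrix(...).ravel() returns TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy :", (tp + tn) / (tp + tn + fp + fn))  # same as accuracy_score(y_true, y_pred)
print("Recall   :", tp / (tp + fn))                   # same as recall_score(y_true, y_pred)
print("Precision:", tp / (tp + fp))                   # same as precision_score(y_true, y_pred)
```

In the tumour scenario above, recall is the metric to watch, since a false negative (a missed tumour) is far more costly than a false positive.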