Text is often referred to as unstructured data.

  • Text has plenty of structure, but it is linguistic structure, intended for human consumption rather than for computers.

 

Text may contain synonyms (multiple words with the same meaning) and homographs (one spelling shared among multiple words with different meanings).

 

People write ungrammatically, misspell words, run words together, abbreviate unpredictably, and punctuate randomly.

 

Because text is intended for communication between people, context is important.

 

The general strategy in text mining is to use the simplest technique that works.

  • A document is composed of individual tokens or terms.
  • A collection of documents is called a corpus.

 

Language is ambiguous. To determine structure, we must resolve ambiguity.

  • Processing text data (sketched in code after this list):
    • Lexical analysis (tokenisation)
      : Tokenisation is the task of chopping a character sequence up into pieces, called tokens.
    • Stop word removal
      : A stopword is a very common word that carries little content on its own. Words such as the, and, of and on are considered stopwords, so they are typically removed.
    • Stemming
      : Suffixes are removed so that verbs like announces, announced and announcing are all reduced to the term announc. Stemming also transforms noun plurals to their singular forms, so directors becomes director.
    • Lemmatisation
      : A lemma is the canonical form, dictionary form, or citation form of a set of word forms. For example, break, breaks, broke, broken and breaking are forms of the same lexeme, with break as the lemma by which they are indexed. Lemmatisation is the algorithmic process of determining the lemma of a word based on its intended meaning.
    • Morphology (prefixes, suffixes, etc.)
  • The higher levels of ambiguity:
    • Syntax (part of speech tagging)
      • Ambiguity problem
    • Parsing (grammar)
    • Sentence boundary detection
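These lexical steps can be sketched with NLTK (an assumed library choice; the notes do not prescribe one, and the download names below reflect classic NLTK releases):

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the tokeniser model, stopword list and WordNet data.
for pkg in ("punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"):
    nltk.download(pkg, quiet=True)

text = "The director announced that the directors were announcing plans."

# Lexical analysis (tokenisation): chop the character sequence into tokens.
tokens = nltk.word_tokenize(text.lower())

# Stop word removal: drop very common, low-content words (the, that, were, ...).
stop = set(stopwords.words("english"))
content = [t for t in tokens if t.isalpha() and t not in stop]

# Stemming: strip suffixes, e.g. announced/announcing -> announc.
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in content])
# e.g. ['director', 'announc', 'director', 'announc', 'plan']

# Lemmatisation: map word forms to the lemma; forms WordNet does not know
# as verbs pass through unchanged.
lemmatiser = WordNetLemmatizer()
print([lemmatiser.lemmatize(t, pos="v") for t in content])

# Part-of-speech tagging addresses one of the higher levels of ambiguity.
print(nltk.pos_tag(content))

Note that stemming can produce non-words (announc), while lemmatisation always returns a dictionary form.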

 

Lexical analysis, lexing or tokenisation is the process of converting a sequence of characters (such as in a computer programme or web page) into a sequence of tokens (strings with an assigned and thus identified meaning).

 

A programme that performs lexical analysis may be termed a lexer, tokeniser, or scanner, although scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyse the syntax of programming languages, web pages, and so forth.

 

Lexical analysis can be implemented with deterministic finite automata (DFAs).
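A minimal sketch of the idea in Python (the states, transition table and token classes here are illustrative; lexer generators such as lex/flex derive equivalent tables from regular expressions):

# Toy DFA recognising two token classes: identifiers and integer literals.
TRANSITIONS = {
    ("start", "letter"): "in_identifier",
    ("start", "digit"): "in_number",
    ("in_identifier", "letter"): "in_identifier",
    ("in_identifier", "digit"): "in_identifier",
    ("in_number", "digit"): "in_number",
}
ACCEPTING = {"in_identifier": "identifier", "in_number": "literal"}

def char_class(ch):
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return None  # no transition for any other character

def scan(lexeme):
    """Run the DFA; return the token name, or None if the input is rejected."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None  # dead state: no valid transition
    return ACCEPTING.get(state)

print(scan("colour"))  # identifier
print(scan("42"))      # literal
print(scan("4x"))      # None: an identifier may not start with a digit

Determinism is what keeps this cheap: each (state, input class) pair has at most one next state, so scanning is a single pass with no backtracking.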

Related post: 2021.08.27 - [CS_Theory] - [Regular Languages] Determinism (DFA), sorapark.tistory.com

The output is a sequence of tokens, which is sent to the parser for syntax analysis.

 

         read characters                       token
Input  ─────────────────►  Lexical   ─────────────────►  Syntax
       ◄─────────────────  Analyser  ◄─────────────────  Analyser
        push back extra                   ask for token
        characters
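Python's standard-library tokenize module shows this handoff on real input: the lexer emits (token type, string) pairs, which the parser then consumes.

import io
import tokenize

# Python's own lexer: each token carries a type name and the matched string.
source = "x = a + b * 2"
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# NAME 'x', OP '=', NAME 'a', OP '+', NAME 'b', OP '*', NUMBER '2', ...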

 

Lexeme

  • A lexeme is a sequence of characters in the source programme that matches the pattern for a token and is identified by the lexical analyser as an instance of that token.
  • Some authors call this a "token", using the word interchangeably for both the string being tokenised and the token data structure that results from putting this string through the tokenisation process.

Token

  • A lexical token or simply token is a string with an assigned and thus identified meaning.
  • It is structured as a pair consisting of a token name and an optional token value.

The token name is a category of lexical unit. Common token names are:

  1. identifier: names the programmer chooses
    ex) x, colour and UP
  2. keyword: names already in the programming language
    ex) if, while and return
  3. separator (also known as punctuators): punctuation characters and paired-delimiters
    ex) }, ( and ;
  4. operator: symbols that operate on arguments and produce results
    ex) +, < and =
  5. literal: numeric, logical, textual, reference literals
    ex) true, 6.02e23 and "music"
  6. comment: line or block comments (whether comments become tokens depends on the compiler; otherwise they are stripped out)
    ex) /* */ and //

For example, 

x = a + b * 2;

All the valid tokens are:

  • x
  • =
  • a
  • +
  • b
  • *
  • 2
  • ;

 

The lexical analysis of the expression yields the following sequence of tokens:

[(identifier, x), (operator, =), (identifier, a), (operator, +), (identifier, b), (operator, *), (literal, 2), (separator, ;)]
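A compact regex-driven tokeniser in the style of the example in Python's re documentation reproduces this sequence (the pattern set and names such as TOKEN_SPEC are illustrative, covering only the categories this statement needs):

import re

# Token categories mirroring the list above (no keywords or comments,
# since the input contains none).
TOKEN_SPEC = [
    ("literal",    r"\d+(?:\.\d+)?"),   # numeric literals such as 2 or 6.02
    ("identifier", r"[A-Za-z_]\w*"),    # names the programmer chooses
    ("operator",   r"[=+\-*/<>]"),      # symbols that operate on arguments
    ("separator",  r"[;,(){}]"),        # punctuation and paired delimiters
    ("skip",       r"\s+"),             # whitespace, discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenise(source):
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "skip":
            yield (match.lastgroup, match.group())

print(list(tokenise("x = a + b * 2;")))
# [('identifier', 'x'), ('operator', '='), ('identifier', 'a'),
#  ('operator', '+'), ('identifier', 'b'), ('operator', '*'),
#  ('literal', '2'), ('separator', ';')]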

 
