Lexical analysis, lexing, or tokenisation is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning).
A program that performs lexical analysis may be termed a lexer, tokeniser, or scanner, although scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyse the syntax of programming languages, web pages, and so forth.
Lexical analysis can be implemented with deterministic finite automata.
2021.08.27 - [CS_Theory] - [Regular Languages] Determinism (DFA)
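Since token patterns are regular, each token class can be recognised by a DFA. Below is a minimal sketch of such an automaton for identifiers of the form letter (letter | digit)*; the state names and transition table are illustrative assumptions, not something defined in this note.

```python
# Sketch: recognising identifiers with a DFA (state names are assumptions).
IDENT_DFA = {                        # transition table: (state, input class) -> next state
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}
ACCEPTING = {"ident"}

def char_class(ch: str) -> str:
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

def is_identifier(text: str) -> bool:
    """Run the DFA over the input; accept only if it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = IDENT_DFA.get((state, char_class(ch)))
        if state is None:            # no transition defined: reject
            return False
    return state in ACCEPTING

print(is_identifier("colour"))       # True
print(is_identifier("UP"))           # True
print(is_identifier("2x"))           # False: an identifier cannot start with a digit
```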
The output is a sequence of tokens, which is sent to the parser for syntax analysis.
[Diagram] The syntax analyser asks the lexical analyser for a token; the lexical analyser reads characters from the input, pushes back any extra characters it read past the end of a lexeme, and returns the next token.
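A minimal sketch of this interaction, assuming hypothetical CharStream and next_token names: the syntax analyser calls next_token whenever it needs one, and the lexical analyser pushes back the extra character it read past the end of a lexeme.

```python
# Sketch of the lexer/parser interface; names and token rules are illustrative assumptions.
class CharStream:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def read(self) -> str:
        ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1
        return ch

    def push_back(self) -> None:
        # Return the extra character so the next token starts at the right place.
        self.pos -= 1

def next_token(stream):
    """Called by the syntax analyser each time it asks for a token; returns (name, lexeme) or None."""
    ch = stream.read()
    while ch.isspace():
        ch = stream.read()
    if not ch:
        return None
    if ch.isdigit():                 # accumulate a numeric literal
        lexeme = ch
        ch = stream.read()
        while ch.isdigit():
            lexeme += ch
            ch = stream.read()
        if ch:
            stream.push_back()       # read one character too far
        return ("literal", lexeme)
    return ("operator", ch)          # everything else: a one-character token

stream = CharStream("12 + 34")
print([next_token(stream) for _ in range(3)])
# [('literal', '12'), ('operator', '+'), ('literal', '34')]
```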
Lexeme
- A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyser as an instance of that token.
- Some authors term this a "token", using "token" interchangeably for both the string being tokenised and the token data structure that results from putting this string through the tokenisation process.
Token
- A lexical token or simply token is a string with an assigned and thus identified meaning.
- It is structured as a pair consisting of a token name and an optional token value.
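A minimal sketch of such a pair in Python; the Token class and its fields are assumptions for illustration, not a prescribed representation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    name: str                      # the category of lexical unit, e.g. "identifier"
    value: Optional[str] = None    # the token value, typically the matched lexeme

print(Token("identifier", "colour"))   # Token(name='identifier', value='colour')
print(Token("keyword", "if"))          # Token(name='keyword', value='if')
```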
The token name is a category of lexical unit. Common token names are:
- identifier: names the programmer chooses
  ex) x, colour and UP
- keyword: names already in the programming language
  ex) if, while and return
- separator (also known as punctuator): punctuation characters and paired delimiters
  ex) }, ( and ;
- operator: symbols that operate on arguments and produce results
  ex) +, < and =
- literal: numeric, logical, textual and reference literals
  ex) true, 6.02e23 and "music"
- comment: line or block comments (whether comments are emitted as tokens depends on the compiler; otherwise they are stripped)
  ex) /* */ and //
For example,
x = a + b * 2;
The expression contains the following lexemes:
- x
- =
- a
- +
- b
- *
- 2
- ;
The lexical analysis of the expression yields the following sequence of tokens:
[(identifier, x), (operator, =), (identifier, a), (operator, +), (identifier, b), (operator, *), (literal, 2), (separator, ;)]
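The same sequence can be produced by a small regular-expression-based tokeniser; the pattern names and regular expressions below are illustrative assumptions rather than a definitive grammar.

```python
# Sketch: a regex-driven tokeniser for the example expression (patterns are assumptions).
import re

TOKEN_SPEC = [
    ("literal",    r"\d+(?:\.\d+)?"),      # numeric literals such as 2
    ("identifier", r"[A-Za-z_]\w*"),       # names the programmer chooses
    ("operator",   r"[=+\-*/<>]"),         # symbols that operate on arguments
    ("separator",  r"[;,(){}]"),           # punctuation characters
    ("skip",       r"\s+"),                # whitespace is discarded, not tokenised
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenise(source: str):
    for match in MASTER.finditer(source):
        if match.lastgroup != "skip":
            yield (match.lastgroup, match.group())

print(list(tokenise("x = a + b * 2;")))
# [('identifier', 'x'), ('operator', '='), ('identifier', 'a'), ('operator', '+'),
#  ('identifier', 'b'), ('operator', '*'), ('literal', '2'), ('separator', ';')]
```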