Lexical analyzer

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning). A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, although scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.

Lexical analysis is the very first phase of compiler design. The lexical analyzer takes the modified source code produced by language preprocessors, written in the form of sentences, and converts that sequence of characters into a sequence of tokens, removing any whitespace and comments in the source code.

Programs that perform lexical analysis are called lexical analyzers or lexers. A lexer contains a tokenizer or scanner. If the lexical analyzer detects that a token is invalid, it generates an error. It reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer on demand.
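
To make this concrete, below is a minimal sketch of such a lexer in Python. The token names and the toy expression grammar are illustrative assumptions, not part of any fixed standard; real lexers use far richer rule sets.

```python
import re

# Illustrative token rules for a toy expression language (assumed, not standard).
TOKEN_SPEC = [
    ("NUMBER",   r"\d+(?:\.\d+)?"),  # integer or decimal literal
    ("IDENT",    r"[A-Za-z_]\w*"),   # identifier or keyword
    ("OP",       r"[+\-*/=]"),       # arithmetic/assignment operator
    ("COMMENT",  r"#[^\n]*"),        # line comment: discarded
    ("SKIP",     r"[ \t]+"),         # whitespace: discarded
    ("NEWLINE",  r"\n"),
    ("MISMATCH", r"."),              # any other character is illegal
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    """Yield (kind, lexeme) pairs; raise on characters no rule accepts."""
    for m in MASTER_RE.finditer(code):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT", "NEWLINE"):
            continue                  # the lexer removes whitespace and comments
        if kind == "MISMATCH":
            raise SyntaxError(f"illegal character {lexeme!r}")
        yield (kind, lexeme)

print(list(tokenize("x = 42 + y  # trailing comment")))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```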

Lexical Errors

A character sequence that cannot be scanned into any valid token is a lexical error. Important facts about lexical errors:

  • Lexical errors are not very common, but they should be handled by the scanner
  • Misspellings of identifiers, operators, and keywords are considered lexical errors
  • Generally, a lexical error is caused by the appearance of some illegal character, mostly at the beginning of a token; the snippet after this list shows how such a character is reported
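
Using the tokenizer sketched earlier, an illegal character such as $ falls through to the catch-all MISMATCH rule and is reported as a lexical error:

```python
# Reuses tokenize() from the sketch above.
try:
    list(tokenize("total = 10 $ 2"))
except SyntaxError as err:
    print("lexical error:", err)  # lexical error: illegal character '$'
```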

Error Recovery in the Lexical Analyzer

Here are a few of the most common error recovery techniques:

  • Delete one character from the remaining input
  • In panic mode, successive characters are ignored until a well-formed token is reached (see the sketch after this list)
  • Insert a missing character into the remaining input
  • Replace a character with another character
  • Transpose two adjacent characters
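
Below is a rough sketch of the first two techniques combined, reusing MASTER_RE from the earlier example: instead of aborting on the first illegal character, the lexer records the error, drops the offending character, and resumes at the next well-formed token. The exact recovery policy shown is an assumption for illustration.

```python
def tokenize_with_recovery(code):
    """Like tokenize(), but drops illegal characters instead of aborting."""
    tokens, errors = [], []
    for m in MASTER_RE.finditer(code):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT", "NEWLINE"):
            continue
        if kind == "MISMATCH":
            # Panic mode: note the error, remove one character from the
            # remaining input, and continue scanning.
            errors.append(f"illegal character {lexeme!r} at position {m.start()}")
            continue
        tokens.append((kind, lexeme))
    return tokens, errors

tokens, errors = tokenize_with_recovery("a = 1 ? + 2")
print(tokens)  # [('IDENT', 'a'), ('OP', '='), ('NUMBER', '1'), ('OP', '+'), ('NUMBER', '2')]
print(errors)  # ["illegal character '?' at position 6"]
```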

Roles of the Lexical Analyzer

  • Helps to identify tokens and enter them into the symbol table (see the sketch after this list)
  • Removes white space and comments from the source program
  • Correlates error messages with the source program, for example by tracking line numbers
  • Expands macros if they are found in the source program
  • Reads input characters from the source program
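
As a rough illustration of the first and third roles, the fragment below extends the earlier tokenizer (reusing MASTER_RE) to enter identifiers into a symbol table and tag every token with its line number, so that later error messages can point back at the source. The symbol-table layout is an assumption for illustration.

```python
def tokenize_with_table(code):
    """Tag tokens with line numbers and collect identifiers in a symbol table."""
    symbol_table, tokens, line = {}, [], 1
    for m in MASTER_RE.finditer(code):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "NEWLINE":
            line += 1
            continue
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            # Error messages stay correlated with the source via the line number.
            raise SyntaxError(f"line {line}: illegal character {lexeme!r}")
        if kind == "IDENT":
            # Enter the identifier once; later phases attach types, scopes, etc.
            symbol_table.setdefault(lexeme, {"first_line": line})
        tokens.append((kind, lexeme, line))
    return tokens, symbol_table

tokens, table = tokenize_with_table("x = 1\ny = x + 2\n")
print(table)  # {'x': {'first_line': 1}, 'y': {'first_line': 2}}
```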

Advantages of Lexical Analysis

  • The lexical analysis method is used by programs such as compilers, which can use the parsed data from a programmer's code to create compiled binary executable code
  • It is used by web browsers to format and display a web page with the help of parsed data from JavaScript, HTML, and CSS
  • A separate lexical analyzer helps you construct a specialized and potentially more efficient processor for the task

Disadvantages of Lexical Analysis

  • You need to spend significant time reading the source program and partitioning it into tokens
  • Some regular expressions are quite difficult to understand compared to PEG or EBNF rules
  • More effort is needed to develop and debug the lexer and its token descriptions
  • Additional runtime overhead is required to generate the lexer tables and construct the tokens
