Lab2: Lexer

Introduction

This assignment requires you to write a lexical analyzer for the C programming language.

Requirement

Download this package and complete "c.lex" and "tokens.sml". A test case, "test.c", is provided.

Input

The input files are plain C programs such as "test.c".

Output

Print one token per line, in this form:

token AT POS: position

Here, position is the offset of the token's first character in the input file, counting from 0.
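
The line format can be sketched as a small helper. This is only an illustration: formatLine is a hypothetical name, not part of the provided package, and it assumes the token text has already been rendered.

```sml
(* Hypothetical helper: formatLine is an illustrative name, not part of the package. *)
(* tokenText is the rendered token; pos is its 0-based start offset in the input. *)
fun formatLine (tokenText : string, pos : int) : string =
  tokenText ^ " AT POS:" ^ Int.toString pos

(* Print one output line for an identifier token starting at offset 13. *)
val () = print (formatLine ("ID(main)", 13) ^ "\n")
```

Keeping the formatting separate from the lexer rules means the rules only need to produce a token and a position.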

The token field is formatted according to the kind of token:

symbols (printed as written)
+ - * / ( ) [ ] { } ; = > < >= <= == != -> .
keywords (printed as written)
int char void struct while for if else return
intconst
INT(intconst) e.g. 1 => INT(1)
stringconst
STRING(stringconst) e.g. "haha" => STRING("haha")
identifier
ID(identifier) e.g. main => ID(main)
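
One possible shape for the token datatype and toString in "tokens.sml" is sketched below. The constructor names (SYMBOL, KEYWORD, INT, STRING, ID) are assumptions made for illustration, as is the choice to store a string constant together with its surrounding quotes. The skeleton in the package may prescribe different names.

```sml
(* Sketch only: constructor names are assumptions, not the package's definitions. *)
datatype token =
    SYMBOL of string      (* e.g. "+", "->", ">=" *)
  | KEYWORD of string     (* e.g. "int", "while" *)
  | INT of int            (* integer constant *)
  | STRING of string      (* string constant, assumed to keep its quotes *)
  | ID of string          (* identifier *)

(* Render a token in the output format listed above. *)
fun toString (SYMBOL s)  = s
  | toString (KEYWORD k) = k
  | toString (INT n)     = "INT(" ^ Int.toString n ^ ")"
  | toString (STRING s)  = "STRING(" ^ s ^ ")"
  | toString (ID name)   = "ID(" ^ name ^ ")"
```

With these definitions, toString (INT 1) gives INT(1), toString (ID "main") gives ID(main), and toString (STRING "\"haha\"") gives STRING("haha").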

For example, the output of "test.c" should be:

int AT POS:9
ID(main) AT POS:13
( AT POS:17
) AT POS:18
{ AT POS:20
int AT POS:24
ID(a) AT POS:28
= AT POS:29
INT(1) AT POS:30
; AT POS:31
while AT POS:35
( AT POS:40
ID(a) AT POS:41
> AT POS:42
INT(1) AT POS:43
) AT POS:44
{ AT POS:50
ID(a) AT POS:58
= AT POS:59
ID(a) AT POS:60
+ AT POS:61
INT(1) AT POS:62
; AT POS:63
} AT POS:69
return AT POS:73
ID(a) AT POS:80
; AT POS:81
} AT POS:83

Hints

  1. Read the ML-Lex manual.
  2. See the specification of our C language.
  3. Complete the datatype token and the function toString in "tokens.sml".
  4. Complete the rules in "c.lex".
  5. Before compiling, run "<path to your mlton bin>/mllex <your lex file>" manually. Note that we use the mllex that ships with MLton, not the one from SML/NJ.