Basic Tokenizer
Level: intro (score: 1)
🎯 Let's move toward something more real-world: the front end of a parser. We'll build a tiny tokenizer (lexer) that turns a string into a stream of tokens.
Scope (ASCII-only):
- Identifier: first char letter or `_`, then letters/digits/`_` (see the predicates sketched below)
- Integer: one or more digits (base-10; no signs yet)
- Single-char tokens: `(` `)` `+` `-` `*` `/` `,` `=`
- Whitespace: skip
- Everything else: `Unknown(c)`
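
The identifier rules translate directly into two small predicates. A minimal sketch, ASCII-only per the scope, using the helper names the hints mention (your template's stubs may be shaped differently):

```rust
// ASCII-only, matching the scope rules above.
fn is_ident_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

fn is_ident_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}
```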
✅ Your task
- We already defined:

```rust
// Debug and PartialEq are needed for the assert_eq! in the example below.
#[derive(Debug, PartialEq)]
pub enum Token {
    Ident(String),
    Number(i64),
    LParen, RParen,
    Plus, Minus, Star, Slash,
    Comma, Equal,
    Unknown(char),
}
```

- Implement `tokenize(input: &str) -> Vec<Token>` so that it:
  - Skips consecutive whitespace
  - Groups digits into one `Number`
  - Groups identifier chars into one `Ident`
  - Maps single-char tokens with a `match`
  - Emits `Unknown(c)` for anything else
You have a starter template with a loop skeleton and helper stubs.
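
For orientation, here is one plausible shape for that skeleton. This is a hypothetical sketch, not your actual template; the structure and stub names in your starter may differ:

```rust
pub fn tokenize(input: &str) -> Vec<Token> {
    // Indexable chars make the "mark start → slice start..i" pattern easy.
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];

        // 1) Whitespace: skip and restart the loop.
        if c.is_whitespace() {
            i += 1;
            continue;
        }

        // 2) TODO: ident branch (is_ident_start / is_ident_continue)
        // 3) TODO: number branch (c.is_ascii_digit())
        // 4) TODO: single-char match for ( ) + - * / , =
        // 5) Fallback, so the skeleton compiles and terminates:
        tokens.push(Token::Unknown(c));
        i += 1;
    }

    tokens
}
```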
💡 Hints
- Work in this order: whitespace → ident → number → single-char → unknown.
- Use `c.is_whitespace()` and `c.is_ascii_digit()`.
- Use helpers: `is_ident_start(c)` and `is_ident_continue(c)`.
- Pattern: mark `start` → advance while condition → slice `start..i` → collect/push (see the sketch after this list).
- After pushing a token, `continue;` to avoid falling through branches.
- `chars()` is fine; avoid `as_bytes()` for now.
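
To make the mark → advance → slice pattern concrete, here is what the identifier branch might look like inside the loop (a sketch assuming the `chars`/`i`/`tokens` names from the skeleton above):

```rust
// Inside the while loop, after the whitespace check:
if is_ident_start(c) {
    let start = i;                       // mark
    while i < chars.len() && is_ident_continue(chars[i]) {
        i += 1;                          // advance while the condition holds
    }
    let text: String = chars[start..i].iter().collect(); // slice + collect
    tokens.push(Token::Ident(text));     // push
    continue;                            // don't fall through to other branches
}
```

The number branch has the same shape; it just tests `is_ascii_digit` and finishes with `parse::<i64>()` on the collected digits.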
Example

```rust
use Token::*;
let toks = tokenize("sum(x, 42) - y3");
assert_eq!(
    toks,
    vec![
        Ident("sum".into()), LParen, Ident("x".into()), Comma,
        Number(42), RParen, Minus, Ident("y3".into())
    ]
);
```
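
Once the happy path passes, a few edge cases follow directly from the scope rules and are worth checking (a sketch; these asserts are not part of the provided tests):

```rust
use Token::*;
assert_eq!(tokenize(""), vec![]);                        // empty input → no tokens
assert_eq!(tokenize(" \t "), vec![]);                    // whitespace only → skipped
assert_eq!(tokenize("@"), vec![Unknown('@')]);           // anything else → Unknown(c)
assert_eq!(tokenize("_x1"), vec![Ident("_x1".into())]);  // '_' may start an ident
```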