Pybites Rust Platform

Basic Tokenizer

Level: intro (score: 1)

🎯 Let's move towards something more real-world: a parser. We’ll build a tiny tokenizer (lexer) that turns a string into a stream of tokens.

Scope (ASCII-only):

  • Identifier: first char letter or _, then letters/digits/_
  • Integer: one or more digits (base-10; no signs yet)
  • Single-char tokens: ( ) + - * / , =
  • Whitespace: skip
  • Everything else: Unknown(c)
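
These identifier rules map straight onto the two helper predicates named in the hints below. A minimal sketch of what their bodies could look like, following the ASCII-only scope (the template's actual stubs may differ):

fn is_ident_start(c: char) -> bool {
    // first char: letter or underscore
    c.is_ascii_alphabetic() || c == '_'
}

fn is_ident_continue(c: char) -> bool {
    // subsequent chars: letters, digits, or underscore
    c.is_ascii_alphanumeric() || c == '_'
}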

Your task

  1. We already defined:
#[derive(Debug, PartialEq)] // needed so the example below can compare tokens with assert_eq!
pub enum Token {
    Ident(String),
    Number(i64),
    LParen, RParen,
    Plus, Minus, Star, Slash,
    Comma, Equal,
    Unknown(char),
}
 
  2. Implement tokenize(input: &str) -> Vec<Token> so that it:
  • Skips consecutive whitespace
  • Groups digits into one Number
  • Groups identifier chars into one Ident
  • Maps single-char tokens with a match
  • Emits Unknown(c) for anything else

You have a starter template with a loop skeleton and helper stubs.
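
For instance, one plausible shape for the single-char stub (the token names match the enum above; the template's actual stub may differ):

fn single_char_token(c: char) -> Option<Token> {
    Some(match c {
        '(' => Token::LParen,
        ')' => Token::RParen,
        '+' => Token::Plus,
        '-' => Token::Minus,
        '*' => Token::Star,
        '/' => Token::Slash,
        ',' => Token::Comma,
        '=' => Token::Equal,
        _ => return None, // not a single-char token; fall through to Unknown(c)
    })
}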


💡 Hints

  • Work in this order: whitespace → ident → number → single-char → unknown.
  • Use c.is_whitespace() and c.is_ascii_digit().
  • Use helpers: is_ident_start(c) and is_ident_continue(c).
  • Pattern: mark start → advance while condition → slice start..i → collect/push (see the sketch after these hints).
  • After pushing a token, continue; so you don't fall through to later branches.
  • chars() is fine; avoid as_bytes() for now.
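
As a concrete illustration of that scan pattern, here is a sketch for the digit case. scan_digits is a hypothetical helper, not necessarily one of the template's stubs; byte arithmetic is safe here because the scope is ASCII-only:

fn scan_digits(input: &str, start: usize) -> usize {
    // byte index of the first non-digit at or after start, or the end of the input
    input[start..]
        .find(|c: char| !c.is_ascii_digit())
        .map(|offset| start + offset)
        .unwrap_or(input.len())
}

In the main loop you would then slice, parse, and push, e.g. tokens.push(Token::Number(input[start..end].parse().unwrap())), and continue;.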

Example

use Token::*;
let toks = tokenize("sum(x, 42) - y3");
assert_eq!(
    toks,
    vec![
        Ident("sum".into()), LParen, Ident("x".into()), Comma,
        Number(42), RParen, Minus, Ident("y3".into())
    ]
);
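
The fallback branch works the same way: anything outside the scope above should surface as Unknown. A quick sketch reusing the same tokenize:

use Token::*;
let toks = tokenize("x = $");
// 'x' is an identifier, '=' maps to Equal, whitespace is skipped,
// and '$' matches nothing else, so it comes out as Unknown('$')
assert_eq!(toks, vec![Ident("x".into()), Equal, Unknown('$')]);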