Defining the parser: memoized functions and inputs

The next step in the calc compiler is to define the parser. The role of the parser will be to take the raw bytes from the input and create the Statement, Function, and Expression structures that we defined in the ir module.

To minimize dependencies, we are going to write a recursive descent parser. Another option would be to use a Rust parsing framework.

The `source_text` the function

Let's start by looking at the source_text function:


#![allow(unused)]
fn main() {
#[salsa::memoized(return_ref)]
pub fn source_text(_db: &dyn crate::Db) -> String {
    panic!("input")
}
}

This is a bit of an odd function! You can see it is annotated as memoized, which means that salsa will store the return value in the database, so that if you call it again, it does not re-execute unless its inputs have changed. However, the function body itself is just a panic!, so it can never successfully return. What is going on?

This function is an example of a common convention called an input. Whenever you have a memoized function, it is possible to set its return value explicitly (the chapter on testing shows how it is done). When you set the return value explicitly, it never executes; instead, when it is called, that return value is just returned. This makes the function into an input to the entire computation.

In this case, the body is just panic!, which indicates that source_text is always meant to be set explicitly. It's possible to set a return value for functions that have a body, in which case they can act as either an input or a computation.

Arguments to a memoized function

The first parameter to a memoized function is always the database, which should be a dyn Trait value for the database trait associated with the jar (the default jar is crate::Jar).

Memoized functions may take other arguments as well, though our examples here do not. Those arguments must be something that can be interned.

Memoized functions with `return_ref`

source_text is not only memoized, it is annotated with return_ref. Ordinarily, when you call a memoized function, the result you get back is cloned out of the database. The return_ref attribute means that a reference into the database is returned instead. So, when called, source_text will return an &String rather than cloning the String. This is useful as a performance optimization.

The `parse_statements` function

The next function is parse_statements, which has the job of actually doing the parsing. The comments in the function explain how it works.


#![allow(unused)]
fn main() {
#[salsa::memoized(return_ref)]
pub fn parse_statements(db: &dyn crate::Db) -> Vec<Statement> {
    // Get the source text from the database
    let source_text = source_text(db);

    // Create the parser
    let mut parser = Parser {
        db,
        source_text,
        position: 0,
    };

    // Read in statements until we reach the end of the input
    let mut result = vec![];
    loop {
        // Skip over any whitespace
        parser.skip_whitespace();

        // If there are no more tokens, break
        if let None = parser.peek() {
            break;
        }

        // Otherwise, there is more input, so parse a statement.
        if let Some(statement) = parser.parse_statement() {
            result.push(statement);
        } else {
            // If we failed, report an error at whatever position the parser
            // got stuck. We could recover here by skipping to the end of the line
            // or something like that. But we leave that as an exercise for the reader!
            parser.report_error();
            break;
        }
    }

    result
}
}

The most interesting part, from salsa's point of view, is that parse_statements calls source_text to get its input. Salsa will track this dependency. If parse_statements is called again, it will only re-execute if the return value of source_text may have changed.

We won't explain how the parser works in detail here. You can read the comments in the source file to get a better understanding. But we will cover a few interesting points that interact with Salsa specifically.

Creating interned values with the `from` method

The parse_statement method parses a single statement from the input:


#![allow(unused)]
fn main() {
    fn parse_statement(&mut self) -> Option<Statement> {
        self.skip_whitespace();
        let word = self.word()?;
        if word == "fn" {
            let func = self.parse_function()?;
            Some(Statement::from(self.db, StatementData::Function(func)))
            //   ^^^^^^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^^^^^
            //  Create a new interned enum...      |
            //                             using the "data" type.
        } else if word == "print" {
            let expr = self.parse_expression()?;
            Some(Statement::from(self.db, StatementData::Print(expr)))
        } else {
            None
        }
    }
}

The part we want to highlight is how an interned enum is created:


#![allow(unused)]
fn main() {
Statement::from(self.db, StatementData::Function(func))
}

On any interned value, the from method takes a database and an instance of the "data" type (here, StatementData). It then interns this value and returns the interned type (here, Statement).

Creating entity values, or interned structs, with the `new` method

The other way to create an interned/entity struct is with the new method. This only works when the struct has named fields (i.e., it doesn't work with enums like Statement). The parse_function method demonstrates:


#![allow(unused)]
fn main() {
    fn parse_function(&mut self) -> Option<Function> {
        let name = self.word()?;
        let name: FunctionId = FunctionId::new(self.db, name);
        //                     ^^^^^^^^^^^^^^^
        //                Create a new interned struct.
        self.ch('(')?;
        let args = self.parameters()?;
        self.ch(')')?;
        self.ch('=')?;
        let body = self.parse_expression()?;
        Some(Function::new(self.db, name, args, body))
        //   ^^^^^^^^^^^^^
        // Create a new entity struct.
    }
}

You can see that we invoke FunctionnId::new (an interned struct) and Function::new (an entity). In each case, the new method takes the database, and then the value of each field.

Salsa

Defining the parser: memoized functions and inputs

The source_text the function

Arguments to a memoized function

Memoized functions with return_ref

The parse_statements function

Creating interned values with the from method

Creating entity values, or interned structs, with the new method

The `source_text` the function

Memoized functions with `return_ref`

The `parse_statements` function

Creating interned values with the `from` method

Creating entity values, or interned structs, with the `new` method