  • RESSA, Rusty EcmaScript Syntax Analyzer, overview

    RESSA is a Rusty EcmaScript Syntax Analyzer. It belongs to a family of tools for working with JavaScript source code. The others are RESS, a scanner that reads and tokenizes EcmaScript source code; resast, a collection of ECMAScript AST node types; and RESW, an experimental crate that writes JavaScript code from resast nodes.

    Parser usage

    Here is how we parse a hello world function:

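    use ressa::Builder;

    // The JavaScript source we want to parse
    let js = "function helloWorld() { alert('Hello world'); }";
    let mut builder = Builder::new();
    let mut parser = builder
         .js(js)
         .module(false) // parse as a script, not an ES6 module
         .build()
         .unwrap();
    let ast = parser.parse();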
    

    We create a builder, we use it to create a parser, then we call parse and get the AST.

    Parser is the main component of RESSA. It walks the code token by token and builds the AST. We can also skip the builder and create a parser with default settings using Parser::new().

    The builder is a fluent interface. It takes parameters that guide the parsing. Here, module sets the parser's is_module flag, which tells it whether to expect an ES6 module. When the flag is false, parsing fails if the parser encounters an import statement. The flag also determines the result: either the Program::Mod or the Program::Script variant of the Program enum.
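
To make the fluent style concrete, here is a minimal sketch of a builder in the same spirit. ToyBuilder and ToyConfig are hypothetical names invented for this illustration, and returning an error when no source was given is an assumption of the sketch, not a statement about how RESSA's Builder behaves.

```rust
/// Hypothetical result of the build step (not a RESSA type).
#[derive(Debug, PartialEq)]
struct ToyConfig<'a> {
    js: &'a str,
    is_module: bool,
}

/// Hypothetical fluent builder: each setter returns `&mut Self`
/// so that calls can be chained.
#[derive(Default)]
struct ToyBuilder<'a> {
    js: Option<&'a str>,
    is_module: bool,
}

impl<'a> ToyBuilder<'a> {
    fn new() -> Self {
        Self::default()
    }

    /// Set the source text to parse.
    fn js(&mut self, js: &'a str) -> &mut Self {
        self.js = Some(js);
        self
    }

    /// Choose between module and script mode.
    fn module(&mut self, is_module: bool) -> &mut Self {
        self.is_module = is_module;
        self
    }

    /// Finish the chain. Failing on a missing source is an
    /// assumption of this sketch.
    fn build(&self) -> Result<ToyConfig<'a>, String> {
        let js = self.js.ok_or_else(|| "no source provided".to_string())?;
        Ok(ToyConfig { js, is_module: self.is_module })
    }
}
```

Calling ToyBuilder::new(), then chaining js, module, and build mirrors the shape of the RESSA call above: the setters only record options, and all validation happens once in build.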

    Parser implements the Iterator trait. If we call parser.next() repeatedly instead of parser.parse(), the parser returns one ProgramPart at a time, each wrapped in a Res. In fact, parse calls Iterator::collect to gather these parts and build a Program instance:

    pub fn parse(&mut self) -> Res<Program> {
        if self.context.is_module {
            self.context.strict = true;
        }
        let body: Res<Vec<ProgramPart>> = self.collect();
        Ok(if self.context.is_module {
            Program::Mod(body?)
        } else {
            Program::Script(body?)
        })
    }
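
The same idea can be tried without RESSA. The sketch below is a toy line-based "parser" that implements Iterator and collects its items into a Result. ToyParser, Part, ToyProgram, and ToyError are hypothetical stand-ins invented here; treating each non-empty line as one part is a simplification and has nothing to do with how RESSA tokenizes.

```rust
/// Hypothetical stand-in for ProgramPart.
#[derive(Debug, PartialEq)]
enum Part {
    Import(String),
    Stmt(String),
}

/// Hypothetical stand-in for Program.
#[derive(Debug, PartialEq)]
enum ToyProgram {
    Mod(Vec<Part>),
    Script(Vec<Part>),
}

#[derive(Debug, PartialEq)]
enum ToyError {
    ModuleFeatureOutsideOfModule(String),
}

struct ToyParser<'a> {
    lines: std::str::Lines<'a>,
    is_module: bool,
}

impl<'a> ToyParser<'a> {
    fn new(src: &'a str, is_module: bool) -> Self {
        ToyParser { lines: src.lines(), is_module }
    }

    /// Collect every part. Collecting into a Result stops at the
    /// first Err and returns it instead of the Vec.
    fn parse(&mut self) -> Result<ToyProgram, ToyError> {
        let body: Result<Vec<Part>, ToyError> = self.by_ref().collect();
        Ok(if self.is_module {
            ToyProgram::Mod(body?)
        } else {
            ToyProgram::Script(body?)
        })
    }
}

impl<'a> Iterator for ToyParser<'a> {
    type Item = Result<Part, ToyError>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip blank lines; `?` ends iteration when input runs out.
        let line = loop {
            let l = self.lines.next()?.trim();
            if !l.is_empty() {
                break l;
            }
        };
        if line.starts_with("import ") {
            if !self.is_module {
                return Some(Err(ToyError::ModuleFeatureOutsideOfModule(
                    "es6 import syntax".to_string(),
                )));
            }
            return Some(Ok(Part::Import(line.to_string())));
        }
        Some(Ok(Part::Stmt(line.to_string())))
    }
}
```

Calling next() hands out one part at a time, while parse() drains the iterator and wraps the parts in the variant that matches the module flag.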
    

    Sub-parsers

    Parsing methods return values of Res<T>, which is an alias for the Result type:

    type Res<T> = Result<T, Error>;
    

    T is the node type, and each sub-parser has its own: Expr for parse_expression, Stmt for parse_statement, Program for parse, and so on. Each node type (like Expr and Stmt) is an enum defined in resast.

    Stmt, for example, is defined as:

    pub enum Stmt<'a> {
        Expr(Expr<'a>),
        Block(BlockStmt<'a>),
        Empty,
        Debugger,
        With(WithStmt<'a>),
        Return(Option<Expr<'a>>),
        // ...
    }
    

    ProgramPart is:

    pub enum ProgramPart<'a> {
        /// A Directive like 'use strict';
        Dir(Dir<'a>),
        /// A variable, function or module declaration
        Decl(Decl<'a>),
        /// Any other kind of statement
        Stmt(Stmt<'a>),
    }
    

    Building an AST node follows the pattern below. Here we build a for loop node:

    // First, we use helpers to parse the constituents
    let list = self.parse_variable_decl_list(true)?;
    let init = Some(LoopInit::Variable(kind, list));
    
    let test = Some(self.parse_expression()?);
    
    let update = Some(self.parse_expression()?);
    
    // Here, we parse the body.
    // We call parse_statement to parse the body.
    // We consider the body as one block statement.
    let body = self.parse_statement(Some(StmtCtx::For))?;
    
    // Then, we instantiate the ForStmt struct
    Ok(ForStmt { init, test, update, body: Box::new(body) })
    

    I simplified the code by removing edge cases, error conditions, and debug statements, but this is how the parser instantiates structs recursively and builds the AST.
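
The recursion is easier to see in a self-contained toy. The sketch below assumes a made-up grammar in which the token "for" is followed by three expression tokens and one body statement, and any other token is an expression statement. ToyStmt, ToyForStmt, and ToyParser are hypothetical names; only the shape (parse the parts, then box the body) follows the snippet above.

```rust
#[derive(Debug, PartialEq)]
enum ToyStmt {
    Expr(String),
    For(ToyForStmt),
}

#[derive(Debug, PartialEq)]
struct ToyForStmt {
    init: Option<String>,
    test: Option<String>,
    update: Option<String>,
    // Boxed because the body is itself a statement (recursive type).
    body: Box<ToyStmt>,
}

struct ToyParser<'a> {
    tokens: std::slice::Iter<'a, &'a str>,
}

impl<'a> ToyParser<'a> {
    fn new(tokens: &'a [&'a str]) -> Self {
        ToyParser { tokens: tokens.iter() }
    }

    /// An "expression" is a single token in this toy grammar.
    fn parse_expression(&mut self) -> Result<String, String> {
        self.tokens
            .next()
            .map(|t| t.to_string())
            .ok_or_else(|| "unexpected end of input".to_string())
    }

    fn parse_statement(&mut self) -> Result<ToyStmt, String> {
        // Peek at the next token without consuming it.
        match self.tokens.as_slice().first().copied() {
            Some("for") => {
                self.tokens.next(); // consume `for`
                self.parse_for().map(ToyStmt::For)
            }
            _ => self.parse_expression().map(ToyStmt::Expr),
        }
    }

    fn parse_for(&mut self) -> Result<ToyForStmt, String> {
        let init = Some(self.parse_expression()?);
        let test = Some(self.parse_expression()?);
        let update = Some(self.parse_expression()?);
        // The body may be another `for`, so this call can recurse.
        let body = self.parse_statement()?;
        Ok(ToyForStmt { init, test, update, body: Box::new(body) })
    }
}
```

Feeding it a loop nested inside another loop yields a ToyForStmt whose body holds a second ToyForStmt, which is the recursive instantiation described above.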

    Errors

    The error module defines parsing errors. It contains an Error enum whose variants are the possible parsing errors.

    Error propagation is natural in Rust thanks to the question mark operator (?). Each parsing method returns a Result<T, Error> instance. When the parsing succeeds, it returns an AST node inside the Ok variant:

    Ok(Expr::Spread(Box::new(arg)))
    

    Otherwise, it returns an error inside the Err variant:

    if !self.context.is_module {
        return Err(Error::UseOfModuleFeatureOutsideOfModule(
            self.current_position,
            "es6 import syntax".to_string(),
        ));
    }
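
As a rough illustration of how such an error travels up the call chain, here is a sketch with three nested toy functions. ToyError, Position, and the parse_* functions are hypothetical and line-based; only the variant name and the message string are borrowed from the snippet above.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    line: usize,
    column: usize,
}

#[derive(Debug, PartialEq)]
enum ToyError {
    UseOfModuleFeatureOutsideOfModule(Position, String),
}

impl fmt::Display for ToyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyError::UseOfModuleFeatureOutsideOfModule(pos, what) => write!(
                f,
                "{} used outside of a module at {}:{}",
                what, pos.line, pos.column
            ),
        }
    }
}

type Res<T> = Result<T, ToyError>;

/// Innermost level: the only place that creates the error.
fn parse_import(line: &str, is_module: bool, pos: Position) -> Res<String> {
    if !is_module {
        return Err(ToyError::UseOfModuleFeatureOutsideOfModule(
            pos,
            "es6 import syntax".to_string(),
        ));
    }
    Ok(line.to_string())
}

/// Middle level: `?` forwards the error untouched.
fn parse_statement(line: &str, is_module: bool, pos: Position) -> Res<String> {
    if line.starts_with("import ") {
        let import = parse_import(line, is_module, pos)?;
        return Ok(import);
    }
    Ok(line.to_string())
}

/// Top level: stops at the first failing statement.
fn parse_program(src: &str, is_module: bool) -> Res<Vec<String>> {
    let mut body = Vec::new();
    for (i, line) in src.lines().enumerate() {
        let pos = Position { line: i + 1, column: 1 };
        body.push(parse_statement(line, is_module, pos)?);
    }
    Ok(body)
}
```

Neither parse_statement nor parse_program mentions the error variant; the ? operator returns early with whatever the inner call produced.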
    

    Scopes

    Parser has a lexical_names attribute, which is an instance of DuplicateNameDetector:

    pub struct DuplicateNameDetector<'a> {
        pub states: Vec<Scope>,
        lex: LexMap<'a>,
        var: VarMap<'a>,
        func: LexMap<'a>,
        first_lexes: Vec<Option<Cow<'a, str>>>,
        /// Hashmap of identifiers exported
        /// from this module and a flag for if they
        /// have a corresponding declaration
        undefined_module_exports: HashSet<Cow<'a, str>>,
        exports: HashSet<Cow<'a, str>>,
    }
    

    It tracks lexical scopes inside states. Each state is a variant of the Scope enum:

    pub enum Scope {
        Top,
        FuncTop,
        SimpleCatch,
        For,
        Catch,
        Switch,
        Block,
    }
    

    This stack is managed by new_child and remove_child. Those are, in turn, used by the parser methods add_scope and remove_scope, which the parser calls when it enters and leaves a node that creates a new lexical scope.
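
A minimal sketch of such a stack, assuming a plain Vec that is pushed on entry and popped on exit. ToyScopes, ToyScope, and visit_function_with_block are hypothetical; the real new_child and remove_child also have to maintain the identifier maps, which this sketch leaves out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyScope {
    Top,
    FuncTop,
    Block,
}

struct ToyScopes {
    states: Vec<ToyScope>,
}

impl ToyScopes {
    /// The stack always starts with the top-level scope.
    fn new() -> Self {
        ToyScopes { states: vec![ToyScope::Top] }
    }

    /// Enter a nested scope.
    fn new_child(&mut self, scope: ToyScope) {
        self.states.push(scope);
    }

    /// Leave the innermost scope. Keeping Top on the stack is an
    /// assumption of this sketch.
    fn remove_child(&mut self) {
        if self.states.len() > 1 {
            self.states.pop();
        }
    }

    fn current(&self) -> ToyScope {
        *self.states.last().expect("Top is never removed")
    }
}

/// Simulates visiting `function f() { { ... } }` and reports the
/// scope seen at the deepest point.
fn visit_function_with_block(scopes: &mut ToyScopes) -> ToyScope {
    scopes.new_child(ToyScope::FuncTop);
    scopes.new_child(ToyScope::Block);
    let innermost = scopes.current();
    scopes.remove_child(); // leave the block
    scopes.remove_child(); // leave the function
    innermost
}
```

Every new_child is paired with a remove_child, so the stack is back to its initial state once the node has been parsed.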

    In parse_func, which creates a function AST node, here is a simplified version of what happens:

    fn parse_func(
        &mut self,
    ) -> Res<Func<'b>> {
        self.add_scope(lexical_names::Scope::FuncTop);
        let params = self.parse_func_params()?;
        let body = self.parse_function_source_el()?;
        self.remove_scope();
        let f = Func {
            id,
            params: params.params,
            body,
            is_async,
            generator: is_gen,
        };
        Ok(f)
    }
    

    DuplicateNameDetector keeps identifiers inside lex, var, and func and ensures that they are unique.

    func is a map of functions. lex is a map of let- and const-defined variables. var is a map of var-defined variables.

    The parsing fails if we try to add a new scope element that has the same name as an existing one. DuplicateNameDetector.declare, which adds a new element to the current scope, is implemented as follows (simplified version):

    pub fn declare(
        &mut self,
        i: Cow<'a, str>,
        kind: DeclKind,
        pos: Position
    )
        -> Res<()> {
        match kind {
            DeclKind::Lex(is_module) => {
                self.check_var(i.clone(), pos)?;
                self.check_func(i.clone(), pos)?;
                self.add_lex(i, pos)
            }
            DeclKind::Func(is_module) => {
                self.check_lex(i.clone(), pos)?;
                if !state.funcs_as_var(is_module) {
                    self.check_var(i.clone(), pos)?;
                }
                self.add_func(i, pos)
            }
            DeclKind::SimpleCatch => {
                self.lex.insert(i.clone(), pos);
                Ok(())
            }
        }
    }
    

    If check_var, check_func, or check_lex returns an error, the ? operator propagates it out of declare, and the parser returns it in turn.
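
The checks can be reduced to a small sketch. It assumes a single flat scope and only two kinds of declaration, and it uses hash sets instead of maps with positions. ToyDetector, ToyDeclKind, and DuplicateName are hypothetical; functions, catch parameters, and nested scopes are deliberately left out.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy)]
enum ToyDeclKind {
    Lex, // let / const
    Var, // var
}

#[derive(Debug, PartialEq)]
struct DuplicateName(String);

#[derive(Default)]
struct ToyDetector {
    lex: HashSet<String>,
    var: HashSet<String>,
}

impl ToyDetector {
    fn check_lex(&self, name: &str) -> Result<(), DuplicateName> {
        if self.lex.contains(name) {
            Err(DuplicateName(name.to_string()))
        } else {
            Ok(())
        }
    }

    fn check_var(&self, name: &str) -> Result<(), DuplicateName> {
        if self.var.contains(name) {
            Err(DuplicateName(name.to_string()))
        } else {
            Ok(())
        }
    }

    /// Register a name, failing on a conflicting declaration.
    fn declare(&mut self, name: &str, kind: ToyDeclKind) -> Result<(), DuplicateName> {
        match kind {
            ToyDeclKind::Lex => {
                // A let/const name may clash with neither kind.
                self.check_var(name)?;
                self.check_lex(name)?;
                self.lex.insert(name.to_string());
            }
            ToyDeclKind::Var => {
                // `var a; var a;` is legal, so only lex is checked.
                self.check_lex(name)?;
                self.var.insert(name.to_string());
            }
        }
        Ok(())
    }
}
```

With this sketch, declaring the same var twice succeeds, while a let that reuses an existing var or let name is rejected, and so is a var that reuses a let name.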