Thursday, April 4, 2013

How it Works - Part One


This is the second in a series of posts about how jsBind works. If you have not already done so you should read part zero first where I cover the background about how observable objects work.

In this part we are going to explore how the expression parser works.  The expression parser reads the binding strings defined in the data-jsBind HTML attributes and converts them into an expression tree so they can be evaluated. The binding strings are a strict subset of the JavaScript language so therefore the parser is implemented as a simplified recursive descent LR JavaScript parser. If that last bit made no sense at all don't worry I’m going to explain as we go, however if you have never read about how parsers work you might like to find out some background. I suggest you read Compilers And Compiler Generators by P. D. Terry. The original edition is now out of print so the author has published the whole book online. Another good book is Compilers, Principles, Techniques and Tools by Aho, Sethi and Ullman. This is otherwise known as the ‘red dragon book’ because of the picture of the red dragon on its cover. This is more theoretical than the first book so I suggest the first if you prefer to read code examples and the second if math works for you.

The parser is split into two parts. The first part called the tokenizer splits the string into pieces called tokens. Each token is equivalent to a word in a spoken language. The tokens are the numbers, literal strings, function names, language keywords, and the punctuation symbols that make up the expression. The tokenizer also takes care of skipping white space and new line characters etc.

The tokenizer is implemented in the getToken function of the Binder object. It uses a number of other functions for tokenizing numbers, strings and keywords etc.

The result of the tokenizer is a sequence of Token objects containing an indication of the kind of token and the value of the token. For example the tokenization of the following expression produces a list of tokens as shown:

“6 * 9 - $d.val”

CategoryValue
LiteralNumber6
Operator“*”
LiteralNumber9
Operator“-”
Identifier“$d”
Punctuation“.”
Identifier“val”

Once tokenized the expression is converted into an expression tree. This done by examining the string one token at a time from the left to the right (this is the LR bit of the parser). As each token is examined the parser either descends into a function that will handle that part of the language grammar or it will consume the token and return an expression object that represents the tokens parsed so far (this is the recursive descent bit). The sequence of function calls starts in the parseExpression function and from there uses all the functions that are prefixed with the name ‘parse’. These functions work to return a tree of Expression objects. There are nine expression object types, each handling a different kind of expression and most of them hold references to other expression objects to form the expression tree.

I mentioned the language grammar in parsing without explaining what it was. The grammar is the definition of what makes a valid sentence in the language. It forms a set of rules that ensure that for example a function call has commas that separate the parameters and that the parameters are enclosed in brackets or it defines that a number must be followed by an operator and another variable or a number. In the jsBind parser the grammar rules are encoded in the structure of the parser itself.

If we pass the token sequence from the previous example through the parser we get a tree structure of expression objects as follows:


Notice how the grammar rules have ensured that the operator precedence between the multiply and subtract operators have been applied to that the subtract is at the top of the tree and the multiplies are below. This will become more evident in the next post when we examine how expressions are evaluated but for now is good to know that the expressions are evaluated bottom up with the multiply happening first and the subtraction happening second.