Formula engine with reverse polish notation in JavaScript

The existing implementations of calculation engines in reverse Polish notation, which can be found on the Internet, are good for everyone, only they do not support functions such as round (), max (arg1; arg2, ...) or if (condition; true; false), which does such engines are useless from a practical point of view. The article presents an implementation of a formula engine based on reverse Polish notation that supports Excel-like formulas, which is written in pure JavaScript in an object-oriented style.



The following code demonstrates the engine's capabilities:



const formula = "if( 1; round(10,2); 2*10)";
const formula1 = "round2(15.542 + 0.5)";
const formula2 = "max(2*15; 10; 20)";
const formula3 = "min(2; 10; 20)";
const formula4 = "round4(random()*10)";
const formula5 = "if ( max(0;10) ; 10*5 ; 15 ) ";
const formula6 = "sum(2*15; 10; 20)";

const calculator = new Calculator(null);
console.log(formula+" = "+calculator.calc(formula));    // if( 1; round(10,2); 2*10) = 10
console.log(formula1+" = "+calculator.calc(formula1));  // round2(15.542 + 0.5) = 16.04
console.log(formula2+" = "+calculator.calc(formula2));  // max(2*15; 10; 20) = 30 
console.log(formula3+" = "+calculator.calc(formula3));  // min(2; 10; 20) = 2
console.log(formula4+" = "+calculator.calc(formula4));  // round4(random()*10) = 5.8235
console.log(formula5+" = "+calculator.calc(formula5));  // if ( max(0;10) ; 10*5 ; 15 )  = 50
console.log(formula6+" = "+calculator.calc(formula6));  // sum(2*15; 10; 20) = 60


Before starting to describe the architecture of the formula engine, a few notes should be made:



  1. The Calculator object as an argument can take a data source of spreadsheet cells in the form of a Map, in which the key is the cell name in A1 format, and the value is a single token or an array of token objects into which the formula string is parsed when it is created. In this example, no cells are used in the formulas, so the data source is specified as null.
  2. Functions are written in the format [function_name] ([argument1]; [argument2]; ...).
  3. Spaces are not taken into account when writing formulas - when splitting a formula string into tokens, all whitespace characters are removed beforehand.
  4. The decimal part of a number can be separated by either a point or a comma - when splitting a formula string into tokens, the decimal point is converted to a point.
  5. Division by 0 results in 0, since in applied calculations in situations of possible division by 0, the function is substituted [if (divisor! = 0; dividend / divisor; 0)]


You can find quite a lot of materials on the Internet about the Polish notation itself, so it's better to start right away with describing the code. The source code for the formula engine itself is hosted at https://github.com/leossnet/bizcalc under the MIT license in the / js / data section and includes the calculator.js and token.js files . You can try the calculator right away in business at bizcalc.ru .



So, let's start with the types of tokens that are concentrated in the Types object:



const Types = {
    Cell: "cell" ,
    Number: "number" ,
    Operator: "operator" ,
    Function: "function",
    LeftBracket: "left bracket" , 
    RightBracket: "right bracket",
    Semicolon: "semicolon",
    Text: "text"
};


The following types have been added in comparison with standard engine implementations:



  • Cell: "cell" is the name of a cell in a spreadsheet that can contain text, a number or a formula;
  • Function: "function" - function;
  • Semicolon: "semicolon" - function argument separator, in this case ";";
  • Text: "text" - text that is ignored by the calculation engine.


As in any other engine, support for five main operators is implemented:



const Operators = {
    ["+"]: { priority: 1, calc: (a, b) => a + b },  // 
    ["-"]: { priority: 1, calc: (a, b) => a - b },  //
    ["*"]: { priority: 2, calc: (a, b) => a * b },  // 
    ["/"]: { priority: 2, calc: (a, b) => a / b },  // 
    ["^"]: { priority: 3, calc: (a, b) => Math.pow(a, b) }, //   
};


To test the engine, the following functions are configured (the list of functions can be expanded):



const Functions = {
    ["random"]: {priority: 4, calc: () => Math.random() }, //  
    ["round"]:  {priority: 4, calc: (a) => Math.round(a) },  //   
    ["round1"]: {priority: 4, calc: (a) => Math.round(a * 10) / 10 },
    ["round2"]: {priority: 4, calc: (a) => Math.round(a * 100) / 100 },
    ["round3"]: {priority: 4, calc: (a) => Math.round(a * 1000) / 1000 },
    ["round4"]: {priority: 4, calc: (a) => Math.round(a * 10000) / 10000 },
    ["sum"]:    {priority: 4, calc: (...args) => args.reduce( (sum, current) => sum + current, 0) },
    ["min"]:    {priority: 4, calc: (...args) => Math.min(...args) }, 
    ["max"]:    {priority: 4, calc: (...args) => Math.max(...args) },
    ["if"]:     {priority: 4, calc: (...args) => args[0] ? args[1] : (args[2] ? args[2] : 0) }
};


I think the above code speaks for itself. Next, consider the code of the token class:



class Token {

    //    "+-*/^();""
    static separators = Object.keys(Operators).join("")+"();"; 
    //    "[\+\-\*\/\^\(\)\;]"
    static sepPattern = `[${Token.escape(Token.separators)}]`; 
    //    "random|round|...|sum|min|max|if"
    static funcPattern = new RegExp(`${Object.keys(Functions).join("|").toLowerCase()}`, "g");

    #type;
    #value;
    #calc;
    #priority;


    /**
     *  ,         , 
     *        
     */
    constructor(type, value){
        this.#type = type;
        this.#value = value;
        if ( type === Types.Operator ) {
            this.#calc = Operators[value].calc;
            this.#priority = Operators[value].priority;
        }
        else if ( type === Types.Function ) {
            this.#calc = Functions[value].calc;
            this.#priority = Functions[value].priority;
        }
    }

    /**
     *      
     */

    /**
     *     
     * @param {String} formula -   
     */
    static getTokens(formula){
        let tokens = [];
        let tokenCodes = formula.replace(/\s+/g, "") //    
            .replace(/(?<=\d+),(?=\d+)/g, ".") //     ( )
            .replace(/^\-/g, "0-") //   0   "-"   
            .replace(/\(\-/g, "(0-") //   0   "-"   
            .replace(new RegExp (Token.sepPattern, "g"), "&$&&") //   &  
            .split("&")  //      &
            .filter(item => item != ""); //     
        
        tokenCodes.forEach(function (tokenCode){
            if ( tokenCode in Operators ) 
                tokens.push( new Token ( Types.Operator, tokenCode ));
            else if ( tokenCode === "(" )  
                tokens.push ( new Token ( Types.LeftBracket, tokenCode ));
            else if ( tokenCode === ")" ) 
                tokens.push ( new Token ( Types.RightBracket, tokenCode ));
            else if ( tokenCode === ";" ) 
                tokens.push ( new Token ( Types.Semicolon, tokenCode ));
            else if ( tokenCode.toLowerCase().match( Token.funcPattern ) !== null  )
                tokens.push ( new Token ( Types.Function, tokenCode.toLowerCase() ));
            else if ( tokenCode.match(/^\d+[.]?\d*/g) !== null ) 
                tokens.push ( new Token ( Types.Number, Number(tokenCode) )); 
            else if ( tokenCode.match(/^[A-Z]+[0-9]+/g) !== null )
                tokens.push ( new Token ( Types.Cell, tokenCode ));
        });
        return tokens;
    }

    /**
     *     
     * @param {String} str 
     */    
    static escape(str) {
        return str.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
	}    
}


The Token class is a container for storing indivisible text units into which a line of formulas is broken, each of which carries a certain functionality.



The constructor of the Token class takes as an argument the token type from the fields of the Types object, and as a value - an indivisible text unit extracted from the formula string.

The internal private fields of the Token class that store the value of the priority and the evaluated expression are defined in the constructor based on the values ​​of the Operators and Functions objects.



As an auxiliary method, the static function escape (str) is implemented, the code which is taken from the first found page on the Internet, escaping characters that the RegExp object perceives as special.



The most important method in the Token class is the getTokens static function, which parses the formula string and returns an array of Token objects. A small trick is implemented in the method - before splitting into tokens, the β€œ&” symbol is added to the separators (operators and parentheses), which is not used in formulas, and only then the β€œ&” symbol is split.



The implementation of the getTokens method itself is a loop comparison of all received tokens with templates, determining the token type, creating an object of the Token class and adding it to the resulting array.



This completes the preliminary work on preparing the calculations. The next step is the calculations themselves, which are implemented in the Calculator class:



class Calculator {
    #tdata;

    /**
     *  
     * @param {Map} cells  ,     
     */
    constructor(tableData) {
        this.#tdata = tableData;
    }

    /**
     *    
     * @param {Array|String} formula -     
     */
    calc(formula){
        let tokens = Array.isArray(formula) ? formula : Token.getTokens(formula);
        let operators = [];
        let operands = [];
        let funcs = [];
        let params = new Map();
        tokens.forEach( token => {
            switch(token.type) {
                case Types.Number : 
                    operands.push(token);
                    break;
                case Types.Cell :
                    if ( this.#tdata.isNumber(token.value) ) {
                        operands.push(this.#tdata.getNumberToken(token));
                    }
                    else if ( this.#tdata.isFormula(token.value) ) {
                        let formula = this.#tdata.getTokens(token.value);
                        operands.push(new Token(Types.Number, this.calc(formula)));
                    }
                    else {
                        operands.push(new Token(Types.Number, 0));
                    }
                    break;
                case Types.Function :
                    funcs.push(token);
                    params.set(token, []);
                    operators.push(token);             
                    break;
                case Types.Semicolon :
                    this.calcExpression(operands, operators, 1);
                    //      
                    let funcToken = operators[operators.length-2];  
                    //           
                    params.get(funcToken).push(operands.pop());    
                    break;
                case Types.Operator :
                    this.calcExpression(operands, operators, token.priority);
                    operators.push(token);
                    break;
                case Types.LeftBracket :
                    operators.push(token);
                    break;
                case Types.RightBracket :
                    this.calcExpression(operands, operators, 1);
                    operators.pop();
                    //       
                    if (operators.length && operators[operators.length-1].type == Types.Function ) {
                        //      
                        let funcToken = operators.pop();        
                        //     
                        let funcArgs = params.get(funcToken);   
                        let paramValues = [];
                        if ( operands.length ) {
                            //    
                            funcArgs.push(operands.pop());     
                            //      
                            paramValues = funcArgs.map( item => item.value ); 
                        }
                        //        
                        operands.push(this.calcFunction(funcToken.calc, ...paramValues));  
                    }
                    break;
            }
        });
        this.calcExpression(operands, operators, 0);
        return operands.pop().value; 
    }

    /**
     *    () 
     * @param {Array} operands  
     * @param {Array} operators   
     * @param {Number} minPriority     
     */
    calcExpression (operands, operators, minPriority) {
        while ( operators.length && ( operators[operators.length-1].priority ) >= minPriority ) {
            let rightOperand = operands.pop().value;
            let leftOperand = operands.pop().value;
            let operator = operators.pop();
            let result = operator.calc(leftOperand, rightOperand);
            if ( isNaN(result) || !isFinite(result) ) result = 0;
            operands.push(new Token ( Types.Number, result ));
        }
    }

    /**
     *   
     * @param {T} func -   
     * @param  {...Number} params -    
     */
    calcFunction(calc, ...params) {
        return new Token(Types.Number, calc(...params));
    }
}


As in the usual formula engine, all calculations are performed in the main function calc (formula), where either a formula string or a ready-made array of tokens is passed as an argument. If a formula string is passed to the calc method, it is pre-converted into an array of tokens.



As a helper method, the calcExpression method is used, which takes as arguments the operand stack, the operator stack, and the minimum operator precedence to evaluate the expression.



As an extension of the usual formula engine, a fairly simple function calcFunction is implemented, which takes the name of the function as arguments, as well as an arbitrary number of arguments to this function. The calcFunction evaluates the value of the formula function and returns a new Token object with a numeric type.



To calculate functions within the general cycle of calculations, a function stack and a Map for function arguments are added to the stacks of operands and operators, in which the key is the name of the function, and the values ​​are the array of arguments.



In conclusion, I will give an example of how you can use a data source in the form of a hash of cells and their values. To begin with, a class is defined that implements the interface that is used by the calculator:

class Data {
    #map;
    //  
    constructor() {
        this.#map = new Map();
    }
    //      
    add(cellName, number) {
        this.#map.set(cellName, number);
    }
    // ,     ,   Calculator.calc()
    isNumber(cellName) {
        return true;
    }
    //    ,   Calculator.calc()
    getNumberToken (token) {
        return new Token (Types.Number, this.#map.get(token.value) );
    }
}


Well, then it's simple. We create a data source containing cell values. Then we define a formula in which the operands are cell references. And in conclusion, we make calculations:

let data = new Data();
data.add("A1", 1);
data.add("A2", 1.5);
data.add("A3", 2);

let formula = "round1((A1+A2)^A3)";
let calculator = new Calculator(data);

console.log(formula+" = "+calculator.calc(formula));  // round1((A1+A2)^A3) = 6.3


Thank you for attention.