I started delving into UglifyJS internals a couple of months ago.
I wrote about the building blocks in the previous post.
Compressor remains the most complicated module of the library.
It minimizes the number of nodes in the result AST.
UglifyJS parses the input file and hands the result to Compressor, which creates the optimized AST.
The Output module transforms that into a shorter code.
I will introduce different parts of the module below.
While the code remains the single source of truth, you may learn about how Compressor plays with nodes.
I will keep updating this post as I learn more about the library. You are welcome to send suggestions and improvements in
the meantime.
Usage
Compressor depends only on AST_Node and its subtypes. You give it an instance of AST_TopLevel.
It gives you back an optimized one.
To compress a node you write:
compressor = new Compressor(options)
compressedNode = compressor.compress(node)
UglifyJS compresses the code after it parses the input into an AST_TopLevel node and before it mangles the names.
In the minification module, the minifier checks for the compress option then optimizes the parse result.
if (options.compress)
toplevel = new Compressor(options.compress).compress(toplevel);
That is the only use of Compressor in UglifyJS.
Even in the test harness, Compressor is considered a part of the minification. It has no tests of its own.
Overview
Compressor is a TreeTransformer, and thus a TreeWalker.
When we call Compressor.compress with an AST_TopLevel node, the Compressor resolves definitions in that scope
and children scopes and sets up scope details for all nodes in the tree using AST_Scope.figure_out_scope.
Then, it attempts to compress and recompress the tree a given number of times.
It executes and re-executes the compression command until the code is compressed or the ceil of tries is reached.
It stops when two successive compressions give the same number of nodes, or when the number of executions the number reaches
the maximum number passes set by options.passes.
To compress a node, Compressor compresses the leaves, then their parents successively, until it reaches the root.
Compressor is a TreeTransformer. It has a before method where it optimize the node and returns the result.
Options
Compressor has its set of options, which
we inject into its constructor. They drive most decisions in the compression logic.
The constructor accepts the options object and creates its proper options object that sets default values
for undefined options. Inside the compression logic, it queries the options using its method Compressor.option(name).
For example:
// Treat parameters as collapsible in IIFE, i.e.
// function(a, b){ ... }(x());
// would be translated into equivalent assignments:
// var a = x(), b = undefined;
if (stat_index == 0 && compressor.option("unused")) extract_args();
The constructor of Compressor accepts two arguments. The options object and falseByDefault.
The second argument is a boolean that, when truthy, tells the Compressor to act conservatively.
Not all options are used in the same shape in which they are received. The following options are mapped to different formats to simplify their consumption:
-
global_defs:Global definitions are received as an object.
Compressorloops through this object and evaluates properties whose key starts with@. This option is used to resolve definitions in the top level node and to send a warning when a global definition is being mutated or redefined. This is done usingAST_TopLevel.resolve_defines, which is called at the beginning of the compression. -
keep_fargs:This option may either be
"strict",true, orfalse. It tellsCompressorwhether to keep unused function arguments or to discard them.Compressordefines a predicateCompressor.drop_fargsthat discards arguments when the value ofkeep_fargsis false, and keeps them when the value is true. If the value is “strict” however, the result of the predicate depends on the definition of the function and its parent node.Compressor.drop_fargsis called inAST_Scope.drop_unusedat the end of the compression of a node, and inAST_Sub.optimize(AST_Subin an index-style property access, i.e.a["foo"]) to see whether the optimization logic should define missing arguments for the function. The last is needed whendrop_unusedreturn true and the body of the function accessesarguments[index]whereindexis greater than the number of defined arguments. -
top_retain:This is a list of toplevel functions and variables that
Compressorshould keep, even when unused. For so, it defines a predicate that given a definition, returns whether the definition can be removed or not. -
pure_funcs:It contains the list of functions to be assumed pure. The
Compressordefines a predicateCompressor.pure_funcsthat, given anAST_Call, tells whether to assume that the call is pure or not.Compressoroften tries to remove statements that have no side effects and that the code has no use of their result. AS it checks different node types for side effects, it usesCompressor.pure_funcsfor calls.
Optimization
Compressor defines a method optimize for each AST_Node, which returns either an optimized AST_Node,
the same node if there is nothing to optimize, or an AST_EmptyStatement if the node should be discarded.
Compressor discards a node in many settings:
when it is already empty, like an AST_Block with no statements, when an option asks for it,
like an AST_Debugger node when options.drop_debugger is truthy, a redefined directive, a return; at the end of a function, …
I will introduce tighten_body, the optimization logic for an array of statements, an array of AST_statement nodes.
It includes many patterns and it manipulates the code in interesting ways.
All of AST_Block, AST_BlockStatement, and AST_Lambda, and AST_Try optimize use tighten_body to optimize their body.
Each contains:
self.body = tighten_body(self.body, compressor);
tighten_body contains the following loop:
do {
CHANGED = false;
eliminate_spurious_blocks(statements);
if (compressor.option("dead_code")) {
eliminate_dead_code(statements, compressor);
}
if (compressor.option("if_return")) {
handle_if_return(statements, compressor);
}
if (compressor.sequences_limit > 0) {
sequencesize(statements, compressor);
sequencesize_2(statements, compressor);
}
if (compressor.option("join_vars")) {
join_consecutive_vars(statements);
}
if (compressor.option("collapse_vars")) {
collapse(statements, compressor);
}
} while (CHANGED && max_iter-- > 0);
eliminate_spurious_blocks
It removes all AST_EmptyStatement, flattens the body of AST_BlockStatement, and removes duplicated directives.
Flattening works if AST_BlockStatement contains a statement that is also an AST_BlockStatement.
It removes it, and puts its statements in the body of the original block.
eliminate_dead_code
It looks for jumps, return, throw, break, or continue.
When it finds an unreachable code, it uses a TreeWalker to navigate it, moves variables and functions declarations
up, and removes what remains.
It removes initialization for variables too, moving only the declaration up.
handle_if_return
This function deals with early jumps within the if statements in the given block.
Each if statement has three components, a condition (a predicate), a body to be executed if the condition is satisfied, and an optional alternative to be executed if the condition is not.
A jump is a break; or a continue;. Both interrupts the control flow.
handle_if_return walks the statements backward to optimize each statement using the optimized next ones.
This helps when you are dealing with early jumps.
It:
-
removes the following statements when block being optimized is a function and there is an
AST_Returnfollowed only by variables declarations (without initialization). -
transforms an
AST_Return.value ==AST_UnaryPrefixwith “void” value into a simple statement withAST_Return.value.expression as the body. -
negates the condition and reverts the body and the alternative when;
1 - The body contains a jump (
breakorcontinue) that jumps back to the beginning of the block being optimized.2 - The if statement has no alternative, it is followed by a jump, and the negated condition is shorter than normal condition.
3 - The alternative contains a jump back to the beginning of the block being optimized.
In 1- and 3-, it removes the non-function-definitions that follow the conditional and put them in the branch free from a jump (body in 1- and alternative in 3-).
-
compresses if statements where the body contains only a return statement and the alternative is empty as follow:
// if (foo()) return; return; ==> foo(); return; // if (foo()) return x; return y; ==> return foo() ? x : y; // if (foo()) return x; [ return ; ] ==> return foo() ? x : undefined; // if (a) return b; if (c) return d; e; ==> return a ? b : c ? d : void e;
sequencesize
It joins consecutive AST_SimpleStatement, or a list of AST_SimpleStatement interrupted only by declarations
(AST_Defun or AST_Definitions with declarations_only) into one AST_Sequence.
An AST_Sequence is a statement that contains an array of expressions.
sequencesize removes expressions in the middle that does not affect the return value of the sequence and have no side effects.
sequencesize_2
It tries to put each AST_SimpleStatement (a statement consisting of one expression, we call it prev) into the following
AST_StatementWithBody (a statement that has a body) as follow:
AST_Exit(returnorthrow): changes the value to anAST_Sequenceincludingprevand the valueAST_ForIn: addsprevto the definition of object we are looping through (creating anAST_Sequnce)AST_If: addsprevto the condition (creating anAST_Sequnce)AST_Switch: addsprevto the “switch” discriminant (creating anAST_Sequnce)AST_With: adds prev to the expression (creating anAST_Sequnce)-
AST_For: It putsprevin the loop init code if init is not an instance ofAST_Definitionsor the loop has no init code, creating anAST_Sequnce. When the init is anAST_Definitions, there may be problems with the result sequence. It avoids addingprevto the init code ifprevcontains a function definition (it won’t be accessible) or an expression with aninoperator (it causes issues with the following expression in a sequence)`for(console.log('hi'), var i = 0;;;) {}` // `SyntaxError: Unexpected token var` `for(console.log('hi'), i = 0;;;) {}` // works
join_consecutive_vars
This is not far from the previous function.
It modifies the same types of AST_StatementWithBody as sequencesize_2 in the same way.
But, instead of joining statements, it joins variable assignments using join_assigns.
join_assigns takes prev and an AST_Assign or an AST_Sequence, and returns an array of expressions.
Compressor uses join_assigns_exprs to build an AST_Sequence using that array.
If prev is AST_Definitions, try to trim_assigns into last AST_VarDef of it exprs (move assignments to prev)
otherwise (none moved or prev is not AST_Definitions),
find an expression of AST_Assign, operator = “=”, .left is AST_SymbolRef, and there is
nothing to trim_assigns from the following expressions. If there is, return;
trim_assigns helps when the prev is an assignment of an object’s property.
It accepts an AST_VarDef name and value, and an array of AST_Assign.
It walks the array from beginning to end, when it finds an assignment to a property of the given AST_VarDef name
that does not change a property in the given AST_VarDef value, it adds the assignment into the AST_VarDef value.
collapse
“This is left as an exercise for the reader” 😈. Have fun!.