TensorFlow is an open-source, stable, and actively maintained solution for creating and evaluating ML models.
It might not be the perfect codebase to study, though: it was first written in Python, then ported to TypeScript. But it’s widespread enough to be interesting.
TensorFlow.js works effortlessly in the browser. And that’s a big win, as the browser is increasingly taking over operating system responsibilities.
You can check the Tfjs source code on GitHub. Being a port, you’ll notice some Python design constraints sneaking into the TS code. The code in this post is a simplification of the actual code; I tried to keep only the relevant lines.
Input Tensors
ML models are sophisticated “functions”. Their inputs and outputs have explicit shapes and types. And they can learn: they improve their inner workings as they get more input/output samples.
Tfjs offers interfaces and building blocks to create models, train them, and directly apply predefined ones.
A model is a sequence of layers.
The input is interpreted and formatted by the first layer, passed down to the following layers, and transformed into output by the last layer.
The layers in between transform, reshape, and filter the data.
We exercise a model with predict():
const prediction = model.predict(inputTensor);
The model accepts only Tensors as input, so any input needs to be transformed into Tensors before prediction.
The “Tensor” is the “currency” of communication in Tfjs.
It’s a “data unit”, information that can be passed to the model, returned from it, and passed between its internal components.
“Physically”, it’s either a simple value, a one-dimensional array, or a multi-dimensional array of a given dtype.
class Tensor {
  dataId: DataId;
  id: number; // Unique id of this Tensor
  shape: number[]; // The shape of the Tensor
  dtype: 'float32' | 'int32' | 'bool' | 'complex64' | 'string';
  // ...
  async data() {
    return trackerFn().read(this.dataId);
  }
  // ...
}
The shape attribute specifies the dimensions of the array
and the size of each dimension.
To create a 3 x 1 int32 array, we can use tensor2d:
const indices = tf.tensor2d([0, 4, 2], [3, 1], 'int32');
The first argument is the value; the second describes the dimensions.
tensor3d builds a 2 x 2 x 2 3-dimensional array:
const x = tf.tensor3d([[[1, 2], [3, 4]], [[-1, -2], [-3, -4]]]);
Here we pass only the value; the shape is inferred.
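As an illustration, here’s a hypothetical helper (not part of Tfjs) that derives a shape from a nested array, the way the tensor factories infer it when no shape is given:

```typescript
// Hypothetical helper (not part of Tfjs): derive a shape from a
// nested array by descending along the first element of each level.
function inferShape(value: unknown): number[] {
  const shape: number[] = [];
  let current: unknown = value;
  while (Array.isArray(current)) {
    shape.push(current.length);
    current = current[0];
  }
  return shape;
}

// The 2 x 2 x 2 value from the tensor3d example above:
const value = [[[1, 2], [3, 4]], [[-1, -2], [-3, -4]]];
console.log(inferShape(value)); // [2, 2, 2]
```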
Two attributes of Tensor define the Tensor identity: id and dataId.
id is the tensor’s unique identifier in the model.
dataId is a reference to the data denoted by the Tensor.
Such a reference can be shared by multiple Tensors.
The value of the Tensor is not stored inside an instance of this class because it might be huge. It can be a big image for example.
Tfjs implements its own memory management system. Intensive computation requires careful memory-leak management and garbage collection routines. Due to the browser’s limited memory, v8 unpredictability, and other backends’ limitations, the memory is maintained by Tfjs’ own backend (WASM, GPU, WebGL, or CPU; more about this in the next post).
Tfjs defines another Tensor-like type to work with Tensors-without-data.
It’s used to describe the model inputs and outputs.
For example, we create a model that takes a Tensor of shape [3] and apply a dense layer:
const input = tf.input({shape: [3]}); // SymbolicTensor
const dense = tf.layers.dense({units: 2}).apply(input);
const model = tf.model({inputs: input, outputs: dense});
Both constants, input and dense, are SymbolicTensor instances:
export class SymbolicTensor {
  id: number;
  name: string; // The fully scoped name of this Variable
  sourceLayer: Layer; // The Layer that produced this symbolic Tensor.
  inputs: SymbolicTensor[]; // The inputs passed to sourceLayer during prediction.
  // ...
}
id is, as in Tensor, a unique Tensor identifier.
name is a human-readable identifier that’s used for debugging and for tracking Tensors’ flow.
SymbolicTensor also has other attributes from Tensor that describe the “real” Tensor (for example, the output Tensor once it’s calculated), such as shape and dtype.
Tfjs wraps the input Tensor instances inside a FeedDict dictionary before passing them to the layers.
/**
* FeedDict: A mapping from unique SymbolicTensors to feed values for them.
* A feed value is a concrete value represented as a `Tensor`.
*/
export class FeedDict {
  // ...
}
This type maps a SymbolicTensor to a Tensor.
The prediction makes use of both: the SymbolicTensor for the name, which can propagate across layers, and the Tensor for the value, used for prediction.
Tfjs can also apply layers on either real Tensors or SymbolicTensors. The first is used for normal prediction; the second can be used to create a model container from a predefined configuration.
A feed is a type that combines both views of a Tensor:
interface Feed {
  key: SymbolicTensor;
  value: Tensor;
}
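To make the pairing concrete, here’s a minimal FeedDict sketch. The types are simplified stand-ins and the method bodies are my own; the real implementation differs in details:

```typescript
// Simplified stand-ins for the real Tfjs types.
interface SymbolicTensor { id: number; name: string; }
interface Tensor { shape: number[]; }

// A minimal, hypothetical FeedDict: maps a SymbolicTensor (by its
// name) to the concrete Tensor fed for it.
class FeedDict {
  private values = new Map<string, Tensor>();

  add(key: SymbolicTensor, value: Tensor): void {
    this.values.set(key.name, value);
  }

  hasKey(key: SymbolicTensor): boolean {
    return this.values.has(key.name);
  }

  names(): string[] {
    return Array.from(this.values.keys());
  }

  getValue(name: string): Tensor {
    const value = this.values.get(name);
    if (value === undefined) {
      throw new Error(`No value found for tensor ${name}`);
    }
    return value;
  }
}

const feed = new FeedDict();
feed.add({id: 0, name: 'input_1'}, {shape: [3]});
console.log(feed.names()); // ['input_1']
```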
Output Tensors
The input and the output of prediction are arrays of Tensor instances.
The model knows the shape of the output when it is created, before the prediction starts.
For prediction, Tfjs first initializes the output array (the array of Tensor instances) from the array of SymbolicTensor instances (that is, from the model description, which defines the placeholders for the “real” Tensors outputs):
const outSymTensorsNames = outputs.map(t => t.name);
const finalOutputs: Tensor[] = [];
for (const outputName of outSymTensorsNames) {
  const nameExistsInInput = inputFeedDict.names().indexOf(outputName) !== -1;
  if (nameExistsInInput) {
    finalOutputs.push(inputFeedDict.getValue(outputName));
  } else {
    finalOutputs.push(null);
  }
}
outputs is the initial list of output SymbolicTensors (from the model definition).
The logic searches for each output SymbolicTensor name in the input (a FeedDict instance that contains the input Tensor instances). If found, the Tensor is dropped into the output, at the element with the same index. If not, the element is initialized to null. The holes (the null values) will be filled later, after exercising the layers.
Tfjs then sorts the Tensor identifiers topologically.
A SymbolicTensor instance has an array of SymbolicTensor inputs
and a Layer instance named sourceLayer.
The layer is applied to the inputs to produce the output Tensor.
Tensors depend on each other. An input Tensor can be shared by multiple output Tensors. And during prediction, a Tensor’s dependencies should be calculated before it, and calculated only once.
Tfjs traverses the tree of SymbolicTensor instances depth-first
and adds each fresh SymbolicTensor
(one whose name has not yet been encountered during the traversal) to the result.
It’s a tree where each node is a SymbolicTensor and its children are its inputs’ SymbolicTensor instances.
The sorting outcome is an array of the model output SymbolicTensor elements and their input dependencies’ (and transitive dependencies’) SymbolicTensor elements.
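The traversal can be sketched like this (names are hypothetical, types simplified; this is my illustration of the idea, not the actual Tfjs code):

```typescript
// Simplified SymbolicTensor: a name plus its input dependencies.
interface SymbolicTensor { name: string; inputs: SymbolicTensor[]; }

// Hypothetical sketch of the depth-first topological sort: inputs
// are visited before the tensor that consumes them, and a name that
// was already encountered is skipped, so shared inputs appear once.
function topologicalSort(outputs: SymbolicTensor[]): SymbolicTensor[] {
  const sorted: SymbolicTensor[] = [];
  const visited = new Set<string>();
  const visit = (tensor: SymbolicTensor): void => {
    if (visited.has(tensor.name)) return;
    visited.add(tensor.name);
    for (const input of tensor.inputs) visit(input); // dependencies first
    sorted.push(tensor);
  };
  for (const output of outputs) visit(output);
  return sorted;
}

// A shared input feeding two outputs is visited only once:
const shared: SymbolicTensor = {name: 'input', inputs: []};
const a: SymbolicTensor = {name: 'a', inputs: [shared]};
const b: SymbolicTensor = {name: 'b', inputs: [shared]};
console.log(topologicalSort([a, b]).map(t => t.name)); // ['input', 'a', 'b']
```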
Tfjs then applies the source layer of each element in this sorted array on the feedDict inputs:
const symbolicTensor = sorted[i];
const outputTensors: Tensor[] =
    symbolicTensor
        .sourceLayer
        .apply(feedDictValues(symbolicTensor.inputs), /* ... */);
feedDictValues returns the input Tensor instances with the same names as the sorted SymbolicTensor inputs.
Each application result, an array of Tensor instances, fills some null holes in the output Tensors array (created at the beginning) and augments the input feedDict for the next sorted element’s application.
// 1. Applying the source layer
const outputTensors = symbolicTensor.sourceLayer.apply(inputValues);
const nodeOutputs = getNodeOutputs(symbolicTensor);
// 2. Iterating over the `SymbolicTensor` source layer outputs
for (let i = 0; i < nodeOutputs.length; ++i) {
  // 3. Adding the new Tensor to the input feed
  if (!internalFeedDict.hasKey(nodeOutputs[i])) {
    internalFeedDict.add(nodeOutputs[i], outputTensors[i]);
  }
  // 4. Filling the `null` holes in the final output
  const index = outputNames.indexOf(nodeOutputs[i].name);
  if (index !== -1) {
    finalOutputs[index] = outputTensors[i];
  }
}
getNodeOutputs extracts the array of SymbolicTensor instances returned by the source layer application. They’re placeholders for the Tensors returned by that layer, ordered in the same order as the layer’s output Tensors. It’s used because it carries the name and the id of each Tensor. These values are used to find the index of the Tensor in the final output, and to put the output Tensor inside feedDict for the next application.
Source layer application
Each output Tensor (modeled by a SymbolicTensor instance) is one
of multiple outputs of its sourceLayer.
class Layer {
  name: string; // Name for this layer. Must be unique within a model.
  inboundNodes: Node[];
  outboundNodes: Node[];
  // ...
}
Inbound nodes and outbound nodes define the flow of Tensors. The inbound nodes reference the layers that feed into this layer, and the outbound nodes describe the layers that depend on it.
A Node instance is a joint that connects successive layers:
/**
* Each time a layer is connected to some new input,
* a node is added to `layer.inboundNodes`.
*
* Each time the output of a layer is used by another layer,
* a node is added to `layer.outboundNodes`.
*
*/
export class Node {
  inputTensors: SymbolicTensor[]; // List of input Tensors.
  outputTensors: SymbolicTensor[]; // List of output Tensors.
  // ...
}
getNodeOutputs, from the previous section, searches for the node producing
the sorted Tensor inside symbolicTensor.sourceLayer.
It looks for the symbolicTensor.id inside each inbound node’s outputTensors.
When the node is found, its outputTensors are returned.
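That lookup can be pictured like this, with simplified stand-in types (a hypothetical sketch; the real implementation differs in details):

```typescript
// Simplified stand-ins for the real Tfjs types.
interface Layer { inboundNodes: InboundNode[]; }
interface InboundNode { outputTensors: SymbolicTensor[]; }
interface SymbolicTensor { id: number; name: string; sourceLayer: Layer; }

// Hypothetical sketch: scan the source layer's inbound nodes for the
// node whose outputs contain the tensor's id, and return all of that
// node's output placeholders.
function getNodeOutputs(symbolicTensor: SymbolicTensor): SymbolicTensor[] {
  for (const node of symbolicTensor.sourceLayer.inboundNodes) {
    if (node.outputTensors.some(t => t.id === symbolicTensor.id)) {
      return node.outputTensors;
    }
  }
  throw new Error(`No inbound node found for ${symbolicTensor.name}`);
}

// A layer whose single node produced two output placeholders:
const layer: Layer = {inboundNodes: []};
const out1: SymbolicTensor = {id: 1, name: 'out1', sourceLayer: layer};
const out2: SymbolicTensor = {id: 2, name: 'out2', sourceLayer: layer};
layer.inboundNodes.push({outputTensors: [out1, out2]});
console.log(getNodeOutputs(out2).map(t => t.name)); // ['out1', 'out2']
```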
The layer itself can be exercised with apply():
const outputTensors: Tensor[] = sourceLayer.apply(inputValues, kwargs);
apply() can also be called directly (without a container model).
This is an example from its documentation:
const flattenLayer = tf.layers.flatten();
// Use tf.layers.input() to obtain a SymbolicTensor as input to apply().
const input = tf.input({shape: [2, 2]});
const output1 = flattenLayer.apply(input);
// output1.shape is [null, 4]. The first dimension is the undetermined
// batch size. The second dimension comes from flattening the [2, 2]
// shape.
console.log(JSON.stringify(output1.shape));
This is a simplified implementation of apply():
const noneAreSymbolic = checkNoneSymbolic(inputs);
// 1. Building
this.build(generic_utils.singletonOrArray(inputShapes));
// 2. Collecting output
if (noneAreSymbolic) {
  return this.call(inputs, kwargs);
} else {
  const inputShape = collectInputShape(inputs);
  const outputShape: Shape[] = this.computeOutputShape(inputShape);
  const outputDType = guessOutputDType(inputs);
  return outputShape.map(
      (shape, index) =>
          new SymbolicTensor(outputDType, shape, this, inputs, kwargs, this.name, index));
}
inputs is the array of given Tensors. inputShape is an array of shapes; each element is the value of an input Tensor’s shape attribute.
Dense layer
During prediction, a layer is built and applied.
Both build() and call() are defined in the abstract Layer class and overridden inside the specific layer class.
Dense implements the operation:
output = activation(dot(input, kernel) + bias)
where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is true).
call() calculates the output.
It’s essentially matrix multiplication and addition:
K.dot(input, this.kernel.read(), fusedActivationName, this.bias.read());
fusedActivationName is the activation function; it can be 'relu', 'linear', or 'elu'.
K.dot() is a Tensor multiplication function defined in the backend:
/**
* Multiply two Tensors and returns the result as a Tensor.
*
* For 2D Tensors, this is equivalent to matrix multiplication (matMul).
* For Tensors of higher ranks, it follows the Theano behavior,
* (e.g. `(2, 3) * (4, 3, 5) -> (2, 4, 5)`). From the Theano documentation:
*
* For N dimensions it is a sum product over the last axis of x and the
* second-to-last of y:
*/
More about this in the next post.
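Putting call() together, here’s a plain-TypeScript sketch of what the dense computation does for a single input vector. This is my illustration of the math, not the Tfjs implementation, which operates on Tensors through the backend:

```typescript
// output = activation(dot(input, kernel) + bias), for one input
// vector. Shapes: input [n], kernel [n, units], bias [units].
function denseForward(
    input: number[], kernel: number[][], bias: number[],
    activation: (x: number) => number): number[] {
  const output: number[] = [];
  for (let j = 0; j < bias.length; ++j) {
    let sum = bias[j];
    for (let i = 0; i < input.length; ++i) {
      sum += input[i] * kernel[i][j]; // dot product over the input axis
    }
    output.push(activation(sum));
  }
  return output;
}

const relu = (x: number): number => Math.max(0, x);
// 3 inputs, 2 units:
const y = denseForward([1, 2, 3], [[1, 0], [0, 1], [1, 1]], [0.5, -10], relu);
console.log(y); // [4.5, 0]
```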
The values of this.kernel.read() and this.bias.read() are returned by the kernel initializer and the bias initializer when building the layer. Building a layer indeed means creating its weights.
Weights control the signal (or the strength of the connection) between two neurons. In other words, a weight decides how much influence the input will have on the output.
Here’s the Dense implementation of build():
inputShape = getExactlyOneShape(inputShape);
const inputLastDim = inputShape[inputShape.length - 1];
this.kernel = new LayerVariable(
    this.kernelInitializer.apply([inputLastDim, this.units]),
    /* ... */
);
this.bias = new LayerVariable(
    this.biasInitializer.apply([this.units]),
    /* ... */
);
Tfjs models kernel and bias as layer variables.
/**
* A `tf.layers.LayerVariable` is similar to a `tf.Tensor` in that it has a
* dtype and shape, but its value is mutable. The value is itself represented
* as a`tf.Tensor`, and can be read with the `read()` method and updated with
* the `write()` method.
*/
this.units is passed to the layer during creation.
It specifies the number of neurons in the layer.
The shape of the kernel layer variable is [inputLastDim, this.units]: the last input dimension by the number of units.
The shape of the bias is a one-dimensional array of this.units elements.
It’s added as-is to the result.
The default bias initializer is 'zeros', which sets all weights to 0.
The default kernel initializer is essentially:
let scale = 1.0 / Math.max(1, (shape[0] + shape[1]) / 2);
return truncatedNormal(shape, 0, Math.sqrt(scale), 'float32', seed);
See Truncated normal distribution for a deeper understanding.
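As a sketch, a truncated normal can be sampled by rejection: draw normal values and keep only those within two standard deviations. This is a hypothetical illustration; the real initializer delegates to the backend’s truncatedNormal op:

```typescript
// Sample a normal value (Box-Muller transform), rejecting anything
// more than two standard deviations from the mean: a truncated normal.
function truncatedNormalSample(stddev: number): number {
  while (true) {
    const u1 = Math.random() || Number.MIN_VALUE; // avoid log(0)
    const u2 = Math.random();
    const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    if (Math.abs(z) <= 2) return z * stddev; // keep |z| <= 2 stddev
  }
}

// shape = [fanIn, fanOut], e.g. a [3, 2] kernel:
const shape = [3, 2];
const scale = 1.0 / Math.max(1, (shape[0] + shape[1]) / 2);
const stddev = Math.sqrt(scale);
const kernel = Array.from(
    {length: shape[0] * shape[1]}, () => truncatedNormalSample(stddev));
// Every weight lies within two standard deviations of zero.
```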