TensorFlow is an open-source, stable, and actively maintained solution for creating and evaluating ML models.
It might not be the perfect codebase to study, though. It was first written in Python and then ported to TypeScript. But it’s widespread enough to be interesting.
TensorFlow.js works effortlessly in the browser. And that’s a big win, as the browser is increasingly taking over operating system responsibilities.
You can check the Tfjs source code on GitHub. Being a port, you’ll notice some Python design constraints sneaking into the TS code. The code in this post is a simplification of the actual code; I tried to keep only the relevant lines.
Input Tensors
ML models are sophisticated “functions”. Their inputs and outputs have explicit shapes and types. And they can learn: they improve their inner workings as they get more input/output samples.
Tfjs offers interfaces and building blocks to create models, train them, and directly apply predefined ones.
A model is a sequence of layers.
The input is interpreted and formatted by the first layer, passed down to the following layers, and transformed into output by the last layer.
The layers in between transform, reshape, and filter the data.
We exercise a model with predict():
const prediction = model.predict(inputTensor);
The model accepts only Tensors as input, so any input needs to be transformed into Tensors before prediction.
The “Tensor” is the “currency” of communication in Tfjs.
It’s a “data unit”, information that can be passed to the model, returned from it, and passed between its internal components.
“Physically”, it’s either a simple value, a one-dimensional array, or a multi-dimensional array of a given dtype.
class Tensor {
  dataId: DataId;
  id: number; // Unique id of this Tensor
  shape: number[]; // The shape of the Tensor
  dtype: 'float32' | 'int32' | 'bool' | 'complex64' | 'string';
  // ...
  async data() {
    return trackerFn().read(this.dataId);
  }
  // ...
}
The shape attribute specifies the dimensions of the array and the size of each dimension. To create a 3 x 1 int32 array, we can use tensor2d:
const indices = tf.tensor2d([0, 4, 2], [3, 1], 'int32');
The first argument is the value, the second describes the dimensions, and the third is the dtype.
tensor3d builds a 2 x 2 x 2 3-dimensional array:
const x = tensor3d([[[1, 2], [3, 4]], [[-1, -2], [-3, -4]]]);
Here we pass only the value; the shape is inferred from it.
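When only the value is passed, the shape has to be recovered from the nesting. Here’s a minimal sketch of that inference (illustrative, not the actual tfjs helper, which also validates that rows have equal lengths):

```typescript
// Infer a Tensor shape from a nested array by walking the first
// element at each nesting level.
type NestedArray = number | NestedArray[];

function inferShape(value: NestedArray): number[] {
  const shape: number[] = [];
  let current = value;
  while (Array.isArray(current)) {
    shape.push(current.length);  // size of this dimension
    current = current[0];        // descend into the first element
  }
  return shape;
}

// The 2 x 2 x 2 value from the tensor3d example above:
const shape = inferShape([[[1, 2], [3, 4]], [[-1, -2], [-3, -4]]]);
// shape is [2, 2, 2]
```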
Two attributes of Tensor define the Tensor identity: id and dataId. id is the tensor’s unique identifier in the model. dataId is a reference to the data denoted by the Tensor. Such a reference can be shared by multiple Tensors.
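A minimal sketch of that identity scheme (the names SimpleTensor and clone are illustrative, not the tfjs API): every instance gets a fresh id, while a clone shares the dataId, and therefore the backing data, with the original.

```typescript
let nextId = 0;

class SimpleTensor {
  readonly id = nextId++;  // unique per Tensor instance
  // dataId is a shared reference to the underlying data
  constructor(readonly dataId: object) {}

  clone(): SimpleTensor {
    return new SimpleTensor(this.dataId);  // new id, same dataId
  }
}

const a = new SimpleTensor({});
const b = a.clone();
// a and b are distinct Tensors pointing at the same underlying data:
// a.id !== b.id, and a.dataId === b.dataId.
```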
The value of the Tensor is not stored inside an instance of this class because it might be huge; it can be a big image, for example.
Tfjs implements its own memory management system. Intensive computation requires careful memory-leak management and garbage collection routines. Due to the browser’s limited memory, V8’s unpredictability, and other backends’ limitations, memory is maintained by Tfjs’ own backend (WASM, GPU, WebGL, or CPU; more about this in the next post).
Tfjs defines another Tensor-like type to work with Tensors-without-data.
It’s used to describe the model inputs and outputs.
For example, we create a model that takes a shape-[3] Tensor and we apply a dense layer:
const input = tf.input({shape: [3]}); // SymbolicTensor
const dense = tf.layers.dense({units: 2}).apply(input);
const model = tf.model({inputs: input, outputs: dense});
Both constants, input and dense, are SymbolicTensor instances:
export class SymbolicTensor {
  id: number;
  name: string; // The fully scoped name of this Variable
  sourceLayer: Layer; // The Layer that produced this symbolic Tensor.
  inputs: SymbolicTensor[]; // The inputs passed to sourceLayer during prediction.
  // ...
}
id is, as in Tensor, a unique Tensor identifier. name is a human-readable identifier that’s used for debugging and for tracking the Tensors’ flow. SymbolicTensor also has other attributes from Tensor that describe the “real” Tensor (for example, the output Tensor once it’s calculated), such as shape and dtype.
Tfjs wraps the input Tensor instances inside a FeedDict dictionary before passing them to the layers.
/**
 * FeedDict: A mapping from unique SymbolicTensors to feed values for them.
 * A feed value is a concrete value represented as a `Tensor`.
 */
export class FeedDict {
  // ...
}
This type maps a SymbolicTensor to a Tensor.
The prediction makes use of both: SymbolicTensor for the name, which can propagate across layers, and Tensor for the value, used for prediction.
Tfjs can apply layers to either real Tensors or SymbolicTensors. The first is used for normal prediction. The second can be used to create a model container from a predefined configuration.
A feed is a type that combines both views of a Tensor:
interface Feed {
  key: SymbolicTensor;
  value: Tensor;
}
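A simplified sketch of such a dictionary, keyed by the SymbolicTensor name (the class name MiniFeedDict is illustrative; the method names only loosely mirror the tfjs ones):

```typescript
// A SymbolicTensor stand-in: only the name matters for the mapping.
interface SymbolicKey { name: string; }

class MiniFeedDict {
  private map = new Map<string, number[]>();  // name → concrete value

  add(key: SymbolicKey, value: number[]): void {
    this.map.set(key.name, value);
  }
  hasKey(key: SymbolicKey): boolean {
    return this.map.has(key.name);
  }
  getValue(name: string): number[] {
    const v = this.map.get(name);
    if (v === undefined) throw new Error(`No feed value for ${name}`);
    return v;
  }
  names(): string[] {
    return [...this.map.keys()];
  }
}
```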
Output Tensors
The input and the output of prediction are arrays of Tensor instances.
The model knows the shape of its output as soon as it’s created, before any prediction starts.
For prediction, Tfjs first initializes the output array (the array of Tensor instances) from the array of SymbolicTensor instances (that is, from the model description, which defines the placeholders for the “real” Tensors outputs):
const outSymTensorsNames = outputs.map(t => t.name);
const finalOutputs: Tensor[] = [];
for (const outputName of outSymTensorsNames) {
  const nameExistsInInput = inputFeedDict.names().indexOf(outputName) !== -1;
  if (nameExistsInInput) {
    finalOutputs.push(inputFeedDict.getValue(outputName));
  } else {
    finalOutputs.push(null);
  }
}
outputs is the initial list of output SymbolicTensors (from the model definition). The logic searches for each output SymbolicTensor name in the input (which is a FeedDict instance containing the input Tensor instances). If found, the Tensor is dropped into the output array, at the element with the same index. If not, the element is initialized to null. The holes (the null values) will be filled later, after exercising the layers.
Tfjs then sorts the Tensor identifiers topologically.
A SymbolicTensor instance has an array of SymbolicTensor inputs and a Layer instance named sourceLayer.
The layer is applied to the inputs to produce the output Tensor.
Tensors depend on each other: an input Tensor can be shared by multiple output Tensors. And during prediction, a dependency should be calculated before its dependents, and calculated only once.
Tfjs traverses the tree of SymbolicTensor instances depth-first and adds each fresh SymbolicTensor (one whose name has not yet been encountered during the traversal) to the result. It’s a tree where each node is a SymbolicTensor and the children are its inputs’ SymbolicTensor instances.
The sorting outcome is an array containing the model’s output SymbolicTensor elements and their input dependencies’ (including transitive ones) SymbolicTensor elements.
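The traversal can be sketched like this (illustrative names, not the actual tfjs code): each name is registered the first time it’s encountered, and a node is pushed only after its inputs, so shared dependencies appear once and before their dependents.

```typescript
// A node stands in for a SymbolicTensor with its inputs as children.
interface SymNode { name: string; inputs: SymNode[]; }

function topologicalSort(outputs: SymNode[]): SymNode[] {
  const seen = new Set<string>();
  const sorted: SymNode[] = [];
  const visit = (node: SymNode) => {
    if (seen.has(node.name)) return;  // already encountered: skip
    seen.add(node.name);
    node.inputs.forEach(visit);       // dependencies first (depth-first)
    sorted.push(node);                // then the node itself
  };
  outputs.forEach(visit);
  return sorted;
}

// input is shared by two intermediate tensors, but appears only once,
// before both of them:
const input: SymNode = {name: 'input', inputs: []};
const d1: SymNode = {name: 'dense1', inputs: [input]};
const d2: SymNode = {name: 'dense2', inputs: [input]};
const out: SymNode = {name: 'concat', inputs: [d1, d2]};
const order = topologicalSort([out]).map(n => n.name);
// order: input, dense1, dense2, concat
```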
Tfjs then applies the source layer of each element in this sorted array to the feedDict inputs:
const symbolicTensor = sorted[i];
const outputTensors: Tensor[] = symbolicTensor.sourceLayer.apply(
    feedDictValues(symbolicTensor.inputs), /* ... */);
feedDictValues returns the input Tensor instances with the same names as the sorted SymbolicTensor inputs.
Each application result, an array of Tensor instances, fills some null holes in the output Tensors array (created at the beginning) and augments the input feedDict for the next sorted element’s application.
// 1. Applying the source layer
const outputTensors = symbolicTensor.sourceLayer.apply(inputValues);
const nodeOutputs = getNodeOutputs(symbolicTensor);
// 2. Iterating over the `SymbolicTensor` source layer outputs
for (let i = 0; i < nodeOutputs.length; ++i) {
  // 3. Adding the new Tensor to the input feed
  if (!internalFeedDict.hasKey(nodeOutputs[i])) {
    internalFeedDict.add(nodeOutputs[i], outputTensors[i]);
  }
  // 4. Filling the `null` holes in the final output
  const index = outputNames.indexOf(nodeOutputs[i].name);
  if (index !== -1) {
    finalOutputs[index] = outputTensors[i];
  }
}
getNodeOutputs extracts the array of SymbolicTensor instances returned by the source layer application. They’re placeholders for the Tensors returned by that layer, ordered in the same order as the layer’s Tensor outputs. It’s used because it carries the name and the id of each Tensor; these values are used to find the index of the Tensor in the final output, and to put the output Tensor inside feedDict for the next application.
Source layer application
Each output Tensor (modeled by a SymbolicTensor instance) is one of possibly multiple outputs of its sourceLayer.
class Layer {
  name: string; // Name for this layer. Must be unique within a model.
  inboundNodes: Node[];
  outboundNodes: Node[];
  // ...
}
Inbound nodes and outbound nodes define the flow of Tensors. The inbound nodes reference the layers that feed into this layer, and the outbound nodes reference the layers that depend on it.
A Node instance is a joint that connects successive layers:
/**
* Each time a layer is connected to some new input,
* a node is added to `layer.inboundNodes`.
*
* Each time the output of a layer is used by another layer,
* a node is added to `layer.outboundNodes`.
*
*/
export class Node {
  inputTensors: SymbolicTensor[]; // List of input Tensors.
  outputTensors: SymbolicTensor[]; // List of output Tensors.
  // ...
}
getNodeOutputs, from the previous section, searches for the node producing the sorted Tensor inside symbolicTensor.sourceLayer. It looks for the symbolicTensor.id inside each inbound node’s outputTensors. When the node is found, its outputTensors are returned.
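That lookup can be sketched as follows (simplified types; illustrative, not the actual tfjs code):

```typescript
// Simplified stand-ins for SymbolicTensor, Node, and Layer.
interface SymTensor { id: number; }
interface LayerNode { outputTensors: SymTensor[]; }
interface SimpleLayer { inboundNodes: LayerNode[]; }

// Scan the source layer's inbound nodes for the one whose
// outputTensors contain the tensor's id, and return that node's
// full output list.
function getNodeOutputs(tensor: SymTensor, layer: SimpleLayer): SymTensor[] {
  for (const node of layer.inboundNodes) {
    if (node.outputTensors.some(t => t.id === tensor.id)) {
      return node.outputTensors;  // all outputs of the producing node
    }
  }
  throw new Error(`No inbound node produced tensor ${tensor.id}`);
}
```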
The layer itself can be exercised with apply():
const outputTensors: Tensor[] = sourceLayer.apply(inputValues, kwargs);
apply() can also be called directly (without a container model). This is an example from its documentation:
const flattenLayer = tf.layers.flatten();
// Use tf.layers.input() to obtain a SymbolicTensor as input to apply().
const input = tf.input({shape: [2, 2]});
const output1 = flattenLayer.apply(input);
// output1.shape is [null, 4]. The first dimension is the undetermined
// batch size. The second dimension comes from flattening the [2, 2]
// shape.
console.log(JSON.stringify(output1.shape));
This is a simplified implementation of apply():
const noneAreSymbolic = checkNoneSymbolic(inputs);
// 1. Building
this.build(generic_utils.singletonOrArray(inputShapes));
// 2. Collecting output
if (noneAreSymbolic) {
  return this.call(inputs, kwargs);
} else {
  const inputShape = collectInputShape(inputs);
  const outputShape: Shape[] = this.computeOutputShape(inputShape);
  const outputDType = guessOutputDType(inputs);
  return outputShape.map(
      (shape, index) => new SymbolicTensor(
          outputDType, shape, this, inputs, kwargs, this.name, index));
}
inputs is the array of given Tensors. inputShape is an array of shapes; each element is the value of an input Tensor’s shape attribute.
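For a dense layer, computeOutputShape essentially replaces the last dimension of the input shape with the layer’s number of units, while the undetermined batch dimension (null) passes through. A hedged sketch (illustrative; the real code also validates ranks):

```typescript
// A shape is an array of dimension sizes; null marks the
// undetermined batch dimension.
type Shape = Array<number | null>;

function denseOutputShape(inputShape: Shape, units: number): Shape {
  const outputShape = inputShape.slice();          // don't mutate the input
  outputShape[outputShape.length - 1] = units;     // last dim becomes `units`
  return outputShape;
}

// A [null, 3] input through a 2-unit dense layer yields [null, 2].
const out = denseOutputShape([null, 3], 2);
```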
Dense layer
During prediction, a layer is built and applied.
Both build() and call() are defined in the abstract Layer class and implemented inside each specific layer class.
Dense implements the operation:
output = activation(dot(input, kernel) + bias)
where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
call() calculates the output. It’s essentially matrix multiplication and addition:
K.dot(input, this.kernel.read(), fusedActivationName, this.bias.read());
fusedActivationName is the activation function; it can be 'relu', 'linear', or 'elu'.
K.dot() is a Tensor multiplication function defined in the backend:
/**
* Multiply two Tensors and returns the result as a Tensor.
*
* For 2D Tensors, this is equivalent to matrix multiplication (matMul).
* For Tensors of higher ranks, it follows the Theano behavior,
* (e.g. `(2, 3) * (4, 3, 5) -> (2, 4, 5)`). From the Theano documentation:
*
* For N dimensions it is a sum product over the last axis of x and the
* second-to-last of y:
*/
More about this in the next post.
The values of this.kernel.read() and this.bias.read() are returned by the kernel initializer and the bias initializer when building the layer. Building a layer indeed means creating its weights.
Weights control the signal (or the strength of the connection) between two neurons. In other words, a weight decides how much influence the input will have on the output.
Here’s the Dense implementation of build():
inputShape = getExactlyOneShape(inputShape);
const inputLastDim = inputShape[inputShape.length - 1];
this.kernel = new LayerVariable(
    this.kernelInitializer.apply([inputLastDim, this.units]),
    /* ... */
);
this.bias = new LayerVariable(
    this.biasInitializer.apply([this.units]),
    /* ... */
);
Tfjs models kernel and bias as layer variables.
/**
 * A `tf.layers.LayerVariable` is similar to a `tf.Tensor` in that it has a
 * dtype and shape, but its value is mutable. The value is itself represented
 * as a `tf.Tensor`, and can be read with the `read()` method and updated with
 * the `write()` method.
 */
this.units is passed to the layer during creation. It specifies the number of neurons in the layer. The kernel layer variable has the shape [inputLastDim, this.units]; its last dimension matches the output’s last dimension. The shape of the bias is a one-dimensional array of this.units elements; it’s added as-is to the result. The default bias initializer is 'zeros', which sets all bias weights to 0.
The default kernel initializer is essentially:
let scale = 1.0 / Math.max(1, (shape[0] + shape[1]) / 2);
return truncatedNormal(shape, 0, Math.sqrt(scale), 'float32', seed);
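Put together, the initializer behaves roughly like this sketch: samples are drawn from a normal distribution with the Glorot-style standard deviation, and any sample farther than two standard deviations from the mean is redrawn. The Box-Muller sampler here is an assumption for illustration, not the tfjs implementation.

```typescript
// Glorot-style stddev: scale = 1 / max(1, (fanIn + fanOut) / 2).
function glorotNormalStddev(fanIn: number, fanOut: number): number {
  const scale = 1.0 / Math.max(1, (fanIn + fanOut) / 2);
  return Math.sqrt(scale);
}

// Truncated normal: redraw any sample outside ±2 stddev of the mean.
function truncatedNormalSample(mean: number, stddev: number): number {
  for (;;) {
    // Box-Muller transform (assumed sampler, for illustration only).
    const u1 = Math.random();
    const u2 = Math.random();
    const sample = mean +
        stddev * Math.sqrt(-2 * Math.log(1 - u1)) * Math.cos(2 * Math.PI * u2);
    if (Math.abs(sample - mean) <= 2 * stddev) return sample;
  }
}

// For a [3, 2] kernel: stddev = sqrt(1 / 2.5).
const stddev = glorotNormalStddev(3, 2);
const sample = truncatedNormalSample(0, stddev);
```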
Check the Truncated normal distribution article for a deeper understanding.