Create Custom Model Architecture¶
Introduction¶
In this notebook, we will create a custom model architecture in a similar fashion to NSE's built-in architectures. For brevity, we will use a very simple fully-connected topology, but the same principles can be applied to more complex architectures.
Major concepts covered in this notebook:
- Leverage Pydantic to define the parameters of a custom model architecture
- Create a custom model architecture by subclassing keras.Model
- Create a functional version of the custom model architecture
import keras
import numpy as np
from pydantic import BaseModel, Field
Define model parameters¶
The first step is to define the parameters for the model. The preferred way to do this is to use Pydantic models or dataclasses. For this model, we will need to take the following parameters:
- The number of fully-connected layers
- The number of neurons in each layer
- The activation function for each layer
Rather than passing these parameters as a nested list, we will leverage Pydantic to create a data model that will make it easier to know what parameters are required, what their types are, and perform validation.
class CustomLayerParams(BaseModel):
    """Fully connected layer parameters

    Attributes:
        units (int): Number of neurons in the layer
        activation (str): Activation function
    """
    units: int = Field(..., ge=1, description="Number of neurons in the layer")
    activation: str = Field("relu", description="Activation function")

class CustomModelParams(BaseModel):
    """Fully connected neural network model parameters

    Attributes:
        layers (list[CustomLayerParams]): List of layers
    """
    layers: list[CustomLayerParams] = Field(..., min_length=1, description="List of layers")
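Because the fields carry constraints (ge=1 on units, min_length=1 on layers), an invalid configuration fails fast with a descriptive error instead of surfacing later during model construction. A small sketch of that behavior, using a deliberately invalid value:

from pydantic import ValidationError

try:
    # units=0 violates the ge=1 constraint on CustomLayerParams
    CustomLayerParams(units=0)
except ValidationError as err:
    print(err)  # reports that units must be greater than or equal to 1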
Let's create an example model definition¶
params = CustomModelParams(layers=[
    CustomLayerParams(units=64, activation="relu"),
    CustomLayerParams(units=32, activation="relu"),
    CustomLayerParams(units=16, activation="relu"),
])

# Let's dump the model parameters
print(params.model_dump_json(indent=2))
{ "layers": [ { "units": 64, "activation": "relu" }, { "units": 32, "activation": "relu" }, { "units": 16, "activation": "relu" } ] }
Creating the model¶
Next, let's create the custom model generator routines. We will show two ways to create the model:

- Subclassing keras.Model and defining the forward pass in the call method
- Creating a functional version of the model using the functional API
We will also define a shared input tensor that both versions of the model will use:

inputs = keras.Input(shape=(128,), name="inputs")
1. Create a keras.Model subclass¶
class MyCustomModel(keras.Model):
    def __init__(self, params: CustomModelParams, num_classes: int | None = None, **kwargs):
        """Custom model

        Args:
            params (CustomModelParams): Model parameters
            num_classes (int|None): Number of classes for classification
        """
        super().__init__(**kwargs)
        # Build the stack of fully-connected layers from the parameters
        self._dense_layers = [
            keras.layers.Dense(units=layer.units, activation=layer.activation)
            for layer in params.layers
        ]
        # Optional softmax classification head
        if num_classes is not None:
            self.output_act = keras.layers.Dense(num_classes, activation="softmax")
        else:
            self.output_act = None

    def call(self, inputs):
        """Forward pass

        Args:
            inputs: Input tensor
        """
        x = inputs
        for layer in self._dense_layers:
            x = layer(x)
        if self.output_act is not None:
            x = self.output_act(x)
        return x
Now let's instantiate the model, call it once on the input tensor so its weights are built, and check the summary:
model = MyCustomModel(params, num_classes=10, name="custom_model")
model(inputs)
model.summary()
Model: "custom_model"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 16)             │           528 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 10)             │           170 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 11,034 (43.10 KB)
Trainable params: 11,034 (43.10 KB)
Non-trainable params: 0 (0.00 B)
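The subclassed model behaves like any other keras.Model. As a quick sanity check, we can push a random batch through it (the batch size of 4 is arbitrary):

# Quick smoke test on a random batch
x_batch = keras.random.normal((4, 128))
probs = model(x_batch)
print(probs.shape)  # (4, 10): one softmax distribution per sample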
2. Create a functional version of the model¶
This is the preferred way to build models, as it allows for more flexibility and reusability.
Notice that each function actually returns a closure that builds its piece of the model. This is a common pattern in functional programming, and it allows us to pass parameters to the model-building function before it is applied to a tensor.
from typing import Callable

def dense_layer(params: CustomLayerParams) -> Callable[[keras.KerasTensor], keras.KerasTensor]:
    """Create a dense functional layer

    Args:
        params (CustomLayerParams): Layer parameters

    Returns:
        Callable: Closure that applies a dense layer
    """
    def layer(x: keras.KerasTensor) -> keras.KerasTensor:
        return keras.layers.Dense(units=params.units, activation=params.activation)(x)
    return layer

def custom_model_layer(params: CustomModelParams) -> Callable[[keras.KerasTensor], keras.KerasTensor]:
    """Create a custom model layer

    Args:
        params (CustomModelParams): Model parameters

    Returns:
        Callable: Closure that applies the stack of dense layers
    """
    def layer(x: keras.KerasTensor) -> keras.KerasTensor:
        for param in params.layers:
            x = dense_layer(param)(x)
        return x
    return layer
def custom_model(inputs: keras.KerasTensor, params: CustomModelParams, num_classes: int | None = None) -> keras.Model:
    """Create a custom model using the functional API

    Args:
        inputs (keras.KerasTensor): Input tensor
        params (CustomModelParams): Model parameters
        num_classes (int|None): Number of classes

    Returns:
        keras.Model: Model
    """
    outputs = custom_model_layer(params)(inputs)
    if num_classes is not None:
        # Optional softmax classification head
        outputs = keras.layers.Dense(num_classes, activation="softmax")(outputs)
    return keras.Model(inputs=inputs, outputs=outputs, name="custom_model")
As with the subclassed model, we can instantiate the functional model and check the summary:
model_fn = custom_model(inputs, params, num_classes=10)
model_fn.summary()
Model: "custom_model"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ inputs (InputLayer)             │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_6 (Dense)                 │ (None, 16)             │           528 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_7 (Dense)                 │ (None, 10)             │           170 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 11,034 (43.10 KB)
Trainable params: 11,034 (43.10 KB)
Non-trainable params: 0 (0.00 B)
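Because custom_model_layer returns a plain closure, the same stack of layers can be dropped into any larger functional graph, which is what makes this style so reusable. A brief sketch (the extra branch and names here are purely illustrative):

# Sketch: reuse the same stack of layers inside a larger functional graph
extra_inputs = keras.Input(shape=(128,), name="extra_inputs")
features = custom_model_layer(params)(extra_inputs)
embedding = keras.layers.Dense(8, activation="relu", name="embedding")(features)
reuse_model = keras.Model(inputs=extra_inputs, outputs=embedding, name="reuse_model")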
Validate two versions match¶
Finally, we can validate that the two versions of the model are equivalent by comparing their outputs for a random input tensor.
Since the models are randomly initialized, we will copy the weights from the functional model to the subclassed model to ensure they are the same.
model.set_weights(model_fn.get_weights())

x = keras.random.normal((1, 128))
y = model(x)
y_fn = model_fn(x)

if np.allclose(y, y_fn, rtol=1e-5, atol=1e-5):
    print("The model and the functional model are equivalent")
The model and the functional model are equivalent
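From here, either version can be compiled and trained like any other Keras model. A minimal sketch on random data (the optimizer, loss, and shapes are illustrative rather than prescriptive):

# Minimal training sketch on random data
x_train = keras.random.normal((256, 128))
y_train = keras.random.randint((256,), minval=0, maxval=10)
model_fn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model_fn.fit(x_train, y_train, epochs=1, batch_size=32)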