Create Custom Model Architecture¶
Introduction¶
In this notebook, we will create a custom model architecture in a similar fashion to NSE's built-in architectures. For brevity, we will use a very simple fully-connected topology, but the same principles can be applied to more complex architectures.
Major concepts covered in this notebook:
- Leverage Pydantic to define the parameters of a custom model architecture
- Create a custom model architecture by subclassing keras.Model
- Create a functional version of the custom model architecture
import keras
import numpy as np
from pydantic import BaseModel, Field
Define model parameters¶
The first step is to define the parameters for the model. The preferred way to do this is to use Pydantic models or dataclasses. For this model, we will need to take the following parameters:
- The number of fully-connected layers
- The number of neurons in each layer
- The activation function for each layer
Rather than passing these parameters as a nested list, we will leverage Pydantic to create a data model that will make it easier to know what parameters are required, what their types are, and perform validation.
class CustomLayerParams(BaseModel):
    """Fully connected layer parameters

    Attributes:
        units (int): Number of neurons in the layer
        activation (str): Activation function
    """
    units: int = Field(..., ge=1, description="Number of neurons in the layer")
    activation: str = Field("relu", description="Activation function")

class CustomModelParams(BaseModel):
    """Fully connected neural network model parameters

    Attributes:
        layers (list[CustomLayerParams]): List of layers
    """
    layers: list[CustomLayerParams] = Field(..., min_length=1, description="List of layers")
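Because the fields carry constraints (ge=1 on units, min_length=1 on layers), an invalid configuration fails fast with a descriptive error instead of surfacing later during model construction. A small sketch of that behavior, using a deliberately invalid value:

from pydantic import ValidationError

try:
    # units=0 violates the ge=1 constraint on CustomLayerParams
    CustomLayerParams(units=0)
except ValidationError as err:
    print(err)  # reports that units must be greater than or equal to 1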
Let's create an example model definition¶
params = CustomModelParams(layers=[
    CustomLayerParams(units=64, activation="relu"),
    CustomLayerParams(units=32, activation="relu"),
    CustomLayerParams(units=16, activation="relu"),
])

# Let's dump the model parameters
print(params.model_dump_json(indent=2))
{ "layers": [ { "units": 64, "activation": "relu" }, { "units": 32, "activation": "relu" }, { "units": 16, "activation": "relu" } ] }
Creating the model¶
Next, let's create the custom model generator routines. We will show two ways to create the model:

- Subclassing keras.Model and defining the forward pass in the call method
- Creating a functional version of the model using the functional API
We will also define a shared input tensor that both versions of the model will use:

inputs = keras.Input(shape=(128,), name="inputs")
1. Create a keras.Model subclass¶
class MyCustomModel(keras.Model):
    def __init__(self, params: CustomModelParams, num_classes: int | None = None, **kwargs):
        """Custom model

        Args:
            params (CustomModelParams): Model parameters
            num_classes (int|None): Number of classes for classification
        """
        super().__init__(**kwargs)
        # Build the stack of fully-connected layers from the parameters
        self._dense_layers = [
            keras.layers.Dense(units=layer.units, activation=layer.activation)
            for layer in params.layers
        ]
        # Optional softmax classification head
        if num_classes is not None:
            self.output_act = keras.layers.Dense(num_classes, activation="softmax")
        else:
            self.output_act = None

    def call(self, inputs):
        """Forward pass

        Args:
            inputs: Input tensor
        """
        x = inputs
        for layer in self._dense_layers:
            x = layer(x)
        if self.output_act is not None:
            x = self.output_act(x)
        return x
Now let's instantiate the model, call it once on the input tensor so its weights are built, and check the summary:
model = MyCustomModel(params, num_classes=10, name="custom_model")
model(inputs)
model.summary()
Model: "custom_model"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 16)             │           528 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 10)             │           170 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 11,034 (43.10 KB)
Trainable params: 11,034 (43.10 KB)
Non-trainable params: 0 (0.00 B)
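The subclassed model behaves like any other keras.Model. As a quick sanity check, we can push a random batch through it (the batch size of 4 is arbitrary):

# Quick smoke test on a random batch
x_batch = keras.random.normal((4, 128))
probs = model(x_batch)
print(probs.shape)  # (4, 10): one softmax distribution per sample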
2. Create a functional version of the model¶
This is the preferred way to build models, as it allows for more flexibility and reusability.
Notice that each function actually returns a closure that builds its piece of the model. This is a common pattern in functional programming, and it allows us to pass parameters to the model-building function before it is applied to a tensor.
from typing import Callable

def dense_layer(params: CustomLayerParams) -> Callable[[keras.KerasTensor], keras.KerasTensor]:
    """Create a dense functional layer

    Args:
        params (CustomLayerParams): Layer parameters

    Returns:
        Callable: Closure that applies a dense layer
    """
    def layer(x: keras.KerasTensor) -> keras.KerasTensor:
        return keras.layers.Dense(units=params.units, activation=params.activation)(x)
    return layer

def custom_model_layer(params: CustomModelParams) -> Callable[[keras.KerasTensor], keras.KerasTensor]:
    """Create a custom model layer

    Args:
        params (CustomModelParams): Model parameters

    Returns:
        Callable: Closure that applies the stack of dense layers
    """
    def layer(x: keras.KerasTensor) -> keras.KerasTensor:
        for param in params.layers:
            x = dense_layer(param)(x)
        return x
    return layer
def custom_model(inputs: keras.KerasTensor, params: CustomModelParams, num_classes: int | None = None) -> keras.Model:
    """Create a custom model using the functional API

    Args:
        inputs (keras.KerasTensor): Input tensor
        params (CustomModelParams): Model parameters
        num_classes (int|None): Number of classes

    Returns:
        keras.Model: Model
    """
    outputs = custom_model_layer(params)(inputs)
    if num_classes is not None:
        # Optional softmax classification head
        outputs = keras.layers.Dense(num_classes, activation="softmax")(outputs)
    return keras.Model(inputs=inputs, outputs=outputs, name="custom_model")
As with the subclassed model, we can instantiate the functional model and check the summary:
model_fn = custom_model(inputs, params, num_classes=10)
model_fn.summary()
Model: "custom_model"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ inputs (InputLayer)             │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_6 (Dense)                 │ (None, 16)             │           528 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_7 (Dense)                 │ (None, 10)             │           170 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 11,034 (43.10 KB)
Trainable params: 11,034 (43.10 KB)
Non-trainable params: 0 (0.00 B)
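Because custom_model_layer returns a plain closure, the same stack of layers can be dropped into any larger functional graph, which is what makes this style so reusable. A brief sketch (the extra branch and names here are purely illustrative):

# Sketch: reuse the same stack of layers inside a larger functional graph
extra_inputs = keras.Input(shape=(128,), name="extra_inputs")
features = custom_model_layer(params)(extra_inputs)
embedding = keras.layers.Dense(8, activation="relu", name="embedding")(features)
reuse_model = keras.Model(inputs=extra_inputs, outputs=embedding, name="reuse_model")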
Validate two versions match¶
Finally, we can validate that the two versions of the model are equivalent by comparing their outputs for a random input tensor.
Since the models are randomly initialized, we will copy the weights from the functional model to the subclassed model to ensure they are the same.
model.set_weights(model_fn.get_weights())

x = keras.random.normal((1, 128))
y = model(x)
y_fn = model_fn(x)

if np.allclose(y, y_fn, rtol=1e-5, atol=1e-5):
    print("The model and the functional model are equivalent")
The model and the functional model are equivalent
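From here, either version can be compiled and trained like any other Keras model. A minimal sketch on random data (the optimizer, loss, and shapes are illustrative rather than prescriptive):

# Minimal training sketch on random data
x_train = keras.random.normal((256, 128))
y_train = keras.random.randint((256,), minval=0, maxval=10)
model_fn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model_fn.fit(x_train, y_train, epochs=1, batch_size=32)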