CNTK 102: Hidden layers & live progress report widget


 

Intro

Owing to the significant overlap in functionality and presentation between CNTK-101 and CNTK-102, the scope of this post will be limited to the newly added features of CNTK-102, along with the new code and new F# concepts. At the same time, the IfSharp Jupyter notebook will again closely follow the original Python notebook, while also covering the new material.

This time I will also largely forego reproducing the code for fetching images or rendering LaTeX notation, having made that point in the previous notebook. The Preparing the workspace section will be shortened as well, except for the code referencing helper functions created in previous notebooks, which we are seeing here for the first time.

Long story short, I will be skipping the entirety of the linear dataset training scenario, as it is largely a rehash of the previous notebook, except of course for the code that builds the fully connected layers.


 

Retroactive changes

  1. The Prepare workspace notebook (blog) has been updated to include the FSharp.Control.AsyncSeq dependency.
  2. Underscore_separated variable names now indicate a global variable carried over from the original Python tutorial.
  3. Comments starting with // # indicate an unchanged comment from the original Python code.

I'm also moving all reusable chart-related helper functions into their own script file, NbVisual.fsx.

Preparing the workspace for CNTK in jupyter, as always

// Setup display support
#load "AsyncDisplay.fsx"
#load "XPlot.Plotly.fsx"

// Set globals for CNTK functions in helpers
let device = CNTK.DeviceDescriptor.CPUDevice
let dataType = CNTK.DataType.Float

// Helper functions created in previous notebooks
#load "fsx/NBHelpers.fsx"
#load "fsx/MiscellaneousHelpers.fsx"
#load "fsx/NbVisual.fsx"
#load "fsx/CntkHelpers.fsx"

open NBHelpers
open MiscellaneousHelpers
open NbVisual
open CntkHelpers
open XPlot.Plotly
open MathNet.Numerics

Since we are using the storage: none option in paket (see the Preparing the workspace notebook), attempting to load AsyncDisplay.fsx as-is produces errors. This is because the script tries to resolve its dependencies within the IfSharp.exe directory tree.

Running AsyncDisplay.Paket.fsx just like in the IfSharp feature notebook would make the library available there, but it has two important drawbacks:

  1. You are locked into using a particular version of the library
  2. /IfSharp/.paket/load/main.group.fsx now overrides the scope of this notebook, meaning we lose access to the libraries in the global .nuget folder unless we delete that particular script.

A simple solution is to just remove the first #load directive from AsyncDisplay.fsx to prevent it from looking for FSharp.Control.AsyncSeq in the IfSharp tree.

For more info on setting the device descriptor, see the previous notebook under the heading Global variables.

 

CNTK 102: Feed Forward Network with Simulated Data

Link to original python notebook

Introduction

ML theory aside, the gist of this tutorial is:

  1. Learn to connect multiple layers like the one in CNTK 101 in order to form a deep learning model and
  2. Formalize a machine learning pipeline from data to model.

The pipeline consists of Data access, Data transformation, Model creation, Training and Evaluation. This is explicitly modelled in ML.NET.

Apart from that, CNTK-102 is just CNTK-101 reloaded, with much of the same code reappearing. So in order to keep things interesting I embellished a bit:

  1. I added a section on using the freshly introduced hidden layer architecture to perform classification on a non-linearly separable version of the data generator introduced in 101.
  2. I added an example of using FSharp.Control.AsyncSeq and IfSharp's AsyncDisplay.fsx to display training data on the notebook while training is ongoing.

Feed forward network setup

In the C# API, the Variable, Function and Parameter types are mostly interchangeable when used as function arguments. These tiny helpers will make the explicit conversion from CNTK.Function to CNTK.Variable (and back) easier to integrate into F# pipelines:

/// Convert Function to Variable
/// <remarks> CNTK helper </remarks>
let inline Var (x : CNTK.Function) = new Variable(x)

/// Convert Variable to Function
/// <remarks> CNTK helper </remarks>
let inline Fun (x : CNTK.Variable) = x.ToFunction()

/// Create a new linear layer in the W·x + b pattern
/// <remarks> CNTK helper </remarks>
let linearLayer (inputVar : Variable) outputDim =
    let inputDim = inputVar.Shape.[0] 
    
    // Note that unlike the python example, the dimensionality of the output
    // goes first in the parameter declaration, otherwise the connection 
    // cannot be propagated.
    let weightParam = new Parameter(shape [outputDim; inputDim], dataType, initialization, device, "Weights")
    let biasParam = new Parameter(shape [outputDim], dataType, 0.0, device, "Bias")    
    
    let dotProduct = CNTKLib.Times(weightParam, inputVar, "Weighted input")
    CNTKLib.Plus(Var dotProduct, biasParam, "Layer")    

/// Create a new linear layer and fully connect it to 
/// an existing one through a specified differentiable function
/// <remarks> CNTK helper </remarks>
let denseLayer (nonlinearity: Variable -> Function) inputVar outputDim  =
    linearLayer inputVar outputDim
    |> Var |> nonlinearity |> Var

I changed the parameter order of the denseLayer generator function, putting the activation function parameter first in order to facilitate composition (it will become obvious how very shortly). Also, since the purpose of denseLayer is to connect two layers together, and the existing layer is presented as a CNTK.Variable, it seems prudent to preemptively convert the result from Function to Variable as well.
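To see why the activation-first order pays off, here's a quick sketch (reusing the input variable and hidden_layers_dim global from the 101 setup): partially applying denseLayer yields a Variable -> int -> Variable builder, which is exactly the shape of folder function that Seq.fold expects a little further down.

// Partially applying the activation gives a reusable layer builder...
let sigmoidLayer = denseLayer (CNTKLib.Sigmoid)

// ...so stacking hidden layers is just repeated application (or, shortly, a fold)
let h1 = sigmoidLayer input hidden_layers_dim
let h2 = sigmoidLayer h1 hidden_layers_dim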

Just F# problems

It's only a matter of time before all this back-and-forth converting between Function and Variable gets tiring, but for now let's just persevere. If you're not so inclined, I urge you to check out Mathias Brandewinder's CNTK.FSharp project, where he presents an elegant abstraction in the form of a Tensor discriminated union that encapsulates Functions and Variables in a single type, and a lot more on top.
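If you're curious, the core of that idea looks roughly like the sketch below (my own simplification, not CNTK.FSharp's actual definition); for this notebook, though, we'll stick with the plain Var/Fun helpers.

// A rough sketch of the single-type idea: wrap both representations and
// convert only at the boundaries where the C# API demands one or the other.
type Tensor =
    | OfVariable of Variable
    | OfFunction of Function

    member this.AsVariable =
        match this with
        | OfVariable v -> v
        | OfFunction f -> new Variable(f)

    member this.AsFunction =
        match this with
        | OfVariable v -> v.ToFunction()
        | OfFunction f -> f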

Here is the fully connected classifier creation function, following the logic of the Python original:

// # Define a multilayer feedforward classification model
let fullyConnectedClassifierNet inputVar numOutputClasses hiddenLayerDim numHiddenLayers nonlinearity =
    let mutable h = denseLayer nonlinearity inputVar hiddenLayerDim
    // Add numHiddenLayers - 1 more hidden layers, mirroring the loop in the Python original
    for _ in 2 .. numHiddenLayers do
        h <- denseLayer nonlinearity h hiddenLayerDim 
    
    // Note that we don't feed the output layer through 
    // the selected nonlinearity/activation function    
    linearLayer h numOutputClasses    

// # Create the fully connected classifier
let z = fullyConnectedClassifierNet input num_output_classes hidden_layers_dim num_hidden_layers (CNTKLib.Sigmoid)

Let's get idiomatic

Sadly there seem to be no managed helpers to facilitate dense layer creation.

So instead, here's a more functional version of our linear layer composing function, without mutable variables or for loops, and with the added bonus of letting you arbitrarily set the number and dimensions of the hidden layers through a single parameter.

/// Fully connected linear layer composition function
/// <remarks> CNTK helper </remarks>
let fullyConnectedClassifierNet' inputVar (hiddenLayerDims: int seq) numOutputClasses nonlinearity =
    (inputVar, hiddenLayerDims) 
    ||> Seq.fold (denseLayer nonlinearity)
    |> fun model -> linearLayer model numOutputClasses

In addition, passing an empty sequence instead of a list of hidden layer dimensions produces a linear model, just as if you had run linearLayer with the same parameters.
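For instance, this builds a plain linear classifier over the same globals, a trick we'll put to use in the last section of this post:

// No hidden layers: the fold never runs, leaving a purely linear model
let zLinear = fullyConnectedClassifierNet' input [] num_output_classes (CNTKLib.Sigmoid)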

And here's how we would use this to produce a model identical to z:

let z' = 
    fullyConnectedClassifierNet' 
        input [hidden_layers_dim;hidden_layers_dim]  
        num_output_classes (CNTKLib.Sigmoid)

 

Run evaluation / testing

This time, instead of reproducing the Python tutorial's functionality verbatim, let's save our future selves a bit of time and create a reusable version of the evaluation/testing process.

open MathNet.Numerics.LinearAlgebra

/// Evaluation of a Matrix dataset for a trained model
/// <remarks> CNTK helper </remarks>
let testMinibatch (trainer: CNTK.Trainer) (features: Matrix<float32>) (labels: Matrix<float32>) =
    let x,y = matrixToBatch features, matrixToBatch labels
    
    // It should be interesting to see if this convention
    // will hold for other topologies
    let input = trainer.Model().Arguments |> Seq.head
    let label = trainer.LossFunction().Arguments |> Seq.last
    
    let testBatch =
        [ (input, x);(label, y) ]
        |> dict
        |> AsUnorderedMapVariableValue
    
    trainer.TestMinibatch(testBatch, device)

// # Generate new data
let test_minibatch_size = 25
let x_test,y_test = generateRandomDataSample test_minibatch_size input_dim num_output_classes

testMinibatch trainer x_test y_test
Output:
0.24

Non linear separation example

Seems a shame to go through all this trouble to build a hidden layer topology and not even try it on a more challenging example.

Data Generation

open MathNet.Numerics.LinearAlgebra

// We achieve non linear separation by stealthily adding another output class,
// that we then assign to the first class, thus encircling the rest of the data.
let generateRandomNonlinearlySeparableDataSample sampleCount featureCount labelCount =     
    let x,y = generateRandomDataSample sampleCount featureCount (labelCount+1)
    let y' = 
        y 
        |> Matrix.toRowArrays 
        |> Array.map(
            fun line -> 
                if line.[labelCount] = 1.f 
                then line.[0] <- 1.f
                
                line.[0..labelCount-1])
        |> matrix
    
    x,y'

If you are using the version of MathNet linked in the FsLab nuget package as of the time of this writing, you will get a conversion error when creating the new matrix from the transformed label data.

Type mismatch. Expecting a 'float32 [][] -> 'a' but given a 'int list list -> Matrix' The type 'seq' does not match the type 'int list list'

To resolve this, make sure your paket dependencies point to the latest versions of MathNet.Numerics and MathNet.Numerics.FSharp.
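In paket.dependencies terms, that boils down to listing the two packages explicitly (a sketch; adapt it to the group and framework settings of your own setup from the workspace notebook) and re-running paket install before restarting the kernel:

nuget MathNet.Numerics
nuget MathNet.Numerics.FSharp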

Nonlinear data sample

generateRandomNonlinearlySeparableDataSample 64 input_dim num_output_classes    
||> simpleScatterPlot "feature 1" "feature 2"
Output: nonlinear dataset sample

Data Transformation

Adding a new class and then assigning it to one of the previous classes after the fact has the side effect of creating an imbalanced dataset, since we now have a class with twice as many samples as any of the others.

In the case of a two-class dataset this is most pronounced, since samples from the two classes are produced at a 2:1 ratio relative to each other. This makes convergence harder than it has to be, and also produces misleading evaluation results, since, for instance, always predicting class 1 gives 67% accuracy.

let rnd = new Random()
let shuffle = Seq.sortBy (fun _ -> rnd.Next())

// This slightly awkward function truncates the overpopulated class to match
// the size of the others, and makes sure the selected subset is randomly 
// distributed between the two clusters (i.e. the original doubled class
// and the spurious additional set)
let stratifiedSampling (features: Matrix<float32>) (labels: Matrix<float32>) =    
    let minLength = 
        labels 
        |> Matrix.toRowArrays 
        |> Array.countBy id 
        |> Array.map snd 
        |> Array.min
    
    Seq.zip (features.ToRowArrays()) (labels.ToRowArrays())
    |> shuffle
    |> Seq.groupBy snd
    |> Seq.map (fun (key, grp) -> grp |> Seq.take minLength)
    |> Seq.collect id
    |> shuffle
    |> Seq.map (fun (f,l) -> Seq.append f l)
    |> matrix
    |> fun mtx -> mtx.[*,..1], mtx.[*,2..]
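A quick way to sanity-check the rebalancing is to pipe a fresh sample through it and plot the result with the same scatter helper as before; the two classes should now show up in roughly equal numbers:

// Generate, rebalance, and plot a sample of the now-balanced dataset
generateRandomNonlinearlySeparableDataSample 64 input_dim num_output_classes
||> stratifiedSampling
||> simpleScatterPlot "feature 1" "feature 2"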

Model Creation

Here are all the parameters necessary for training, gathered in one place for your convenience. I've placed them under their own module so we don't mess with the global scope too much, i.e. so you won't have to restart the kernel every time you want to experiment with the parameters, which I very much encourage you to do.

module NonLinear = 
    let inputDim, numOutputClasses = 2,2
    let learningRate = 0.001
    let minibatchSize = 100   
    let trainingCycles = 15000
    let reportSampleRate = 25
    let input = Variable.InputVariable(shape [|inputDim|], dataType, "Features")
    let label = Variable.InputVariable(shape [|numOutputClasses|], dataType, "Labels")
    let z = fullyConnectedClassifierNet' input [50;50] numOutputClasses (CNTKLib.Sigmoid)
    let loss = CNTKLib.CrossEntropyWithSoftmax(Var z, label)
    let error = CNTKLib.ClassificationError(Var z, label)
    let lrSchedule = new CNTK.TrainingParameterScheduleDouble(learningRate, uint32 CNTK.DataUnit.Minibatch)
    let learner = CNTKLib.SGDLearner(z.Parameters() |> ParVec, lrSchedule)
    let trainer = CNTK.Trainer.CreateTrainer(z, loss, error, ResizeArray<CNTK.Learner>([learner]))

Extra! Training (with live progress report!)

Converting our synchronous training loop is as easy as placing it in an async computation expression. We could stop there and have a very serviceable Async<> object to iterate over with AsyncSeq, but I thought I should pass a few extra arguments, both to make the number of iterations per cycle configurable and to make some additional info about the overall progress of the training available.

From then on, displaying an updateable label is as simple as returning a string from the Async<> and applying IfSharp's Display to the resulting AsyncSeq. You can even return renderable HTML strings!
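Stripped of all the training code, the pattern boils down to something like this minimal sketch (assuming AsyncDisplay.fsx is loaded as above, so Display knows how to keep re-rendering an AsyncSeq in place):

open FSharp.Control

// Each async step produces one piece of (HTML) output; Display keeps
// overwriting the same output cell with the latest element of the sequence.
AsyncSeq.initAsync 10L (fun i ->
    async {
        do! Async.Sleep 500
        return sprintf "<pre>step %d of 10</pre>" (int i + 1) |> Util.Html
    })
|> Display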

In fact, here's a simple Bootstrap progress bar to get you started:

/// Bootstrap progress bars for training data reporting
/// <remarks> Notebook helper function </remarks>
let reportHtml info progress loss error =
    let progressBar kind label value =    
        System.String.Format(
            """<div class='progress' style='margin-top:5px; width: 600px'>
                   <div class='progress-bar progress-bar-{0} progress-bar-striped' 
                         role='progressbar' aria-valuenow='{1:f2}'
                         aria-valuemin='0' aria-valuemax='100' style='width: {1:f2}%'>
                        <span>{1:f2}% ({2})</span>
                   </div>
                </div>""", kind, value, label)

    [ progressBar "info" "Progress" progress
      progressBar "warning" "Loss" (loss * 100.)
      progressBar "danger" "Error" (error * 100.) ]
    |> List.reduce (+)
    |> sprintf """<div class='container'><h2>%s</h2>%s</div>""" info

IfSharp/Jupyter supports Bootstrap to some extent out of the box, so don't worry about having to do any extra work referencing CSS & JS libraries.

And here's the asynchronous training loop:

open FSharp.Control

let trainCycle iterations finalCycle currentCycle htmlReport =
    // Our training cycle
    for i in 0..iterations do
        let features,labels =
            generateRandomNonlinearlySeparableDataSample NonLinear.minibatchSize 
                (NonLinear.input.Shape.[0]) NonLinear.numOutputClasses
            ||> stratifiedSampling
            |> fun (x,y) -> matrixToBatch x, matrixToBatch y

        let trainingBatch = 
            [(NonLinear.input, features);(NonLinear.label, labels)] |> dict

        NonLinear.trainer.TrainMinibatch(trainingBatch, true, device)
        |> ignore

        (* Let's skip the logging code to keep things shorter *)


    // Calculate training info
    let lossAverage = NonLinear.trainer.PreviousMinibatchLossAverage()
    let evaluationAverage = NonLinear.trainer.PreviousMinibatchEvaluationAverage()
    let current = 100. * (float currentCycle + 1.)/(float finalCycle)

    async {
        // Create report text
        let progress = 
            sprintf "[%s] %.1f%%" ("".PadLeft(int current,'=').PadRight(100,' ')) current
        let info =
            sprintf "Minibatch: %d of %d, Loss: %.4f, Error: %.2f" 
                ((currentCycle+1)*iterations) (finalCycle * iterations) lossAverage evaluationAverage;
        let progressBar = 
            if htmlReport then 
                reportHtml info current lossAverage evaluationAverage
            else 
                sprintf "<pre>%s\n %s</pre>" progress info
        // Send result to AsyncSeq 
        return progressBar |> Util.Html 
    } 

Let's try it:

let totalCycles = 128
let iterationsPerCycle = NonLinear.trainingCycles / totalCycles

AsyncSeq.initAsync (int64 totalCycles) 
    (fun i -> trainCycle iterationsPerCycle totalCycles (int i) true)  
|> Display
Output:
live training bootstrap widget demo

 

In the unlikely case that your browser of choice or Jupyter installation has trouble displaying Bootstrap components, you can run the training function with htmlReport set to false and see a lo-fi version of the reporting widget instead.
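In other words, the same call as above with the htmlReport flag flipped:

AsyncSeq.initAsync (int64 totalCycles) 
    (fun i -> trainCycle iterationsPerCycle totalCycles (int i) false)  
|> Display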

 

live training bootstrap widget demo, ascii version

Evaluation

let test_minibatch_size = 128
let x_test,y_test = generateRandomNonlinearlySeparableDataSample test_minibatch_size input_dim num_output_classes

testMinibatch NonLinear.trainer x_test y_test
Output:
0.1953125

Visualization

modelSoftmaxOutputHeatmap "feature 1" "feature 2" [|0. .. 0.1 .. 15.|] NonLinear.z 
Output: non-linear classification network output heatmap

The difference hidden layers make

Here's what happens when we attempt to classify this dataset by using a network without hidden layers:

let z = fullyConnectedClassifierNet' input [] numOutputClasses (CNTKLib.Sigmoid)
Output:
heatmap and plot of failed nonlinear separation

It's pretty obvious that a network without hidden layers simply can't deal with the non-linearly separable dataset: for all intents and purposes it treats the data like random noise, with the error rate holding steady at 50% and the loss value oscillating randomly.

In fact, the development of techniques that allow us to train multi-layer networks was an achievement of historic significance for the field of neural networks, dramatically expanding the scope of their use.

My next batch of posts will follow the CNTK tutorial into image recognition, and straight into another historic development of relatively recent times: the emergence of convolutional topologies!

 

 

Links in article