# CNTK 102: Hidden layers & live progress report widget

# Intro

Owing to the significant overlap in functionality and presentation between CNTK-101 and CNTK-102, the scope of this post will be limited to the newly added features of CNTK-102, along with the new code and the new F# concepts. As before, the IfSharp Jupyter notebook will closely follow the original Python one, while also including the new material.

This time I will also largely forego reproducing the code for fetching images or rendering LaTeX notation, having made that point in the previous notebook. The **Preparing workspace** section will be shortened as well, except for the code that loads helper functions created in previous notebooks, which appears here for the first time.

Long story short, I will be skipping the entirety of the linear data set training scenario as it is largely a rehash of the previous notebook, except of course for the code to build fully connected layers.

# Retroactive changes

- The Prepare workspace notebook (blog post) has been updated to include the FSharp.Control.AsyncSeq dependency.
- Underscore_separated variable names now indicate a global variable carried over from the original Python tutorial.
- Comments starting with `// #` indicate an unchanged comment from the original Python code.

I'm also moving all reusable chart-related helper functions into their own script file, **NbVisual.fsx**.

# Preparing the workspace for CNTK in Jupyter, as always

```
// Setup display support
#load "AsyncDisplay.fsx"
#load "XPlot.Plotly.fsx"
```

```
// Set globals for CNTK functions in helpers
let device = CNTK.DeviceDescriptor.CPUDevice
let dataType = CNTK.DataType.Float

// Helper functions created in previous notebooks
#load "fsx/NBHelpers.fsx"
#load "fsx/MiscellaneousHelpers.fsx"
#load "fsx/NbVisual.fsx"
#load "fsx/CntkHelpers.fsx"
open NBHelpers
open MiscellaneousHelpers
open NbVisual
open CntkHelpers
open XPlot.Plotly
open MathNet.Numerics
```

Since we are using the `storage: none` option in paket (see the Preparing the workspace notebook), attempting to load **AsyncDisplay.fsx** as-is produces errors, because the script looks to resolve its dependencies within the IfSharp.exe directory tree.

Running **AsyncDisplay.Paket.fsx**, just like in the IfSharp feature notebook, would make the library available there, but it has two important drawbacks:

- You are locked into using a particular version of the library.
- */IfSharp/.paket/load/main.group.fsx* now overrides the scope of this notebook, meaning we lose access to the libraries in the global .nuget folder unless we delete that particular script.

A simple solution is to just remove the first #load directive from **AsyncDisplay.fsx** to prevent it from looking for **FSharp.Control.AsyncSeq** in the IfSharp tree.


ML theory aside, the gist of this tutorial is:

- Learn to connect multiple layers like the one in CNTK 101 in order to form a deep learning model and
- Formalize a machine learning pipeline from data to model.

The pipeline consists of data access, data transformation, model creation, training, and evaluation; this is explicitly modelled in ML.NET.

Apart from that, CNTK-102 is just CNTK-101 reloaded, with much of the same code reappearing. So in order to keep things interesting I embellished a bit:

- I added a section that uses the freshly introduced hidden-layer architecture to perform classification on a non-linearly separable version of the data generator introduced in 101.
- I added an example of using **FSharp.Control.AsyncSeq** and IfSharp's **AsyncDisplay.fsx** to display training data in the notebook while training is ongoing.

## Feed forward network setup

In C#, the Variable, Function, and Parameter types are mostly interchangeable when used as function arguments. These tiny helpers make the explicit conversions from CNTK.Function to CNTK.Variable (and back) easier to integrate into F# pipelines:

```
/// Convert Function to Variable
/// <remarks> CNTK helper </remarks>
let inline Var (x : CNTK.Function) = new Variable(x)

/// Convert Variable to Function
/// <remarks> CNTK helper </remarks>
let inline Fun (x : CNTK.Variable) = x.ToFunction()

/// Create a new linear layer in the W·x+b pattern
/// <remarks> CNTK helper </remarks>
let linearLayer (inputVar : Variable) outputDim =
    let inputDim = inputVar.Shape.[0]
    // Note that unlike the python example, the dimensionality of the output
    // goes first in the parameter declaration, otherwise the connection
    // cannot be propagated.
    let weightParam = new Parameter(shape [outputDim; inputDim], dataType, initialization, device, "Weights")
    let biasParam = new Parameter(shape [outputDim], dataType, 0.0, device, "Bias")
    let dotProduct = CNTKLib.Times(weightParam, inputVar, "Weighted input")
    CNTKLib.Plus(Var dotProduct, biasParam, "Layer")

/// Create a new linear layer and fully connect it to
/// an existing one through a specified differentiable function
/// <remarks> CNTK helper </remarks>
let denseLayer (nonlinearity: Variable -> Function) inputVar outputDim =
    linearLayer inputVar outputDim
    |> Var |> nonlinearity |> Var
```

I changed the parameter order of the **denseLayer** generator function by putting the activation function first, in order to facilitate composition (it will become obvious how very shortly). Also, since the purpose of **denseLayer** is to connect two layers together, and the existing layer is presented as a `CNTK.Variable`, it seems prudent to preemptively convert the result from Function to Variable as well.
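
To see why the new order helps with composition, consider this hypothetical sketch (the `sigmoidLayer` name is mine, not part of the tutorial): partially applying the activation yields a reusable `Variable -> int -> Variable` layer constructor.

```
// Fix the activation once, then reuse the resulting constructor
let sigmoidLayer = denseLayer (CNTKLib.Sigmoid)
// Stack two 50-unit hidden layers on top of the input variable from 101
let hidden = sigmoidLayer (sigmoidLayer input 50) 50
```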

#### Just F# problems

It's only a matter of time before all this back-and-forth converting between Function and Variable gets tiring, but for now let's just persevere. If you're not so inclined, I urge you to check out Mathias Brandewinder's CNTK.FSharp project, where he presents an elegant abstraction in the form of a *Tensor* discriminated union that encapsulates Functions and Variables in a single type, and a lot more on top.

Here's the dense-layer creation function, following the logic of the Python original:

```
// # Define a multilayer feedforward classification model
let fullyConnectedClassifierNet inputVar numOutputClasses hiddenLayerDim numHiddenLayers nonlinearity =
    let mutable h = denseLayer nonlinearity inputVar hiddenLayerDim
    // The first hidden layer is already in place, so add numHiddenLayers-1 more
    for _ in 1 .. numHiddenLayers - 1 do
        h <- denseLayer nonlinearity h hiddenLayerDim
    // Note that we don't feed the output layer through
    // the selected nonlinearity/activation function
    linearLayer h numOutputClasses

// # Create the fully connected classifier
let z = fullyConnectedClassifierNet input num_output_classes hidden_layers_dim num_hidden_layers (CNTKLib.Sigmoid)
```

#### Let's get idiomatic

Sadly there seem to be no managed helpers to facilitate dense layer creation.

So instead, here's a more functional version of our layer-composition function, without mutable variables or for loops, and with the added bonus of letting you set the number and dimensions of the hidden layers arbitrarily through a single parameter.

```
/// Fully connected linear layer composition function
/// <remarks> CNTK helper </remarks>
let fullyConnectedClassifierNet' inputVar (hiddenLayerDims: int seq) numOutputClasses nonlinearity =
    (inputVar, hiddenLayerDims)
    ||> Seq.fold (denseLayer nonlinearity)
    |> fun model -> linearLayer model numOutputClasses
```

In addition, passing an empty sequence instead of a list of hidden layer dimensions produces a linear model, just as if you had run `linearLayer` with the same parameters.
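
For instance, this hypothetical `zLinear` (my name for it) is a plain linear classifier over the globals from the linear example:

```
// An empty list of hidden dimensions yields a model with no hidden layers
let zLinear = fullyConnectedClassifierNet' input [] num_output_classes (CNTKLib.Sigmoid)
```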

And here's how we would use it to produce a model identical to `z`:

```
let z' =
    fullyConnectedClassifierNet'
        input [hidden_layers_dim; hidden_layers_dim]
        num_output_classes (CNTKLib.Sigmoid)
```

## Run evaluation / testing

This time, instead of reproducing the python tutorial's functionality verbatim, let's save our future selves a bit of time and create a reusable version of the evaluation/testing process.

```
open MathNet.Numerics.LinearAlgebra

/// Evaluation of a Matrix dataset for a trained model
/// <remarks> CNTK helper </remarks>
let testMinibatch (trainer: CNTK.Trainer) (features: Matrix<float32>) (labels: Matrix<float32>) =
    let x, y = matrixToBatch features, matrixToBatch labels
    // It should be interesting to see if this convention
    // will hold for other topologies
    let input = trainer.Model().Arguments |> Seq.head
    let label = trainer.LossFunction().Arguments |> Seq.last
    let testBatch =
        [ (input, x); (label, y) ]
        |> dict
        |> AsUnorderedMapVariableValue
    trainer.TestMinibatch(testBatch, device)
```

```
// # Generate new data
let test_minibatch_size = 25
let x_test,y_test = generateRandomDataSample test_minibatch_size input_dim num_output_classes
testMinibatch trainer x_test y_test
```

# Non-linear separation example

It seems a shame to go through all the trouble of building a hidden-layer topology and not even try it on a more challenging example.

## Data Generation

```
open MathNet.Numerics.LinearAlgebra

// We achieve non-linear separation by stealthily adding another output class,
// which we then assign to the first class, thus encircling the rest of the data.
let generateRandomNonlinearlySeparableDataSample sampleCount featureCount labelCount =
    let x, y = generateRandomDataSample sampleCount featureCount (labelCount + 1)
    let y' =
        y
        |> Matrix.toRowArrays
        |> Array.map (fun line ->
            if line.[labelCount] = 1.f then line.[0] <- 1.f
            line.[0 .. labelCount - 1])
        |> matrix
    x, y'
```

If you are using the version of MathNet linked in the FsLab nuget package as of the time of this writing, you will get a conversion error when creating the new matrix from the transformed label data:

```
Type mismatch. Expecting a 'float32 [][] -> 'a' but given a 'int list list -> Matrix'
The type 'seq' does not match the type 'int list list'
```

To resolve this, make sure your paket dependencies point to the latest versions of **MathNet.Numerics** and **MathNet.Numerics.FSharp**.
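
For reference, here's a hypothetical excerpt of what the relevant **paket.dependencies** entries could look like, with versions omitted so paket resolves the latest (`storage: none` is the option discussed earlier):

```
// paket.dependencies (hypothetical excerpt)
source https://api.nuget.org/v3/index.json
storage: none

nuget MathNet.Numerics
nuget MathNet.Numerics.FSharp
nuget FSharp.Control.AsyncSeq
```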

### Nonlinear data sample

```
generateRandomNonlinearlySeparableDataSample 64 input_dim num_output_classes
||> simpleScatterPlot "feature 1" "feature 2"
```

*Output: a scatter plot of the generated sample, with the first class encircling the others.*

## Data Transformation

Adding a new class and then assigning it to one of the previous classes after the fact has the side effect of creating an imbalanced dataset, since we now have one class with twice as many samples as any of the others.

In the case of a two-class dataset this is most pronounced, since samples from the two classes are produced at a 2:1 rate relative to each other. This makes convergence harder than it has to be, and also produces misleading evaluation results; for instance, always predicting class 1 gives 67% accuracy.
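
You can see the imbalance for yourself with a quick count; this is a sketch assuming the generator and globals defined above:

```
// Count samples per one-hot label row; with two output classes
// this should come out at roughly a 2:1 ratio
let _, yRaw = generateRandomNonlinearlySeparableDataSample 300 input_dim num_output_classes
yRaw |> Matrix.toRowArrays |> Array.countBy id
```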

```
let rnd = new Random()
let shuffle = Seq.sortBy (fun _ -> rnd.Next())

// This slightly awkward function truncates the overpopulated class to match
// the size of the others, and makes sure the selected subset is randomly
// distributed between the two clusters (i.e. the original doubled class
// and the spurious additional set)
let stratifiedSampling (features: Matrix<float32>) (labels: Matrix<float32>) =
    let minLength =
        labels
        |> Matrix.toRowArrays
        |> Array.countBy id
        |> Array.map snd
        |> Array.min
    Seq.zip (features.ToRowArrays()) (labels.ToRowArrays())
    |> shuffle
    |> Seq.groupBy snd
    |> Seq.map (fun (_, grp) -> grp |> Seq.take minLength)
    |> Seq.collect id
    |> shuffle
    |> Seq.map (fun (f, l) -> Seq.append f l)
    |> matrix
    |> fun mtx -> mtx.[*, ..1], mtx.[*, 2..]
```
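
And a quick sanity check that the balancing works, under the same assumptions as the snippet above:

```
// After stratified sampling the per-class counts should be (roughly) equal
let xBal, yBal =
    generateRandomNonlinearlySeparableDataSample 300 input_dim num_output_classes
    ||> stratifiedSampling
yBal |> Matrix.toRowArrays |> Array.countBy id
```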

## Model Creation

Here are all the parameters necessary for training, gathered in one place for your convenience. I've placed them under their own module so we don't mess with the global scope too much, i.e. so you won't have to restart the kernel every time you want to experiment with the parameters, which I very much encourage you to do.

```
module NonLinear =
    let inputDim, numOutputClasses = 2, 2
    let learningRate = 0.001
    let minibatchSize = 100
    let trainingCycles = 15000
    let reportSampleRate = 25
    let input = Variable.InputVariable(shape [|inputDim|], dataType, "Features")
    let label = Variable.InputVariable(shape [|numOutputClasses|], dataType, "Labels")
    let z = fullyConnectedClassifierNet' input [50; 50] numOutputClasses (CNTKLib.Sigmoid)
    let loss = CNTKLib.CrossEntropyWithSoftmax(Var z, label)
    let error = CNTKLib.ClassificationError(Var z, label)
    let lrSchedule = new CNTK.TrainingParameterScheduleDouble(learningRate, uint32 CNTK.DataUnit.Minibatch)
    let learner = CNTKLib.SGDLearner(z.Parameters() |> ParVec, lrSchedule)
    let trainer = CNTK.Trainer.CreateTrainer(z, loss, error, ResizeArray<CNTK.Learner>([learner]))
```

## Extra! Training (with live progress report!)

Converting our synchronous training loop is as easy as placing it in an `async` computation expression. We could stop there and have a very serviceable Async<> object to iterate over with AsyncSeq, but I thought I should pass in a few arguments, both to make the number of iterations per cycle customizable and to make some additional info available about the general progress of the training.

From then on, displaying an updateable label is as simple as returning a string from the Async<> and applying IfSharp's `Display` to the resulting AsyncSeq. And you can even return renderable HTML strings!
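
Here's the basic pattern as a minimal sketch, with no CNTK involved, just **FSharp.Control.AsyncSeq** and IfSharp's `Display`:

```
open FSharp.Control

// Each element of the AsyncSeq replaces the previous one in the output cell,
// so a sequence of strings behaves like a live-updating label
AsyncSeq.initAsync 10L (fun i -> async {
    do! Async.Sleep 500
    return sprintf "Step %d of 10" (i + 1L) })
|> Display
```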

In fact, here's a simple Bootstrap progress bar to get you started:

```
/// Bootstrap progress bars for training data reporting
/// <remarks> Notebook helper function </remarks>
let reportHtml info progress loss error =
    let progressBar kind label value =
        System.String.Format(
            """<div class='progress' style='margin-top:5px; width: 600px'>
                 <div class='progress-bar progress-bar-{0} progress-bar-striped'
                      role='progressbar' aria-valuenow='{1:f2}'
                      aria-valuemin='0' aria-valuemax='100' style='width: {1:f2}%'>
                   <span>{1:f2}% ({2})</span>
                 </div>
               </div>""", kind, value, label)
    [ progressBar "info" "Progress" progress
      progressBar "warning" "Loss" (loss * 100.)
      progressBar "danger" "Error" (error * 100.) ]
    |> List.reduce (+)
    |> sprintf """<div class='container'><h2>%s</h2>%s</div>""" info
```
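
To preview the bars without training anything, you can render a single frame with made-up values:

```
// One static frame: 25% progress, loss 0.45, error rate 0.12
reportHtml "Minibatch: 250 of 1000" 25. 0.45 0.12
|> Util.Html
|> Display
```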

And here's the asynchronous training loop:

```
open FSharp.Control

let trainCycle iterations finalCycle currentCycle htmlReport =
    // Our training cycle
    for _ in 1 .. iterations do
        let features, labels =
            generateRandomNonlinearlySeparableDataSample NonLinear.minibatchSize
                (NonLinear.input.Shape.[0]) NonLinear.numOutputClasses
            ||> stratifiedSampling
            |> fun (x, y) -> matrixToBatch x, matrixToBatch y
        let trainingBatch =
            [ (NonLinear.input, features); (NonLinear.label, labels) ] |> dict
        NonLinear.trainer.TrainMinibatch(trainingBatch, true, device)
        |> ignore
        (* Let's skip the logging code to keep things shorter *)
    // Calculate training info
    let lossAverage = NonLinear.trainer.PreviousMinibatchLossAverage()
    let evaluationAverage = NonLinear.trainer.PreviousMinibatchEvaluationAverage()
    let current = 100. * (float currentCycle + 1.) / (float finalCycle)
    async {
        // Create report text
        let progress =
            sprintf "[%s] %.1f%%" ("".PadLeft(int current, '=').PadRight(100, ' ')) current
        let info =
            sprintf "Minibatch: %d of %d, Loss: %.4f, Error: %.2f"
                ((currentCycle + 1) * iterations) (finalCycle * iterations) lossAverage evaluationAverage
        let progressBar =
            if htmlReport then
                reportHtml info current lossAverage evaluationAverage
            else
                sprintf "<pre>%s\n %s</pre>" progress info
        // Send result to AsyncSeq
        return progressBar |> Util.Html
    }
```

Let's try it:

```
let totalCycles = 128
let iterationsPerCycle = NonLinear.trainingCycles / totalCycles

AsyncSeq.initAsync (int64 totalCycles)
    (fun i -> trainCycle iterationsPerCycle totalCycles (int i) true)
|> Display
```

## Evaluation

```
let test_minibatch_size = 128
let x_test,y_test = generateRandomNonlinearlySeparableDataSample test_minibatch_size input_dim num_output_classes
testMinibatch NonLinear.trainer x_test y_test
```

## Visualization

```
modelSoftmaxOutputHeatmap "feature 1" "feature 2" [|0. .. 0.1 .. 15.|] NonLinear.z
```

# The difference hidden layers make

Here's what happens when we attempt to classify this dataset by using a network without hidden layers:

`let z = fullyConnectedClassifierNet' input [] numOutputClasses (CNTKLib.Sigmoid)`

It's pretty obvious that a network without hidden layers simply can't deal with the non-linearly separable dataset: it treats the data as random noise for all intents and purposes, with the error rate remaining steady at 50% and the loss value oscillating randomly.

In fact, the development of techniques that allow us to train multiple layer networks was an achievement of historic significance for the field of neural networks, dramatically expanding the scope of their usage.

My next batch of posts will follow the CNTK tutorial into image recognition, and straight into another historic development from relatively recent times: the emergence of convolutional topologies!

**Previous:** .NET deep learning stack: CNTK 101: Logistic Regression

**Next:** CNTK 102.5: Visualizing CNTK models with Graphviz and D3 in F# for Jupyter