
.NET deep learning stack: CNTK 101: Logistic Regression

Useful notes before starting
If you can't run F# Jupyter notebooks locally, go to the IfSharp Project for binaries and installation instructions.
Be aware that if you are viewing this directly on GitHub, you are missing features such as F#-specific syntax highlighting, ligatures and additional markdown styling. Perhaps most importantly, you will not be able to view and interact with any XPlot visualizations.
Preparing the workspace for CNTK in jupyter
If referencing CNTK fails, make sure you have followed the instructions in my Preparing Workspace.ipynb notebook. The current notebook assumes that all necessary CNTK NuGet DLLs have been copied to a folder named bin in the same path.
#r "netstandard"
#r @"bin\Cntk.Core.Managed-2.6.dll"
#load @".paket\load\main.group.fsx"
open System
open System.IO
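// Appending the bin folder to PATH lets the native CNTK libraries that
// Cntk.Core.Managed loads at run time be located.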
Environment.GetEnvironmentVariable("PATH")
|> fun path -> sprintf "%s%c%s" path (Path.PathSeparator) (Path.GetFullPath("bin"))
|> fun path -> Environment.SetEnvironmentVariable("PATH", path)
open CNTK
DeviceDescriptor.UseDefaultDevice().Type
|> printfn "Congratulations, you are using CNTK for: %A"
CNTK 101: Logistic Regression and ML Primer
This notebook is primarily an F# port of CNTK_101_LogisticRegression. I have kept some of the original code comments to make it easier to follow along with the python notebook, but I am skipping the detailed explanations of machine learning concepts.
Intro
This being the first notebook, helper functions that emulate IPython functionality more precisely will be presented in full. As the series progresses, any such functions declared in previous notebooks will be referenced from a NbHelpers namespace without further comment.
To get things started, here's a small function to inline images from urls:
/// Simple wrapper to show inline images
/// from url with customizable width
/// <remarks> Notebook Helper Function </remarks>
let ImageUrl url width =
sprintf "<img src=\"%s\" style=\"width: %dpx; height: auto\" alt=\"Could not load image, make sure url is correct\">" url width
|> Util.Html
|> Display
// Figure 1
ImageUrl "https://www.cntk.ai/jup/cancer_data_plot.jpg" 400
// Figure 2
ImageUrl "https://www.cntk.ai/jup/cancer_classify_plot.jpg" 400
// Figure 3
ImageUrl "https://www.cntk.ai/jup/logistic_neuron.jpg" 300

Global variables, the first of many
I will be declaring any global parameters in the same sequence as the original python notebook, since this is meant as a companion piece. However, in the eventual independent fsx script for this notebook you can expect them to be laid out in a more meaningful manner.
let featureCount = 2
let labelCount = 2
let sampleCount = 32
let device = DeviceDescriptor.CPUDevice
You should take special notice of the device descriptor parameter, which defines whether a CNTK.Function runs on the CPU or a GPU, as it pops up in a lot of CNTK functions. How best to treat it in F# is pretty much an open topic. In this case I simply set a global variable, in the style of the original tutorial.
If you are interested in exploring the topic further be sure to check Mathias Brandewinder's discussion of the issue and proposed functional solution here.
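As a small taste of the options, here is a minimal sketch that prefers a GPU when one is visible, assuming the managed API's DeviceDescriptor.AllDevices() and DeviceKind enumeration; it is not used further in this notebook:
// Sketch only: pick the first GPU the CNTK build can see, else fall back to the CPU.
let preferredDevice =
    DeviceDescriptor.AllDevices()
    |> Seq.tryFind (fun d -> d.Type = DeviceKind.GPU)
    |> Option.defaultValue DeviceDescriptor.CPUDevice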
Data generation
Here's our first attempt at doing numpy with MathNet.Numerics. Our goal is to create a function that produces distinct (but not too distinct!) clusters of random points to be used as mock datasets.
Helpers
open MathNet.Numerics.Distributions;
let seed = 42
let rand = System.Random(seed)
let nrand = Normal(0.,1.,rand)
let randInt max = seq { while true do yield rand.Next() % max }
let randn = Normal.Samples(rand, 0.0, 1.0)
let oneHotEncoding classCount classType =
Array.init classCount (fun i -> if i = classType then 1.0f else 0.0f)
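A quick example of the encoding in action:
oneHotEncoding 2 1 // [|0.0f; 1.0f|]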
Mock data generator
open MathNet.Numerics.LinearAlgebra
let generateRandomDataSample sampleCount featureCount labelCount =
let Y = Array.init sampleCount
(fun _ -> float32 (rand.Next() % labelCount) )
let X = DenseMatrix.init sampleCount featureCount
(fun row col -> float32 (nrand.Sample() + 3.) * (Y.[row]+1.f) )
let oneHotLabel =
Y
|> Array.map(int>>(oneHotEncoding labelCount))
|> DenseMatrix.ofRowArrays
X, oneHotLabel
let x,y = generateRandomDataSample sampleCount featureCount labelCount
Data visualization
In order to properly wire XPlot.Plotly to display in the notebook you need to either:
- Run paket.exe in the same folder as ifsharp.exe and reference XPlot from there, and then delete the .paket/load folder because otherwise it will supersede any calls to your own notebook's generated load scripts,
- or modify the #r call in XPlot.Plotly.fsx to point to where the relevant DLLs actually are.
This will by no means be a deep dive into XPlot's API, rather more of a gentle nudge.
// Setup display support
#load "XPlot.Plotly.fsx"
open XPlot.Plotly
let colors =
[for label in y.Column(0) do
yield if label = 0.f then "Red" else "Blue"]
Scatter(x = x.[*,0], y = x.[*,1],
mode = "markers",
marker = Marker(size=10, color=colors))
|> Chart.Plot
|> Chart.WithLayout (
Layout( xaxis=Xaxis(title="Tumor size (in cm)"),
yaxis=Yaxis(title="Age (scaled)")))
|> Chart.WithHeight 400
|> Chart.WithWidth 600

If you are running this as a standalone script rather than inside the notebook, you will also need
|> Chart.Show
as the last element of the chart pipeline.
Model Creation
// Figure 4
ImageUrl "https://www.cntk.ai/jup/logistic_neuron2.jpg" 300
The IfSharp global for parsing LaTeX notation seems to work fantastically, except for \cdot, which I replaced with a literal middle dot typed via Alt+0183:
"z=\\sum_{i=1}^n w_i \\times x_i+b= \\textbf{w · x}+b" |> Util.Math

Network setup
Here comes the CNTK managed API!
These will be the first of numerous small helper functions that make CNTK's .NET API more easily usable from F#, owing to a pair of distinct characteristics:
- The CNTK API is on average considerably lower level than the Python API we are aiming to match.
- It plays fast and loose with implicit type conversions in a way that F# is very reluctant to allow.
You can find these functions independent of this notebook here.
/// In C# a function parameter of type NDShape apparently
/// can accept a simple int array and cast away implicitly.
///
/// Not so in F#.
/// <remarks>CNTK Helper function</remarks>
let inline shape (dims:int seq) : NDShape = NDShape.CreateNDShape dims
let dataType = CNTK.DataType.Float
If you are coming at this from a C# background, this is a good time to note that the F# float type corresponds to System.Double, and requires the CNTK model to be initialised with CNTK.DataType.Double.
Conversely, to use CNTK.DataType.Float as we will do here, your numbers need to be typed as either float32 or single.
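A quick illustration; these bindings are for demonstration only:
// Demonstration only: literal suffixes and the CNTK DataType they pair with.
let asDouble : float = 1.0 // System.Double -> CNTK.DataType.Double
let asSingle : float32 = 1.0f // System.Single -> CNTK.DataType.Float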
let featureVariable = Variable.InputVariable(shape [|featureCount|], dataType, "Features")
let initialization = CNTKLib.GlorotUniformInitializer(1.0)
let index = System.Collections.Generic.Dictionary<string, CNTK.Parameter>()
let linearLayer (inputVar : Variable) outputDim =
let inputDim = inputVar.Shape.[0]
let weightParam = new Parameter(shape [inputDim; outputDim], dataType, initialization, device, "Weights")
let biasParam = new Parameter(shape [outputDim], dataType, 0.0, device, "Bias")
index.Add("Weights", weightParam)
index.Add("Bias", biasParam)
// training works for w * i and not for i * w as in the python example
let dotProduct = CNTKLib.Times(weightParam, inputVar, "Weighted input")
let layer = CNTKLib.Plus(new Variable(dotProduct), biasParam, "Layer")
layer
let z = linearLayer featureVariable labelCount
Note the argument order: as the comment above says, training works with Times(weights, input), while the python tutorial writes the product the other way around. This is a very hard mistake not to fall face first into if you are using the python tutorials as your main guide.
Training
"\\textbf{p}=softmax(z)" |> Util.Math
"H(p)=-\sum_{j=1}^{|y|}y_j log(p_j)" |> Util.Math
let labelVariable = Variable.InputVariable(shape [labelCount], dataType, "output")
let loss = CNTKLib.CrossEntropyWithSoftmax(new Variable(z), labelVariable)
Evaluation
let evalError = CNTKLib.ClassificationError(new Variable(z), labelVariable)
Although you can use a variety of functions for error evaluation and loss, the overall behavior of the network might change enough that you should think twice before including them in any sort of automated hyperparameter tuning scheme; not piping the network output through softmax, for instance, takes away much of our ability to interpret the output probabilistically.
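For instance, the managed API also exposes a squared-error node; a sketch, purely for illustration and not used below:
// Illustration only: an alternative loss node. Training against it runs,
// but the reported numbers mean something quite different from cross entropy.
let squaredLoss = CNTKLib.SquaredError(new Variable(z), labelVariable)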
Configure training
Helper functions
/// A sequence of Parameter objects needs to be converted
/// to type ParameterVector in order to be passed to CNTK functions.
/// <remarks> CNTK Helper function </remarks>
let ParVec (pars:Parameter seq) =
let vector = new ParameterVector()
pars |> Seq.iter (vector.Add)
vector
/// <remarks> Helper function </remarks>
let inline normalizeByMax(max:'T) (source : 'T seq) =
source |> Seq.map ((fun n -> float n/ float max)>>float32)
/// Convert MathNet 2d matrix to batch in one go, while accounting for
/// original dimensionality and numeric type.
/// <remarks> CNTK Helper function </remarks>
let matrixToBatch (m : Matrix<float32>) =
    // The per-sample shape is the column (feature) count. MathNet's m.Rank()
    // would compute the linear-algebra rank instead, which only
    // coincidentally matches for full-rank data.
    CNTK.Value.CreateBatch(shape [m.ColumnCount], m |> Matrix.transpose |> Matrix.toSeq, device)
/// Define a utility function to compute the moving average.
/// A more efficient implementation is possible with np.cumsum() function
/// <remarks> Helper Function.
/// *Summary from comments in python notebook</remarks>
let movingAverage (array : float seq) windowLength =
if (array |> Seq.length) >= windowLength
then array
|> Seq.windowed windowLength
|> Seq.map (Seq.average)
else seq [array |> Seq.average]
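A quick sanity check of the helper:
movingAverage [1.; 2.; 3.; 4.; 5.] 3 // seq [2.0; 3.0; 4.0]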
On the other hand, the following are exactly the sort of training parameters you should be playing around with when trying to decide what works best for your dataset:
// Instantiate the trainer object to drive the model training
let learningRate = 0.01
let lrSchedule = new CNTK.TrainingParameterScheduleDouble(learningRate, uint32 CNTK.DataUnit.Minibatch)
let learner = CNTKLib.SGDLearner(z.Parameters() |> ParVec, lrSchedule)
let trainer = CNTK.Trainer.CreateTrainer(z, loss, evalError, ResizeArray<CNTK.Learner>([learner]))
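If you want a concrete example of a knob to turn, here is a hedged sketch that swaps plain SGD for momentum SGD, following the managed API's C# examples; the schedule constant is illustrative, not tuned, and the learner is not wired into the trainer above:
// Illustration only: a momentum SGD learner as an alternative to plain SGD.
// Treat the time constant and unit-gain flag as placeholders to experiment with.
let momentumSchedule = CNTKLib.MomentumAsTimeConstantSchedule(256.0)
let momentumLearner =
    Learner.MomentumSGDLearner(z.Parameters(), lrSchedule, momentumSchedule, true)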
The training information logger
// Define a utility that prints the training progress
/// A training progress logger
/// <remarks> Helper function </remarks>
let printTrainingProgress (trainer: CNTK.Trainer) minibatch frequency verbose =
if minibatch % frequency = 0
then
let mbla = trainer.PreviousMinibatchLossAverage()
let mbea = trainer.PreviousMinibatchEvaluationAverage()
if verbose then
printfn "Minibatch: %d, Loss: %.4f, Error: %.2f" minibatch mbla mbea
Some (minibatch, mbla, mbea)
else None
Run the trainer
Training parameters
let minibatchSize = 25
let numSamplesToTrain = 20000
let numMinibatchesToTrain = int (numSamplesToTrain/minibatchSize)
let progressOutputFreq = 50
type TrainReport = {
BatchSize: ResizeArray<int>
Loss: ResizeArray<float>
Error: ResizeArray<float> }
let plotdata = {
BatchSize = ResizeArray<int>()
Loss = ResizeArray<float>()
Error = ResizeArray<float>()
}
for i in 0 .. numMinibatchesToTrain - 1 do
    let x,y = generateRandomDataSample minibatchSize featureCount labelCount
    let features,labels = matrixToBatch x, matrixToBatch y
    // Assign the minibatch data to the input variables and train the model on the minibatch
    let trainingBatch = [(featureVariable, features);(labelVariable, labels)] |> dict
    trainer.TrainMinibatch(trainingBatch, true, device) |> ignore
// log training data
match (printTrainingProgress trainer i progressOutputFreq true) with
| Some (i,loss,eval) ->
plotdata.BatchSize.Add <| i
plotdata.Loss.Add <| loss
plotdata.Error.Add <| eval
| None -> ()
Visualizing training results:
let lossMax = plotdata.Loss |> Seq.max
let dash = Line(dash="dash")
[ Scatter(name="Loss (scaled)", line=dash,
x = plotdata.BatchSize,
y = (plotdata.Loss |> normalizeByMax lossMax))
Scatter(name="Error",
x = plotdata.BatchSize,
y = plotdata.Error, line=dash)]
|> Chart.Plot
|> Chart.WithLayout (Layout(title="Minibatch run",
xaxis=Xaxis(title="Minibatch number"),
yaxis=Yaxis(title="Cost")))
|> Chart.WithHeight 400

// Compute the moving average loss to smooth out the noise in SGD
let avgLoss = movingAverage (plotdata.Loss) 10
let avgError = movingAverage (plotdata.Error) 10
let maxAvgLoss = avgLoss |> Seq.max
[ Scatter(name="Average Loss (scaled)", line=dash,
x = plotdata.BatchSize, y = (avgLoss |> normalizeByMax maxAvgLoss))
Scatter(name="Average Error", line=dash,
x = plotdata.BatchSize, y = avgError)]
|> Chart.Plot
|> Chart.WithLayout
(Layout
(title = "Minibatch run", xaxis = Xaxis(title = "Minibatch number"),
yaxis = Yaxis(title = "Cost")))
|> Chart.WithHeight 400

Run evaluation / Testing
Let's generate a new dataset and see how good a job the model we just trained does in separating the different categories!
open System.Collections.Generic
/// Convert dictionary to Variable -> Value map for CNTK
/// Ported from https://github.com/Microsoft/CNTK/blob/master/bindings/csharp/CNTKLibraryManagedDll/Helper.cs
/// <remarks> CNTK Helper function </remarks>
let AsUnorderedMapVariableValue (source: IDictionary<Variable,Value>) =
let inputVector = new UnorderedMapVariableValuePtr()
for pair in source do inputVector.Add(pair.Key, pair.Value)
inputVector
let testMinibatchSize = 25
let x_test,y_test = generateRandomDataSample testMinibatchSize featureCount labelCount
let testBatch =
[ (featureVariable, matrixToBatch x_test)
(labelVariable, matrixToBatch y_test) ]
|> dict
|> AsUnorderedMapVariableValue
trainer.TestMinibatch(testBatch, device)
Checking prediction / evaluation
Let's go a bit deeper on how our model behaves. Wire the evaluation data maps up carelessly and you will be greeted by errors such as:
Expression evaluation failed: Values for 1 required arguments 'Input('test_output', [2], [*, #])', that the requested output(s) 'Input('test_output', [2], [*, #])' depend on, have not been provided.
or
Expression evaluation failed: This value cannot be mutated. NotSupportedException: This value cannot be mutated
/// Create System.Collections.Generic.Dictionary<Variable,Value>
/// from corresponding tuple seq. Useful when a CNTK Data Map needs
/// to be mutable, for instance when it's going to be holding data
/// generated from our model.
/// <remarks> CNTK Helper function </remarks>
let dataMap (source: seq<Variable*Value>) =
let result = Dictionary<Variable,Value>()
for key,value in source do result.Add(key,value)
result
/// A Function.Evaluate friendly one-hot -> boolean parser function
let parseOneHotPairs (source: IList<IList<float32>>) =
source
|> Seq.map Seq.head
|> Seq.map (float>>System.Math.Round>>float32)
|> Array.ofSeq
Viewing results and per-sample comparison:
let out = CNTKLib.Softmax(new Variable(z))
let outputDataMap = [(out.Output, null)] |> dataMap
let inputDataMap = [(featureVariable, matrixToBatch x_test)] |> dict
// Generate network output
out.Evaluate(inputDataMap, outputDataMap, device)
// Extract data from the network
let result = outputDataMap.[out.Output].GetDenseData<float32>(out.Output)
Convert extracted data to readable output:
let labelsBinary = y_test.[*,0] |> Array.ofSeq
let predictedBinary = result |> parseOneHotPairs
labelsBinary |> Array.take 10 |> printfn "Label : %A ..."
predictedBinary |> Array.take 10 |> printfn "Predicted: %A ..."
(labelsBinary, predictedBinary)
||> Array.zip
|> Array.countBy (fun (label,predicted) -> label = predicted)
|> printfn "Success : %A"
Visualization
/// A helper function to extract data from parameter nodes.
/// You can use this to see a layer's weights.
/// <remarks> CNTK Helper function </remarks>
let paramData<'T> (p: CNTK.Parameter) =
let arrayView = p.Value()
let value = new Value(arrayView)
value.GetDenseData<'T>(p)
(* The index we created along with the linear layer function
finally comes in useful!
Seq.head is needed because the result of Value.GetDenseData is always 2D
*)
let weightMatrix =
index.["Weights"]
|> paramData<float32>
|> Seq.head
|> Seq.chunkBySize featureCount
|> Array.ofSeq
let biasVector =
index.["Bias"]
|> paramData<float32>
|> Seq.head
Since we know that without hidden layers a neural network is only capable of linear separation, we can plot the exact line of separation using the two points where it intersects the chart's axes, i.e. where either x = 0 or y = 0.
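As a one-line reminder of the arithmetic, in the same notation device as before: a line ax + cy = d crosses the axes at (d/a, 0) and (0, d/c); the code below reads the particular coefficients off the trained weights and bias.
"ax+cy=d \\implies \\left(\\frac{d}{a},0\\right),\\ \\left(0,\\frac{d}{c}\\right)" |> Util.Math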
let separator_x = [0.f; biasVector.[1]/weightMatrix.[0].[0]]
let separator_y = [biasVector.[0]/weightMatrix.[0].[1]; 0.f]
separator_x, separator_y
[ Scatter(x = x.[*,0], y = x.[*,1],
mode = "markers",
marker = Marker(size=10, color=colors))
Scatter(x = separator_x, y = separator_y,
mode = "lines",
line = Line(color="Green", width=3)) ]
|> Chart.Plot
|> Chart.WithLayout (
Layout( xaxis=Xaxis(title="Tumor size (in cm)"),
yaxis=Yaxis(title="Age (scaled)")))
|> Chart.WithHeight 400
|> Chart.WithWidth 600

Extra! Prediction heatmap & revisiting evaluation
We have now pretty much covered the original tutorial. But! Why not refactor evaluation a bit, and show another cool way to visualise how the trained model works, by creating a heatmap of its potential outputs?
Refactored code
/// A helper function to convert a sequence
/// of numbers for use as neural network input
/// <remarks> CNTK Helper function </remarks>
let batchFromSeq (dim:int) (source : float seq) =
CNTK.Value.CreateBatch(shape [dim], source |> Seq.map (float32), device)
/// A helper function to evaluate a dataset in
/// a softmax model and extract results in one go
/// <remarks> CNTK Helper function </remarks>
let evaluateWithSoftmax (model : Function) (source : float seq seq) =
let inputDim = source |> Seq.head |> Seq.length
let inputData = source |> Seq.collect id |> batchFromSeq inputDim
let out = CNTKLib.Softmax(new Variable(model))
let inputDataMap = [out.Arguments.[0], inputData] |> dict
let outputDataMap = [(out.Output, null)] |> dataMap
out.Evaluate(inputDataMap, outputDataMap, device)
outputDataMap
.[out.Output]
.GetDenseData<float32>(out.Output)
|> Seq.map Seq.head
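A quick usage example, with two made-up probe points, one in each cluster's general neighbourhood:
// Hypothetical probe points; yields the class-0 probability for each.
[ seq [3.0; 3.0]; seq [8.0; 8.0] ]
|> evaluateWithSoftmax z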
Visualization
let predictedLabelGrid (range : float[]) =
seq [for x in range do for y in range do yield seq [x;y] ]
|> evaluateWithSoftmax z
|> Array.ofSeq
|> Array.chunkBySize range.Length
let colorScale =
    (* Scale from https://fslab.org/XPlot/chart/plotly-heatmaps.html
    * Default scales available: 'Greys' | 'Greens' | 'Bluered' | 'Hot' | 'Picnic' | 'Portland' | 'Jet' | 'RdBu' | 'Blackbody' | 'Earth' | 'Electric' | 'YlOrRd' | 'YlGnBu'
    *)
    [
        // Every element boxed to obj so the mixed float/string rows typecheck.
        [box 0.0; box "rgb(165,0,38)"]
        [box 0.1111111111111111; box "rgb(215,48,39)"]
        [box 0.2222222222222222; box "rgb(244,109,67)"]
        [box 0.3333333333333333; box "rgb(253,174,97)"]
        [box 0.4444444444444444; box "rgb(254,224,144)"]
        [box 0.5555555555555556; box "rgb(224,243,248)"]
        [box 0.6666666666666666; box "rgb(171,217,233)"]
        [box 0.7777777777777778; box "rgb(116,173,209)"]
        [box 0.8888888888888888; box "rgb(69,117,180)"]
        [box 1.0; box "rgb(49,54,149)"]
    ]
Heatmap(z = (predictedLabelGrid [|1. .. 0.1 .. 10.|]), colorscale = colorScale)
|> Chart.Plot
|> Chart.WithLayout (
Layout( xaxis=Xaxis(title="Tumor size (in cm)"),
yaxis=Yaxis(title="Age (scaled)")))
|> Chart.WithWidth 700
|> Chart.WithHeight 500

As we keep adding hidden layers this heatmap is only going to get more interesting.