.NET deep learning stack: CNTK 101: Logistic Regression


Useful notes before starting

Be aware that if you are viewing this directly on GitHub, you are missing features such as F#-specific syntax highlighting, ligatures, and additional markdown styling. Perhaps most importantly, you will not be able to view and interact with any XPlot visualizations.

Preparing the workspace for CNTK in Jupyter

If referencing CNTK fails, make sure you have followed the instructions in my Preparing Workspace.ipynb notebook. The current notebook assumes that all necessary CNTK NuGet DLLs have been copied to a folder named bin in the same path.

#r "netstandard"
#r @"bin\Cntk.Core.Managed-2.6.dll"
#load @".paket\load\main.group.fsx"
 
open System
open System.IO

Environment.GetEnvironmentVariable("PATH")
|> fun path -> sprintf "%s%c%s" path (Path.PathSeparator) (Path.GetFullPath("bin"))
|> fun path -> Environment.SetEnvironmentVariable("PATH", path)

open CNTK
DeviceDescriptor.UseDefaultDevice().Type
|> printfn "Congratulations, you are using CNTK for: %A" 
Output:
Congratulations, you are using CNTK for: CPU

CNTK 101: Logistic Regression and ML Primer

This notebook is primarily an F# port of CNTK_101_LogisticRegression. I have kept some of the original code comments to make it easier to follow along with the Python notebook, but I am skipping the detailed explanations of machine learning concepts.

Intro

This being the first notebook, helper functions that emulate IPython functionality more precisely will be presented in full. As the series progresses, any such functions declared in previous notebooks will be referenced from a NbHelpers namespace without further comment.

To get things started, here's a small function to inline images from urls:

/// Simple wrapper to show inline images 
/// from url with customizable width
/// <remarks> Notebook Helper Function </remarks>
let ImageUrl url width =
    sprintf "<img src=\"%s\" style=\"width: %dpx; height: auto\" alt=\"Could not load image, make sure url is correct\">" url width
    |> Util.Html
    |> Display
// Figure 1
ImageUrl "https://www.cntk.ai/jup/cancer_data_plot.jpg" 400

// Figure 2
ImageUrl "https://www.cntk.ai/jup/cancer_classify_plot.jpg" 400

// Figure 3
ImageUrl "https://www.cntk.ai/jup/logistic_neuron.jpg" 300
Sample output:
Original chart from tutorial (Figures 1-3: the cancer data scatter plot, the classification plot, and the logistic neuron diagram)

Global variables, the first of many

I will be declaring any global parameters in the same sequence as the original Python notebook, since this is meant as a companion piece. However, in the eventual standalone fsx script for this notebook, you can expect them to be laid out in a more meaningful manner.

let featureCount = 2
let labelCount = 2
let sampleCount = 32
let device = DeviceDescriptor.CPUDevice

Take special notice of the device descriptor parameter, which determines whether a CNTK.Function runs on the CPU or a GPU; it pops up in a lot of CNTK functions. How best to handle it in F# is still an open question. Here I simply set a global variable, in the style of the original tutorial.

If you are interested in exploring the topic further, be sure to check Mathias Brandewinder's discussion of the issue and his proposed functional solution here.
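
For illustration, here's a minimal sketch of that functional alternative: make the device an explicit argument and partially apply it once, rather than reading a global. The makeBias name is mine, not from that post; it only uses API calls we've already seen.

// A sketch only: the device becomes an explicit argument
let makeBias (device: DeviceDescriptor) (outputDim: int) =
    new Parameter(NDShape.CreateNDShape [ outputDim ], DataType.Float, 0.0, device, "Bias")

// fixed once near the top of the script, reused everywhere after:
let makeCpuBias = makeBias DeviceDescriptor.CPUDevice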

Data generation

Here's our first attempt at doing numpy with MathNet.Numerics. Our goal is to create a function that produces distinct (but not too distinct!) clusters of random points to be used as mock datasets.

Helpers

open MathNet.Numerics.Distributions

let seed = 42
let rand = System.Random(seed)
let nrand = Normal(0.,1.,rand)
let randInt max = seq { while true do yield rand.Next() % max }
let randn = Normal.Samples(rand, 0.0, 1.0)
let oneHotEncoding classCount classType =
    Array.init classCount (fun i -> if i = classType then 1.0f else 0.0f)
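
A quick demonstration of the encoder, mapping a class index to a one-hot array:

oneHotEncoding 2 1 // [|0.0f; 1.0f|]
oneHotEncoding 3 0 // [|1.0f; 0.0f; 0.0f|]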

Mock data generator

open MathNet.Numerics.LinearAlgebra

let generateRandomDataSample sampleCount featureCount labelCount = 
    let Y = Array.init sampleCount
                (fun _ -> float32 (rand.Next() % labelCount) )
    let X = DenseMatrix.init sampleCount featureCount 
                (fun row col -> float32 (nrand.Sample() + 3.) * (Y.[row]+1.f) )                 
    let oneHotLabel = 
        Y
        |> Array.map(int>>(oneHotEncoding labelCount))
        |> DenseMatrix.ofRowArrays

    X, oneHotLabel
    
let x,y = generateRandomDataSample 32 2 2
You should avoid using a global variable named X for the feature vector, appropriate though it may seem, as it apparently clashes with an XPlot global.
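
A quick sanity check on the shapes we just produced:

x.RowCount, x.ColumnCount // (32, 2): one row per sample, one column per feature
y.RowCount, y.ColumnCount // (32, 2): one-hot encoded labels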

Data visualization

In order to properly wire XPlot.Plotly to display in the notebook you need to either:

  • Run paket.exe in the same folder as ifsharp.exe and reference XPlot from there, and then delete the .paket/load folder because otherwise it will supersede any calls to your own notebook's generated load scripts,
  • or modify the #r call in XPlot.Plotly.fsx to point to where the relevant DLLs actually are.

This will by no means be a deep dive into XPlot's API, rather more of a gentle nudge.

// Setup display support
#load "XPlot.Plotly.fsx"
open XPlot.Plotly

let colors = 
    [for label in y.Column(0) do 
        yield if label = 0.f then "Red" else "Blue"]
    
Scatter(x = x.[*,0], y = x.[*,1], 
        mode = "markers", 
        marker = Marker(size=10, color=colors))
|> Chart.Plot
|> Chart.WithLayout (
        Layout( xaxis=Xaxis(title="Tumor size (in cm)"), 
                yaxis=Yaxis(title="Age (scaled)")))
|> Chart.WithHeight 400
|> Chart.WithWidth 600        
Scatter plot with y = Age and x = tumor size, with samples colored red if malignant
In order to view the resulting chart when running these snippets outside Jupyter, you will need to add |> Chart.Show as the last element of the chart pipeline.

Model Creation

// Figure 4
ImageUrl "https://www.cntk.ai/jup/logistic_neuron2.jpg" 300

The IfSharp global for parsing LaTeX notation seems to work fantastically, except for \cdot, which I replaced with a literal · character (typed with Alt+0183):

"z=\sum_{i=1}^n w_i \\times x_i+b= \\textbf{w Β· x}+b" |> Util.Math 
Sample output:
Latex notation of dot production between weight and input layers, plus bias

Network setup

Here comes the CNTK managed API!

These are the first of numerous small helper functions that make CNTK's .NET API more easily usable from F#, which is needed owing to a pair of distinct characteristics:

  1. The CNTK API is on average much lower level than the Python API we are aiming to match.
  2. It plays fast and loose with implicit type conversions in a way that F# refuses to allow.

You can find these functions independent of this notebook here.

/// In C# a function parameter of type NDShape apparently
/// can accept a simple int array via an implicit conversion.
/// 
/// Not so in F#.
/// <remarks>CNTK Helper function</remarks>
let inline shape (dims:int seq) : NDShape = NDShape.CreateNDShape dims
let dataType = CNTK.DataType.Float

If you are coming at this from a C# background, this is a good time to note that the F# float type corresponds to System.Double, and requires the CNTK model to be initialised with CNTK.DataType.Double.

Conversely, to use CNTK.DataType.Float such as we will do here, your numbers need to be typed as either float32 or single.
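
A minimal illustration of the pairing:

let asDouble = 3.0   // float (System.Double) -> pairs with CNTK.DataType.Double
let asSingle = 3.0f  // float32, alias single (System.Single) -> pairs with CNTK.DataType.Float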

 

Since the original Python example is murky on initialization specifics, I've used 1.0 and 0.0 as init values for weights and bias respectively, just as they appear in this CNTK C# sample.
let featureVariable = Variable.InputVariable(shape [|featureCount|], dataType, "Features")
let initialization = CNTKLib.GlorotUniformInitializer(1.0)
let index = System.Collections.Generic.Dictionary<string, CNTK.Parameter>()

let linearLayer (inputVar : Variable) outputDim =
    let inputDim = inputVar.Shape.[0] 
    let weightParam = new Parameter(shape [inputDim; outputDim], dataType, initialization, device, "Weights")
    let biasParam = new Parameter(shape [outputDim], dataType, 0.0, device, "Bias")    
    
    index.Add("Weights", weightParam)
    index.Add("Bias", biasParam)
    
    // training works for w * i and not for i * w as in the python example 
    let dotProduct =  CNTKLib.Times(weightParam, inputVar, "Weighted input")
    let layer = CNTKLib.Plus(new Variable(dotProduct), biasParam, "Layer")
    
    layer

let z = linearLayer featureVariable labelCount
As noted in the code comments, the order of vector multiplication is inverted relative to the corresponding function in the CNTK Python API.

This is a very hard mistake not to fall face first into if you are using the Python tutorials as your main guide.
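
In the notation from earlier, with the weights on the left of the product, the layer we just built reads:

"z = \\textbf{W x} + b" |> Util.Math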

Training

"\\textbf{p}=softmax(z)" |> Util.Math
"H(p)=-\sum_{j=1}^{|y|}y_j log(p_j)" |> Util.Math
let labelVariable = Variable.InputVariable(shape [labelCount], dataType, "output")
let loss = CNTKLib.CrossEntropyWithSoftmax(new Variable(z), labelVariable)

Evaluation

let evalError = CNTKLib.ClassificationError(new Variable(z), labelVariable)

Although you can use a variety of functions for error evaluation and loss, the overall behavior of the network might change enough that you should think twice before including them in any sort of automated hyperparameter tuning scheme. Not piping the network output through softmax, for instance, takes a lot out of our ability to interpret the output probabilistically.
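
For example, swapping in a squared-error criterion should be a one-line change (CNTKLib.SquaredError is part of the managed API, if I recall correctly), with exactly that caveat: the output is no longer pushed through softmax, so it loses its probabilistic reading.

// Alternative criterion, for experimentation only; see the caveat above
let squaredLoss = CNTKLib.SquaredError(new Variable(z), labelVariable)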

Configure training

Helper functions

/// A sequence of Parameter objects needs to be converted 
/// to type ParameterVector in order to be passed to CNTK functions.
/// <remarks> CNTK Helper function </remarks>
let ParVec (pars:Parameter seq) = 
    let vector = new ParameterVector()
    pars |> Seq.iter (vector.Add)
    vector
    
/// <remarks> Helper function </remarks>
let inline normalizeByMax (max:'T) (source : 'T seq) =
    source |> Seq.map ((fun n -> float n / float max) >> float32)

/// Convert MathNet 2d matrix to batch in one go, while accounting for 
/// original dimensionality and numeric type.
/// <remarks> CNTK Helper function </remarks>
let matrixToBatch (m : Matrix<float32>) =
    // each matrix row is one sample, so the per-sample shape is the
    // column count; transposing lays the samples out consecutively
    CNTK.Value.CreateBatch(shape [m.ColumnCount], m |> Matrix.transpose |> Matrix.toSeq, device)

/// Define a utility function to compute the moving average.
/// A more efficient implementation is possible with np.cumsum() function
/// <remarks> Helper Function. 
/// *Summary from comments in python notebook</remarks>
let movingAverage (array : float seq) windowLength = 
    if (array |> Seq.length) >= windowLength
    then array
         |> Seq.windowed windowLength 
         |> Seq.map (Seq.average)        
    else seq [array |> Seq.average]
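
A quick check of the moving-average helper on a toy sequence:

movingAverage [1.0; 2.0; 3.0; 4.0] 2 // seq [1.5; 2.5; 3.5]
movingAverage [1.0; 2.0] 5           // seq [1.5], window longer than the data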

On the other hand, the following are exactly the sort of training parameters you should be playing around with when trying to decide what works best for your dataset:

// Instantiate the trainer object to drive the model training
let learningRate = 0.01
let lrSchedule = new CNTK.TrainingParameterScheduleDouble(learningRate, uint32 CNTK.DataUnit.Minibatch)

let learner = CNTKLib.SGDLearner(z.Parameters() |> ParVec, lrSchedule)
let trainer = CNTK.Trainer.CreateTrainer(z, loss, evalError, ResizeArray<CNTK.Learner>([learner]))

The training information logger


// Define a utility that prints the training progress
/// A training progress logger
/// <remarks> Helper function </remarks>
let printTrainingProgress (trainer: CNTK.Trainer) minibatch frequency verbose = 
    if minibatch % frequency = 0 
    then     
        let mbla = trainer.PreviousMinibatchLossAverage()
        let mbea = trainer.PreviousMinibatchEvaluationAverage()
        
        if verbose then 
            printfn "Minibatch: %d, Loss: %.4f, Error: %.2f" minibatch mbla mbea
    
        Some (minibatch, mbla, mbea)
    else None

Run the trainer

Training parameters

let minibatchSize = 25
let numSamplesToTrain = 20000
let numMinibatchesToTrain = numSamplesToTrain / minibatchSize
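// 20000 samples / 25 per minibatch = 800 minibatches, matching the final log line below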
let progressOutputFreq = 50
type TrainReport = { 
    BatchSize: ResizeArray<int> 
    Loss: ResizeArray<float>
    Error: ResizeArray<float> } 

let plotdata = { 
    BatchSize = ResizeArray<int>()
    Loss = ResizeArray<float>()
    Error = ResizeArray<float>()
}

for i in 0 .. numMinibatchesToTrain do
    let x,y = generateRandomDataSample minibatchSize featureCount labelCount
    let features,labels = matrixToBatch x, matrixToBatch y
    
    // Assign the minibatch data to the input variables and train the model on the minibatch
    let trainingBatch = [(featureVariable, features);(labelVariable, labels)] |> dict
    trainer.TrainMinibatch(trainingBatch, true, device) |> ignore
    
    // log training data
    match (printTrainingProgress trainer i progressOutputFreq true) with
    | Some (i,loss,eval) ->         
        plotdata.BatchSize.Add <| i
        plotdata.Loss.Add <| loss
        plotdata.Error.Add <| eval
    | None -> ()
Output:
Minibatch: 0, Loss: 0.4776, Error: 0.24
Minibatch: 50, Loss: 0.3615, Error: 0.16
Minibatch: 100, Loss: 0.1099, Error: 0.00
Minibatch: 150, Loss: 0.1394, Error: 0.04
Minibatch: 200, Loss: 0.1703, Error: 0.08
Minibatch: 250, Loss: 0.2311, Error: 0.08
Minibatch: 300, Loss: 0.2511, Error: 0.08
Minibatch: 350, Loss: 0.1090, Error: 0.00
Minibatch: 400, Loss: 0.3172, Error: 0.12
Minibatch: 450, Loss: 0.0988, Error: 0.08
Minibatch: 500, Loss: 0.2587, Error: 0.12
Minibatch: 550, Loss: 0.2404, Error: 0.08
Minibatch: 600, Loss: 0.3179, Error: 0.12
Minibatch: 650, Loss: 0.0993, Error: 0.08
Minibatch: 700, Loss: 0.2174, Error: 0.08
Minibatch: 750, Loss: 0.2473, Error: 0.08
Minibatch: 800, Loss: 0.0978, Error: 0.00

Visualizing training results:

let lossMax = plotdata.Loss |> Seq.max
let dash = Line(dash="dash")

[   Scatter(name="Loss (scaled)", line=dash,
            x = plotdata.BatchSize, 
            y = (plotdata.Loss |> normalizeByMax lossMax))
    Scatter(name="Error",
            x = plotdata.BatchSize, 
            y = plotdata.Error, line=dash)] 
|> Chart.Plot
|> Chart.WithLayout (Layout(title="Minibatch run", 
                            xaxis=Xaxis(title="Minibatch number"), 
                            yaxis=Yaxis(title="Cost")))
|> Chart.WithHeight 400
Loss and error variation during gradient descent training
// Compute the moving average loss to smooth out the noise in SGD
let avgLoss = movingAverage (plotdata.Loss) 10 
let avgError = movingAverage (plotdata.Error) 10
let maxAvgLoss = avgLoss |> Seq.max

[   Scatter(name="Average Loss (scaled)", line=dash,
            x = plotdata.BatchSize, y = (avgLoss |> normalizeByMax maxAvgLoss))
    Scatter(name="Average Error", line=dash,
            x = plotdata.BatchSize, y = avgError)]
|> Chart.Plot
|> Chart.WithLayout
       (Layout
            (title = "Minibatch run", xaxis = Xaxis(title = "Minibatch number"),
             yaxis = Yaxis(title = "Cost")))
|> Chart.WithHeight 400
Loss and error variation during gradient descent training, smoothed according to moving average

Run evaluation / Testing

Let's generate a new dataset and see how good a job the model we just trained does in separating the different categories!

open System.Collections.Generic

/// Convert dictionary to Variable -> Value map for CNTK
/// Ported from https://github.com/Microsoft/CNTK/blob/master/bindings/csharp/CNTKLibraryManagedDll/Helper.cs
/// <remarks> CNTK Helper function </remarks>
let AsUnorderedMapVariableValue (source: IDictionary<Variable,Value>) =
    let inputVector = new UnorderedMapVariableValuePtr()
    for pair in source do inputVector.Add(pair.Key, pair.Value)
    inputVector

let testMinibatchSize = 25
let x_test,y_test = generateRandomDataSample testMinibatchSize featureCount labelCount
let testBatch = 
    [ (featureVariable, matrixToBatch x_test)
      (labelVariable, matrixToBatch y_test) ] 
    |> dict
    |> AsUnorderedMapVariableValue
    
trainer.TestMinibatch(testBatch, device)   
Output:
0.08

That is an average classification error of 8%, i.e. 2 of the 25 test samples misclassified.

Checking prediction / evaluation

Let's go a bit deeper on how our model behaves.

It is important to explicitly use a System.Collections.Generic.Dictionary object as the data map for the evaluation target; F#'s own dict (Microsoft.FSharp.Core.ExtraTopLevelOperators) is read-only, resulting in some misleading errors:

Expression evaluation failed: Values for 1 required arguments 'Input('test_output', [2], [*, #])', that the requested output(s) 'Input('test_output', [2], [*, #])' depend on, have not been provided.

or

Expression evaluation failed: NotSupportedException: This value cannot be mutated

/// Create System.Collections.Generic.Dictionary<Variable,Value>
/// from corresponding tuple seq. Useful when a CNTK Data Map needs
/// to be mutable, for instance when it's going to be holding data
/// generated from our model.
/// <remarks> CNTK Helper function </remarks>
let dataMap (source: seq<Variable*Value>) = 
    let result = Dictionary<Variable,Value>()
    for key,value in source do result.Add(key,value)
    result

/// A Function.Evaluate friendly one-hot -> boolean parser function
let parseOneHotPairs (source: IList<IList<float32>>) = 
    source 
    |> Seq.map Seq.head 
    |> Seq.map (float>>System.Math.Round>>float32)
    |> Array.ofSeq
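
To make the rounding step concrete, here is the same parsing pipeline applied to plain F# lists standing in for softmax outputs:

// Illustration only: keep the head of each pair, then round it
[ [0.91f; 0.09f]; [0.20f; 0.80f] ]
|> Seq.map Seq.head
|> Seq.map (float >> System.Math.Round >> float32)
|> Array.ofSeq // [|1.0f; 0.0f|]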

Viewing results and per sample comparison:

let out = CNTKLib.Softmax(new Variable(z))
let outputDataMap = [(out.Output, null)] |> dataMap
let inputDataMap = [(featureVariable, matrixToBatch x_test)] |> dict

// Generate network output
out.Evaluate(inputDataMap, outputDataMap, device)            

// Extract data from the network
let result = outputDataMap.[out.Output].GetDenseData<float32>(out.Output)

Convert extracted data to readable output:

let labelsBinary = y_test.[*,0] |> Array.ofSeq    
let predictedBinary = result |> parseOneHotPairs 
    
labelsBinary |> Array.take 10 |> printfn "Label    : %A ..."    
predictedBinary |> Array.take 10 |> printfn "Predicted: %A ..."

(labelsBinary, predictedBinary) 
||> Array.zip
|>  Array.countBy (fun (label,predicted) -> label = predicted)
|>  printfn "Success  : %A"
Output:
Label    : [|1.0f; 1.0f; 0.0f; 0.0f; 0.0f; 1.0f; 0.0f; 1.0f; 0.0f; 1.0f|] ...
Predicted: [|1.0f; 1.0f; 0.0f; 0.0f; 0.0f; 1.0f; 0.0f; 1.0f; 0.0f; 1.0f|] ...
Success  : [|(true, 23); (false, 2)|]

Visualization

/// A helper function to extract data from parameter nodes.
/// You can use this to see a layer's weights.
/// <remarks> CNTK Helper function </remarks>
let paramData<'T> (p: CNTK.Parameter) =
    let arrayView = p.Value()
    let value = new Value(arrayView)
    value.GetDenseData<'T>(p)

(* The index we created along with the linear layer function
   finally comes in useful!
   
   Seq.head is needed because the result of Value.GetDenseData is always 2D
*)
let weightMatrix = 
    index.["Weights"] 
    |> paramData<float32>
    |> Seq.head
    |> Seq.chunkBySize featureCount
    |> Array.ofSeq
   
let biasVector = 
    index.["Bias"] 
    |> paramData<float32>
    |> Seq.head

Since we know that without hidden layers a neural network is only capable of linear separation, we can plot the exact line of separation using the two points where it intersects the chart's axes, i.e. where either x = 0 or y = 0.
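
For reference, that line is the set of points where the two class scores coincide; in the notebook's Util.Math notation (with the same · workaround as before):

"\\textbf{w_0 · x} + b_0 = \\textbf{w_1 · x} + b_1" |> Util.Math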

let separator_x = [0.f; biasVector.[1]/weightMatrix.[0].[0]]
let separator_y = [biasVector.[0]/weightMatrix.[0].[1]; 0.f]

separator_x, separator_y
Output:
([0.0f; 13.5715132f], [6.42859125f; 0.0f])
[ Scatter(x = x.[*,0], y = x.[*,1], 
          mode = "markers", 
          marker = Marker(size=10, color=colors))
  Scatter(x = separator_x, y = separator_y, 
          mode = "lines",          
          line = Line(color="Green", width=3)) ]
|> Chart.Plot
|> Chart.WithLayout (
        Layout( xaxis=Xaxis(title="Tumor size (in cm)"), 
                yaxis=Yaxis(title="Age (scaled)")))
|> Chart.WithHeight 400
|> Chart.WithWidth 600        
Scatter plot of x= tumor size, y = age samples, separated according to malignancy by line plot

Extra! Prediction heatmap & revisiting evaluation

We have now pretty much covered the original tutorial. But! Why not refactor evaluation a bit and show another cool way to visualise how the trained model works, by creating a heatmap of its potential outputs?

Refactored code

/// A helper function to convert a sequence
/// of numbers for use as neural network input
/// <remarks> CNTK Helper function </remarks>
let batchFromSeq (dim:int) (source : float seq) =
    CNTK.Value.CreateBatch(shape [dim], source |> Seq.map (float32), device)

/// A helper function to evaluate a dataset in
/// a softmax model and extract results in one go
/// <remarks> CNTK Helper function </remarks>
let evaluateWithSoftmax (model : Function) (source : float seq seq) =
    let inputDim = source |> Seq.head |> Seq.length
    let inputData = source |> Seq.collect id |> batchFromSeq inputDim
    let out = CNTKLib.Softmax(new Variable(model))
    
    let inputDataMap = [out.Arguments.[0], inputData] |> dict
    let outputDataMap = [(out.Output, null)] |> dataMap
    
    out.Evaluate(inputDataMap, outputDataMap, device)            
    outputDataMap
        .[out.Output]
        .GetDenseData<float32>(out.Output)
    |> Seq.map Seq.head

Visualization

let predictedLabelGrid (range : float[]) =
    seq [for x in range do for y in range do yield seq [x;y] ]
    |> evaluateWithSoftmax z
    |> Array.ofSeq
    |> Array.chunkBySize range.Length

let colorScale =
    (* Scale from https://fslab.org/XPlot/chart/plotly-heatmaps.html
     * Default scales available: 'Greys' | 'Greens' | 'Bluered' | 'Hot' | 'Picnic' | 'Portland' | 'Jet' | 'RdBu' | 'Blackbody' | 'Earth' | 'Electric' | 'YlOrRd' | 'YlGnBu'
     *)
    [
        [box 0.0; box "rgb(165,0,38)"]
        [0.1111111111111111; "rgb(215,48,39)"]
        [0.2222222222222222; "rgb(244,109,67)"]
        [0.3333333333333333; "rgb(253,174,97)"]
        [0.4444444444444444; "rgb(254,224,144)"]
        [0.5555555555555556; "rgb(224,243,248)"]
        [0.6666666666666666; "rgb(171,217,233)"]
        [0.7777777777777778; "rgb(116,173,209)"]
        [0.8888888888888888; "rgb(69,117,180)"]
        [1.0; "rgb(49,54,149)"]
    ]

Heatmap(z = (predictedLabelGrid [|1. .. 0.1 .. 10.|]), colorscale = colorScale)
|> Chart.Plot
|> Chart.WithLayout (
        Layout( xaxis=Xaxis(title="Tumor size (in cm)"), 
                yaxis=Yaxis(title="Age (scaled)")))
|> Chart.WithWidth 700
|> Chart.WithHeight 500
Heatmap of neural network output

As we keep adding hidden layers this heatmap is only going to get more interesting.