Let's try a pure .NET python-like deep learning stack

The elevator pitch

The goal of this series is to reproduce the full functionality of the CNTK Python API tutorials using strictly .NET tools, while jumping through as few hoops as possible.

Secondarily, I hope to find out how close F# and IfSharp come to matching IPython notebook functionality. This means that I will try to keep original code to a minimum and instead prefer calls to appropriate libraries such as MathNet, except where a minimal wrapper function makes it easier to follow the logic of the original Python code.

The bulk of these posts will be available in a GitHub repository as IfSharp Jupyter notebooks, along with .fsx files containing the notebook code, as well as any other functionality that might be necessary but is out of the scope of this series, such as the script that automatically locates the paths of referenced CNTK DLLs with System.Reflection.

As the series progresses, any recurring functions will be collected in (for now) two separate script files: one for functionality needed by the tutorials themselves, such as the generateRandomDataSample function, and one for functionality related to the notebook presentation, such as the tiny wrapper that inlines images from a URL, in order to better match IPython functionality.

The stack

To start with I am following the lead of the FsLab project, so the Python to .NET mapping will be along the lines of:

  • numpy -> MathNet.Numerics
  • pandas -> Deedle
  • matplotlib and seaborn -> FSharp.Charting and XPlot.Plotly
  • scikit-learn -> presumably Accord.NET

With the exception of Accord.NET, all of the above can be referenced in one go through the FsLab NuGet package. However, I will not do so until the FsLab package becomes fully netstandard-compliant.

There is also the issue that viewing the visualizations created with FSharp.Charting and XPlot.Plotly within a notebook requires some special backend wiring. The simpler solution is to let IfSharp download the relevant packages into its own folder, and to forgo referencing them specifically in your own notebook.

Preparing work space

If we are to use managed CNTK extensively, there's no getting around the fact that it's a bit of a hassle to set up outside a Visual Studio project. Its requirements seem to differ slightly depending on whether you intend to call it from a script file, a .NET Core project or, as in our case, from within a Jupyter notebook.

Whatever the setup, two things seem constant: you will need to copy all of the CNTK DLLs and their dependencies into the same folder and reference Cntk.Core.Managed-2.6.dll from there, and it will probably only work on Windows {dramatic chord/sad trombone}.

What follows is basically my Prepare workspace notebook, meant to be run as is in the tutorials folder. Hopefully it takes care of any reference weirdness and lets you get straight to the CNTK code without further ado.


Preparing the workspace for CNTK in a Jupyter notebook

#r "netstandard"
#load "Paket.fsx"
  • Unless you reference netstandard this way, anything CNTK-related will be flagged as an error and ignored by autocomplete.

  • Paket.fsx is necessary to enable IfSharp's Paket extensions such as Paket.Dependencies.Install, so there is no need to download paket.exe separately for each notebook.

1. Install necessary packages

Paket.Dependencies.Install """
framework: netstandard2.0
generate_load_scripts: true
storage: none
source https://nuget.org/api/v2
nuget CNTK.CPUOnly
nuget MathNet.Numerics
nuget MathNet.Numerics.FSharp
"""

Usually in F# notebooks we tend to see paket.dependencies created this way:

Paket.Package ["CNTK.CPUOnly"; "FsLab"; "FSharp.Charting"]

I used Paket.Dependencies.Install instead of Paket.Package because it lets us keep the Paket files in the current folder instead of piling them up in the directory where IfSharp.exe is located. It also lets us set storage: none, which saves disk space and bandwidth by not re-downloading the same dependencies every time a project needs CNTK support; some of the required DLLs are rather immense as it is.

This is slightly ironic, considering that in order to use CNTK in Jupyter we end up having to copy all dependencies into a common bin folder anyway. This seems to be an issue with the way F# Interactive handles native dependencies in general.

2. Reference Cntk.Core.Managed-*.*.dll from the nuget packages folder

#load @".paket\load\main.group.fsx"

System.Reflection.Assembly.GetAssembly(typeof<CNTK.CNTKLib>).CodeBase
|> printfn "CNTK managed binary referenced from:\n\t%s"
Output:
CNTK managed binary referenced from:
	file:///C:/Users/Ares/.nuget/packages/cntk.cpuonly/2.6.0/lib/netstandard2.0/Cntk.Core.Managed-2.6.dll
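CodeBase returns a file:/// URI. If you would rather get a plain path you can paste into Explorer, Assembly.Location already returns one (CodeBase is also marked obsolete on .NET 5 and later). A small sketch, assuming a hypothetical helper name referencedFrom; it uses typeof<string> only so the snippet runs without CNTK referenced:

```fsharp
open System
open System.IO

/// Plain file system path of the assembly that defines type `t`.
/// Unlike CodeBase (a file:/// URI), Assembly.Location is already a local path;
/// it is empty, for example, for assemblies loaded from a byte array.
let referencedFrom (t: Type) : string =
    let location = t.Assembly.Location
    if String.IsNullOrEmpty location then
        failwithf "%s was not loaded from a file" t.Assembly.FullName
    location

// In the notebook this would be: referencedFrom typeof<CNTK.CNTKLib>
let corePath = referencedFrom typeof<string>
printfn "Referenced from:\n\t%s\nFolder:\n\t%s" corePath (Path.GetDirectoryName(corePath))
```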

On Windows, if you get an error like:

The namespace or module 'CNTK' is not defined.

after Paket.Dependencies.Install above succeeds, it may be because the folder containing IfSharp.exe seems to take precedence as far as the #load directive is concerned. This means that once a main.group.fsx has been generated in the IfSharp folder (for instance after downloading the visualization tools), you will no longer have access to the locally generated scripts until you delete the ones in the IfSharp folder, or rename one of the two.

If you omit storage: none from Paket.Dependencies.Install, the IfSharp folder is also where any NuGet packages will physically end up, as long as you call Paket from within the notebook.

3. Copy all CNTK dependencies to the same folder

As far as I can tell this is a managed CNTK issue, not a jupyter or IfSharp issue.

Currently all CNTK DLLs need to be copied into the same folder as Cntk.Core.Managed-#.#.dll, otherwise they can't be resolved at runtime. The included script finds the path of the currently referenced CNTK DLL and uses it to locate any dependencies. It should work for both the GPU and the CPUOnly packages.

Again, this is only necessary when working with CNTK outside Visual Studio.

If you are interested in F# scripting with CNTK, make sure you run your scripts through a 64-bit version of fsi.exe, such as fsiAnyCPU.exe, or you will run into bad image (BadImageFormatException) errors.
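Since every native DLL in the copy output below comes from a support\x64 folder, you may want the script or notebook to fail early if it is running in a 32-bit host, rather than hitting a bad image error later. A minimal sketch using Environment.Is64BitProcess; checkProcessIs64Bit is a hypothetical name of my own, not part of PrepareWorkspace.fsx:

```fsharp
open System

/// Fails fast if the current host (fsi, IfSharp) is a 32-bit process;
/// the native CNTK DLLs are x64 builds and cannot be loaded into it.
let checkProcessIs64Bit () =
    if Environment.Is64BitProcess then
        printfn "64-bit process: OK for CNTK"
    else
        failwith "32-bit process: use a 64-bit host such as fsiAnyCPU.exe"

checkProcessIs64Bit ()
```

With that checked, the workspace preparation cell itself is: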
#load "fsx/PrepareWorkspace.fsx"
open PrepareWorkspace

CreateOrCleanLocalBinFolder "bin"
CopyDependenciesToLocalFolder "bin" Release
Output:
Copying from 'packages\cntk.cpuonly\2.6.0\lib\netstandard2.0\Cntk.Core.Managed-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.Composite-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.Core-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.Core.CSBinding-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.Deserializers.Binary-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.Deserializers.HTK-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.Deserializers.Image-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.Deserializers.TextFormat-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.Math-2.6.dll'
Copying from 'packages\cntk.cpuonly\2.6.0\support\x64\Release\Cntk.PerformanceProfiler-2.6.dll'
Copying from 'packages\cntk.deps.opencv.zip\2.6.0\support\x64\Dependency\Release\opencv_world310.dll'
Copying from 'packages\cntk.deps.mkl\2.6.0\support\x64\Dependency\libiomp5md.dll'
Copying from 'packages\cntk.deps.mkl\2.6.0\support\x64\Dependency\mkldnn.dll'
Copying from 'packages\cntk.deps.mkl\2.6.0\support\x64\Dependency\mklml.dll'
Copying from 'packages\cntk.deps.opencv.zip\2.6.0\support\x64\Dependency\zip.dll'
Copying from 'packages\cntk.deps.opencv.zip\2.6.0\support\x64\Dependency\zlib.dll'
Copied 212.00MB

So far I haven't been able to reference debug DLLs successfully in a notebook, i.e. without hitting the dreaded DllNotFoundException: Cntk.Core.CSBinding-#.#.dll, so I suggest you also stick to the release builds.
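PrepareWorkspace.fsx itself is not reproduced here, but the idea can be sketched. The following is a hypothetical sketch, not the actual script: it assumes the NuGet layout visible in the output above (packages\<id>\<version>\lib\... for the managed DLL, packages\<id>\<version>\support\x64\... for native ones), copies only the managed DLL's own package plus cntk.deps.* packages of the same version, and skips any Debug folders in line with the advice above. The function name copyCntkDependencies is my own.

```fsharp
open System
open System.IO

/// Hypothetical sketch (not PrepareWorkspace.fsx): copies the managed CNTK DLL and
/// the native DLLs of its own package and of cntk.deps.* packages of the same
/// version into `target`, skipping Debug builds. Returns the number of bytes copied.
let copyCntkDependencies (managedDll: string) (target: string) : int64 =
    Directory.CreateDirectory(target) |> ignore
    // Layout assumed: <packages>\<id>\<version>\lib\netstandard2.0\Cntk.Core.Managed-*.dll
    let versionDir = (FileInfo managedDll).Directory.Parent.Parent
    let version = versionDir.Name
    let ownPackage = versionDir.Parent.Name
    let packagesDir = versionDir.Parent.Parent
    // True if any folder below support\x64 is named Debug
    let isDebug (relativePath: string) =
        relativePath.Split([| Path.DirectorySeparatorChar; Path.AltDirectorySeparatorChar |])
        |> Array.exists (fun part -> part.Equals("Debug", StringComparison.OrdinalIgnoreCase))
    let nativeDlls =
        packagesDir.GetDirectories("cntk*")
        |> Seq.filter (fun pkg ->
            pkg.Name = ownPackage
            || pkg.Name.StartsWith("cntk.deps.", StringComparison.OrdinalIgnoreCase))
        |> Seq.map (fun pkg -> Path.Combine(pkg.FullName, version, "support", "x64"))
        |> Seq.filter Directory.Exists
        |> Seq.collect (fun x64Dir ->
            Directory.GetFiles(x64Dir, "*.dll", SearchOption.AllDirectories)
            |> Seq.filter (fun file -> not (isDebug (file.Substring(x64Dir.Length)))))
    Seq.append [ managedDll ] nativeDlls
    |> Seq.sumBy (fun source ->
        printfn "Copying from '%s'" source
        File.Copy(source, Path.Combine(target, Path.GetFileName(source)), true)
        (FileInfo source).Length)
```

In a notebook it might be called as copyCntkDependencies typeof<CNTK.CNTKLib>.Assembly.Location "bin". Note that with a shared package cache it may also pick up cntk.deps.* packages you don't strictly need (for instance GPU dependencies in a CPU-only setup), which is wasteful but harmless.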

4. Reference CNTK from the newly created folder

In order to successfully reference CNTK from the new location you will have to restart the kernel; otherwise the reference set in main.group.fsx will persist.

#r @"bin\Cntk.Core.Managed-2.6.dll"
#load @".paket\load\main.group.fsx"
open CNTK

DeviceDescriptor.UseDefaultDevice().Type
|> printfn "Congratulations, you are using CNTK for: %A"

System.Reflection.Assembly.GetAssembly(typeof<CNTK.CNTKLib>).CodeBase
|> printfn "\nCNTK managed binary referenced from:\n\t%s"
Output:
Congratulations, you are using CNTK for: CPU

CNTK managed binary referenced from:
	file:///../IfCntk/notebooks/cntk-tutorials/bin/Cntk.Core.Managed-2.6.dll

5. Update the PATH variable

Finally, we need to update the PATH variable to include the newly created bin folder; otherwise we get an error when instantiating the learner before training:

CNTK.CNTKLibPINVOKE.SGDLearner__SWIG_1 exception

The change only affects the current process, so you will need to repeat it every time you run a CNTK notebook.

open System
open System.IO

// Append the local bin folder to PATH for the current process only,
// so the native CNTK DLLs can be resolved at runtime
let newPath =
    sprintf "%s%c%s" (Environment.GetEnvironmentVariable "PATH") Path.PathSeparator (Path.GetFullPath "bin")
Environment.SetEnvironmentVariable("PATH", newPath)
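Running the PATH cell more than once appends bin again each time. If you tend to re-run cells, a variant that only appends the folder when it is missing may be handier. A minimal sketch; addToProcessPath is a hypothetical helper name, not something from the original notebook:

```fsharp
open System
open System.IO

/// Appends the full path of `folder` to PATH for the current process only,
/// unless it is already listed, so re-running the cell does not grow PATH.
let addToProcessPath (folder: string) =
    let full = Path.GetFullPath folder
    let current =
        match Environment.GetEnvironmentVariable "PATH" with
        | null -> ""
        | p -> p
    // Windows paths are case-insensitive; elsewhere compare exactly
    let comparison =
        if Path.DirectorySeparatorChar = '\\' then StringComparison.OrdinalIgnoreCase
        else StringComparison.Ordinal
    let alreadyListed =
        current.Split([| Path.PathSeparator |], StringSplitOptions.RemoveEmptyEntries)
        |> Array.exists (fun entry -> String.Equals(entry, full, comparison))
    if not alreadyListed then
        let updated = if current = "" then full else current + string Path.PathSeparator + full
        Environment.SetEnvironmentVariable("PATH", updated)

addToProcessPath "bin"
```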

If you've made it this far, congratulations, you should be good to go! Expect an F# notebook of the first CNTK Python tutorial to be posted very soon.