ML.NET

Original author(s): Microsoft
Developer(s): .NET Foundation
Initial release: 7 May 2018 [1]
Stable release: 3.0.0 / 28 November 2023
Preview release: 3.0.0-preview.23511.1 / 14 October 2023
Repository: github.com/dotnet/machinelearning/
Written in: C# and C++
Operating system: Linux, macOS, Windows [2]
Platform: .NET Core, .NET Framework
Type: Machine learning library
License: MIT License [3]
Website: dot.net/ml

ML.NET is a free software machine learning library for the C# and F# programming languages. [4] [5] [6] It also supports Python models when used together with NimbusML. The preview release of ML.NET included transforms for feature engineering like n-gram creation, and learners to handle binary classification, multi-class classification, and regression tasks. [7] Additional ML tasks like anomaly detection and recommendation systems have since been added, and other approaches like deep learning will be included in future versions. [8] [9]
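
As an illustration of what n-gram creation means as a feature engineering step, the following is a minimal Python sketch. It is not ML.NET's API or implementation; the function names `word_ngrams` and `ngram_counts` are hypothetical, and tokenization is assumed to be a simple lowercase whitespace split.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of `text` as a list of tuples.

    Assumption: tokens are produced by lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count how often each n-gram occurs; the counts can be used as features."""
    return Counter(word_ngrams(text, n))


# Bigram counts for a short sentence.
bigrams = ngram_counts("the cat sat on the cat mat", 2)
```

Each distinct n-gram becomes one feature whose value is its count, so a learner sees short word sequences rather than isolated words.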

Machine learning

ML.NET brings model-based machine learning analytics and prediction capabilities to existing .NET developers. The framework is built upon .NET Core and .NET Standard, inheriting the ability to run cross-platform on Linux, Windows and macOS. Although the ML.NET framework is new, its origins date back to 2002, when it began as a Microsoft Research project named TMSN (text mining search and navigation) for use internally within Microsoft products. It was renamed TLC (the learning code) around 2011. According to James McCaffrey of Microsoft Research, ML.NET was derived from the TLC library and has in many ways surpassed its parent. [10]

Developers can train a machine learning model, or reuse an existing model from a third party, and run it offline in any environment. This means developers do not need a background in data science to use the framework. Support for the open-source Open Neural Network Exchange (ONNX) deep learning model format was introduced in ML.NET build 0.3. That release included other notable enhancements such as Factorization Machines, LightGBM, Ensembles, the LightLDA transform and one-versus-all (OVA). [11] TensorFlow integration has been available since the 0.5 release. Build 0.7 added support for x86 and x64 applications, along with enhanced recommendation capabilities using matrix factorization. [12] A full roadmap of planned features has been made available in the official GitHub repository. [13]

The first stable release of the framework, 1.0, was announced at the Build developer conference in 2019. It added a Model Builder tool and AutoML (automated machine learning) capabilities. [14] Build 1.3.1 introduced a preview of deep neural network training using C# bindings [15] for TensorFlow, and a database loader that enables model training on databases. The 1.4.0 preview added ML.NET scoring on ARM processors and deep neural network training with GPUs on Windows and Linux. [16]

Performance

Microsoft's paper on machine learning with ML.NET demonstrated that it is capable of training sentiment analysis models on large datasets while achieving high accuracy. Its results showed 95% accuracy on Amazon's 9 GB review dataset. [17]

Model builder

The ML.NET CLI is a command-line interface that uses ML.NET AutoML to perform model training and pick the best algorithm for the data. The ML.NET Model Builder preview [18] is an extension for Visual Studio that uses the ML.NET CLI and ML.NET AutoML to output the best ML.NET model through a GUI. [14]
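
The idea of training several candidate algorithms and picking the best one can be sketched in a few lines of Python. This is only an illustration of the general model-selection loop, not ML.NET AutoML's actual search strategy; the candidate learners (`fit_mean`, `fit_line`), the mean-squared-error metric and the tiny dataset are all assumptions made for the example.

```python
def fit_mean(xs, ys):
    """Baseline learner: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """One-dimensional ordinary least squares: y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of `model` on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best(candidates, train, valid):
    """Fit every candidate on `train` and return the name with the lowest
    validation error, together with all validation scores."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *valid)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data following y = 2x + 1, split into training and validation.
train = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
valid = ([4.0, 5.0], [9.0, 11.0])
best_name, scores = select_best({"mean": fit_mean, "line": fit_line}, train, valid)
```

Scoring on data held out from training is what makes the comparison meaningful: a candidate is chosen for how well it generalizes, not for how closely it fits the data it has already seen.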

Model explainability

AI fairness and explainability have been areas of debate among AI ethicists in recent years. [19] A major issue for machine learning applications is the black box effect, where end users and the developers of an application are unsure of how an algorithm came to a decision or whether the dataset contains bias. [20] Build 0.8 included model explainability APIs that had been used internally at Microsoft. It added the capability to understand the feature importance of models with the addition of 'Overall Feature Importance' and 'Generalized Additive Models'. [21]

When several variables contribute to the overall score, it is possible to see a breakdown of each variable and which features had the most impact on the final score. The official documentation demonstrates that the scoring metrics can be output for debugging purposes. During training and debugging of a model, developers can preview and inspect live filtered data using the Visual Studio DataView tools. [22]
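
For a linear model, such a per-feature breakdown is simple to compute: each feature contributes its weight multiplied by its value. The Python sketch below illustrates that idea only; it is not ML.NET's explainability API, and the function name `feature_contributions`, the weights and the feature names are hypothetical.

```python
def feature_contributions(weights, bias, features):
    """Break a linear model's score into per-feature contributions.

    Returns the total score and the contributions sorted by absolute
    impact, largest first.
    """
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return score, ranked


# Hypothetical model weights and one example to explain.
weights = {"rooms": 2.0, "age": -0.5, "distance": -1.0}
example = {"rooms": 3.0, "age": 10.0, "distance": 1.0}
score, ranked = feature_contributions(weights, bias=1.0, features=example)
```

Ranking by absolute value shows which features moved the score most, whether they pushed it up or down.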

Infer.NET

Microsoft Research announced that Infer.NET, a popular model-based machine learning framework used for research in academic institutions since 2008, had been released as open source and made part of the ML.NET framework. [23] The Infer.NET framework utilises probabilistic programming to describe probabilistic models, which has the added advantage of interpretability. The Infer.NET namespace has since been changed to Microsoft.ML.Probabilistic, consistent with ML.NET namespaces. [24]

NimbusML Python support

Microsoft acknowledged that the Python programming language is popular with data scientists, so it introduced NimbusML, a set of experimental Python bindings for ML.NET. This enables users to train and use machine learning models in Python. Like Infer.NET, it was made open source. [12]

Machine learning in the browser

ML.NET allows users to export trained models to the Open Neural Network Exchange (ONNX) format. [25] This makes it possible to use the models in environments that do not use ML.NET. For example, such models could be run on the client side of a browser using ONNX.js, a JavaScript client-side framework for deep learning models created in the ONNX format. [26]

AI School Machine Learning Course

Along with the rollout of the ML.NET preview, Microsoft released free AI tutorials and courses to help developers understand the techniques needed to work with the framework. [27] [28] [29]

References

  1. Ankit Asthana (2018-05-07). "Introducing ML.NET: Cross-platform, Proven and Open Source Machine Learning Framework". blogs.msdn.microsoft.com. Retrieved 2018-05-10.
  2. "ML.NET: Machine Learning made for .NET". Microsoft. Retrieved 11 May 2018.
  3. at master · DotNet/MachineLearning
  4. David Ramel (2018-05-08). "Open Source, Cross-Platform ML.NET Simplifies Machine Learning -- Visual Studio Magazine". Visual Studio Magazine. Retrieved 2018-05-10.
  5. Kareem Anderson (2018-05-09). "Microsoft debuts ML.NET cross-platform machine learning framework". On MSFT. Retrieved 2018-05-10.
  6. Ankit Asthana (2018-08-07). "Announcing ML.NET 0.4". blogs.msdn.microsoft.com. Retrieved 2018-08-08.
  7. Gal Oshri (2018-05-06). "ML.NET 0.1 Release Notes". GitHub. Retrieved 2018-05-10.
  8. Tiwari, Aditya (2018-05-08). "Microsoft Launches ML.NET Open Source Machine Learning Framework". Fossbytes. Retrieved 2018-05-10. Over time, it will enable other ML tasks like anomaly detection, recommendation system, and other approaches like deep learning using the benefits of added libraries.
  9. "Machine learning tasks in ML.NET". Microsoft. Retrieved 26 December 2018.
  10. James McCaffrey (2018-12-19). "ML.NET: The Machine Learning Framework for .NET Developers". MSDN Magazine Connect() Special Issue 2018. Retrieved 2019-01-09. Even though the ML.NET library is new, its origins go back many years. Shortly after the introduction of the Microsoft .NET Framework in 2002, Microsoft Research began a project called TMSN ("text mining search and navigation") to enable software developers to include ML code in Microsoft products and technologies. The project was very successful, and over the years grew in size and usage internally at Microsoft. Somewhere around 2011 the library was renamed to TLC ("the learning code"). TLC is widely used within Microsoft and is currently in version 3.10. The ML.NET library is a descendant of TLC, with Microsoft-specific features removed. I've used both libraries and, in many ways, the ML.NET child has surpassed its parent.
  11. "Release Microsoft ML.NET v0.3". GitHub. 2018-07-03. Retrieved 2018-07-03.
  12. "Announcing ML.NET 0.7 (Machine Learning .NET)". Microsoft. 2018-11-08. Retrieved 2018-11-14.
  13. "The ML.NET Roadmap". GitHub. 2018-05-09. Retrieved 2018-06-30.
  14. "Announcing ML.NET 1.0". Microsoft. 2019-05-06. Retrieved 2019-05-07.
  15. "SciSharp/TensorFlow.NET". SciSharp STACK. 21 February 2020.
  16. "ML.NET 1.4.0-preview2". GitHub. 2019-10-09. Retrieved 2019-10-09.
  17. Ahmed, Zeeshan; Amizadeh, Saeed; Bilenko, Mikhail; Carr, Rogan; Chin, Wei-Sheng; Dekel, Yael; Dupre, Xavier; Eksarevskiy, Vadim; Erhardt, Eric; Eseanu, Costin; Filipi, Senja; Finley, Tom; Goswami, Abhishek; Hoover, Monte; Inglis, Scott; Interlandi, Matteo; Katzenberger, Shon; Kazmi, Najeeb; Krivosheev, Gleb; Luferenko, Pete; Matantsev, Ivan; Matusevych, Sergiy; Moradi, Shahab; Nazirov, Gani; Ormont, Justin; Oshri, Gal; Pagnoni, Artidoro; Parmar, Jignesh; Roy, Prabhat; et al. (2019-05-15). "Machine Learning at Microsoft with ML.NET". Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 2448–2458. arXiv:1905.05715. doi:10.1145/3292500.3330667. ISBN 9781450362016. S2CID 53380995.
  18. "dotnet/machinelearning-modelbuilder". .NET Platform. 17 February 2020.
  19. "Artificial Intelligence Can Reinforce Bias, Cloud Giants Announce Tools For AI Fairness". Forbes. 2018-09-24. Retrieved 2018-12-05.
  20. "What it means to open AI's black box". PwC. 2018-05-15. Retrieved 2018-12-05.
  21. Hastie, Trevor J. (1 November 2017). "Generalized Additive Models". Statistical Models in S. pp. 249–307. doi:10.1201/9780203738535-7. ISBN 9780203738535.
  22. "Announcing ML.NET 0.8 – Machine Learning for .NET". Microsoft. 2018-12-04. Retrieved 2018-12-05.
  23. "Microsoft open-sources Infer.NET AI code just in time for the weekend". The Register. 2018-10-05. Retrieved 2018-10-31.
  24. "Microsoft open sources Infer.NET, its popular model-based machine learning framework". Packt. 2018-10-08. Retrieved 2018-10-31.
  25. "ML.NET – Export Machine Learning.Net models to ONNX format". El Bruno. 2018-07-11. Retrieved 2019-01-09.
  26. "ONNX.js: Universal Deep Learning Models in The Browser". Will Badr. 2019-01-08. Retrieved 2019-01-09.
  27. "AI School". Microsoft AI. 2018-05-07. Retrieved 2018-06-29.
  28. "ML.NET Guide". Microsoft. 2018-05-07. Retrieved 2018-06-29.
  29. "Infer.NET User Guide". Infer.NET. 2018-10-05. Retrieved 2018-10-31.