Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational April 22, 2016 Expires: October 24, 2016 Network Machine Learning draft-jiang-nmlrg-network-machine-learning-01 Abstract This document introduces background information of machine learning briefly, then explores the potential of machine learning techniques for networks. This document is serving as a white paper of the (proposed) IRTF Network Machine Learning Research Group. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on October 24, 2016. Copyright Notice Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Jiang Expires October 24, 2016 [Page 1] Internet-Draft Network Machine Learning April 2016 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Brief Background of Machine Learning . . . . . . . . . . . . 3 3.1. Machine Learning Categories . . . . . . . . . . . . . . . 3 3.2. Machine Learning Approaches . . . . . . . . . . . . . . . 3 3.3. Successful Applications . . . . . . . . . . . . . . . . . 5 3.4. Precondition of Applying Machine Learning Approach . . . 5 3.5. Limitation of Machine Learning Mechanism . . . . . . . . 5 4. Network Machine Learning Research Group in IRTF . . . . . . . 6 5. Use Cases Study of Applying Machine Learning in Network . . . 7 5.1. Network Traffic . . . . . . . . . . . . . . . . . . . . . 7 6. Security Considerations . . . . . . . . . . . . . . . . . . . 7 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 9. Change log [RFC Editor: Please remove] . . . . . . . . . . . 8 10. Informative References . . . . . . . . . . . . . . . . . . . 8 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 8 1. Introduction Machine learning techniques help to make predictions or decisions by learning from historical data. As machine learning mechanism could dynamically adapt to varying situations and enhance their own intelligence by learning from new data, they are more flexible in handling complicated tasks than strictly static program instructions. Therefore, machine learning techniques have been widely applied in image analysis, pattern recognition, language recognition, conversation simulation, and etc. With deep exploration, machine learning techniques would cast light on studies of autonomic networking, in that they could be well adapted to learn the various environments of networks and react to dynamic situations. The proposed Network Machine Learning Research Group (NMLRG) was formed within IRTF (Internet Research Task Force), October, 2015. As a procedure, currently, IRTF request an one-year provisional period. After this period, the proposed research group may become a formal research group if there is a steady research community. The NMLRG provides a forum for researchers to explore the potential of machine learning techniques for networks. This document firstly provides background information of machine learning briefly, then explores the potential of machine learning techniques for networks functions, such as network control, network management, and supplying network data for upper-layer applications. Jiang Expires October 24, 2016 [Page 2] Internet-Draft Network Machine Learning April 2016 Author notice: this document is in the primary stage. It is an ongoing document for the proposed Network Machine Learning Research Group. For now, it is not clear whether it would be published or not. 2. Terminology The terminology defined in this document. Machine Learning A computational mechanism that analyzes and learns from data input, either historic data or real-time feedback data, following designed model/pattern. It can be used to make predictions or decision, rather than following strictly static program instructions. 3. Brief Background of Machine Learning 3.1. Machine Learning Categories Machine learning mechanisms are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available: Supervised learning The machine learning mechanism is given labeled inputs and the correspondent desired outputs. The mechanism could learn a general rule that maps inputs to outputs by itself. Unsupervised learning The given input are not labeled. It leaves the machine learning mechanism itself to find structure in its input and output. Reinforcement learning The machine learning mechanism interacts with dynamic environments in which it performs a certain task and receives feedback from its action. Between supervised and unsupervised learning, there is semi- supervised learning, in which input data are partially labeled. 3.2. Machine Learning Approaches There are a few basic machine learning approaches. They can be mixed together to complete complicated tasks. Classification With the training data that has been labeled into a number of classes, the machine learning mechanism could assign new unlabeled data into one or more these classes. An example is SPAM filtering, in which emails are classified into "spam" or "not spam" classes. Jiang Expires October 24, 2016 [Page 3] Internet-Draft Network Machine Learning April 2016 Clustering Without labeled training data, the machine learning mechanism divides data into groups. It is the learning mechanism itself to decide the number or structure of output classes. Regression It estimates the relationships among variables. The outputs are continuous. Anomaly detection It detects specific data which do not conform to an expected pattern or other data in a data set. Density estimation The machine learning mechanism needs to identify the distribution of input data. Dimensionality reduction The machine learning mechanism could simplify inputs by mapping them into a lower-dimensional space. Decision tree learning The learning output is structured into a decision tree as a predictive model. Association rule learning The learning delivers potential relations between variables. Artificial neural networks also called "neural network". It is inspired by the structure and functions of biological neural networks. It is structured by a number of interconnected computational "neurons", each of which has independent deciding ability. The connections have numeric weights that can be tuned according to feedback and trends, making neural nets adaptive to inputs and capable of learning. Reinforcement learning It is inspired by behaviorist psychology. The mechanism take actions in an environment so as to maximize cumulative reward. Similarity and metric learning It learns from training data a similarity function that measures how similar or related two objects are. Representation learning Also called feature learning. It learns a feature - a transformation of raw data input to a representation that can be effectively exploited in machine learning tasks. This is not a full enumerated list of machine learning approaches. Other approaches may include support vector machines, bayesian networks, inductive logic programming, sparse dictionary learning, genetic algorithms, and etc. Jiang Expires October 24, 2016 [Page 4] Internet-Draft Network Machine Learning April 2016 Editor notes: the basic algorithms that machine learning approaches use may be listed as a future work. It may be too detailed and too many to be included. 3.3. Successful Applications Machine learning approaches have been successfully applied in many areas, such as human behavior analysis, image analysis, nature language recognition (including speech and handwriting processing), conversation simulation, medical diagnosis, structural health monitoring, stock market analysis, biological analysis and classifying, loan and insurance evaluation, game playing, and many other applications. As for network applications, such as search engines, SPAM filtering, adaptive website, Internet fraud detection, online advertising, etc., have all been greatly benefited from the machine learning mechanism. However, most of those successful stories are in the application layer of network perspective. 3.4. Precondition of Applying Machine Learning Approach Although it is different from big data or data mining, machine learning does also need data. However, machine learning can be applied with small set of data or dynamic feedback from environment. The quality of data decides the efficient and accuracy of machine learning. There is no generic machine learning mechanism that could suitable for all or most of use cases. For each use case, the developers need to design a specific analysis path, which may combine multiple approaches or algorithms together. The feature design and analysis path design are the key factor in the machine learning applications. To achieve autonomic decision or minimize the human intervention, there should be evaluation system for the results of machine learning mechanism. The evaluation system could be the measurement that the results of machine learning mechanism are executed. The evaluation system and machine learning mechanism could compose a close decision loop for autonomic decision. 3.5. Limitation of Machine Learning Mechanism So far, the machine learning mechanism does not perform very well for accurate result. In most successful cases, it is used as an assistant analysis tool. Its results are usually accepted in fault- tolerant environment or with further human confirmation. Jiang Expires October 24, 2016 [Page 5] Internet-Draft Network Machine Learning April 2016 4. Network Machine Learning Research Group in IRTF The Network Machine Learning Research Group (NMLRG), which was formed as a proposed research group of IRTF, October, 2015 (as a procedure, a proposed research group may become a formal research group after one year provisional period), provides a forum for researchers to explore the potential of machine learning techniques for networks. In particular, the NMLRG will work on potential approaches that apply machine learning techniques in network control, network management, and supplying network data for upper-layer applications. The initial focus of the NMLRG will be on higher-layer concepts where the machine learning mechanism could be applied in order to enhance the network establishing, controlling, managing, network applications and customer services. This includes mechanisms to acquire knowledge from the existing networks so that new networks can be established with minimum efforts; the potential to use machine learning mechanisms for routing control and optimization; using machine learning mechanisms in network management to predict future network status; using machine learning mechanisms to autonomic and dynamical network management; using machine learning mechanisms to analyze network faults and support recovery; learning network attacks and their behaviors, so that protection mechanisms could be self- adapted; unifying the data structure and the communication interface between network/network devices and customers, so that the upper- layer applications could easily obtain relevant network information, etc. The NMLRG is expected to identify and document requirements, to survey possible approaches, to provide specifications for proposed solutions, and to prove concepts with prototype implementations that can be tested in real-world environments. The more knowledge we have, the more intelligent we are. It is the same for networks and network management. Up to now, the only available network knowledge is usually the current network status inside a given device or relevant current status from other devices. However, historic knowledge is very helpful to make correct decisions, in particular to reduce network oscillation or to manage network resources over time. Transplantable knowledge from other networks can be helpful to initially set up a new network or new network devices. Knowledge of relationships between network events and network configuration may help a network to decide the best parameters according to real performance feedback. In addition to such historic knowledge, powerful data analytics of current network conditions may also be a valuable source of knowledge that can be exploited directly. The machine learning mechanism is the correspondent mechanism to learn and apply knowledge intelligently. Jiang Expires October 24, 2016 [Page 6] Internet-Draft Network Machine Learning April 2016 5. Use Cases Study of Applying Machine Learning in Network In 2016, the NMLRG is focusing on collecting and studying of use cases that applies machine learning mechanisms into network area. More use cases are still in the collecting process. 5.1. Network Traffic Network traffic is one of the most important objectives that needs to be managed in network/Internet area. Network traffic meets preconditions of applying Machine Learning mechanisms. It is full of data: the network traffic itself is data source, also there are many properties of network traffic are measurable, such as latency, number of packets, last period, etc. The network traffics are complicated. Its characteristics are often beyond the awareness of human operators. Machine Learning would greatly help to discover knowledge regarding to network traffics. The network traffics are always dynamic changing. There is both regularity and irregularity. Quick response to real-time network traffic is a big challenge to network management. It is beyond the ability of human operator. The rigid management has already become a bottleneck of current networks. Machine Learning could form a quick and adaptive auto response managing system. There are many different types of network traffic. In April 2016, NMLRG #2 IETF 95 meeting was organized with the theme of network traffic. There are multiple use cases presented: HTTPS traffic classification, machine learning in the router - learn from and act on network traffics, applications of machine learning to flow-based monitoring, malicious domains: automatic detection with DNS traffic analysis, machine-learning based policy derivation and evaluation in broadband networks, predicting interface failures for better traffic management NMLRG is currently working on a dedicated document for this theme. It is potential this document becomes RG document and is published as a RFC in the future. 6. Security Considerations This document is focused on applying machine learning in network, including of course applying machine learning in network security, on higher-layer concepts. Therefore, it does not itself create any new security issues. Jiang Expires October 24, 2016 [Page 7] Internet-Draft Network Machine Learning April 2016 7. IANA Considerations This memo includes no request to IANA. 8. Acknowledgements The author would like to acknowledge the valuable comments made by participants in the IRTF Network Machine Learning Research Group, particular thanks to Lars Eggert, Brian Carpenter, Albert Cabellos, Shufan Ji, Panagiotis Demestichas, Jerome Francois, Susan Hares, Rudra Saha, Dacheng Zhang and Bing Liu. This document was produced using the xml2rfc tool [RFC2629]. 9. Change log [RFC Editor: Please remove] draft-jiang-nmlrg-network-machine-learning-01: adding brief description of network traffic and ML into use case study, 2016-4-23. draft-jiang-nmlrg-network-machine-learning-00: original version, 2015-10-19. 10. Informative References [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, DOI 10.17487/RFC2629, June 1999, . Author's Address Sheng Jiang Huawei Technologies Co., Ltd Q14, Huawei Campus, No.156 Beiqing Road Hai-Dian District, Beijing, 100095 P.R. China Email: jiangsheng@huawei.com Jiang Expires October 24, 2016 [Page 8]