Federated learning

Diagram of a federated learning protocol with smartphones training a global AI model

Federated learning (also known as collaborative learning) is a sub-field of machine learning focusing on settings in which multiple entities (often referred to as clients) collaboratively train a model while ensuring that their data remains decentralized. [1] This stands in contrast to machine learning settings in which data is centrally stored. One of the primary defining characteristics of federated learning is data heterogeneity. Due to the decentralized nature of the clients' data, there is no guarantee that data samples held by each client are independently and identically distributed.


Federated learning is generally concerned with and motivated by issues such as data privacy, data minimization, and data access rights. Its applications involve a variety of research areas including defense, telecommunications, the Internet of Things, and pharmaceuticals.

Definition

Federated learning aims at training a machine learning algorithm, for instance deep neural networks, on multiple local datasets contained in local nodes without explicitly exchanging data samples. The general principle consists in training local models on local data samples and exchanging parameters (e.g. the weights and biases of a deep neural network) between these local nodes at some frequency to generate a global model shared by all nodes.

The main difference between federated learning and distributed learning lies in the assumptions made on the properties of the local datasets, [2] as distributed learning originally aims at parallelizing computing power whereas federated learning originally aims at training on heterogeneous datasets. While distributed learning also aims at training a single model on multiple servers, a common underlying assumption is that the local datasets are independent and identically distributed (i.i.d.) and roughly of the same size. None of these hypotheses are made for federated learning; instead, the datasets are typically heterogeneous and their sizes may span several orders of magnitude. Moreover, the clients involved in federated learning may be unreliable and prone to failures or dropout, since they commonly rely on less powerful communication media (e.g., Wi-Fi) and battery-powered systems (e.g., smartphones and IoT devices), whereas in distributed learning the nodes are typically datacenters with powerful computational capabilities, connected to one another by fast networks. [3]

Mathematical formulation

The objective function for federated learning is as follows:

$$ f(x_1, \dots, x_K) = \frac{1}{K} \sum_{i=1}^{K} f_i(x_i), $$

where $K$ is the number of nodes, $x_i$ are the weights of the model as viewed by node $i$, and $f_i$ is node $i$'s local objective function, which describes how the model weights $x_i$ conform to node $i$'s local dataset.

The goal of federated learning is to train a common model on all of the nodes' local datasets, in other words:

$$ x_1 = x_2 = \dots = x_K = x. $$
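As a concrete illustration, the short Python sketch below evaluates this objective for a toy least-squares problem; the function names and the synthetic data are hypothetical and only serve to make the formula concrete.

```python
import numpy as np

def local_objective(x_i, data_i, labels_i):
    """Node i's local objective f_i: mean squared error of a linear model on its own data."""
    return np.mean((data_i @ x_i - labels_i) ** 2)

def global_objective(weights, datasets):
    """f(x_1, ..., x_K) = (1/K) * sum_i f_i(x_i), averaged over the K nodes."""
    K = len(weights)
    return sum(local_objective(x_i, d, y) for x_i, (d, y) in zip(weights, datasets)) / K

# Toy example: 3 nodes, each with its own small dataset and its own copy of the weights.
rng = np.random.default_rng(0)
datasets = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
weights = [np.zeros(5) for _ in datasets]  # here x_1 = x_2 = x_3, as the goal requires
print(global_objective(weights, datasets))
```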

Centralized federated learning

In the centralized federated learning setting, a central server is used to orchestrate the different steps of the algorithms and coordinate all the participating nodes during the learning process. The server is responsible for node selection at the beginning of the training process and for the aggregation of the received model updates. Since all the selected nodes have to send updates to a single entity, the server may become a bottleneck of the system. [3]

Diagrams of centralized federated learning (left) and decentralized federated learning (right)

Decentralized federated learning

In the decentralized federated learning setting, the nodes are able to coordinate themselves to obtain the global model. This setup prevents single points of failure, as the model updates are exchanged only between interconnected nodes without the orchestration of a central server. Nevertheless, the specific network topology may affect the performance of the learning process. [3] See blockchain-based federated learning [4] and the references therein.

Heterogeneous federated learning

An increasing number of application domains involve a large set of heterogeneous clients, e.g., mobile phones and IoT devices. [5] Most of the existing federated learning strategies assume that local models share the same global model architecture. Recently, a new federated learning framework named HeteroFL was developed to address heterogeneous clients equipped with very different computation and communication capabilities. [6] The HeteroFL technique can enable the training of heterogeneous local models with dynamically varying computation and non-IID data complexities, while still producing a single accurate global inference model. [6] [7]

General federated learning process in the central orchestrator setup

Main features

Iterative learning

To ensure good task performance of a final, central machine learning model, federated learning relies on an iterative process broken up into an atomic set of client-server interactions known as a federated learning round. Each round of this process consists in transmitting the current global model state to participating nodes, training local models on these local nodes to produce a set of potential model updates at each node, and then aggregating and processing these local updates into a single global update and applying it to the global model. [3]

In the methodology below, a central server is used for aggregation, while local nodes perform local training depending on the central server's orders. However, other strategies lead to the same results without central servers, in a peer-to-peer approach, using gossip [8] or consensus methodologies. [9]

Assuming a federated round composed of one iteration of the learning process, the learning procedure can be summarized as follows (a minimal code sketch of these phases is given after the list): [10]

  1. Initialization: according to the server inputs, a machine learning model (e.g., linear regression, neural network, boosting) is chosen to be trained on local nodes and initialized. Then, nodes are activated and wait for the central server to assign the computation tasks.
  2. Client selection: a fraction of local nodes are selected to start training on local data. The selected nodes acquire the current statistical model while the others wait for the next federated round.
  3. Configuration: the central server orders selected nodes to undergo training of the model on their local data in a pre-specified fashion (e.g., for some mini-batch updates of gradient descent).
  4. Reporting: each selected node sends its local model to the server for aggregation. The central server aggregates the received models and sends back the model updates to the nodes. It also handles failures for disconnected nodes or lost model updates. The next federated round then begins, returning to the client selection phase.
  5. Termination: once a pre-defined termination criterion is met (e.g., a maximum number of iterations is reached or the model accuracy is greater than a threshold) the central server aggregates the updates and finalizes the global model.
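The following Python sketch maps these five phases onto code. It is a schematic of the protocol only: `init_model`, `local_train`, and `aggregate` are hypothetical placeholders passed in by the caller, and real systems add failure handling, security, and communication layers.

```python
import random

def run_federated_training(clients, num_rounds, client_fraction,
                           init_model, local_train, aggregate):
    """Schematic centralized federated learning loop following the five phases above."""
    global_model = init_model()                               # 1. Initialization
    for _ in range(num_rounds):                               # 5. Termination: fixed round budget here
        k = max(1, int(client_fraction * len(clients)))
        selected = random.sample(clients, k)                  # 2. Client selection
        updates = []
        for client in selected:                               # 3. Configuration: clients train locally
            try:
                updates.append(local_train(client, global_model))
            except ConnectionError:
                continue                                       # 4. Reporting: tolerate dropped clients
        if updates:
            global_model = aggregate(global_model, updates)   # 4. Reporting: server-side aggregation
    return global_model
```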

The procedure described above assumes synchronized model updates. Recent federated learning developments have introduced novel techniques to tackle asynchronicity during the training process, or training with dynamically varying models. [6] Compared to synchronous approaches, where local models are exchanged once the computations have been performed for all layers of the neural network, asynchronous ones leverage the properties of neural networks to exchange model updates as soon as the computations of a certain layer are available. These techniques are also commonly referred to as split learning [11] [12] and they can be applied both at training and inference time, regardless of centralized or decentralized federated learning settings. [3] [6]

Non-IID data

In most cases, the assumption of independent and identically distributed samples across local nodes does not hold for federated learning setups. Under this setting, the performance of the training process may vary significantly according to the unbalancedness of local data samples as well as the particular probability distribution of the training examples (i.e., features and labels) stored at the local nodes. To further investigate the effects of non-IID data, the following description considers the main categories presented in the preprint by Peter Kairouz et al. from 2019. [3]

The description of non-IID data relies on the analysis of the joint probability between features and labels for each node. This allows decoupling of each contribution according to the specific distribution available at the local nodes. The main categories of non-IID data can be summarized as follows: [3]

  1. Covariate shift: local nodes may store examples whose features have different statistical distributions compared to other nodes.
  2. Prior probability shift: local nodes may store labels that have different statistical distributions compared to other nodes.
  3. Concept drift (same label, different features): local nodes may share the same labels, but some of them correspond to different features at different local nodes.
  4. Concept shift (same features, different labels): local nodes may share the same features, but some of them correspond to different labels at different local nodes.
  5. Unbalancedness: the amount of data available at the local nodes may vary significantly in size.

The loss in accuracy due to non-IID data can be bounded by using more sophisticated means of data normalization, rather than batch normalization. [13]

Algorithmic hyper-parameters

Network topology

The way the statistical local outputs are pooled and the way the nodes communicate with each other can change from the centralized model explained in the previous section. This leads to a variety of federated learning approaches: for instance, having no central orchestrating server, or using stochastic communication. [14]

In particular, orchestrator-less distributed networks are one important variation. In this case, there is no central server dispatching queries to local nodes and aggregating local models. Each local node sends its outputs to several randomly-selected others, which aggregate their results locally. This restrains the number of transactions, thereby sometimes reducing training time and computing cost. [15]
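A minimal sketch of such an orchestrator-less scheme is given below, assuming each node simply averages its parameters with those of a few randomly selected peers; the function and variable names are illustrative only.

```python
import random
import numpy as np

def gossip_round(node_params, num_peers=2):
    """One peer-to-peer averaging round: every node mixes its parameters with a few random peers."""
    n = len(node_params)
    new_params = []
    for i in range(n):
        peers = random.sample([j for j in range(n) if j != i], min(num_peers, n - 1))
        group = [node_params[i]] + [node_params[j] for j in peers]
        new_params.append(np.mean(group, axis=0))  # local aggregation, no central server
    return new_params

# Example: 5 nodes whose scalar parameters drift toward consensus over repeated rounds.
params = [np.array([float(i)]) for i in range(5)]
for _ in range(10):
    params = gossip_round(params)
print([p.round(3) for p in params])
```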

Federated learning parameters

Once the topology of the node network is chosen, one can control different parameters of the federated learning process (in addition to the machine learning model's own hyperparameters) to optimize learning:

  1. Number of federated learning rounds: T
  2. Total number of nodes taking part in the process: K
  3. Fraction of nodes used at each iteration: C
  4. Local batch size used at each learning iteration: B

Other model-dependent parameters can also be tinkered with, such as:

  1. Number of iterations for local training before pooling: N
  2. Local learning rate

Those parameters have to be optimized depending on the constraints of the machine learning application (e.g., available computing power, available memory, bandwidth). For instance, stochastically choosing a limited fraction of nodes for each iteration diminishes computing cost and may prevent overfitting [ citation needed ], in the same way that stochastic gradient descent can reduce overfitting.
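As an illustration, such knobs are often collected into a single configuration object; the field names and default values below are hypothetical, not taken from any particular federated learning library.

```python
from dataclasses import dataclass

@dataclass
class FederatedConfig:
    """Illustrative bundle of federated learning hyperparameters (names are assumptions)."""
    num_rounds: int = 100          # T: number of federated learning rounds
    num_clients: int = 1000        # K: total number of participating nodes
    client_fraction: float = 0.1   # C: fraction of nodes selected at each round
    local_batch_size: int = 32     # B: mini-batch size used for local updates
    local_iterations: int = 1      # N: local passes over the data before pooling
    local_learning_rate: float = 0.01

# Example: sample fewer clients per round but train them longer locally.
config = FederatedConfig(client_fraction=0.05, local_iterations=5)
```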

Technical limitations

Federated learning requires frequent communication between nodes during the learning process. Thus, it requires not only enough local computing power and memory, but also high-bandwidth connections to be able to exchange the parameters of the machine learning model. However, the technology also avoids the data communication that can require significant resources before starting centralized machine learning. Nevertheless, the devices typically employed in federated learning are communication-constrained: IoT devices or smartphones, for example, are generally connected to Wi-Fi networks. Thus, even though models are commonly less expensive to transmit than raw data, federated learning mechanisms may not be suitable in their general form. [3]

Federated learning raises several statistical challenges:

  1. Heterogeneity between the different local datasets: each node may have some bias with respect to the general population, and the size of the datasets may vary significantly.
  2. Temporal heterogeneity: each local dataset's distribution may vary with time.
  3. Interoperability of each node's dataset is a prerequisite, and each node's dataset may require regular curation.
  4. Hiding training data might allow attackers to inject backdoors into the global model. [16]
  5. Lack of access to global training data makes it harder to identify unwanted biases entering the training, e.g., with respect to age, gender, or sexual orientation.
  6. Partial or total loss of model updates due to node failures may affect the global model.
  7. Lack of annotations or labels on the client side. [17]

Federated learning variations

In this section, the notation of the paper published by H. Brendan McMahan et al. in 2017 is followed. [19]

To describe the federated strategies, let us introduce some notation:

  1. $K$: total number of clients;
  2. $k$: index of a client;
  3. $n_k$: number of data samples available during training for client $k$;
  4. $w_t^k$: weight vector of the model on client $k$ at federated round $t$.

Federated stochastic gradient descent (FedSGD)

Deep learning training mainly relies on variants of stochastic gradient descent, where gradients are computed on a random subset of the total dataset and then used to make one step of the gradient descent.

Federated stochastic gradient descent [20] is the direct transposition of this algorithm to the federated setting, but using a random fraction of the nodes and all the data held by these nodes. The gradients are averaged by the server proportionally to the number of training samples on each node, and used to make a gradient descent step.
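A minimal sketch of one FedSGD server step under these assumptions follows; the function name and the fixed learning rate are illustrative, not taken from a specific implementation.

```python
import numpy as np

def fedsgd_step(global_weights, client_gradients, client_sizes, learning_rate=0.1):
    """One FedSGD round: average client gradients weighted by local dataset size, then one descent step."""
    total = sum(client_sizes)
    avg_grad = sum((n_k / total) * g_k for g_k, n_k in zip(client_gradients, client_sizes))
    return global_weights - learning_rate * avg_grad

# Example with two clients holding 100 and 300 samples respectively.
w = np.zeros(3)
grads = [np.array([1.0, 0.0, 2.0]), np.array([0.0, 4.0, 2.0])]
w = fedsgd_step(w, grads, client_sizes=[100, 300])
```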

Federated averaging

Federated averaging (FedAvg) is a generalization of FedSGD, which allows local nodes to perform more than one batch update on local data and exchanges the updated weights rather than the gradients. The rationale behind this generalization is that in FedSGD, if all local nodes start from the same initialization, averaging the gradients is strictly equivalent to averaging the weights themselves. Further, averaging tuned weights coming from the same initialization does not necessarily hurt the resulting averaged model's performance. [19]
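By contrast with FedSGD, the sketch below shows only the FedAvg server-side aggregation step, assuming each selected client has already performed several local mini-batch updates and reports its updated weights together with its local sample count; the helper name is hypothetical.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: weighted average of the locally updated weight vectors."""
    total = sum(client_sizes)
    return sum((n_k / total) * w_k for w_k, n_k in zip(client_weights, client_sizes))

# Example: two clients return weights after their local epochs; the larger client dominates.
w_clients = [np.array([0.9, 1.1]), np.array([1.3, 0.7])]
global_w = fedavg_aggregate(w_clients, client_sizes=[200, 600])
```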

Federated Learning with Dynamic Regularization (FedDyn)

Federated learning methods suffer when the device datasets are heterogeneously distributed. The fundamental dilemma in the heterogeneously distributed device setting is that minimizing the device loss functions is not the same as minimizing the global loss objective. In 2021, Acar et al. [21] introduced the FedDyn method as a solution to the heterogeneous dataset setting. FedDyn dynamically regularizes each device's loss function so that the modified device losses converge to the actual global loss. Since the local losses are aligned, FedDyn is robust to different heterogeneity levels and can safely perform full minimization on each device. Theoretically, FedDyn converges to the optimum (a stationary point for nonconvex losses) while being agnostic to the heterogeneity levels. These claims are verified with extensive experiments on various datasets. [21]
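Schematically, the kind of dynamically regularized local objective FedDyn solves on each active device $k$ at round $t$ can be written as follows (this is a simplified paraphrase, not the paper's exact statement):

$$ \theta_k^{t} = \arg\min_{\theta} \; L_k(\theta) - \langle \nabla L_k(\theta_k^{t-1}), \theta \rangle + \frac{\alpha}{2} \lVert \theta - \theta^{t-1} \rVert^2, $$

where $L_k$ is device $k$'s empirical loss, $\theta^{t-1}$ is the current server model, and $\alpha > 0$ controls the regularization strength; the linear and proximal terms shift the minimizer of the local objective so that the device models and the server model agree on a stationary point of the global loss.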

Minimizing the number of communication rounds is the gold standard for comparison in federated learning. One may also want to decrease the local computation per device in each round. FedDynOneGD [21] is an extension of FedDyn with lower local compute requirements. FedDynOneGD calculates only one gradient per device in each round and updates the model with a regularized version of the gradient. Hence, the computation complexity is linear in the local dataset size. Moreover, the gradient computation can be parallelized within each device, in contrast to successive SGD steps. Theoretically, FedDynOneGD achieves the same convergence guarantees as FedDyn with less local computation. [21]

Personalized Federated Learning by Pruning (Sub-FedAvg)

Federated learning methods cannot achieve good global performance under non-IID settings, which motivates the participating clients to yield personalized models in federation. Recently, Vahidian et al. [22] introduced Sub-FedAvg, opening a new personalized FL algorithm paradigm by proposing hybrid pruning (structured + unstructured pruning) with averaging on the intersection of clients' drawn subnetworks, which simultaneously handles communication efficiency, resource constraints, and personalized model accuracy. [22]

Sub-FedAvg is the first work to show, through experiments, the existence of personalized winning tickets for clients in federated learning. [22] Moreover, it also proposes two algorithms on how to effectively draw the personalized subnetworks. [22] Sub-FedAvg tries to extend the "lottery ticket hypothesis", originally formulated for centrally trained neural networks, to neural networks trained by federated learning, leading to this open research problem: "Do winning tickets exist for clients' neural networks being trained in federated learning? If yes, how can the personalized subnetworks be effectively drawn for each client?"

Dynamic Aggregation - Inverse Distance Aggregation

IDA (Inverse Distance Aggregation) is a novel adaptive weighting approach for clients based on meta-information, which handles unbalanced and non-IID data. It uses the distance of the model parameters as a strategy to minimize the effect of outliers and improve the model's convergence rate. [23]
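The sketch below gives one plausible reading of such inverse-distance weighting, assuming client contributions are weighted inversely to the distance of their parameters from the plain average; the exact formulation in the paper may differ, and the function name is hypothetical.

```python
import numpy as np

def inverse_distance_aggregate(client_weights, eps=1e-8):
    """Weight each client inversely to its parameter distance from the plain average, then re-aggregate."""
    stacked = np.stack(client_weights)
    mean_w = stacked.mean(axis=0)
    inv_dist = 1.0 / (np.linalg.norm(stacked - mean_w, axis=1) + eps)
    coeffs = inv_dist / inv_dist.sum()          # outliers far from the average get small weight
    return (coeffs[:, None] * stacked).sum(axis=0)

# Example: the third client is an outlier and is down-weighted in the aggregate.
clients = [np.array([1.0, 1.0]), np.array([1.1, 0.9]), np.array([5.0, -4.0])]
print(inverse_distance_aggregate(clients))
```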

Hybrid Federated Dual Coordinate Ascent (HyFDCA)

Very few methods for hybrid federated learning, where clients only hold subsets of both features and samples, exist. Yet, this scenario is very important in practical settings. Hybrid Federated Dual Coordinate Ascent (HyFDCA) [24] is a novel algorithm proposed in 2024 that solves convex problems in the hybrid FL setting. This algorithm extends CoCoA, a primal-dual distributed optimization algorithm introduced by Jaggi et al. (2014) [25] and Smith et al. (2017), [26] to the case where both samples and features are partitioned across clients.

HyFDCA claims several improvements over existing algorithms, including provable convergence in the convex hybrid FL setting and privacy steps that protect client data in both the primal and dual settings.

There is only one other algorithm that focuses on hybrid FL, HyFEM, proposed by Zhang et al. (2020). [29] This algorithm uses a feature matching formulation that balances clients building accurate local models and the server learning an accurate global model. This requires a matching regularizer constant that must be tuned based on user goals, and it results in disparate local and global models. Furthermore, the convergence results provided for HyFEM only prove convergence of the matching formulation, not of the original global problem. This work is substantially different from HyFDCA's approach, which uses data on local clients to build a global model that converges to the same solution as if the model were trained centrally. Furthermore, the local and global models are synchronized and do not require the adjustment of a matching parameter between local and global models. However, HyFEM is suitable for a vast array of architectures including deep learning architectures, whereas HyFDCA is designed for convex problems like logistic regression and support vector machines.

HyFDCA is empirically benchmarked against the aforementioned HyFEM as well as the popular FedAvg on convex problems (specifically classification problems) for several popular datasets (MNIST, Covtype, and News20). The authors found that HyFDCA converges to a lower loss value and higher validation accuracy in less overall time in 33 of the 36 comparisons examined, and in 36 of 36 comparisons with respect to the number of outer iterations. [24] Lastly, HyFDCA only requires tuning of one hyperparameter, the number of inner iterations, as opposed to FedAvg (which requires tuning three) or HyFEM (which requires tuning four). Besides FedAvg and HyFEM being quite difficult to tune, which in turn greatly affects convergence, HyFDCA's single hyperparameter allows for simpler practical implementations and hyperparameter selection methodologies.

Federated ViT using Dynamic Aggregation (FED-REV)

Federated learning (FL) provides training of a global shared model using decentralized data sources on edge nodes while preserving data privacy. However, its performance in computer vision applications using convolutional neural networks (CNNs) lags considerably behind that of centralized training, due to limited communication resources and low processing capability at edge nodes. Pure vision transformer (ViT) models, by contrast, are reported to outperform CNNs by almost four times in computational efficiency and accuracy. FED-REV is an FL model with a reconstructive strategy that illustrates how attention-based structures (pure vision transformers) enhance FL accuracy over large and diverse data distributed over edge nodes. Its reconstruction strategy determines the influence of the dimensions of each stage of the vision transformer and then reduces their dimensional complexity, which lowers the computation cost of edge devices while preserving the accuracy achieved by the pure vision transformer. [30]

Current research topics

Federated learning started to emerge as an important research topic in 2015 [2] and 2016, [31] with the first publications on federated averaging in telecommunication settings. Before that, a thesis titled "A Framework for Multi-source Prefetching Through Adaptive Weight" [32] proposed an approach to aggregating predictions from multiple models trained at three locations of a request-response cycle. Another important aspect of active research is the reduction of the communication burden during the federated learning process. In 2017 and 2018, publications emphasized the development of resource allocation strategies, especially to reduce communication [19] requirements [33] between nodes with gossip algorithms, [34] as well as the characterization of robustness to differential privacy attacks. [35] Other research activities focus on the reduction of bandwidth during training through sparsification and quantization methods, [33] where the machine learning models are sparsified and/or compressed before being shared with other nodes. Developing ultra-light DNN architectures is essential for device/edge learning, and recent work recognises both the energy-efficiency requirements [36] for future federated learning and the need to compress deep learning, especially during learning. [37]

Recent research advancements are starting to consider real-world propagating channels, [38] since previous implementations assumed ideal channels. Another active direction of research is the development of federated learning for training heterogeneous local models with varying computation complexities while producing a single powerful global inference model. [6]

A learning framework named Assisted learning was recently developed to improve each agent's learning capabilities without transmitting private data, models, and even learning objectives. [39] Compared with Federated learning that often requires a central controller to orchestrate the learning and optimization, Assisted learning aims to provide protocols for the agents to optimize and learn among themselves without a global model.

Use cases

Federated learning typically applies when individual actors need to train models on larger datasets than their own, but cannot afford to share the data in itself with others (e.g., for legal, strategic or economic reasons). The technology nevertheless requires good connections between local servers and a minimum of computational power at each node. [3]

Transportation: self-driving cars

Self-driving cars encapsulate many machine learning technologies to function: computer vision for analyzing obstacles and machine learning for adapting their pace to the environment (e.g., bumpiness of the road). Due to the potentially large number of self-driving cars and the need for them to quickly respond to real-world situations, the traditional cloud approach may generate safety risks. Federated learning can represent a solution for limiting the volume of data transfer and accelerating learning processes. [40] [41]

Industry 4.0: smart manufacturing

In Industry 4.0, there is widespread adoption of machine learning techniques [42] to improve the efficiency and effectiveness of industrial processes while guaranteeing a high level of safety. Nevertheless, privacy of sensitive data for industries and manufacturing companies is of paramount importance. Federated learning algorithms can be applied to these problems as they do not disclose any sensitive data. [31] In addition, FL has also been implemented for PM2.5 prediction to support smart-city sensing applications. [43]

Medicine: digital health

Federated learning seeks to address the problem of data governance and privacy by training algorithms collaboratively without exchanging the data itself. Today's standard approach of centralizing data from multiple centers comes at the cost of critical concerns regarding patient privacy and data protection. To solve this problem, the ability to train machine learning models at scale across multiple medical institutions without moving the data is a critical technology. Nature Digital Medicine published the paper "The Future of Digital Health with Federated Learning" [44] in September 2020, in which the authors explore how federated learning may provide a solution for the future of digital health and highlight the challenges and considerations that need to be addressed. Recently, a collaboration of 20 different institutions around the world validated the utility of training AI models using federated learning. In a paper published in Nature Medicine, "Federated learning for predicting clinical outcomes in patients with COVID-19", [45] they showcased the accuracy and generalizability of a federated AI model for the prediction of oxygen needs in patients with COVID-19 infections. Furthermore, in the published paper "A Systematic Review of Federated Learning in the Healthcare Area: From the Perspective of Data Properties and Applications", the authors provide a set of FL challenges from a medical data-centric perspective. [46]

A coalition from industry and academia has developed MedPerf, [47] an open source platform that enables validation of medical AI models in real world data. The platform relies technically on federated evaluation of AI models aiming to alleviate concerns of patient privacy and conceptually on diverse benchmark committees to build the specifications of neutral clinically impactful benchmarks. [48]

Robotics

Robotics includes a wide range of applications of machine learning methods, from perception and decision-making to control. As robotic technologies have been increasingly deployed from simple and repetitive tasks (e.g., repetitive manipulation) to complex and unpredictable tasks (e.g., autonomous navigation), the need for machine learning grows. Federated learning provides a solution to improve over conventional machine learning training methods. In one paper, [49] mobile robots learned navigation over diverse environments using an FL-based method, which helped generalization. In another, [50] federated learning is applied to improve multi-robot navigation under limited-communication-bandwidth scenarios, a current challenge in real-world learning-based robotic tasks. In a third, [51] federated learning is used to learn vision-based navigation, helping better sim-to-real transfer.


References

  1. Kairouz, Peter; McMahan, H. Brendan; Avent, Brendan; Bellet, Aurélien; Bennis, Mehdi; Bhagoji, Arjun Nitin; Bonawitz, Kallista; Charles, Zachary; Cormode, Graham; Cummings, Rachel; D’Oliveira, Rafael G. L.; Eichner, Hubert; Rouayheb, Salim El; Evans, David; Gardner, Josh (2021-06-22). "Advances and Open Problems in Federated Learning". Foundations and Trends in Machine Learning. 14 (1–2): 1–210. arXiv: 1912.04977 . doi:10.1561/2200000083. ISSN   1935-8237.
  2. Konečný, Jakub; McMahan, Brendan; Ramage, Daniel (2015). "Federated Optimization: Distributed Optimization Beyond the Datacenter". arXiv: 1511.03575 [cs.LG].
  3. Kairouz, Peter; Brendan McMahan, H.; Avent, Brendan; Bellet, Aurélien; Bennis, Mehdi; Arjun Nitin Bhagoji; Bonawitz, Keith; Charles, Zachary; Cormode, Graham; Cummings, Rachel; D'Oliveira, Rafael G. L.; Salim El Rouayheb; Evans, David; Gardner, Josh; Garrett, Zachary; Gascón, Adrià; Ghazi, Badih; Gibbons, Phillip B.; Gruteser, Marco; Harchaoui, Zaid; He, Chaoyang; He, Lie; Huo, Zhouyuan; Hutchinson, Ben; Hsu, Justin; Jaggi, Martin; Javidi, Tara; Joshi, Gauri; Khodak, Mikhail; et al. (10 December 2019). "Advances and Open Problems in Federated Learning". arXiv: 1912.04977 [cs.LG].
  4. Pokhrel, Shiva Raj; Choi, Jinho (2020). "Federated Learning with Blockchain for Autonomous Vehicles: Analysis and Design Challenges". IEEE Transactions on Communications. 68 (8): 4734–4746. doi:10.1109/TCOMM.2020.2990686. S2CID   219006840.
  5. Xu, Zirui; Yu, Fuxun; Xiong, Jinjun; Chen, Xiang (December 2021). "Helios: Heterogeneity-Aware Federated Learning with Dynamically Balanced Collaboration". 2021 58th ACM/IEEE Design Automation Conference (DAC). pp. 997–1002. arXiv: 1912.01684 . doi:10.1109/DAC18074.2021.9586241. ISBN   978-1-6654-3274-0. S2CID   243925551.
  6. Diao, Enmao; Ding, Jie; Tarokh, Vahid (2020-10-02). "HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients". arXiv: 2010.01264 [cs.LG].
  7. Yu, Fuxun; Zhang, Weishan; Qin, Zhuwei; Xu, Zirui; Wang, Di; Liu, Chenchen; Tian, Zhi; Chen, Xiang (2021-08-14). "Fed2". Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. KDD '21. New York, NY, USA: Association for Computing Machinery. pp. 2066–2074. arXiv: 2111.14248 . doi:10.1145/3447548.3467309. ISBN   978-1-4503-8332-5. S2CID   240598436.
  8. Decentralized Collaborative Learning of Personalized Models over Networks, Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi, 2017
  9. Savazzi, Stefano; Nicoli, Monica; Rampa, Vittorio (May 2020). "Federated Learning With Cooperating Devices: A Consensus Approach for Massive IoT Networks". IEEE Internet of Things Journal. 7 (5): 4641–4654. arXiv: 1912.13163 . doi:10.1109/JIOT.2020.2964162. S2CID   209515403.
  10. Towards federated learning at scale: system design, Keith Bonawitz, Hubert Eichner, et al., 2019
  11. Gupta, Otkrist; Raskar, Ramesh (14 October 2018). "Distributed learning of deep neural network over multiple agents". arXiv: 1810.06060 [cs.LG].
  12. Vepakomma, Praneeth; Gupta, Otkrist; Swedish, Tristan; Raskar, Ramesh (3 December 2018). "Split learning for health: Distributed deep learning without sharing raw patient data". arXiv: 1812.00564 [cs.LG].
  13. Hsieh, Kevin; Phanishayee, Amar; Mutlu, Onur; Gibbons, Phillip (2020-11-21). "The Non-IID Data Quagmire of Decentralized Machine Learning". International Conference on Machine Learning. PMLR: 4387–4398. arXiv: 1910.00189 .
  14. Collaborative Deep Learning in Fixed Topology Networks, Zhanhong Jiang, Mukesh Yadaw, Chinmay Hegde, Soumik Sarkar, 2017
  15. GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent, Jeff Daily, Abhinav Vishnu, Charles Siegel, Thomas Warfel, Vinay Amatya, 2018
  16. Bagdasaryan, Eugene; Veit, Andreas; Hua, Yiqing (2019-08-06). "How To Backdoor Federated Learning". arXiv: 1807.00459 [cs.CR].
  17. Diao, Enmao; Ding, Jie; Tarokh, Vahid (2021-06-02). SemiFL: Communication Efficient Semi-Supervised Federated Learning with Unlabeled Clients. OCLC   1269554828.
  18. "Apache Wayang - Home". wayang.apache.org.
  19. Communication-Efficient Learning of Deep Networks from Decentralized Data, H. Brendan McMahan et al., 2017
  20. Privacy Preserving Deep Learning, R. Shokri and V. Shmatikov, 2015
  21. Acar, Durmus Alp Emre; Zhao, Yue; Navarro, Ramon Matas; Mattina, Matthew; Whatmough, Paul N.; Saligrama, Venkatesh (2021). "Federated Learning Based on Dynamic Regularization". ICLR. arXiv: 2111.04263.
  22. Vahidian, Saeed; Morafah, Mahdi; Lin, Bill (2021). "Personalized Federated Learning by Structured and Unstructured Pruning under Data Heterogeneity". Icdcs-W. arXiv: 2105.00562.
  23. Yeganeh, Yousef; Farshad, Azade; Navab, Nassir; Albarqouni, Shadi (2020). "Inverse Distance Aggregation for Federated Learning with Non-IID Data". Icdcs-W. arXiv: 2008.07665 .
  24. Overman, T., Blum, G., & Klabjan, D. (2024). A Primal-Dual Algorithm for Hybrid Federated Learning, https://arxiv.org/pdf/2210.08106.pdf
  25. Jaggi, M., Smith, V., Takáč, M., Terhorst, J., Krishnan, S., Hofmann, T., and Jordan, M. I. (2014). Communication-efficient distributed dual coordinate ascent. In Proceedings of the 27th International Conference on Neural Information Processing Systems, volume 2, pages 3068–3076.
  26. Smith, V., Forte, S., Ma, C., Takáč, M., Jordan, M. I., and Jaggi, M. (2017). Cocoa: A general framework for communication-efficient distributed optimization. Journal of Machine Learning Research, 18(1):8590–8638.
  27. McMahan, H. B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. In AISTATS, volume 54, pages 1273–1282
  28. Liu, Y., Zhang, X., Kang, Y., Li, L., Chen, T., Hong, M., and Yang, Q. (2022). Fedbcd: A communication-efficient collaborative learning framework for distributed features. IEEE Transactions on Signal Processing, pages 1–12.
  29. Zhang, X., Yin, W., Hong, M., and Chen, T. (2020). Hybrid federated learning: Algorithms and implementation. In NeurIPS-SpicyFL 2020.
  30. Ismail, Hatem (August 2022). "A FEDERATED PURE VISION TRANSFORMER ALGORITHM FOR COMPUTER VISION USING DYNAMIC AGGREGATION MODEL" (PDF). NeuroQuantology (published Aug 2022).
  31. Federated Optimization: Distributed Machine Learning for On-Device Intelligence, Jakub Konečný, H. Brendan McMahan, Daniel Ramage and Peter Richtárik, 2016
  32. Berhanu, Yoseph. "A Framework for Multi-source Prefetching Through Adaptive Weight".
  33. Konečný, Jakub; McMahan, H. Brendan; Yu, Felix X.; Richtárik, Peter; Suresh, Ananda Theertha; Bacon, Dave (30 October 2017). "Federated Learning: Strategies for Improving Communication Efficiency". arXiv: 1610.05492 [cs.LG].
  34. Gossip training for deep learning, Michael Blot et al., 2017
  35. Differentially Private Federated Learning: A Client Level Perspective, Robin C. Geyer et al., 2018
  36. Du, Zhiyong; Deng, Yansha; Guo, Weisi; Nallanathan, Arumugam; Wu, Qihui (2021). "Green Deep Reinforcement Learning for Radio Resource Management: Architecture, Algorithm Compression, and Challenges". IEEE Vehicular Technology Magazine. 16: 29–39. doi:10.1109/MVT.2020.3015184. hdl: 1826/16378 . S2CID   204401715.
  37. "Random sketch learning for deep neural networks in edge computing". Nature Computational Science. 1. 2021.
  38. Amiri, Mohammad Mohammadi; Gunduz, Deniz (10 February 2020). "Federated Learning over Wireless Fading Channels". arXiv: 1907.09769 [cs.IT].
  39. Xian, Xun; Wang, Xinran; Ding, Jie; Ghanadan, Reza (2020). "Assisted Learning: A Framework for Multi-Organization Learning". Advances in Neural Information Processing Systems. 33. arXiv: 2004.00566 .
  40. Pokhrel, Shiva Raj (2020). "Federated learning meets blockchain at 6G edge: a drone-assisted networking for disaster response": 49–54. doi:10.1145/3414045.3415949. S2CID   222179104.
  41. Elbir, Ahmet M.; Coleri, S. (2 June 2020). "Federated Learning for Vehicular Networks". arXiv: 2006.01412 [eess.SP].
  42. Cioffi, Raffaele; Travaglioni, Marta; Piscitelli, Giuseppina; Petrillo, Antonella; De Felice, Fabio (2019). "Artificial Intelligence and Machine Learning Applications in Smart Production: Progress, Trends, and Directions". Sustainability. 12 (2): 492. doi: 10.3390/su12020492 .
  43. Putra, Karisma Trinanda; Chen, Hsing-Chung; Prayitno; Ogiela, Marek R.; Chou, Chao-Lung; Weng, Chien-Erh; Shae, Zon-Yin (January 2021). "Federated Compressed Learning Edge Computing Framework with Ensuring Data Privacy for PM2.5 Prediction in Smart City Sensing Applications". Sensors. 21 (13): 4586. Bibcode:2021Senso..21.4586P. doi: 10.3390/s21134586 . PMC   8271576 . PMID   34283140.
  44. Rieke, Nicola; Hancox, Jonny; Li, Wenqi; Milletarì, Fausto; Roth, Holger R.; Albarqouni, Shadi; Bakas, Spyridon; Galtier, Mathieu N.; Landman, Bennett A.; Maier-Hein, Klaus; Ourselin, Sébastien; Sheller, Micah; Summers, Ronald M.; Trask, Andrew; Xu, Daguang; Baust, Maximilian; Cardoso, M. Jorge (14 September 2020). "The future of digital health with federated learning". npj Digital Medicine. 3 (1): 119. arXiv: 2003.08119 . doi:10.1038/s41746-020-00323-1. PMC   7490367 . PMID   33015372. S2CID   212747909.
  45. Dayan, Ittai; Roth, Holger R.; Zhong, Aoxiao; et al. (2021). "Federated learning for predicting clinical outcomes in patients with COVID-19". Nature Medicine. 27 (10): 1735–1743. doi:10.1038/s41591-021-01506-3. PMC   9157510 . PMID   34526699. S2CID   237536154.
  46. Prayitno; Shyu, Chi-Ren; Putra, Karisma Trinanda; Chen, Hsing-Chung; Tsai, Yuan-Yu; Hossain, K. S. M. Tozammel; Jiang, Wei; Shae, Zon-Yin (January 2021). "A Systematic Review of Federated Learning in the Healthcare Area: From the Perspective of Data Properties and Applications". Applied Sciences. 11 (23): 11191. doi: 10.3390/app112311191 .
  47. Karargyris, Alexandros; Umeton, Renato; Sheller, Micah J.; et al. (17 July 2023). "Federated benchmarking of medical artificial intelligence with MedPerf". Nature Machine Intelligence. 5 (7). Springer Science and Business Media LLC: 799–810. arXiv: 2110.01406 . doi: 10.1038/s42256-023-00652-2 . ISSN   2522-5839. PMID   38706981.
  48. "Announcing MedPerf Open Benchmarking Platform for Medical AI". MLCommons. 2023-07-17. Retrieved 2023-09-13.
  49. Liu, Boyi; Wang, Lujia; Liu, Ming (2019). "Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems". 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1688–1695. arXiv: 1901.06455 . doi:10.1109/IROS40897.2019.8967908. ISBN   978-1-7281-4004-9. S2CID   210972473.
  50. Na, Seongin; Rouček, Tomáš; Ulrich, Jiří; Pikman, Jan; Krajník, Tomáš; Lennox, Barry; Arvin, Farshad (2023). "Federated Reinforcement Learning for Collective Navigation of Robotic Swarms". IEEE Transactions on Cognitive and Developmental Systems. 15 (4): 1. arXiv: 2202.01141 . doi:10.1109/TCDS.2023.3239815. S2CID   246473085.
  51. Yu, Xianjia; Queralta, Jorge Pena; Westerlund, Tomi (2022). "Towards Lifelong Federated Learning in Autonomous Mobile Robots with Continuous Sim-to-Real Transfer". Procedia Computer Science. 210: 86–93. arXiv: 2205.15496 . doi:10.1016/j.procs.2022.10.123.