Abstract

Distributed multi-agent systems are becoming increasingly crucial for diverse applications in robotics because of their capacity for scalability, efficiency, robustness, resilience, and the ability to accomplish complex tasks. Controlling these large-scale swarms by relying on local information is very challenging. Although centralized methods are generally efficient or optimal, they face the issue of scalability and are often impractical. Given the challenge of finding an efficient decentralized controller that uses only local information to accomplish a global task, we propose a learning-based approach to decentralized control using supervised learning. Our approach entails training controllers to imitate a centralized controller's behavior while using only local information to make decisions. The controller is parameterized by aggregation graph neural networks (GNNs) that integrate information from remote neighbors. The problems of segregation and aggregation of a swarm of heterogeneous agents are explored in 2D and 3D point mass systems as two use cases to illustrate the effectiveness of the proposed framework. The decentralized controller is trained using data from a centralized (expert) controller derived from the concept of artificial differential potential. Our learned models successfully transfer to actual robot dynamics: physics-based Turtlebot3 robot swarms in Gazebo/ROS2 simulations and hardware implementations, and Crazyflie quadrotor swarms in Pybullet simulations. Our experiments show that our controller performs comparably to the centralized controller and demonstrates superior performance compared to a local controller. Additionally, we show that the controller is scalable by analyzing larger teams and diverse groups with up to 100 robots.

1 Introduction

Robots have become prevalent for various applications such as mapping, surveillance, delivery of goods, transportation, and emergency response. As real-world problems become complex, the need for multi-agent systems (MAS) instead of single-agent systems becomes paramount. In many robotics applications, distributed multi-agent systems are becoming crucial due to their potential for enhancing efficiency, fault tolerance, scalability, robustness, and resilience. These applications span diverse domains such as space exploration [1], cooperative localization [2], collaborative robots in production factories [3], search and rescue [4], and traffic control systems [5]. Hence, swarm robotics has been a focus of research for several years. Robot swarms are large numbers of simple robots that collaborate to accomplish a collective objective, drawing inspiration from natural behaviors. Coordination and control of multiple robots has garnered significant interest from researchers and has been explored for various applications, including collective exploration, collective fault detection, collective transport, task allocation, search and rescue, etc. In nature, various systems form groups, where agents unite or separate based on intended purpose or unique characteristics [6]. Control of several agents to achieve a desired pattern or shape is critical for tasks that rely on coordinated action by multi-agent systems such as pattern formation, collective exploration, object clustering, and assembling. Most swarm research works concentrate on robots that are identical or have similar hardware and software frameworks, known as homogeneous systems [7–9]. Nonetheless, many applications of multi-agent systems necessitate the collaboration of diverse teams of agents with distinct characteristics to accomplish a specific task.

A key benefit of employing heterogeneous swarms lies in their capacity to tackle a diverse range of tasks, enabling the assignment of certain functions to a subset of the swarm [10]. In applications like encircling areas with hazardous waste or chemicals, boundary protection, and surveillance, which require the collaboration of multiple robots with diverse sensors or actuation capabilities [11], the robots need to synchronize their movements in a specific pattern. A viable approach would involve the robots organizing themselves into distinct groups for the subtasks. This behavior is known as Segregation. A similar behavior called Aggregation occurs when the different robot types are intermixed homogeneously. In such circumstances, incorporating all the necessary actuation and sensory functionalities within a single robot can be impractical; thus, a heterogeneous team with the appropriate blend of components becomes necessary.

Controlling large-scale swarms can be very challenging. Centralized methods are generally optimal and a good option for small numbers of robots. All the information is received by a central agent responsible for calculating the actions of each individual agent. In other formulations of centralized controllers, actions are computed for different agents locally but require global information. This is a bottleneck, not robust or scalable, and often impractical. With the increasing number of agents, decentralized control becomes essential. Each agent is responsible for determining its own action based on the local information from its neighbors. However, finding an optimal decentralized controller to achieve a global task using local information is a complex problem. In particular, the effect of individual action on global behavior is hard to predict and is often governed by emergent behavior. This makes it very hard to solve the inverse problem of finding local agent actions that can lead to desired global behavior.

This paper focuses on developing solutions for addressing the challenges of decentralized systems—local information, control, and scalability—by exploiting the power of deep learning and graph neural networks. We developed a decentralized controller for segregating and aggregating a heterogeneous robotic swarm based on the robot type. Here, the proposed method focuses on the imitation learning framework and graph neural network parameterization for learning a scalable control policy for holonomic mobile agents moving in 2D and 3D Euclidean space. The imitation learning framework aims to replicate a global controller's actions and behaviors. The policy is learned using the DAgger (Dataset Aggregation) algorithm. Our experiments show that the learned controller, which uses only local information, delivers performance close to the expert controller that uses global information for decision-making and outperforms a classical local controller. The controller was also scaled up to 100 robots and 10 groups, and it was transferred to larger nonholonomic mobile robots and flying robot swarms.

2 Related Works

2.1 Segregation and Aggregation Behaviors.

Reynolds [12] proposed one of the pioneering research works to simulate swarming behaviors, including herding and the flocking of birds. The flocking algorithm incorporates three fundamental driving rules: alignment, guiding agents toward their average heading; separation, to avoid collisions; and cohesion, steering agents toward their average position. Expanding upon this foundation, numerous researchers have delved into the development of both centralized and decentralized control methodologies, including artificial potential [13], hydrodynamic control [14], leader–follower [7], and more [8,9].

Segregation and Aggregation are behaviors seen in several biological systems and are widely studied phenomena. Segregation is a sorting mechanism in various natural processes observed in numerous organisms—cells and animals [15]. Segregation seen in nature includes cell division in embryogenesis [15], strain odor recognition causing aggregation and segregation of cockroaches [16], and tentacle morphogenesis in hydra [17].

Aggregation is an extensively studied behavior found in living organisms such as fish, bacteria, cockroaches, mammals, and arthropods [18,19]. Jeanson et al. [20] developed a model for aggregation in cockroach larvae and reported that cockroaches leave their clusters and join back with the probabilities correlating to the cluster sizes. Garnier et al. [21] demonstrated aggregation behavior of twenty homogeneous Alice robots by modeling this behavior found in German cockroaches. Similarly, [22] analyzed a comparable model showing the combination of locomotion speed and sensing radius required for aggregation using probabilistic aggregation rules. The aggregation problem is quite challenging because the swarms can form clusters instead of being aggregated [23]. Examining methods that display this behavior can assist in designing techniques for distributed systems [24].

It has been previously shown that variations in intercellular adhesiveness result in sorting or aggregation in specific cellular interactions [25,26]. Steinberg [26] developed the Differential Adhesion Hypothesis, which states that differences in the work of cohesion between similar and dissimilar cells can produce cell segregation or cell aggregation. As such, when a cell population encounters more potent cohesive forces from cells of the same type than from dissimilar types, an imbalance emerges, which causes segregation. The reverse of this action causes aggregation.

2.2 Decentralized Control of Robot Swarms Using Graph Neural Networks.

Previous research in segregation and aggregation involves classical methods, including a convex optimization approach [11], evolutionary methods [27], particle swarm optimization [28], probabilistic aggregation algorithms [29,30], differential potential [24,25,31], model predictive control [32], etc. Probabilistic aggregation algorithms encounter the difficulties associated with unstable aggregation, as robots are consistently entering and adapting [33]. In Ref. [34], a genetic algorithm was used to train a neural network for static and dynamic aggregation but faced issues of scalability and unstable aggregation and required large onboard computational resources. In Refs. [25] and [35], a differential potential concept was proposed, though it relies on global information.

However, large-scale systems frequently encounter challenges related to scalability, communication bottlenecks, and robustness. While centralized solutions, where a central agent determines the actions of the entire team, may be viable in small-scale scenarios, the demand for decentralized solutions becomes paramount as the system size grows. In decentralized systems, communication among agents is limited. Each agent must determine its actions using only local information to accomplish a global task. Some works have explored the segregation behavior with varying degrees of decentralization [36–38]. For example, Edson Filho and Pimenta [36] proposed the use of abstractions and artificial potential functions to segregate the groups, but the proposed method was not completely decentralized. Also, the work in Ref. [38] proposed a distributed mechanism that combines flocking behaviors, hierarchical abstractions, and collision avoidance based on a concept called the virtual group velocity obstacle (VGVO). However, the robots start in an already segregated state, so the paper focuses on navigation while maintaining segregation. The main limitations of these previous approaches are that some require global information, analytical solutions, high computational resources, and careful tuning of control parameters. Additionally, most of the work focuses on the segregation problem in 2D spaces. The aggregation problem and segregation in 3D spaces have not been fully explored in the literature. Moreover, while much of the work in the literature has utilized analytical and control-theoretic methods, data-driven and learning-based controls have not been explored for the problems of segregation and aggregation.

As highlighted in the literature, deriving distributed multi-agent controllers has proven to be a complex task [39]. As a result, the challenge of finding these controllers motivates the adoption of a deep-learning approach. We propose a decentralized controller that trains robust behavioral policies in an imitation learning framework. These policies learn to imitate a centralized controller's behavior but use only local information to make decisions. Moreover, dimensional growth issues arise as the number of robots increases. Both challenges are effectively addressed by harnessing the capabilities of aggregation graph neural networks (Aggregation GNNs). Aggregation GNNs are particularly well-suited for distributed systems control due to their inherently local structure [40].

Graph neural networks (GNNs) are neural networks designed to work with data structured in the form of graphs. They act as function approximators that can explicitly model the interaction among the entities within the graph. Graph neural networks function in a fully localized manner, with communication occurring solely between nearby neighbors; thus, they are well suited for developing decentralized controllers for robot swarms. They are invariant to changes in the order or labeling of the agents within the team. This is particularly important for decentralized systems. GNNs can also adapt to systems beyond the ones they were initially trained on, making them scalable to larger or smaller sets of robots. Graph neural networks are promising architectures for parameterization in imitation learning [41,42] and RL algorithms [43,44]. In Ref. [45], aggregation GNNs with multihop communication coupled with imitation learning were proposed for flocking, 1-leader flocking, and flocking of quadrotors in Airsim experiments. In Ref. [46], graph filters, graph CNNs, and graph RNNs were used in imitation learning for flocking and 2D grid path planning. In Ref. [42], the effectiveness of graph CNNs coupled with policy gradient learning was shown in comparison with vanilla graph policy gradient and PPO with a fully connected network in formation flying experiments. In Ref. [47], linear and nonlinear graph CNNs coupled with imitation learning and PPO were proposed for coverage and exploration experiments. Blumenkamp et al. [48] presented results showing the application of GNN policies to five robots in ROS2 for navigation through a passage in real environments. We build on these methods for a different class of swarming behaviors, segregation and aggregation, which is challenging because of the instability that can arise from inaccurate grouping.

3 Main Contributions

As seen from the literature review in Sec. 2.2, most approaches in the literature utilize mathematical equations as decentralized control laws that need to be analytically derived to obtain a global behavior, such as the aggregation and segregation behaviors this paper focuses on. Obtaining such individual control laws for robots to achieve a desired global behavior is a hard inverse problem and requires trials of many potential control laws. The neural network-based techniques proposed in this paper provide a data-driven approach that overcomes the challenge of deriving mathematical control laws. As noted in Sec. 2.2, for the segregation and aggregation behaviors studied in this paper, there is no data-driven, learning-based controller available in the literature.

Hence, in contrast to previous approaches to segregative and aggregative behavior in robot swarms, we present an approach to demonstrate these behaviors using decentralized learning-based control. This approach aims to design local controllers guiding a diverse group of robots to exhibit both Segregation behavior (forming distinct groups) and Aggregation behavior (forming homogeneous mixtures). The approach utilizes graph neural networks to parameterize the controller and trains them using an imitation learning algorithm. The proposed method was first presented in our earlier work [49], where the technique was applied to the segregation and aggregation problem for only 2D point mass swarms with up to 50 robots. This paper extends our prior work by: (i) improving the learned controller by including more training features, namely robot velocity and a distance parameter, (ii) scaling the 2D point mass simulation experiments to 100 robots and 10 groups, (iii) applying the problem in 3D for point mass systems, (iv) extending the application from point mass dynamics to nonholonomic systems, (v) extending the prior work to 3D Crazyflie quadrotor systems in the Pybullet simulation environment, and (vi) implementing the controller on the Turtlebot3 Burger both in simulations (Gazebo/ROS2) and in real-world experiments. To the best of our knowledge, this is the first research that employs GNNs with multihop communication, trained through imitation learning, to address segregation and aggregation tasks for both 2D and 3D holonomic point mass robots and actual robot systems: nonholonomic autonomous ground robots and autonomous aerial robots.

The primary contributions of this paper are:

  • We combine aggregation graph neural networks for time-varying systems with imitation learning to segregate and aggregate heterogeneous robotic swarms in 2D and 3D Euclidean space. By aggregating information from remote neighbors, this work achieves performance comparable to that of the expert (centralized) controller and outperforms a local controller.

  • We illustrated the scalability and generalization of the model by training it on small teams and groups for segregation and testing its performance by progressively increasing the team size and number of groups, reaching up to 100 robots and 10 groups. The proposed model can also generalize to the aggregation problem without further training.

  • A transfer framework that transitioned from point mass systems to real robot systems within physics-based simulations was developed. The decentralized controller was implemented on mobile robots in Gazebo/ROS2 and flying robot swarms in Pybullet.

  • Zero-shot transfer of the learned policies to real-world systems—Turtlebot3 robot swarms showing the efficacy of the policies.

The rest of the paper is structured as follows: Sec. 4 presents the segregation and aggregation problem, along with classical centralized control for point mass systems. Section 5 describes the optimal decentralized control paradigm and the proposed controller using GNN and imitation learning. Section 6 gives details of the actual mobile and flying robot kinematics used for the experiments. Experimental results and discussion are detailed in Sec. 7 for the holonomic point mass systems and Sec. 8 for swarms of mobile (Turtlebot3) and flying (Crazyflie 2) robots, including the hardware experiments with the Turtlebots. Section 9 presents the conclusions and future directions.

4 Problem Formulation

This section describes the equations that govern robot swarms' segregative and aggregative behavior shown in Fig. 1. In addition, we define the expert controller used in generating the data for imitation learning in both 2D Euclidean space and 3D Euclidean space.

Fig. 1: Description of the Segregation and Aggregation controller

4.1 Point Mass Kinematics Model.

We consider a team of N fully actuated holonomic agents V = {1, …, N} navigating within a 2D or 3D Euclidean environment. Each agent is defined by its position r_i(t) ∈ ℝ² or ℝ³, its velocity v_i(t) ∈ ℝ² or ℝ³, and its acceleration u_i(t) ∈ ℝ² or ℝ³ for time steps t = 0, 1, 2, 3, …, where the discrete time-index t denotes the sequential time instances occurring at the sampling time T_s. It is assumed that the acceleration remains constant during the time interval [tT_s, (t+1)T_s]. The system's dynamics is expressed as follows:

(1) r_i(t+1) = r_i(t) + T_s v_i(t) + (T_s²/2) u_i(t)
(2) v_i(t+1) = v_i(t) + T_s u_i(t)

for i ∈ {1, …, N}.
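As a quick illustration, the discrete double-integrator update above can be sketched in a few lines (a minimal sketch, assuming a zero-order hold on the acceleration over each sampling interval; function names are our own):

```python
import numpy as np

def point_mass_step(r, v, u, Ts):
    """Advance positions r and velocities v (N x D arrays) one step under accelerations u."""
    r_next = r + Ts * v + 0.5 * Ts**2 * u
    v_next = v + Ts * u
    return r_next, v_next

# Single 2D agent starting at rest, constant unit acceleration along x.
r, v = np.zeros((1, 2)), np.zeros((1, 2))
r, v = point_mass_step(r, v, np.array([[1.0, 0.0]]), Ts=0.1)
```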

4.2 Decentralized Segregation and Aggregation.

In the segregation and aggregation problem, we assign each robot to a group N_k, k ∈ {1, …, W}, where W is the number of groups. Therefore, the heterogeneous robot swarm is composed of robots within the set of these partitions {N_1, N_2, …, N_W}, N_k ⊆ V. Robots belonging to the identical group are classified as the same type. The neighbors of each robot can be either robots of the same type or of a different type.

For segregation, our objective is to develop a controller capable of sorting diverse types of robots into W distinct groups. This aims to create groups that exclusively consist of agents of the same type. The team is considered segregated when the average distance between agents of the same type is less than that between agents of different types, as defined by Kumar et al. [24]. The controller that solves this problem exhibits the segregative behavior. On the other hand, when the average distance between agents of the same type is greater than that between agents of different types, the team is said to aggregate. This is referred to as the aggregation problem. The aim is to learn a controller that ensures that the swarm forms a homogeneous mixture of robots of different types while flocking together.
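The distance-based criterion of Kumar et al. [24] can be checked numerically; a minimal sketch (function names are our own):

```python
import numpy as np

def is_segregated(positions, types):
    """True iff the mean same-type pairwise distance is below the mean
    different-type pairwise distance (the criterion of Kumar et al. [24])."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    same = (types[:, None] == types[None, :]) & ~np.eye(len(types), dtype=bool)
    inter = types[:, None] != types[None, :]
    return dist[same].mean() < dist[inter].mean()

# Two tight clusters of different types placed far apart -> segregated.
pos = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
labels = np.array([0, 0, 1, 1])
print(is_segregated(pos, labels))  # -> True
```

Reversing the inequality gives the corresponding aggregation check.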

4.3 Classical Centralized Control.

Segregative and aggregative behavior can be achieved using a centralized controller defined in Ref. [31]:

(3) u_i*(t) = − Σ_{j≠i} ∇_{r_i} U_ij(|r_i − r_j|) − Σ_{j≠i} (v_i(t) − v_j(t))

where u_i*(t) is agent i's control input and ∇_{r_i} U_ij(|r_i − r_j|) represents the gradient of the artificial potential function governing the interaction between agents i and j. This gradient is taken with respect to the position vector r_i and is evaluated at the positions r_i(t) and r_j(t) at time t. The second term in Eq. (3) accounts for damping, encouraging robots to synchronize their velocities with one another, as described by Santos et al. [31].

The artificial potential U_ij(|r_i − r_j|): ℝ_{>0} → ℝ, a positive function of the relative distance between a pair of agents [7], is given by

(4) U_ij(|r_ij|) = α ( ln(|r_ij|) + d_ij / |r_ij| )

and its gradient is given by

(5) ∇_{r_i} U_ij(|r_ij|) = α (|r_ij| − d_ij) / |r_ij|³ · r_ij

where α represents the scalar controller gain, d_ij is the segregation or aggregation parameter, and r_ij denotes r_i − r_j. Segregation or aggregation can be achieved based on the local groups N_k.
(6) d_ij = d_AA if i and j belong to the same group, and d_ij = d_AB otherwise

Equation (6) shows that d_AA and d_AB control the interactions between robots of the same and different types, respectively. Hence, the swarm demonstrates segregative behavior when

(7) d_AA < d_AB

The system exhibits aggregative behavior when

(8) d_AA > d_AB
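The expert controller of Eq. (3) can be sketched directly from these ingredients. The sketch below assumes a differential-potential form U_ij = α(ln|r_ij| + d_ij/|r_ij|), which is our assumption of the potential in Refs. [24,31]; the d_AA/d_AB switch and the velocity-damping term follow the text:

```python
import numpy as np

def expert_control(r, v, types, alpha=1.0, d_AA=1.0, d_AB=3.0):
    """Centralized expert: sums potential gradients and velocity damping over ALL pairs.
    d_AA < d_AB gives segregation; d_AA > d_AB gives aggregation."""
    N = len(r)
    u = np.zeros_like(r)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            rij = r[i] - r[j]
            dist = np.linalg.norm(rij)
            d = d_AA if types[i] == types[j] else d_AB
            # assumed gradient of U_ij w.r.t. r_i: alpha * (dist - d) / dist**3 * rij
            u[i] -= alpha * (dist - d) / dist**3 * rij
            u[i] -= v[i] - v[j]  # damping: synchronize velocities
    return u

# Two same-type agents 2 m apart with desired spacing d_AA = 1 attract each other.
r = np.array([[0.0, 0.0], [2.0, 0.0]])
u = expert_control(r, np.zeros_like(r), types=np.array([0, 0]))
```

Note the symmetry of the result: the pairwise terms give equal and opposite accelerations.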

The evaluation metrics are described in the Appendix.

5 Decentralized Control

Equation (3) represents a centralized/global controller that requires access to the positions and velocities of all agents. However, acquiring such global information is often challenging in practical situations. Agents usually have access only to local information. This limitation is primarily attributed to the agents' sensing range. Each agent can only communicate with other agents within its sensing range or communication radius R; that is, |r_ij| ≤ R. The aim of this paper is to design a decentralized controller that relies solely on local information. Figure 2 describes the flow of the decentralized controller.

Fig. 2: Agents' network trained using delayed information from multihop neighbors. The graph neural network takes the local information as input and outputs the action of each agent. The agents act in both simulated and real environments, and the observed behaviors are depicted on the right for varying numbers of robots and groups. (a) Segregation and (b) Aggregation.

5.1 Communication Graph.

Here, we describe the agents' communication network. At time t, agents i and j can establish communication if |r_ij| ≤ R, where R denotes the communication radius of the agents. As a result, we construct a communication network graph G = {V, E(t)}, where V represents the set of agents and E(t) is the set of edges, defined such that (i, j) ∈ E(t) if and only if |r_ij| ≤ R. Consequently, j can transmit data to i at time t, making j a neighbor of i. We denote N_i(t) = {j ∈ V : (j, i) ∈ E(t)} as the set of all agents that agent i can communicate with at time t.
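Constructing this time-varying graph from agent positions is straightforward; a minimal sketch (the binary adjacency matrix built here is also what later serves as the graph shift operator):

```python
import numpy as np

def adjacency(positions, R):
    """Binary adjacency of the disk graph: edge (i, j) iff |r_i - r_j| <= R, no self-loops."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    S = (dist <= R).astype(float)
    np.fill_diagonal(S, 0.0)
    return S

# Three collinear agents; only the first two are within radius R = 2 of each other.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
S = adjacency(pos, R=2.0)
```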

5.2 Local Controller.

A viable approach to achieving decentralized control entails formulating a controller that adheres to local communication and sensing constraints. Following the design of the centralized controller defined in Sec. 4.3, the local controller is described as

(9) u_i(t) = − Σ_{j∈N_i(t)} ∇_{r_i} U_ij(|r_i − r_j|) − Σ_{j∈N_i(t)} (v_i(t) − v_j(t))

The local controller in Eq. (9) involves a summation over only the neighbors of agent i, i.e., all agents j ∈ N_i(t). This is different from the centralized controller, which sums over all the agents in the team. While the centralized and local controllers have identical stationary points, Eq. (9) typically requires more time for segregation, given that the graph remains connected [42]. The following sections introduce a novel learning-based approach to the segregation and aggregation problem. This method relies on an imitation learning algorithm known as Dataset Aggregation (DAgger) and utilizes an aggregation graph neural network to parameterize the agents' policy. This approach imitates the centralized controller in Eq. (3). We will demonstrate that the GNN-based controller performs similarly to the centralized controller and surpasses the performance of the local controller in Eq. (9).
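The local controller differs from the expert only in the index set of the sum. A minimal sketch, again under our assumed differential-potential form (gain α, parameters d_AA, d_AB as in Sec. 4.3):

```python
import numpy as np

def local_control(r, v, types, R, alpha=1.0, d_AA=1.0, d_AB=3.0):
    """Local controller of Eq. (9): same interaction terms as the expert,
    but summed only over neighbors j within the sensing radius R."""
    N = len(r)
    u = np.zeros_like(r)
    for i in range(N):
        for j in range(N):
            rij = r[i] - r[j]
            dist = np.linalg.norm(rij)
            if i == j or dist > R:  # only j in N_i(t) contributes
                continue
            d = d_AA if types[i] == types[j] else d_AB
            u[i] -= alpha * (dist - d) / dist**3 * rij + (v[i] - v[j])
    return u

# Agent 2 sits outside everyone's radius, so it receives no control input.
r = np.array([[0.0, 0.0], [2.0, 0.0], [50.0, 0.0]])
u = local_control(r, np.zeros_like(r), np.array([0, 0, 0]), R=5.0)
```

An isolated agent coasting with zero input is exactly the failure mode that motivates multihop information aggregation in the sections that follow.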

5.3 Delayed Aggregation Graph Neural Network.

In the context of graph theory and signal processing on graphs, a graph signal, denoted by x: V → ℝ, is a function that assigns a scalar value to each node in a graph. Each node is represented by a feature vector x_i(t) ∈ ℝ^F, where i ∈ V = {1, …, N} and N is the number of nodes in the graph. In our application, each agent is a node. Hence, the set of all agent states is denoted by X(t) ∈ ℝ^{N×F}, where each agent is described by an F-dimensional feature vector x_i(t) ∈ ℝ^F, i.e., the rows of X(t).

The communication between agents is defined using a graph shift operator (GSO) matrix, S(t) ∈ ℝ^{N×N}. There are several types of shift operators used in the literature, including the Laplacian and the adjacency matrix. In this paper, we opt for the binary adjacency matrix as the support S(t), which adheres to the sparsity of the graph, i.e., [S(t)]_ij = s_ij(t) is 1 if and only if (j, i) ∈ E(t). The linear operation S(t)X(t) serves as an operator that shifts the information within the graph signal X(t), producing another graph signal. The computation of its (i, f)-th entry is expressed as

(10) [S(t)X(t)]_if = Σ_{j∈N_i(t)} s_ij(t) [X(t)]_jf

Equation (10) indicates that S(t)X(t) functions as a distributed and local operator. This is evident from the fact that each node undergoes updates based solely on local interactions with its neighboring nodes. The reliance on local interactions is a fundamental feature in the development of controllers for decentralized systems, and it is frequently harnessed in graph-based approaches for information processing and control.
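The locality of the shift is easy to see on a small example: row i of the product S X only mixes the feature vectors of i's one-hop neighbors.

```python
import numpy as np

# Path graph 0-1-2 as a binary adjacency matrix (our choice of GSO).
S = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

# N x F graph signal with N = 3 nodes and F = 2 features per node.
X = np.array([[1., 0.],
              [0., 1.],
              [2., 0.]])

# One graph shift: node 1 aggregates the features of its neighbors 0 and 2,
# while the end nodes each receive only node 1's features.
SX = S @ X
```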

Information exchanges between nodes occur at time t, which is the exchange clock. These exchanges introduce a unit time delay, thus creating a delayed information structure [50]:

(11) X_i(t) = { x_i(t), {x_j(t−1)}_{j∈N_i¹(t)}, {x_j(t−2)}_{j∈N_i²(t)}, …, {x_j(t−k)}_{j∈N_i^k(t)} }

where N_i^k(t) is the set of nodes k hops away from node i, defined recursively as N_i^k(t) = {j′ ∈ N_j^{k−1}(t−1) : j ∈ N_i(t)} with N_i¹(t) = N_i(t) and N_i⁰ = {i}. We denote X(t) = {X_i(t)}_{i=1,…,N} as the set of delayed information histories X_i(t) of all nodes. This structure shows that the information available to each node at a given time t is past and delayed information from neighbors that are k hops away.

The challenge in decentralized control learning is to devise a control policy that accommodates the delayed local information structure as outlined in Eq. (11). Consequently, a decentralized controller must effectively handle historical information. It is well-established in the literature that achieving optimal decentralized control is very challenging, even in scenarios involving linear quadratic regulators, which have relatively straightforward centralized solutions [39,50].

In contrast to centralized controllers, the intricacies associated with finding effective decentralized controllers underscore the importance of leveraging learning techniques. This paper hinges on the utilization of graph convolutional neural networks (GCNNs) in conjunction with imitation learning. The choice of GCNNs is justified by their alignment with the local information structure inherent in decentralized control. Imitation Learning is chosen for its relative simplicity in developing decentralized controllers by replicating the behavior observed using a centralized controller.

Formally, we want to find a parameterized policy that maps the decentralized and delayed information history to a local action u = π(X(t), Θ) and define a loss function L(π, π*) to determine the difference between the local action and the centralized policy U*(t) = π*(X(t)). This reduces to an optimization problem to find the tensor of network parameters Θ:

(12) Θ* = argmin_Θ E[ L( π(X(t), Θ), π*(X(t)) ) ]

5.4 Graph Convolutional Neural Network.

Graph convolutional neural networks (GCNNs) are composed of consecutive layers. Each layer in a GCNN applies graph filters and nonlinear activation functions, enabling the network to learn hierarchical representations of graph-structured data. They are distributed in nature, which makes them well-suited for parameterizing the decentralized policy π(X(t), Θ). We define a time-varying aggregation sequence Z(t) ∈ ℝ^{N×KF} using multihop communication and the delayed information structure defined in Eq. (11). This sequence includes the agents' neighborhoods through (K−1) repeated data exchanges with their immediate neighbors [51]:

(13) Z(t) = [ X(t), S(t−1)X(t−1), S(t−1)S(t−2)X(t−2), …, S(t−1)⋯S(t−K+1)X(t−K+1) ]

Each N × F block Z_k(t) in the sequence is the delayed state information aggregated from k-hop neighbors. We represent z_i(t) ∈ ℝ^{FK}, row i of matrix Z(t), as the state at node i, obtained locally through (K−1) exchanges with neighbors. An essential characteristic of the aggregation sequence is its regular temporal structure, which consists of nested aggregation neighborhoods. Subsequently, we can apply a standard convolutional neural network (CNN) with a depth of L to z_i(t), effectively mapping the local information to an action. Thus, each layer l = 1, …, L is shown below:
(14) z_i⁰(t) = z_i(t)
(15) z_i^l(t) = σ^(l)( Θ^(l) z_i^{l−1}(t) ), l = 1, …, L
(16) u_i(t) = z_i^L(t)

where σ^(l) is an activation function and Θ^(l) comprises a set of support filters with learnable parameters. The output of the final layer corresponds to the decentralized control action at node i at time t.
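A minimal sketch of the delayed aggregation sequence of Eq. (13), written in the spirit of Tolstaya et al. [45] (the function and history layout are ours): block k is the graph signal from k steps in the past, shifted through the k most recent supports, so each row can be assembled at its node using only one-hop exchanges.

```python
import numpy as np

def aggregation_sequence(S_hist, X_hist, K):
    """Build Z(t) = [X(t), S(t-1)X(t-1), S(t-1)S(t-2)X(t-2), ...].
    S_hist[k], X_hist[k]: support and signal k steps in the past (k = 0 newest)."""
    N, F = X_hist[0].shape
    blocks = [X_hist[0]]
    shifted = np.eye(N)
    for k in range(1, K):
        shifted = shifted @ S_hist[k - 1]   # accumulate S(t-1) ... S(t-k)
        blocks.append(shifted @ X_hist[k])  # k-hop delayed aggregation
    return np.concatenate(blocks, axis=1)   # N x (K*F); row i feeds node i's CNN

# Two agents connected to each other; K = 2 hops of delayed history.
S = np.array([[0., 1.], [1., 0.]])
X_now = np.array([[1., 0.], [0., 1.]])
X_prev = np.array([[2., 0.], [0., 2.]])
Z = aggregation_sequence([S, S], [X_now, X_prev], K=2)
```

Row i of Z is exactly the vector z_i(t) to which the per-node CNN of Eqs. (14)-(16) is applied.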

5.5 Imitation Learning Framework.

Training the neural network for decentralized control requires determining the appropriate values for Θ. In imitation learning, the training process involves acquiring a training set, T = (X(t), U*(t)), representing sample trajectories using the centralized controller. Here, X(t) denotes the time-series observations of the agents, and U*(t) represents the set of actions generated for the agents using the centralized controller. Using supervised learning, the objective is to minimize the loss function over the training set, where u(t) gathers the outputs u_i(t) = z_i^L(t) from Eq. (16) at each node i:

(17) Θ* = argmin_Θ Σ_{(X(t),U*(t))∈T} L( u(t), U*(t) )

It is important to emphasize that Θ is uniform across all nodes; it is not node- or time-dependent. As a result, the learned policy is independent of the size and structure of the network, which facilitates modularity, scalability to any number of agents, and transfer learning.
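The DAgger training loop can be summarized schematically. The toy below is a deliberately simplified sketch (a scalar linear policy stands in for the GNN, and the expert is a made-up linear map): roll out the current learner, have the expert relabel the visited states, aggregate the dataset, and refit the shared parameters Θ.

```python
import numpy as np

rng = np.random.default_rng(0)
expert = lambda z: -2.0 * z                  # stand-in for the expert policy pi*
Theta = np.zeros((1, 1))                     # shared parameters, same at every node
states, actions = [], []

for iteration in range(20):
    z = rng.normal(size=(16, 1))             # states visited under the current learner
    states.append(z)
    actions.append(expert(z))                # expert relabels every visited state
    Z_data = np.vstack(states)
    U_data = np.vstack(actions)
    # refit Theta on the aggregated dataset (least-squares regression here)
    Theta, *_ = np.linalg.lstsq(Z_data, U_data, rcond=None)
```

Because the expert here is linear, the learner recovers it exactly; the structure of the loop (rollout, relabel, aggregate, refit) is what carries over to the GNN setting.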

5.6 State Representation.

To train the network, we need to define the input state vector. Both the centralized and local controllers exhibit nonlinear characteristics in the agents' states. Also, the aggregation graph neural networks (AGNNs) do not inherently allow nonlinear operations before aggregation [42] so only the positions and velocities of the agents are not sufficient to describe the system. Therefore, to represent nonlinearity in the AGNN controller, we extracted important features from the states of the agents that can be used in aggregation. These features are computed locally and depend on the relative distance and velocity between agents. The local controller similarly computes these features to ensure a fair comparison. The input to the GNN is the same for both tasks and is defined thus
(18)
We added the relative distance to a goal position (goal position − rg), provided the robot can see the goal, i.e., the goal is within its communication/sensing radius R. This is needed to stabilize training and ensure the robots segregate or aggregate around a particular location.
(19)
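The feature construction can be sketched as follows (the exact terms of Eqs. (18) and (19) are not reproduced here; the specific nonlinearities below are illustrative placeholders computed from relative positions and velocities):

```python
import numpy as np

def local_features(p_i, v_i, neighbors, goal, R):
    """Illustrative local feature vector for agent i.

    neighbors: list of (p_j, v_j) pairs within the communication radius R.
    The goal term is included only when the goal is "visible", i.e., within R.
    """
    feats = np.zeros(6)
    for p_j, v_j in neighbors:
        r = p_i - p_j
        d = np.linalg.norm(r)
        feats[0:2] += r / d**2        # distance-dependent nonlinear term
        feats[2:4] += v_i - v_j       # relative velocity term
    g = goal - p_i
    if np.linalg.norm(g) <= R:        # relative distance to goal (cf. Eq. (19))
        feats[4:6] = g
    return feats
```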

6 Actual Robot Kinematics

The holonomic ideal point mass model aids in testing different scenarios and parameters for the task and provides a benchmark we can build on. However, we are interested in achieving the tasks in real-world robotic systems under the constraints of delayed observations and slower control rates. In this section, we design a framework to transfer the GNN-based policies trained on the point mass model, without further training, to physics-based nonholonomic mobile robots (the Turtlebot3 burger platform in ROS2, both in simulation and in real experiments) and to a swarm of quadrotors (the Crazyflie 2 model) in the physics-based Pybullet simulator.

6.1 Decentralized Control for Nonholonomic Robot Swarms.

The point mass system is holonomic. Nonetheless, the GNN controller can be used for a nonholonomic system such as the 2D differential drive model. We follow the feedback linearization approach designed for expressing double integrator dynamics in differential drive robots described in Ref. [52].

6.1.1 Kinematic Modeling and Feedback Linearization Approach.

We consider a team of N differential drive mobile agents navigating within a 2D Euclidean space. For brevity, we omit the robot and group indices. Each agent's dynamics are defined thus
(20)
where x, y, θ, and ω are the x-position, y-position, heading, and heading rate of the robot, respectively, and v is the linear (forward) speed of the center of the robot. The point mass actuation inputs are accelerations; therefore, we differentiate Eq. (20), resulting in
(21)

where a=v˙ is the linear acceleration of the center of the robot.

Given a point at a distance da from the center of the robot, described by the following equation (see Fig. 3),
(22)
and differentiating Eq. (22), we have
(23)
and
(24)
where Υ = ω̇ is the angular acceleration. Substituting Eq. (21) into Eq. (24), we have, in matrix form
(25)
where
(26)
Fig. 3
Schematic diagram of a differential drive robot showing the parameters in the feedback linearization
Finally
(27)
where uv is the control input for the differential drive model; it is obtained from the point mass acceleration with the defined parameter da
(28)

We then integrate Eq. (27) to get the linear velocity, v and angular velocity, ω to pass into the differential drive model in Eq. (20).
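A compact sketch of this feedback linearization (consistent with the standard derivation for the off-axis point at distance da; the variable names are ours):

```python
import numpy as np

def feedback_linearize(u, theta, v, omega, d_a):
    """Map a point-mass acceleration u = (ux, uy) to the linear acceleration a
    and angular acceleration Upsilon of the differential-drive robot, using
    the off-axis point at distance d_a from the robot center."""
    c, s = np.cos(theta), np.sin(theta)
    a = c * u[0] + s * u[1] + d_a * omega**2
    upsilon = (-s * u[0] + c * u[1] - v * omega) / d_a
    return a, upsilon

def integrate_cmd(v, omega, a, upsilon, dt):
    """Euler-integrate the accelerations to the (v, omega) inputs of Eq. (20)."""
    return v + a * dt, omega + upsilon * dt
```

For a robot at rest heading along x, a purely forward point-mass acceleration maps to a pure linear acceleration, as expected.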

6.2 Decentralized Control for Quadrotors.

In this section, we present the transfer of the point mass-trained GNN to a swarm of quadrotors in a Pybullet simulation. Figure 4 shows the framework for transferring the trained GNN to control a swarm of quadrotors. The positions and velocities of the swarm are passed from the Pybullet environment into the point mass gym environment, where the local features are calculated and sent to the GNN controller to compute the actions and predict the next state. The current and next states of the quadrotor swarm are then passed into the gym's PID controller, which drives the swarm to the desired state. The current state is then passed back into the point mass environment, and this loop continues until the task is achieved. The dynamics and PID control equations are described in Refs. [53–55].
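The loop in Fig. 4 can be sketched as below; `get_swarm_state`, `gnn_next_state`, and `pid_track` are hypothetical stand-ins for the Pybullet state reader, the GNN-based point mass predictor, and the PID tracker, respectively:

```python
import numpy as np

def transfer_loop(get_swarm_state, gnn_next_state, pid_track, steps, dt=0.02):
    """Drive the quadrotor swarm by tracking point-mass GNN predictions."""
    pos, vel = get_swarm_state()                         # read state from simulator
    for _ in range(steps):
        nxt_pos, nxt_vel = gnn_next_state(pos, vel, dt)  # point-mass prediction
        pid_track((pos, vel), (nxt_pos, nxt_vel))        # PID drives quads there
        pos, vel = get_swarm_state()                     # read back the true state
    return pos, vel
```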

Fig. 4
Flowchart describing the overview of the transfer framework for the quadrotor swarms

7 Point Mass Results and Discussion

We ran a series of experiments to study the performance and scalability of our approach. The experimental results are presented and evaluated using the intersection area of convex hulls metric M(r,N) in Eq. (A1), the number of clusters formed for segregation, and average distances for same and different groups for aggregation (see Appendix for details). We illustrate the scalability of the controller by increasing the swarm size and discuss the performance comparison between traditional controllers (centralized and local) and the GNN controller.

7.1 Experiments.

For the 2D and 3D segregation tasks, the GNN controller was trained on 21 robots and 3 groups. We tested the learned controller on the configurations {(10, 2); (20, 5); (21, 7); (30, 5); (50, 5); (100, 5)} in 2D and {(20, 5); (21, 7); (30, 5)} in 3D (written as {(Robots, Groups)}), over 40 experiments with random initial locations. Without further training, we transferred the segregative GNN controller to a different swarming behavior—aggregation. All the velocities were set to zero at the initial state, and positions were uniformly distributed independently of the robot's group. For training, we set dAA and dAB to 3 and 5, respectively. The communication radius R and the number of exchanges K were set to 6 and 3 for the state vector. However, we ran the test scenarios using dAA = 5, dAB = 10, R = 12, and K = 3. For all the experiments, the goal was randomized within [−1, 1] and the maximum acceleration was set to 1. We compared the performance of all the controllers—centralized (Sec. 4.3), local (Sec. 5.2), and learned (Sec. 5.3)—from the same initial configuration. We collected 400 trajectories, each of 500 steps, using the centralized controller with α = 3 (Sec. 4.3) for training. The GNN network is structured as a fully connected feed-forward neural network with a single hidden layer comprising 64 neurons and a Tanh activation function. Implemented within the PyTorch framework, the network was trained using an Adam optimizer, a mean squared error (MSE) cost function, and a learning rate of 5 × 10−5. To address challenges in behavior cloning for imitation, we implemented the Dataset Aggregation (DAgger) algorithm [56]. The algorithm selects the GNN policy with probability 1 − β and the expert policy with probability β, where β decays by a factor of 0.993 after each trajectory down to a minimum of 0.5.
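The DAgger expert-mixing schedule described above can be sketched as follows (a minimal illustration; the function names are ours):

```python
import random

def dagger_action(expert_act, policy_act, beta):
    """Follow the expert with probability beta, else the learned GNN policy."""
    return expert_act if random.random() < beta else policy_act

def decay_beta(beta, factor=0.993, floor=0.5):
    """Decay beta once per trajectory, never dropping below the 0.5 floor."""
    return max(floor, beta * factor)
```

Starting from β = 1, the mixture shifts gradually toward the learned policy while the expert keeps correcting the states the policy visits.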

7.2 Discussion.

Successful Imitation We studied the effect of the nature of information sharing in the swarms of robots. In Fig. 5, a specific instance of the temporal progression of the segregation task is depicted, using three controllers for 100 robots distributed across 5 groups in 2D. Additionally, Fig. 6 shows the same for 30 robots and 5 groups in 3D. In both scenarios, it is evident that the local controller struggles to segregate the robots. This challenge is particularly notable when the robots begin in a configuration where not all are interconnected, but each has at least one connection. In such cases, distinct clusters form, each containing robots of the same type. This outcome is expected, considering that the robots are not attracted to their respective groups due to a lack of awareness of their existence. In contrast, the learned controller demonstrates the capability to effectively segregate the robots into groups of varying sizes. These results support the rationale behind utilizing a graph neural network framework, as it enables the dissemination of information throughout the network, ultimately facilitating the successful completion of the task.

Fig. 5
Images of the 2D Segregation trajectory at different time intervals. The initial state is shown on the left, while the final state is shown on the right. Distinct colors represent each type of robot. (a) 100 robots and 5 groups–centralized controller, (b) 100 robots and 5 groups–local controller, and (c) 100 robots and 5 groups–GNN controller. (Color version online)
Fig. 6
Images of the 3D Segregation trajectory at different time intervals. The initial state is shown on the left, while the final state is shown on the right. Distinct colors represent each type of robot. (a) 30 robots and 5 groups–centralized controller, (b) 30 robots and 5 groups–local controller, and (c) 30 robots and 5 groups–GNN Controller. (Color version online)

Comparison with Classical Approaches We compared the GNN controller with the traditional controllers. Figures 7–10 show the mean and confidence interval of the segregation task metrics for all controllers over 40 trials for the 2D and 3D experiments. As time progresses in all scenarios with the learned controller, the mean and standard deviation of the intersection area approach zero, while the number of clusters converges to the actual number of groups. The learned controller is able to fully segregate the system. Likewise, experiments with fewer robot groups exhibit faster convergence than those with more groups. The local controller, however, becomes trapped in a local minimum and is unable to achieve segregation.

Fig. 7
Two-dimensional space segregation (comparison of the GNN and local controllers)—mean intersection area of convex hulls and number of clusters for 40 experiments

We show the comparative results between the centralized controller in Fig. 8 and our controller in Fig. 10; in all the test cases, both exhibit comparable performance, indicating that our controller is efficient. It can also be seen from the experiments that segregation is easier and faster in the 3D case than in its 2D counterpart, because the robots have more degrees-of-freedom to maneuver in new directions. The GNN utilizes local information and still achieves performance comparable to the centralized controller, which relies on global information. This highlights its capability for learning and generalization.

Fig. 8
Two-dimensional space segregation (comparison of the GNN and centralized controllers)—mean intersection area of convex hulls and number of clusters for 40 experiments: (a) varying number of robots and groups and (b) fixed 50 robots and varying groups
Fig. 9
Three-dimensional space segregation (comparison of the GNN and local controllers)—mean intersection area of convex hulls and number of clusters for 40 experiments
Fig. 10
Three-dimensional space segregation (comparison of the GNN and centralized controllers)—mean intersection area of convex hulls and number of clusters for 40 experiments

Scalability Study We examined a scenario where the number of robots was fixed at 50 and varied the number of groups. Figure 8 shows that as the number of groups increases, the agents take longer to segregate. More specifically, with 10 groups it took more than 3500 steps to segregate, while with 2 and 5 groups it took fewer. All the experiments show that the policies, though trained on 21 robots, can scale up to 100 robots in 2D and handle group counts not seen during training. We observe the same behavior in the 3D case.

Generalization to Aggregation Task Another interesting aspect of our GNN controller is that it can generate other behaviors besides segregation. We extended the GNN trained for segregation to the aggregation task by changing the parameters to dAA = 10 and dAB = 5. Without any further training, our segregation controller was able to aggregate the swarm with just this change of parameters, which is consistent with the theory of intercellular adhesivity. A particular instance of the time-series progression of the aggregative behavior under the centralized and GNN controllers is shown for 100 robots and 5 groups in 2D and for 30 robots and 5 groups in 3D in Figs. 11 and 12, respectively.

Fig. 11
Images of the 2D Aggregation trajectory at different time intervals. The initial state is shown on the left, while the final state is shown on the right. Distinct colors represent each type of robot. (a) 100 robots and 5 groups–centralized controller and (b) 100 robots and 5 groups–GNN controller. (Color version online)
Fig. 12
Images of the 3D Aggregation trajectory at different time intervals. The initial state is shown on the left, while the final state is shown on the right. Distinct colors represent each type of robot. (a) 30 robots and 5 groups–centralized controller and (b) 30 robots and 5 groups–GNN controller. (Color version online)

Figure 13 shows the plot of the average distances between agents of the same type and agents of different types. Although the trajectory evolution does not look the same for the centralized and GNN controllers, the average distances plot shows that the swarm is aggregated. For example, in the case of 100 robots and 5 groups in 2D, the average distances ravgAA and ravgAB (see Appendix A2 for the definition of these quantities) were found to be 4.29 and 4.13, respectively. For the 3D case with 30 robots and 5 groups, the average distances ravgAA and ravgAB were found to be 5.81 and 5.11, respectively. Both cases clearly show that the swarm was aggregated based on the condition in Sec. A2. The difference in the final trajectory shape depends on the communication radius.

Fig. 13
Average Distances Between Robots of the same type (ravgAA) and robots of different types (ravgAB) with diverse number of robots and groups in 2D and 3D space for aggregation behavior. 40 trials were performed.

Effect of Communication Radius We analyze the effect of the communication radius, R, on the segregation and aggregation behavior by varying the sensing radius of each robot. We chose the following values: R = {4, 6, 8, 10, 12} m. We fixed the swarm at 20 robots and 5 groups and kept the other parameters constant with dAA = 5 and dAB = 10. From Fig. 14, even in the case of limited communication, the GNN learns efficient control strategies that enable the swarm to segregate down to a communication radius of R = 8 m. This shows the GNN controller's ability to propagate information through the network even under limited communication. However, as the communication radius reduces to R = 6 m or below, the network struggles to fully segregate (indicated by a larger number of clusters than groups and a larger mean intersection area). This is because the system is expected to segregate at a separation distance dAA = 5 m between agents of the same type. Hence, if the communication radius is 6 m or less, there is a good chance of having few, or even no, agents within the communication radius. As a result, the figure shows that the system started from about 10 clusters and converged to only about 7 clusters for R = 6 m. Indeed, the ability to segregate completely depends on parameters such as the number of hops, R, dAA, and dAB.

Fig. 14
Segregation in 2D Space (effect of communication radius on the GNN controller)—mean intersection area of convex hulls and number of clusters for 40 experiments

As seen in Table 1, for the aggregation behavior we started from a segregated state with ravgAA = 2.17, which is lower than ravgAB = 9.44. The goal for aggregation is to ensure that ravgAA becomes greater than ravgAB. Even with limited communication, our GNN controller completes the task, demonstrating its performance under restricted sensing. It may be noted that the system achieves aggregation successfully even with low communication radii (in contrast to the segregation behavior, which fails at radii of 6 m or below). This is because the robots first move closer to each other, thereby reducing their separation distances and increasing their connectivity. It is also noted that at a communication radius of R = 8 m or higher, the system converges to the same average distances.

Table 1

Effect of the communication radius on the 2D GNN controller—comparison of the initial and final average distance of agents of the same and different types for the aggregation behavior for 40 experiments

Effect of the communication radius on the 2D GNN aggregation controller

Initial values: ravgAA = 2.17, ravgAB = 9.44

Communication radius, R (m)   ravgAA (final)   ravgAB (final)
4                             4.69             4.05
6                             4.67             3.93
8                             4.33             3.63
10                            4.33             3.63
12                            4.33             3.63

8 Actual Robot Kinematics Results

This section presents the simulation and hardware experiments for the Turtlebot3 and Crazyflie2 robots. Table 2 lists the parameters used for the experiments.

Table 2

Simulation parameters

Parameters   Gazebo environment   Pybullet environment
dAA          5.0                  3.0
dAB          10.0                 5.0
da           0.1                  —
R            12.0                 6.0

8.1 Gazebo Simulations With ROS2.

To evaluate the feasibility of transferring the point mass GNN to a physics-based differential drive robot, we tested the trained policy with Turtlebot3 burger robots—10, 20, and 50 robots with 2, 5, and 5 groups, respectively. As seen in Fig. 15, we designed an OpenAI gym environment—gym_gazeboswarm—which carries out all the communication with Gazebo using ROS2 and interfaces with the GNN policy to control the swarm. We created a multi-Turtlebot3 Gazebo environment using ROS2, in which each robot has its own node. The gym environment receives the position of each robot; the local features are calculated and sent to the GNN to obtain acceleration commands, which are then converted into the actions v, ω of each robot using the method described in Sec. 6.1.1. These actions are published on each robot's cmd_vel topic to drive the swarm.

Fig. 15
Flowchart describing the overview of the interface communications for simulated experiments using Turtlebot3 burger robot swarm in Gazebo environment

8.2 Pybullet Physics-Based Simulation Experiments.

We used an OpenAI Gym environment based on PyBullet introduced by Panerati et al. [57] to evaluate the performance of our GNN controller in a realistic 3D environment. The environment is parallelizable and can be run with a GUI or in headless mode, with or without a GPU. We chose this simulator because it supports realistic aerodynamic effects such as drag, ground effect, and downwash, along with a suite of control algorithms. As a result, it gives us a testbed close to the real-world system for testing our algorithm. The dynamics of the quadrotors are modeled on the Crazyflie 2 quadrotor.

We aim to use the GNN controller to control the quadrotor swarms in the simulator. The trained GNN is robust to any value of dAA and dAB. Hence, we adapt these parameters to suit the Pybullet simulator. We initialized the swarm with varying yaw values, and roll and pitch were set to 0. We consider two types of experiments—fixed height simulation using the trained 2D point mass GNN and varying height simulation using the trained 3D point mass GNN.

8.3 Fixed Height Results.

The fixed height simulations come from the fact that the 2D point mass model has no z-component. Hence, we set z to 1 m and ran the 2D GNN to predict the next states in the x- and y-directions.

8.4 Varying Height Results.

Here, we applied the 3D-trained GNN to the Crazyflie simulation. Using the predicted next state from the point mass 3D gym environment for the segregation and aggregation tasks, we achieved the same results as in the point mass model case for a swarm of Crazyflie quadrotors.

8.5 Real Robots.

We demonstrated a zero-shot transfer of the policies learned in the Gazebo simulation to real Turtlebot3 burger robots—8 robots and 2 groups; 9 robots and 3 groups—with dAA = 2 and dAB = 4. We used a Qualisys motion capture system with 18 cameras that provides position updates at 100 Hz. With a multi-agent extended Kalman filter implemented in ROS to reduce noise, we obtained the position and velocity of each agent at 100 Hz. These updates are then used in the GNN controller environment to calculate the control for each robot, as seen in Fig. 16.

Fig. 16
Flowchart describing the hardware experiments data communication

8.6 Discussion.

We reported the mean intersection area and number of clusters metrics for the Turtlebot3 and Crazyflie 2 (fixed and varying heights) experiments.

• Successful Imitation and Transfer: Figures 17, 18, and 19 show particular instances of the Turtlebot3 and Crazyflie 2 time-series evolution of the swarm trajectories for the segregation and aggregation tasks. For the Turtlebot3 burger robots, we also report the mean intersection area, number of clusters, and average distances between agents in the same and different groups in Fig. 20. For the Crazyflie robots, we report the mean intersection area and number-of-clusters metrics for the fixed and varying height segregation experiments in Figs. 21 and 22, respectively.

Fig. 17
Snapshots of the Turtlebot3 robots trajectory in the Gazebo simulator using the GNN controller. Top—50 robots and 5 groups (segregation behavior). Bottom—20 robots and 5 groups (aggregation behavior). (a) Initial configuration, (b) robots moving to form a cluster (top) and robots moving to form an aggregate (bottom), and (c) final state.
Fig. 18
Snapshots of 50 Crazyflie quadrotors and five groups trajectory using the 2D GNN controller in Pybullet (fixed height). Top—segregation behavior. Bottom—aggregation behavior. (a) Initial configuration, (b) robots moving to form a cluster (top) and robots moving to form an aggregate (bottom), and (c) final state.
Fig. 19
Snapshots of the Crazyflie quadrotors trajectory using the 3D GNN controller in Pybullet (varying height) for the segregation and aggregation behavior. Top—30 robots and 5 groups (segregation behavior). Bottom—20 robots and 5 groups (aggregation behavior). (a) Initial configuration, (b) robots moving to form a cluster (top) and robots moving to form an aggregate (bottom), and (c) final state.
Fig. 20
(a) and (b) Mean Intersection area and number of clusters for different numbers of Turtlebot3 robots and groups for segregation behavior, (c) average distances between Turtlebot3 robots of the same type (ravgAA) and robots of different types (ravgAB) for aggregation behavior
Fig. 21
(a) and (b) Mean intersection area and number of clusters with varying number of Crazyflie quadrotors and groups for aggregation behavior. (c) Average distances between robots of the same type (ravgAA) and robots of different types (ravgAB) with varying number of Crazyflie quadrotors and groups for aggregation behavior—fixed height.
Fig. 22
(a) and (b) Mean intersection area and number of clusters with varying number of Crazyflie quadrotors and groups for segregation behavior. (c) Average distances between robots of the same type (ravgAA) and robots of different types (ravgAB) with varying number of Crazyflie quadrotors and groups for aggregation behavior—varying height.

The results show that our GNN controller can successfully transfer to physics-based quadrotor swarms with different numbers of robots and groups, communication radii, dAA, and dAB.

• Zero-Shot Transfer to Hardware: Figures 23 and 24 show the initial and final configurations of the robots' trajectories and the metrics for the segregation and aggregation tasks. Even in the presence of noise and uneven terrain, the robots were still able to perform the tasks with the GNN controller in a decentralized fashion. This shows the efficacy of our controller in transferring successfully to real-world applications.

Fig. 23
Snapshots of the real Turtlebot3 robot trajectory. (1) Segregation: (1a) initial configuration and (1b) final configuration. (2) Aggregation: (2a) initial configuration and (2b) final configuration.
Fig. 24
(a) and (b) Mean intersection area and number of clusters for segregation behavior. (c) Average distances between Turtlebot3 robots of the same type (ravgAA) and robots of different types (ravgAB) for aggregation behavior.

9 Conclusions

Controlling large-scale dynamical systems in distributed settings poses a significant challenge in the search for optimal and efficient decentralized controllers. This paper uses learned heuristics to address this challenge for agents in 2D and 3D Euclidean space. Our approach involves the design of these controllers parameterized by Aggregation Graph Neural Networks, incorporating information from remote neighbors within an imitation learning framework. These controllers learn to imitate the behavior of an efficient artificial differential potential-based centralized controller, utilizing only local information to make decisions.

Our results demonstrate that large-scale point mass systems, mobile robots, and quadrotors can perform segregation tasks from initial configurations where the swarm is not fully connected, under varied limited communication radii and separation distances. Our policies, trained with 21 robots using a point mass model, generalize to larger swarms of up to 100 robots and to the aggregation task without further training. Through varied experiments, we illustrated the controller's capability to be deployed in larger swarms.

Furthermore, we compared our controller with the centralized controller and with a local controller that only utilized information from its immediate neighbors. With the local controller, the system did not converge to a segregated state; instead, multiple clusters of robots of the same type persisted. Our controller resolved this issue, demonstrating superior efficacy over the local controller and comparable performance to the centralized controller. This affirms the significance of multihop information in enhancing overall performance. The GNN-based controller is therefore more suitable for distributed systems than the centralized controller, given its scalability, which is vital in practical scenarios where only local information is accessible.

In addition, we showed that the GNN-based policies trained for the holonomic point mass model can be transferred to physics-based robot swarms with nonholonomic constraints in 2D and to quadrotors in 3D. We presented results demonstrating successful swarm coordination and control in simulation (Gazebo/ROS2 for Turtlebot3 robots and Pybullet for Crazyflie quadrotors) and demonstrated the zero-shot transfer of the GNN policies to real Turtlebot3 robots. Potential future work includes implementation on the Crazyflie hardware platform, environments with static and dynamic obstacles [58], exploring other methods such as deep reinforcement learning, and extending the approach to other swarm behaviors.

Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.

Appendix: Evaluation Metrics

A1 Segregation Tasks.

To evaluate segregation of the swarm, we employed the metrics proposed in Ref. [31]: the pairwise intersection area of the convex hulls of the groups' positions, M(r, N), and the number-of-clusters metric. Segregation occurs when M(r, N) approaches zero, signifying the absence of overlap among clusters.
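The number-of-clusters metric can be computed in several ways; below is a minimal sketch, assuming a cluster is taken to be a connected component of the proximity graph at a chosen radius. The function name `count_clusters` and the brute-force O(N²) neighbor search are illustrative choices, not the implementation used in Ref. [31].

```python
import numpy as np

def count_clusters(positions, radius):
    """Count connected components of the proximity graph:
    robots closer than `radius` belong to the same cluster."""
    n = len(positions)
    seen = [False] * n
    clusters = 0
    for start in range(n):
        if seen[start]:
            continue
        clusters += 1                  # new component discovered
        stack = [start]
        seen[start] = True
        while stack:                   # depth-first flood fill
            i = stack.pop()
            for j in range(n):
                if not seen[j] and np.linalg.norm(positions[i] - positions[j]) <= radius:
                    seen[j] = True
                    stack.append(j)
    return clusters

# Two nearby robots form one cluster; a distant robot forms another.
pts = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 0.0]])
print(count_clusters(pts, radius=1.0))  # → 2
```

In a segregation run, this count would be evaluated per robot type; the swarm is segregated when each type forms exactly one cluster.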

The pairwise intersection area metric is defined as

(A1)  M(r, N) = Σ_{u=1}^{g−1} Σ_{v=u+1}^{g} A(CH(Q_u) ∩ CH(Q_v))

where Q_u is the set of positions of the robots in group u, g is the number of groups, and CH(Q) and A(Q) represent the convex hull and the area of the set Q, respectively.
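This metric can be evaluated by clipping each pair of convex hulls against one another and summing the resulting areas. A minimal sketch using Sutherland–Hodgman polygon clipping follows; the helper names are ours, and in practice each hull would first be computed from the group's positions (e.g., with scipy.spatial.ConvexHull) before being passed in as a counter-clockwise vertex list.

```python
import numpy as np

def polygon_area(poly):
    """Shoelace formula; `poly` is a list of (x, y) vertices in order."""
    if len(poly) < 3:
        return 0.0
    x = np.array([p[0] for p in poly])
    y = np.array([p[1] for p in poly])
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def clip(subject, clipper):
    """Sutherland-Hodgman: intersect `subject` with convex `clipper`
    (both counter-clockwise vertex lists)."""
    out = list(subject)
    for k in range(len(clipper)):
        if not out:
            break
        a, b = clipper[k], clipper[(k + 1) % len(clipper)]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]
            p_in = (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]) >= 0
            q_in = (b[0]-a[0])*(q[1]-a[1]) - (b[1]-a[1])*(q[0]-a[0]) >= 0
            if q_in != p_in:
                # segment pq crosses clip edge ab: add intersection point
                n1 = a[0]*b[1] - a[1]*b[0]
                n2 = p[0]*q[1] - p[1]*q[0]
                den = (a[0]-b[0])*(p[1]-q[1]) - (a[1]-b[1])*(p[0]-q[0])
                out.append(((n1*(p[0]-q[0]) - n2*(a[0]-b[0])) / den,
                            (n1*(p[1]-q[1]) - n2*(a[1]-b[1])) / den))
            if q_in:
                out.append(q)
    return out

def pairwise_intersection_area(hulls):
    """Sum of A(CH(Q_u) ∩ CH(Q_v)) over all group pairs u < v."""
    total = 0.0
    for u in range(len(hulls)):
        for v in range(u + 1, len(hulls)):
            total += polygon_area(clip(hulls[u], hulls[v]))
    return total

# Two unit squares offset by 0.5 overlap in a 0.5 x 0.5 square.
sq1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
sq2 = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]
print(pairwise_intersection_area([sq1, sq2]))  # → 0.25
```

As the groups segregate, the clipped polygons shrink to empty sets and the metric falls to zero.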

A2 Aggregation Tasks.

We employed the average distance between agents of the same type (r_avg^AA) and the average distance between agents of different types (r_avg^AB) to evaluate aggregation. The system is said to aggregate when r_avg^AA is greater than r_avg^AB; that is, the intragroup (same-type) distance exceeds the intergroup (different-type) distance.

Let N_A denote the number of unique pairs of robots belonging to the same group and N_B the number of unique pairs of robots belonging to different groups.

We define r_avg^AA as

(A2)  r_avg^AA = (1/N_A) Σ_{i<j, τ_i = τ_j} ||r_i − r_j||

and r_avg^AB as

(A3)  r_avg^AB = (1/N_B) Σ_{i<j, τ_i ≠ τ_j} ||r_i − r_j||

where r_i denotes the position of robot i and τ_i its type.
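These two averages can be computed directly from the robot positions and type labels. Below is a minimal sketch; the function name `aggregation_metrics` and the example configuration are illustrative, not taken from the paper.

```python
import numpy as np

def aggregation_metrics(positions, types):
    """Average pairwise distance within same-type pairs (r_avg^AA)
    and between different-type pairs (r_avg^AB)."""
    same, diff = [], []
    n = len(types)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            (same if types[i] == types[j] else diff).append(d)
    return np.mean(same), np.mean(diff)

# Alternating types along a line: unlike types sit closer than like types,
# so r_avg^AA > r_avg^AB, i.e., the aggregation condition holds.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
typ = ["A", "B", "A", "B"]
r_aa, r_ab = aggregation_metrics(pos, typ)
print(r_aa, r_ab)  # → 10.0 5.5
```

In a segregated configuration the inequality reverses, with same-type agents clustered together and r_avg^AA < r_avg^AB.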

References

1. Yliniemi, L., Agogino, A. K., and Tumer, K., 2014, "Multirobot Coordination for Space Exploration," AI Mag., 35(4), pp. 61–74. 10.1609/aimag.v35i4.2556
2. Omotuyi, O., Pokhrel, S., and Sharma, R., 2021, "Distributed Quadrotor UAV Tracking Using a Team of Unmanned Ground Vehicles," AIAA Paper No. 2021-0266. 10.2514/6.2021-0266
3. Ronzoni, M., Accorsi, R., Botti, L., and Manzini, R., 2021, "A Support-Design Framework for Cooperative Robots Systems in Labor-Intensive Manufacturing Processes," J. Manuf. Syst., 61, pp. 646–657. 10.1016/j.jmsy.2021.10.008
4. Queralta, J. P., Taipalmaa, J., Pullinen, B. C., Sarker, V. K., Gia, T. N., Tenhunen, H., Gabbouj, M., Raitoharju, J., and Westerlund, T., 2020, "Collaborative Multi-Robot Search and Rescue: Planning, Coordination, Perception, and Active Vision," IEEE Access, 8, pp. 191617–191643. 10.1109/ACCESS.2020.3030190
5. Chen, B., and Cheng, H. H., 2010, "A Review of the Applications of Agent Technology in Traffic and Transportation Systems," IEEE Trans. Intell. Transp. Syst., 11(2), pp. 485–497. 10.1109/TITS.2010.2048313
6. Ferreira Filho, E. B., and Pimenta, L. C., 2020, "Segregation of Heterogeneous Swarms of Robots in Curves," IEEE International Conference on Robotics and Automation (ICRA), Paris, France, May 31–Aug. 31, pp. 7173–7179. 10.1109/ICRA40945.2020.9196851
7. Leonard, N. E., and Fiorelli, E., 2001, "Virtual Leaders, Artificial Potentials and Coordinated Control of Groups," Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No. 01CH37228), Orlando, FL, Dec. 4–7, pp. 2968–2973. 10.1109/CDC.2001.980728
8. Balch, T., and Arkin, R. C., 1998, "Behavior-Based Formation Control for Multirobot Teams," IEEE Trans. Robot. Autom., 14(6), pp. 926–939. 10.1109/70.736776
9. Belta, C., and Kumar, V., 2004, "Abstraction and Control for Groups of Robots," IEEE Trans. Robot., 20(5), pp. 865–875. 10.1109/TRO.2004.829498
10. Li, L., Martinoli, A., and Abu-Mostafa, Y., 2003, "Diversity and Specialization in Collaborative Swarm Systems," Proceedings of the Second International Workshop on Mathematics and Algorithms of Social Insects, Atlanta, GA, Dec., pp. 91–98. https://home.work.caltech.edu/pub/Li2003measure.pdf
11. Edwards, V., Rezeck, P., Chaimowicz, L., and Hsieh, M. A., 2016, "Segregation of Heterogeneous Robotics Swarms Via Convex Optimization," ASME Paper No. DSCC2016-9653. 10.1115/DSCC2016-9653
12. Reynolds, C. W., 1987, "Flocks, Herds and Schools: A Distributed Behavioral Model," ACM SIGGRAPH Computer Graphics, 21(4), pp. 25–34. 10.1145/37402.37406
13. Olfati-Saber, R., 2006, "Flocking for Multi-Agent Dynamic Systems: Algorithms and Theory," IEEE Trans. Autom. Control, 51(3), pp. 401–420. 10.1109/TAC.2005.864190
14. Pimenta, L. C., Pereira, G. A., Michael, N., Mesquita, R. C., Bosque, M. M., Chaimowicz, L., and Kumar, V., 2013, "Swarm Coordination Based on Smoothed Particle Hydrodynamics Technique," IEEE Trans. Rob., 29(2), pp. 383–399. 10.1109/TRO.2012.2234294
15. Batlle, E., and Wilkinson, D. G., 2012, "Molecular Mechanisms of Cell Segregation and Boundary Formation in Development and Tumorigenesis," Cold Spring Harb. Perspect. Biol., 4(1), p. a008227. 10.1101/cshperspect.a008227
16. Ame, J.-M., Rivault, C., and Deneubourg, J.-L., 2004, "Cockroach Aggregation Based on Strain Odour Recognition," Anim. Behav., 68(4), pp. 793–801. 10.1016/j.anbehav.2004.01.009
17. Lesh-Laurie, G. E., 1974, "Tentacle Morphogenesis in Hydra: A Morphological and Biochemical Analysis of the Effect of Actinomycin D," Am. Zool., 14(2), pp. 591–602. 10.1093/icb/14.2.591
18. Camazine, S., Deneubourg, J.-L., Franks, N. R., Sneyd, J., Theraula, G., and Bonabeau, E., 2020, Self-Organization in Biological Systems, Princeton University Press, Princeton, NJ.
19. Parrish, J. K., and Edelstein-Keshet, L., 1999, "Complexity, Pattern, and Evolutionary Trade-Offs in Animal Aggregation," Science, 284(5411), pp. 99–101. 10.1126/science.284.5411.99
20. Jeanson, R., Rivault, C., Deneubourg, J.-L., Blanco, S., Fournier, R., Jost, C., and Theraulaz, G., 2005, "Self-Organized Aggregation in Cockroaches," Anim. Behav., 69(1), pp. 169–180. 10.1016/j.anbehav.2004.02.009
21. Garnier, S., Jost, C., Gautrais, J., Asadpour, M., Caprari, G., Jeanson, R., Grimal, A., and Theraulaz, G., 2008, "The Embodiment of Cockroach Aggregation Behavior in a Group of Micro-Robots," Artif. Life, 14(4), pp. 387–408. 10.1162/artl.2008.14.4.14400
22. Correll, N., and Martinoli, A., 2011, "Modeling and Designing Self-Organized Aggregation in a Swarm of Miniature Robots," Int. J. Rob. Res., 30(5), pp. 615–626. 10.1177/0278364911403017
23. Gauci, M., Chen, J., Dodd, T. J., and Groß, R., 2014, "Evolving Aggregation Behaviors in Multi-Robot Systems With Binary Sensors," Distributed Autonomous Robotic Systems: The 11th International Symposium, Baltimore, MD, Nov., pp. 355–367. 10.1007/978-3-642-55146-8_25
24. Kumar, M., Garg, D. P., and Kumar, V., 2010, "Segregation of Heterogeneous Units in a Swarm of Robotic Agents," IEEE Trans. Autom. Control, 55(3), pp. 743–748. 10.1109/TAC.2010.2040494
25. Kumar, M., and Garg, D. P., 2011, "Aggregation of Heterogeneous Units in a Swarm of Robotic Agents," Fourth International Symposium on Resilient Control Systems, Boise, ID, Aug. 9–11, pp. 107–112. 10.1109/ISRCS.2011.6016099
26. Steinberg, M. S., 1963, "Reconstruction of Tissues by Dissociated Cells: Some Morphogenetic Tissue Movements and the Sorting Out of Embryonic Cells May Have a Common Explanation," Science, 141(3579), pp. 401–408. 10.1126/science.141.3579.401
27. Gomes, J., Urbano, P., and Christensen, A. L., 2013, "Evolution of Swarm Robotics Systems With Novelty Search," Swarm Intell., 7(2–3), pp. 115–144. 10.1007/s11721-013-0081-z
28. Inácio, F. R., Macharet, D. G., and Chaimowicz, L., 2019, "PSO-Based Strategy for the Segregation of Heterogeneous Robotic Swarms," J. Comput. Sci., 31, pp. 86–94. 10.1016/j.jocs.2018.12.008
29. Kernbach, S., Thenius, R., Kernbach, O., and Schmickl, T., 2009, "Re-Embodiment of Honeybee Aggregation Behavior in an Artificial Micro-Robotic System," Adaptive Behav., 17(3), pp. 237–259. 10.1177/1059712309104966
30. Bayindir, L., 2012, "A Probabilistic Geometric Model of Self-Organized Aggregation in Swarm Robotic Systems," Ph.D. thesis, Middle East Technical University, Ankara, Turkey. https://open.metu.edu.tr/handle/11511/22295
31. Santos, V. G., Pimenta, L. C., and Chaimowicz, L., 2014, "Segregation of Multiple Heterogeneous Units in a Robotic Swarm," IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, May 31–June 7, pp. 1112–1117. 10.1109/ICRA.2014.6906993
32. Gupta, S., Chaudhary, S., Maurya, D., Joshi, S. K., Tripathy, N. S., and Shah, S. V., 2022, "Segregation of Multiple Robots Using Model Predictive Control With Asynchronous Path Smoothing," IEEE Conference on Control Technology and Applications (CCTA), Trieste, Italy, Aug. 23–25, pp. 1378–1383. 10.1109/CCTA49430.2022.9966011
33. Shlyakhov, N., Vatamaniuk, I., and Ronzhin, A., 2017, "Survey of Methods and Algorithms of Robot Swarm Aggregation," J. Phys.: Conf. Ser., 803, p. 012146. 10.1088/1742-6596/803/1/012146
34. Trianni, V., Groß, R., Labella, T. H., Şahin, E., and Dorigo, M., 2003, "Evolving Aggregation Behaviors in a Swarm of Robots," Advances in Artificial Life: Seventh European Conference, ECAL, Dortmund, Germany, Sept. 14–17, pp. 865–874.
35. Santos, V. G., Pires, A. G., Alitappeh, R. J., Rezeck, P. A., Pimenta, L. C., Macharet, D. G., and Chaimowicz, L., 2020, "Spatial Segregative Behaviors in Robotic Swarms Using Differential Potentials," Swarm Intell., 14(4), pp. 259–284. 10.1007/s11721-020-00184-0
36. Edson Filho, B., and Pimenta, L. C., 2015, "Segregating Multiple Groups of Heterogeneous Units in Robot Swarms Using Abstractions," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, Sept. 28–Oct. 2, pp. 401–406. 10.1109/IROS.2015.7353404
37. Inácio, F. R., Macharet, D. G., and Chaimowicz, L., 2018, "United We Move: Decentralized Segregated Robotic Swarm Navigation," Distributed Autonomous Robotic Systems: The 13th International Symposium, London, UK, Nov. 7–9, pp. 313–326. 10.1007/978-3-319-73008-0_22
38. Santos, V. G., and Chaimowicz, L., 2014, "Cohesion and Segregation in Swarm Navigation," Robotica, 32(2), pp. 209–223. 10.1017/S0263574714000563
39. Witsenhausen, H. S., 1968, "A Counterexample in Stochastic Optimum Control," SIAM J. Control, 6(1), pp. 131–147. 10.1137/0306011
40. Gama, F., Marques, A. G., Leus, G., and Ribeiro, A., 2019, "Convolutional Neural Network Architectures for Signals Supported on Graphs," IEEE Trans. Signal Process., 67(4), pp. 1034–1049. 10.1109/TSP.2018.2887403
41. Li, Q., Gama, F., Ribeiro, A., and Prorok, A., 2020, "Graph Neural Networks for Decentralized Multi-Robot Path Planning," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, Oct. 24–Jan. 24, pp. 11785–11792. 10.1109/IROS45743.2020.9341668
42. Tolstaya, E., Gama, F., Paulos, J., Pappas, G., Kumar, V., and Ribeiro, A., 2020, "Learning Decentralized Controllers for Robot Swarms With Graph Neural Networks," Conference on Robot Learning, Osaka, Japan, Nov. 16–18, pp. 671–682. https://www.researchgate.net/publication/332010803_Learning_Decentralized_Controllers_for_Robot_Swarms_with_Graph_Neural_Networks
43. Khan, A., Tolstaya, E., Ribeiro, A., and Kumar, V., 2020, "Graph Policy Gradients for Large Scale Robot Control," Conference on Robot Learning, Osaka, Japan, Nov. 16–18, pp. 823–834. https://proceedings.mlr.press/v100/khan20a/khan20a.pdf
44. Khan, A., Kumar, V., and Ribeiro, A., 2021, "Large Scale Distributed Collaborative Unlabeled Motion Planning With Graph Policy Gradients," IEEE Rob. Autom. Lett., 6(3), pp. 5340–5347. 10.1109/LRA.2021.3074885
45. Tolstaya, E., Gama, F., Paulos, J., Pappas, G., Kumar, V., and Ribeiro, A., 2020, "Learning Decentralized Controllers for Robot Swarms With Graph Neural Networks," Proceedings of the Conference on Robot Learning, PMLR, L. P. Kaelbling, D. Kragic, and K. Sugiura, eds., Vol. 100, pp. 671–682.
46. Gama, F., Li, Q., Tolstaya, E., Prorok, A., and Ribeiro, A., 2020, "Decentralized Control With Graph Neural Networks," arXiv preprint arXiv:2012.14906.
47. Tolstaya, E., Paulos, J., Kumar, V., and Ribeiro, A., 2020, "Multi-Robot Coverage and Exploration Using Spatial Graph Neural Networks," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, Sept. 27–Oct. 1, pp. 8944–8950. 10.1109/IROS51168.2021.9636675
48. Blumenkamp, J., Morad, S., Gielis, J., Li, Q., and Prorok, A., 2022, "A Framework for Real-World Multi-Robot Systems Running Decentralized GNN-Based Policies," International Conference on Robotics and Automation (ICRA), Philadelphia, PA, May 23–27, pp. 8772–8778. 10.1109/ICRA46639.2022.9811744
49. Omotuyi, O., and Kumar, M., 2022, "Learning Decentralized Controllers for Segregation of Heterogeneous Robot Swarms With Graph Neural Networks," International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), Toronto, ON, Canada, July 25–29, pp. 1–6. 10.1109/MARSS55884.2022.9870482
50. Gama, F., Li, Q., Tolstaya, E., Prorok, A., and Ribeiro, A. R., 2022, "Synthesizing Decentralized Controllers With Graph Neural Networks and Imitation Learning," IEEE Trans. Signal Process., 70, pp. 1932–1946. 10.1109/TSP.2022.3166401
51. Gama, F., Marques, A. G., Ribeiro, A., and Leus, G., 2019, "Aggregation Graph Neural Networks," ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, May 12–17, pp. 4943–4947. 10.1109/ICASSP.2019.8682975
52. Ferreira-Filho, E. B., and Pimenta, L. C., 2019, "Abstraction Based Approach for Segregation in Heterogeneous Robotic Swarms," Rob. Autom. Syst., 122, p. 103295. 10.1016/j.robot.2019.103295
53. Omotuyi, O., and Kumar, M., 2021, "UAV Visual-Inertial Dynamics (VI-D) Odometry Using Unscented Kalman Filter," IFAC-PapersOnLine, 54(20), pp. 814–819. 10.1016/j.ifacol.2021.11.272
54. Omotuyi, O., 2021, "Dynamics-Enabled Localization of UAVs Using Unscented Kalman Filter," Master's thesis, University of Cincinnati, Cincinnati, OH. https://www.proquest.com/openview/883bfbd7cc64165d3f206d9665049379/1?pqorigsite=gscholar&cbl=18750&diss=y
55. Michael, N., Mellinger, D., Lindsey, Q., and Kumar, V., 2010, "The GRASP Multiple Micro-UAV Testbed," IEEE Rob. Autom. Mag., 17(3), pp. 56–65. 10.1109/MRA.2010.937855
56. Ross, S., Gordon, G., and Bagnell, D., 2011, "A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning," Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, Apr. 11–13, pp. 627–635. https://proceedings.mlr.press/v15/ross11a.html
57. Panerati, J., Zheng, H., Zhou, S., Xu, J., Prorok, A., and Schoellig, A. P., 2021, "Learning to Fly: A Gym Environment With PyBullet Physics for Reinforcement Learning of Multi-Agent Quadcopter Control," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, Sept. 27–Oct. 1, pp. 7512–7519. 10.1109/IROS51168.2021.9635857
58. Ayomoh, M. K., Omotuyi, O. A., Roux, A., and Olufayo, O. A., 2018, "Robot Navigation Model in a Multi-Target Domain Amidst Static and Dynamic Obstacles," Proceedings of the IASTED International Conference Intelligent Systems and Control (ISC 2018), Calgary, AB, Canada, July 16–17, pp. 44–51. 10.2316/P.2018.858-015