💻 SysAdmin Network =========================== .. contents:: Table of Contents :depth: 3 The Multi-agent SysAdmin problem is a widely used benchmark in the study of decision-making over networked systems. Originally introduced by Guestrin et al. in 2002 as a single-agent factored MDP benchmark :cite:authorpar:`guestrin2001max`, it was later extended into a multi-agent formulation to evaluate coordinated reinforcement learning algorithms :cite:authorpar:`guestrin2002coordinated`. Over the years, it has remained a standard reference in the field, with recent works reaffirming its relevance :cite:authorpar:`bargiacchi2021cooperative`, :cite:authorpar:`bianchi2024developing`. This environment provides a modern and open-source implementation of the multi-agent version of the SysAdmin problem, specifically designed for multi-agent reinforcement learning (MARL). It maintains the original intent of testing structure-aware planning and coordination under uncertainty, while ensuring compatibility with modern MARL libraries. Environment Description ----------------------- The environment models a network of computers performing tasks. Each computer (agent) can be in one of several health states and may also be processing a task. Over time, machines may become *faulty*, slowing down task completion, or *dead*, making the task impossible to complete. Faults can spread probabilistically to neighboring computers. At each timestep, agents may choose to *reboot* their machine, which resets it to a working state (*good*) with high probability, but also discards progress on any active task. This decentralized setting models a **Partially Observable Markov Decision Process** (PoMDP), where coordination between agents is critical for maintaining system-wide performance and limiting fault propagation. .. figure:: ../assets/sysadmin-illustration.png :align: center :width: 100% :alt: Illustration for one state update in the SysAdmin Network problem. Illustration for one state update in the SysAdmin Network problem. Graph Topology ~~~~~~~~~~~~~~ As in the Binary Consensus environment, the underlying graph defines the network topology. It determines which agents (computers) are neighbors and hence how faults can spread between them. State Space ----------- Each agent's state is defined by two categorical variables: - **Health status:** one of *good*, *faulty*, or *dead* - **Task status:** one of *idle*, *loaded*, or *successful* Formally, the global joint state at time :math:`t` is an element of: .. math:: S(t) \in \{\text{good}, \text{faulty}, \text{dead}\}^N \times \{\text{idle}, \text{loaded}, \text{successful}\}^N This results in a total state space size of :math:`9^N`. Action Space ------------ At each timestep, every agent selects one of two discrete actions: - **Do nothing** (continue current operation) - **Reboot** the machine (resets health to *good* with high probability but loses current task progress) Objective --------- The overall goal is to **maximize the number of successfully completed tasks** over time. This objective can be framed as either a finite-horizon or infinite-horizon cumulative reward problem. Performance is closely tied to how effectively agents collaborate and leverage the graph structure to mitigate cascading faults. By modeling local and global trade-offs in a structured environment, the Multi-agent SysAdmin problem serves as an effective testbed for evaluating decentralized and semi-centralized MARL strategies under uncertainty and partial observability. Environment ------------------------------ .. automodule:: cognac.env.SysAdmin.env :members: :show-inheritance: :undoc-members: :private-members: Rewards ---------------------------------- .. automodule:: cognac.env.SysAdmin.rewards :members: :show-inheritance: :undoc-members: :private-members: