Multi-Agent Systems

Published on February 9, 2025 by Remy

1. Multi-Agent Systems Overview

  • Definition: What is an agent?
  • Concept clarification: Differences and connections between workflow and (multi-)agent systems

2. Design Patterns for Multi-Agent Systems

2.1 Fundamentals: From Workflow to Agents

  • How to distinguish and choose between workflow vs agent architecture
  • From simple to complex: Progressive construction methodology
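The workflow-vs-agent distinction above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `fake_llm` and the `TOOLS` registry are deterministic stand-ins for a real LLM client and tool set.

```python
# Contrast sketch: a workflow follows a fixed, hard-coded chain of steps,
# while an agent lets the model choose its next action at runtime.
# `fake_llm` and TOOLS are illustrative stand-ins, not a real framework.

TOOLS = {"calc": lambda expr: str(eval(expr))}  # toy calculator tool

def fake_llm(prompt: str) -> str:
    """Stand-in 'model': requests the calc tool once, then finishes."""
    if "Observation" in prompt:
        return "FINISH: " + prompt.rsplit("Observation: ", 1)[-1]
    return "calc: 2 + 2"

def workflow(task: str) -> str:
    """Workflow: the code, not the model, fixes the order of steps."""
    expr = "2 + 2"                 # step 1: extract expression (hard-coded here)
    return TOOLS["calc"](expr)     # step 2: always call the calculator

def agent(task: str, max_steps: int = 5) -> str:
    """Agent: the model decides which tool to call and when to stop."""
    history = [task]
    for _ in range(max_steps):
        action = fake_llm("\n".join(history))
        if action.startswith("FINISH:"):
            return action.removeprefix("FINISH:").strip()
        tool, arg = action.split(":", 1)
        history.append(f"Observation: {TOOLS[tool.strip()](arg.strip())}")
    return history[-1]

print(workflow("What is 2 + 2?"))  # '4' via the fixed chain
print(agent("What is 2 + 2?"))     # '4' via model-chosen tool calls
```

The progressive-construction advice follows the same shape: start with the fixed chain, and only hand control of the loop to the model when the task genuinely requires dynamic decisions.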

2.2 Key Components of Multi-Agent Systems

  • Agents: Systems where LLMs can dynamically guide their own processes and tool usage, capable of autonomous control over task completion methods.

  • Environment: The external world in which agents operate, which agents can perceive and act upon. The environment can be a software world or physical spaces like factories, roads, power grids, etc.

  • Interactions: Agents communicate through standardized communication languages. Depending on system requirements, interactions between agents include various forms such as cooperation, coordination, negotiation, etc.

  • Organization: Agents can be organized through hierarchical control or self-organize based on emergent behavior.
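The perceive–act relationship between agents and their environment can be sketched as follows. The class and method names are hypothetical, and the "environment" is reduced to a shared counter so the cooperative interaction is visible in a few lines.

```python
# Toy sketch of the agent/environment/interaction components above.
# All names here are illustrative, not from a particular framework.

class Environment:
    """A toy software environment: shared state agents can read and change."""
    def __init__(self):
        self.state = 0

    def perceive(self) -> int:
        return self.state

    def act(self, delta: int) -> None:
        self.state += delta

class Agent:
    def __init__(self, name: str, target: int):
        self.name, self.target = name, target

    def step(self, env: Environment) -> None:
        # Perceive the environment, then act to move it toward this agent's goal.
        observed = env.perceive()
        if observed < self.target:
            env.act(+1)

env = Environment()
agents = [Agent("a", target=3), Agent("b", target=3)]
while env.perceive() < 3:               # simple cooperative interaction
    for agent in agents:
        agent.step(env)
print(env.perceive())  # shared state converges on the common target
```

Organization enters the picture in how the loop is scheduled: a hierarchical controller would decide which agent steps next, while self-organization lets the order emerge from the agents' own behavior.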

3. Key Modules

3.1 Basic Components

  • Multi-Agent System Structures:

    • Equi-Level Structure: Studies agent systems operating at the same level, such as DMAS (Chen et al., 2023).

    • Hierarchical Structure: Explores hierarchical structures with leader and follower roles, such as Stackelberg games (Von Stackelberg, 2010; Conitzer & Sandholm, 2006; Harris et al., 2023).

    • Nested Structure: Studies nested or hybrid structures, such as (Chan et al., 2023).

    • Dynamic Structure: Discusses dynamic structures of multi-agent systems, such as (Talebirad & Nadiri, 2023).

  • Planning:

    • Global Planning: Involves understanding overall tasks and decomposing tasks into subtasks, as well as coordinating workflows between agents.

    • Single-Agent Task Decomposition: Within individual agents, task decomposition involves breaking down large tasks into a series of manageable small tasks.

  • Memory/Context Management:

    • Short-term Memory: Immediate, temporary memory used during conversations or interactions.

    • Long-term Memory: Memory that stores historical conversations and responses.

    • External Data Storage: Such as RAG (Lewis et al., 2020), used for supplementary information sources.

    • Episodic Memory: Memory involving a series of interactions in multi-agent systems.

    • Consensus Memory: In multi-agent systems, serves as a unified source of shared information.
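The memory layers listed above can be sketched with two small classes. The layout is illustrative only: a bounded short-term window for the prompt, an unbounded long-term archive, and a shared consensus store that all agents read from one place.

```python
# Minimal sketch of the memory/context-management layers above.
# The class design is hypothetical, not from a particular framework.
from collections import deque

class AgentMemory:
    def __init__(self, short_term_size: int = 4):
        self.short_term = deque(maxlen=short_term_size)  # recent turns only
        self.long_term: list[str] = []                   # full history

    def remember(self, message: str) -> None:
        self.short_term.append(message)
        self.long_term.append(message)

    def context(self) -> list[str]:
        """What goes into the prompt: just the recent window."""
        return list(self.short_term)

class ConsensusMemory:
    """Shared, system-wide facts that every agent reads from one place."""
    def __init__(self):
        self.facts: dict[str, str] = {}

    def publish(self, key: str, value: str) -> None:
        self.facts[key] = value

mem = AgentMemory(short_term_size=2)
for turn in ["t1", "t2", "t3"]:
    mem.remember(turn)
print(mem.context())       # only the last 2 turns survive the window
print(len(mem.long_term))  # all 3 turns are archived
```

External storage such as RAG slots in where `context()` is assembled: retrieved documents are appended to the prompt alongside the short-term window rather than kept in either store.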

[Reference: LLM Multi-Agent Systems: Challenges and Open Problems (Han et al., 2024)]

3.2 Framework Selection

  • Key considerations for framework selection: when to use, when to avoid
  • Introduction to existing general frameworks: LangGraph, AutoGen, Swarm
  • Selection recommendations

3.3 Interaction Design for Multi-Agent Applications

  • Interaction design principles:

    • Task-oriented: Multi-agent applications should customize the optimal interaction interface for each task, conforming to the workflow design rather than forcing everything into a chat window.
    • Detail abstraction: Users don't need to understand the internal implementation details of agents, only their external behavior and interaction methods.
    • Simple interaction: Interactions between users and agents should be as simple as possible, avoiding complex operational procedures.
    • Adequate feedback: Agent behavior and decision-making processes should produce clear feedback.
    • Controllability: Users should be able to intervene in and adjust agent behavior.
  • Typical product case analysis:

    • Multi-agent applications (code development): Cline, Devin, Cursor Agent
    • Multi-agent applications (writing): Deep Researcher
    • Multi-agent building platforms: Coze, LangGraph Studio
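The controllability and adequate-feedback principles above amount to an approval checkpoint before the agent acts. The callback-based design below is a sketch under that assumption, not any product's API.

```python
# Sketch of a human-in-the-loop checkpoint: surface the agent's intent
# (adequate feedback) and let the user approve, edit, or reject it
# (controllability). All names here are illustrative.
from typing import Callable

def run_with_approval(proposed_action: str,
                      execute: Callable[[str], str],
                      ask_user: Callable[[str], str]) -> str:
    decision = ask_user(f"Agent wants to: {proposed_action}. [y/n/edit]")
    if decision == "n":
        return "cancelled by user"
    if decision.startswith("edit:"):
        proposed_action = decision.removeprefix("edit:").strip()
    return execute(proposed_action)

# Simulated session: the user rewrites the action before approving it.
result = run_with_approval(
    "delete all draft files",
    execute=lambda a: f"executed: {a}",
    ask_user=lambda prompt: "edit: delete drafts older than 30 days",
)
print(result)
```

Keeping the interaction this simple (one prompt, three possible answers) is exactly the "simple interaction" principle: the user intervenes at decision points, not in the agent's internals.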

3.4 Evaluation Methods, Datasets, and Metrics

  • Benchmarks

    • Public benchmarks:
      • ToolBench: https://github.com/OpenBMB/ToolBench — [ICLR'24 spotlight] An open platform for training, serving, and evaluating LLMs for tool learning; a large-scale tool-calling instruction dataset supporting multi-step reasoning and real API integration
      • SWE-bench: https://github.com/princeton-nlp/SWE-bench — Benchmark for evaluating LLMs' ability to resolve GitHub issues, containing 2,294 Python code-fixing tasks
      • Mind2Web: https://huggingface.co/datasets/osunlp/Mind2Web — General web-interaction dataset covering diverse DOM operations and user trajectories
      • WebArena: https://github.com/web-arena-x/webarena — Code for "WebArena: A Realistic Web Environment for Building Autonomous Agents"; a realistic web-environment testing platform integrating 4 applications and tool libraries
      • AgentInstruct: https://huggingface.co/datasets/zai-org/AgentInstruct — High-quality agent instruction dataset that enhances model task-generalization capabilities
  • Evaluation Metrics

    • Performance Metrics

      • Task Resolution Rate (% Resolved)

        • Core performance indicator reflecting the system's problem-solving capability
      • Localization Accuracy (F1 Score)

        • Balance of precision and recall for code defect localization
      • Visual Necessity Impact

        • Improvement in task resolution rate from image input
      • Image Type Sensitivity

        • Adaptability to different image types (code screenshots/UI images/charts)
      • Code Modification Complexity (Patch Complexity)

        • Scale of reference solution modifications (number of files/lines/functions)
      • Task Difficulty Distribution

        • Proportion of tasks with different time requirements (simple/medium/difficult/extremely difficult)
      • Task Success Rate

        • This metric measures the percentage of tasks successfully completed by agents, directly indicating overall efficiency.
        • Formula: $\text{Task Success Rate} = \frac{\text{Number of Successfully Completed Tasks}}{\text{Total Number of Tasks}} \times 100\%$
      • Collaboration Efficiency

        • Evaluates how effectively multiple agents cooperate to achieve common goals
      • Task Allocation Accuracy

        • Whether tasks are assigned to the most suitable agents
        • Formula: $\text{Task Allocation Accuracy} = \frac{\text{Number of Correctly Allocated Tasks}}{\text{Total Number of Tasks}} \times 100\%$
      • Task Completion Accuracy

        • Accuracy of output results
        • Formula: $\text{Task Completion Accuracy} = \frac{\text{Number of Correct Output Results}}{\text{Total Number of Completed Tasks}} \times 100\%$
      • Output Coherence

        • Logical consistency of generated content
      • Progress Rate

        • Reference: AgentBoard https://arxiv.org/pdf/2401.13178
        • Continuous metric calculated through sub-goal matching or state similarity, reflecting gradual progress during task completion (0 to 1)
        • Formula:
          • Continuous tasks (such as state matching): $r_t^{\text{match}} = \max_{0 \leq i \leq t} f(s_i, g)$, where $f(s_i, g)$ is a similarity function between the current state $s_i$ and the target state $g$ (such as the table-cell matching ratio).

          • Discrete sub-goal tasks (such as multi-step planning): $r_t^{\text{subgoal}} = \max_{0 \leq i \leq t} \frac{1}{K} \sum_{k=1}^{K} f(s_i, g_k)$, where $g_k$ is the manually annotated $k$-th sub-goal, and $f(s_i, g_k) \in \{0, 1\}$ indicates whether the sub-goal is completed.

    • Efficiency Metrics

      • Average Cost (Avg. Cost)
        • Economic cost of single-task inference
      • Abnormal Termination Rate
        • Proportion of task interruptions due to cost overruns or errors (reflecting resource waste)
      • Communication Latency: Agent response time (milliseconds)
      • Task Completion Time
    • Scalability Metrics

      • Multi-file Type Editing
        • Proportion of tasks that simultaneously modify multiple file types (TS/HTML/CSS)
      • Language Adaptability Comparison
        • Resolution rate degradation of Python-centric systems in JavaScript tasks
    • Reliability Metrics

      • Test Consistency
        • Consistency of the same patch passing tests across multiple runs
      • Error Recovery Capability
        • Effectiveness of automatic retry or rollback mechanisms after task termination
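Two of the metrics above can be made concrete with a short worked sketch: the task success rate, and an AgentBoard-style progress rate computed by sub-goal matching. The trajectory data below is invented for illustration, and the string-containment matcher stands in for a real sub-goal similarity function.

```python
# Worked sketch of task success rate and progress rate (sub-goal matching).
# The matcher and example data are illustrative assumptions.

def task_success_rate(outcomes: list[bool]) -> float:
    """Successfully completed tasks / total tasks x 100%."""
    return 100.0 * sum(outcomes) / len(outcomes)

def progress_rate(states: list[str], subgoals: list[str]) -> float:
    """r_t = max over visited states of the fraction of sub-goals matched."""
    def fraction_matched(state: str) -> float:
        return sum(g in state for g in subgoals) / len(subgoals)
    return max(fraction_matched(s) for s in states)

print(task_success_rate([True, True, False, True]))  # 3 of 4 tasks -> 75.0
# Trajectory reaches 2 of 3 annotated sub-goals, so progress is 2/3
print(progress_rate(
    states=["opened file", "opened file; found bug"],
    subgoals=["opened file", "found bug", "fixed bug"],
))
```

Note how progress rate rewards the failed run above with 2/3 credit: that partial signal is what makes it a finer-grained complement to the binary success rate.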

4. Application Domains

4.1 Characteristics and Applicable Scenarios of Multi-Agent Systems

  • Characteristics of multi-agent systems: Non-real-time, non-deterministic
  • Suitable scenarios for multi-agent systems: Offline operations, error tolerance, high user tolerance and willingness to intervene
  • When (and when not) to use multi-agent systems

4.2 Typical Applications of Multi-Agent Systems in Banking Scenarios

  • Customer service robots: From knowledge operations to intelligent customer service
  • Intelligent writing: Simple copywriting, research report style, marketing copywriting style
  • Intelligent marketing: From customer profiling to precision marketing

5. References

LLM Multi-Agent Systems: Challenges and Open Problems. Han et al., 2024. arXiv:2402.03578

AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents. Ma et al., 2024. arXiv:2401.13178
