Intent-Driven Network Management with Multi-Agent LLMs: The Confucius Framework

WenyunXu · September 10, 2025, 8:16am

Title: Intent-Driven Network Management with Multi-Agent LLMs: The Confucius Framework

Authors: Zhaodong Wang, Samuel Lin, Guanqing Yan (Meta); Soudeh Ghorbani (Meta, Johns Hopkins University); Minlan Yu (Harvard University); Jiawei Zhou (Stony Brook University); Nathan Hu, Lopa Baruah, Sam Peters, Srikanth Kamath, Jerry Yang, Ying Zhang (Meta)

Scribe: Wenyun Xu (Xiamen University)

Introduction

The paper addresses the challenge of managing large-scale networks, a task that remains complex and resource-intensive despite significant research efforts. While Large Language Models (LLMs) show promise, prompting a model once and expecting it to complete an intricate, multi-step network management task is not effective. These tasks are highly domain-specific and require an approach that incorporates expert knowledge and iterative refinement.

Key idea and contribution

The authors introduce Confucius, a novel multi-agent LLM framework for network management at Meta. The key idea is to decompose complex management tasks into smaller, structured subtasks that can be executed by specialized, domain-specific tools and databases. The framework’s core contributions are: modeling network management workflows as Directed Acyclic Graphs (DAGs) to aid planning; seamlessly integrating LLMs with existing tools and procedures (like MOPs/workflows); employing Retrieval-Augmented Generation (RAG) for effective long-term memory; and creating a set of primitives to systematically facilitate human-model interaction and ensure correctness. The authors’ main contribution is demonstrating a pioneering, comprehensive framework for building and deploying LLM-assisted network management applications in a hyper-scale production environment.
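To make the DAG idea concrete, here is a minimal sketch of running a workflow's subtasks in dependency order. It is not Confucius's implementation: the task names (query_topology, generate_config, dry_run), the shared context dict, and the run_workflow helper are all hypothetical, chosen only to illustrate the decomposition described above.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Each subtask reads from and writes to a shared context dict (an assumption
# made for this sketch; the paper does not prescribe this interface).
Subtask = Callable[[dict], None]


def run_workflow(tasks: Dict[str, Subtask],
                 deps: Dict[str, Set[str]],
                 ctx: dict) -> List[str]:
    """Run subtasks in an order that respects the DAG's dependencies."""
    # deps maps each task to the set of tasks that must finish before it.
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name](ctx)
    return order


# Hypothetical subtasks for a "drain switch sw1" intent.
def query_topology(ctx: dict) -> None:
    ctx["neighbors"] = ["sw2", "sw3"]


def generate_config(ctx: dict) -> None:
    ctx["config"] = f"drain sw1; reroute via {ctx['neighbors'][0]}"


def dry_run(ctx: dict) -> None:
    ctx["validated"] = ctx["config"].startswith("drain")


tasks = {
    "query_topology": query_topology,
    "generate_config": generate_config,
    "dry_run": dry_run,
}
deps = {
    "query_topology": set(),
    "generate_config": {"query_topology"},
    "dry_run": {"generate_config"},
}

ctx: dict = {}
order = run_workflow(tasks, deps, ctx)
print(order, ctx["validated"])
```

Expressing the plan as explicit dependencies means a cyclic plan is rejected before anything runs, and each node can be backed by an existing tool rather than by the model itself.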

Evaluation

Confucius has been successfully deployed in a production environment for two years and has supported over 60 network management applications. The evaluation shows that Confucius leads to significant time savings for developers, reducing the average development time by 17 engineer-hours per week. The framework also improves accuracy by up to 21% compared to solutions that rely solely on foundation models. This result is significant because it demonstrates that a pragmatic, multi-agent approach can lead to tangible improvements in both efficiency and performance in a real-world, large-scale network environment.

Q: How do you ensure the output of your tool is correct? And what do you do when you encounter implausible results?

A: The basic idea is to tie in with our existing validation tools, such as dry runs, to begin with. The last resort is usually a human, for very mission-critical events. Confucius builds this in, so that the output first passes through these automated systems for validation. If there is an error, this is all transparent to the user: the error is fed back to the model, and the model tries again. The other approach we tried is to use multiple models as experts and take a majority vote on the answer.
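The two mechanisms in this answer, retrying with validator feedback and majority voting across models, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: fake_model and fake_dry_run are stand-ins for a real LLM call and a real dry-run validator, and the MTU limit of 9216 is made up for the example.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def generate_with_validation(
    generate: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    intent: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Return (validated_output, attempts_used).

    The output is None when every attempt failed validation, which is the
    point where a human would take over.
    """
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(intent, error)  # previous error is fed back
        error = validate(candidate)          # None means validation passed
        if error is None:
            return candidate, attempt
    return None, max_attempts


def majority_vote(answers: List[str]) -> Optional[str]:
    """Return the answer given by a strict majority of models, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) / 2 else None


# Hypothetical stand-ins for an LLM and a dry-run validator.
def fake_model(intent: str, error: Optional[str]) -> str:
    return "set mtu 9000" if error else "set mtu 99999"


def fake_dry_run(candidate: str) -> Optional[str]:
    mtu = int(candidate.split()[-1])
    return None if mtu <= 9216 else f"mtu {mtu} exceeds 9216"


result, attempts = generate_with_validation(
    fake_model, fake_dry_run, "enable jumbo frames"
)
print(result, attempts)
print(majority_vote(["plan A", "plan A", "plan B"]))
```

Returning None instead of a best guess keeps the fallback explicit: the caller has to decide whether to escalate to a human.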

Q: If you already use a validator, have you considered connecting it to your tool so that you can close the loop?

A: Some tools are already connected to a validator, in which case we simply use that existing setup. However, other tools are not connected due to their specific operational processes, as their output may not be directly applied to production but rather used for planning purposes. In these separate cases, we essentially embed validation into the planning phase, making it a sub-task within the directed acyclic graph (DAG) of the overall execution.
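One way to picture "validation as a sub-task within the DAG" is a graph rewrite that places a validation node behind every tool that lacks its own validator. The sketch below is only an assumption about how this could look; the embed_validation helper, the "validate:" naming scheme, and the node names are hypothetical and do not come from the paper.

```python
from typing import Dict, Set


def embed_validation(deps: Dict[str, Set[str]],
                     unvalidated: Set[str]) -> Dict[str, Set[str]]:
    """Return a new DAG where each unvalidated node is gated by a validation node.

    deps maps every node to the set of nodes it depends on. Downstream nodes
    are rewired to depend on the validation node instead of the raw tool output.
    """
    def gate(node: str) -> str:
        return f"validate:{node}" if node in unvalidated else node

    # Rewire existing edges so consumers only see validated output.
    new_deps = {node: {gate(p) for p in preds} for node, preds in deps.items()}
    # Add one validation node directly after each unvalidated tool.
    for node in unvalidated:
        new_deps[f"validate:{node}"] = {node}
    return new_deps


# Hypothetical planning workflow: the capacity planner has no validator of its own.
deps = {
    "capacity_plan": set(),
    "report": {"capacity_plan"},
}
gated = embed_validation(deps, unvalidated={"capacity_plan"})
print(gated)
```

The input graph is left untouched, so the same plan can be executed with or without the extra validation step.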

Q: How does multi-agent collaboration work in your framework?

A: In our multi-agent framework, the basic approach is to separate agents based on their function. We have one agent that specializes in planning and others that specialize in specific tools. Each tool can have a different agent, and the goal is to allow each engineer to focus on developing their own agent effectively. For instance, if I am developing an SQL query agent, I can ensure my agent understands its context very well. These specialized agents are then connected through the planning process, which is the reasoning step that we rely on the large language model to perform.
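The separation between a planning agent and tool-specific agents might look roughly like the sketch below. Everything here is hypothetical: stub_planner stands in for the LLM-based planner, and the sql and config agents are placeholders that only echo their instructions, so this shows the dispatch pattern rather than how Confucius actually wires agents together.

```python
from typing import Callable, Dict, List, Tuple

# A tool agent takes an instruction string and returns its result as text.
Agent = Callable[[str], str]
# A plan is an ordered list of (agent_name, instruction) steps.
Plan = List[Tuple[str, str]]


def sql_agent(instruction: str) -> str:
    # Placeholder: a real agent would hold schema context and call an LLM.
    return f"[sql] {instruction}"


def config_agent(instruction: str) -> str:
    # Placeholder for an agent that knows the configuration tooling.
    return f"[config] {instruction}"


def stub_planner(intent: str) -> Plan:
    # Stand-in for the reasoning step performed by the LLM.
    return [
        ("sql", f"find devices for: {intent}"),
        ("config", "draft change for matched devices"),
    ]


def handle(intent: str,
           planner: Callable[[str], Plan],
           agents: Dict[str, Agent]) -> List[str]:
    """Ask the planner for steps, then dispatch each step to its specialist."""
    results = []
    for agent_name, instruction in planner(intent):
        if agent_name not in agents:
            raise KeyError(f"planner requested unknown agent: {agent_name}")
        results.append(agents[agent_name](instruction))
    return results


agents = {"sql": sql_agent, "config": config_agent}
outputs = handle("drain rack 12", stub_planner, agents)
print(outputs)
```

Because the planner only refers to agents by name, an engineer can improve one agent in isolation without touching the others.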

Personal thoughts

This paper provides a highly valuable and pragmatic look into the real-world application of LLMs in a complex domain like network management. What I like is the focus on integration over replacement, leveraging the vast number of existing tools and domain knowledge rather than trying to reinvent the wheel with a single, monolithic LLM. The emphasis on correctness and safety through validation frameworks and human-in-the-loop primitives is also a critical and well-addressed aspect.

A potential open question or area for future work is how the framework handles highly novel or unpredictable network incidents that don’t fit into existing codified workflows. While the multi-agent approach provides flexibility, it’s an interesting question how much the system can “self-learn” new diagnostic or management procedures without explicit human-defined templates. The paper is an experience report from a specific environment (Meta), and further research could explore the generalizability of the framework to different enterprise sizes and network infrastructures.