July 7th, 2024 Open AGI Summit Brussels
Professor Narasimhan (Princeton University, Sierra AI) presented his thoughts on language agents and multi-agent interaction at the Open AGI Summit last week in Brussels; here are the key points he covered.
Full Talk:
Talk Notes:
- We can define a language agent as one that can understand and generate language while also being capable of taking action.
- This includes programming languages or even math.
- Humans are language agents.
An example of an agent:
- In Software Engineering, writing code is usually only 20% of the work. Most of the time and effort goes into debugging, refactoring, and maintaining the code.
- Professor Narasimhan's lab at Princeton recently developed SWE-agent, an agent that automates parts of software engineering: it takes an issue description and tries hundreds of actions until the issue is resolved. [1] (This is an open-source alternative to Devin.)
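The issue-resolution process can be pictured as an observe-act loop. The sketch below is purely illustrative; the class and method names are hypothetical stand-ins, not SWE-agent's actual interface.

```python
# Hypothetical observe-act loop for an issue-fixing agent, in the spirit of
# SWE-agent. All names here are illustrative, not the real SWE-agent API.

def run_agent(issue, env, model, max_steps=100):
    """Repeatedly ask the model for an action, execute it in the repository
    environment, and stop once the environment reports the issue resolved."""
    observation = issue
    for _ in range(max_steps):
        action = model.propose_action(observation)  # e.g. "edit file", "run tests"
        observation = env.execute(action)           # feedback from the repo/tests
        if env.issue_resolved():
            return True
    return False

# Toy stand-ins so the loop can be exercised end to end.
class ToyEnv:
    def __init__(self, steps_needed=3):
        self.steps_needed = steps_needed
        self.steps_taken = 0

    def execute(self, action):
        self.steps_taken += 1
        return f"output of {action}"

    def issue_resolved(self):
        return self.steps_taken >= self.steps_needed

class ToyModel:
    def propose_action(self, observation):
        return "run_tests"
```

The point of the loop structure is that most of the agent's effort, like a human engineer's, goes into iterating on feedback rather than writing code in one pass.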
Multi-agent interaction:
- To some extent, the entire world is a giant multi-agent playground.
- When you think about it, it's obvious that language is key to any multi-agent communication and collaboration.
- Language has been key to how we have built society and accelerated progress.
- Language is key for language agents because it:
- Allows for real-time communication with other agents (humans and AI)
- Allows for understanding the world by "reading" and "listening"
- In the near term, it's likely that we will have agents talking to each other, not just to humans.
Challenges for building AI Agents:
The primary challenges for building Agents in the days ahead include:
- Building good evaluations:
- Static benchmarks and datasets are unlikely to predict how agents will perform in dynamic, real-world settings.
- Developing and using principled frameworks for agent development (e.g., CoALA [Sumers et al., 2023]). [2]
- Ensuring trustworthiness and safety:
- Agents can be much more powerful and dynamic than one-pass models and have the potential to affect society in ways that we can't even imagine today.
Recent work at Sierra on building good evaluations:
At Professor Narasimhan's startup Sierra AI, they recently developed a new benchmark for evaluating agents [τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains; Yao et al., 2024]: [3]
- τ-bench (Tool-Agent-User Benchmark). In this benchmark:
- An agent has to take on the role of a customer service representative
- The agent is interacting with a human and an environment.
- This is a challenging and realistic task:
- The agent has to deal with partial information
- The agent has to interact with tools
- Testing popular models on this benchmark, they find that current language agents are not up to this task.
- As depicted in the figure above, even the best models degrade in performance very quickly when the same scenario is run multiple times.
- While progress is being made towards a multi-agent future, we need to solve critical issues like reliability and dependability.
- If we can solve the crux of the issue, which is having them understand language and use language as a tool for reasoning or more complex computation, we can really have a collaborative multi-agent society in the future.
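One way to quantify the reliability degradation described above is a pass^k-style metric: the probability that an agent succeeds on all k independent runs of the same scenario. The sketch below is a simplified illustration of the idea (assuming independent trials with a fixed per-run success rate), not the benchmark's exact estimator.

```python
# Sketch of a pass^k-style reliability metric: the probability that an agent
# solves the same task on all k independent attempts. If a single attempt
# succeeds with probability p, then pass^k = p**k, so reliability decays
# quickly with k unless p is very close to 1.

def pass_k(p: float, k: int) -> float:
    """Probability of succeeding on all k independent trials."""
    return p ** k

# An agent that is 90% reliable on a single run succeeds on all 8 runs
# only about 43% of the time.
print(round(pass_k(0.9, 8), 2))  # 0.43
```

This is why a model that looks strong on a one-shot benchmark can still be far from dependable enough for real customer-facing deployment.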
References: