Banghua Zhu - Nexusflow - Assistant Professor at the University of Washington
Talk Notes:
At the Open AGI Summit in Brussels, Professor Banghua Zhu, co-founder of Nexusflow and professor at the University of Washington, discussed Nexusflow’s approach to building language-model-based software agents using its own customized language models.
Extractive vs Abstractive Reasoning:
- We start with the goal of complex instruction following and tool use in a safe and helpful way
- Consider the following example task: Asking an AI agent to drive you to a destination and to pick up your son along the way
- Achieving this requires complex instruction following and tool use in a helpful way
- Case study from Nexusflow customers: GPT-4 is insufficient in terms of:
- Accuracy
- Hallucination
- Latency
- Gen AI agents have a reasoning trade-off:
- There is a trade-off between being able to call functions directly and being creative
- Nexusflow believes that for smaller models it is better to separate extractive reasoning from abstractive reasoning, and has taken this approach with significant success
Nexusflow Models:
- Nexusflow has open-sourced two separate models:
- NexusRaven-V2-13B
- Based on CodeLlama-13B
- Surpasses GPT-4 in complex tool use with minimal hallucination
- Starling-7B
- Best Mistral-7B based model on ChatbotArena
- Nexusflow uses Starling-7B to interact with customers and NexusRaven-V2-13B behind the scenes to call the right functions
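The two-model split described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Nexusflow's actual pipeline: `handle_request`, `fake_tool_model`, `fake_chat_model`, and `get_weather` are hypothetical names, and the two "models" are plain Python stubs standing in for NexusRaven-V2-13B (function calling) and Starling-7B (chat).

```python
from typing import Any, Callable, Dict


def handle_request(
    user_message: str,
    chat_model: Callable[[str], str],
    tool_model: Callable[[str], Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
) -> str:
    """Route one request through a function-calling model, then a chat model."""
    # Extractive step: the tool model picks a function name and its arguments.
    call = tool_model(user_message)
    result = tools[call["name"]](**call["arguments"])
    # Abstractive step: the chat model phrases the final answer for the user.
    return chat_model(f"User asked: {user_message}\nTool result: {result}")


# Hypothetical stubs; a real system would call the actual models here.
def fake_tool_model(message: str) -> Dict[str, Any]:
    return {"name": "get_weather", "arguments": {"city": "Brussels"}}


def fake_chat_model(prompt: str) -> str:
    return f"(chat reply based on) {prompt}"


def get_weather(city: str) -> str:
    return f"sunny in {city}"


reply = handle_request(
    "What's the weather in Brussels?",
    fake_chat_model,
    fake_tool_model,
    {"get_weather": get_weather},
)
print(reply)
```

Keeping the two steps behind separate callables means either model can be swapped without touching the other, which is the point of separating extractive from abstractive reasoning.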
NexusRaven-V2 LLM:
- AI agents have multiple tool-API cases:
- Single tool API: Only one function is needed to accomplish a user query
- Parallel tool API: A user request requires multiple calls
- Nested tool API: API calls need to be embedded in other API calls
- On Nexusflow's own benchmark, NexusRaven-V2 achieves 7% better accuracy than GPT-4 for nested and parallel APIs at a 100x smaller model size.
- NexusRaven-V2-13B greatly improves extractive reasoning capability for reliable tool use over CodeLlama-13B
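To make the three tool-API cases concrete, here is a minimal sketch of what each call shape looks like. The functions `lookup_address` and `plan_route` are hypothetical toy tools invented for illustration, loosely following the driving example from earlier in the notes; they are not part of any Nexusflow API.

```python
# Hypothetical toy tools an agent could be allowed to call.
def lookup_address(place: str) -> str:
    addresses = {
        "home": "1 Home Rd",
        "school": "2 School Ln",
        "office": "3 Office St",
    }
    return addresses[place]


def plan_route(start: str, end: str) -> str:
    return f"{start} -> {end}"


# Single tool API: one function call answers the query.
single = lookup_address("school")

# Parallel tool API: the request needs several independent calls.
parallel = [lookup_address("school"), lookup_address("office")]

# Nested tool API: the outputs of inner calls become arguments of an outer call.
nested = plan_route(lookup_address("home"), lookup_address("school"))

print(single)
print(parallel)
print(nested)
```

The nested case is the hardest for a model to get right, since it has to emit the inner calls in the correct argument positions of the outer call.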
RoTBench:
- On external benchmarks, NexusRaven-V2 greatly outperforms GPT-4 and GPT-3.5 on RoTBench ("RoTBench: A Multi-level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning")
- On a per-task basis, NexusRaven-V2 has a 71% win rate over GPT-4 on RoTBench
Starling-7B:
- Human-preference alignment at SoTA data efficiency
- At the time of release, 13th on Chatbot Arena, beating:
- Llama-2-Chat 70B
- Gemini-Pro-V1.0
- Mixtral-8x7B-Instruct
- Both Starling and Raven have been out for 6 months
- An open model that is much stronger in chat and function calling is coming soon from Nexusflow!