Banghua Zhu: Nexusflow's separated open-source models and agents

OpenAGISummit · July 28, 2024, 11:35am

Banghua Zhu - Nexusflow - Assistant Professor at the University of Washington

Full Talk Recording:

Talk Notes:

At the Open AGI Summit in Brussels, Professor Banghua Zhu, co-founder of Nexusflow and Assistant Professor at the University of Washington, discussed Nexusflow's approach to building software agents on top of its own customized language models.

Extractive vs Abstractive Reasoning:

We start with the goal of complex instruction following and tool use in a safe and helpful way.
Consider the following example task: asking an AI agent to drive you to a destination and to pick up your son along the way.
- Even this everyday request requires complex instruction following and tool use.
Case study from Nexusflow customers: GPT-4 falls short in terms of:
- Accuracy
- Hallucination
- Latency
Gen AI agents have a reasoning trade-off:
- There is a trade-off between being able to call functions directly (extractive reasoning) and being creative (abstractive reasoning).
- Nexusflow believes that for smaller models it is better to separate extractive reasoning from abstractive reasoning, and has taken this approach with significant success.

Nexusflow Models:

Nexusflow has open-sourced its two separate models:
- NexusRaven-V2-13B
  - Based on CodeLlama-13B
  - Surpasses GPT-4 in complex tool use with minimal hallucination
- Starling-7B
  - Best Mistral-7B based model on ChatbotArena
Nexusflow uses Starling-7B to interact with customers and NexusRaven-V2-13B behind the scenes to call the right functions.
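
The split between a chat model and a function-calling model can be pictured with a small sketch. Everything here is an assumption made for illustration: `chat_model` and `function_model` are hard-coded stand-ins rather than real model calls, `book_ride` is a hypothetical tool, and the keyword check in `needs_tool` is a toy heuristic. The talk does not describe how Nexusflow actually routes requests between the two models.

```python
from typing import Callable, Dict

def chat_model(message: str) -> str:
    """Stand-in for a conversational model (the role Starling-7B plays)."""
    return f"(chat reply to: {message})"

def function_model(message: str) -> Dict[str, object]:
    """Stand-in for a function-calling model (the role NexusRaven-V2-13B plays).

    A real model would read the message and the tool descriptions;
    this stub always returns the same structured call.
    """
    return {"name": "book_ride", "args": {"destination": "airport"}}

def book_ride(destination: str) -> str:
    """Hypothetical tool, not something named in the talk."""
    return f"ride booked to {destination}"

TOOLS: Dict[str, Callable[..., str]] = {"book_ride": book_ride}

def needs_tool(message: str) -> bool:
    """Toy heuristic (assumption): route to the tool path on trigger words."""
    return bool(set(message.lower().split()) & {"book", "drive"})

def handle(message: str) -> str:
    """Send tool requests through the function model, everything else to chat."""
    if needs_tool(message):
        call = function_model(message)
        result = TOOLS[call["name"]](**call["args"])
        # The chat model turns the raw tool result into a user-facing reply.
        return chat_model(f"tool result: {result}")
    return chat_model(message)
```

For example, `handle("Please book a ride")` goes through the function model and the tool before the chat model phrases the answer, while `handle("Hello there")` only touches the chat model.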

NexusRaven-V2 LLM:

AI agents have multiple tool-API cases:
- Single tool API: Only one function to accomplish a user query
- Parallel tool API: User requests require multiple calls
- Nested tool API: API calls need to be embedded into other API calls
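
The three cases can be made concrete with a minimal sketch. It assumes the function-calling model emits Python-style call strings, with `;` separating parallel calls; that output format, the evaluator, and the tools `get_route`, `find_school`, and `send_message` are all hypothetical and are not taken from the talk. The evaluator walks the parsed expression and only executes whitelisted tools with literal arguments.

```python
import ast

# Hypothetical tools for illustration only.
def get_route(origin, destination):
    return f"route({origin}->{destination})"

def find_school(child):
    return f"{child}'s school"

def send_message(to, text):
    return f"sent to {to}: {text}"

TOOLS = {
    "get_route": get_route,
    "find_school": find_school,
    "send_message": send_message,
}

def evaluate(node):
    """Evaluate a parsed call, allowing only whitelisted tools and literals."""
    if isinstance(node, ast.Call):
        if not isinstance(node.func, ast.Name) or node.func.id not in TOOLS:
            raise ValueError("unknown tool")
        # Arguments may themselves be calls, which is the nested case.
        args = [evaluate(arg) for arg in node.args]
        kwargs = {kw.arg: evaluate(kw.value) for kw in node.keywords}
        return TOOLS[node.func.id](*args, **kwargs)
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError("unsupported expression")

def run_calls(call_text):
    """Run one or more ';'-separated calls and return the results in order."""
    results = []
    for statement in ast.parse(call_text).body:
        if not isinstance(statement, ast.Expr):
            raise ValueError("only call expressions are allowed")
        results.append(evaluate(statement.value))
    return results

# Single: one function answers the query.
single = 'get_route(origin="home", destination="office")'
# Parallel: the request needs several independent calls.
parallel = 'send_message(to="son", text="on my way"); get_route(origin="home", destination="office")'
# Nested: the result of one call is an argument of another.
nested = 'get_route(origin="home", destination=find_school(child="Sam"))'
```

Calling `run_calls(nested)` resolves `find_school` first and passes its result to `get_route`, which mirrors the pick-up-your-son example: the route depends on an answer that has to be looked up first.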

On Nexusflow's own benchmark, NexusRaven-V2 achieves 7% better accuracy for nested and parallel APIs at a 100x smaller model size.
NexusRaven-V2-13B greatly improves extractive reasoning capability for reliable tool use over CodeLlama-13B.

RoTBench:

On external benchmarks, NexusRaven-V2 greatly outperforms GPT-4 and GPT-3.5 on RoTBench ("A Multi-level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning").

On a per-task basis, NexusRaven-V2 has a 71% win rate over GPT-4 on RoTBench.

Starling-7B:

Human-preference alignment at SoTA data efficiency
At the time of release, it ranked 13th on Chatbot Arena, beating:
- Llama-2-Chat 70B
- Gemini-Pro-V1.0
- Mixtral-8x7B-Instruct
Both Starling and Raven have been out for 6 months.
An open model that is much stronger in chat and function calling is coming soon from Nexusflow!

firsthokage100 · September 18, 2024, 3:59pm

This is so beautiful to see

Ruslan111 · September 18, 2024, 7:34pm

I completely agree

alireza65 · September 21, 2024, 3:07pm

this sound is so nice

ke_og · January 1, 2025, 7:47pm

cool, your research is really helpful

Husbandman24 · January 9, 2025, 4:10pm

This was a beautiful read. Great stuff!

Melissa · January 10, 2025, 7:19am

Such cool research! Thank you so much

nancyknh · January 14, 2025, 8:44am

will we fleap openai?

MrGrut · January 16, 2025, 7:08am

This approach aligns well with real-world needs, especially in complex tool use

kris1 · January 25, 2025, 8:42am

Consider deploying it on the internet?

kris1 · January 25, 2025, 8:43am

Maybe we can try to publish it in the app.

blazeX · January 29, 2025, 2:55am

Great stuff, great to see

fibon · February 9, 2025, 9:42am

very nice. i really like flow of this blog post.
