Banghua Zhu: Nexusflow's separated open-source models and agents

Banghua Zhu - Nexusflow - Assistant Professor at the University of Washington

Full Talk Recording:

Talk Notes:

At the Open AGI summit in Brussels, Professor Banghua Zhu, co-founder of Nexusflow and professor at the University of Washington, discussed Nexusflow's approach to building language-based software agents on top of their own customized language models.

Extractive vs Abstractive Reasoning:

  • The starting goal is complex instruction following and tool use that is both safe and helpful
  • Example task: asking an AI agent to drive you to a destination and pick up your son along the way
    • Completing it takes both complex instruction following and tool use
  • Case study from Nexusflow customers: GPT-4 falls short in terms of:
    • Accuracy
    • Hallucination
    • Latency
  • Gen AI agents face a reasoning trade-off between calling functions directly (extractive reasoning) and being creative (abstractive reasoning)
  • Nexusflow believes that for smaller models it is better to separate extractive and abstractive reasoning, and has taken this approach with significant success

Nexusflow Models:

  • Nexusflow has open-sourced their two separated models:
    • NexusRaven-V2-13B
      • Based on CodeLlama-13B
      • Surpasses GPT-4 in complex tool use with minimal hallucination
    • Starling-7B
      • Best Mistral-7B-based model on Chatbot Arena
  • Nexusflow uses Starling-7B to interact with customers and NexusRaven-V2-13B behind the scenes to call the right functions
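
The split described above, one model for conversation and another for function calling, can be sketched roughly as follows. This is only an illustration under stated assumptions, not Nexusflow's implementation: `extractive_model` and `abstractive_model` are hypothetical rule-based stand-ins for the two models, and `get_eta` is an invented tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def get_eta(destination: str) -> str:
    """Invented tool for illustration; a real agent would call a real API."""
    return f"ETA to {destination}: 25 minutes"


# Hypothetical tool registry that the function-calling side can choose from.
TOOLS: Dict[str, Callable[..., str]] = {"get_eta": get_eta}


@dataclass
class ToolCall:
    name: str
    kwargs: dict


def extractive_model(user_message: str) -> Optional[ToolCall]:
    """Stand-in for the function-calling model: extracts a call from the text."""
    marker = "eta to "
    lowered = user_message.lower()
    if marker in lowered:
        start = lowered.index(marker) + len(marker)
        destination = user_message[start:].strip(" ?.!")
        return ToolCall("get_eta", {"destination": destination})
    return None  # no tool applies


def abstractive_model(user_message: str, tool_result: Optional[str]) -> str:
    """Stand-in for the chat model: phrases the reply shown to the user."""
    if tool_result is None:
        return "I could not find a tool for that, but I am happy to chat."
    return f"Here is what I found: {tool_result}"


def agent(user_message: str) -> str:
    # Step 1: the extractive side decides whether and how to call a tool.
    call = extractive_model(user_message)
    result = TOOLS[call.name](**call.kwargs) if call else None
    # Step 2: the abstractive side turns the result into a conversational reply.
    return abstractive_model(user_message, result)


print(agent("What is the ETA to the airport?"))
```

Keeping the two roles behind separate functions means either stand-in could be swapped for a real model without touching the other.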

NexusRaven-V2 LLM:

  • AI agents have multiple tool-API cases:
    • Single tool API: Only one function to accomplish a user query
    • Parallel tool API: User requests require multiple calls
    • Nested tool API: API calls need to be embedded into other API calls

  • On their own benchmark, NexusRaven-V2 achieves 7% better accuracy for nested and parallel APIs at a 100x smaller model size.
  • NexusRaven-V2-13B greatly improves extractive reasoning capability in reliable tool use over CodeLlama-13B
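
The three tool-API cases listed above differ only in how the calls are composed. A minimal sketch, loosely following the driving example from the talk; `geocode` and `drive_to` are hypothetical tools invented for illustration, with made-up coordinates:

```python
from typing import Dict, List, Tuple

Coords = Tuple[float, float]

# Made-up lookup table standing in for a real geocoding service.
_PLACES: Dict[str, Coords] = {"school": (1.0, 2.0), "office": (3.0, 4.0)}


def geocode(place: str) -> Coords:
    """Hypothetical tool: map a place name to coordinates."""
    return _PLACES[place]


def drive_to(coords: Coords) -> str:
    """Hypothetical tool: start driving to the given coordinates."""
    return f"driving to {coords}"


# Single tool API: one function call answers the query.
single: Coords = geocode("school")

# Parallel tool API: several independent calls for one request.
parallel: List[Coords] = [geocode("school"), geocode("office")]

# Nested tool API: one call's output is the argument of another call.
nested: str = drive_to(geocode("school"))

print(single, parallel, nested)
```

The nested case is the one where ordering matters: the inner call has to resolve before the outer call can be issued.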

RoTBench:

  • On the external benchmark RoTBench ("RoTBench: A Multi-level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning"), NexusRaven-V2 greatly outperforms GPT-4 and GPT-3.5

  • On a per-task basis, NexusRaven-V2 has a 71% win rate over GPT-4 on RoTBench

Starling-7B:

  • Human-preference alignment at SoTA data efficiency
  • At the time of release, 13th on Chatbot Arena, beating:
    • Llama-2-Chat 70B
    • Gemini-Pro-V1.0
    • Mixtral-8x7B-Instruct
  • Both Starling and Raven have been out for 6 months
  • An open model that is much stronger in chat and function calling is coming soon from Nexusflow!

This is so beautiful to see


:handshake: I completely agree


This sounds so nice
