Top 5: Big Brain Bigger Appetite


Members’ names: Frederieke Lohmann, Yi-Yi Ly, Arvid Ban, David Hofer

Motivation
Our goal for the 24-hour challenge was to improve Sentient’s OpenDeepSearch tool by exploring changes to its architecture.

Our approach consisted of

  1. extensive brainstorming and research into possible methods
  2. implementation/hacking
  3. evaluation on a small subset of the FRAMES dataset
  4. iterating again from step (1)

Note that we established our own baseline by running the code as we received it. For the autograder, we consistently used llama-v3p1-70b-instruct to grade our results. The baseline achieved an accuracy of 52.7%.

Exploration
We explored several model architectures and evaluated their accuracy on a fixed subset of 243 FRAMES samples. Some evaluation results are based on a fixed subset of 88 samples.

First, we implemented an ensemble method that combined five models of varying size and aggregated their results with both an embedding-based and an LLM-based approach. Second, we implemented query rephrasing, where the original user query was expanded into three rephrased queries; we then ingested both the original user query and the three rephrased queries. Third, we implemented different planning strategies. Finally, we also evaluated the combination of all these newly implemented methods.
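To illustrate what an embedding-based aggregation step can look like, here is a minimal sketch. It is not our actual implementation: it assumes each ensemble member returns a short answer string and picks the answer that is, on average, most similar to the others. The `toy_embed` function is a hypothetical bag-of-words stand-in for a real embedding model, and all names are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a real embedding model: bag-of-words counts."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aggregate_by_embedding(
    answers: List[str],
    embed: Callable[[str], Vector] = toy_embed,
) -> str:
    """Return the answer with the highest mean similarity to all other answers."""
    if not answers:
        raise ValueError("need at least one candidate answer")
    vectors = [embed(answer) for answer in answers]

    def mean_similarity(i: int) -> float:
        others = [cosine(vectors[i], vectors[j]) for j in range(len(vectors)) if j != i]
        return sum(others) / len(others) if others else 0.0

    # Ties are resolved in favour of the earliest answer in the list.
    best_index = max(range(len(answers)), key=mean_similarity)
    return answers[best_index]


if __name__ == "__main__":
    candidates = ["Paris", "paris", "Lyon", "Paris", "Berlin"]
    print(aggregate_by_embedding(candidates))
```

Choosing the most central answer works like a soft majority vote: answers that agree with each other reinforce one another, while an outlier from a single model is unlikely to win.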

Accuracy was computed as the number of answers graded A divided by the total number of samples (824).
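The accuracy computation can be sketched as follows, assuming the autograder emits one letter grade per sample. Only the grade A comes from our setup; the other grade labels in the example and the function name are hypothetical.

```python
from typing import Iterable


def accuracy(grades: Iterable[str]) -> float:
    """Fraction of samples whose autograder grade is 'A'."""
    grades = list(grades)
    if not grades:
        raise ValueError("no grades to evaluate")
    correct = sum(1 for grade in grades if grade.strip().upper() == "A")
    return correct / len(grades)


if __name__ == "__main__":
    # Illustrative grades only, not real evaluation results.
    example = ["A", "B", "A", "C"]
    print(f"{accuracy(example):.1%}")
```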

Screenshot 2025-04-21 040210

Presentation
GitHub Repo
Writeup PDF
