Basic Agent Evaluation Runner
Instructions:
- Clone this space, then modify the code to define your agent's logic, its tools, the packages it needs, and so on.
- Log in to your Hugging Face account using the button below. This uses your HF username for submission.
- Enter your OpenAI API key below (if required by your agent).
- Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
Disclaimers: After you click the "submit" button, the run can take quite some time, because the agent has to work through all the questions before anything is submitted. This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to avoid the long wait on the submit button, you could cache the answers and submit them in a separate action, or answer the questions asynchronously.
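The caching and async ideas mentioned above could look roughly like the sketch below. It is not part of this space's code. The function names (`load_cache`, `answer_all`, `build_payload`), the `task_id`, `question`, and `submitted_answer` field names, and the JSON cache file are all assumptions made for illustration, and the agent is assumed to be a plain blocking callable that takes a question string and returns an answer string. It relies on `asyncio.to_thread`, which requires Python 3.9 or later.

```python
import asyncio
import json
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Return previously saved answers, or an empty dict if none exist."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


async def answer_all(questions, agent, cache_path: Path, max_concurrency: int = 4) -> dict:
    """Answer questions concurrently, skipping any already in the cache.

    `questions` is assumed to be a list of dicts with hypothetical
    "task_id" and "question" keys; `agent` is any blocking callable.
    """
    cache = load_cache(cache_path)
    semaphore = asyncio.Semaphore(max_concurrency)

    async def answer_one(item):
        task_id = str(item["task_id"])  # JSON object keys are always strings
        if task_id in cache:
            return  # answered in an earlier run
        async with semaphore:
            # Run the blocking agent call in a worker thread so several
            # questions can be in flight at once.
            cache[task_id] = await asyncio.to_thread(agent, item["question"])

    await asyncio.gather(*(answer_one(q) for q in questions))
    # Persist the answers so submission can happen later, as a separate action.
    cache_path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache


def build_payload(cache: dict) -> list:
    """Shape cached answers for a separate submit step (field names assumed)."""
    return [{"task_id": tid, "submitted_answer": ans} for tid, ans in cache.items()]
```

With this split, one button could call `asyncio.run(answer_all(questions, agent, Path("answers.json")))` to fill the cache, and a second button could read the file and send `build_payload(...)` to the scoring endpoint. Re-running the first step only calls the agent for questions that are not yet cached. Note that the cache is written once, after all questions finish, so a run that crashes midway saves nothing; writing after each answer would be a reasonable variation.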
Questions and Agent Answers