# Introduction

### What can Benchflow do?

* Tons of easy-to-run benchmarks for your agent
* Full life-span agent tracking, including prompts, cost, time, metrics, and more
* Agent evaluation with over a 3x speedup
* Benchmark bundles to help you stand out among your competitors

### Agent Developer

It takes only two steps to test your agents:

{% stepper %}
{% step %}

### Integrate your own agent

{% tabs %}
{% tab title="Python" %}

```python
from benchflow import BaseAgent

class YourAgent(BaseAgent):
    def call_api(self):
        # Define how to run your agent and return its response
        pass
```
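
For illustration, a filled-in `call_api` might forward the task prompt to an LLM and return the raw answer. The OpenAI client usage and the `self.env_info` attribute (echoing the `env_info` dict a benchmark client prepares, shown later on this page) are assumptions for this sketch, not confirmed BenchFlow API:

```python
import os

from openai import OpenAI
from benchflow import BaseAgent

class YourAgent(BaseAgent):
    def call_api(self):
        # Hypothetical: assume the framework exposes the benchmark's
        # env_info on the agent; adapt to the payload you actually receive.
        prompt = self.env_info.get("info1", "")
        client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        # Return the agent's raw answer as a string.
        return response.choices[0].message.content
```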

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

### Test your agent on benchmarks

{% tabs %}
{% tab title="Python" %}

```python
import os

from benchflow import load_benchmark

bench = load_benchmark("benchmark_name")
your_agent = YourAgent()

run_ids = bench.run(
    agents=your_agent,
    requirements_dir="requirements.txt",
    api={"OPENAI_API_KEY": os.getenv("OPENAI_API_KEY")},
    params={}
)

results = bench.get_results(run_ids)
```
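
Once the runs complete, `get_results` returns the recorded outcomes. Their exact shape depends on the benchmark, so a minimal first step is simply to pretty-print whatever comes back:

```python
import json

# The result structure is benchmark-specific; pretty-print it to explore.
print(json.dumps(results, indent=2, default=str))
```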

{% endtab %}
{% endtabs %}
{% endstep %}
{% endstepper %}

### Benchmark Developer

It takes only three steps to integrate your benchmark:

{% stepper %}
{% step %}

### Make Your Benchmark a Client

{% tabs %}
{% tab title="Python" %}

```python
from typing import Dict

from benchflow import BenchClient

class YourBenchClient(BenchClient):
    def __init__(self, agent_url: str):
        super().__init__(agent_url)

    def prepare_environment(self, state_update: Dict) -> Dict:
        """Prepare the benchmark information for agents."""
        return {
            "env_info": {
                "info1": state_update['info1'],
                "info2": state_update['info2']
            }
        }

    def parse_action(self, raw_action: str) -> str:
        """Process the agent's raw response into an action."""
        parsed_action = raw_action  # extract/normalize the action here
        return parsed_action
```
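
As a concrete (hypothetical) `parse_action`: if your benchmark asks agents to prefix their final answer with `Answer:`, the override might extract that span. The extraction rule is entirely benchmark-specific; this is just one sketch:

```python
import re

def parse_action(self, raw_action: str) -> str:
    """Hypothetical rule: keep the text after an 'Answer:' marker."""
    match = re.search(r"Answer:\s*(.+)", raw_action, re.DOTALL)
    return match.group(1).strip() if match else raw_action.strip()
```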

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

### Package and Upload Your Benchmark Docker Image

* Package your benchmark logic into a Docker image.
* Configure the image to read the required environment variables (such as `AGENT_URL`, `TEST_START_IDX`, etc.), as in the sketch below.
* Upload the Docker image to Docker Hub.
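
A minimal Python entrypoint for the image might read those variables before running tasks. Everything here beyond the `AGENT_URL` and `TEST_START_IDX` names is an assumption for illustration:

```python
import os

# Read the environment variables BenchFlow injects into the container.
agent_url = os.environ["AGENT_URL"]                # where the agent is served
start_idx = int(os.getenv("TEST_START_IDX", "0"))  # first task index to run

# Drive the benchmark through the client defined in the previous step,
# writing results and logs to the directories your BaseBench subclass
# reports (e.g. /app/results and /app/log_files).
client = YourBenchClient(agent_url)
```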
{% endstep %}

{% step %}

### Upload Your Benchmark to Benchflow

Extend the `BaseBench` class and upload your work to Benchflow!

{% tabs %}
{% tab title="Python" %}

```python
from typing import Any, Dict

from benchflow import BaseBench, BaseBenchConfig

class YourBenchConfig(BaseBenchConfig):
    # specify required params here
    required_env = []
    # specify optional params here
    optional_env = ["INSTANCE_IDS", "MAX_WORKERS", "RUN_ID"]

    def __init__(self, params: Dict[str, Any], task_id: str):
        # set default params here
        params.setdefault("INSTANCE_IDS", task_id)
        params.setdefault("MAX_WORKERS", 1)
        params.setdefault("RUN_ID", task_id)
        super().__init__(params)

class YourBench(BaseBench):
    def __init__(self):
        super().__init__()

    def get_config(self, params: Dict[str, Any], task_id: str) -> BaseBenchConfig:
        """
        Return a config instance that validates the input parameters.
        """
        return YourBenchConfig(params, task_id)

    def get_image_name(self) -> str:
        """
        Return the Docker image name for running your benchmark.
        """
        return "image/url"

    def get_results_dir_in_container(self) -> str:
        """
        Return the directory inside the container where benchmark results will be stored.
        """
        return "/app/results"

    def get_log_files_dir_in_container(self) -> str:
        """
        Return the directory inside the container where log files will be stored.
        """
        return "/app/log_files"

    def get_result(self, task_id: str) -> Dict[str, Any]:
        """
        Read and parse the benchmark result for the given task from the
        results and log files your image produces.
        """
        # define how to get logs and results here
        return {
            "is_resolved": is_resolved,
            "score": score,
            "message": {"details": "Task runs successfully."},
            "log": log_content
        }

    def get_all_tasks(self, split: str) -> Dict[str, Any]:
        """
        Return a dictionary containing all task IDs and an optional error message.
        """
        # collect the task IDs for the given split here
        return {"task_ids": task_ids, "error_message": None}
```
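
To make `get_result` concrete, here is one hypothetical way to fill it in, assuming the image writes one JSON file per task into the results directory. The file layout, field names, and `results` path are assumptions for illustration, not BenchFlow API:

```python
import json
import os
from typing import Any, Dict

def get_result(self, task_id: str) -> Dict[str, Any]:
    """Hypothetical: read one JSON result file per task."""
    # Adapt this path to wherever the container's results directory
    # (see get_results_dir_in_container) is mounted on the host.
    result_path = os.path.join("results", f"{task_id}.json")
    with open(result_path) as f:
        data = json.load(f)
    return {
        "is_resolved": bool(data["is_resolved"]),
        "score": float(data["score"]),
        "message": {"details": data.get("details", "")},
        "log": data.get("log", ""),
    }
```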

{% endtab %}
{% endtabs %}
{% endstep %}
{% endstepper %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://benchflow.gitbook.io/benchflow/introduction/introduction.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
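
For example, using Python's `requests` library (the question text is just a sample):

```python
import requests

# Query this page's docs endpoint with a natural-language question.
resp = requests.get(
    "https://benchflow.gitbook.io/benchflow/introduction/introduction.md",
    params={"ask": "How do I pass extra parameters to bench.run?"},
)
print(resp.text)  # direct answer plus relevant excerpts and sources
```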

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
