Targets Configuration

Targets define which agent or LLM provider to evaluate. AgentV uses one composable config graph across project manifests and eval files:

.agentv/config.yaml is the project-local discovery and composition root. It can hold targets, graders, tests, defaults, execution policy, results settings, and repo-local project policy.
$AGENTV_HOME/config.yaml is the user/operator config. Use it for defaults that apply across projects, project registry data, default result locations, and provider defaults that should not be copied into each repo.
eval.yaml is a focused, shareable slice of the same graph. Use it for a suite-specific target, grader, tests, evaluator settings, or run controls that should travel with the eval.

Any supported top-level field can stay inline or become a direct field reference such as targets: file://targets.yaml. Both forms normalize to the same config graph.

Structure

targets:
  - id: local-openai
    provider: openai
    runtime: host
    config:
      api_format: chat
      base_url: ${{ LOCAL_OPENAI_PROXY_BASE_URL }}
      api_key: ${{ LOCAL_OPENAI_PROXY_API_KEY }}
      model: ${{ LOCAL_OPENAI_PROXY_MODEL }}

  - id: codex-local
    provider: codex-app-server
    runtime: host
    config:
      command: ["codex", "app-server"]
      model: gpt-5-codex

graders:
  - id: openai-grader
    provider: openai
    config:
      model: gpt-5-mini

defaults:
  target: codex-local
  grader: openai-grader

Use id for the stable AgentV target identity, and use provider to select the adapter, that is, the control boundary AgentV drives. runtime describes where the provider runs: use the shorthand host for the current machine, or the object form when you need mode: host | profile | sandbox plus runtime-specific settings. Provider settings belong under config; process-backed coding-agent providers take config.command as a non-empty argv array.
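To make the shorthand concrete, here is a minimal TypeScript sketch of how a loader might normalize runtime and check the argv rule. The type names and normalizeTarget are hypothetical, not AgentV's API. The sketch assumes the string host is equivalent to the object { mode: "host" }.

```typescript
// Hypothetical types mirroring the fields described above; not AgentV's own API.
type RuntimeMode = "host" | "profile" | "sandbox";

interface RuntimeObject {
  mode: RuntimeMode;
  [setting: string]: unknown; // runtime-specific settings (engine, image, mounts, ...)
}

interface RawTarget {
  id: string;
  provider: string;
  runtime: "host" | RuntimeObject;
  config?: { command?: unknown; [key: string]: unknown };
}

interface NormalizedTarget {
  id: string;
  provider: string;
  runtime: RuntimeObject;
  config: Record<string, unknown>;
}

const MODES: readonly RuntimeMode[] = ["host", "profile", "sandbox"];

export function normalizeTarget(raw: RawTarget): NormalizedTarget {
  // Assumption: the `host` shorthand expands to the object form with mode: host.
  const runtime: RuntimeObject = raw.runtime === "host" ? { mode: "host" } : raw.runtime;
  if (!MODES.includes(runtime.mode)) {
    throw new Error(`target ${raw.id}: unknown runtime mode ${String(runtime.mode)}`);
  }
  const config = raw.config ?? {};
  // Process-backed providers take argv as a non-empty array of strings.
  if (config.command !== undefined) {
    const cmd = config.command;
    if (!Array.isArray(cmd) || cmd.length === 0 || !cmd.every((a) => typeof a === "string")) {
      throw new Error(`target ${raw.id}: config.command must be a non-empty argv array`);
    }
  }
  return { id: raw.id, provider: raw.provider, runtime, config };
}
```

Normalizing early means later code only ever sees the object form, so runtime-specific settings can be read without re-checking for the shorthand.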

Runtime Modes

Use runtime: host when you want AgentV to run the target exactly as it is installed on the current machine. This is the best fit for local research, subscription-auth workflows, and evaluating the same CLI profile an engineer uses manually.

Use runtime.mode: profile when the target still runs as a host process but should use an isolated home/config directory, such as a dedicated CODEX_HOME or HOME.

Use runtime.mode: sandbox when the target should run inside a separate execution substrate. The built-in sandbox runner currently supports Docker for provider: cli only. Provider-specific coding-agent adapters such as codex-cli, claude-cli, copilot-cli, and pi-cli return a structured unsupported-target error until their transcript parsers are wired through sandbox-aware runners, so the codex-cli example below shows the configuration shape rather than a setup that runs today.

targets:
  - id: codex-sandbox
    provider: codex-cli
    runtime:
      mode: sandbox
      engine: docker
      image: ghcr.io/acme/codex-agent:sha256
      workdir: /workspace
      network: none
      mounts:
        - source: ./workspace
          target: /workspace
          access: rw
        - source: ./.agentv/results
          target: /results
          access: rw
      env:
        AGENTV_RESULT_DIR: /results
      secrets:
        OPENAI_API_KEY: ${{ OPENAI_API_KEY }}
    config:
      command: ["codex", "exec", "--json"]
      timeout_seconds: 300

Sandbox mode does not inherit host credentials by default. Mount only the workspace, results, cache, or credential paths the target needs, and pass only the environment variables and secrets listed under runtime.env and runtime.secrets. Install the target CLI by using an image that already contains it or by adding explicit setup under runtime.setup; locate the CLI with config.command.
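As an illustration of the no-inheritance rule, the sketch below turns a sandbox runtime block into docker run arguments. It is a hypothetical mapping, not AgentV's runner. It assumes mounts become -v source:target:rw|ro bind mounts and that runtime.env values are passed inline with -e. Secrets are passed as -e NAME with no value, which makes Docker read the value from the docker client's own environment, so secret values never appear in argv. Nothing from the host environment is forwarded unless it is listed.

```typescript
import * as path from "node:path";

// Hypothetical shape of the sandbox runtime block shown above.
interface Mount { source: string; target: string; access: "ro" | "rw"; }
interface SandboxRuntime {
  mode: "sandbox";
  engine: "docker";
  image: string;
  workdir?: string;
  network?: string;
  mounts?: Mount[];
  env?: Record<string, string>;
  secrets?: Record<string, string>; // values already interpolated from ${{ ... }}
}

export function dockerInvocation(
  rt: SandboxRuntime,
  command: string[],
  baseDir: string,
): { args: string[]; env: Record<string, string> } {
  const args = ["run", "--rm"];
  if (rt.network) args.push("--network", rt.network);
  if (rt.workdir) args.push("-w", rt.workdir);
  for (const m of rt.mounts ?? []) {
    // Bind mounts need absolute host paths, so resolve relative sources.
    args.push("-v", `${path.resolve(baseDir, m.source)}:${m.target}:${m.access}`);
  }
  for (const [k, v] of Object.entries(rt.env ?? {})) args.push("-e", `${k}=${v}`);
  // Secrets: pass only the name; the docker client supplies the value from its env.
  const clientEnv: Record<string, string> = {};
  for (const [k, v] of Object.entries(rt.secrets ?? {})) {
    args.push("-e", k);
    clientEnv[k] = v;
  }
  args.push(rt.image, ...command);
  // The caller adds whatever the docker client itself needs (PATH, DOCKER_HOST, ...).
  return { args, env: clientEnv };
}
```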

For CI, API-key or explicitly injected secret auth is the most reproducible path. Subscription OAuth can work in a sandbox only when you intentionally mount or seed the relevant profile directory into the sandbox. That makes the run less portable than API-key CI and should be reserved for workflows where matching a local subscription profile is the point of the evaluation.

Inline and decomposed forms are equivalent. This single-file config:

targets:
  - id: codex-local
    provider: codex-app-server
    runtime: host
    config:
      command: ["codex", "app-server"]
      model: gpt-5-codex

graders:
  - id: openai-grader
    provider: openai
    config:
      model: gpt-5-mini

tests:
  - id: smoke
    input: Fix the failing test.

defaults:
  target: codex-local
  grader: openai-grader

can be decomposed like this:

targets: file://targets.yaml
graders: file://graders.yaml
tests: file://tests.yaml
defaults: file://defaults.yaml

Referenced field files contain the field value directly. For example, targets.yaml contains a bare array, not a mapping wrapped in a top-level targets: key:

- id: codex-local
  provider: codex-app-server
  runtime: host
  config:
    command: ["codex", "app-server"]
    model: gpt-5-codex

Likewise, defaults.yaml contains the defaults mapping directly:

target: codex-local
grader: openai-grader

File refs are optional. Use them when a config field is large, reused, or owned by a separate team; keep fields inline when that is easier to read.
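The normalization rule fits in a few lines. In this sketch resolveFieldRefs and its loadFile parameter are hypothetical. It assumes refs resolve relative to the referencing file's directory. loadFile stands in for reading and parsing YAML with a parser of your choice, which keeps the example dependency-free.

```typescript
import * as path from "node:path";

// Hypothetical loader: reads and parses the referenced YAML file.
type LoadFile = (absPath: string) => unknown;

const FILE_REF = "file://";

export function resolveFieldRefs(
  config: Record<string, unknown>,
  baseDir: string,
  loadFile: LoadFile,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, value] of Object.entries(config)) {
    if (typeof value === "string" && value.startsWith(FILE_REF)) {
      // The referenced file holds the field value itself (e.g. a bare array for targets).
      out[field] = loadFile(path.resolve(baseDir, value.slice(FILE_REF.length)));
    } else {
      out[field] = value; // inline values pass through unchanged
    }
  }
  return out;
}
```

Because both branches yield the same value for a field, later stages cannot tell whether it was written inline or by reference. That is what makes the two forms equivalent.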

Environment Variables

Use ${{ VARIABLE_NAME }} syntax to reference values from your environment. AgentV reads exported process environment variables directly, and it also loads .env files from the eval directory hierarchy when present:

targets:
  - id: my-target
    provider: anthropic
    runtime: host
    config:
      api_key: ${{ ANTHROPIC_API_KEY }}
      model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files and avoids requiring a CI step that rewrites already-exported secrets into .env.
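Interpolation itself is string substitution. The sketch below is an illustration only: interpolate is a hypothetical name. Throwing on an unset variable, rather than substituting an empty string, is an assumption and not documented AgentV behavior. The env argument would be the process environment merged with any loaded .env values.

```typescript
// Matches ${{ NAME }} with optional whitespace inside the braces.
const PLACEHOLDER = /\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

export function interpolate(text: string, env: Record<string, string | undefined>): string {
  return text.replace(PLACEHOLDER, (_match: string, name: string) => {
    const value = env[name];
    if (value === undefined) {
      // Assumption: fail loudly instead of silently producing an empty api_key.
      throw new Error(`environment variable ${name} is not set`);
    }
    return value;
  });
}
```

Substituting into already-parsed string values, rather than into raw YAML text, avoids a secret that happens to contain YAML syntax changing the document's structure.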

Supported Providers

Provider	Type	Description
`openai`	LLM	OpenAI API, or an OpenAI-compatible endpoint via config.base_url
`azure`	LLM	Azure OpenAI
`anthropic`	LLM	Anthropic Claude API
`gemini`	LLM	Google Gemini
`claude-cli`	Agent	Claude CLI subprocess
`claude-sdk`	Agent	Claude Agent SDK in an isolated child runner
`codex-cli`	Agent	Codex CLI subprocess
`codex-app-server`	Agent	Codex app-server subprocess
`codex-sdk`	Agent	Codex SDK in an isolated child runner
`copilot-cli`	Agent	Copilot CLI subprocess
`copilot-log`	Agent	Passive Copilot CLI session log reader
`copilot-sdk`	Agent	Copilot SDK in an isolated child runner
`pi-sdk`	Agent	Pi SDK in an isolated child runner
`pi-cli`	Agent	Pi CLI subprocess
`pi-rpc`	Agent	Pi RPC subprocess over stdio
`vscode`	Agent	VS Code with Copilot
`vscode-insiders`	Agent	VS Code Insiders
`cli`	Agent	Any CLI command — see CLI Provider
`mock`	Testing	Explicit mock target for examples and tests

Referencing Targets in Evals

Select the system under test with defaults.target, top-level target, or CLI --target, depending on the command flow. Test cases do not choose targets; split target-specific cases into separate eval suites, select them with tags/filters, or run the same eval with different --target values.

target: local-openai

tests:
  - id: test-1
  - id: test-2

The string is a configured target id. Use object form when an eval needs a local target variant:

target:
  id: codex-high-reasoning
  provider: codex-app-server
  runtime: host
  config:
    command: ["codex", "app-server"]
    model: gpt-5-codex
    reasoning_effort: high
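Here is a minimal sketch of the selection logic. It assumes the precedence CLI --target, then the eval's top-level target, then defaults.target. The text above says only that the choice depends on the command flow, so treat this order as an assumption. A string is looked up as a configured id, and an object is used as an eval-local target. resolveTarget and its types are hypothetical.

```typescript
interface TargetDef {
  id: string;
  provider: string;
  runtime?: unknown;
  config?: Record<string, unknown>;
}
type TargetRef = string | TargetDef;

export function resolveTarget(
  configured: TargetDef[],
  sources: { cli?: string; evalTarget?: TargetRef; defaultTarget?: string },
): TargetDef {
  // Assumed precedence: CLI flag, then the eval file, then project defaults.
  const ref: TargetRef | undefined = sources.cli ?? sources.evalTarget ?? sources.defaultTarget;
  if (ref === undefined) throw new Error("no target selected");
  if (typeof ref !== "string") return ref; // eval-local object form
  const found = configured.find((t) => t.id === ref);
  if (!found) throw new Error(`unknown target id: ${ref}`);
  return found;
}
```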

Use defaults.grader for the project default grader. A specific evaluator can still choose its own grader target when the evaluator supports that override.

Lifecycle Extensions

Run non-provisioning setup at Promptfoo-compatible lifecycle points using top-level extensions. The harness materializes workspace.template and workspace.repos first, then runs beforeAll extensions. Use extensions for dependency installs, builds, fixture generation, and agent-rule staging. Use target hooks for runner-specific setup. Keep repo identity and checkout pins in workspace.repos; extensions must not become the default repo acquisition path.

extensions:
  - file://scripts/workspace.mjs:beforeAll
  - file://scripts/workspace.mjs:beforeEach
  - file://scripts/workspace.mjs:afterEach
  - file://scripts/workspace.mjs:afterAll
  - id: agentv:agent-rules
    hook: beforeAll
    skills: agent-rules/skills
    rules: agent-rules/AGENTS.md

workspace:
  template: ./workspace-templates/my-project
  hooks:
    after_each:
      reset: fast

Field	Description
`template`	Directory to copy as workspace
`extensions[]`	`file://...:beforeAll`, `beforeEach`, `afterEach`, `afterAll`, or `agentv:agent-rules`
`hooks.after_each.reset`	Reset mode: `none`, `fast`, `strict`
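The string form of an extension entry packs a module path and a lifecycle point into one value. Below is a hypothetical parser for it. It assumes the lifecycle point follows the last colon. parseExtensionRef and its result shape are illustrative only, and object entries such as agentv:agent-rules are passed through unchanged.

```typescript
const LIFECYCLE_POINTS = ["beforeAll", "beforeEach", "afterEach", "afterAll"] as const;
type LifecyclePoint = (typeof LIFECYCLE_POINTS)[number];

type ParsedExtension =
  | { kind: "file"; modulePath: string; hook: LifecyclePoint }
  | { kind: "builtin"; entry: Record<string, unknown> };

function isLifecyclePoint(s: string): s is LifecyclePoint {
  return (LIFECYCLE_POINTS as readonly string[]).includes(s);
}

export function parseExtensionRef(entry: string | Record<string, unknown>): ParsedExtension {
  if (typeof entry !== "string") return { kind: "builtin", entry }; // e.g. id: agentv:agent-rules
  if (!entry.startsWith("file://")) throw new Error(`unsupported extension ref: ${entry}`);
  const rest = entry.slice("file://".length);
  const sep = rest.lastIndexOf(":");
  const hook = sep >= 0 ? rest.slice(sep + 1) : "";
  if (sep <= 0 || !isLifecyclePoint(hook)) {
    throw new Error(`extension ref needs a :beforeAll|beforeEach|afterEach|afterAll suffix: ${entry}`);
  }
  return { kind: "file", modulePath: rest.slice(0, sep), hook };
}
```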

Lifecycle order: template copy → repo materialization → extensions.beforeAll → target hooks.before_all → git baseline → (extensions.beforeEach → target hooks.before_each → agent runs → file changes captured → target hooks.after_each → extensions.afterEach → workspace.hooks.after_each.reset) × N tests → target hooks.after_all → extensions.afterAll → cleanup
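For the default shared workspace, the order above can be checked with a small simulation. The sketch only records stage names for a list of test ids and executes nothing. lifecycleOrder is a hypothetical helper, not part of AgentV.

```typescript
export function lifecycleOrder(testIds: string[]): string[] {
  // Suite-level setup runs once, before any test.
  const steps: string[] = [
    "template copy",
    "repo materialization",
    "extensions.beforeAll",
    "target hooks.before_all",
    "git baseline",
  ];
  // The per-test block repeats once per test id.
  for (const id of testIds) {
    steps.push(
      `${id}: extensions.beforeEach`,
      `${id}: target hooks.before_each`,
      `${id}: agent runs`,
      `${id}: file changes captured`,
      `${id}: target hooks.after_each`,
      `${id}: extensions.afterEach`,
      `${id}: workspace.hooks.after_each.reset`,
    );
  }
  // Suite-level teardown runs once, after the last test.
  steps.push("target hooks.after_all", "extensions.afterAll", "cleanup");
  return steps;
}
```

For two tests this yields 22 stages: 5 setup steps, 7 per test, and 3 teardown steps.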

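Shared workspace: with the default workspace.scope: suite, the workspace is created once and shared across all tests in the suite. Use workspace.hooks.after_each.reset (fast or strict) to reset state between tests, or workspace.scope: attempt when every attempt needs a clean workspace.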

Error handling:

beforeAll / beforeEach extension failure aborts the affected run with an error result
afterAll / afterEach extension failure is non-fatal
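The asymmetry above can be expressed as one wrapper. Failures at setup points propagate and abort the run. Failures at teardown points are recorded and swallowed so later hooks and cleanup still run. runExtensionPoint and the warnings array are hypothetical names used for illustration.

```typescript
type Point = "beforeAll" | "beforeEach" | "afterEach" | "afterAll";

export async function runExtensionPoint(
  point: Point,
  fn: () => Promise<void>,
  warnings: string[],
): Promise<void> {
  try {
    await fn();
  } catch (err) {
    if (point === "beforeAll" || point === "beforeEach") {
      // Setup failed: abort the affected run; the caller records an error result.
      throw err;
    }
    // Teardown failed: keep going so later hooks and cleanup still run.
    warnings.push(`${point} failed: ${err instanceof Error ? err.message : String(err)}`);
  }
}
```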

File hook context: Exported functions receive a JSON-compatible object with case context:

{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": { "repo": "sympy/sympy", "source_commit": "abc123" }
}
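On the script side, each lifecycle point is an exported function of the referenced module. The sketch below uses TypeScript for the type annotations; strip them for a .mjs file. The HookContext interface is transcribed from the example above. toHookContext is a hypothetical defensive parser, and the beforeEach body is a placeholder. The sketch assumes the context object is the function's first argument and that an async function is awaited, which this page does not state explicitly.

```typescript
// Fields transcribed from the example context above.
export interface HookContext {
  workspace_path: string;
  test_id: string;
  eval_run_id: string;
  case_input: string;
  case_metadata: Record<string, unknown>;
}

// Defensive parse, since the object arrives as JSON-compatible data.
export function toHookContext(value: unknown): HookContext {
  const v = value as Partial<HookContext> | null;
  for (const key of ["workspace_path", "test_id", "eval_run_id", "case_input"] as const) {
    if (typeof v?.[key] !== "string") throw new Error(`hook context missing string field ${key}`);
  }
  return { ...(v as HookContext), case_metadata: v?.case_metadata ?? {} };
}

export async function beforeEach(raw: unknown): Promise<void> {
  const ctx = toHookContext(raw);
  // Placeholder setup: e.g. stage fixtures under ctx.workspace_path for ctx.test_id.
  console.log(`[${ctx.eval_run_id}] preparing ${ctx.test_id} in ${ctx.workspace_path}`);
}
```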

workspace.hooks remains the reset-policy home for after_each.reset. Legacy command hooks still parse for existing local suites, but new portable evals should use extensions for executable setup.

Repository Lifecycle

Materialize git repositories into the shared eval workspace. Repo entries declare provenance only: the repository identity and checkout pin. AgentV resolves acquisition separately using registered projects, configured mirrors, its git cache, and finally remote clone. Define repos at the suite level or per test:

workspace:
  repos:
    - path: ./my-repo
      repo: https://github.com/org/repo.git
      commit: main
      ancestor: 1          # check out the parent commit
  hooks:
    after_each:
      reset: fast             # none | fast | strict
  scope: suite                # suite (default) | attempt

repo declares the repository identity. Acquisition is harness-owned: AgentV first applies configured repo_resolvers, then uses the built-in git path of registered projects, configured mirrors, AgentV’s git cache, and remote clone. See Workspace Architecture for the resolver order, command resolver protocol, and git_cache.mirrors config.
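The acquisition order is a first-match chain. The sketch below models each source (repo_resolvers, registered projects, mirrors, git cache, remote clone) as a function. Each function returns a local checkout path, or null to defer to the next source. The Resolver type and acquire function are hypothetical; Workspace Architecture documents the real command resolver protocol.

```typescript
interface RepoPin { repo: string; commit: string; }
// Returns a local path when the source can provide the repo, or null to defer.
type Resolver = { name: string; resolve: (pin: RepoPin) => Promise<string | null> };

export async function acquire(
  pin: RepoPin,
  chain: Resolver[],
): Promise<{ source: string; path: string }> {
  for (const r of chain) {
    const found = await r.resolve(pin);
    if (found !== null) return { source: r.name, path: found }; // first match wins
  }
  throw new Error(`no source could provide ${pin.repo}@${pin.commit}`);
}
```

Keeping provenance (repo, commit) separate from acquisition means the same eval file works whether the repo comes from a local mirror or a fresh clone.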

Field	Description
`repos[].path`	Directory within the workspace to clone into
`repos[].repo`	Repository identity: full clone URL or GitHub `org/name` shorthand
`repos[].commit`	Branch, tag, or SHA to check out (default: `HEAD`)
`repos[].ancestor`	Walk N commits back from the checked-out ref (e.g., `1` for parent)
`repos[].sparse`	Sparse checkout paths
`hooks.after_each.reset`	Reset policy after each test: `none`, `fast`, `strict`
`scope`	`suite` reuses one harness-managed workspace for the suite; `attempt` creates a clean workspace for each resolved execution attempt
`hooks.enabled`	Boolean (default: `true`). Set `false` to skip all lifecycle hooks.
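The following sketch shows how a repos[] entry might be normalized before checkout. It assumes the GitHub shorthand org/name expands to https://github.com/org/name.git. It also expresses ancestor with git's rev~N syntax, which names the Nth first-parent ancestor. This page confirms neither detail, and normalizeRepoEntry is hypothetical.

```typescript
interface RepoEntry { path: string; repo: string; commit?: string; ancestor?: number; sparse?: string[]; }

// Assumption: anything shaped like `org/name` is GitHub shorthand.
const GITHUB_SHORTHAND = /^[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+$/;

export function normalizeRepoEntry(e: RepoEntry): { path: string; cloneUrl: string; rev: string } {
  const cloneUrl = GITHUB_SHORTHAND.test(e.repo) ? `https://github.com/${e.repo}.git` : e.repo;
  const base = e.commit ?? "HEAD"; // documented default
  const n = e.ancestor ?? 0;
  if (!Number.isInteger(n) || n < 0) throw new Error(`ancestor must be a non-negative integer, got ${n}`);
  return { path: e.path, cloneUrl, rev: n === 0 ? base : `${base}~${n}` };
}
```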

Use scope: attempt when mutating agents need clean filesystem state for every prompt-target-test-repeat execution. Use scope: suite when the suite intentionally shares state across tests.
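One way to picture the difference is as the key that decides when a fresh workspace is created. The key format below is hypothetical; what matters is which dimensions it includes. Suite scope keys only on the run and suite. Attempt scope also keys on prompt, target, test, and repeat index, so no two attempts share a directory.

```typescript
interface Attempt {
  runId: string;
  suite: string;
  prompt: string;
  target: string;
  testId: string;
  repeat: number;
}

export function workspaceKey(scope: "suite" | "attempt", a: Attempt): string {
  const base = `${a.runId}/${a.suite}`;
  // suite: every attempt in the suite maps to the same workspace.
  // attempt: each prompt-target-test-repeat combination gets its own.
  return scope === "suite" ? base : `${base}/${a.prompt}/${a.target}/${a.testId}/r${a.repeat}`;
}
```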

Existing local workspaces: do not commit local paths in eval YAML. Use --workspace-path /path/to/workspace for a one-off run, or put execution.workspace_path in .agentv/config.local.yaml.

Workspace command:

agentv workspace deps <eval-paths> — scan eval files and output a JSON manifest of required git repos (for CI pre-cloning)

Common patterns:

# Pinned commit
workspace:
  repos:
    - path: ./repo
      repo: https://github.com/org/repo.git
      commit: abc123def

# Multi-repo shared workspace with reset
workspace:
  repos:
    - path: ./frontend
      repo: https://github.com/org/frontend.git
    - path: ./backend
      repo: https://github.com/org/backend.git
  hooks:
    after_each:
      reset: fast

# GitHub shorthand with a pinned commit
workspace:
  repos:
    - path: ./repo
      repo: org/repo
      commit: abc123def

Cleanup Behavior

Default finish behavior:

Success: cleanup
Failure: keep

CLI overrides:

--retain-on-success keep|cleanup
--retain-on-failure keep|cleanup
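The finish rule reduces to a default per outcome plus an optional override. In this sketch the option names mirror the two flags, and finishAction is hypothetical.

```typescript
type Retain = "keep" | "cleanup";

export function finishAction(
  outcome: "success" | "failure",
  flags: { retainOnSuccess?: Retain; retainOnFailure?: Retain } = {},
): Retain {
  // Defaults: clean up after success, keep the workspace after failure for debugging.
  return outcome === "success"
    ? (flags.retainOnSuccess ?? "cleanup")
    : (flags.retainOnFailure ?? "keep");
}
```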

cwd

Use cwd on a target to run in an existing directory (shared across tests). If not set, the eval file’s directory is used as the working directory.
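The fallback can be sketched in a few lines. Resolving a relative cwd against the eval file's directory is an assumption, and targetWorkingDir is a hypothetical name.

```typescript
import * as path from "node:path";

export function targetWorkingDir(evalFile: string, cwd?: string): string {
  const evalDir = path.dirname(path.resolve(evalFile));
  // Assumption: a relative cwd is interpreted relative to the eval file's directory.
  return cwd === undefined ? evalDir : path.resolve(evalDir, cwd);
}
```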

Target Hooks

Eval files can define per-target hooks that run setup/teardown scripts to customize the workspace for each target variant. This enables comparing different harness configurations (e.g., baseline vs with-plugins) in a single eval file.

Targets do not declare repos. Repositories belong to the shared eval workspace so every target runs in the same world; target hooks customize the harness under evaluation. Use hooks for per-target setup such as enabling wrappers or changing provider-local config. Keep installs, builds, fixture generation, and case setup in top-level lifecycle extensions.

Target hooks can be scoped to an eval-local target object:

target:
  extends: default
  hooks:
    before_each:
      command: ["setup-plugins.sh", "skills"]

Hook execution order

Target hooks run after extension hooks on setup and before them on teardown, matching the lifecycle order above:

Extension beforeAll
Target before_all
For each test:
- Extension beforeEach
- Target before_each
- Test executes
- Target after_each
- Extension afterEach
- Workspace reset (workspace.hooks.after_each.reset)
Target after_all
Extension afterAll

Hook schema

Target hooks follow the same schema as workspace hooks:

hooks:
  before_all:
    command: ["setup.sh"]       # Command array or shell string
    timeout_ms: 60000           # Optional timeout
    cwd: "./scripts"            # Optional working directory
  before_each:
    command: "echo setup"       # String shorthand (runs via sh -c)
  after_each:
    command: ["cleanup.sh"]
  after_all:
    command: ["teardown.sh"]
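The two command forms differ only in how argv is built. An array is executed directly, while a string is handed to sh -c. The sketch below shows that normalization plus a run step using Node's child_process.spawnSync, which accepts cwd and timeout options. toArgv and runHook are hypothetical names; a relative cwd here resolves against the calling process's working directory.

```typescript
import { spawnSync } from "node:child_process";

interface HookSpec { command: string | string[]; timeout_ms?: number; cwd?: string; }

export function toArgv(command: string | string[]): string[] {
  if (typeof command === "string") return ["sh", "-c", command]; // string shorthand
  if (command.length === 0) throw new Error("hook command array must not be empty");
  return command; // argv form: no shell involved
}

export function runHook(spec: HookSpec): number {
  const [file, ...args] = toArgv(spec.command);
  const result = spawnSync(file, args, {
    cwd: spec.cwd,
    timeout: spec.timeout_ms, // undefined means no timeout
    stdio: "inherit",
  });
  if (result.error) throw result.error; // e.g. command not found or timed out
  return result.status ?? 1; // null status means the child was killed by a signal
}
```

Prefer the array form for anything that takes paths or user-supplied values, since it avoids shell quoting issues entirely.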