Rate Limiting in Sprint Execution¶
This guide provides practical strategies for managing GitHub API rate limits during rapid sprint execution, automation workflows, and batch operations.
Quick Facts¶
- Authenticated rate limit: 5,000 requests per hour
- Per-minute average: ~83 requests per minute
- Reset window: Hourly (UTC-based)
- Critical threshold: ≤ 50 remaining requests triggers exponential backoff
- Warning threshold: ≤ 200 remaining requests logs a warning
Rate Limit Context¶
The GitHub REST API enforces rate limits per authenticated user or OAuth token, not per repository or organization. This means:
- All API calls against `owner/repo` count toward your token's quota
- CI/CD workflows using repository secrets inherit the quota from that token
- Concurrent API calls from multiple scripts/agents deplete quota faster than sequential calls
- Each workflow dispatch, issue creation, comment, label assignment, and PR check consumes quota
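Because the quota is scoped to the token, a quick way to confirm which budget a script draws from is to query `/rate_limit` under each credential (this endpoint itself does not consume quota). A minimal sketch; `$tokenA` and `$tokenB` are placeholders for your own credentials:
# Each token sees its own independent quota for the same API
$env:GH_TOKEN = $tokenA
gh api rate_limit --jq '.rate.remaining'   # quota for token A
$env:GH_TOKEN = $tokenB
gh api rate_limit --jq '.rate.remaining'   # token B's quota is separate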
During sprint execution with multiple workflows, agents making API calls, and batch operations, it's easy to exhaust 5,000 requests in minutes, leading to:
- Blocked PRs (checks hang waiting for rate limit reset)
- Failed deployments (workflow dispatch throttled)
- Stalled agent operations (API calls time out)
Sprint Execution Guidance: 60-Second Minimum Spacing¶
Rule: Space major API-consuming operations ≥ 60 seconds apart.
Major operations include:
- Workflow dispatch (`gh workflow run`)
- Creating issues/PRs
- Bulk label assignments
- Running validation scripts with multiple API calls
- Triggering status checks
Why 60 seconds?
- Leaves headroom for background polling (code review agents, metrics collection)
- Allows gh CLI automatic retries to complete
- Gives rate limit headers time to update
- Prevents CI queue buildup
Example sprint sequence:
10:00:00 — Dispatch workflow 1 (API: ~5 requests)
10:01:00 — Create issue with labels (API: ~3 requests)
10:02:00 — Dispatch workflow 2 (API: ~5 requests)
10:03:00 — Run validation script (API: ~20 requests)
10:04:00 — Create PR (API: ~4 requests)
---
Total: ~4 minutes elapsed, ~37 API requests (well under the ~83/minute average)
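A minimal sketch of enforcing that spacing in a script; the operation list is illustrative, not prescribed:
# Hypothetical list of major operations, each wrapped in a scriptblock
$operations = @(
    { gh workflow run ci.yml --ref main },
    { gh issue create --title "Sprint task" --label sprint },
    { gh workflow run test.yml --ref main }
)
for ($i = 0; $i -lt $operations.Count; $i++) {
    & $operations[$i]
    if ($i -lt $operations.Count - 1) {
        Start-Sleep -Seconds 60   # enforce the 60-second minimum spacing
    }
}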
Polling Strategies with Exponential Backoff¶
Scenario: You're polling a workflow run status until it completes.
Backoff formula:
delay = min(initialDelay * (multiplier ^ attempt), maxDelay)
Initial: 30 seconds
Multiplier: 1.5
Max delay: 90 seconds
Sequence:
Attempt 1: Wait 30s, then poll
Attempt 2: Wait 45s, then poll
Attempt 3: Wait 67.5s, then poll
Attempt 4+: Wait 90s, then poll
PowerShell example:
function Invoke-WithExponentialBackoff {
    param(
        [scriptblock]$Action,
        [int]$MaxAttempts = 5,
        [int]$InitialDelaySeconds = 30,
        [decimal]$Multiplier = 1.5,
        [int]$MaxDelaySeconds = 90
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        } catch {
            if ($attempt -eq $MaxAttempts) { throw }
            # delay = min(initialDelay * multiplier^(attempt-1), maxDelay)
            $delay = [math]::Min(
                $InitialDelaySeconds * [math]::Pow($Multiplier, $attempt - 1),
                $MaxDelaySeconds
            )
            Write-Warning "Attempt $attempt failed. Waiting $($delay)s before retry..."
            Start-Sleep -Seconds $delay
        }
    }
}
# Usage. Note that gh is a native command, so a failure does not throw on its own;
# check $LASTEXITCODE inside the scriptblock to turn a non-zero exit into an exception.
$result = Invoke-WithExponentialBackoff {
    $out = gh api repos/owner/repo/actions/runs/12345 --jq '.status'
    if ($LASTEXITCODE -ne 0) { throw "gh api failed: $out" }
    $out
}
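The function above retries failed calls; for the polling scenario itself (waiting until a run completes), the same delay formula drives a status loop. A sketch, with an illustrative repository path and run ID:
$attempt = 0
do {
    # Delay grows 30s -> 45s -> 67.5s -> 90s (capped), per the sequence above
    $delay = [math]::Min(30 * [math]::Pow(1.5, $attempt), 90)
    Start-Sleep -Seconds $delay
    $status = gh api repos/owner/repo/actions/runs/12345 --jq '.status'
    $attempt++
} while ($status -ne 'completed' -and $attempt -lt 20)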
Batch Operations: 5 per 60 Seconds¶
Rule: Don't fire more than 5 API-consuming operations in a 60-second window.
Guidance:
| Operation Type | Requests per Call | Max Batch Size | Recommended Interval |
|---|---|---|---|
| Workflow dispatch | 2–5 | 5 | 60s between batches |
| Issue creation | 3–5 (with labels) | 5 | 60s between batches |
| PR comment | 1–2 | 10 | 30s between batches |
| Label assignment | 1 | 20 | 10s between batches |
Example: Dispatching 20 workflow runs
$workflows = @( "ci.yml", "test.yml", "build.yml", "deploy.yml", "validate.yml" )
$dispatch = $workflows * 4   # repeat 4 times for 20 total
for ($i = 0; $i -lt $dispatch.Count; $i += 5) {
    $batch = $dispatch[$i..($i + 4)]   # take 5 at a time
    foreach ($wf in $batch) {
        Write-Host "Dispatching $wf..."
        gh workflow run $wf --ref main
    }
    if ($i + 5 -lt $dispatch.Count) {
        Write-Host "Batch complete. Waiting 60s before next batch..."
        Start-Sleep -Seconds 60
    }
}
Expected load: 20 runs × 2 requests each = 40 API calls, spread across 4 batches with 60s pauses between them (~3–4 minutes total).
Logging Rate Limit Headers¶
Every automation script should log rate limit information for debugging.
PowerShell pattern:
function Get-ApiWithRateLimitLogging {
    param([string]$Endpoint)
    # --include prepends the HTTP status line and response headers to the body
    $response = gh api $Endpoint --include 2>&1
    # Pull a numeric header value out of the raw response (header names are case-insensitive)
    function Get-HeaderValue([object[]]$Lines, [string]$Name) {
        $match = $Lines | Select-String -Pattern "(?i)^$($Name):\s*(\d+)" | Select-Object -First 1
        if ($match) { [long]$match.Matches[0].Groups[1].Value }
    }
    $remaining = Get-HeaderValue $response 'X-Ratelimit-Remaining'
    $reset     = Get-HeaderValue $response 'X-Ratelimit-Reset'
    $limit     = Get-HeaderValue $response 'X-Ratelimit-Limit'
    Write-Host "Rate limit: $remaining / $limit remaining (resets at $reset)"
    if ($remaining -lt 50) {
        $resetTime = [DateTimeOffset]::FromUnixTimeSeconds($reset).UtcDateTime
        Write-Error "Critical: Only $remaining requests remaining! Resetting at $resetTime"
        # Sleep until just past the reset time
        Start-Sleep -Seconds ([math]::Max(0, ($resetTime - [DateTime]::UtcNow).TotalSeconds + 5))
    }
    elseif ($remaining -lt 200) {
        Write-Warning "Warning: Only $remaining requests remaining in quota"
    }
    return $response
}
CI/CD integration (GitHub Actions):
- name: Check rate limit before batch operations
  env:
    GH_TOKEN: ${{ github.token }}   # gh requires a token when running in Actions
  run: |
    # One call returns all three fields; hitting /rate_limit does not consume quota
    read -r REMAINING LIMIT RESET <<< "$(gh api rate_limit --jq '.rate | "\(.remaining) \(.limit) \(.reset)"')"
    echo "Rate limit: $REMAINING / $LIMIT (resets at $(date -d "@$RESET"))"
    if [ "$REMAINING" -lt 50 ]; then
      echo "ERROR: Only $REMAINING requests remaining!"
      exit 1
    fi
Rate Limit Headers Explained¶
| Header | Meaning | Action |
|---|---|---|
| `X-RateLimit-Limit` | Total quota this window | Monitor for changes (usually 5000) |
| `X-RateLimit-Remaining` | Requests left | Pause if < 50; warn if < 200 |
| `X-RateLimit-Used` | Requests consumed | Track for reporting |
| `X-RateLimit-Reset` | Unix timestamp of reset | Wait until this time if exhausted |
| `X-RateLimit-Resource` | Quota being tracked | Usually `core` for REST API |
CLI commands to inspect:
# Check current quota
gh api rate_limit
# Parse structured output
gh api rate_limit --jq '.rate | "Used: \(.used), Remaining: \(.remaining), Reset: \(.reset)"'
# Monitor rate limit in a loop (useful during batch operations)
for ($i = 0; $i -lt 10; $i++) {
    $remaining = gh api rate_limit --jq '.rate.remaining'
    Write-Host "[$(Get-Date)] Remaining: $remaining"
    Start-Sleep -Seconds 10
}
Practical Example: Sprint Validation Script¶
Imagine you're validating 50 repositories. Here's a robust approach:
function Validate-RepositoriesSafely {
    param(
        [string[]]$Repos,
        [int]$BatchSize = 5,
        [int]$BatchIntervalSeconds = 60
    )
    for ($i = 0; $i -lt $Repos.Count; $i += $BatchSize) {
        $batch = $Repos[$i..($i + $BatchSize - 1)]
        Write-Host "Processing batch: $($i / $BatchSize + 1) / $([math]::Ceiling($Repos.Count / $BatchSize))"
        # Check rate limit before each batch (cast to [int]: gh returns a string,
        # and PowerShell would otherwise compare it as a string, not a number)
        $remaining = [int](gh api rate_limit --jq '.rate.remaining')
        if ($remaining -lt 50) {
            Write-Error "Quota exhausted ($remaining remaining). Aborting."
            exit 1
        }
        foreach ($repo in $batch) {
            Write-Host "  Validating $repo..."
            gh api repos/$repo --jq '.description'
        }
        if ($i + $BatchSize -lt $Repos.Count) {
            Write-Host "Batch complete. Waiting $BatchIntervalSeconds seconds before next batch..."
            Start-Sleep -Seconds $BatchIntervalSeconds
        }
    }
    Write-Host "All repositories validated successfully."
}
Common Failure Modes¶
| Symptom | Root Cause | Fix |
|---|---|---|
| "API rate limit exceeded" error mid-sprint | Concurrent scripts, no backoff | Implement backoff; space operations 60s apart |
| PR checks hang for 1 hour | Rate limit exhausted, auto-reset pending | Check rate limit before workflows; add quota buffer |
| Workflow dispatch "pending" forever | Status check polling depleted quota | Reduce polling frequency; increase backoff delay |
| Flaky tests due to timeout | Script waiting for rate limit reset | Add exponential backoff; log rate limit headers |
Checklist for Sprint Planning¶
- [ ] All polling scripts implement exponential backoff (30–90s)
- [ ] Workflow dispatches are spaced ≥ 60s apart
- [ ] Batch operations respect 5-per-60s limit
- [ ] Rate limit check runs before major operations (e.g., `gh api rate_limit`)
- [ ] Rate limit headers are logged in CI (for debugging throttling issues)
- [ ] If ≥ 200 workflows run in sprint, consider using GitHub Apps (separate quota)
- [ ] Dry-run test with `--dry-run` flag where available before production run
AI Agent / Copilot Fleet Rate Limits¶
This is a separate constraint from the GitHub REST API limit above. Enterprise Copilot has concurrent model session limits enforced at the org level. Exceeding them returns HTTP 429 from the AI model endpoint — not from the GitHub API.
How it differs from the GitHub API limit¶
| Dimension | GitHub REST API | Copilot / AI Model |
|---|---|---|
| Unit | Requests per hour per token | Concurrent sessions per org |
| Limit | 5,000 req/hr | ~3–4 simultaneous long-running sessions |
| Reset | Hourly (rolling) | Immediate once sessions complete |
| Error | `API rate limit exceeded` | HTTP 429 on model inference |
| Scope | Your PAT / GitHub App | Your enterprise Copilot seat allocation |
Fleet Concurrency Rules¶
These are enforced limits, not suggestions:
| Scenario | Max concurrent agents | Notes |
|---|---|---|
| Background fleet agents (general-purpose) | 3 safe, 4 risky, 5+ will 429 | Each agent holds a model session for its full duration |
| Short tasks (< 2 min) | Up to 5 | Session released quickly |
| Long tasks (> 10 min) | Max 3 | Hold sessions for extended periods |
| Mixed (short + long) | 3 long + 1–2 short | Budget carefully |
Wave Pattern for Fleet Sprints¶
Dispatch agents in waves — never all at once:
Wave 1: dispatch 3–4 agents → wait for ALL to complete
Wave 2: dispatch next 3–4 → wait for ALL to complete
Wave 3: etc.
Minimum inter-agent delay: Wait at least 15 seconds between dispatching individual agents within a wave. This staggers session establishment and reduces burst pressure on the model endpoint.
# Good: staggered dispatch within a wave
foreach ($task in $wave) {
    Start-Agent $task
    Start-Sleep -Seconds 15   # stagger session starts
}
# Then wait for all to complete before Wave 2
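To make "wait for ALL to complete" concrete, here is a hedged sketch of the full wave loop. `Start-Agent` is the placeholder used above; `Wait-Agent` is an equally hypothetical stand-in for however your fleet tooling blocks on agent completion.
# $waves is a list of task batches, e.g. @(@($t1, $t2, $t3), @($t4, $t5, $t6))
foreach ($wave in $waves) {
    foreach ($task in $wave) {
        Start-Agent $task         # placeholder dispatch, as above
        Start-Sleep -Seconds 15   # stagger session starts within the wave
    }
    # Block until every agent in this wave finishes before starting the next
    $wave | ForEach-Object { Wait-Agent $_ }
}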
Recovery from a 429¶
When an agent gets a 429 from the AI model endpoint:
- Stop dispatching — do not retry immediately
- Wait 60–90 seconds — allow in-flight sessions to complete
- Check active agents — use `list_agents` to see how many are still running
- Resume with 1 fewer agent — if 4 caused a 429, next wave uses 3
# Check remaining GitHub API rate limit before dispatching the next wave
# (cast to [int]: gh returns a string, which would otherwise compare as a string)
$remaining = [int](gh api rate_limit --jq '.rate.remaining')
if ($remaining -lt 100) {
    Write-Warning "GitHub API quota low ($remaining). Pausing 60s."
    Start-Sleep -Seconds 60
}
# Also manually verify no long-running agents are still active before Wave N+1
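Putting the four steps together, a hedged sketch of the recovery loop. `Dispatch-Wave` is a hypothetical helper; assume it returns `$false` when any agent in the wave receives a 429 from the model endpoint.
# Dispatch-Wave is a placeholder: assume it returns $false on a 429 from the model endpoint
$waveSize = 4
while ($pendingTasks.Count -gt 0) {
    $wave = $pendingTasks | Select-Object -First $waveSize
    if (Dispatch-Wave $wave) {
        $pendingTasks = @($pendingTasks | Select-Object -Skip $waveSize)
    } else {
        Write-Warning "429 from the model endpoint. Cooling down 90s, shrinking next wave."
        Start-Sleep -Seconds 90
        $waveSize = [math]::Max(1, $waveSize - 1)   # resume with one fewer agent
    }
}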
Configuration¶
The default fleet concurrency is set in `.github/base-coat/agent-routing.json` (distributed to consumer repos via sync). Override `default_fleet_concurrency` to `2` for repos with heavy background CI load (code review agents, security scans) running concurrently with fleet sprints.
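The schema of `agent-routing.json` isn't reproduced here, so treat this as a hedged sketch of applying the override, assuming the file carries a top-level `default_fleet_concurrency` key:
# Assumes agent-routing.json exposes a top-level "default_fleet_concurrency" key
$configPath = '.github/base-coat/agent-routing.json'
$config = Get-Content $configPath -Raw | ConvertFrom-Json
$config.default_fleet_concurrency = 2   # lower for repos with heavy background CI
$config | ConvertTo-Json -Depth 10 | Set-Content $configPath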
Checklist for Fleet Sprints¶
- [ ] Plan wave sizes ≤ 3 for long-running tasks, ≤ 4 for short tasks
- [ ] Add 15s delay between agent dispatches within a wave
- [ ] Wait for all wave N agents to complete before starting wave N+1
- [ ] If a 429 occurs: stop, wait 90s, reduce next wave size by 1
- [ ] Check `list_agents` before any dispatch — don't add to an already-busy pool
- [ ] For sprints > 10 agents total, log a plan with wave breakdown before starting
Related Issues¶
- #451 — Concurrency control: Coordinating parallel batch operations without quota collision
- #443 — Observed problem: Sprint execution blocked due to rate limit exhaustion; caused by unspaced workflow dispatches