Rate Limiting in Sprint Execution¶
This guide provides practical strategies for managing GitHub API rate limits during rapid sprint execution, automation workflows, and batch operations.
Quick Facts¶
- Authenticated rate limit: 5,000 requests per hour
- Per-minute average: ~83 requests per minute
- Reset window: Hourly (UTC-based)
- Critical threshold: ≤ 50 remaining requests triggers exponential backoff
- Warning threshold: ≤ 200 remaining requests logs a warning
Rate Limit Context¶
The GitHub REST API enforces rate limits per authenticated user or OAuth token, not per repository or organization. This means:
- All API calls against `owner/repo` count toward your token's quota
- CI/CD workflows using repository secrets inherit the quota from that token
- Concurrent API calls from multiple scripts/agents deplete quota faster than sequential calls
- Each workflow dispatch, issue creation, comment, label assignment, and PR check consumes quota
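Because the quota is scoped to the token, a quick way to confirm which budget a script draws from is to query `/rate_limit` under each credential (this endpoint itself does not consume quota). A minimal sketch; `$tokenA` and `$tokenB` are placeholders for your own credentials:
# Each token sees its own independent quota for the same API
$env:GH_TOKEN = $tokenA
gh api rate_limit --jq '.rate.remaining'   # quota for token A
$env:GH_TOKEN = $tokenB
gh api rate_limit --jq '.rate.remaining'   # token B's quota is separate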
During sprint execution with multiple workflows, agents making API calls, and batch operations, it's easy to exhaust 5,000 requests in minutes, leading to:
- Blocked PRs (checks hang waiting for rate limit reset)
- Failed deployments (workflow dispatch throttled)
- Stalled agent operations (API calls time out)
Sprint Execution Guidance: 60-Second Minimum Spacing¶
Rule: Space major API-consuming operations ≥ 60 seconds apart.
Major operations include:
- Workflow dispatch (`gh workflow run`)
- Creating issues/PRs
- Bulk label assignments
- Running validation scripts with multiple API calls
- Triggering status checks
Why 60 seconds?
- Leaves headroom for background polling (code review agents, metrics collection)
- Allows gh CLI automatic retries to complete
- Gives rate limit headers time to update
- Prevents CI queue buildup
Example sprint sequence:
10:00:00 — Dispatch workflow 1 (API: ~5 requests)
10:01:00 — Create issue with labels (API: ~3 requests)
10:02:00 — Dispatch workflow 2 (API: ~5 requests)
10:03:00 — Run validation script (API: ~20 requests)
10:04:00 — Create PR (API: ~4 requests)
---
Total: ~4 minutes elapsed, ~37 API requests (well under the ~83/minute average)
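A minimal sketch of enforcing that spacing in a script; the operation list is illustrative, not prescribed:
# Hypothetical list of major operations, each wrapped in a scriptblock
$operations = @(
    { gh workflow run ci.yml --ref main },
    { gh issue create --title "Sprint task" --label sprint },
    { gh workflow run test.yml --ref main }
)
for ($i = 0; $i -lt $operations.Count; $i++) {
    & $operations[$i]
    if ($i -lt $operations.Count - 1) {
        Start-Sleep -Seconds 60   # enforce the 60-second minimum spacing
    }
}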
Polling Strategies with Exponential Backoff¶
Scenario: You're polling a workflow run status until it completes.
Backoff formula:
delay = min(initialDelay * (multiplier ^ attempt), maxDelay)
Initial: 30 seconds
Multiplier: 1.5
Max delay: 90 seconds
Sequence:
Attempt 1: Wait 30s, then poll
Attempt 2: Wait 45s, then poll
Attempt 3: Wait 67.5s, then poll
Attempt 4+: Wait 90s, then poll
PowerShell example:
function Invoke-WithExponentialBackoff {
    param(
        [scriptblock]$Action,
        [int]$MaxAttempts = 5,
        [int]$InitialDelaySeconds = 30,
        [decimal]$Multiplier = 1.5,
        [int]$MaxDelaySeconds = 90
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        } catch {
            if ($attempt -eq $MaxAttempts) { throw }
            # delay = min(initialDelay * multiplier^(attempt-1), maxDelay)
            $delay = [math]::Min(
                $InitialDelaySeconds * [math]::Pow($Multiplier, $attempt - 1),
                $MaxDelaySeconds
            )
            Write-Warning "Attempt $attempt failed. Waiting $($delay)s before retry..."
            Start-Sleep -Seconds $delay
        }
    }
}
# Usage. Note that gh is a native command, so a failure does not throw on its own;
# check $LASTEXITCODE inside the scriptblock to turn a non-zero exit into an exception.
$result = Invoke-WithExponentialBackoff {
    $out = gh api repos/owner/repo/actions/runs/12345 --jq '.status'
    if ($LASTEXITCODE -ne 0) { throw "gh api failed: $out" }
    $out
}
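The function above retries failed calls; for the polling scenario itself (waiting until a run completes), the same delay formula drives a status loop. A sketch, with an illustrative repository path and run ID:
$attempt = 0
do {
    # Delay grows 30s -> 45s -> 67.5s -> 90s (capped), per the sequence above
    $delay = [math]::Min(30 * [math]::Pow(1.5, $attempt), 90)
    Start-Sleep -Seconds $delay
    $status = gh api repos/owner/repo/actions/runs/12345 --jq '.status'
    $attempt++
} while ($status -ne 'completed' -and $attempt -lt 20)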
Batch Operations: 5 per 60 Seconds¶
Rule: Don't fire more than 5 API-consuming operations in a 60-second window.
Guidance:
| Operation Type | Requests per Call | Max Batch Size | Recommended Interval |
|---|---|---|---|
| Workflow dispatch | 2–5 | 5 | 60s between batches |
| Issue creation | 3–5 (with labels) | 5 | 60s between batches |
| PR comment | 1–2 | 10 | 30s between batches |
| Label assignment | 1 | 20 | 10s between batches |
Example: Dispatching 20 workflow runs
$workflows = @( "ci.yml", "test.yml", "build.yml", "deploy.yml", "validate.yml" )
$dispatch = $workflows * 4   # repeat 4 times for 20 total
for ($i = 0; $i -lt $dispatch.Count; $i += 5) {
    $batch = $dispatch[$i..($i + 4)]   # take 5 at a time
    foreach ($wf in $batch) {
        Write-Host "Dispatching $wf..."
        gh workflow run $wf --ref main
    }
    if ($i + 5 -lt $dispatch.Count) {
        Write-Host "Batch complete. Waiting 60s before next batch..."
        Start-Sleep -Seconds 60
    }
}
Expected load: 20 runs × 2 requests each = 40 API calls, spread across 4 batches with 60s pauses between them (~3–4 minutes total).
Logging Rate Limit Headers¶
Every automation script should log rate limit information for debugging.
PowerShell pattern:
function Get-ApiWithRateLimitLogging {
    param([string]$Endpoint)
    # --include prepends the HTTP status line and response headers to the body
    $response = gh api $Endpoint --include 2>&1
    # Pull a numeric header value out of the raw response (header names are case-insensitive)
    function Get-HeaderValue([object[]]$Lines, [string]$Name) {
        $match = $Lines | Select-String -Pattern "(?i)^$($Name):\s*(\d+)" | Select-Object -First 1
        if ($match) { [long]$match.Matches[0].Groups[1].Value }
    }
    $remaining = Get-HeaderValue $response 'X-Ratelimit-Remaining'
    $reset     = Get-HeaderValue $response 'X-Ratelimit-Reset'
    $limit     = Get-HeaderValue $response 'X-Ratelimit-Limit'
    Write-Host "Rate limit: $remaining / $limit remaining (resets at $reset)"
    if ($remaining -lt 50) {
        $resetTime = [DateTimeOffset]::FromUnixTimeSeconds($reset).UtcDateTime
        Write-Error "Critical: Only $remaining requests remaining! Resetting at $resetTime"
        # Sleep until just past the reset time
        Start-Sleep -Seconds ([math]::Max(0, ($resetTime - [DateTime]::UtcNow).TotalSeconds + 5))
    }
    elseif ($remaining -lt 200) {
        Write-Warning "Warning: Only $remaining requests remaining in quota"
    }
    return $response
}
CI/CD integration (GitHub Actions):
- name: Check rate limit before batch operations
  env:
    GH_TOKEN: ${{ github.token }}   # gh requires a token when running in Actions
  run: |
    # One call returns all three fields; hitting /rate_limit does not consume quota
    read -r REMAINING LIMIT RESET <<< "$(gh api rate_limit --jq '.rate | "\(.remaining) \(.limit) \(.reset)"')"
    echo "Rate limit: $REMAINING / $LIMIT (resets at $(date -d "@$RESET"))"
    if [ "$REMAINING" -lt 50 ]; then
      echo "ERROR: Only $REMAINING requests remaining!"
      exit 1
    fi
Rate Limit Headers Explained¶
| Header | Meaning | Action |
|---|---|---|
| `X-RateLimit-Limit` | Total quota this window | Monitor for changes (usually 5000) |
| `X-RateLimit-Remaining` | Requests left | Pause if < 50; warn if < 200 |
| `X-RateLimit-Used` | Requests consumed | Track for reporting |
| `X-RateLimit-Reset` | Unix timestamp of reset | Wait until this time if exhausted |
| `X-RateLimit-Resource` | Quota being tracked | Usually `core` for REST API |
CLI commands to inspect:
# Check current quota
gh api rate_limit
# Parse structured output
gh api rate_limit --jq '.rate | "Used: \(.used), Remaining: \(.remaining), Reset: \(.reset)"'
# Monitor rate limit in a loop (useful during batch operations)
for ($i = 0; $i -lt 10; $i++) {
    $remaining = gh api rate_limit --jq '.rate.remaining'
    Write-Host "[$(Get-Date)] Remaining: $remaining"
    Start-Sleep -Seconds 10
}
Practical Example: Sprint Validation Script¶
Imagine you're validating 50 repositories. Here's a robust approach:
function Validate-RepositoriesSafely {
    param(
        [string[]]$Repos,
        [int]$BatchSize = 5,
        [int]$BatchIntervalSeconds = 60
    )
    for ($i = 0; $i -lt $Repos.Count; $i += $BatchSize) {
        $batch = $Repos[$i..($i + $BatchSize - 1)]
        Write-Host "Processing batch: $($i / $BatchSize + 1) / $([math]::Ceiling($Repos.Count / $BatchSize))"
        # Check rate limit before each batch (cast to [int]: gh returns a string,
        # and PowerShell would otherwise compare it as a string, not a number)
        $remaining = [int](gh api rate_limit --jq '.rate.remaining')
        if ($remaining -lt 50) {
            Write-Error "Quota exhausted ($remaining remaining). Aborting."
            exit 1
        }
        foreach ($repo in $batch) {
            Write-Host "  Validating $repo..."
            gh api repos/$repo --jq '.description'
        }
        if ($i + $BatchSize -lt $Repos.Count) {
            Write-Host "Batch complete. Waiting $BatchIntervalSeconds seconds before next batch..."
            Start-Sleep -Seconds $BatchIntervalSeconds
        }
    }
    Write-Host "All repositories validated successfully."
}
Common Failure Modes¶
| Symptom | Root Cause | Fix |
|---|---|---|
| "API rate limit exceeded" error mid-sprint | Concurrent scripts, no backoff | Implement backoff; space operations 60s apart |
| PR checks hang for 1 hour | Rate limit exhausted, auto-reset pending | Check rate limit before workflows; add quota buffer |
| Workflow dispatch "pending" forever | Status check polling depleted quota | Reduce polling frequency; increase backoff delay |
| Flaky tests due to timeout | Script waiting for rate limit reset | Add exponential backoff; log rate limit headers |
Checklist for Sprint Planning¶
- [ ] All polling scripts implement exponential backoff (30–90s)
- [ ] Workflow dispatches are spaced ≥ 60s apart
- [ ] Batch operations respect 5-per-60s limit
- [ ] Rate limit check runs before major operations (e.g., `gh api rate_limit`)
- [ ] Rate limit headers are logged in CI (for debugging throttling issues)
- [ ] If ≥ 200 workflows run in sprint, consider using GitHub Apps (separate quota)
- [ ] Dry-run test with `--dry-run` flag where available before production run
AI Agent / Copilot Fleet Rate Limits¶
This is a separate constraint from the GitHub REST API limit above. Enterprise Copilot has concurrent model session limits enforced at the org level. Exceeding them returns HTTP 429 from the AI model endpoint — not from the GitHub API.
How it differs from the GitHub API limit¶
| Dimension | GitHub REST API | Copilot / AI Model |
|---|---|---|
| Unit | Requests per hour per token | Concurrent sessions per org |
| Limit | 5,000 req/hr | ~3–4 simultaneous long-running sessions |
| Reset | Hourly (rolling) | Immediate once sessions complete |
| Error | `API rate limit exceeded` | HTTP 429 on model inference |
| Scope | Your PAT / GitHub App | Your enterprise Copilot seat allocation |
Fleet Concurrency Rules¶
These are enforced limits, not suggestions:
| Scenario | Max concurrent agents | Notes |
|---|---|---|
| Background fleet agents (general-purpose) | 3 safe, 4 risky, 5+ will 429 | Each agent holds a model session for its full duration |
| Short tasks (< 2 min) | Up to 5 | Session released quickly |
| Long tasks (> 10 min) | Max 3 | Hold sessions for extended periods |
| Mixed (short + long) | 3 long + 1–2 short | Budget carefully |
Wave Pattern for Fleet Sprints¶
Dispatch agents in waves — never all at once:
Wave 1: dispatch 3–4 agents → wait for ALL to complete
Wave 2: dispatch next 3–4 → wait for ALL to complete
Wave 3: etc.
Minimum inter-agent delay: Wait at least 15 seconds between dispatching individual agents within a wave. This staggers session establishment and reduces burst pressure on the model endpoint.
# Good: staggered dispatch within a wave
foreach ($task in $wave) {
    Start-Agent $task
    Start-Sleep -Seconds 15   # stagger session starts
}
# Then wait for all to complete before Wave 2
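To make "wait for ALL to complete" concrete, here is a hedged sketch of the full wave loop. `Start-Agent` is the placeholder used above; `Wait-Agent` is an equally hypothetical stand-in for however your fleet tooling blocks on agent completion.
# $waves is a list of task batches, e.g. @(@($t1, $t2, $t3), @($t4, $t5, $t6))
foreach ($wave in $waves) {
    foreach ($task in $wave) {
        Start-Agent $task         # placeholder dispatch, as above
        Start-Sleep -Seconds 15   # stagger session starts within the wave
    }
    # Block until every agent in this wave finishes before starting the next
    $wave | ForEach-Object { Wait-Agent $_ }
}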
Recovery from a 429¶
When an agent gets a 429 from the AI model endpoint:
- Stop dispatching — do not retry immediately
- Wait 60–90 seconds — allow in-flight sessions to complete
- Check active agents — use `list_agents` to see how many are still running
- Resume with 1 fewer agent — if 4 caused a 429, next wave uses 3
# Check remaining GitHub API rate limit before dispatching the next wave
# (cast to [int]: gh returns a string, which would otherwise compare as a string)
$remaining = [int](gh api rate_limit --jq '.rate.remaining')
if ($remaining -lt 100) {
    Write-Warning "GitHub API quota low ($remaining). Pausing 60s."
    Start-Sleep -Seconds 60
}
# Also manually verify no long-running agents are still active before Wave N+1
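Putting the four steps together, a hedged sketch of the recovery loop. `Dispatch-Wave` is a hypothetical helper; assume it returns `$false` when any agent in the wave receives a 429 from the model endpoint.
# Dispatch-Wave is a placeholder: assume it returns $false on a 429 from the model endpoint
$waveSize = 4
while ($pendingTasks.Count -gt 0) {
    $wave = $pendingTasks | Select-Object -First $waveSize
    if (Dispatch-Wave $wave) {
        $pendingTasks = @($pendingTasks | Select-Object -Skip $waveSize)
    } else {
        Write-Warning "429 from the model endpoint. Cooling down 90s, shrinking next wave."
        Start-Sleep -Seconds 90
        $waveSize = [math]::Max(1, $waveSize - 1)   # resume with one fewer agent
    }
}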
Configuration¶
The default fleet concurrency is set in `.github/base-coat/agent-routing.json` (distributed to consumer repos via sync). Override `default_fleet_concurrency` to `2` for repos with heavy background CI load (code review agents, security scans) running concurrently with fleet sprints.
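The schema of `agent-routing.json` isn't reproduced here, so treat this as a hedged sketch of applying the override, assuming the file carries a top-level `default_fleet_concurrency` key:
# Assumes agent-routing.json exposes a top-level "default_fleet_concurrency" key
$configPath = '.github/base-coat/agent-routing.json'
$config = Get-Content $configPath -Raw | ConvertFrom-Json
$config.default_fleet_concurrency = 2   # lower for repos with heavy background CI
$config | ConvertTo-Json -Depth 10 | Set-Content $configPath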
Checklist for Fleet Sprints¶
- [ ] Plan wave sizes ≤ 3 for long-running tasks, ≤ 4 for short tasks
- [ ] Add 15s delay between agent dispatches within a wave
- [ ] Wait for all wave N agents to complete before starting wave N+1
- [ ] If a 429 occurs: stop, wait 90s, reduce next wave size by 1
- [ ] Check `list_agents` before any dispatch — don't add to an already-busy pool
- [ ] For sprints > 10 agents total, log a plan with wave breakdown before starting
Related Issues¶
- #451 — Concurrency control: Coordinating parallel batch operations without quota collision
- #443 — Observed problem: Sprint execution blocked due to rate limit exhaustion; caused by unspaced workflow dispatches