KubernetesGhaRunnerScaleSet - Technical Documentation
Overview
The KubernetesGhaRunnerScaleSet deployment component enables declarative deployment of GitHub Actions self-hosted runners on Kubernetes clusters. It leverages the official Actions Runner Controller (ARC) Helm chart to create AutoScalingRunnerSet resources that dynamically scale based on workflow demand.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                           GitHub                                │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              Repository/Organization                    │    │
│  │                                                         │    │
│  │  Workflow Job → Queue → Webhook → Controller → Runner   │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Kubernetes Cluster                       │
│                                                                 │
│  ┌──────────────────────┐    ┌─────────────────────────────┐    │
│  │  Controller (ARC)    │    │  Runner Scale Set           │    │
│  │  ─────────────────   │    │  ───────────────            │    │
│  │  Watches for jobs    │───▶│  AutoScalingRunnerSet       │    │
│  │  Creates runner pods │    │  EphemeralRunner pods       │    │
│  │  Manages lifecycle   │    │  PVCs for caching           │    │
│  └──────────────────────┘    └─────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
How It Works
Registration Flow
- Helm chart creates an AutoScalingRunnerSet custom resource
- Controller registers the scale set with GitHub via the config URL
- GitHub associates runners with the specified repository/organization/enterprise
- Runners appear in GitHub Settings → Actions → Runners
Job Execution Flow
- Workflow job is triggered with runs-on: [self-hosted, scale-set-name]
- GitHub queues the job and notifies the controller
- Controller creates an EphemeralRunner pod
- Runner pod registers with GitHub, picks up the job
- Job executes in the runner pod
- Runner pod terminates after job completion
Scaling Behavior
| Scenario | Behavior |
|---|---|
| Jobs queued | Scale up to handle queue (up to maxRunners) |
| Jobs complete | Scale down to minRunners |
| No jobs | Maintain minRunners (can be 0 for cost savings) |
| Surge in jobs | Parallel runners up to maxRunners |
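The table above can be read as a clamp on queued demand. The following Python sketch models only that reading; desired_runners is a hypothetical helper for illustration, not part of ARC, and the controller's real scaling logic may differ (for example in how it accounts for idle runners).

```python
def desired_runners(queued_jobs: int, min_runners: int, max_runners: int) -> int:
    """Simplified model of the scaling table: clamp demand to [min, max].

    Hypothetical helper for illustration; not the controller's algorithm.
    """
    if min_runners < 0 or max_runners < min_runners:
        raise ValueError("expected 0 <= min_runners <= max_runners")
    if queued_jobs < 0:
        raise ValueError("queued_jobs must not be negative")
    # Never drop below minRunners, never exceed maxRunners.
    return max(min_runners, min(queued_jobs, max_runners))


if __name__ == "__main__":
    # Walk through the scenarios in the table with minRunners=1, maxRunners=5.
    for queued in (0, 3, 10):
        print(queued, desired_runners(queued, min_runners=1, max_runners=5))
```

With minRunners set to 0, the same function returns 0 when nothing is queued, which corresponds to the cost-saving case in the table.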
Container Modes
DIND (Docker-in-Docker)
containerMode:
  type: DIND
- Runner pod includes a privileged DinD sidecar
- Supports docker build, docker run, etc.
- Required for workflows that build/push Docker images
- Requires: Privileged container support in cluster
KUBERNETES
containerMode:
  type: KUBERNETES
  workVolumeClaim:
    storageClass: fast-ssd
    size: "50Gi"
- Each workflow step runs as a separate Kubernetes pod
- Native Kubernetes container execution
- Uses container hooks for orchestration
- Requires: Ephemeral volume support, ServiceAccount permissions
KUBERNETES_NO_VOLUME
containerMode:
  type: KUBERNETES_NO_VOLUME
- Same as KUBERNETES but without ephemeral volumes
- For clusters that don't support ephemeral volume claims
- Workspace is not persisted between steps
DEFAULT
containerMode:
  type: DEFAULT
- Direct execution on the runner pod
- No container isolation for steps
- Simple workflows that don't need Docker
Persistent Volumes
Purpose
PVCs persist data across runner pod restarts, enabling:
- Dependency caching: npm, maven, gradle, pip packages
- Docker layer caching: Faster image builds
- Build artifacts: Share between jobs
Implementation
persistentVolumes:
  - name: npm-cache
    size: "20Gi"
    storageClass: standard
    mountPath: /home/runner/.npm
Creates:
- A PersistentVolumeClaim named {release-name}-npm-cache
- Volume mount in the runner container spec
- Volume reference in the pod spec
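To make the three generated objects concrete, here is a rough Python sketch that derives them from one persistentVolumes entry. render_volume is a hypothetical helper, not the module's actual code, and the ReadWriteOnce access mode is an assumption because this document does not state which mode the component uses.

```python
from typing import Any


def render_volume(release_name: str, volume: dict[str, str]) -> dict[str, Any]:
    """Build the PVC, volume mount, and pod volume for one entry.

    Hypothetical illustration of the naming rule {release-name}-{name}.
    """
    claim_name = f"{release_name}-{volume['name']}"
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            # Assumption: the access mode is not given in this document.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": volume["storageClass"],
            "resources": {"requests": {"storage": volume["size"]}},
        },
    }
    # Mount in the runner container spec.
    volume_mount = {"name": volume["name"], "mountPath": volume["mountPath"]}
    # Volume reference in the pod spec, pointing at the claim.
    pod_volume = {
        "name": volume["name"],
        "persistentVolumeClaim": {"claimName": claim_name},
    }
    return {"pvc": pvc, "volumeMount": volume_mount, "volume": pod_volume}


if __name__ == "__main__":
    entry = {
        "name": "npm-cache",
        "size": "20Gi",
        "storageClass": "standard",
        "mountPath": "/home/runner/.npm",
    }
    print(render_volume("my-release", entry)["pvc"]["metadata"]["name"])
```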
Cache Effectiveness
For optimal caching:
- Use minRunners >= 1 to keep at least one runner warm
- PVCs are per-scale-set, not per-runner (shared cache)
- Consider storage class with good IOPS for large caches
Authentication
PAT Token
Personal Access Token authentication:
Permissions needed:
| Scope | Repository | Organization | Enterprise |
|---|---|---|---|
| repo | Required | - | - |
| admin:org | - | Required | - |
| manage_runners:enterprise | - | - | Required |
Secret structure:
github_token: ghp_xxxxxxxxxxxx
GitHub App
Recommended for organizations:
Permissions needed:
- Repository: actions:read, metadata:read
- Organization: self_hosted_runners:read/write
Secret structure:
github_app_id: "123456"
github_app_installation_id: "654321"
github_app_private_key: |
  -----BEGIN RSA PRIVATE KEY-----
  ...
Existing Secret
For secrets provisioned outside this component:
github:
  existingSecretName: my-github-secret
Secret must contain either PAT or GitHub App fields.
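A quick way to sanity-check an existing secret is to compare its keys against the two secret structures shown earlier. The sketch below is a hypothetical helper, not something the component ships; checking for GitHub App fields first when both sets are present is an arbitrary choice made for this example.

```python
from typing import Iterable

# Key names taken from the "Secret structure" examples in this document.
PAT_KEYS = {"github_token"}
APP_KEYS = {
    "github_app_id",
    "github_app_installation_id",
    "github_app_private_key",
}


def detect_auth_type(secret_keys: Iterable[str]) -> str:
    """Return "github_app" or "pat" depending on which key set is complete."""
    keys = set(secret_keys)
    # Assumption: GitHub App wins if both sets are present.
    if APP_KEYS <= keys:
        return "github_app"
    if PAT_KEYS <= keys:
        return "pat"
    missing_app = sorted(APP_KEYS - keys)
    raise ValueError(
        "secret has neither github_token nor a complete GitHub App key set; "
        f"missing GitHub App keys: {missing_app}"
    )


if __name__ == "__main__":
    print(detect_auth_type(["github_token"]))
```

The key list could come from kubectl get secret my-github-secret -o json, reading the names under data.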
IaC Implementations
Pulumi Module
Location: iac/pulumi/module/
Key files:
- main.go: Entry point, orchestrates deployment
- locals.go: Configuration parsing, defaults, exports
- runner.go: Helm release and PVC creation
- vars.go: Constants (chart name, repo, version)
Terraform Module
Location: iac/tf/
Key files:
- main.tf: Namespace, PVCs, Helm release
- locals.tf: Value transformations
- variables.tf: Input variable definitions
- outputs.tf: Stack outputs
Relationship with Controller
The runner scale set requires the controller to be installed:
KubernetesGhaRunnerScaleSetController (one per cluster)
└── KubernetesGhaRunnerScaleSet (many per cluster)
    ├── Scale Set 1: repo runners
    ├── Scale Set 2: org runners
    └── Scale Set 3: enterprise runners
Controller discovery:
- Helm chart looks for controller by label app.kubernetes.io/part-of=gha-rs-controller
- If not found, specify controllerServiceAccount explicitly
Troubleshooting
Runners Not Appearing in GitHub
- Check controller logs: kubectl logs -n arc-system deploy/arc-controller-manager
- Verify GitHub credentials: Secret must have correct keys
- Check config URL format: https://github.com/<owner>/<repo> or https://github.com/<org>
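The config URL check can be done mechanically before deploying. This Python sketch classifies a URL by its path segments; classify_config_url is a hypothetical helper, and the enterprises/<name> branch is an assumption added for completeness, since this section lists only the repository and organization forms.

```python
from urllib.parse import urlparse


def classify_config_url(url: str) -> str:
    """Classify a config URL as "repository", "organization", or "enterprise"."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"config URL must be an absolute https URL: {url!r}")
    # Drop empty segments so a trailing slash does not change the result.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) == 1:
        return "organization"
    if len(parts) == 2:
        # Assumption: enterprise scope uses /enterprises/<name>; this form
        # is not listed in this document.
        if parts[0] == "enterprises":
            return "enterprise"
        return "repository"
    raise ValueError(f"unexpected config URL path: {parsed.path!r}")


if __name__ == "__main__":
    print(classify_config_url("https://github.com/my-org/my-repo"))
    print(classify_config_url("https://github.com/my-org"))
```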
Runners Stuck Pending
- Check PVC binding: kubectl get pvc -n <namespace>
- Verify storage class exists
- Check node resources for pod scheduling
Jobs Not Picked Up
- Verify runs-on label matches runnerScaleSetName
- Check runner group permissions in GitHub
- Ensure maxRunners > 0
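Two of these checks can be expressed as a small local test. The sketch below assumes that a job targets a scale set when the scale set name appears among its runs-on labels, as in the Job Execution Flow; job_can_be_picked_up is a hypothetical helper, and runner group permissions still have to be checked in GitHub itself.

```python
from typing import Sequence, Union


def job_can_be_picked_up(
    runs_on: Union[str, Sequence[str]],
    runner_scale_set_name: str,
    max_runners: int,
) -> bool:
    """Check the label match and the maxRunners > 0 condition.

    Hypothetical helper; it does not query GitHub or the cluster.
    """
    # runs-on may be a single string or a list of labels in workflow YAML.
    labels = [runs_on] if isinstance(runs_on, str) else list(runs_on)
    return runner_scale_set_name in labels and max_runners > 0


if __name__ == "__main__":
    print(job_can_be_picked_up(["self-hosted", "scale-set-name"], "scale-set-name", 5))
```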
Docker Not Working in DIND Mode
- Verify privileged containers are allowed (PSP/PSA)
- Check DinD sidecar logs
- Ensure the DOCKER_HOST environment variable is set
Best Practices
- Start with minRunners: 0 for cost efficiency
- Set realistic maxRunners based on cluster capacity
- Use persistent volumes for dependency caching
- Prefer GitHub App over PAT for organizations
- Use runner groups to control repository access
- Monitor runner metrics via controller metrics endpoint