Troubleshooting Guide

Solutions to common problems you might encounter using Project Planton.


Manifest Validation Errors

"kind not supported" or "Unsupported cloud resource kind"

Symptom: CLI doesn't recognize your resource kind.

Common Causes:

  • Typo in kind field (case-sensitive)
  • Kind doesn't exist in Project Planton
  • Wrong apiVersion for the kind

Solutions:

# 1. Check spelling (kinds are PascalCase)
# Wrong: awsS3Bucket, aws_s3_bucket
# Right: AwsS3Bucket

# 2. Verify kind exists in catalog
# Browse: https://project-planton.org/docs/catalog

# 3. Check available kinds in your CLI version
# (List is shown in error message)

"validation error: spec.field ..."

Symptom: Manifest validation fails with field-specific errors.

Common Examples:

spec.replicas: value must be >= 1 and <= 10 (got: 0)
spec.container.cpu: value must match pattern "^[0-9]+m$" (got: "2cores")
spec.region: value is required

Solutions:

# 1. Read the error message carefully - it tells you exactly what's wrong

# 2. Check field types in proto definition or docs
# Integers: replicas: 3 (not "3")
# Strings: region: "us-west-2"
# Booleans: enabled: true (not "true")

# 3. Verify value formats
# CPU: "500m" or "2" (not "2cores" or "500mcpu")
# Memory: "1Gi" or "512Mi" (not "1GB" or "512MB")

# 4. Check required fields
# If error says "value is required", you must provide it

YAML Syntax Errors

Symptom: "failed to load yaml to json" or similar parsing errors.

Solutions:

# Validate YAML syntax
cat manifest.yaml | yq .

# Or use Python
python3 -c "import yaml; yaml.safe_load(open('manifest.yaml'))"

# Common issues:
# - Inconsistent indentation (use spaces, not tabs)
# - Missing colons after keys
# - Unquoted strings with special characters

Authentication & Credentials

AWS: "Unable to locate credentials"

Symptom: AWS provider can't find credentials.

Solutions:

# Check environment variables
env | grep AWS

# Should show:
# AWS_ACCESS_KEY_ID=AKIA...
# AWS_SECRET_ACCESS_KEY=...

# If not set:
export AWS_ACCESS_KEY_ID="your-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_DEFAULT_REGION="us-west-2"

# Or use AWS profile
export AWS_PROFILE=production

# Verify credentials work
aws sts get-caller-identity

AWS: "Access Denied" or Permission Errors

Symptom: Credentials work but lack permissions.

Solutions:

# Check what identity you're using
aws sts get-caller-identity

# Check attached policies
aws iam list-attached-user-policies --user-name your-username

# Common issues:
# - Need specific IAM permissions for resource type
# - Resource locked by SCPs or permission boundaries
# - Wrong AWS account/region

# Contact your AWS administrator to grant necessary permissions

GCP: "Application Default Credentials not found"

Symptom: GCP provider can't find credentials.

Solutions:

# Option 1: Use service account key
export GOOGLE_APPLICATION_CREDENTIALS=~/gcp-key.json
project-planton pulumi up --manifest resource.yaml

# Option 2: Use application default credentials (local dev)
gcloud auth application-default login
project-planton pulumi up --manifest resource.yaml

# Verify credentials
gcloud auth list
gcloud config get-value project

GCP: Permission Denied Errors

Symptom: Credentials work but can't create resources.

Solutions:

# Check current project
gcloud config get-value project

# Check service account permissions
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:YOUR_SA@*"

# Common issues:
# - Service account lacks necessary roles
# - APIs not enabled (e.g., GKE API for clusters)
# - Organization policies blocking

# Enable required APIs
gcloud services enable container.googleapis.com  # For GKE
gcloud services enable sqladmin.googleapis.com   # For Cloud SQL

Azure: "Failed to authenticate"

Symptom: Azure provider can't authenticate.

Solutions:

# Check environment variables
env | grep ARM_

# Should show:
# ARM_CLIENT_ID=abc-123
# ARM_CLIENT_SECRET=xyz-789
# ARM_TENANT_ID=def-456
# ARM_SUBSCRIPTION_ID=...

# Or use Azure CLI login
az login
az account show

# Verify correct subscription
az account list
az account set --subscription "My Subscription"

Cloudflare: "Authentication failed"

Symptom: Can't authenticate with Cloudflare.

Solutions:

# Check API token
echo $CLOUDFLARE_API_TOKEN

# Test token
curl -X GET "https://api.cloudflare.com/client/v4/user/tokens/verify" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"

# Common issues:
# - Token expired
# - Token has insufficient permissions
# - Wrong token type (API token vs API key)

# Create new token with correct permissions in Cloudflare dashboard

Pulumi-Specific Issues

"no stack named '...' found"

Symptom: Pulumi can't find the stack.

Solutions:

# Option 1: Initialize the stack first
project-planton pulumi init --manifest resource.yaml
project-planton pulumi preview --manifest resource.yaml

# Option 2: Use 'up' which auto-creates stack
project-planton pulumi up --manifest resource.yaml

# Option 3: Check stack label in manifest
# Ensure manifest has:
metadata:
  labels:
    pulumi.project-planton.org/stack.name: "org/project/stack"

"another update is currently in progress"

Symptom: Stack is locked by another operation.

Solutions:

# Check if operation is actually running
pulumi stack --stack <stack-fqdn>

# If no operation running, cancel the lock
pulumi cancel --stack <stack-fqdn>

# Then retry
project-planton pulumi up --manifest resource.yaml

"Stack still has resources" (during delete)

Symptom: Can't delete stack because resources still exist.

Solutions:

# Destroy resources first
project-planton pulumi destroy --manifest resource.yaml

# Then delete stack
project-planton pulumi delete --manifest resource.yaml

# Or if resources are actually gone (state is wrong):
project-planton pulumi refresh --manifest resource.yaml
project-planton pulumi delete --manifest resource.yaml

# Or force delete (use with caution)
project-planton pulumi delete --manifest resource.yaml --force

OpenTofu-Specific Issues

"Backend initialization required"

Symptom: OpenTofu commands fail with initialization error.

Solutions:

# Run init first
project-planton tofu init --manifest resource.yaml

# Then try command again
project-planton tofu plan --manifest resource.yaml

"state locked"

Symptom: State file is locked by another operation.

Solutions:

# Wait for other operation to complete

# Or force unlock (if certain no operation is running)
cd <module-directory>
tofu force-unlock <lock-id>

# The lock-id is shown in the error message

"Error acquiring the state lock"

Symptom: Can't acquire lock on state file.

Solutions:

# Check backend configuration
# - Ensure backend supports locking (S3 with DynamoDB, GCS, etc.)
# - Local backend doesn't support locking

# For S3 backend, verify DynamoDB table exists
aws dynamodb describe-table --table-name terraform-locks

# If using local backend, only one operation at a time

Kustomize Issues

"kustomization path does not exist"

Symptom: Can't find kustomize directory or overlay.

Solutions:

# Verify directory structure
ls -R services/api/kustomize/

# Should contain:
# overlays/prod/kustomization.yaml

# Check overlay name spelling
# Wrong: --overlay production
# Right: --overlay prod (must match directory name)

# Verify full path
ls services/api/kustomize/overlays/prod/kustomization.yaml

"accumulating resources: accumulation err='accumulating resources from '../../base': ..."

Symptom: Kustomize can't build overlay.

Solutions:

# 1. Verify kustomization.yaml syntax
cat overlays/prod/kustomization.yaml

# 2. Check resource paths exist
# If kustomization.yaml references "../../base"
ls -la overlays/prod/../../base

# 3. Test kustomize directly
cd services/api/kustomize
kustomize build overlays/prod

# Error messages from kustomize are more detailed

Deployment Failures

"resource already exists"

Symptom: Can't create resource because it already exists.

Causes:

  • Resource created outside of Pulumi/OpenTofu
  • State file lost/corrupted
  • Deploying to wrong environment

Solutions:

# Option 1: Import existing resource (Pulumi)
pulumi import <type> <name> <id> --stack <stack-fqdn>

# Option 2: Import existing resource (OpenTofu)
cd <module-dir>
tofu import <resource-type>.<resource-name> <id>

# Option 3: Delete existing resource manually
# (via cloud console or CLI)

# Option 4: Use different resource name
vim manifest.yaml  # Change metadata.name

Partial Deployment Failures

Symptom: Some resources created, others failed.

Solutions:

# For Pulumi: State automatically updated, can re-run 'up'
project-planton pulumi up --manifest resource.yaml
# Will continue from where it failed

# For OpenTofu: State updated, can re-run 'apply'
project-planton tofu apply --manifest resource.yaml
# Will attempt to create failed resources

# If repeatedly failing:
# 1. Check error messages
# 2. Fix underlying issue (permissions, quotas, etc.)
# 3. Retry

Timeout Errors

Symptom: Deployment times out waiting for resource.

Causes:

  • Resource taking longer than expected (e.g., database initialization)
  • Resource stuck in provisioning state
  • Provider API issues

Solutions:

# Check resource status in cloud console
# - Is it actually creating?
# - Any errors shown?

# Wait and retry
# Some resources (databases, clusters) can take 10-20 minutes

# For Kubernetes resources: check pod status
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

Module Issues

"module not found"

Symptom: Can't find IaC module directory.

Solutions:

# Check module directory exists
ls -la <module-dir>

# Should contain:
# - Pulumi: main.go, go.mod, Pulumi.yaml
# - OpenTofu: *.tf files

# Verify --module-dir path
project-planton pulumi up \
  --manifest resource.yaml \
  --module-dir /full/path/to/module  # Use absolute path

# Or cd to module directory
cd /path/to/module
project-planton pulumi up --manifest ~/manifests/resource.yaml

Module Compilation Errors

Symptom: Pulumi module fails to compile.

Causes:

  • Missing Go dependencies
  • Syntax errors in module code
  • Incompatible Go version

Solutions:

# Verify Go installation
go version

# Update module dependencies
cd <module-dir>
go mod tidy
go mod download

# Test compilation
go build .

# Check for syntax errors
go vet ./...

State Issues

Drift Detection

Symptom: Preview/plan shows unexpected changes.

Causes:

  • Manual changes made outside Pulumi/OpenTofu
  • Provider defaults changed
  • Upstream dependencies changed

Solutions:

# For Pulumi: refresh state
project-planton pulumi refresh --manifest resource.yaml

# For OpenTofu: refresh state
project-planton tofu refresh --manifest resource.yaml

# Then plan again to see if changes persist
project-planton pulumi preview --manifest resource.yaml
project-planton tofu plan --manifest resource.yaml

# If unexpected changes remain:
# - Investigate who made manual changes
# - Update manifest to match reality
# - Or revert changes via apply/up

State File Corruption

Symptom: State file is corrupted or lost.

Solutions:

# For Pulumi: export and restore state
pulumi stack export --stack <stack-fqdn> > backup.json
# If corrupted, restore from backup
pulumi stack import --stack <stack-fqdn> < backup.json

# For OpenTofu: restore from backend backup
# (Depends on backend - S3 versioning, GCS versioning, etc.)

# If no backup available:
# - Manually recreate state by importing resources
# - Or destroy and recreate resources

Network & Connectivity

"connection refused" or Timeout Errors

Symptom: Can't connect to cloud provider APIs.

Solutions:

# Check internet connection
curl -I https://api.github.com

# Check if cloud provider APIs are accessible
curl -I https://api.aws.amazon.com    # AWS
curl -I https://www.googleapis.com    # GCP
curl -I https://api.cloudflare.com    # Cloudflare

# If behind corporate proxy:
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=http://proxy.company.com:8080

# Check firewall rules
# - Corporate firewall may block cloud provider APIs
# - VPN may be required

Installation & Prerequisites

"pulumi: command not found"

Symptom: Pulumi CLI not installed.

Solutions:

# Install Pulumi
brew install pulumi

# Or download from pulumi.com
curl -fsSL https://get.pulumi.com | sh

# Verify installation
pulumi version

"tofu: command not found"

Symptom: OpenTofu CLI not installed.

Solutions:

# Install OpenTofu
brew install opentofu

# Or download from opentofu.org

# Verify installation
tofu version

"project-planton: command not found"

Symptom: Project Planton CLI not installed.

Solutions:

# Install via Homebrew
brew install project-planton/tap/project-planton

# Verify installation
project-planton version

# If Homebrew tap not added:
brew tap project-planton/tap
brew install project-planton

Provider-Specific Issues

Kubernetes: "Unable to connect to cluster"

Symptom: Can't connect to Kubernetes cluster.

Solutions:

# Verify kubeconfig
kubectl cluster-info

# Check current context
kubectl config current-context

# List available contexts
kubectl config get-contexts

# Switch to correct context
kubectl config use-context my-cluster

# Verify cluster access
kubectl get nodes

AWS: "CredentialRequireScopeError"

Symptom: AWS credentials missing scope/region.

Solutions:

# Set region explicitly
export AWS_DEFAULT_REGION=us-west-2

# Or in credential file
cat > aws-cred.yaml <<EOF
accessKeyId: AKIA...
secretAccessKey: ...
region: us-west-2
EOF

project-planton pulumi up --manifest resource.yaml --aws-credential aws-cred.yaml

GCP: "API not enabled"

Symptom: Can't create resource because API isn't enabled.

Solutions:

# Enable required API
gcloud services enable compute.googleapis.com       # Compute Engine
gcloud services enable container.googleapis.com     # GKE
gcloud services enable sqladmin.googleapis.com      # Cloud SQL

# Check enabled APIs
gcloud services list --enabled

Common Error Messages

"no such file or directory"

Likely causes:

  • Manifest file path is wrong
  • Kustomize directory doesn't exist
  • Module directory not found

Solution: Verify all paths exist and are spelled correctly.

"permission denied"

Likely causes:

  • File permissions issue (can't read manifest)
  • Cloud provider permission issue (can't create resource)

Solution: Check file permissions or cloud credentials.

"context deadline exceeded" or Timeout

Likely causes:

  • Resource taking longer than expected
  • Network issues
  • Provider API slowness

Solution: Wait and retry. Some resources (databases, clusters) can take 15+ minutes.


Getting Additional Help

Enable Debug Logging

# Pulumi verbose logging
export PULUMI_LOG_LEVEL=3
project-planton pulumi up --manifest resource.yaml

# OpenTofu debug logging
export TF_LOG=DEBUG
project-planton tofu apply --manifest resource.yaml

# Or trace level for maximum verbosity
export TF_LOG=TRACE

Check Provider Documentation

Community Support

GitHub Issues: project-planton/issues

GitHub Discussions: project-planton/discussions

When reporting issues:

  • Include Project Planton version (project-planton version)
  • Include relevant error messages (sanitize credentials!)
  • Include manifest structure (sanitize sensitive data)
  • Describe what you expected vs. what happened
  • Include steps to reproduce

Preventive Measures

Before Deploying to Production

  • Validate manifest: project-planton validate --manifest prod.yaml
  • Preview changes: project-planton pulumi preview --manifest prod.yaml
  • Test in lower environment first
  • Verify credentials are correct (not dev/staging credentials)
  • Check region/zone is correct
  • Review resource names (ensure they're production-appropriate)
  • Backup existing data (for databases, storage)
  • Have rollback plan ready

Regular Maintenance

  • Rotate credentials every 90 days
  • Update provider versions regularly
  • Clean up unused stacks/state files
  • Review and update manifests for best practices
  • Monitor cloud provider quotas/limits
  • Keep Project Planton CLI updated


Still stuck? Open an issue with details, and we'll help you out!