Deploying Istio Service Mesh on Kubernetes

Introduction

For years, the conventional wisdom around service meshes was that they were complex infrastructure best left to large enterprises with dedicated platform teams. Istio, in particular, earned a reputation for being powerful but operationally demanding – a tool that could solve your observability and security challenges, but only if you were willing to dedicate significant engineering resources to managing it.

That narrative has fundamentally changed.

Modern Istio (1.18+) is dramatically simpler to deploy and operate than earlier versions. The project has consolidated its components, refined its installation methods, and developed proven patterns for production deployments across thousands of organizations. What was once a sprawling collection of microservices (Mixer, Pilot, Citadel, Galley) is now a single control plane binary (istiod) that's straightforward to manage and reason about.

This document explores how to deploy Istio on Kubernetes in a way that balances production-readiness with operational simplicity. We'll examine the landscape of deployment methods – from manual CLI tools to fully managed cloud services – and explain which approaches Project Planton supports and why. Whether you're running a single cluster in development or orchestrating multi-cloud service mesh deployments, understanding these patterns will help you make informed decisions.

The Evolution of Istio Deployment

The journey to deploy Istio has evolved significantly, and understanding this progression helps clarify why certain approaches are recommended today.

Level 0: The Anti-Pattern – In-Cluster Istio Operator

Before Istio 1.24, the project shipped an optional in-cluster operator that could watch IstioOperator custom resources and reconcile the mesh installation continuously. On paper, this sounded appealing – define your desired state in a CR and let a controller handle the rest.

In practice, this approach had serious drawbacks:

  • Security concerns: The operator needed cluster-admin privileges to manage Istio resources, which meant a high-privilege pod running continuously in your cluster.
  • Operational complexity: Running an operator to manage Istio added another failure mode and debugging surface.
  • Naming ambiguity: The term "IstioOperator" referred to both the API spec and the controller that consumed it, muddying documentation and troubleshooting.

The Istio project deprecated and removed the in-cluster operator as of version 1.24, with maintainers explicitly stating: "Use of the operator for new Istio installations is discouraged." A security audit reinforced this, noting that running high-privilege in-cluster operators is not recommended.

Verdict: Avoid the deprecated in-cluster operator. Modern deployments should use istioctl or Helm directly.

Level 1: Manual Installation with istioctl

The istioctl CLI is Istio's official installation tool – a single binary that can deploy a complete service mesh with sensible defaults in one command. It comes with several built-in installation profiles (minimal, default, demo) that configure different component sets for various use cases.

Strengths:

  • Simplicity: One command can install everything – istioctl install --set profile=default
  • Profiles: Pre-configured bundles eliminate guesswork about which components to deploy
  • Integrated upgrades: Native support for canary upgrades using revision tags
  • Validation: Built-in checks warn you about configuration issues before deployment

Limitations:

  • Imperative: It's a CLI tool, not a declarative resource, making it less natural for GitOps workflows
  • Automation: While scriptable, it requires wrapping in automation for repeatable deployments

Use cases: Interactive installations, proof-of-concepts, development environments, and as the engine behind higher-level automation tooling.
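
For a feel of the workflow, a minimal session might look like the following. This is a sketch against a recent 1.2x CLI; subcommand names such as istioctl x precheck can vary slightly between versions:

# Check the cluster before installing
istioctl x precheck

# Install the default profile without interactive confirmation
istioctl install --set profile=default -y

# Validate the installation and lint mesh configuration
istioctl verify-install
istioctl analyze --all-namespaces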

Level 2: Declarative Installation with Helm

Istio publishes official Helm charts for all its components (base, istiod, gateways, CNI). The Helm approach splits installation into modular charts that you install in sequence: first the base chart (CRDs), then the control plane (istiod), then any gateways.

Strengths:

  • GitOps native: Helm charts integrate seamlessly with ArgoCD, Flux, and other GitOps tools
  • IaC friendly: Terraform and Pulumi have first-class Helm support
  • Widespread adoption: Helm is a standard tool that most platform teams already use
  • Flexibility: Each component can be configured independently

Considerations:

  • Multi-step: You must install charts in the correct order (base before istiod)
  • Configuration surface: More knobs to understand compared to istioctl profiles
  • Upgrade coordination: Need to carefully orchestrate chart upgrades

Use cases: Production deployments, CI/CD pipelines, GitOps workflows, and Infrastructure-as-Code frameworks (like Project Planton).

Users report that Terraform with the Helm provider is "generally fine" and often "the fastest method to provision a full setup." This aligns with modern platform engineering practices where infrastructure is version-controlled and continuously delivered.
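
As a concrete sketch of that ordering (the chart names and repository URL come from Istio's published charts; release names and namespaces here are illustrative):

helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update

# 1. Base chart: CRDs and cluster-wide resources
helm install istio-base istio/base -n istio-system --create-namespace

# 2. Control plane
helm install istiod istio/istiod -n istio-system --wait

# 3. Optional ingress gateway in its own namespace
helm install istio-ingress istio/gateway -n istio-ingress --create-namespace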

Level 3: Managed Service Mesh (Cloud Provider)

Cloud providers offer managed service mesh solutions that eliminate operational overhead entirely:

| Provider | Offering | Based on Istio? | Key Features |
|---|---|---|---|
| Google Cloud | Anthos Service Mesh (ASM) | ✅ Yes | Fully managed Istio distribution, GCP integrations, hybrid/multi-cloud support |
| Azure | AKS Service Mesh Add-on | ✅ Yes | Managed Istio control plane, Azure Monitor integration, certificate automation |
| AWS | App Mesh | ❌ No (Envoy) | AWS-managed control plane, IAM integration, simpler feature set |

Strengths:

  • Zero operational burden: Provider handles installation, upgrades, and management
  • Tested integrations: Deep integration with cloud-native services (IAM, logging, monitoring)
  • Enterprise support: Backed by cloud provider SLAs

Trade-offs:

  • Vendor lock-in: Managed services may lag upstream Istio releases or have proprietary extensions
  • Less flexibility: Cannot customize as deeply as self-managed installations
  • Cost: Managed services typically incur additional charges

Use cases: Organizations prioritizing operational simplicity over control, single-cloud deployments, enterprises requiring vendor support.

Note on AWS App Mesh: While AWS provides App Mesh as a managed service, it uses a different control plane than Istio (though both use Envoy proxies). App Mesh has simpler configuration but lacks Istio's rich feature set. Organizations needing Istio's capabilities on EKS typically self-manage Istio using Helm or istioctl.

Level 4: Infrastructure-as-Code with Pulumi/Terraform

Modern platform engineering treats service mesh as infrastructure – declaratively defined, version-controlled, and continuously delivered. Tools like Pulumi and Terraform can deploy Istio by wrapping Helm charts or invoking istioctl.

Pulumi (what Project Planton uses):

  • Write infrastructure in real programming languages (Go, TypeScript, Python)
  • Compose Istio deployment with other infrastructure resources
  • Strong typing catches configuration errors at compile time
  • Native Helm support for deploying charts

Terraform (popular alternative):

  • HCL (HashiCorp Configuration Language) for declarative infrastructure
  • Mature Helm provider for deploying Istio charts
  • Large ecosystem of community modules

Use cases: Multi-cloud deployments, complex platform automation, teams already using IaC tools.

This is where Project Planton sits: providing a high-level Kubernetes-like API (defined in protobuf) that automatically generates Pulumi code to deploy Istio with production-ready defaults. You declare your desired state in a simple manifest, and Pulumi handles the orchestration.

Production Installation Methods Compared

When deploying Istio in production, two approaches dominate: istioctl and Helm. Both are officially supported, production-ready, and widely used. The choice depends on your tooling ecosystem and operational preferences.

istioctl vs. Helm: The Practical Comparison

| Aspect | istioctl | Helm |
|---|---|---|
| Installation | Single command deploys all components | Multi-step (base → istiod → gateways) |
| Profiles | Built-in profiles (default, minimal, etc.) | Pass profile values to each chart |
| Upgrades | Native canary upgrade with revisions | Canary possible, requires manual orchestration |
| GitOps | Requires wrapper tooling | Native support in ArgoCD/Flux |
| IaC | Must be wrapped in scripts | First-class support in Terraform/Pulumi |
| Customization | IstioOperator spec via YAML or flags | Helm values files |
| Learning curve | Slightly simpler (profiles abstract complexity) | Standard Helm knowledge applies |

As Istio maintainer @howardjohn notes: "Istioctl and Helm are roughly equivalent in stability; use whichever fits best... Helm tends to integrate much better with other tooling like Terraform, ArgoCD, etc., so is a reasonable first choice."

Project Planton's choice: We use Helm under the hood because it integrates naturally with Pulumi and supports declarative, version-controlled deployments. The KubernetesIstio API abstracts away Helm's complexity while exposing the 20% of configuration that 80% of users need.

Installation Profiles: Choosing the Right Starting Point

Istio provides several installation profiles that bundle sensible defaults:

| Profile | Components | Use Case |
|---|---|---|
| minimal | Control plane only (istiod) | Development, custom deployments, primary cluster in multi-cluster |
| default | Control plane + ingress gateway | Recommended for production – complete mesh with edge ingress |
| demo | Everything + high telemetry | Tutorials, demos, evaluation – not for production |
| remote | No control plane | Remote clusters in primary-remote multi-cluster setup |

Production recommendation: Start with the default profile. It includes the control plane and one ingress gateway with production-appropriate resource settings. Customize from there based on your specific needs (add an egress gateway, enable CNI, adjust resources).

Licensing: Open Source and Commercial Options

Istio Core: Apache 2.0 License – fully open source, free to use, backed by the CNCF since 2022.

All deployment methods discussed (istioctl, Helm, IaC tools) deploy the open source Istio distribution. No license fees, no vendor lock-in.

Commercial Extensions: Several companies offer enterprise products built on Istio:

  • Tetrate Service Bridge (TSB): Multi-cluster management, FIPS compliance, enterprise UI, commercial support
  • Solo.io Gloo Mesh: Management plane with GUI, multi-mesh federation, WASM plugins (Enterprise tier requires license)

These are optional value-adds for organizations needing enterprise features or support. For most deployments, open source Istio is production-ready without commercial extensions.

Project Planton deploys the open source Apache 2.0 Istio distribution, ensuring maximum compatibility and zero licensing costs.

The 80/20 Configuration Principle

Istio is incredibly powerful and correspondingly complex – the full configuration surface is vast. However, most production deployments only customize a small fraction of available settings. Understanding this 80/20 split helps design simple, maintainable service mesh deployments.

What 80% of Production Deployments Actually Configure

Essential Components

  1. Control Plane (istiod): The brain of Istio – handles service discovery, certificate management, configuration distribution. Every mesh needs this.

  2. Envoy Sidecars: Automatically injected into application pods via the namespace label istio-injection=enabled (see the labeling example after this list). The data plane that enforces policies and routes traffic.

  3. Ingress Gateway: Envoy proxy at cluster edge for external traffic. Handles TLS termination, routing rules, authentication policies at the boundary.

  4. CNI Plugin (recommended): Moves network initialization out of application pods (which otherwise need elevated privileges). Essential for hardened security postures.
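
The sidecar injection mentioned in item 2 is driven entirely by that namespace label; a minimal sketch (the my-app namespace is illustrative):

# Opt the namespace into automatic sidecar injection
kubectl label namespace my-app istio-injection=enabled

# Existing pods must be recreated to pick up the sidecar
kubectl rollout restart deployment -n my-app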

Common Settings

  • mTLS Mode: Start with PERMISSIVE (accepts both plaintext and mTLS) during migration, then move to STRICT in production for zero-trust networking.

  • Resource Limits: Override defaults for control plane (istiod) and gateways based on cluster size. Typical starting points:

    • Istiod: 1-2 CPU cores, 2-4GB memory
    • Ingress Gateway: 500m-1 CPU, 256-512MB memory
    • Sidecars: 50-100m CPU, 128-256MB memory
  • High Availability: Run 2-3 replicas of istiod in production. Use PodDisruptionBudgets to ensure availability during node maintenance (see the sketch after this list).

  • Telemetry: Keep default metrics enabled (Prometheus scraping). Tune trace sampling (default 1% is usually fine; increase only if needed).
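
For the High Availability bullet above, a minimal PodDisruptionBudget sketch (this assumes the default app: istiod label; Istio's charts can also manage a PDB for you):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: istiod-pdb
  namespace: istio-system
spec:
  minAvailable: 1  # Keep at least one istiod running during node drains
  selector:
    matchLabels:
      app: istiod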

What 80% of Users Don't Need to Touch

  • MeshConfig: Global mesh settings like protocol detection, access logging format. Defaults work well.
  • ProxyConfig: Global sidecar settings (concurrency, resource allocation). Tune only for specific performance needs.
  • Custom Envoy Filters: Low-level proxy customization. Use Istio's built-in features first.
  • Advanced locality-based routing: Works automatically if clusters are properly labeled.
  • Egress Gateway: Optional component for forcing outbound traffic through a controlled exit point. Only needed for strict egress control.

Example Configurations

Development: Minimal Footprint

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: minimal  # Only control plane
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: false  # No gateway in dev
    pilot:
      k8s:
        resources:
          requests:
            cpu: 100m
            memory: 256Mi

Rationale: Lightweight for local testing (kind, minikube). Enough to inject sidecars and test mesh features.

Staging: Production Simulation

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default  # Control plane + ingress
  meshConfig:
    accessLogFile: /dev/stdout  # Enable for troubleshooting
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        k8s:
          replicaCount: 1
    pilot:
      k8s:
        replicaCount: 1
        resources:
          requests:
            cpu: 500m
            memory: 1Gi

Rationale: Mirrors production but at smaller scale. Access logs enabled for debugging.

Production: Hardened and Highly Available

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        k8s:
          replicaCount: 2  # HA
          resources:
            requests:
              cpu: 500m
              memory: 256Mi
    egressGateways:
      - name: istio-egressgateway
        enabled: true  # Optional: for strict egress control
    pilot:
      k8s:
        replicaCount: 3  # HA control plane
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
# Note: strict mTLS is enforced with a mesh-wide PeerAuthentication resource
# (see Security Hardening below); the legacy global.mtls.enabled Helm value
# was removed in modern Istio.

Rationale: Multiple replicas for resilience, enforced mTLS for zero-trust, sufficient resources for scale.

Production Best Practices

High Availability

  • Control Plane Redundancy: Run at least 2 istiod replicas (3 for large clusters). Use PodAntiAffinity to spread across zones.
  • Gateway Redundancy: Deploy multiple ingress gateway pods with anti-affinity rules.
  • Multi-Cluster Patterns: For critical workloads, consider multi-primary deployments where each cluster has its own control plane.

Security Hardening

  • Strict mTLS: Enforce mutual TLS mesh-wide via PeerAuthentication policies after the initial rollout (see the example after this list).
  • Certificate Management:
    • Use cert-manager for ingress gateway certificates (Let's Encrypt or corporate CA)
    • For mesh CA, consider plugging in corporate PKI via istio-csr or custom root certificates
  • Authorization Policies: Use Istio's AuthorizationPolicy CRDs to enforce fine-grained access control based on service identity (not IP addresses).
  • Least Privilege: Enable Istio CNI to avoid requiring elevated privileges in application pods.
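
To make the first and third bullets concrete, here is a sketch of a mesh-wide STRICT policy plus an identity-based AuthorizationPolicy. Namespaces, app labels, and service accounts are illustrative; recent Istio releases use security.istio.io/v1, older ones v1beta1:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # The root namespace makes this mesh-wide
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: orders-allow-frontend
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
    - from:
        - source:
            # Identity is the caller's service account, not an IP address
            principals: ["cluster.local/ns/frontend/sa/frontend"]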

Observability

  • Metrics: Ensure Prometheus scrapes Istio components (control plane and proxies). Use the official Grafana dashboards.
  • Distributed Tracing: Integrate with Jaeger, Zipkin, or Tempo. Tune the sampling rate (the default 1% is usually sufficient; see the sketch after this list).
  • Logging: Keep Envoy access logs disabled in production for performance. Enable selectively for debugging.
  • Service Mesh Dashboard: Consider deploying Kiali for real-time visualization of service topology and traffic flows.
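
Sampling from the Distributed Tracing bullet can be tuned declaratively with the Telemetry API; a sketch raising mesh-wide sampling to 10% (this assumes a tracing provider is already configured in MeshConfig):

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system  # Root namespace applies mesh-wide
spec:
  tracing:
    - randomSamplingPercentage: 10.0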

Resource Management

  • Right-size Sidecars: Default proxy resources are conservative. For high-throughput services, increase sidecar CPU/memory limits.
  • Control Plane Sizing: Monitor istiod CPU and memory. Large meshes (1000+ pods) may need vertical scaling.
  • Capacity Planning: Remember that each pod now runs two containers (app + sidecar). Plan cluster capacity accordingly.

Upgrade Strategy

  • Canary Upgrades: Always use revision-based upgrades in production. Install the new version alongside the old, migrate workloads gradually, then remove the old version (see the command sketch after this list).
  • Version Skipping: Don't skip minor versions in a single upgrade (e.g., go 1.18 → 1.19 → 1.20 rather than jumping straight from 1.18 to 1.20).
  • Testing: Validate new versions in staging with realistic traffic before production rollout.
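
A sketch of the revision-based canary flow from the first bullet (revision names and the my-app namespace are illustrative):

# Install the new control plane alongside the existing one
istioctl install --set revision=1-23-0 -y

# Migrate one namespace at a time: swap the injection label, then restart
kubectl label namespace my-app istio.io/rev=1-23-0 istio-injection- --overwrite
kubectl rollout restart deployment -n my-app

# Once every namespace is migrated and verified, remove the old revision
istioctl uninstall --revision=1-22-0 -y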

Common Pitfalls to Avoid

  1. Missing Sidecar Injection: Forgetting to label namespaces with istio-injection=enabled is the most common mistake. Pods without sidecars are effectively outside the mesh.

  2. Undersized Resources: Default resource limits are conservative. Monitor and adjust based on actual usage to prevent throttling.

  3. In-Place Upgrades: Avoid in-place upgrades in production. Always use canary upgrades with revisions.

  4. Overly Permissive Traffic Policy: Consider setting outboundTrafficPolicy.mode=REGISTRY_ONLY to require an explicit ServiceEntry for each external service (see the sketch after this list).

  5. Configuration Complexity: Resist the urge to use low-level EnvoyFilters. Use Istio's built-in features first.
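
For pitfall 4, a sketch of the registry-only policy plus the ServiceEntry that admits one external host (the host and names are illustrative):

# In the IstioOperator spec or mesh config:
meshConfig:
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY

# Explicitly register the external service:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-payments-api
spec:
  hosts:
    - api.example.com
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
    - number: 443
      name: tls
      protocol: TLS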

Integration with the Kubernetes Ecosystem

Ingress: Istio Gateway vs. Traditional Controllers

Istio Gateway (recommended for mesh deployments):

  • Uses Envoy proxy at cluster edge, fully integrated with service mesh
  • Supports rich L7 routing via Gateway + VirtualService resources
  • Enables seamless application of mesh policies (circuit breakers, retries, tracing) at ingress
  • Can terminate external TLS and originate internal mTLS

Traditional Ingress (NGINX, Traefik):

  • Can coexist with Istio – services behind NGINX can still use sidecars
  • Loses some mesh integration (Istio routing rules don't apply at ingress)
  • Consider if you have existing investment in specific ingress features

For new deployments, Istio's ingress gateway provides better integration and more powerful traffic management.
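
A sketch of that Gateway + VirtualService pairing (hostname, secret, and backing service are illustrative; credentialName must reference a TLS secret in the gateway's namespace):

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: web-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway  # Matches the default ingress gateway pods
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: shop-example-com-tls  # Kubernetes TLS secret
      hosts:
        - shop.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-routes
spec:
  hosts:
    - shop.example.com
  gateways:
    - istio-system/web-gateway
  http:
    - route:
        - destination:
            host: web.default.svc.cluster.local
            port:
              number: 80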

Certificate Management

Ingress TLS: Use cert-manager to automate certificate provisioning (Let's Encrypt or corporate CA). Istio gateways read certificates from Kubernetes secrets, which cert-manager manages automatically.

Mesh CA: Istio includes a built-in certificate authority (formerly the standalone Citadel component, now part of istiod) that issues workload certificates. For production:

  • Use istio-csr to integrate cert-manager as the mesh CA
  • Or plug in corporate PKI by providing root certificates via the cacerts secret
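
Tying the two together: a cert-manager Certificate can maintain exactly the secret an Istio gateway reads (the issuer is illustrative and assumes a ClusterIssuer already exists):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: shop-example-com
  namespace: istio-system  # Must live where the gateway reads TLS secrets
spec:
  secretName: shop-example-com-tls
  dnsNames:
    - shop.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer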

GitOps Integration

Istio is naturally GitOps-friendly:

  • Installation: Store IstioOperator CRs or Helm values in Git, let ArgoCD/Flux apply them
  • Configuration: Store VirtualServices, DestinationRules, etc. in Git alongside application configs
  • Drift Detection: GitOps controllers will detect and reconcile manual changes

This aligns perfectly with Project Planton's philosophy: declare desired state in version-controlled manifests, let automation handle the rest.

Service Mesh Interface (SMI)

Istio has its own rich CRD API that predates SMI (a later standardization effort that has since been archived by the CNCF). While adapters exist to translate SMI resources to Istio configuration, they're rarely used in practice. Istio's native API is more feature-complete and widely supported.

Recommendation: Use Istio's native CRDs directly. SMI adds complexity without meaningful benefit for Istio deployments.

Multi-Cluster Service Mesh

Istio supports connecting multiple clusters into a unified service mesh, enabling cross-cluster service discovery and traffic routing:

Primary-Remote Pattern

  • One cluster (primary) runs the control plane (istiod)
  • Remote clusters run only data plane (sidecars, gateways) and connect to primary's control plane
  • Use case: Simpler operational model – single control plane to manage
  • Limitation: Primary cluster becomes critical dependency

Multi-Primary Pattern

  • Each cluster runs its own control plane
  • Control planes form one mesh (shared root CA, service discovery federation)
  • Use case: High availability – no single point of failure
  • Complexity: More operational overhead, requires coordinating multiple control planes

For multi-cloud deployments, multi-primary on different networks is typical, with east-west gateways facilitating cross-cluster communication.

The Project Planton Approach

Project Planton provides a Kubernetes-like API for deploying Istio that hides low-level complexity while exposing essential configuration:

apiVersion: kubernetes.project-planton.org/v1
kind: KubernetesIstio
metadata:
  name: production-mesh
spec:
  container:
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
  # Additional high-level settings...

What Project Planton abstracts:

  • Helm chart installation and ordering
  • Component selection (base, istiod, gateways)
  • Profile defaults and reasonable production settings

What you control:

  • Istio version
  • Resource allocation for control plane
  • High availability settings (replica counts)
  • Optional components (egress gateway, CNI)

Behind the scenes, Project Planton generates Pulumi code that:

  1. Installs Istio Helm charts in correct order
  2. Applies production-ready defaults based on the 80/20 principle
  3. Honors your customizations without exposing every Helm value

This approach gives you production-ready Istio in minutes without becoming an Istio installation expert.

Conclusion

The landscape of Istio deployment has matured dramatically. What was once a complex, operationally intensive undertaking is now straightforward and well-understood. The key insights:

  1. Modern Istio is simpler: Consolidated architecture (single control plane), refined installation tools, and proven patterns have reduced operational burden.

  2. Installation methods are robust: Both istioctl and Helm are production-ready. Choose based on your ecosystem (Helm for GitOps/IaC, istioctl for simplicity).

  3. The 80/20 rule applies: Most deployments only customize a handful of settings. Start with defaults, measure, then tune specific pain points.

  4. Production patterns are well-established: HA configurations, canary upgrades, security hardening – these are solved problems with clear best practices.

  5. Ecosystem integration is mature: Istio works seamlessly with cert-manager, Prometheus, GitOps tools, and modern IaC frameworks.

Project Planton builds on these insights to provide the simplest possible path to production-ready service mesh: declare your desired state in a high-level API, let automation handle the complexity. Whether you're deploying to a single cluster in development or orchestrating multi-cloud service meshes in production, the patterns and practices outlined here will help you succeed.

The era of service mesh complexity is over. Welcome to service mesh as infrastructure-as-code.
