Deploying Istio Service Mesh on Kubernetes
Introduction
For years, the conventional wisdom around service meshes was that they were complex infrastructure best left to large enterprises with dedicated platform teams. Istio, in particular, earned a reputation for being powerful but operationally demanding – a tool that could solve your observability and security challenges, but only if you were willing to dedicate significant engineering resources to managing it.
That narrative has fundamentally changed.
Modern Istio (1.18+) is dramatically simpler to deploy and operate than earlier versions. The project has consolidated its components, refined its installation methods, and developed proven patterns for production deployments across thousands of organizations. What was once a sprawling collection of microservices (Mixer, Pilot, Citadel, Galley) is now a single control plane binary (istiod) that's straightforward to manage and reason about.
This document explores how to deploy Istio on Kubernetes in a way that balances production-readiness with operational simplicity. We'll examine the landscape of deployment methods – from manual CLI tools to fully managed cloud services – and explain which approaches Project Planton supports and why. Whether you're running a single cluster in development or orchestrating multi-cloud service mesh deployments, understanding these patterns will help you make informed decisions.
The Evolution of Istio Deployment
The journey to deploy Istio has evolved significantly, and understanding this progression helps clarify why certain approaches are recommended today.
Level 0: The Anti-Pattern – In-Cluster Istio Operator
Before Istio 1.24, the project shipped an optional in-cluster operator that could watch IstioOperator custom resources and reconcile the mesh installation continuously. On paper, this sounded appealing – define your desired state in a CR and let a controller handle the rest.
In practice, this approach had serious drawbacks:
- Security concerns: The operator required cluster-admin privileges to create Istio resources, creating a high-privilege pod running continuously in your cluster.
- Operational complexity: Running an operator to manage Istio added another failure mode and debugging surface.
- Naming ambiguity: The term "IstioOperator" referred to both the API spec and the in-cluster controller, which made documentation and troubleshooting harder to follow.
The Istio project deprecated and removed the in-cluster operator as of version 1.24, with maintainers explicitly stating: "Use of the operator for new Istio installations is discouraged." A security audit reinforced this, noting that running high-privilege in-cluster operators is not recommended.
Verdict: Avoid the deprecated in-cluster operator. Modern deployments should use istioctl or Helm directly.
Level 1: Manual Installation with istioctl
The istioctl CLI is Istio's official installation tool – a single binary that can deploy a complete service mesh with sensible defaults in one command. It comes with several built-in installation profiles (minimal, default, demo) that configure different component sets for various use cases.
Strengths:
- Simplicity: One command can install everything – `istioctl install --set profile=default`
- Profiles: Pre-configured bundles eliminate guesswork about which components to deploy
- Integrated upgrades: Native support for canary upgrades using revision tags
- Validation: Built-in checks warn you about configuration issues before deployment
Limitations:
- Imperative: It's a CLI tool, not a declarative resource, making it less natural for GitOps workflows
- Automation: While scriptable, it requires wrapping in automation for repeatable deployments
Use cases: Interactive installations, proof-of-concepts, development environments, and as the engine behind higher-level automation (which is exactly how Project Planton uses it under the hood).
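For a sense of the workflow, a minimal istioctl-based installation might look like the following; the version shown is illustrative:

```bash
# Download a specific istioctl release (version is illustrative)
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.20.0 sh -
cd istio-1.20.0 && export PATH=$PWD/bin:$PATH

# Validate the cluster before installing
istioctl x precheck

# Install the default profile (control plane + ingress gateway)
istioctl install --set profile=default -y

# Verify the installation
istioctl verify-install
```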
Level 2: Declarative Installation with Helm
Istio publishes official Helm charts for all its components (base, istiod, gateways, CNI). The Helm approach splits installation into modular charts that you install in sequence: first the base chart (CRDs), then the control plane (istiod), then any gateways.
Strengths:
- GitOps native: Helm charts integrate seamlessly with ArgoCD, Flux, and other GitOps tools
- IaC friendly: Terraform and Pulumi have first-class Helm support
- Widespread adoption: Helm is a standard tool that most platform teams already use
- Flexibility: Each component can be configured independently
Considerations:
- Multi-step: You must install charts in the correct order (base before istiod)
- Configuration surface: More knobs to understand compared to istioctl profiles
- Upgrade coordination: Need to carefully orchestrate chart upgrades
Use cases: Production deployments, CI/CD pipelines, GitOps workflows, and Infrastructure-as-Code frameworks (like Project Planton).
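In practice, the chart sequence looks roughly like this – as noted above, the charts must be installed in order, and you should pin chart versions in real deployments:

```bash
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update

# 1. Base chart: CRDs and cluster-wide resources
helm install istio-base istio/base -n istio-system --create-namespace

# 2. Control plane
helm install istiod istio/istiod -n istio-system --wait

# 3. Optional ingress gateway, typically in its own namespace
helm install istio-ingress istio/gateway -n istio-ingress --create-namespace
```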
Users report that Terraform with the Helm provider is "generally fine" and often "the fastest method to provision a full setup." This aligns with modern platform engineering practices where infrastructure is version-controlled and continuously delivered.
Level 3: Managed Service Mesh (Cloud Provider)
Cloud providers offer managed service mesh solutions that eliminate operational overhead entirely:
| Provider | Offering | Based on Istio? | Key Features |
|---|---|---|---|
| Google Cloud | Anthos Service Mesh (ASM) | ✅ Yes | Fully managed Istio distribution, GCP integrations, hybrid/multi-cloud support |
| Azure | AKS Service Mesh Add-on | ✅ Yes | Managed Istio control plane, Azure Monitor integration, certificate automation |
| AWS | App Mesh | ❌ No (Envoy) | AWS-managed control plane, IAM integration, simpler feature set |
Strengths:
- Zero operational burden: Provider handles installation, upgrades, and management
- Tested integrations: Deep integration with cloud-native services (IAM, logging, monitoring)
- Enterprise support: Backed by cloud provider SLAs
Trade-offs:
- Vendor lock-in: Managed services may lag upstream Istio releases or have proprietary extensions
- Less flexibility: Cannot customize as deeply as self-managed installations
- Cost: Managed services typically incur additional charges
Use cases: Organizations prioritizing operational simplicity over control, single-cloud deployments, enterprises requiring vendor support.
Note on AWS App Mesh: While AWS provides App Mesh as a managed service, it uses a different control plane than Istio (though both use Envoy proxies). App Mesh has simpler configuration but lacks Istio's rich feature set. Organizations needing Istio's capabilities on EKS typically self-manage Istio using Helm or istioctl.
Level 4: Infrastructure-as-Code with Pulumi/Terraform
Modern platform engineering treats service mesh as infrastructure – declaratively defined, version-controlled, and continuously delivered. Tools like Pulumi and Terraform can deploy Istio by wrapping Helm charts or invoking istioctl.
Pulumi (what Project Planton uses):
- Write infrastructure in real programming languages (Go, TypeScript, Python)
- Compose Istio deployment with other infrastructure resources
- Strong typing catches configuration errors at compile time
- Native Helm support for deploying charts
Terraform (popular alternative):
- HCL (HashiCorp Configuration Language) for declarative infrastructure
- Mature Helm provider for deploying Istio charts
- Large ecosystem of community modules
Use cases: Multi-cloud deployments, complex platform automation, teams already using IaC tools.
This is where Project Planton sits: providing a high-level Kubernetes-like API (defined in protobuf) that automatically generates Pulumi code to deploy Istio with production-ready defaults. You declare your desired state in a simple manifest, and Pulumi handles the orchestration.
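As a rough sketch of what such IaC code can look like – not Project Planton's actual generated code – the following Pulumi Go program installs the base and istiod charts in order; the chart version, values, and resource names are illustrative:

```go
package main

import (
	helmv3 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/helm/v3"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		repo := &helmv3.RepositoryOptsArgs{
			Repo: pulumi.String("https://istio-release.storage.googleapis.com/charts"),
		}

		// 1. Base chart: CRDs and cluster-wide resources.
		base, err := helmv3.NewRelease(ctx, "istio-base", &helmv3.ReleaseArgs{
			Chart:           pulumi.String("base"),
			Version:         pulumi.String("1.20.0"), // illustrative version
			Namespace:       pulumi.String("istio-system"),
			CreateNamespace: pulumi.Bool(true),
			RepositoryOpts:  repo,
		})
		if err != nil {
			return err
		}

		// 2. Control plane, installed only after the base chart succeeds.
		_, err = helmv3.NewRelease(ctx, "istiod", &helmv3.ReleaseArgs{
			Chart:          pulumi.String("istiod"),
			Version:        pulumi.String("1.20.0"),
			Namespace:      pulumi.String("istio-system"),
			RepositoryOpts: repo,
			Values: pulumi.Map{
				"pilot": pulumi.Map{
					"replicaCount": pulumi.Int(2), // HA control plane
				},
			},
		}, pulumi.DependsOn([]pulumi.Resource{base}))
		return err
	})
}
```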
Production Installation Methods Compared
When deploying Istio in production, two approaches dominate: istioctl and Helm. Both are officially supported, production-ready, and widely used. The choice depends on your tooling ecosystem and operational preferences.
istioctl vs. Helm: The Practical Comparison
| Aspect | istioctl | Helm |
|---|---|---|
| Installation | Single command deploys all components | Multi-step (base → istiod → gateways) |
| Profiles | Built-in profiles (default, minimal, etc.) | Pass profile values to each chart |
| Upgrades | Native canary upgrade with revisions | Canary possible, requires manual orchestration |
| GitOps | Requires wrapper tooling | Native support in ArgoCD/Flux |
| IaC | Must be wrapped in scripts | First-class support in Terraform/Pulumi |
| Customization | IstioOperator spec via YAML or flags | Helm values files |
| Learning curve | Slightly simpler (profiles abstract complexity) | Standard Helm knowledge applies |
As Istio maintainer @howardjohn notes: "Istioctl and Helm are roughly equivalent in stability; use whichever fits best... Helm tends to integrate much better with other tooling like Terraform, ArgoCD, etc., so is a reasonable first choice."
Project Planton's choice: We use Helm under the hood because it integrates naturally with Pulumi and supports declarative, version-controlled deployments. The KubernetesIstio API abstracts away Helm's complexity while exposing the 20% of configuration that 80% of users need.
Installation Profiles: Choosing the Right Starting Point
Istio provides several installation profiles that bundle sensible defaults:
| Profile | Components | Use Case |
|---|---|---|
| minimal | Control plane only (istiod) | Development, custom deployments, primary cluster in multi-cluster |
| default | Control plane + ingress gateway | Recommended for production – complete mesh with edge ingress |
| demo | Everything + high telemetry | Tutorials, demos, evaluation – not for production |
| remote | No control plane | Remote clusters in primary-remote multi-cluster setup |
Production recommendation: Start with default profile. It includes the control plane and one ingress gateway with production-appropriate resource settings. Customize from there based on your specific needs (add egress gateway, enable CNI, adjust resources).
Licensing: Open Source and Commercial Options
Istio Core: Apache 2.0 License – fully open source, free to use, backed by the CNCF since 2022.
All deployment methods discussed (istioctl, Helm, IaC tools) deploy the open source Istio distribution. No license fees, no vendor lock-in.
Commercial Extensions: Several companies offer enterprise products built on Istio:
- Tetrate Service Bridge (TSB): Multi-cluster management, FIPS compliance, enterprise UI, commercial support
- Solo.io Gloo Mesh: Management plane with GUI, multi-mesh federation, WASM plugins (Enterprise tier requires license)
These are optional value-adds for organizations needing enterprise features or support. For most deployments, open source Istio is production-ready without commercial extensions.
Project Planton deploys the open source Apache 2.0 Istio distribution, ensuring maximum compatibility and zero licensing costs.
The 80/20 Configuration Principle
Istio is incredibly powerful and correspondingly complex – the full configuration surface is vast. However, most production deployments only customize a small fraction of available settings. Understanding this 80/20 split helps design simple, maintainable service mesh deployments.
What 80% of Production Deployments Actually Configure
Essential Components
- Control Plane (istiod): The brain of Istio – handles service discovery, certificate management, configuration distribution. Every mesh needs this.
- Envoy Sidecars: Automatically injected into application pods (via the namespace label `istio-injection=enabled`; see the example after this list). The data plane that enforces policies and routes traffic.
- Ingress Gateway: Envoy proxy at cluster edge for external traffic. Handles TLS termination, routing rules, authentication policies at the boundary.
- CNI Plugin (recommended): Moves network initialization out of application pods (which otherwise need elevated privileges). Essential for hardened security postures.
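For instance, joining a namespace to the mesh is a single label; the namespace name below is hypothetical, and existing pods must be restarted to receive their sidecar:

```bash
kubectl label namespace my-app istio-injection=enabled
kubectl rollout restart deployment -n my-app   # re-create pods so sidecars are injected
kubectl get pods -n my-app                     # pods should now report 2/2 containers
```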
Common Settings
- mTLS Mode: Start with `PERMISSIVE` (accepts both plaintext and mTLS) during migration, then move to `STRICT` in production for zero-trust networking (see the example after this list).
- Resource Limits: Override defaults for control plane (istiod) and gateways based on cluster size. Typical starting points:
  - Istiod: 1-2 CPU cores, 2-4GB memory
  - Ingress Gateway: 500m-1 CPU, 256-512MB memory
  - Sidecars: 50-100m CPU, 128-256MB memory
- High Availability: Run 2-3 replicas of istiod in production. Use PodDisruptionBudgets to ensure availability during node maintenance.
- Telemetry: Keep default metrics enabled (Prometheus scraping). Tune trace sampling (the default 1% is usually fine; increase only if needed).
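As an example of the mTLS setting, mesh-wide strict mTLS is a single PeerAuthentication resource in the Istio root namespace:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace, so the policy applies mesh-wide
spec:
  mtls:
    mode: STRICT             # use PERMISSIVE while workloads are still migrating
```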
What 80% of Users Don't Need to Touch
- MeshConfig: Global mesh settings like protocol detection, access logging format. Defaults work well.
- ProxyConfig: Global sidecar settings (concurrency, resource allocation). Tune only for specific performance needs.
- Custom Envoy Filters: Low-level proxy customization. Use Istio's built-in features first.
- Advanced locality-based routing: Works automatically if clusters are properly labeled.
- Egress Gateway: Optional component for forcing outbound traffic through a controlled exit point. Only needed for strict egress control.
Example Configurations
Development: Minimal Footprint
```yaml
profile: minimal           # Only control plane
components:
  ingressGateways:
    - enabled: false       # No gateway in dev
  pilot:
    k8s:
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
```
Rationale: Lightweight for local testing (kind, minikube). Enough to inject sidecars and test mesh features.
Staging: Production Simulation
```yaml
profile: default                 # Control plane + ingress
meshConfig:
  accessLogFile: /dev/stdout     # Enable for troubleshooting
components:
  ingressGateways:
    - enabled: true
      k8s:
        replicaCount: 1
  pilot:
    k8s:
      replicaCount: 1
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
```
Rationale: Mirrors production but at smaller scale. Access logs enabled for debugging.
Production: Hardened and Highly Available
```yaml
profile: default
components:
  ingressGateways:
    - enabled: true
      k8s:
        replicaCount: 2          # HA
        resources:
          requests:
            cpu: 500m
            memory: 256Mi
  egressGateways:
    - enabled: true              # Optional: for strict egress control
  pilot:
    k8s:
      replicaCount: 3            # HA control plane
      resources:
        requests:
          cpu: 1000m
          memory: 2Gi
# Strict mTLS is enforced separately with a mesh-wide PeerAuthentication policy
# (see Security Hardening below); the legacy global.mtls.enabled value is no
# longer supported in modern Istio.
```
Rationale: Multiple replicas for resilience, enforced mTLS for zero-trust, sufficient resources for scale.
Production Best Practices
High Availability
- Control Plane Redundancy: Run at least 2 istiod replicas (3 for large clusters). Use PodAntiAffinity to spread across zones.
- Gateway Redundancy: Deploy multiple ingress gateway pods with anti-affinity rules.
- Multi-Cluster Patterns: For critical workloads, consider multi-primary deployments where each cluster has its own control plane.
Security Hardening
- Strict mTLS: Enforce mutual TLS mesh-wide via `PeerAuthentication` policies after initial rollout.
- Certificate Management:
  - Use cert-manager for ingress gateway certificates (Let's Encrypt or corporate CA)
  - For mesh CA, consider plugging in corporate PKI via istio-csr or custom root certificates
- Authorization Policies: Use Istio's `AuthorizationPolicy` CRDs to enforce fine-grained access control based on service identity (not IP addresses); a sketch follows this list.
- Least Privilege: Enable Istio CNI to avoid requiring elevated privileges in application pods.
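A sketch of such a policy – the namespace, labels, and service account below are hypothetical – allowing only a frontend identity to reach a backend workload:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-allow-frontend
  namespace: backend
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            # SPIFFE-style identity of the calling workload's service account
            principals:
              - cluster.local/ns/frontend/sa/frontend
```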
Observability
- Metrics: Ensure Prometheus scrapes Istio components (control plane and proxies). Use the official Grafana dashboards.
- Distributed Tracing: Integrate with Jaeger, Zipkin, or Tempo. Tune sampling rate (default 1% is usually sufficient).
- Logging: Keep Envoy access logs disabled in production for performance. Enable selectively for debugging.
- Service Mesh Dashboard: Consider deploying Kiali for real-time visualization of service topology and traffic flows.
Resource Management
- Right-size Sidecars: Default proxy resources are conservative. For high-throughput services, increase sidecar CPU/memory limits.
- Control Plane Sizing: Monitor istiod CPU and memory. Large meshes (1000+ pods) may need vertical scaling.
- Capacity Planning: Remember that each pod now runs two containers (app + sidecar). Plan cluster capacity accordingly.
Upgrade Strategy
- Canary Upgrades: Always use revision-based upgrades in production. Install the new version alongside the old, migrate workloads gradually, then remove the old version (see the sketch after this list).
- Version Skipping: Upgrade one minor version at a time (e.g., 1.18 → 1.20 requires going through 1.19).
- Testing: Validate new versions in staging with realistic traffic before production rollout.
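In outline, a revision-based canary upgrade looks like this; the version numbers and namespace are illustrative:

```bash
# Install the new control plane alongside the existing one, under a revision label
istioctl install --set revision=1-20-0 -y

# Migrate one namespace at a time: switch its injection label to the new revision,
# then restart workloads so they pick up the new sidecar
kubectl label namespace my-app istio-injection- istio.io/rev=1-20-0 --overwrite
kubectl rollout restart deployment -n my-app

# Once all namespaces are migrated and verified, remove the old control plane
istioctl uninstall --revision=1-19-0 -y
```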
Common Pitfalls to Avoid
- Missing Sidecar Injection: Forgetting to label namespaces with `istio-injection=enabled` is the most common mistake. Pods without sidecars are effectively outside the mesh.
- Undersized Resources: Default resource limits are conservative. Monitor and adjust based on actual usage to prevent throttling.
- In-Place Upgrades: Avoid in-place upgrades in production. Always use canary upgrades with revisions.
- Overly Permissive Traffic Policy: Consider setting `outboundTrafficPolicy.mode=REGISTRY_ONLY` to require an explicit ServiceEntry for external services (see the example after this list).
- Configuration Complexity: Resist the urge to use low-level EnvoyFilters. Use Istio's built-in features first.
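For example, once `meshConfig.outboundTrafficPolicy.mode` is set to `REGISTRY_ONLY`, an external dependency must be registered explicitly; the hostname below is hypothetical:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-payments-api
spec:
  hosts:
    - api.payments.example.com   # hypothetical external dependency
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
    - number: 443
      name: https
      protocol: TLS
```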
Integration with the Kubernetes Ecosystem
Ingress: Istio Gateway vs. Traditional Controllers
Istio Gateway (recommended for mesh deployments):
- Uses Envoy proxy at cluster edge, fully integrated with service mesh
- Supports rich L7 routing via Gateway + VirtualService resources
- Enables seamless application of mesh policies (circuit breakers, retries, tracing) at ingress
- Can terminate external TLS and originate internal mTLS
Traditional Ingress (NGINX, Traefik):
- Can coexist with Istio – services behind NGINX can still use sidecars
- Loses some mesh integration (Istio routing rules don't apply at ingress)
- Consider if you have existing investment in specific ingress features
For new deployments, Istio's ingress gateway provides better integration and more powerful traffic management.
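A minimal ingress configuration with these resources might look like the following; the hostname, TLS secret, and backend service are hypothetical:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: web-gateway
spec:
  selector:
    istio: ingressgateway          # binds to the default ingress gateway pods
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: web-tls    # Kubernetes TLS secret (e.g. managed by cert-manager)
      hosts:
        - shop.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-routes
spec:
  hosts:
    - shop.example.com
  gateways:
    - web-gateway
  http:
    - route:
        - destination:
            host: frontend.default.svc.cluster.local
            port:
              number: 8080
```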
Certificate Management
Ingress TLS: Use cert-manager to automate certificate provisioning (Let's Encrypt or corporate CA). Istio gateways read certificates from Kubernetes secrets, which cert-manager manages automatically.
Mesh CA: Istio includes a built-in CA (formerly Citadel, now part of istiod) that issues workload certificates. For production:
- Use istio-csr to integrate cert-manager as the mesh CA
- Or plug in corporate PKI by providing root certificates via the `cacerts` secret
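For example, a cert-manager Certificate whose resulting secret a Gateway can reference via `credentialName`; the issuer and hostname are hypothetical:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: web-tls
  namespace: istio-system       # the ingress gateway reads secrets from its own namespace
spec:
  secretName: web-tls           # referenced as credentialName in the Gateway
  dnsNames:
    - shop.example.com
  issuerRef:
    name: letsencrypt-prod      # hypothetical ClusterIssuer
    kind: ClusterIssuer
```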
GitOps Integration
Istio is naturally GitOps-friendly:
- Installation: Store IstioOperator CRs or Helm values in Git, let ArgoCD/Flux apply them
- Configuration: Store VirtualServices, DestinationRules, etc. in Git alongside application configs
- Drift Detection: GitOps controllers will detect and reconcile manual changes
This aligns perfectly with Project Planton's philosophy: declare desired state in version-controlled manifests, let automation handle the rest.
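As one illustration, an Argo CD Application that installs the istiod chart with Git-controlled values might look roughly like this; the chart version and values are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: istiod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://istio-release.storage.googleapis.com/charts
    chart: istiod
    targetRevision: 1.20.0       # illustrative chart version
    helm:
      values: |
        pilot:
          replicaCount: 2
  destination:
    server: https://kubernetes.default.svc
    namespace: istio-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```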
Service Mesh Interface (SMI)
Istio has its own rich CRD API that predates SMI (a later standardization effort). While adapters exist to translate SMI resources to Istio configuration, they're rarely used in practice. Istio's native API is more feature-complete and widely supported.
Recommendation: Use Istio's native CRDs directly. SMI adds complexity without meaningful benefit for Istio deployments.
Multi-Cluster Service Mesh
Istio supports connecting multiple clusters into a unified service mesh, enabling cross-cluster service discovery and traffic routing:
Primary-Remote Pattern
- One cluster (primary) runs the control plane (istiod)
- Remote clusters run only data plane (sidecars, gateways) and connect to primary's control plane
- Use case: Simpler operational model – single control plane to manage
- Limitation: Primary cluster becomes critical dependency
Multi-Primary Pattern
- Each cluster runs its own control plane
- Control planes form one mesh (shared root CA, service discovery federation)
- Use case: High availability – no single point of failure
- Complexity: More operational overhead, requires coordinating multiple control planes
For multi-cloud deployments, multi-primary on different networks is typical, with east-west gateways facilitating cross-cluster communication.
The Project Planton Approach
Project Planton provides a Kubernetes-like API for deploying Istio that hides low-level complexity while exposing essential configuration:
```yaml
apiVersion: kubernetes.project-planton.org/v1
kind: KubernetesIstio
metadata:
  name: production-mesh
spec:
  container:
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
  # Additional high-level settings...
```
What Project Planton abstracts:
- Helm chart installation and ordering
- Component selection (base, istiod, gateways)
- Profile defaults and reasonable production settings
What you control:
- Istio version
- Resource allocation for control plane
- High availability settings (replica counts)
- Optional components (egress gateway, CNI)
Behind the scenes, Project Planton generates Pulumi code that:
- Installs Istio Helm charts in correct order
- Applies production-ready defaults based on the 80/20 principle
- Honors your customizations without exposing every Helm value
This approach gives you production-ready Istio in minutes without becoming an Istio installation expert.
Conclusion
The landscape of Istio deployment has matured dramatically. What was once a complex, operationally intensive undertaking is now straightforward and well-understood. The key insights:
- Modern Istio is simpler: Consolidated architecture (single control plane), refined installation tools, and proven patterns have reduced operational burden.
- Installation methods are robust: Both istioctl and Helm are production-ready. Choose based on your ecosystem (Helm for GitOps/IaC, istioctl for simplicity).
- The 80/20 rule applies: Most deployments only customize a handful of settings. Start with defaults, measure, then tune specific pain points.
- Production patterns are well-established: HA configurations, canary upgrades, security hardening – these are solved problems with clear best practices.
- Ecosystem integration is mature: Istio works seamlessly with cert-manager, Prometheus, GitOps tools, and modern IaC frameworks.
Project Planton builds on these insights to provide the simplest possible path to production-ready service mesh: declare your desired state in a high-level API, let automation handle the complexity. Whether you're deploying to a single cluster in development or orchestrating multi-cloud service meshes in production, the patterns and practices outlined here will help you succeed.
The era of service mesh complexity is over. Welcome to service mesh as infrastructure-as-code.