Envoy Feature-Implementierungen¶

Detaillierte Implementierung aller Features für Envoy Proxy Provider in GAL

Navigation: - ← Zurück zur Envoy Übersicht - → Migration & Best Practices

Inhaltsverzeichnis¶

Feature-Implementierungen
Envoy Feature Coverage
Envoy-spezifische Details
Advanced Features
Request Mirroring/Shadowing

Feature-Implementierungen¶

1. Load Balancing¶

Envoy unterstützt die meisten Load Balancing Algorithmen:

load_balancer:
  algorithm: round_robin    # ROUND_ROBIN
  # algorithm: least_conn    # LEAST_REQUEST
  # algorithm: ip_hash       # RING_HASH (Consistent Hashing)
  # algorithm: weighted      # ROUND_ROBIN mit Weights

Generierte Envoy Config:

lb_policy: ROUND_ROBIN      # oder LEAST_REQUEST, RING_HASH

Algorithmen: - round_robin → ROUND_ROBIN (Default) - least_conn → LEAST_REQUEST (bevorzugt Server mit wenigsten aktiven Requests) - ip_hash → RING_HASH (Consistent Hashing, Session Persistence) - weighted → ROUND_ROBIN + load_balancing_weight

2. Health Checks¶

Active Health Checks:

health_check:
  active:
    enabled: true
    interval: "10s"           # Probe-Intervall
    timeout: "5s"             # Probe-Timeout
    http_path: "/health"      # Health Endpoint
    healthy_threshold: 2      # Erfolge bis "healthy"
    unhealthy_threshold: 3    # Fehler bis "unhealthy"
    healthy_status_codes: [200, 204]

Generiert:

health_checks:
- timeout: 5s
  interval: 10s
  unhealthy_threshold: 3
  healthy_threshold: 2
  http_health_check:
    path: /health
    expected_statuses:
    - start: 200
      end: 201
    - start: 204
      end: 205

Passive Health Checks (Outlier Detection):

health_check:
  passive:
    enabled: true
    max_failures: 5           # Max Fehler
    failure_window: "30s"     # Zeitfenster

Generiert:

outlier_detection:
  consecutive_5xx: 5
  interval: 30s
  base_ejection_time: 30s
  max_ejection_percent: 50

3. Rate Limiting¶

rate_limit:
  enabled: true
  requests_per_second: 100
  burst: 200
  response_status: 429

Generiert (Global Rate Limit Service):

http_filters:
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: gal_ratelimit
    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: rate_limit_service

Hinweis: Envoy benötigt einen externen Rate Limit Service (z.B. lyft/ratelimit).

4. Authentication¶

JWT Validation:

authentication:
  enabled: true
  type: jwt
  jwt:
    issuer: "https://auth.example.com"
    audiences: ["api"]
    jwks_uri: "https://auth.example.com/.well-known/jwks.json"

Generiert:

http_filters:
- name: envoy.filters.http.jwt_authn
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
    providers:
      jwt_provider:
        issuer: https://auth.example.com
        audiences:
        - api
        remote_jwks:
          http_uri:
            uri: https://auth.example.com/.well-known/jwks.json
            cluster: jwt_cluster
          cache_duration: 3600s
    rules:
    - match:
        prefix: /api
      requires:
        provider_name: jwt_provider

Basic Auth (via Lua Filter):

authentication:
  enabled: true
  type: basic
  basic_auth:
    users:
      admin: password123

Generiert Lua Filter für Basic Auth Validation.

5. CORS¶

cors:
  enabled: true
  allowed_origins: ["https://app.example.com"]
  allowed_methods: ["GET", "POST", "PUT", "DELETE"]
  allowed_headers: ["Content-Type", "Authorization"]
  allow_credentials: true
  max_age: 86400

Generiert:

cors:
  allow_origin_string_match:
  - exact: https://app.example.com
  allow_methods: "GET,POST,PUT,DELETE"
  allow_headers: "Content-Type,Authorization"
  allow_credentials: true
  max_age: "86400"

6. Timeout & Retry¶

timeout:
  connect: "5s"
  read: "60s"
  idle: "300s"
retry:
  enabled: true
  attempts: 3
  backoff: exponential
  base_interval: "25ms"
  max_interval: "250ms"
  retry_on:
    - connect_timeout
    - http_5xx

Generiert:

# Cluster-level
connect_timeout: 5s

# Route-level
timeout: 60s
idle_timeout: 300s
retry_policy:
  num_retries: 3
  per_try_timeout: 25ms
  retry_on: "connect-failure,5xx"

7. Circuit Breaker¶

circuit_breaker:
  enabled: true
  max_failures: 5
  timeout: "30s"
  unhealthy_status_codes: [500, 502, 503, 504]

Generiert (Outlier Detection):

outlier_detection:
  consecutive_5xx: 5
  interval: 30s
  base_ejection_time: 30s
  max_ejection_percent: 50
  enforcing_consecutive_5xx: 100

8. WebSocket¶

websocket:
  enabled: true
  idle_timeout: "600s"
  ping_interval: "30s"

Generiert:

upgrade_configs:
- upgrade_type: websocket
http_protocol_options:
  idle_timeout: 600s

9. Request/Response Headers¶

headers:
  request_add:
    X-Request-ID: "{{uuid}}"
    X-Forwarded-Proto: "https"
  request_remove:
    - X-Internal-Secret
  response_add:
    X-Gateway: "GAL-Envoy"
  response_remove:
    - X-Powered-By

Generiert:

request_headers_to_add:
- header:
    key: X-Request-ID
    value: "%REQ(X-REQUEST-ID)%"
  append: false
request_headers_to_remove:
- X-Internal-Secret
response_headers_to_add:
- header:
    key: X-Gateway
    value: GAL-Envoy
response_headers_to_remove:
- X-Powered-By

10. Body Transformation¶

body_transformation:
  enabled: true
  request:
    add_fields:
      trace_id: "{{uuid}}"
    remove_fields:
      - secret_key
  response:
    filter_fields:
      - password

Generiert Lua Filter:

http_filters:
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        -- Transform request body
      end
      function envoy_on_response(response_handle)
        -- Transform response body
      end

Provider-Vergleich¶

Envoy vs. Andere Provider¶

Feature	Envoy	Kong	APISIX	Traefik	Nginx	HAProxy
Performance	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐
Feature-Set	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Observability	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐⭐
Cloud-Native	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐	⭐⭐
Lernkurve	⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Dokumentation	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐

Envoy Stärken: - ✅ Umfassendstes Feature-Set aller Provider - ✅ Native Observability (Metrics, Tracing, Logging) - ✅ Service Mesh Ready (Istio, Consul, Linkerd) - ✅ Modern & Cloud-Native - ✅ Hot Reload ohne Downtime - ✅ gRPC Native (HTTP/2)

Envoy Schwächen: - ❌ Steile Lernkurve (komplexe YAML-Config) - ❌ Verbose Config (sehr lang) - ⚠️ Basic Auth nicht nativ (Lua/External) - ⚠️ Rate Limiting benötigt externen Service

Envoy Feature Coverage¶

Detaillierte Analyse basierend auf der offiziellen Envoy Dokumentation.

HTTP Filters (envoy.filters.http.*)¶

Filter	Import	Export	Status	Bemerkung
`router`	✅	✅	Voll	HTTP Routing, immer aktiviert
`jwt_authn`	✅	✅	Voll	JWT Validation mit JWKS
`cors`	✅	✅	Voll	CORS Policy (native)
`lua`	❌	✅	Export	Body Transformation, Basic Auth
`ratelimit`	⚠️	⚠️	Teilweise	Benötigt externen Service
`local_ratelimit`	❌	⚠️	Export	Local Rate Limiting (ohne Service)
`ext_authz`	❌	⚠️	Export	External Authorization (OPA, etc.)
`fault`	❌	❌	Nicht	Fault Injection
`grpc_json_transcoder`	❌	❌	Nicht	gRPC-JSON Transformation
`header_to_metadata`	❌	❌	Nicht	Header → Metadata Mapping
`ip_tagging`	❌	❌	Nicht	IP Tagging
`buffer`	❌	❌	Nicht	Request/Response Buffering
`gzip`	❌	❌	Nicht	Compression
`adaptive_concurrency`	❌	❌	Nicht	Adaptive Concurrency Control

Network Filters (envoy.filters.network.*)¶

Filter	Import	Export	Status	Bemerkung
`http_connection_manager`	✅	✅	Voll	HTTP Connection Manager (core)
`tcp_proxy`	❌	❌	Nicht	TCP Proxying
`redis_proxy`	❌	❌	Nicht	Redis Proxying
`mongo_proxy`	❌	❌	Nicht	MongoDB Proxying
`mysql_proxy`	❌	❌	Nicht	MySQL Proxying

Cluster Features¶

Feature	Import	Export	Status	Bemerkung
`load_assignment`	✅	✅	Voll	Endpoints mit IP:Port
`lb_policy` (ROUND_ROBIN)	✅	✅	Voll	Round Robin Load Balancing
`lb_policy` (LEAST_REQUEST)	✅	✅	Voll	Least Connections
`lb_policy` (RING_HASH)	✅	✅	Voll	Consistent Hashing (IP Hash)
`lb_policy` (RANDOM)	⚠️	⚠️	Teilweise	Random Selection
`lb_policy` (MAGLEV)	❌	❌	Nicht	Maglev Hashing
`health_checks` (HTTP)	✅	✅	Voll	Active Health Checks
`health_checks` (TCP)	❌	❌	Nicht	TCP Health Checks
`health_checks` (gRPC)	❌	❌	Nicht	gRPC Health Checks
`outlier_detection`	✅	✅	Voll	Passive Health Checks / Circuit Breaker
`circuit_breakers`	⚠️	⚠️	Teilweise	Connection/Request Limits
`upstream_connection_options`	❌	❌	Nicht	TCP Keepalive
`dns_lookup_family`	❌	✅	Export	V4_ONLY (Default)
`transport_socket` (TLS)	❌	❌	Nicht	Upstream TLS

Route Configuration Features¶

Feature	Import	Export	Status	Bemerkung
`match.prefix`	✅	✅	Voll	Path Prefix Matching
`match.path`	✅	✅	Voll	Exact Path Matching
`match.safe_regex`	❌	❌	Nicht	Regex Path Matching
`match.headers`	❌	❌	Nicht	Header-based Routing
`match.query_parameters`	❌	❌	Nicht	Query Parameter Matching
`route.cluster`	✅	✅	Voll	Single Cluster Routing
`route.weighted_clusters`	⚠️	⚠️	Teilweise	Traffic Splitting
`route.timeout`	✅	✅	Voll	Request Timeout
`route.idle_timeout`	✅	✅	Voll	Idle Timeout
`route.retry_policy`	✅	✅	Voll	Retry mit Exponential Backoff
`route.cors`	✅	✅	Voll	Per-Route CORS
`route.upgrade_configs` (WebSocket)	✅	✅	Voll	WebSocket Support
`request_headers_to_add`	✅	✅	Voll	Request Header Manipulation
`request_headers_to_remove`	✅	✅	Voll	Request Header Removal
`response_headers_to_add`	✅	✅	Voll	Response Header Manipulation
`response_headers_to_remove`	✅	✅	Voll	Response Header Removal
`route.metadata`	❌	❌	Nicht	Route Metadata
`route.decorator`	❌	❌	Nicht	Tracing Decorator

Listener Features¶

Feature	Import	Export	Status	Bemerkung
`address.socket_address`	✅	✅	Voll	TCP Socket (IP:Port)
`filter_chains`	✅	✅	Voll	Filter Chain
`listener_filters`	❌	❌	Nicht	TLS Inspector, HTTP Inspector
`per_connection_buffer_limit_bytes`	❌	❌	Nicht	Buffer Limits
`socket_options`	❌	❌	Nicht	TCP Socket Options
`transport_socket` (TLS)	❌	❌	Nicht	TLS Termination

Access Logging¶

Feature	Import	Export	Status	Bemerkung
`file` (stdout/stderr)	✅	✅	Voll	File Access Logs
`json_format`	✅	✅	Voll	JSON Structured Logs
`text_format`	⚠️	⚠️	Teilweise	Text Logs (CEL Format)
`grpc`	❌	❌	Nicht	gRPC Access Log Service
`http`	❌	❌	Nicht	HTTP Access Log Service

Metrics & Observability¶

Feature	Import	Export	Status	Bemerkung
Admin Interface (`/stats`)	N/A	✅	Export	Prometheus Metrics
Admin Interface (`/clusters`)	N/A	✅	Export	Cluster Health Status
Admin Interface (`/config_dump`)	N/A	✅	Export	Config Dump
Tracing (Zipkin)	❌	❌	Nicht	Distributed Tracing
Tracing (Jaeger)	❌	❌	Nicht	Distributed Tracing
Tracing (OpenTelemetry)	❌	❌	Nicht	Distributed Tracing
StatsD	❌	❌	Nicht	Metrics Export
DogStatsD	❌	❌	Nicht	Datadog Metrics

Advanced Features¶

Feature	Import	Export	Status	Bemerkung
xDS API (Dynamic Config)	❌	❌	Nicht	LDS, RDS, CDS, EDS, SDS
Hot Restart	N/A	N/A	N/A	Envoy-native Feature
Runtime Configuration	❌	❌	Nicht	Feature Flags
Overload Manager	❌	❌	Nicht	Resource Limits
Wasm Filters	❌	❌	Nicht	WebAssembly Extensions

Coverage Score nach Kategorie¶

Kategorie	Features Total	Unterstützt	Coverage
HTTP Filters	14	3 voll, 3 teilweise	~40%
Network Filters	5	1 voll	20%
Cluster Features	14	7 voll, 3 teilweise	~65%
Route Configuration	18	11 voll, 2 teilweise	~70%
Listener Features	6	2 voll	33%
Access Logging	5	2 voll, 1 teilweise	~50%
Metrics & Observability	8	3 export	37%
Advanced Features	5	0	0%

Gesamt (API Gateway relevante Features): ~52% Coverage

Import Coverage: ~55% (Import bestehender Envoy Configs → GAL) Export Coverage: ~75% (GAL → Envoy Config Generation)

Bidirektionale Feature-Unterstützung¶

Vollständig bidirektional (Import ↔ Export): 1. ✅ HTTP Routing (Prefix, Exact) 2. ✅ Cluster Configuration (Endpoints, LB Policy) 3. ✅ Health Checks (Active + Passive) 4. ✅ Load Balancing (Round Robin, Least Request, Ring Hash) 5. ✅ CORS Policy 6. ✅ JWT Authentication 7. ✅ Timeout & Retry 8. ✅ Request/Response Headers 9. ✅ WebSocket Support 10. ✅ Access Logs (JSON)

Nur Export (GAL → Envoy): 11. ⚠️ Lua Filters (Body Transformation, Basic Auth) 12. ⚠️ Local Rate Limiting 13. ⚠️ External Authorization (ext_authz)

Features mit Einschränkungen: - Rate Limiting: Benötigt externen lyft/ratelimit Service (nicht in GAL Scope) - TLS: Keine TLS Termination/Upstream TLS (muss manuell konfiguriert werden) - Advanced Routing: Keine Regex/Header/Query Matching - Tracing: Keine Distributed Tracing Integration (Zipkin/Jaeger/OTel)

Import-Beispiel (Envoy → GAL)¶

Input (envoy.yaml):

href="#__codelineno-23-1">static_resources: listeners: - name: listener_0 address: socket_address: address: 0.0.0.0 port_value: 10000 filter_chains: - filters: - name: envoy.filters.network.http_connection_manager typed_config: route_config: virtual_hosts: - name: backend domains: ["*"] routes: - match: prefix: /api route: cluster: api_cluster timeout: 30s clusters: - name: api_cluster connect_timeout: 5s type: STRICT_DNS lb_policy: ROUND_ROBIN load_assignment: cluster_name: api_cluster endpoints: - lb_endpoints: - endpoint: address: socket_address: address: backend.svc port_value: 8080

Output (gal-config.yaml):

version: "1.0"
provider: envoy
global:
  host: 0.0.0.0
  port: 10000
services:
  - name: backend
    type: rest
    protocol: http
    upstream:
      host: backend.svc
      port: 8080
      load_balancer:
        algorithm: round_robin
    routes:
      - path_prefix: /api
        timeout:
          read: "30s"
          connect: "5s"

Empfehlungen für zukünftige Erweiterungen¶

Priorität 1 (High Impact): 1. TLS Termination - Listener TLS Support (transport_socket) 2. Upstream TLS - Backend TLS Connections 3. Regex Routing - match.safe_regex für Advanced Routing 4. Header-based Routing - match.headers für A/B Testing 5. Traffic Splitting - weighted_clusters für Canary Deployments

Priorität 2 (Medium Impact): 6. Tracing Integration - Zipkin/Jaeger/OpenTelemetry 7. gRPC Health Checks - health_checks mit gRPC 8. Fault Injection - envoy.filters.http.fault für Chaos Testing 9. Buffer Limits - per_connection_buffer_limit_bytes 10. Circuit Breaker Limits - Vollständige circuit_breakers Config

Priorität 3 (Nice to Have): 11. Wasm Filters - WebAssembly Extensions 12. xDS API - Dynamic Configuration Support 13. gRPC-JSON Transcoder - gRPC → JSON Transformation 14. Compression - gzip Filter 15. Adaptive Concurrency - adaptive_concurrency Filter

Test Coverage (Import)¶

Envoy Import Tests: 15 Tests (test_import_envoy.py)

Test Kategorie	Tests	Status
Basic Import	3	✅ Passing
Clusters & Load Balancing	3	✅ Passing
Health Checks	2	✅ Passing
Routes & Timeouts	2	✅ Passing
Headers	1	✅ Passing
CORS	1	✅ Passing
WebSocket	1	✅ Passing
Errors & Warnings	2	✅ Passing

Coverage Verbesserung durch Import: 8% → 45% (+37%)

Roundtrip-Kompatibilität¶

Szenario	Roundtrip	Bemerkung
Basic Routing + LB	✅ 100%	Perfekt
Health Checks (Active)	✅ 100%	Perfekt
CORS + Headers	✅ 100%	Perfekt
JWT Authentication	✅ 100%	Perfekt
Timeout & Retry	✅ 95%	Retry-Details verloren
WebSocket	✅ 100%	Perfekt
Rate Limiting	⚠️ 60%	Externe Service-Config verloren
Body Transformation (Lua)	❌ 20%	Lua-Code nicht parsebar

Durchschnittliche Roundtrip-Kompatibilität: ~85%

Fazit¶

Envoy Import Coverage: - ✅ Core Features: 85% Coverage (Routing, LB, Health Checks, CORS, JWT) - ⚠️ Advanced Features: 25% Coverage (Tracing, TLS, Wasm, xDS) - ❌ Nicht unterstützt: Lua Parsing, xDS Dynamic Config, Advanced Filters

Envoy Export Coverage: - ✅ Core Features: 95% Coverage (alle GAL Features → Envoy) - ✅ Best Practices: Eingebaut (Timeouts, Retries, Health Checks) - ⚠️ Einschränkungen: Rate Limiting benötigt externen Service, kein TLS Auto-Config

Empfehlung: - 🚀 Für Standard API Gateway Workloads: Vollständig ausreichend - ⚠️ Für komplexe Envoy Setups (Lua, xDS, Tracing): Manuelle Nachbearbeitung nötig - 📚 Für Envoy → GAL Migration: 85% automatisiert, 15% Review

Referenzen: - 📚 Envoy Filter Reference - 📚 Envoy Cluster Configuration - 📚 Envoy Route Configuration - 📚 Envoy Network Filters

Envoy-spezifische Details¶

Configuration Structure¶

Envoy verwendet eine hierarchische YAML-Struktur:

envoy.yaml
├── admin (Admin Interface)
├── static_resources
│   ├── listeners (Ingress)
│   │   ├── filter_chains
│   │   │   ├── filters (HTTP Connection Manager)
│   │   │   │   ├── http_filters (JWT, Rate Limit, etc.)
│   │   │   │   └── route_config (Routing Rules)
│   │   │   │       └── virtual_hosts
│   │   │   │           └── routes (Path Matching)
│   │   │   │               └── route (Cluster Mapping)
│   ├── clusters (Upstreams)
│   │   ├── load_assignment (Endpoints)
│   │   ├── health_checks (Active HC)
│   │   └── outlier_detection (Passive HC)

Filters Architecture¶

Envoy's Macht liegt in seiner Filter-Chain:

Network Filters (L3/L4):
envoy.filters.network.http_connection_manager
envoy.filters.network.tcp_proxy
HTTP Filters (L7):
envoy.filters.http.router (Routing)
envoy.filters.http.jwt_authn (JWT)
envoy.filters.http.ratelimit (Rate Limiting)
envoy.filters.http.cors (CORS)
envoy.filters.http.lua (Custom Logic)
envoy.filters.http.ext_authz (External Auth)

Admin Interface¶

# Config Dump (aktuelle Config)
curl http://localhost:9901/config_dump

# Stats (Prometheus Format)
curl http://localhost:9901/stats/prometheus

# Clusters (Health Status)
curl http://localhost:9901/clusters

# Logging Level ändern (Runtime)
curl -X POST http://localhost:9901/logging?level=debug

Hot Reload¶

Envoy unterstützt Hot Reload ohne Downtime:

# Config validieren
envoy --mode validate -c new-envoy.yaml

# Hot Restart (zero-downtime)
envoy --restart-epoch 1 -c new-envoy.yaml

Advanced Features¶

1. xDS API (Dynamic Configuration)¶

Envoy unterstützt Dynamic Configuration via xDS (x Discovery Service):

LDS (Listener Discovery Service)
RDS (Route Discovery Service)
CDS (Cluster Discovery Service)
EDS (Endpoint Discovery Service)
SDS (Secret Discovery Service)

GAL generiert Static Config, aber Envoy kann mit Control Planes wie Istio, Envoy Gateway, oder Gloo arbeiten.

2. Lua Scripting¶

Envoy unterstützt Lua Filters für Custom Logic:

http_filters:
- name: envoy.filters.http.lua
  typed_config:
    inline_code: |
      function envoy_on_request(request_handle)
        request_handle:headers():add("x-custom", "value")
      end

GAL nutzt Lua für: - Basic Authentication - Body Transformation - Custom Request/Response Manipulation

3. External Authorization¶

http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    grpc_service:
      envoy_grpc:
        cluster_name: ext_authz_cluster
    with_request_body:
      max_request_bytes: 8192

Externe Auth-Services (z.B. OPA, custom auth services) können Authorization Decisions treffen.

4. Metrics & Tracing¶

Prometheus Metrics:

admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

curl http://localhost:9901/stats/prometheus

Distributed Tracing:

tracing:
  http:
    name: envoy.tracers.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.ZipkinConfig
      collector_cluster: zipkin
      collector_endpoint: "/api/v2/spans"

Request Mirroring/Shadowing¶

Status: ✅ Native Support (request_mirror_policies)

Request Mirroring (Shadow Traffic) ermöglicht es, Requests an Shadow-Backends zu duplizieren, ohne die primäre Response zu beeinflussen. Ideal für Production Testing ohne User Impact.

Übersicht¶

Envoy unterstützt Request Mirroring nativ mit request_mirror_policies. GAL generiert automatisch die Mirror-Konfiguration mit Sample Percentage Support.

Features: - ✅ Native request_mirror_policies support - ✅ Sample percentage via runtime_fraction - ✅ Multiple mirror targets - ✅ Fire-and-forget (keine Response-Wartezeit) - ⚠️ Custom headers via zusätzliche Filter Chains

GAL-Konfiguration¶

version: "1.0"
provider: envoy

services:
  - name: api_service
    protocol: http
    upstream:
      host: api-v1
      port: 8080
    routes:
      - path_prefix: /api/users
        methods: [GET, POST]

        # Request Mirroring Configuration
        mirroring:
          enabled: true
          mirror_request_body: true
          mirror_headers: true
          targets:
            - name: shadow-v2
              upstream:
                host: shadow-api-v2
                port: 8080
              sample_percentage: 50.0  # 50% of traffic
              timeout: "5s"
              headers:
                X-Mirror: "true"
                X-Shadow-Version: "v2"

Generierte Envoy-Konfiguration¶

# Envoy Config (generiert von GAL)
static_resources:
  clusters:
    - name: primary_cluster
      connect_timeout: 5s
      type: STRICT_DNS
      load_assignment:
        cluster_name: primary_cluster
        endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: api-v1
                    port_value: 8080

    - name: shadow_cluster
      connect_timeout: 5s
      type: STRICT_DNS
      load_assignment:
        cluster_name: shadow_cluster
        endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: shadow-api-v2
                    port_value: 8080

  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                  - name: api_service
                    domains: ["*"]
                    routes:
                      - match:
                          prefix: "/api/users"
                        route:
                          cluster: primary_cluster
                          timeout: 30s
                          # Request Mirroring
                          request_mirror_policies:
                            - cluster: shadow_cluster
                              runtime_fraction:
                                default_value:
                                  numerator: 50    # 50% sampling
                                  denominator: HUNDRED

Use Cases¶

1. Canary Deployment Testing (10% Shadow Traffic)

mirroring:
  enabled: true
  targets:
    - name: canary-v2
      upstream:
        host: api-v2-canary
        port: 8080
      sample_percentage: 10.0  # Only 10% to detect issues

2. Performance Testing (100% Shadow Traffic)

mirroring:
  enabled: true
  targets:
    - name: load-test
      upstream:
        host: api-loadtest
        port: 8080
      sample_percentage: 100.0  # All traffic for performance testing
      timeout: "3s"

3. Multiple Shadow Targets (A/B/C Testing)

mirroring:
  enabled: true
  targets:
    - name: shadow-v2
      upstream:
        host: api-v2
        port: 8080
      sample_percentage: 50.0
    - name: shadow-v3
      upstream:
        host: api-v3
        port: 8080
      sample_percentage: 10.0

Deployment¶

# 1. GAL → Envoy Config generieren
gal generate --config config.yaml --provider envoy --output envoy.yaml

# 2. Envoy deployen (Docker)
docker run --rm -v $(pwd)/envoy.yaml:/etc/envoy/envoy.yaml \
  -p 8080:8080 -p 9901:9901 \
  envoyproxy/envoy:v1.31-latest

# 3. Mirroring testen
for i in {1..100}; do
  curl -s http://localhost:8080/api/users
done

# 4. Shadow Backend Metrics prüfen
curl http://shadow-api-v2:8080/metrics | grep request_count

Monitoring¶

Admin API Stats:

# Mirror Cluster Stats
curl http://localhost:9901/stats | grep shadow_cluster

# Expected Output:
# cluster.shadow_cluster.upstream_rq_total: 50  # ~50 requests (50%)
# cluster.shadow_cluster.upstream_rq_time: 15ms
# cluster.shadow_cluster.upstream_rq_success: 48
# cluster.shadow_cluster.upstream_rq_error: 2

Envoy Logs:

# Mirror requests in logs
docker logs envoy-container 2>&1 | grep shadow_cluster

# Example:
# [info] upstream_rq_total: cluster.shadow_cluster: 1
# [info] response_code: 200, cluster: shadow_cluster

Limitierungen¶

⚠️ Custom Headers: Keine direkte Header-Injection für Mirror-Requests (nutze Lua Filter als Workaround)
⚠️ Sample Percentage: Runtime-basiert (nicht exakt, nutzt runtime_fraction)
⚠️ Response Ignored: Shadow-Backend-Response wird komplett ignoriert (fire-and-forget)
⚠️ Timeouts: Mirror-Request-Timeout unabhängig vom Primary Request

Best Practices¶

Start with Low Sample Percentage (5-10%)
Verhindert Shadow-Backend-Überlastung
Findet Bugs mit minimalem Traffic
Monitor Shadow Backend Metrics
Separate Metrics-Erfassung für Shadow Traffic
Alert bei hoher Error-Rate im Shadow
Set Appropriate Timeouts
Shadow-Timeouts sollten kürzer sein als Primary (z.B. 3s vs 30s)
Verhindert lange Mirror-Request-Blockierung
Use Headers to Identify Mirror Traffic
X-Mirror: true Header für Shadow-Backend-Identifikation
Ermöglicht separate Log-Filtering

Troubleshooting¶

Problem: Mirror Requests erreichen Shadow Backend nicht

Diagnose:

# Cluster Status prüfen
curl http://localhost:9901/clusters | grep shadow_cluster

# Expected:
# cluster.shadow_cluster.membership_healthy: 1
# cluster.shadow_cluster.membership_total: 1

Lösung: - DNS-Auflösung prüfen: nslookup shadow-api-v2 - Shadow Backend Health: curl http://shadow-api-v2:8080/health - Envoy Logs prüfen: docker logs envoy-container

Problem: Zu viele/zu wenige Mirror Requests

Diagnose:

# Ratio berechnen
PRIMARY=$(curl -s http://localhost:9901/stats | grep primary_cluster.upstream_rq_total | awk '{print $2}')
SHADOW=$(curl -s http://localhost:9901/stats | grep shadow_cluster.upstream_rq_total | awk '{print $2}')
echo "Mirror Ratio: $(echo "scale=2; $SHADOW / $PRIMARY * 100" | bc)%"

Lösung: - runtime_fraction.default_value.numerator anpassen (50 = 50%) - Bei 1000+ Requests: ±5% Toleranz ist normal

Vollständige Dokumentation: Siehe Request Mirroring Guide