
Istio Gateway Production Deployment Best Practices

Author: 王二麻
Part 11 of the series Kubernetes 实战 (Kubernetes in Practice).

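Overview

This article is the companion best-practices guide to "Istio Gateway 部署方式深度分析" (an in-depth analysis of Istio Gateway deployment modes). It provides a complete, production-grade configuration.

Recommended architecture: DaemonSet + HostNetwork + dedicated nodes + CLB Weight Controller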

flowchart TB
    subgraph Internet[Public Internet]
        User[User]
    end

    subgraph Cloud[Cloud Provider]
        DDoS[DDoS Protection]
        WAF[WAF]
        CLB[CLB Layer 4]
    end

    subgraph Kubernetes Cluster
        subgraph Controller
            CLC[CLB Weight Controller]
        end
        subgraph GatewayNodes[Gateway Node Pool]
            subgraph GW1[gateway-node-1]
                IGW1[IngressGateway<br/>DaemonSet]
            end
            subgraph GW2[gateway-node-2]
                IGW2[IngressGateway<br/>DaemonSet]
            end
            subgraph GW3[gateway-node-3]
                IGW3[IngressGateway<br/>DaemonSet]
            end
        end
        subgraph WorkerNodes[Worker Node Pool]
            SVC[Backend Services]
        end
    end

    User --> DDoS --> WAF --> CLB
    CLB --> IGW1 & IGW2 & IGW3
    IGW1 & IGW2 & IGW3 --> SVC
    CLC -.->|Watch Pod| IGW1 & IGW2 & IGW3
    CLC -.->|Adjust weight| CLB

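Choosing an Approach

| Scenario | Recommended approach | Reason |
|---|---|---|
| Small to medium scale, operational simplicity first | Deployment + Service | Standard pattern, easy to understand and maintain |
| High traffic, low-latency requirements | DaemonSet + HostNetwork | Best performance, dedicated nodes |
| Source IP must be preserved | Deployment + Local (externalTrafficPolicy: Local) | Balances flexibility with source IP preservation |
| Elastic scaling, cost-sensitive | Deployment + HPA | Scales on demand, high resource utilization |

This article focuses on a complete implementation of the DaemonSet + HostNetwork approach.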

Complete Configuration

1. Preparing the Gateway Nodes

# Label the gateway nodes
kubectl label node gateway-node-1 node-role=gateway
kubectl label node gateway-node-2 node-role=gateway
kubectl label node gateway-node-3 node-role=gateway

# Add a taint so that ordinary Pods are not scheduled onto these nodes
kubectl taint nodes gateway-node-1 node-role=gateway:NoSchedule
kubectl taint nodes gateway-node-2 node-role=gateway:NoSchedule
kubectl taint nodes gateway-node-3 node-role=gateway:NoSchedule
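
The taint added above only keeps Pods away if they lack a matching toleration. As a rough illustration of that matching rule, here is a minimal Go sketch. The `Taint` and `Toleration` structs are simplified stand-ins written for this example, not the real Kubernetes API types, and only the `NoSchedule` effect is considered.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the Kubernetes API types.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether one toleration matches one taint.
// An empty Effect or Key on the toleration matches any effect or key;
// an empty Operator is treated as "Equal".
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// schedulable reports whether every NoSchedule taint on a node is tolerated.
func schedulable(tols []Toleration, taints []Taint) bool {
	for _, taint := range taints {
		if taint.Effect != "NoSchedule" {
			continue
		}
		matched := false
		for _, tol := range tols {
			if tolerates(tol, taint) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	nodeTaints := []Taint{{Key: "node-role", Value: "gateway", Effect: "NoSchedule"}}
	gatewayTols := []Toleration{{Key: "node-role", Value: "gateway", Effect: "NoSchedule"}}

	fmt.Println("gateway Pod schedulable:", schedulable(gatewayTols, nodeTaints))
	fmt.Println("ordinary Pod schedulable:", schedulable(nil, nodeTaints))
}
```

A Pod without the toleration is rejected by the gateway nodes, while the label from the first step is what later lets a nodeSelector pin the gateway onto them.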

2. DaemonSet Configuration

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  labels:
    app: istio-ingressgateway
    istio: ingressgateway
spec:
  selector:
    matchLabels:
      app: istio-ingressgateway
      istio: ingressgateway
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: istio-ingressgateway
        istio: ingressgateway
      annotations:
        # Prometheus monitoring
        prometheus.io/scrape: "true"
        prometheus.io/port: "15020"
    spec:
      # HostNetwork mode
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      
      # Graceful termination period (preStop 300s + 30s buffer)
      terminationGracePeriodSeconds: 330
      
      # Schedule onto the dedicated nodes
      nodeSelector:
        node-role: gateway
      tolerations:
      - key: node-role
        value: gateway
        effect: NoSchedule
      
      # Service account
      serviceAccountName: istio-ingressgateway-service-account
      
      containers:
      - name: istio-proxy
        image: docker.io/istio/proxyv2:1.20.0
        args:
        - proxy
        - router
        - --domain
        - $(POD_NAMESPACE).svc.cluster.local
        - --proxyLogLevel=warning
        - --proxyComponentLogLevel=misc:error
        - --log_output_level=default:info
        
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        - containerPort: 15021
          name: status-port
          protocol: TCP
        - containerPort: 15090
          name: http-envoy-prom
          protocol: TCP
        
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: INSTANCE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
          limits:
            cpu: "2"
            memory: 2Gi
        
        # Readiness probe
        readinessProbe:
          httpGet:
            path: /healthz/ready
            port: 15021
          initialDelaySeconds: 1
          periodSeconds: 2
          failureThreshold: 30
          successThreshold: 1
        
        # Liveness probe
        livenessProbe:
          httpGet:
            path: /healthz/ready
            port: 15021
          initialDelaySeconds: 10
          periodSeconds: 10
          failureThreshold: 3
        
        # Graceful termination
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
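                # 1. Trigger Envoy drain (stop accepting new requests)
                pilot-agent request POST drain_listeners
                
                # 2. Wait for long-lived connections to drain.
                # The CLB Weight Controller sets the CLB weight to 0,
                # so this step only waits for existing connections to finish.
                sleep 300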
        
        volumeMounts:
        - name: istio-envoy
          mountPath: /etc/istio/proxy
        - name: config-volume
          mountPath: /etc/istio/config
        - name: istiod-ca-cert
          mountPath: /var/run/secrets/istio
        - name: istio-token
          mountPath: /var/run/secrets/tokens
          readOnly: true
        - name: istio-data
          mountPath: /var/lib/istio/data
      
      volumes:
      - name: istio-envoy
        emptyDir:
          medium: Memory
      - name: config-volume
        configMap:
          name: istio
          optional: true
      - name: istiod-ca-cert
        configMap:
          name: istio-ca-root-cert
      - name: istio-token
        projected:
          sources:
          - serviceAccountToken:
              path: istio-token
              expirationSeconds: 43200
              audience: istio-ca
      - name: istio-data
        emptyDir: {}

3. PodDisruptionBudget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  minAvailable: 2  # keep at least 2 Pods available
  selector:
    matchLabels:
      app: istio-ingressgateway
      istio: ingressgateway
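
To make the effect of `minAvailable: 2` concrete, the following Go sketch computes how many gateway Pods may be voluntarily evicted at once. It is a simplified model of the PDB arithmetic for the `minAvailable` case only; the function name `allowedDisruptions` is made up for this example.

```go
package main

import "fmt"

// allowedDisruptions models a PDB that uses minAvailable: only the Pods
// above the minimum may be voluntarily evicted at the same time.
func allowedDisruptions(healthyPods, minAvailable int) int {
	if healthyPods <= minAvailable {
		return 0
	}
	return healthyPods - minAvailable
}

func main() {
	for _, healthy := range []int{3, 2, 1} {
		fmt.Printf("healthy=%d minAvailable=2 -> allowed disruptions=%d\n",
			healthy, allowedDisruptions(healthy, 2))
	}
}
```

With three gateway nodes, one Pod can be evicted at a time. If one Pod is already unhealthy, no further voluntary eviction is allowed until it recovers.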

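4. CLB Health Check Configuration

| Parameter | Recommended value | Notes |
|---|---|---|
| Protocol | HTTP | Layer-7 health check |
| Port | 15021 | Istio health check port |
| Path | /healthz/ready | Istio readiness endpoint |
| Check interval | 2s | Fast detection |
| Unhealthy threshold | 2 | Backend removed after 4s |
| Healthy threshold | 2 | Backend added back after 4s |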
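
The 4s figures in the table follow from interval × threshold. The short Go sketch below spells out that arithmetic. The `HealthCheck` type and the `recommended` helper are illustrative names, and real CLB products may add probe timeouts or jitter on top of this.

```go
package main

import (
	"fmt"
	"time"
)

// HealthCheck models the three CLB parameters that determine reaction time.
type HealthCheck struct {
	Interval           time.Duration
	UnhealthyThreshold int
	HealthyThreshold   int
}

// recommended returns the values from the table above.
func recommended() HealthCheck {
	return HealthCheck{Interval: 2 * time.Second, UnhealthyThreshold: 2, HealthyThreshold: 2}
}

// RemovalDelay is roughly how long a failing backend keeps receiving traffic.
func (h HealthCheck) RemovalDelay() time.Duration {
	return h.Interval * time.Duration(h.UnhealthyThreshold)
}

// RecoveryDelay is roughly how long a recovered backend waits before rejoining.
func (h HealthCheck) RecoveryDelay() time.Duration {
	return h.Interval * time.Duration(h.HealthyThreshold)
}

func main() {
	hc := recommended()
	fmt.Println("removed after:", hc.RemovalDelay())
	fmt.Println("added back after:", hc.RecoveryDelay())
}
```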

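5. CLB Weight Controller

Core mechanism: Finalizer + Controller

| Event | Controller action | CLB weight |
|---|---|---|
| Pod created and Ready | Add to CLB backends | 100 |
| Pod Ready → NotReady | Adjust weight | 0 |
| Pod enters Terminating | Adjust weight, wait, then remove the Finalizer | 0 |
| Pod deleted | Remove from CLB | - |

Sequence diagram: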
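
The event table can be read as a pure function from Pod state to the desired CLB backend state. The Go sketch below encodes it that way. `PodState`, `Backend`, and `desiredBackend` are hypothetical names for illustration, and treating a Pod that has never been Ready as weight 0 is an assumption drawn from the NotReady row.

```go
package main

import "fmt"

// PodState is a simplified view of the Pod conditions the controller reacts to.
type PodState struct {
	Ready       bool
	Terminating bool
	Deleted     bool
}

// Backend is the desired CLB state for one gateway Pod.
type Backend struct {
	Registered bool
	Weight     int
}

// desiredBackend maps a Pod state to the CLB backend state from the table.
func desiredBackend(s PodState) Backend {
	switch {
	case s.Deleted:
		return Backend{Registered: false}
	case s.Terminating:
		return Backend{Registered: true, Weight: 0}
	case s.Ready:
		return Backend{Registered: true, Weight: 100}
	default: // running but not Ready
		return Backend{Registered: true, Weight: 0}
	}
}

func main() {
	fmt.Printf("ready:       %+v\n", desiredBackend(PodState{Ready: true}))
	fmt.Printf("not ready:   %+v\n", desiredBackend(PodState{}))
	fmt.Printf("terminating: %+v\n", desiredBackend(PodState{Ready: true, Terminating: true}))
	fmt.Printf("deleted:     %+v\n", desiredBackend(PodState{Deleted: true}))
}
```

Note that Terminating takes priority over Ready: a Pod that is still passing its readiness probe must already be at weight 0 once deletion has started.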

sequenceDiagram
    participant User as kubectl
    participant API as API Server
    participant Ctrl as CLB Controller
    participant CLB as CLB
    participant Pod as Gateway Pod

    User->>API: delete Pod
    API->>Pod: DeletionTimestamp
    Note over Pod: Terminating<br/>Finalizer blocks object deletion
    rect rgb(255, 245, 230)
        Note over Ctrl,CLB: Finalizer phase (2-3s)
        Ctrl->>CLB: SetWeight(ip, 0)
        Ctrl->>Ctrl: wait 3s
        Note over CLB: New connections stopped
    end
    Ctrl->>API: remove Finalizer
    rect rgb(230, 255, 230)
        Note over Pod: preStop phase (300s)
        Pod->>Pod: drain_listeners
        Note over Pod: Stops accepting new requests<br/>keeps serving existing connections
        Pod->>Pod: sleep 300
        Note over Pod: Wait for long-lived connections to drain
    end
    API->>Pod: SIGTERM (sent after preStop completes)
    Pod->>Pod: process exits

Core controller code:

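package controller

import (
    "context"
    "time"

    corev1 "k8s.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"
)

const (
    FinalizerName = "clb.example.com/weight-controller"
    // DrainAnnotation is an assumed annotation key (not from the original
    // excerpt) that records when the CLB weight was set to 0.
    DrainAnnotation = "clb.example.com/drain-start"
    // drainWait is how long to wait for the weight change to take effect.
    drainWait = 3 * time.Second
)

// CLBClient is the minimal interface the reconciler needs from the cloud
// provider's CLB API. Its shape is inferred from the calls below.
type CLBClient interface {
    SetWeight(clbID, ip string, weight int) error
}

// Config holds controller settings (assumed here to be just the CLB ID).
type Config struct {
    CLBID string
}

type CLBWeightReconciler struct {
    client.Client
    CLBClient CLBClient
    Config    Config
}

func (r *CLBWeightReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    pod := &corev1.Pod{}
    if err := r.Get(ctx, req.NamespacedName, pod); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // Skip Pods this controller does not manage.
    if !r.shouldManage(pod) {
        return reconcile.Result{}, nil
    }

    // The Pod is being deleted.
    if !pod.DeletionTimestamp.IsZero() {
        return r.handleTerminating(ctx, pod)
    }
    // The Pod is running normally.
    return r.handleRunning(ctx, pod)
}

func (r *CLBWeightReconciler) handleRunning(ctx context.Context, pod *corev1.Pod) (reconcile.Result, error) {
    // Make sure the Finalizer is present.
    if !controllerutil.ContainsFinalizer(pod, FinalizerName) {
        controllerutil.AddFinalizer(pod, FinalizerName)
        return reconcile.Result{}, r.Update(ctx, pod)
    }

    // Nothing to register until the Pod has an IP.
    if pod.Status.PodIP == "" {
        return reconcile.Result{}, nil
    }

    // Set the weight according to the Ready condition.
    weight := 0
    if isPodReady(pod) {
        weight = 100
    }
    if err := r.CLBClient.SetWeight(r.Config.CLBID, pod.Status.PodIP, weight); err != nil {
        // controller-runtime ignores the Result when err is non-nil and
        // retries with rate-limited backoff, so only the error is returned.
        return reconcile.Result{}, err
    }
    return reconcile.Result{}, nil
}

func (r *CLBWeightReconciler) handleTerminating(ctx context.Context, pod *corev1.Pod) (reconcile.Result, error) {
    if !controllerutil.ContainsFinalizer(pod, FinalizerName) {
        return reconcile.Result{}, nil
    }

    // A Pod that never received an IP was never a CLB backend.
    if pod.Status.PodIP != "" {
        // 1. Set the CLB weight to 0.
        if err := r.CLBClient.SetWeight(r.Config.CLBID, pod.Status.PodIP, 0); err != nil {
            return reconcile.Result{}, err
        }

        // 2. Check whether we have waited long enough for the CLB change to take effect.
        drainStart := getDrainAnnotation(pod)
        if drainStart.IsZero() {
            if err := setDrainAnnotation(ctx, r.Client, pod); err != nil {
                return reconcile.Result{}, err
            }
            return reconcile.Result{RequeueAfter: drainWait}, nil
        }
        if elapsed := time.Since(drainStart); elapsed < drainWait {
            return reconcile.Result{RequeueAfter: drainWait - elapsed}, nil
        }
    }

    // 3. Remove the Finalizer. Draining long-lived connections is left to preStop.
    controllerutil.RemoveFinalizer(pod, FinalizerName)
    return reconcile.Result{}, r.Update(ctx, pod)
}

func (r *CLBWeightReconciler) shouldManage(pod *corev1.Pod) bool {
    return pod.Labels["app"] == "istio-ingressgateway"
}

func isPodReady(pod *corev1.Pod) bool {
    for _, cond := range pod.Status.Conditions {
        if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
            return true
        }
    }
    return false
}

// getDrainAnnotation returns the recorded drain start time, or the zero
// time if the annotation is missing or cannot be parsed.
func getDrainAnnotation(pod *corev1.Pod) time.Time {
    t, err := time.Parse(time.RFC3339Nano, pod.Annotations[DrainAnnotation])
    if err != nil {
        return time.Time{}
    }
    return t
}

// setDrainAnnotation records the current time as the drain start and persists it.
func setDrainAnnotation(ctx context.Context, c client.Client, pod *corev1.Pod) error {
    if pod.Annotations == nil {
        pod.Annotations = map[string]string{}
    }
    pod.Annotations[DrainAnnotation] = time.Now().UTC().Format(time.RFC3339Nano)
    return c.Update(ctx, pod)
}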
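
The controller code calls `SetWeight` on a `CLBClient` whose implementation is not shown. For local experiments or unit tests, an in-memory fake is often enough. The sketch below is one possible shape, using only the standard library. The `SetWeight(clbID, ip string, weight int) error` signature is inferred from the call sites above, and the 0-100 weight range and the `FakeCLB` type are assumptions, not a real cloud SDK.

```go
package main

import (
	"fmt"
	"sync"
)

// FakeCLB is a hypothetical in-memory stand-in for a cloud provider's CLB API.
type FakeCLB struct {
	mu      sync.Mutex
	weights map[string]int // key: clbID + "/" + backend IP
}

func NewFakeCLB() *FakeCLB {
	return &FakeCLB{weights: map[string]int{}}
}

// SetWeight records the weight of one backend, rejecting obviously bad input.
func (f *FakeCLB) SetWeight(clbID, ip string, weight int) error {
	if ip == "" {
		return fmt.Errorf("empty backend IP")
	}
	if weight < 0 || weight > 100 {
		return fmt.Errorf("weight %d out of range [0, 100]", weight)
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.weights[clbID+"/"+ip] = weight
	return nil
}

// Weight returns the last recorded weight and whether the backend is known.
func (f *FakeCLB) Weight(clbID, ip string) (int, bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	w, ok := f.weights[clbID+"/"+ip]
	return w, ok
}

func main() {
	clb := NewFakeCLB()

	// Pod becomes Ready, then starts terminating.
	_ = clb.SetWeight("clb-demo", "10.0.0.11", 100)
	_ = clb.SetWeight("clb-demo", "10.0.0.11", 0)

	w, ok := clb.Weight("clb-demo", "10.0.0.11")
	fmt.Println("weight:", w, "known:", ok)
	fmt.Println("empty IP error:", clb.SetWeight("clb-demo", "", 0))
}
```

Rejecting an empty IP in the fake helps surface the case where the reconciler runs before the Pod has been assigned an address.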

6. Gateway Configuration

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  selector:
    app: istio-ingressgateway  # matches the DaemonSet Pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - "*"
    tls:
      mode: SIMPLE
      credentialName: gateway-tls-secret
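
The Gateway only takes effect on Pods whose labels contain every key/value pair in its `selector`. The Go sketch below shows that subset check; `selectorMatches` is an illustrative helper, not an Istio API.

```go
package main

import "fmt"

// selectorMatches reports whether every selector key/value pair is present
// in the Pod's labels (extra Pod labels are allowed).
func selectorMatches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	podLabels := map[string]string{
		"app":   "istio-ingressgateway",
		"istio": "ingressgateway",
	}

	fmt.Println(selectorMatches(map[string]string{"app": "istio-ingressgateway"}, podLabels))
	fmt.Println(selectorMatches(map[string]string{"app": "my-gateway"}, podLabels))
}
```

A typo in the selector does not produce an error; the Gateway simply applies to no Pods, which is why the selector is worth checking explicitly.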

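Checklist

Confirm before deploying:

  • Gateway nodes are labeled and tainted
  • The DaemonSet sets hostNetwork: true
  • terminationGracePeriodSeconds >= preStop sleep + 30s
  • CLB health check is configured correctly (HTTP /healthz/ready)
  • The PDB sets minAvailable
  • The CLB Weight Controller is deployed
  • The Gateway resource's selector matches the Pod labels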

Operations

Rolling Update

# Update the image
kubectl set image daemonset/istio-ingressgateway \
  istio-proxy=docker.io/istio/proxyv2:1.21.0 \
  -n istio-system

# Monitor the rollout status
kubectl rollout status daemonset/istio-ingressgateway -n istio-system
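
Because the preStop hook always sleeps for 300s and the DaemonSet uses `maxUnavailable: 1`, a rolling update takes a while. The Go sketch below gives a rough lower bound, assuming each batch takes the full preStop sleep and ignoring image pull and startup time. `rolloutLowerBoundSeconds` is an illustrative helper.

```go
package main

import "fmt"

// rolloutLowerBoundSeconds estimates the minimum duration of a DaemonSet
// rolling update: Pods are replaced in batches of maxUnavailable, and each
// batch is assumed to take perPodSeconds (for example the preStop sleep).
func rolloutLowerBoundSeconds(nodes, maxUnavailable, perPodSeconds int) int {
	if nodes <= 0 || maxUnavailable <= 0 || perPodSeconds <= 0 {
		return 0
	}
	batches := (nodes + maxUnavailable - 1) / maxUnavailable // ceiling division
	return batches * perPodSeconds
}

func main() {
	secs := rolloutLowerBoundSeconds(3, 1, 300)
	fmt.Printf("3 nodes, maxUnavailable=1: at least %d s (%d min)\n", secs, secs/60)
}
```

For the three-node setup in this article, that is at least 15 minutes, so `kubectl rollout status` is expected to sit idle for long stretches.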

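Node Maintenance

# 1. Drain the node. With --ignore-daemonsets, the DaemonSet-managed gateway Pod is skipped, not evicted
kubectl drain gateway-node-1 --ignore-daemonsets --delete-emptydir-data
# Remove the label so the DaemonSet controller deletes the gateway Pod (triggers graceful termination)
kubectl label node gateway-node-1 node-role-

# 2. After maintenance, restore the label and uncordon the node
kubectl label node gateway-node-1 node-role=gateway
kubectl uncordon gateway-node-1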

Troubleshooting

# Check Pod status
kubectl get pods -n istio-system -l app=istio-ingressgateway -o wide

# Check the health check endpoint
kubectl exec -it <pod-name> -n istio-system -- curl localhost:15021/healthz/ready

# Inspect the Envoy configuration
kubectl exec -it <pod-name> -n istio-system -- pilot-agent request GET config_dump

# Check the Controller logs
kubectl logs -n istio-system -l app=clb-weight-controller
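
When the image has no `curl`, or the check should run from outside the Pod the way the CLB does, the same probe can be done from a small Go program. The sketch below treats only HTTP 200 as ready. The `fakeGateway` helper is a local test server that imitates the endpoint so the example runs without a cluster; against a real gateway, `probe` would be pointed at `http://<node-ip>:15021`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe reports whether GET <baseURL>/healthz/ready returns HTTP 200
// within two seconds, similar to what an HTTP health check does.
func probe(baseURL string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(baseURL + "/healthz/ready")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// fakeGateway starts a local server that imitates the readiness endpoint.
// It exists only so this example can run without a cluster.
func fakeGateway(healthy bool) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz/ready", func(w http.ResponseWriter, r *http.Request) {
		if healthy {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	return httptest.NewServer(mux)
}

func main() {
	up := fakeGateway(true)
	defer up.Close()
	down := fakeGateway(false)
	defer down.Close()

	fmt.Println("healthy gateway ready:", probe(up.URL))
	fmt.Println("draining gateway ready:", probe(down.URL))
}
```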

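Summary

| Component | Key settings |
|---|---|
| DaemonSet | hostNetwork + dedicated nodes + terminationGracePeriodSeconds: 330 |
| preStop | drain_listeners + sleep 300 |
| CLB | HTTP health check on /healthz/ready, 2s interval |
| Controller | Finalizer mechanism, removed after a 3s wait |
| PDB | minAvailable: 2 |

Core principles:

Finalizer: in and out quickly (2-3s) → takes the Pod out of CLB traffic
preStop: waits patiently (300s) → drains long-lived connections

For more on the underlying principles, see: Istio Gateway 部署方式深度分析 (an in-depth analysis of Istio Gateway deployment modes)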
