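Deploying the ELK Logging Stack on K8S

elasticsearch is the standard choice for search and has many use cases. I previously set up a single-node elasticsearch; this time I am setting up a clustered one.

elasticsearch service

This setup does not separate elasticsearch data, master and ingest nodes. If you do separate them, create a dedicated set of nodes for each type and size memory and the other settings for each one accordingly.

elasticsearch yaml file

## I ran into several problems here:
## 1: Getting volumeClaimTemplates to create the NAS volumes automatically requires a StorageClass: https://help.aliyun.com/document_detail/144398.html
## 2: elasticsearch.yml settings: discovery.seed_hosts and cluster.initial_master_nodes were misconfigured, so the cluster kept returning 503.
## 3: The NAS volume was never actually mounted. Unlike the old k8s 1.11 setup, the volume behind the data mountPath must not be an emptyDir; the volumeMount has to reference the claim by name. That also explains why the passwords seemed to change after a restart.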

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: elasticsearch
  name: elasticsearch
  namespace: elastic
spec:
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: elasticsearch
  serviceName: elasticsearch
  template:
    metadata:
      annotations:
        redeploy-timestamp: '1640165941583'
      labels:
        app: elasticsearch
    spec:
      containers:
        - env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: ES_JAVA_OPTS
              value: '-Xms14g -Xmx14g'
          image: 'elasticsearch:7.10.1'
          imagePullPolicy: IfNotPresent
          name: elasticsearch-data
          ports:
            - containerPort: 9200
              name: http
              protocol: TCP
            - containerPort: 9300
              name: transport
              protocol: TCP
          resources:
            limits:
              cpu: '4'
              memory: 16Gi
            requests:
              cpu: '4'
              memory: 14Gi
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
                - SYS_RESOURCE
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
              name: volume-1640071534570
              subPath: elasticsearch.yml
            - mountPath: /usr/share/elasticsearch/config/elastic-certificates.p12
              name: volume-1640071564578
              subPath: elastic-certificates.p12
            - mountPath: /usr/share/elasticsearch/data
              name: elasticsearch
            - mountPath: /usr/share/elasticsearch/config/jvm.options
              name: volume-1640159109924
              subPath: jvm.options
      dnsPolicy: ClusterFirst
      initContainers:
        - command:
            - /sbin/sysctl
            - '-w'
            - vm.max_map_count=262144
          image: 'alpine:3.9'
          imagePullPolicy: IfNotPresent
          name: elasticsearch-init
          resources: {}
          securityContext:
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      nodeSelector:
        group: d
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
      terminationGracePeriodSeconds: 30
      volumes:
        - configMap:
            defaultMode: 420
            name: elasticsearch-config
          name: volume-1640071534570
        - name: volume-1640071564578
          secret:
            defaultMode: 420
            secretName: es-keystore
        - emptyDir: {}
          name: volume-1640071578299
        - configMap:
            defaultMode: 420
            name: elasticsearch-jvm
          name: volume-1640159109924
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: elasticsearch
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: elasticsearch
        volumeMode: Filesystem
      status:
        phase: Pending

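elasticsearch.yml config file:

Pay particular attention to node.master, node.ingest and node.data, and to the discovery.seed_hosts and cluster.initial_master_nodes parameters. At first the cluster was unhealthy and the passwords could not be set. Add the xpack security settings last, once the cluster is healthy.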

cluster.name: "elasticsearch"
node.name: ${POD_NAME}.elasticsearch.elastic.svc.cluster.local
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300

node.master: true
node.ingest: true
node.data: true
bootstrap.memory_lock: false

discovery.seed_hosts: ["elasticsearch-0.elasticsearch.elastic.svc.cluster.local","elasticsearch-1.elasticsearch.elastic.svc.cluster.local","elasticsearch-2.elasticsearch.elastic.svc.cluster.local"]
cluster.initial_master_nodes:  ["elasticsearch-0.elasticsearch.elastic.svc.cluster.local","elasticsearch-1.elasticsearch.elastic.svc.cluster.local","elasticsearch-2.elasticsearch.elastic.svc.cluster.local"]

# xpack.security.enabled: false
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: "certificate"
xpack.security.transport.ssl.keystore.path: "/usr/share/elasticsearch/config/elastic-certificates.p12"
xpack.security.transport.ssl.truststore.path: "/usr/share/elasticsearch/config/elastic-certificates.p12"
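
The host names in discovery.seed_hosts and cluster.initial_master_nodes follow the StatefulSet pod DNS pattern `<statefulset>-<ordinal>.<serviceName>.<namespace>.svc.cluster.local`, and they have to match node.name. As a minimal sketch, the two lists can be generated rather than typed by hand. The function name `seed_hosts` is hypothetical, and the sketch assumes that `elasticsearch` is a headless Service and that the cluster uses the default `cluster.local` domain; neither is shown in this post.

```python
import json


def seed_hosts(statefulset: str, service: str, namespace: str, replicas: int,
               cluster_domain: str = "cluster.local") -> list[str]:
    """Return the per-pod DNS names of a StatefulSet (ordinals 0..replicas-1)."""
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    # Values taken from the StatefulSet above: name, serviceName, namespace, replicas.
    hosts = seed_hosts("elasticsearch", "elasticsearch", "elastic", 3)
    # json.dumps produces a flow-style list, which is also valid YAML.
    print(f"discovery.seed_hosts: {json.dumps(hosts)}")
    print(f"cluster.initial_master_nodes: {json.dumps(hosts)}")
```

If the replica count changes, regenerate the lists so that every master-eligible pod is still covered.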

If you deploy by node type, the config files differ slightly:

ingest

node.master: false 
node.data: false 
node.ingest: true

data

node.master: false 
node.data: true
node.ingest: false

master

node.master: true 
node.data: false 
node.ingest: false

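elasticsearch-jvm config file:

Because -Xms and -Xmx are set through ES_JAVA_OPTS while the default jvm.options also sets a 1g heap at startup, jvm.options is mounted from this ConfigMap as well, with the heap lines left out.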


8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

-Djava.io.tmpdir=${ES_TMPDIR}


-XX:+HeapDumpOnOutOfMemoryError

-XX:HeapDumpPath=data

-XX:ErrorFile=logs/hs_err_pid%p.log

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

es-keystore secret file:

Transport encryption needs this file (elastic-certificates.p12).

kibana config:

Fairly simple: a config file, then start a Deployment.

server.name: kibana
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: "http://elasticsearch:9200"
xpack.monitoring.ui.container.elasticsearch.enabled: true
elasticsearch.username: kibana
elasticsearch.password: mypassword

Add the nginx.ingress.kubernetes.io/whitelist-source-range annotation to the ingress.
Log in with the passwords created in elasticsearch, for example with the elastic account.

logstash config:

logstash.yaml manifest:


apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: logstash
  name: logstash
  namespace: elastic
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: logstash
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        redeploy-timestamp: '1639469723595'
      labels:
        app: logstash
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: group
                    operator: In
                    values:
                      - d
      containers:
        - env:
            - name: XPACK_MONITORING_ENABLED
              value: 'true'
            - name: xpack.monitoring.elasticsearch.hosts
              value: 'http://elasticsearch:9200'
            - name: xpack.monitoring.elasticsearch.username
              value: elastic
            - name: xpack.monitoring.elasticsearch.password
              value: mypassword
            - name: pipeline.workers
              value: '6'
            - name: pipeline.batch.size
              value: '4000'
          image: 'elastic/logstash:7.10.1'
          imagePullPolicy: IfNotPresent
          name: logstash
          resources:
            limits:
              cpu: '2'
              memory: 5Gi
            requests:
              cpu: '2'
              memory: 4Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /usr/share/logstash/pipeline/
              name: volume-1640225693496
            - mountPath: /usr/share/logstash/config/jvm.options
              name: volume-1640234037039
              subPath: jvm.options
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
      terminationGracePeriodSeconds: 30
      tolerations:
        - effect: NoSchedule
          key: devops
          operator: Equal
          value: run
      volumes:
        - configMap:
            defaultMode: 420
            name: logstash-pipeline-config
          name: volume-1640225693496
        - configMap:
            defaultMode: 420
            name: logstash-jvm
          name: volume-1640234037039

The corresponding config file, logstash.yml:

```yaml
http.host: 0.0.0.0
xpack.monitoring.elasticsearch.hosts: http://elasticsearch:9200
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.username: "elastic"
xpack.monitoring.elasticsearch.password: "mypassword"
```

The pipeline file, logstash.conf:
```
input {
  redis {
    host => "r-bp1d17217fa77e14756.redis.rds.aliyuncs.com"
    port => 6379
    password => "myredis"
    data_type => "list"
    key => "filebeat2"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logstash-%{[fields][project]}-%{+YYYY-MM-dd}"
    user => "elastic"
    password => "mypassword"
  }
}
```

heartbeat URL monitoring:

Fairly simple; it uses the elastic/heartbeat:7.10.1 image.

Config file /usr/share/heartbeat/heartbeat.yml:

heartbeat.config.monitors:
  reload.enabled: true
  reload.period: 60s
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  username: "elastic"
  password: "mypassword"
setup.kibana:
  host: "kibana:5601"
heartbeat.monitors:
- type: http
  name: '官网'
  schedule: '@every 30s'
  urls: ["https://www.baidu.com", "https://api.xxx.com"]
  check.response.status: 200

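Problems encountered along the way

Setting the passwords failed with a 503 error. The cause here was most likely a mistake in the elasticsearch config file. Once the config file is correct, the next suspect is a wrong username or password. You can disable xpack password authentication first to check whether the cluster itself is healthy.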

# ./bin/elasticsearch-setup-passwords interactive

Failed to determine the health of the cluster running at http://192.168.11.96:9200
Unexpected response code [503] from calling GET http://192.168.11.96:9200/_cluster/health?pretty
Cause: master_not_discovered_exception
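
The triage described above can be written down as a small helper. This is only a heuristic sketch: the function name `diagnose` and the hint texts are hypothetical, and it assumes you pass in the HTTP status code and the raw body returned by GET /_cluster/health.

```python
import json


def diagnose(status_code: int, body: str) -> str:
    """Map a /_cluster/health response to a rough hint (heuristic, not exhaustive)."""
    try:
        doc = json.loads(body)
    except ValueError:
        doc = {}
    if not isinstance(doc, dict):
        doc = {}

    if status_code == 401:
        return "authentication failed: check the username and password"
    if status_code == 503:
        error = doc.get("error")
        error_type = error.get("type") if isinstance(error, dict) else None
        if error_type == "master_not_discovered_exception":
            return ("no master elected: check discovery.seed_hosts "
                    "and cluster.initial_master_nodes")
        return "cluster unavailable: check the elasticsearch logs"
    if status_code == 200:
        return f"cluster reachable, status: {doc.get('status', 'unknown')}"
    return f"unexpected response code {status_code}"


if __name__ == "__main__":
    sample = '{"error": {"type": "master_not_discovered_exception"}, "status": 503}'
    print(diagnose(503, sample))
```

A 503 with master_not_discovered_exception points at the discovery settings rather than at the credentials, which matches the failure shown above.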

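After a service restart the passwords no longer worked. The cause was that the NAS volume had not been mounted, so the data was written to an emptyDir. Also, avoid awkward characters such as !.@ in passwords; they make testing from the command line painful.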

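The stock jvm.options file sets -Xms and -Xmx. If those lines are not commented out, the JVM receives the flags twice: once from the config file and once from the parameters injected through ES_JAVA_OPTS.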
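
A quick way to catch this before deploying is to compare the two sources of heap flags. The sketch below is an illustration, not part of the original setup: the function names are hypothetical, and it only looks for -Xms and -Xmx, skipping blank lines and lines that start with `#`.

```python
import re

# Matches heap size flags such as -Xms1g or -Xmx14g.
HEAP_FLAG = re.compile(r"-Xm[sx]\S+")


def heap_flags(text: str) -> list[str]:
    """Collect -Xms/-Xmx flags, skipping blank lines and '#' comments."""
    flags = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        flags.extend(HEAP_FLAG.findall(line))
    return flags


def duplicated_heap_flags(jvm_options: str, es_java_opts: str) -> list[str]:
    """Return the flag kinds (-Xms, -Xmx) that are set in both places."""
    in_file = {flag[:4] for flag in heap_flags(jvm_options)}
    in_env = {flag[:4] for flag in heap_flags(es_java_opts)}
    return sorted(in_file & in_env)


if __name__ == "__main__":
    stock = "# heap size\n-Xms1g\n-Xmx1g\n14-:-XX:+UseG1GC\n"
    print(duplicated_heap_flags(stock, "-Xms14g -Xmx14g"))
```

An empty result means the heap size comes from a single place.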

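Schedule the base components onto a dedicated resource pool: taint those nodes and give the components that belong there a matching toleration. That prevents stray scheduling and keeps resource usage clear. With ES at 4 CPU and 14g and logstash at 2 CPU and 4g, the three 8c32g machines have little left. Because the requests in a k8s cluster are all accounted for, it is clear when to add nodes or machines.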
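
The remaining headroom can be worked out from the requests alone. The sketch below is a rough illustration under two assumptions not stated above: each of the three nodes runs exactly one ES pod and one logstash pod, and system-reserved resources are ignored. The names `Request` and `remaining_per_node` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    cpu: int     # CPU cores
    mem_gi: int  # memory in Gi


def remaining_per_node(node: Request, pods: list[Request]) -> Request:
    """Subtract the pod requests placed on one node from its raw capacity."""
    return Request(
        cpu=node.cpu - sum(p.cpu for p in pods),
        mem_gi=node.mem_gi - sum(p.mem_gi for p in pods),
    )


if __name__ == "__main__":
    node = Request(cpu=8, mem_gi=32)      # one 8c32g machine
    es = Request(cpu=4, mem_gi=14)        # requests from the StatefulSet
    logstash = Request(cpu=2, mem_gi=4)   # requests from the Deployment
    left = remaining_per_node(node, [es, logstash])
    print(f"left per node: {left.cpu} CPU, {left.mem_gi}Gi")
```

Under these assumptions each node has 2 CPU and 14Gi of requests left, so CPU is the resource that runs out first.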