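k8s and etcd: backing up etcd data to S3
Preface
Almost all Kubernetes components are stateless; all cluster data is stored in etcd, which makes etcd effectively the database of the whole cluster. Its importance is therefore obvious, and backing up etcd data properly is critical. This post describes how we do it at my company.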
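Backing up etcd data to S3
There are many ways to back up etcd, but they are much the same: nearly all of them rely on the etcdctl command to do the work.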
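As a rough illustration of what such tools do underneath, the sketch below assembles an `etcdctl snapshot save` invocation for the v3 API. The helper names `snapshot_command` and `take_snapshot`, the endpoint, and the file paths are placeholders chosen for illustration and are not taken from etcd-backup; the flags `--endpoints`, `--cacert`, `--cert`, and `--key` are standard etcdctl v3 options. A single endpoint is passed because a snapshot is read from one member.

```python
import os
import subprocess


def snapshot_command(endpoint, cacert, cert, key, out_file):
    """Build the argv for an etcd v3 snapshot (hypothetical helper)."""
    return [
        "etcdctl",
        "--endpoints=" + endpoint,
        "--cacert=" + cacert,
        "--cert=" + cert,
        "--key=" + key,
        "snapshot", "save", out_file,
    ]


def take_snapshot(argv):
    """Run etcdctl with the v3 API selected; raises if etcdctl exits non-zero."""
    env = dict(os.environ, ETCDCTL_API="3")
    subprocess.run(argv, env=env, check=True)


# Placeholder values; only the command line is printed here, nothing is executed.
cmd = snapshot_command(
    "https://127.0.0.1:2379",
    "/certs/ca.crt",
    "/certs/server.crt",
    "/certs/server.key",
    "/tmp/etcd-snapshot.db",
)
print(" ".join(cmd))
```

Calling `take_snapshot(cmd)` on a host that has etcdctl and valid certificates would write the snapshot file, which can then be uploaded to whatever storage is in use.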
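Why S3?
- We use AWS heavily, and we want backups to live in highly available storage rather than on the machines that run etcd.
- S3 also supports lifecycle configuration for stored objects. Once it is set up, AWS deletes old backups for us on a schedule and keeps the recent ones.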
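The lifecycle setup mentioned above can be sketched as a small rule document. The example builds the JSON structure that the S3 lifecycle API accepts. The 30-day retention, the rule ID, and the assumption that backups are stored under a key prefix equal to the cluster name (`k8s-prod-1`) are illustrative guesses, not details confirmed by the post.

```python
import json


def expiration_rule(prefix, days, rule_id="expire-old-etcd-backups"):
    """Build an S3 lifecycle configuration that expires objects under `prefix`."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": rule_id,                     # hypothetical rule name
                "Filter": {"Prefix": prefix},      # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},      # delete objects older than this
            }
        ]
    }


# Assumed values: prefix and retention period are placeholders.
config = expiration_rule("k8s-prod-1", 30)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration`, or passed as `LifecycleConfiguration` to boto3's `put_bucket_lifecycle_configuration`.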
The approach
We essentially use the etcd-backup project. We did fork it and make minor changes, mainly to the Dockerfile, where etcdctl is pinned to the version we actually run in production.
The modified Dockerfile:
FROM alpine:3.8

RUN apk add --no-cache curl

# Get etcdctl
ENV ETCD_VER=v3.2.24
RUN \
  cd /tmp && \
  curl -L https://storage.googleapis.com/etcd/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz | \
  tar xz -C /usr/local/bin --strip-components=1

COPY ./etcd-backup /
ENTRYPOINT ["/etcd-backup"]
CMD ["-h"]
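After that it is the usual docker build and so on.
Deploying on k8s
A k8s CronJob is a good fit here. My backup policy is to back up every three hours (note that the schedule in the manifest below, `0 */4 * * *`, fires every four hours; `0 */3 * * *` would give a three-hour interval).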
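The backup interval is set by the hour field of the cron expression. A minimal sketch of how a `*/N` step in that field expands, assuming the standard five-field cron semantics that Kubernetes CronJob uses; `hours_for_step` is a hypothetical helper written for this illustration.

```python
def hours_for_step(step):
    """Hours of the day matched by the cron hour field `*/step`."""
    if not 1 <= step <= 23:
        raise ValueError("step must be between 1 and 23")
    # A step always starts counting from the lowest value of the field (0).
    return list(range(0, 24, step))


print(hours_for_step(3))  # hour field */3
print(hours_for_step(4))  # hour field */4
```

With minute `0`, `*/3` yields eight runs per day and `*/4` yields six, so the choice of step directly sets how much data can be lost between backups.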
cronjob.yaml:
# batch/v1beta1 was current when this was written; use batch/v1 on Kubernetes 1.21 and later.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */4 * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      # Job timeout
      activeDeadlineSeconds: 300
      template:
        spec:
          tolerations:
          # Tolerate master taint
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          # Container creates etcd backups.
          # Run container in host network mode on G8s masters
          # to be able to use 127.0.0.1 as etcd address.
          # For etcd v2 backups container should have access
          # to etcd data directory. To achieve that,
          # mount the etcd data directory (/var/lib/etcd here) as a volume.
          nodeSelector:
            node-role.kubernetes.io/master: ""
          containers:
          - name: etcd-backup
            image: iyacontrol/etcd-backup:0.1
            args:
            # backup guest clusters only on production installations
            # testing installation can have many broken guest clusters
            - -prefix=k8s-prod-1
            - -etcd-v2-datadir=/var/lib/etcd
            - -etcd-v3-endpoints=https://172.xx.xx.221:2379,https://172.xx.xx.83:2379,https://172.xx.xx.246:2379
            - -etcd-v3-cacert=/certs/ca.crt
            - -etcd-v3-cert=/certs/server.crt
            - -etcd-v3-key=/certs/server.key
            - -aws-s3-bucket=mybucket
            - -aws-s3-region=us-east-1
            volumeMounts:
            - mountPath: /var/lib/etcd
              name: etcd-datadir
            - mountPath: /certs
              name: etcd-certs
            env:
            - name: ETCDBACKUP_AWS_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_AWS_ACCESS_KEY
            - name: ETCDBACKUP_AWS_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_AWS_SECRET_KEY
            - name: ETCDBACKUP_PASSPHRASE
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_PASSPHRASE
          volumes:
          - name: etcd-datadir
            hostPath:
              path: /var/lib/etcd
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd/
          # Do not restart pod, job takes care of restarting failed pod.
          restartPolicy: Never
          hostNetwork: true
Note: the toleration and the nodeSelector work together so that the pod is scheduled onto a master node.
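Why both settings are needed can be shown with a deliberately simplified model of the scheduling check. This is a sketch for illustration only, not the Kubernetes scheduler's code: it considers only `nodeSelector` labels and `NoSchedule` taints, and the function names and dictionary layout are made up for the example.

```python
MASTER_KEY = "node-role.kubernetes.io/master"


def tolerates(toleration, taint):
    """True if one toleration matches one taint (simplified rules)."""
    key_ok = toleration.get("key", "") in ("", taint["key"])
    effect_ok = toleration.get("effect", "") in ("", taint["effect"])
    if toleration.get("operator", "Equal") == "Exists":
        value_ok = True  # Exists ignores the taint value
    else:
        value_ok = toleration.get("value", "") == taint.get("value", "")
    return key_ok and effect_ok and value_ok


def can_schedule(pod, node):
    """Labels must satisfy nodeSelector and every NoSchedule taint must be tolerated."""
    labels = node.get("labels", {})
    selector_ok = all(
        labels.get(k) == v for k, v in pod.get("nodeSelector", {}).items()
    )
    taints_ok = all(
        any(tolerates(t, taint) for t in pod.get("tolerations", []))
        for taint in node.get("taints", [])
        if taint["effect"] == "NoSchedule"
    )
    return selector_ok and taints_ok


master = {
    "labels": {MASTER_KEY: ""},
    "taints": [{"key": MASTER_KEY, "effect": "NoSchedule"}],
}
worker = {"labels": {}, "taints": []}
toleration = {"key": MASTER_KEY, "operator": "Exists", "effect": "NoSchedule"}

backup_pod = {"nodeSelector": {MASTER_KEY: ""}, "tolerations": [toleration]}
only_toleration = {"tolerations": [toleration]}
only_selector = {"nodeSelector": {MASTER_KEY: ""}}

print(can_schedule(backup_pod, master), can_schedule(backup_pod, worker))
print(can_schedule(only_toleration, worker), can_schedule(only_selector, master))
```

In this model the toleration alone still lets the pod land on a worker, and the nodeSelector alone leaves the pod blocked by the master taint; only the combination pins it to a master.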
Then secret.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: etcd-backup
  namespace: kube-system
type: Opaque
data:
  ETCDBACKUP_AWS_ACCESS_KEY: QUtJTI0TktCT0xQRlEK
  ETCDBACKUP_AWS_SECRET_KEY: aXJ6eThjQnM2MVRaSkdGMGxDeHhoeFZNUDU4ZGRNbgo=
  ETCDBACKUP_PASSPHRASE: ""
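Values under `data` in a Secret must be base64-encoded. One detail worth checking is trailing newlines: the secret-key value shown above decodes to text that ends in a newline, which is what piping `echo` without `-n` into `base64` produces. Whether etcd-backup trims such whitespace is not stated in the post, so the sketch below simply strips it before encoding. `encode_secret` is a hypothetical helper, and stripping assumes the credential itself never starts or ends with whitespace.

```python
import base64


def encode_secret(value):
    """Base64-encode a Secret value after stripping surrounding whitespace."""
    return base64.b64encode(value.strip().encode("utf-8")).decode("ascii")


# The secret-key value from the manifest decodes to bytes ending in a newline.
sample = "aXJ6eThjQnM2MVRaSkdGMGxDeHhoeFZNUDU4ZGRNbgo="
decoded = base64.b64decode(sample)
print(decoded.endswith(b"\n"))

# Encoding with and without a trailing newline gives the same result here.
print(encode_secret("example-value\n") == encode_secret("example-value"))
```

On the command line, `echo -n 'value' | base64` avoids the newline in the same way.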
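Summary
We previously tried etcd-operator for backups. In practice it did not work well for us: it introduces many concepts, its components are complex, and much of its code is written too rigidly.
In the end we chose etcd-backup, mainly because it is simple: less is more. The source is written in Go, so extending it for our own needs is also fairly easy.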