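Computing container CPU and memory utilization in a Kubernetes cluster with Prometheus queries
Overview
The kubelet component of Kubernetes has cAdvisor built in. It exposes metrics for the containers on a Node in a format that Prometheus can scrape, and more useful figures, such as utilization, can be derived from these metrics.
Getting cAdvisor metrics from the kubelet
Accessing port 10255 of the kubelet (its read-only port) directly returns the metrics in the Prometheus text format: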
$ curl http://192.168.88.10:10255/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="17.05.0-ce",kernelVersion="3.10.0-693.11.6.el7.x86_64",osVersion="CentOS Linux 7 (Core)"} 1
# HELP container_cpu_load_average_10s Value of container cpu load average over the last 10 seconds.
# TYPE container_cpu_load_average_10s gauge
container_cpu_load_average_10s{container_name="",id="/",image="",name="",namespace="",pod_name=""} 1
container_cpu_load_average_10s{container_name="POD",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1a666636_a687_11e8_9cc4_525400160f15.slice/docker-e433276784317535e206d33e8e703a7360de86402c8b3e0b335e0d8071edde72.scope",image="registry.aliyuncs.com/archon/pause-amd64:3.0",name="k8s_POD_prometheus-node-exporter-4mck8_default_1a666636-a687-11e8-9cc4-525400160f15_1",namespace="default",pod_name="prometheus-node-exporter-4mck8"} 0
...
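To make the shape of this output concrete, here is a minimal Python sketch that splits one sample line into metric name, labels, and value. The function name `parse_sample` and the regular expressions are illustrative assumptions, not part of Prometheus or the kubelet; a real consumer would normally let Prometheus scrape the endpoint or use an established parser.

```python
import re

# Hypothetical helper, for illustration only: handles lines of the form
#   metric_name{label="value",...} 1.5 [optional_timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional {label="value",...}
    r'\s+(?P<value>\S+)'                     # sample value
    r'(?:\s+-?\d+)?$'                        # optional timestamp
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value), or None for comments and blank lines."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None  # "# HELP", "# TYPE", empty lines, or unsupported syntax
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    line = 'container_cpu_load_average_10s{container_name="",id="/",pod_name=""} 1'
    print(parse_sample(line))
```

Lines beginning with `# HELP` or `# TYPE` carry metadata rather than samples, so the sketch returns `None` for them.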
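Once the corresponding targets are configured in the Prometheus configuration file, these metrics can be queried from Prometheus. See: Getting started with Prometheus, a new monitoring and alerting tool (with video walkthrough)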
Computing container CPU utilization
The man top manual page defines CPU usage as follows:
1. %CPU -- CPU Usage
The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time. In a true SMP environment, if a process is multi-threaded and top is not operating in Threads mode, amounts greater than 100% may be reported. You toggle Threads mode with the `H' interactive command. Also for multi-processor environments, if Irix mode is Off, top will operate in Solaris mode where a task's cpu usage will be divided by the total number of CPUs. You toggle Irix/Solaris modes with the `I' interactive command.
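In other words, CPU usage is the ratio of the CPU time a process consumed during some past interval to the total CPU time available in that interval. With multiple CPUs or cores, the time on each CPU has to be added up.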
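The two modes mentioned in the manual page can be sketched in a few lines of Python. This is only an illustration of the definition quoted above; the function name `top_cpu_percent` and its parameters are assumptions made for this example, not anything top itself exposes.

```python
def top_cpu_percent(cpu_seconds_used, elapsed_seconds, num_cpus, irix_mode=True):
    """CPU usage in percent, following the man top definition.

    cpu_seconds_used: CPU time the task consumed, summed over all CPUs.
    elapsed_seconds:  wall-clock length of the sampling interval.
    irix_mode:        True  -> relative to one CPU (may exceed 100%),
                      False -> Solaris mode, divided by the number of CPUs.
    """
    percent = cpu_seconds_used / elapsed_seconds * 100.0
    if not irix_mode:
        percent /= num_cpus
    return percent


if __name__ == "__main__":
    # A multi-threaded task that burned 90 CPU-seconds in 60 s on a 4-CPU Node.
    print(top_cpu_percent(90.0, 60.0, 4, irix_mode=True))
    print(top_cpu_percent(90.0, 60.0, 4, irix_mode=False))
```

The container calculation later in this article is closer to Solaris mode, except that the divisor is the number of CPUs allocated to the container rather than the number of CPUs on the Node.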
For the metrics collected by the cAdvisor inside the kubelet and their meanings, see: Monitoring cAdvisor with Prometheus.
One of them is:
container_cpu_usage_seconds_total Counter Cumulative cpu time consumed seconds
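container_cpu_usage_seconds_total is the cumulative CPU time used by the container, in seconds. Dividing its increase over a time window by the total CPU time available in that window gives the container's CPU utilization.
First compute the CPU time the container consumed. Because a Node has multiple CPUs, the time the container spent on each CPU has to be summed: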
sum( delta( container_cpu_usage_seconds_total {container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"}[1m] ) )
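Then compute the total CPU time. The number of CPUs here is the number of CPUs allocated to the container, and the factor 60 is the length in seconds of the 1-minute window used above: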
sum( container_spec_cpu_shares {container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"} ) / 1024 * 60
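container_spec_cpu_shares is the container's CPU allocation expressed as shares. Its value is the number of CPUs specified for the container multiplied by 1024 (Kubernetes derives it from the container's CPU request).
Dividing the result of the first query by the result of the second gives the container's CPU utilization: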
sum( delta( container_cpu_usage_seconds_total {container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"}[1m] ) ) / ( sum( container_spec_cpu_shares {container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"} ) / 1024 * 60 )
Written compactly on one line:
sum(delta(container_cpu_usage_seconds_total{container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"}[1m])) / (sum(container_spec_cpu_shares{container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"}) /1024 * 60)
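The arithmetic behind this query can be checked by hand. The following Python sketch mirrors it step by step; the function name and the sample numbers are made up for illustration and do not come from a real cluster.

```python
def container_cpu_utilization(cpu_seconds_delta_per_cpu, cpu_shares, window_seconds=60):
    """Mirror of: sum(delta(usage[1m])) / (sum(shares) / 1024 * 60).

    cpu_seconds_delta_per_cpu: increase of container_cpu_usage_seconds_total
                               over the window, one entry per CPU series.
    cpu_shares:                value of container_spec_cpu_shares.
    window_seconds:            length of the range window (60 for [1m]).
    """
    used_seconds = sum(cpu_seconds_delta_per_cpu)     # numerator
    allocated_cpus = cpu_shares / 1024                # shares -> CPU count
    available_seconds = allocated_cpus * window_seconds
    return used_seconds / available_seconds


if __name__ == "__main__":
    # Hypothetical container with 2 CPUs (2048 shares) that used
    # 12 s on one CPU and 18 s on another during the last minute.
    print(container_cpu_utilization([12.0, 18.0], cpu_shares=2048))
```

With these made-up numbers the container used 30 of the 120 CPU-seconds it was allocated. A value above 1 is possible when the container is allowed to use more CPU than its shares suggest.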
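The queries above use delta() to compute the increase over the window. Note that delta() is intended for gauges; for counters such as this one, increase() or rate() also account for counter resets. Alternatively, irate() computes the per-second rate directly, so the factor 60 is no longer needed: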
sum(irate(container_cpu_usage_seconds_total{container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"}[1m]))/(sum(container_spec_cpu_shares{container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"})/1024)
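irate() looks only at the last two samples inside the range window and returns their per-second rate of increase. A simplified Python sketch of that idea follows; it is not Prometheus's actual implementation, and the sample data is invented.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp, value) samples.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last == t_prev:
        return None  # no elapsed time, no rate
    # If the counter went backwards, assume it was reset to zero in between.
    increase = v_last if v_last < v_prev else v_last - v_prev
    return increase / (t_last - t_prev)


if __name__ == "__main__":
    # Hypothetical scrapes of container_cpu_usage_seconds_total every 15 s.
    samples = [(0, 100.0), (15, 103.0), (30, 109.0)]
    cpu_seconds_per_second = irate(samples)
    cpu_shares = 512  # half a CPU
    print(cpu_seconds_per_second / (cpu_shares / 1024))
```

Because only the last two points are used, the result reacts quickly to spikes but is noisier than a rate averaged over the whole window.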
Computing container memory utilization
Container memory utilization is much simpler to compute: divide the memory in use by the memory limit:
container_memory_rss{container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"} / container_spec_memory_limit_bytes{container_name="store-app-server",pod_name="store-app-server-2959171753-6kgll"}
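The same division, written as a small Python sketch with a guard for containers that have no memory limit. The assumption here, which should be checked against your own cluster, is that container_spec_memory_limit_bytes reports 0 when no limit is set; the function name is illustrative.

```python
def memory_utilization(rss_bytes, limit_bytes):
    """Mirror of: container_memory_rss / container_spec_memory_limit_bytes.

    Returns None when limit_bytes is not positive, which this sketch
    treats as "no limit configured" (an assumption, see the lead-in).
    """
    if limit_bytes <= 0:
        return None
    return rss_bytes / limit_bytes


if __name__ == "__main__":
    mib = 1024 * 1024
    print(memory_utilization(256 * mib, 1024 * mib))  # container with a 1 GiB limit
    print(memory_utilization(256 * mib, 0))           # container without a limit
```

Without such a guard, dividing by a zero limit in PromQL does not produce a meaningful ratio, so it can be worth filtering those series out in the query as well.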