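Linux: finding which process is writing to a file
Finding which process is using or writing to a file is normally a job for lsof. In practice, though, lsof found nothing T_T
Background
On a CentOS 7 host, monitoring raised an alert that disk utilization had reached 99% during a certain period. The monitoring only provides summary-level data and keeps no snapshots (which would show the I/O and CPU consumption of individual processes), so statistics commands have to be run on the server at a scheduled time to capture those snapshots.
Use iostat -dx -k to check avgqu-sz, await, svctm and %util;
sar -u to check %iowait and %user;
pidstat -d to capture snapshots of per-process I/O reads and writes.
Steps
- Generate the statistics files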
cat >/tmp/at_task.sh <<EOF
pidstat -d 2 >/tmp/pidstat_\`date +%F_%T\`.log 2>&1 &
sar -u 2 >/tmp/sar_\`date +%F_%T\`.log 2>&1 &
while [ 1 ]; do echo -n \`date +%T\` >>/tmp/iostat_\`date +%F\` 2>&1 && iostat -dx -k 1 1 >>/tmp/iostat_\`date +%F\` 2>&1; sleep 2; done &
EOF
iostat runs inside a while loop so that every sample can be prefixed with the output of date +%T. Without a timestamp the data alone is of little use.
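As a sketch of the same idea, a small filter can stamp every output line instead of only the first line of each sample. The function name `stamp_lines` is made up for this illustration and is not part of sysstat.

```bash
#!/usr/bin/env bash
# stamp_lines: hypothetical helper, not part of sysstat.
# Prefix every line read from stdin with the current time (HH:MM:SS).
stamp_lines() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%T)" "$line"
    done
}

# Example: stamp two lines of arbitrary text.
printf 'first sample\nsecond sample\n' | stamp_lines
```

With the commands from this post it could be used as `iostat -dx -k 1 1 | stamp_lines >>/tmp/iostat_$(date +%F)`.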
- Schedule the script with the at command
at 15:14 today -f /tmp/at_task.sh
This fails with an error
Can't open /var/run/atd.pid to signal atd. No atd running?
Restart the atd service
service atd restart
Submit the at job again
at 15:14 today -f /tmp/at_task.sh
job 2 at Wed Mar 13 15:14:00 2019
This produces the following snapshots
iostat
15:13:35Linux 3.10.0-862.14.4.el7.x86_64 (ip-xxxxx)  03/13/2019  _x86_64_  (4 CPU)

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
vda        0.12    0.07  17.31  19.41  580.79  90.52     36.57      0.09   2.39     4.42     0.57   0.72   2.63
scd0       0.00    0.00   0.00   0.00    0.00   0.00      6.00      0.00   0.28     0.28     0.00   0.25   0.00
sar
03:14:00 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
03:14:02 PM  all   0.25   0.00     0.38     0.00    0.00  99.37
03:14:04 PM  all   1.25   0.13     0.63     0.00    0.00  97.99
03:14:06 PM  all   0.25   0.13     0.50     0.00    0.00  99.12
03:14:08 PM  all   0.50   0.00     0.50     0.63    0.00  98.37
pidstat
03:14:00 PM   UID    PID  kB_rd/s  kB_wr/s  kB_ccwr/s  Command
03:14:02 PM  5700   9089     0.00     6.00       0.00  uxxx
03:14:02 PM  5700   9140     0.00     6.00       0.00  uxxx
03:14:02 PM  5700   9292     0.00    10.00       0.00  uxxx
03:14:02 PM     0  18084     0.00     2.00       0.00  bash
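Once the pidstat log spans more than one interval, it helps to rank processes by how much they write. The following is a minimal sketch, assuming the 12-hour column layout shown above (time, AM/PM, UID, PID, kB_rd/s, kB_wr/s, kB_ccwr/s, Command); under a 24-hour locale the AM/PM column is absent and the field numbers shift by one. `top_writers` is a made-up name.

```bash
#!/usr/bin/env bash
# top_writers: hypothetical helper that sums the kB_wr/s column per PID.
# Assumed layout: time AM/PM UID PID kB_rd/s kB_wr/s kB_ccwr/s Command
# Header lines are skipped because their PID field is not numeric.
top_writers() {
    awk '$4 ~ /^[0-9]+$/ { wr[$4 " " $8] += $6 }
         END { for (k in wr) printf "%.2f %s\n", wr[k], k }' "$@" |
        sort -rn
}
```

It reads the files given as arguments (for example the pidstat log written by at_task.sh) or stdin, and prints one line per process with the summed kB_wr/s, the PID and the command, largest first.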
Kill the commands that are collecting data
ps -ef | egrep 'iostat|sar|pidstat|while' | grep -v grep | awk '{print $2}' | xargs -l kill
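*However, ps -ef | egrep never matched the while loop: the loop runs as a forked shell, so no process has "while" in its command line. Until that loop is killed, it keeps writing to /tmp/iostat_2019-03-13 -_-*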
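The effect can be reproduced without at. The sketch below assumes the job is fed to sh on standard input (which, as far as I know, is how atd runs jobs) and reads the loop's command line from /proc; the loop body is a placeholder that only sleeps.

```bash
#!/usr/bin/env bash
# Feed a two-line script to sh on stdin and capture the PID of its
# background loop. The loop's output is redirected so that the command
# substitution does not wait for it.
script='while :; do sleep 1; done >/dev/null 2>&1 &
echo $!'
loop_pid=$(printf '%s\n' "$script" | sh)

# The loop is a forked copy of the shell, so its command line is just the
# shell's own argv; the text of the loop is not part of it.
loop_cmdline=$(tr '\0' ' ' <"/proc/$loop_pid/cmdline")
echo "PID $loop_pid runs as: $loop_cmdline"

kill "$loop_pid"
```

Because the command line contains only the shell's name, a pattern such as 'while' has nothing to match.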
lsof does not find any process that has the file open
lsof /tmp/iostat_2019-03-13
[root@ip-10-186-60-117 ~]#
[root@ip-10-186-60-117 ~]#
lsof does find the process that has mysql-error.log open
lsof /opt/mysql/data/5690/mysql-error.log
COMMAND    PID  USER                 FD  TYPE  DEVICE  SIZE/OFF  NODE      NAME
mysqld   12858  actiontech-universe  1w  REG   253,1       6345  20083533  /opt/mysql/data/5690/mysql-error.log
mysqld   12858  actiontech-universe  2w  REG   253,1       6345  20083533  /opt/mysql/data/5690/mysql-error.log
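So lsof can only report which processes are using a file while some process keeps it open (holds a reference to its inode) the whole time. Here each echo and iostat opens the file, appends to it and closes it within a moment, so there is almost never an open descriptor for lsof to find.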
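To make the difference concrete, here is a minimal sketch that does roughly what lsof does for a single file: it scans /proc/<pid>/fd and reports the PIDs whose descriptors point at the file. The helper name `holders` is made up, and the scan only sees descriptors that are open at the moment it runs.

```bash
#!/usr/bin/env bash
# holders: hypothetical helper; print the PIDs that currently have FILE
# open by scanning /proc/<pid>/fd. Run as root to see other users'
# processes.
holders() {
    local target fd pid
    target=$(readlink -f -- "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink -- "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}
            echo "${pid%%/*}"
        fi
    done | sort -un
}
```

A long-lived writer such as `sleep 30 >>/tmp/demo.log &` shows up in `holders /tmp/demo.log`, while a loop that runs `echo x >>/tmp/demo.log` every two seconds almost never does.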
Getting the PID of the process writing the file
Install SystemTap
yum -y install systemtap
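SystemTap is a tool for monitoring and tracing the Linux kernel.
Use the inodewatch.stp example script that ships with SystemTap to find the PID of the process writing the file.
Get the file's inode
stat -c '%i' /tmp/iostat_2019-03-13
4210339
Get the major and minor numbers of the device that holds the file
ls -al /dev/vda1
brw-rw---- 1 root disk 253, 1 Jan 30 13:57 /dev/vda1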
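The three numbers can also be read from the file itself, without looking up the device node. The sketch below assumes GNU stat (`%i` for the inode, `%d` for the decimal device number) and decodes the device number with the usual Linux major/minor layout; both function names are made up. For example, 64769 is the decimal device number that stat reports for this file system further down.

```bash
#!/usr/bin/env bash
# dev_major_minor: hypothetical helper; split a decimal st_dev value into
# "major minor" using the usual Linux encoding (the high bits of very
# large device numbers are ignored in this sketch).
dev_major_minor() {
    local dev=$1
    echo "$(( (dev >> 8) & 0xfff )) $(( (dev & 0xff) | ((dev >> 12) & 0xfff00) ))"
}

# file_ids: print "inode major minor" for a file, via GNU stat.
file_ids() {
    local inode dev
    inode=$(stat -c '%i' -- "$1") || return 1
    dev=$(stat -c '%d' -- "$1") || return 1
    echo "$inode $(dev_major_minor "$dev")"
}
```

`dev_major_minor 64769` prints `253 1`, which agrees with the ls output for /dev/vda1.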
Get the PID that writes the file
stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
Checking "/lib/modules/3.10.0-862.14.4.el7.x86_64/build/.config" failed with error: No such file or directory
Incorrect version or missing kernel-devel package, use: yum install kernel-devel-3.10.0-862.14.4.el7.x86_64
Download the kernel-devel package that matches the running kernel version from the "kernel-devel rpm build for : Scientific Linux 7" page
wget ftp://ftp.pbone.net/mirror/ftp.scientificlinux.org/linux/scientific/7.2/x86_64/updates/security/kernel-devel-3.10.0-862.14.4.el7.x86_64.rpm
rpm -ivh kernel-devel-3.10.0-862.14.4.el7.x86_64.rpm
Run stap again
stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
......
Missing separate debuginfos, use: debuginfo-install kernel-3.10.0-862.14.4.el7.x86_64
Pass 2: analysis failed. [man error::pass2]
Number of similar error messages suppressed: 2.
Install the kernel debuginfo
debuginfo-install kernel-3.10.0-862.14.4.el7.x86_64
Verifying: kernel-debuginfo-common-x86_64-3.10.0-862.14.4.el7.x86_64 1/3
Verifying: yum-plugin-auto-update-debug-info-1.1.31-50.el7.noarch 2/3
Verifying: kernel-debuginfo-3.10.0-862.14.4.el7.x86_64 3/3
Installed:
kernel-debuginfo.x86_64 0:3.10.0-862.14.4.el7 yum-plugin-auto-update-debug-info.noarch 0:1.1.31-50.el7
Dependency Installed:
kernel-debuginfo-common-x86_64.x86_64 0:3.10.0-862.14.4.el7
Run stap again
stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
ERROR: module version mismatch (#1 SMP Tue Sep 25 14:32:52 CDT 2018 vs #1 SMP Wed Sep 26 15:12:11 UTC 2018), release 3.10.0-862.14.4.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Add -v to see the detailed error
stap -v /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
Pass 1: parsed user script and 471 library scripts using 240276virt/41896res/3368shr/38600data kb, in 300usr/20sys/320real ms.
Pass 2: analyzed script: 2 probes, 12 functions, 8 embeds, 0 globals using 399436virt/196284res/4744shr/197760data kb, in 1540usr/560sys/2106real ms.
Pass 3: using cached /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030.c
Pass 4: using cached /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030.ko
Pass 5: starting run.
ERROR: module version mismatch (#1 SMP Tue Sep 25 14:32:52 CDT 2018 vs #1 SMP Wed Sep 26 15:12:11 UTC 2018), release 3.10.0-862.14.4.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run completed in 0usr/10sys/38real ms.
Pass 5: run failed. [man error::pass5]
Fix: edit UTS_VERSION in compile.h so that it matches the running kernel, then delete the cached module
vim /usr/src/kernels/3.10.0-862.14.4.el7.x86_64/include/generated/compile.h
#define UTS_VERSION "#1 SMP Tue Sep 25 14:32:52 CDT 2018"
change to
#define UTS_VERSION "#1 SMP Wed Sep 26 15:12:11 UTC 2018"
rm -rf /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030*
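Before editing the header by hand, the mismatch can be confirmed with a quick comparison. This is a minimal sketch with made-up function names; it assumes compile.h contains a single `#define UTS_VERSION "..."` line, as in the excerpt above.

```bash
#!/usr/bin/env bash
# header_uts_version: hypothetical helper; print the UTS_VERSION string
# defined in a compile.h file.
header_uts_version() {
    sed -n 's/^#define[[:space:]]\{1,\}UTS_VERSION[[:space:]]\{1,\}"\(.*\)".*$/\1/p' "$1"
}

# uts_matches: succeed when compile.h agrees with the given version
# string (pass "$(uname -v)" to compare against the running kernel).
uts_matches() {
    [ "$(header_uts_version "$1")" = "$2" ]
}
```

For example, `uts_matches /usr/src/kernels/$(uname -r)/include/generated/compile.h "$(uname -v)" || echo mismatch` prints `mismatch` when the header and the running kernel disagree.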
Run it again
stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4683) vfs_write 0xfd00001/4210339
............
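So we now have the PIDs of the processes writing /tmp/iostat_`date +%F`. New PIDs keep being printed, though, because the background iostat sits inside the while loop: after every sleep 2 the loop runs iostat again, and each run gets a new PID.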
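When the trace runs for a while, the output is easier to read once it is collapsed to one line per writer. The sketch below assumes the line shape shown above (`name(pid) vfs_write dev/inode`); both function names are made up. The second helper reproduces the device value in the trace, on the assumption that it uses the kernel's internal encoding of major shifted left by 20 bits, OR-ed with minor.

```bash
#!/usr/bin/env bash
# count_writers: hypothetical helper; count vfs_write lines per
# "name(pid)". Lines of any other shape are ignored.
count_writers() {
    awk '$2 == "vfs_write" { n[$1]++ }
         END { for (k in n) print n[k], k }' "$@" |
        sort -k2
}

# kernel_dev: hypothetical helper; combine MAJOR and MINOR the way the
# trace output is assumed to encode them: (major << 20) | minor.
kernel_dev() {
    printf '0x%x\n' $(( ($1 << 20) | $2 ))
}
```

`kernel_dev 253 1` prints `0xfd00001`, the device part of every line in the trace, and piping the stap output through `count_writers` gives one count per iostat PID.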
***So how do we stop iostat from writing to /tmp/iostat_`date +%F`, other than with the trusty reboot? $_$***
rm -rf cannot stop the background while/iostat loop from writing either: once the file is deleted, the loop simply creates a new one
rm -rf /tmp/iostat_2019-03-1*
stat /tmp/iostat_2019-03-1*
  File: ‘/tmp/iostat_2019-03-13’
  Size: 146700  Blocks: 512  IO Block: 4096  regular file
Device: fd01h/64769d  Inode: 4210339  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)  Gid: (0/root)
Access: 2019-03-14 16:07:26.211888899 +0800
Modify: 2019-03-14 16:18:17.854019793 +0800
Change: 2019-03-14 16:18:17.854019793 +0800
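The behaviour is easy to reproduce on a throwaway file. In this sketch `date` stands in for iostat and the interval is shortened to 0.2 s; because the loop reopens its target with `>>` on every pass, the file comes back shortly after it is removed.

```bash
#!/usr/bin/env bash
# A loop that reopens its target with >> on every pass recreates the
# file after it has been removed.
log=$(mktemp)
( while :; do date +%T >>"$log"; sleep 0.2; done ) &
loop_pid=$!

sleep 0.5
rm -f "$log"              # delete the file while the loop is running
sleep 0.5
recreated=no
if [ -e "$log" ]; then
    recreated=yes
    echo "file is back: $log"
fi

kill "$loop_pid"          # only stopping the loop stops the writes
wait "$loop_pid" 2>/dev/null
rm -f "$log"
```

Only the kill at the end stops the writes, which is why the PID of the loop is needed.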
The right approach
cat >/tmp/iostat.sh <<EOF
while [ 1 ]; do echo -n \`date +%T\` >>/tmp/iostat_\`date +%F\` 2>&1 && iostat -dx -m 1 1 >>/tmp/iostat_\`date +%F\` 2>&1; sleep 2; done &
EOF
at now + 1 minute today
bash /tmp/iostat.sh
# this way the PID is easy to find
ps -ef | grep iostat
root 8593 1 0 16:16 pts/2 00:00:00 bash /tmp/iostat.sh
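Another option, sketched here as an assumption rather than something this post tested, is to record the loop's PID when it starts so that no ps search is needed later. `start_loop`, `stop_loop` and `sample_once` are made-up names, and `sample_once` uses `date` as a stand-in for the real iostat call.

```bash
#!/usr/bin/env bash
# sample_once: placeholder for the real sampling command (for example
# the iostat call used in this post).
sample_once() { date +%T; }

# start_loop LOGFILE PIDFILE [INTERVAL]: run the sampler in a background
# loop and store the loop's PID in PIDFILE. INTERVAL defaults to 2 s.
start_loop() {
    local log=$1 pidfile=$2 interval=${3:-2}
    ( while :; do sample_once >>"$log" 2>&1; sleep "$interval"; done ) &
    echo "$!" >"$pidfile"
}

# stop_loop PIDFILE: kill the loop recorded in PIDFILE and remove the file.
stop_loop() {
    local pid
    pid=$(cat "$1") || return 1
    kill "$pid" 2>/dev/null
    rm -f "$1"
}
```

For example, `start_loop /tmp/iostat_$(date +%F) /tmp/iostat.pid` starts collecting and `stop_loop /tmp/iostat.pid` ends it.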