I don't know whether the zombie process is the reason the display is inaccurate.
However, in most cases a zombie process is the result of PAN-OS detecting a memory leak or an unresponsive process and killing it (i.e., it often appears right after PAN-OS kills a process it has flagged).
For this reason, I recommend grepping the logs on the firewall for "fail":
grep mp-log m*detail* pattern fail
If you are actually running the PA-220 in production and not in a lab environment, I would suggest exporting the Tech Support File, decompressing the archive, and grepping it on your own computer instead (since the PA-220 has very limited CPU and memory).
The name of the log file containing the process details should match something like "m*detail*".
I don't remember the exact name, so please look for a file with a similar name.
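If you go the local route, a minimal sketch on a Linux or macOS machine might look like this; the archive name below is an assumption, so substitute whatever the export actually gives you:
tar -xzf techsupport.tgz
grep -ril "fail" .
The first command unpacks the exported archive, and the second lists which extracted files mention "fail" so you can open the interesting ones.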
Since it may just be a display issue, Palo Alto probably treats it as low priority, and it may go unaddressed for a while.
In the meantime, there are some addressed issues that may be relevant, so if this is worrying you, upgrading PAN-OS is an option.
PAN-152106
Fixed an issue where a process (genindex.sh) caused the management plane CPU usage to remain high for a longer period of time than expected
https://docs.paloaltonetworks.com/pan-os/9-1/pan-os-release-notes/pan-os-9-1-addressed-issues/pan-os-9-1-6-addressed-issues.html#panos-addressed-issues-9.1.6
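If you want to confirm which release you are currently running before planning the upgrade, this CLI command should show it (hedged, but it is a long-standing operational command):
show system info | match sw-version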
There is one zombie. I'm trying to fix that high CPU load...
top - 12:07:27 up 72 days, 1:28, 1 user, load average: 2.40, 2.32, 2.29
Tasks: 137 total, 3 running, 133 sleeping, 0 stopped, 1 zombie
%Cpu(s): 51.8 us, 1.1 sy, 0.4 ni, 46.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 4119652 total, 244696 free, 1822364 used, 2052592 buff/cache
KiB Swap: 1972 total, 1972 free, 0 used. 1809056 avail Mem
@Reaper I'm very curious about the existence of the zombie process (1 zombie).
On the PA-220 that may be a display bug in the management interface. I also notice a higher load indicated in the GUI (40-70%) while the actual CPU cores for the MP (Cpu0 and Cpu3) are lightly loaded (99.3% and 100% idle):
top - 11:11:37 up 23 days, 11:26, 1 user, load average: 3.16, 3.02, 3.08
Tasks: 147 total, 4 running, 142 sleeping, 0 stopped, 1 zombie
%Cpu0 : 0.0 us, 0.7 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 4119684 total, 183532 free, 2088872 used, 1847280 buff/cache
KiB Swap: 4097968 total, 4083160 free, 14808 used. 1492260 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4101 root 20 0 91200 37876 9720 R 100.0 0.9 33771:10 pan_task
4102 root 20 0 66160 12644 9540 R 100.0 0.3 33775:12 pan_task
4097 root 20 0 880320 77996 9020 S 0.7 1.9 186:18.45 pan_comm
10 root 20 0 0 0 0 R 0.3 0.0 30:35.03 rcuc/0
3874 nobody 20 0 41684 2328 1316 S 0.3 0.1 90:06.67 redis-server
We don't use a VM, it is a PA-220.
We also use the PA-3250, big brother of the 220... The issues are on the 220.
Which firewall are you using, a VM-300 or a PA-220?
Please press "1" after running "show system resources follow" to toggle the view so that each CPU core's state is shown separately.
https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000PLZZCA4
If Cpu0 and Cpu3 utilization is low while Cpu1 and Cpu2 utilization is high, then on a VM-300 that would suggest the network processing is not working well.
On a PA-220 you can interpret it in roughly the same way, and I suspect that is what is happening here.
I misunderstood some of the details of the process and function names. To put it simply, it's normal for pan_task to show 100%: pan_task is the data-plane process, and it pins itself to specific CPU cores by setting CPU affinity. However, on a VM-300, for example, affinity to physical cores cannot be guaranteed, which can cause latency. Strictly speaking, this requires some tuning on the hypervisor side, but in most cases it can largely be solved simply by properly enabling features such as SR-IOV.
This is what I can see.
3313 20 0 71944 35496 8216 R 100.0 0.9 18751:26 pan_task
3314 20 0 46976 10608 8216 R 94.7 0.3 18752:09 pan_task
12447 20 0 16916 7004 1904 R 5.3 0.2 0:00.06 top
1 20 0 2320 676 576 S 0.0 0.0 0:12.72 init
2 20 0 0 0 0 S 0.0 0.0 0:00.04 kthreadd
3 20 0 0 0 0 S 0.0 0.0 4:37.99 ksoftirqd/0
5 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+
7 rt 0 0 0 0 S 0.0 0.0 0:08.97 migration/0
8 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 20 0 0 0 0 S 0.0 0.0 3:59.15 rcu_sched
10 20 0 0 0 0 S 0.0 0.0 9:56.62 rcuc/0
11 20 0 0 0 0 S 0.0 0.0 3:07.29 rcuc/1
12 rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
13 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
14 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0
15 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:+
16 20 0 0 0 0 S 0.0 0.0 2:55.14 rcuc/2
17 rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
18 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2
19 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/2:0
20 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/2:+
21 20 0 0 0 0 S 0.0 0.0 9:20.96 rcuc/3
22 rt 0 0 0 0 S 0.0 0.0 0:09.45 migration/3
23 20 0 0 0 0 S 0.0 0.0 1:15.94 ksoftirqd/3
24 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/3:0
25 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/3:+
26 0 -20 0 0 0 S 0.0 0.0 0:00.00 khelper
214 0 -20 0 0 0 S 0.0 0.0 0:00.00 writeback
217 0 -20 0 0 0 S 0.0 0.0 0:00.00 bioset
218 0 -20 0 0 0 S 0.0 0.0 0:00.00 kblockd
224 0 -20 0 0 0 S 0.0 0.0 0:00.00 ata_sff
234 20 0 0 0 0 S 0.0 0.0 0:00.00 khubd
241 0 -20 0 0 0 S 0.0 0.0 0:00.00 edac-poller
262 0 -20 0 0 0 S 0.0 0.0 0:00.00 rpciod
263 20 0 0 0 0 S 0.0 0.0 3:14.50 kworker/3:1
301 20 0 0 0 0 S 0.0 0.0 0:11.13 kswapd0
302 20 0 0 0 0 S 0.0 0.0 0:00.00 fsnotify_m+
303 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsiod
309 0 -20 0 0 0 S 0.0 0.0 0:00.00 kthrotld
881 20 0 0 0 0 S 0.0 0.0 0:39.57 spi32766
1055 0 -20 0 0 0 S 0.0 0.0 0:00.00 deferwq
1057 20 0 0 0 0 S 0.0 0.0 0:53.59 kworker/2:1
1058 20 0 0 0 0 S 0.0 0.0 0:51.62 kworker/1:1
1061 20 0 0 0 0 S 0.0 0.0 1:44.84 mmcqd/0
1062 20 0 0 0 0 S 0.0 0.0 0:00.00 mmcqd/0boo+
1063 20 0 0 0 0 S 0.0 0.0 0:00.00 mmcqd/0boo+
1064 20 0 0 0 0 S 0.0 0.0 0:00.00 mmcqd/0rpmb
1081 0 -20 0 0 0 S 0.0 0.0 0:08.70 kworker/0:+
1082 20 0 0 0 0 S 0.0 0.0 0:22.95 kjournald
1137 16 -4 2672 728 376 S 0.0 0.0 0:00.83 udevd
1361 20 0 0 0 0 S 0.0 0.0 0:00.01 kworker/0:2
1362 20 0 0 0 0 S 0.0 0.0 1:37.82 kworker/0:3
1364 0 -20 0 0 0 S 0.0 0.0 0:00.36 kworker/3:+
1424 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/2:+
1447 0 -20 0 0 0 S 0.0 0.0 0:00.01 kworker/1:+
1868 20 0 0 0 0 S 0.0 0.0 0:00.06 kworker/u8+
2190 20 0 0 0 0 S 0.0 0.0 0:03.61 kjournald
2191 20 0 0 0 0 S 0.0 0.0 0:00.00 kjournald
2587 20 0 2512 760 616 S 0.0 0.0 0:03.93 syslogd
2590 20 0 2372 448 344 S 0.0 0.0 0:00.02 klogd
2605 rpc 20 0 5444 820 468 S 0.0 0.0 0:01.81 rpcbind
2664 20 0 5856 992 740 S 0.0 0.0 0:00.00 xinetd
2708 20 0 7748 952 188 S 0.0 0.0 0:00.00 rpc.mountd
2713 20 0 0 0 0 S 0.0 0.0 0:00.00 lockd
2714 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2715 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2716 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2717 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2718 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2719 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2720 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2721 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2936 0 -20 201044 34608 6744 S 0.0 0.8 88:19.56 masterd_ap+
2956 15 -5 119784 8376 3660 S 0.0 0.2 141:55.37 sysd
3095 20 0 353928 44596 17724 S 0.0 1.1 4:04.94 dagger
3096 30 10 196352 17804 6148 S 0.0 0.4 74:36.80 python
3104 20 0 997644 24228 15252 S 0.0 0.6 7:28.02 sysdagent
3118 20 0 337456 5244 3588 S 0.0 0.1 7:29.00 brdagent
3119 20 0 36920 4892 3572 S 0.0 0.1 13:49.62 ehmon
3120 20 0 478352 22244 14680 S 0.0 0.5 5:37.16 chasd
3298 20 0 46928 8444 6108 S 0.0 0.2 8:38.77 sdwand
3299 20 0 55096 8712 6260 S 0.0 0.2 38:02.10 pan_dha
3300 20 0 47440 8908 6384 S 0.0 0.2 7:09.80 mprelay
3301 20 0 647948 69692 8368 S 0.0 1.7 11:13.89 pan_comm
3304 20 0 46940 8444 6108 S 0.0 0.2 22:46.31 tund
3323 30 10 48540 17296 6112 S 0.0 0.4 17:28.54 python
3459 20 0 555124 26132 15980 S 0.0 0.6 4:16.69 cryptod
3472 20 0 9920 2588 2024 S 0.0 0.1 0:00.04 sshd
3494 20 0 9920 2664 2100 S 0.0 0.1 0:00.03 sshd
3506 20 0 0 0 0 S 0.0 0.0 0:16.63 kjournald
3526 0 -20 0 0 0 S 0.0 0.0 0:00.02 loop0
3530 20 0 0 0 0 S 0.0 0.0 0:00.00 kjournald
4417 20 0 982924 280588 164572 S 0.0 6.8 9:22.46 devsrvr
4422 20 0 501672 198224 130140 S 0.0 4.8 22:05.89 useridd
4444 20 0 1693964 404940 22368 S 0.0 9.8 46:31.88 mgmtsrvr
4446 nobody 20 0 25624 1632 876 S 0.0 0.0 25:07.88 redis-serv+
4494 nobody 20 0 296972 32060 18728 S 0.0 0.8 3:35.70 httpd
4498 20 0 493544 26860 16224 S 0.0 0.7 4:51.17 ikemgr
4499 20 0 1410132 326892 19008 S 0.0 7.9 15:54.79 logrcvr
4500 20 0 509212 25852 16416 S 0.0 0.6 4:31.05 rasmgr
4501 20 0 432220 22492 14792 S 0.0 0.5 4:21.58 keymgr
4502 20 0 126548 22620 14812 S 0.0 0.5 3:07.27 pan_ifmgr
4503 20 0 1409612 29152 16484 S 0.0 0.7 7:57.02 varrcvr
4504 17 -3 408920 23428 14952 S 0.0 0.6 13:15.49 l2ctrld
4505 17 -3 127768 23088 14968 S 0.0 0.6 4:41.08 ha_agent
4506 20 0 385244 31236 16280 S 0.0 0.8 8:02.89 satd
4507 20 0 1007276 27360 16148 S 0.0 0.7 12:05.36 sslmgr
4508 20 0 405252 23208 15048 S 0.0 0.6 4:49.10 pan_dhcpd
4509 20 0 543044 61376 16352 S 0.0 1.5 62:45.14 dnsproxyd
4510 20 0 336292 23888 14976 S 0.0 0.6 4:01.48 pppoed
4511 17 -3 819992 43328 15652 S 0.0 1.1 6:27.77 routed
4512 20 0 2062260 37572 17280 S 0.0 0.9 20:22.61 authd
5526 20 0 0 0 0 S 0.0 0.0 0:00.04 kworker/u8+
5819 20 0 203292 20516 6916 S 0.0 0.5 12:10.76 snmpd
6736 20 0 75484 2640 1940 S 0.0 0.1 1:28.03 ntpd
6765 nobody 20 0 263580 24824 15956 S 0.0 0.6 0:01.56 nginx
6817 loguser 20 0 11048 1016 340 S 0.0 0.0 0:00.00 syslog-ng
6818 loguser 20 0 86008 3256 2024 S 0.0 0.1 0:02.92 syslog-ng
6844 nobody 20 0 264380 13704 4168 S 0.0 0.3 9:05.81 nginx
8369 nobody 20 0 260720 52752 18520 S 0.0 1.3 7:22.72 appweb3
8380 nobody 20 0 274052 23704 15572 S 0.0 0.6 0:01.54 nginx
8411 nobody 20 0 274984 11368 2280 S 0.0 0.3 3:35.57 nginx
8413 nobody 20 0 274984 11356 2268 S 0.0 0.3 3:31.35 nginx
8414 nobody 20 0 274516 10272 1772 S 0.0 0.2 2:44.14 nginx
8566 20 0 3976 952 784 S 0.0 0.0 0:00.00 agetty
12162 20 0 13612 4012 3164 S 0.0 0.1 0:00.10 sshd
12201 kilib 20 0 13612 1968 1112 S 0.0 0.0 0:00.08 sshd
12204 kilib 20 0 157316 55180 23588 S 0.0 1.3 0:03.73 cli
12444 kilib 20 0 2724 704 580 S 0.0 0.0 0:00.01 less
12446 20 0 17248 7116 2204 S 0.0 0.2 0:00.03 sh
12448 20 0 16720 6212 1560 S 0.0 0.2 0:00.03 sed
16898 nobody 20 0 2246832 38676 4908 S 0.0 0.9 0:02.89 httpd
25740 20 0 3356 1156 632 S 0.0 0.0 0:06.40 crond
25899 nobody 20 0 262844 24000 15892 S 0.0 0.6 0:01.61 nginx
25965 nobody 20 0 266752 14280 2232 S 0.0 0.3 3:36.34 nginx
25966 nobody 20 0 266752 14280 2232 S 0.0 0.3 3:36.00 nginx
Are you using a VM-series, such as the VM-100?
In my experience, if pan_task (task_all) is using a lot of CPU, it is most likely due to slow responses from the hypervisor (so-called CPU steal) caused by SR-IOV or DPDK not being enabled properly.
Even in that case, unless there is a fatal problem such as an obvious lack of memory bandwidth, you are unlikely to see issues as long as data plane utilization stays below roughly 40-50%.
If it is higher than that, the data plane may stop responding. If there is a suspicious event, grep the data plane log; if you find entries showing task_all failing to respond, you are probably seeing degraded performance because of it.
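A hedged example of that grep, reusing the PAN-OS grep syntax from earlier in the thread; the log category and file name vary by platform and release, so dp-log and dp-monitor.log here are assumptions on my part:
grep dp-log dp-monitor.log pattern task_all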
As for other devices (e.g., the PA-220), I think it could be due to some defect.
Well, in many cases there's not much point worrying about it.
You can start by checking which process is using the most CPU with this command: show system resources (if you see pan_task at the top, you're on a small form-factor platform that shares mgmt and dataplane cores, so ignore those).
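If you want to watch it update live or filter the output, the follow mode and CLI pipes can help; a hedged example, assuming your release supports them:
show system resources follow
show system resources | match pan_task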