I don't know whether the zombie process is the reason the display is inaccurate.
However, in most cases a zombie process is the result of PAN-OS detecting a memory leak or an unresponsive process and killing it (i.e., it often appears right after PAN-OS kills a process it has flagged).
For this reason, I recommend grepping the logs on the firewall for "fail":
grep mp-log m*detail* pattern fail
If you are actually running the PA-220 in production and not in a lab environment, I would suggest exporting the Tech Support File, decompressing the archive, and grepping it on your own computer instead (since the PA-220 has very limited CPU and memory).
The name of the log file containing the process details should match something like "m*detail*".
I don't remember the exact name, so please look for a file with a similar name.
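If you go the local route, a minimal sketch on a Linux or macOS machine might look like this; the archive name below is an assumption, so substitute whatever the export actually gives you:
tar -xzf techsupport.tgz
grep -ril "fail" .
The first command unpacks the exported archive, and the second lists which extracted files mention "fail" so you can open the interesting ones.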
Since it may just be a display issue, Palo Alto probably treats it as low priority, and it may go unaddressed for a while.
In the meantime, there are some addressed issues that may be relevant, so if this is worrying you, upgrading PAN-OS is an option.
PAN-152106
Fixed an issue where a process (genindex.sh) caused the management plane CPU usage to remain high for a longer period of time than expected
https://docs.paloaltonetworks.com/pan-os/9-1/pan-os-release-notes/pan-os-9-1-addressed-issues/pan-os-9-1-6-addressed-issues.html#panos-addressed-issues-9.1.6
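If you want to confirm which release you are currently running before planning the upgrade, this CLI command should show it (hedged, but it is a long-standing operational command):
show system info | match sw-version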
There is one zombie. I'm trying to fix that high CPU load...
top - 12:07:27 up 72 days, 1:28, 1 user, load average: 2.40, 2.32, 2.29
Tasks: 137 total, 3 running, 133 sleeping, 0 stopped, 1 zombie
%Cpu(s): 51.8 us, 1.1 sy, 0.4 ni, 46.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 4119652 total, 244696 free, 1822364 used, 2052592 buff/cache
KiB Swap: 1972 total, 1972 free, 0 used. 1809056 avail Mem
@Reaper I'm very curious about the existence of the zombie process (1 zombie).
On the PA-220 that may be a display bug in the management interface. I also notice a higher load indicated in the GUI (40-70%) while the actual CPU cores for the MP (Cpu0 and Cpu3) are lightly loaded (99.3% and 100% idle):
top - 11:11:37 up 23 days, 11:26, 1 user, load average: 3.16, 3.02, 3.08
Tasks: 147 total, 4 running, 142 sleeping, 0 stopped, 1 zombie
%Cpu0 : 0.0 us, 0.7 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 4119684 total, 183532 free, 2088872 used, 1847280 buff/cache
KiB Swap: 4097968 total, 4083160 free, 14808 used. 1492260 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4101 root 20 0 91200 37876 9720 R 100.0 0.9 33771:10 pan_task
4102 root 20 0 66160 12644 9540 R 100.0 0.3 33775:12 pan_task
4097 root 20 0 880320 77996 9020 S 0.7 1.9 186:18.45 pan_comm
10 root 20 0 0 0 0 R 0.3 0.0 30:35.03 rcuc/0
3874 nobody 20 0 41684 2328 1316 S 0.3 0.1 90:06.67 redis-server
We don't use a VM, it is a PA-220.
We also use the PA-3250, big brother of the 220... The issues are on the 220.
Which firewall are you using, a VM-300 or a PA-220?
Please press "1" after running "show system resources follow" to toggle the view so that each CPU core's state is shown separately.
https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000PLZZCA4
If Cpu0 and Cpu3 utilization is low while Cpu1 and Cpu2 utilization is high, then on a VM-300 that would suggest the network processing is not working well.
On a PA-220 you can interpret it in roughly the same way, and I suspect that is what is happening here.
I misunderstood some of the details of the process and function names. To put it simply, it's normal for pan_task to show 100%: pan_task is the data-plane process, and it pins itself to specific CPU cores by setting CPU affinity. However, on a VM-300, for example, affinity to physical cores cannot be guaranteed, which can cause latency. Strictly speaking, this requires some tuning on the hypervisor side, but in most cases it can largely be solved simply by properly enabling features such as SR-IOV.
This is what I can see.
3313 20 0 71944 35496 8216 R 100.0 0.9 18751:26 pan_task
3314 20 0 46976 10608 8216 R 94.7 0.3 18752:09 pan_task
12447 20 0 16916 7004 1904 R 5.3 0.2 0:00.06 top
1 20 0 2320 676 576 S 0.0 0.0 0:12.72 init
2 20 0 0 0 0 S 0.0 0.0 0:00.04 kthreadd
3 20 0 0 0 0 S 0.0 0.0 4:37.99 ksoftirqd/0
5 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+
7 rt 0 0 0 0 S 0.0 0.0 0:08.97 migration/0
8 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 20 0 0 0 0 S 0.0 0.0 3:59.15 rcu_sched
10 20 0 0 0 0 S 0.0 0.0 9:56.62 rcuc/0
11 20 0 0 0 0 S 0.0 0.0 3:07.29 rcuc/1
12 rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
13 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
14 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0
15 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:+
16 20 0 0 0 0 S 0.0 0.0 2:55.14 rcuc/2
17 rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
18 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2
19 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/2:0
20 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/2:+
21 20 0 0 0 0 S 0.0 0.0 9:20.96 rcuc/3
22 rt 0 0 0 0 S 0.0 0.0 0:09.45 migration/3
23 20 0 0 0 0 S 0.0 0.0 1:15.94 ksoftirqd/3
24 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/3:0
25 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/3:+
26 0 -20 0 0 0 S 0.0 0.0 0:00.00 khelper
214 0 -20 0 0 0 S 0.0 0.0 0:00.00 writeback
217 0 -20 0 0 0 S 0.0 0.0 0:00.00 bioset
218 0 -20 0 0 0 S 0.0 0.0 0:00.00 kblockd
224 0 -20 0 0 0 S 0.0 0.0 0:00.00 ata_sff
234 20 0 0 0 0 S 0.0 0.0 0:00.00 khubd
241 0 -20 0 0 0 S 0.0 0.0 0:00.00 edac-poller
262 0 -20 0 0 0 S 0.0 0.0 0:00.00 rpciod
263 20 0 0 0 0 S 0.0 0.0 3:14.50 kworker/3:1
301 20 0 0 0 0 S 0.0 0.0 0:11.13 kswapd0
302 20 0 0 0 0 S 0.0 0.0 0:00.00 fsnotify_m+
303 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsiod
309 0 -20 0 0 0 S 0.0 0.0 0:00.00 kthrotld
881 20 0 0 0 0 S 0.0 0.0 0:39.57 spi32766
1055 0 -20 0 0 0 S 0.0 0.0 0:00.00 deferwq
1057 20 0 0 0 0 S 0.0 0.0 0:53.59 kworker/2:1
1058 20 0 0 0 0 S 0.0 0.0 0:51.62 kworker/1:1
1061 20 0 0 0 0 S 0.0 0.0 1:44.84 mmcqd/0
1062 20 0 0 0 0 S 0.0 0.0 0:00.00 mmcqd/0boo+
1063 20 0 0 0 0 S 0.0 0.0 0:00.00 mmcqd/0boo+
1064 20 0 0 0 0 S 0.0 0.0 0:00.00 mmcqd/0rpmb
1081 0 -20 0 0 0 S 0.0 0.0 0:08.70 kworker/0:+
1082 20 0 0 0 0 S 0.0 0.0 0:22.95 kjournald
1137 16 -4 2672 728 376 S 0.0 0.0 0:00.83 udevd
1361 20 0 0 0 0 S 0.0 0.0 0:00.01 kworker/0:2
1362 20 0 0 0 0 S 0.0 0.0 1:37.82 kworker/0:3
1364 0 -20 0 0 0 S 0.0 0.0 0:00.36 kworker/3:+
1424 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/2:+
1447 0 -20 0 0 0 S 0.0 0.0 0:00.01 kworker/1:+
1868 20 0 0 0 0 S 0.0 0.0 0:00.06 kworker/u8+
2190 20 0 0 0 0 S 0.0 0.0 0:03.61 kjournald
2191 20 0 0 0 0 S 0.0 0.0 0:00.00 kjournald
2587 20 0 2512 760 616 S 0.0 0.0 0:03.93 syslogd
2590 20 0 2372 448 344 S 0.0 0.0 0:00.02 klogd
2605 rpc 20 0 5444 820 468 S 0.0 0.0 0:01.81 rpcbind
2664 20 0 5856 992 740 S 0.0 0.0 0:00.00 xinetd
2708 20 0 7748 952 188 S 0.0 0.0 0:00.00 rpc.mountd
2713 20 0 0 0 0 S 0.0 0.0 0:00.00 lockd
2714 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2715 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2716 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2717 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2718 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2719 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2720 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2721 1 -19 0 0 0 S 0.0 0.0 0:00.00 nfsd
2936 0 -20 201044 34608 6744 S 0.0 0.8 88:19.56 masterd_ap+
2956 15 -5 119784 8376 3660 S 0.0 0.2 141:55.37 sysd
3095 20 0 353928 44596 17724 S 0.0 1.1 4:04.94 dagger
3096 30 10 196352 17804 6148 S 0.0 0.4 74:36.80 python
3104 20 0 997644 24228 15252 S 0.0 0.6 7:28.02 sysdagent
3118 20 0 337456 5244 3588 S 0.0 0.1 7:29.00 brdagent
3119 20 0 36920 4892 3572 S 0.0 0.1 13:49.62 ehmon
3120 20 0 478352 22244 14680 S 0.0 0.5 5:37.16 chasd
3298 20 0 46928 8444 6108 S 0.0 0.2 8:38.77 sdwand
3299 20 0 55096 8712 6260 S 0.0 0.2 38:02.10 pan_dha
3300 20 0 47440 8908 6384 S 0.0 0.2 7:09.80 mprelay
3301 20 0 647948 69692 8368 S 0.0 1.7 11:13.89 pan_comm
3304 20 0 46940 8444 6108 S 0.0 0.2 22:46.31 tund
3323 30 10 48540 17296 6112 S 0.0 0.4 17:28.54 python
3459 20 0 555124 26132 15980 S 0.0 0.6 4:16.69 cryptod
3472 20 0 9920 2588 2024 S 0.0 0.1 0:00.04 sshd
3494 20 0 9920 2664 2100 S 0.0 0.1 0:00.03 sshd
3506 20 0 0 0 0 S 0.0 0.0 0:16.63 kjournald
3526 0 -20 0 0 0 S 0.0 0.0 0:00.02 loop0
3530 20 0 0 0 0 S 0.0 0.0 0:00.00 kjournald
4417 20 0 982924 280588 164572 S 0.0 6.8 9:22.46 devsrvr
4422 20 0 501672 198224 130140 S 0.0 4.8 22:05.89 useridd
4444 20 0 1693964 404940 22368 S 0.0 9.8 46:31.88 mgmtsrvr
4446 nobody 20 0 25624 1632 876 S 0.0 0.0 25:07.88 redis-serv+
4494 nobody 20 0 296972 32060 18728 S 0.0 0.8 3:35.70 httpd
4498 20 0 493544 26860 16224 S 0.0 0.7 4:51.17 ikemgr
4499 20 0 1410132 326892 19008 S 0.0 7.9 15:54.79 logrcvr
4500 20 0 509212 25852 16416 S 0.0 0.6 4:31.05 rasmgr
4501 20 0 432220 22492 14792 S 0.0 0.5 4:21.58 keymgr
4502 20 0 126548 22620 14812 S 0.0 0.5 3:07.27 pan_ifmgr
4503 20 0 1409612 29152 16484 S 0.0 0.7 7:57.02 varrcvr
4504 17 -3 408920 23428 14952 S 0.0 0.6 13:15.49 l2ctrld
4505 17 -3 127768 23088 14968 S 0.0 0.6 4:41.08 ha_agent
4506 20 0 385244 31236 16280 S 0.0 0.8 8:02.89 satd
4507 20 0 1007276 27360 16148 S 0.0 0.7 12:05.36 sslmgr
4508 20 0 405252 23208 15048 S 0.0 0.6 4:49.10 pan_dhcpd
4509 20 0 543044 61376 16352 S 0.0 1.5 62:45.14 dnsproxyd
4510 20 0 336292 23888 14976 S 0.0 0.6 4:01.48 pppoed
4511 17 -3 819992 43328 15652 S 0.0 1.1 6:27.77 routed
4512 20 0 2062260 37572 17280 S 0.0 0.9 20:22.61 authd
5526 20 0 0 0 0 S 0.0 0.0 0:00.04 kworker/u8+
5819 20 0 203292 20516 6916 S 0.0 0.5 12:10.76 snmpd
6736 20 0 75484 2640 1940 S 0.0 0.1 1:28.03 ntpd
6765 nobody 20 0 263580 24824 15956 S 0.0 0.6 0:01.56 nginx
6817 loguser 20 0 11048 1016 340 S 0.0 0.0 0:00.00 syslog-ng
6818 loguser 20 0 86008 3256 2024 S 0.0 0.1 0:02.92 syslog-ng
6844 nobody 20 0 264380 13704 4168 S 0.0 0.3 9:05.81 nginx
8369 nobody 20 0 260720 52752 18520 S 0.0 1.3 7:22.72 appweb3
8380 nobody 20 0 274052 23704 15572 S 0.0 0.6 0:01.54 nginx
8411 nobody 20 0 274984 11368 2280 S 0.0 0.3 3:35.57 nginx
8413 nobody 20 0 274984 11356 2268 S 0.0 0.3 3:31.35 nginx
8414 nobody 20 0 274516 10272 1772 S 0.0 0.2 2:44.14 nginx
8566 20 0 3976 952 784 S 0.0 0.0 0:00.00 agetty
12162 20 0 13612 4012 3164 S 0.0 0.1 0:00.10 sshd
12201 kilib 20 0 13612 1968 1112 S 0.0 0.0 0:00.08 sshd
12204 kilib 20 0 157316 55180 23588 S 0.0 1.3 0:03.73 cli
12444 kilib 20 0 2724 704 580 S 0.0 0.0 0:00.01 less
12446 20 0 17248 7116 2204 S 0.0 0.2 0:00.03 sh
12448 20 0 16720 6212 1560 S 0.0 0.2 0:00.03 sed
16898 nobody 20 0 2246832 38676 4908 S 0.0 0.9 0:02.89 httpd
25740 20 0 3356 1156 632 S 0.0 0.0 0:06.40 crond
25899 nobody 20 0 262844 24000 15892 S 0.0 0.6 0:01.61 nginx
25965 nobody 20 0 266752 14280 2232 S 0.0 0.3 3:36.34 nginx
25966 nobody 20 0 266752 14280 2232 S 0.0 0.3 3:36.00 nginx
Are you using a VM-series, such as the VM-100?
In my experience, if pan_task (task_all) is using a lot of CPU, it is most likely due to slow responses from the hypervisor (so-called CPU steal) caused by SR-IOV or DPDK not being enabled properly.
Even in that case, unless there is a fatal problem such as an obvious lack of memory bandwidth, you are unlikely to see issues as long as data plane utilization stays below roughly 40-50%.
If it is higher than that, the data plane may stop responding. If there is a suspicious event, grep the data plane log; if you find entries showing task_all failing to respond, you are probably seeing degraded performance because of it.
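A hedged example of that grep, reusing the PAN-OS grep syntax from earlier in the thread; the log category and file name vary by platform and release, so dp-log and dp-monitor.log here are assumptions on my part:
grep dp-log dp-monitor.log pattern task_all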
As for other devices (e.g., the PA-220), I think it could be due to some defect.
Well, in many cases there's not much point worrying about it.
You can start by checking which process is using the most CPU with this command: show system resources (if you see pan_task at the top, you're on a small form-factor platform that shares mgmt and dataplane cores, so ignore those).
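If you want to watch it update live or filter the output, the follow mode and CLI pipes can help; a hedged example, assuming your release supports them:
show system resources follow
show system resources | match pan_task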