How do you design QoS?
Based on the design of the operational commands, what can be read from the documentation, and so on, my view is as follows.
If possible, it is better to have the Palo Alto FW do only traffic coloring and leave QoS control to devices better suited to it, such as core routers.
If that is not possible, follow these guidelines.
The interface's "Egress Maximum" should not be set equal to the physical bandwidth.
In other words, set it sufficiently larger than the physical bandwidth, or do not set it at all.
The sum of "Egress Guaranteed" across all classes should equal the physical bandwidth.
And the "Egress Maximum" for each class should be set no greater than 10% of it.
Traffic that you do not want to drop should have its priority set to "Real-time".
However, traffic assigned to Real-time should be sufficiently unlikely to burst.
If the likelihood of extreme bursts is high, the priority should not be set to "Real-time".
That's my stance.
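The interface-level rules above (the interface "Egress Maximum" and the sum of "Egress Guaranteed") can be checked mechanically. Below is a minimal Python sketch that validates a hypothetical profile description. The `QosProfile`/`QosClass` types and their field names are my own illustration, not PAN-OS objects, and the 2x `headroom` factor is only an assumed reading of "sufficiently larger", not a recommended value.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QosClass:
    # Hypothetical description of one QoS class (not a PAN-OS object).
    name: str
    guaranteed_mbps: float  # the class's "Egress Guaranteed"


@dataclass
class QosProfile:
    physical_mbps: float                 # physical bandwidth of the egress interface
    interface_max_mbps: Optional[float]  # interface "Egress Maximum"; None = not set
    classes: List[QosClass] = field(default_factory=list)


def check_profile(p: QosProfile, headroom: float = 2.0) -> List[str]:
    """Return rule violations; an empty list means the profile follows the rules.

    `headroom` is an ASSUMED interpretation of "sufficiently larger": if the
    interface Egress Maximum is set, it must be >= headroom x physical bandwidth.
    """
    problems = []
    if p.interface_max_mbps is not None and p.interface_max_mbps < headroom * p.physical_mbps:
        problems.append(
            f"interface Egress Maximum {p.interface_max_mbps} Mbps is not sufficiently "
            f"larger than {p.physical_mbps} Mbps (or leave it unset)"
        )
    total = sum(c.guaranteed_mbps for c in p.classes)
    if not math.isclose(total, p.physical_mbps, rel_tol=1e-9):
        problems.append(
            f"sum of Egress Guaranteed is {total} Mbps, expected {p.physical_mbps} Mbps"
        )
    return problems


if __name__ == "__main__":
    ok = QosProfile(1000, None, [QosClass("voice", 200), QosClass("bulk", 800)])
    print(check_profile(ok))  # []
```

The check deliberately covers only the interface-level rules; the per-class "Egress Maximum" and the Real-time burst guidance depend on traffic knowledge that a static check cannot capture.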
There seems to be little material on QoS on the Palo Alto FW. For example, "Real-time" is described as using its "own separate queue".
However, I have not been able to find anything on docs.paloaltonetworks.com that explains what "own separate queue" actually means as used in that document.
Still, some speculation about the QoS architecture is possible. I have some evidence, but I can't show it here. If you share these questions and have run experiments in a lab, I'd be happy to hear what you found.
First of all, QoS on the Palo Alto FW appears to be implemented by policing. Unfortunately, in my opinion, shaping is not possible.
I also believe that "Egress Guaranteed" is implemented using a "Dual Token Bucket".
In other words, "Egress Guaranteed" sets the CIR and "Egress Maximum" sets the PIR, and the other parameters the "Dual Token Bucket" needs (Tp, Tc, etc.) seem to be set automatically.
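To make the policing/shaping distinction concrete: a policer decides on each packet the moment it arrives (forward or drop), while a shaper buffers excess traffic and releases it later. The Python sketch below uses a single token bucket for both, with the rate in bytes/s and the bucket size in bytes. It illustrates the general technique only; the unbounded shaping buffer is an assumption, and nothing here describes PAN-OS internals.

```python
from typing import List, Tuple

Packet = Tuple[float, int]  # (arrival time in seconds, size in bytes)


def police(packets: List[Packet], rate: float, burst: float) -> List[str]:
    """Single-rate token-bucket policer: each packet passes or is dropped on arrival."""
    tokens, last = burst, 0.0
    verdicts = []
    for t, size in packets:
        tokens = min(burst, tokens + (t - last) * rate)  # refill since last arrival
        last = t
        if tokens >= size:
            tokens -= size
            verdicts.append("pass")
        else:
            verdicts.append("drop")  # excess traffic is discarded, never delayed
    return verdicts


def shape(packets: List[Packet], rate: float, burst: float) -> List[float]:
    """Token-bucket shaper: packets are delayed until tokens allow (FIFO, unbounded buffer)."""
    tokens, clock = burst, 0.0  # `tokens` is the bucket level at time `clock`
    departures = []
    for t, size in packets:
        if size > burst:
            raise ValueError("a packet larger than the bucket can never be sent")
        start = max(t, clock)  # FIFO: cannot leave before the previous packet
        tokens = min(burst, tokens + (start - clock) * rate)
        if tokens < size:
            start += (size - tokens) / rate  # wait for the missing tokens
            tokens = size
        tokens -= size
        clock = start
        departures.append(start)
    return departures


if __name__ == "__main__":
    burst3 = [(0.0, 1000), (0.0, 1000), (0.0, 1000)]
    print(police(burst3, rate=1000, burst=1500))  # ['pass', 'drop', 'drop']
    print(shape(burst3, rate=1000, burst=1500))   # [0.0, 0.5, 1.5]
```

With the same three-packet burst, the policer drops two packets while the shaper delivers all three, just later. This is why a policing-only device is hard on bursty traffic.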
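If this guess is right, the behavior would resemble the two-rate three-color marker of RFC 2698. In that RFC, Tc and Tp are the current token counts of the committed (CIR) and peak (PIR) buckets, and CBS/PBS are the bucket sizes. Below is a color-blind Python sketch. The byte-based units and the bucket sizes in the example are my assumptions. How PAN-OS treats each color (for example, whether red packets are what the "policing drop" counter counts) remains speculation.

```python
class TrTCM:
    """Color-blind two-rate three-color marker following RFC 2698 (sketch).

    Rates are in bytes/s and bucket sizes in bytes. tc and tp are the current
    token counts of the committed and peak buckets; both start full.
    """

    def __init__(self, cir: float, pir: float, cbs: float, pbs: float):
        if pir < cir:
            raise ValueError("PIR must be >= CIR")
        self.cir, self.pir, self.cbs, self.pbs = cir, pir, cbs, pbs
        self.tc, self.tp = cbs, pbs
        self.last = 0.0

    def _refill(self, now: float) -> None:
        elapsed = now - self.last
        self.last = now
        self.tc = min(self.cbs, self.tc + elapsed * self.cir)
        self.tp = min(self.pbs, self.tp + elapsed * self.pir)

    def mark(self, now: float, size: int) -> str:
        self._refill(now)
        if self.tp < size:   # exceeds PIR
            return "red"
        if self.tc < size:   # within PIR but exceeds CIR
            self.tp -= size
            return "yellow"
        self.tp -= size      # within CIR: consume from both buckets
        self.tc -= size
        return "green"


if __name__ == "__main__":
    m = TrTCM(cir=1000, pir=2000, cbs=1000, pbs=2000)
    print([m.mark(0.0, 1000) for _ in range(3)] + [m.mark(0.5, 1000)])
    # ['green', 'yellow', 'red', 'yellow']
```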
However, besides the "Dual Token Bucket", another congestion avoidance mechanism is at work: Weighted Random Early Detection (WRED).
These guesses are probably not far off, since "WRED drop" and "policing drop" are reported as separate counters.
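For reference, WRED typically bases its drop decision on an exponentially weighted moving average (EWMA) of the queue depth. Below a minimum threshold nothing is dropped, at or above a maximum threshold everything is dropped, and in between the drop probability rises linearly up to max_p. The "weighted" part means different traffic (here, per color) gets different thresholds. The simplified Python sketch below omits classic RED's count-based probability adjustment. The thresholds, max_p values, and EWMA weight are illustrative assumptions, not PAN-OS values.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WredProfile:
    min_th: float  # average queue depth (packets) where early drops begin
    max_th: float  # average queue depth at and above which every packet is dropped
    max_p: float   # drop probability reached just below max_th


# Illustrative per-color profiles: yellow traffic starts dropping earlier than green.
PROFILES = {
    "green": WredProfile(min_th=40, max_th=80, max_p=0.1),
    "yellow": WredProfile(min_th=20, max_th=60, max_p=0.2),
}


def update_avg(avg_q: float, current_q: int, weight: float = 0.002) -> float:
    """EWMA of the instantaneous queue depth, as in classic RED."""
    return (1 - weight) * avg_q + weight * current_q


def drop_probability(avg_q: float, prof: WredProfile) -> float:
    if avg_q < prof.min_th:
        return 0.0
    if avg_q >= prof.max_th:
        return 1.0
    return prof.max_p * (avg_q - prof.min_th) / (prof.max_th - prof.min_th)


def wred_admit(avg_q: float, color: str, rng: random.Random) -> bool:
    """True if the packet is enqueued, False if WRED drops it."""
    return rng.random() >= drop_probability(avg_q, PROFILES[color])
```

Because the average moves slowly, a short burst can pass untouched while a sustained backlog triggers early drops. This is different from a policer, which reacts to every packet that exceeds its rate.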
So, under what circumstances do "WRED drop" and "policing drop" occur?
I believe the following.
First, the documentation page "QOS PROFILE SETTINGS" contains the following text:
"When contention occurs, traffic that is assigned a lower priority is dropped. Real-time priority uses its own separate queue."
From the above, we can read th