I have not been able to find any documentation on docs.paloaltonetworks.com that explains what the term "own separate queue" means as it is used in the above document.
However, some speculation about the QoS architecture is possible. I have some evidence, but I can't show it here. If you have the same questions and have run experiments in a lab, I'd be happy to hear what you found.
First of all, I assume that QoS on the Palo Alto FW is implemented by policing. Unfortunately, in my opinion, shaping is not possible.
I also believe that "Egress Guaranteed" is implemented using a "Dual Token Bucket".
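As a rough illustration of the dual token bucket idea (this is not Palo Alto's actual implementation, which is undocumented), here is a minimal Python sketch loosely modelled on the color-blind two-rate three-color marker of RFC 2698. All names (TokenBucket, DualBucketPolicer, classify) and the mapping of "Egress Guaranteed" and "Egress Maximum" to the two rates are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float          # refill rate in bytes per second
    burst: float         # bucket depth in bytes
    tokens: float = 0.0  # current fill level in bytes
    last: float = 0.0    # timestamp of the last refill in seconds

    def __post_init__(self):
        self.tokens = self.burst  # start with a full bucket

    def refill(self, now):
        # Add tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now


class DualBucketPolicer:
    """Hypothetical policer: one bucket for the guaranteed rate, one for the maximum."""

    def __init__(self, guaranteed, maximum, guaranteed_burst, maximum_burst):
        self.committed = TokenBucket(guaranteed, guaranteed_burst)
        self.peak = TokenBucket(maximum, maximum_burst)

    def classify(self, size, now):
        self.committed.refill(now)
        self.peak.refill(now)
        if self.peak.tokens < size:
            return "drop"          # above the maximum rate: policing drop
        self.peak.tokens -= size
        if self.committed.tokens >= size:
            self.committed.tokens -= size
            return "guaranteed"    # within the guaranteed rate
        return "excess"            # forwarded, but a drop candidate under contention


policer = DualBucketPolicer(guaranteed=1000, maximum=2000,
                            guaranteed_burst=1000, maximum_burst=2000)
results = [policer.classify(500, now=0.0) for _ in range(5)]
print(results)
```

In this sketch a packet is never queued or delayed; it is either forwarded or dropped at once, which is the difference between policing and shaping.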
So, under what circumstances do "WRED drop" and "policing drop" occur?
I believe the following.
First, the "QOS PROFILE SETTINGS" documentation contains the following text.
"When contention occurs, traffic that is assigned a lower priority is dropped. Real-time priority uses its own separate queue."
From the above, we can infer that each interface has two queues: one for real-time priority and one for all other priorities.
I also presume that the WRED algorithm works on the latter queue, but not on traffic that belongs to the real-time priority.
In addition, I assume that each queue length is set automatically from the "Egress Maximum" configured for the interface.
If these conjectures hold, then for traffic with priorities other than real-time, drops can occur even when the traffic volume is below "Egress Guaranteed".
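To show why a WRED-managed queue can drop packets regardless of any guaranteed rate, here is a minimal Python sketch of the textbook WRED drop-probability curve. The thresholds and the function name wred_drop_probability are hypothetical; how the firewall actually derives its thresholds (for example from "Egress Maximum") is not documented, so the numbers below are placeholders.

```python
def wred_drop_probability(avg_depth, min_th, max_th, max_p):
    """Textbook WRED curve: drop probability as a function of average queue depth."""
    if avg_depth < min_th:
        return 0.0   # queue is short: never drop
    if avg_depth >= max_th:
        return 1.0   # queue is too long: drop every arriving packet
    # Between the thresholds the probability rises linearly up to max_p.
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# Hypothetical thresholds, in packets.
for depth in (10, 30, 40):
    print(depth, wred_drop_probability(depth, min_th=20, max_th=40, max_p=0.1))
```

The decision depends only on the average queue depth, not on how much bandwidth a class is using, so under this model a class sending below its guaranteed rate can still lose packets once the shared queue grows past the minimum threshold.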
Hence, I believe that it is better for the Palo Alto FW to only color (mark) the traffic and leave QoS control to core routers and the like. If that is not possible, I believe you should either set the interface's "Egress Maximum" sufficiently larger than the physical bandwidth, or not set it at all (it may depend on the quality of the line, but I believe it will perform better if you set it to about 200% of the physical bandwidth).