What consumes GPU?

Dear Jetson GPU Experts,

I am currently investigating the encoding capacity of the Jetson Xavier NX, as follows:

In my test, there are 4 cameras connected to the Xavier NX:
a. Sony block camera 1080p30/1080p60 via MIPI-CSI2
b. Sony 8MP IMX477 or similar 1080p30/1080p60 via MIPI-CSI2
c. 2x webcam 640x480 via USB

When I streamed all of the cameras over the network, I observed that the GPU was not used at all (0% GPU load).
For a:

  • Board :
    gst-launch-1.0 v4l2src device=/dev/video1 ! nvvidconv ! nvv4l2h264enc insert-vui=true insert-sps-pps=1 idrinterval=15 maxperf-enable=true bitrate=4000000 ! video/x-h264, stream-format=byte-stream, alignment=au ! h264parse ! rtph264pay name=pay0 pt=96 ! udpsink host=$CLIENT_PC port=5001 sync=false async=false
  • Laptop :
    gst-launch-1.0 udpsrc port=5001 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! queue ! avdec_h264 ! xvimagesink sync=false async=false -e

For b:

  • Board :
    gst-launch-1.0 nvarguscamerasrc sensor-mode=2 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1' ! nvv4l2h264enc insert-sps-pps=1 idrinterval=15 ! h264parse ! rtph264pay name=pay0 pt=96 ! queue ! udpsink host=$CLIENT_PC port=5002
  • Laptop :
    gst-launch-1.0 udpsrc port=5002 ! 'application/x-rtp,encoding-name=H264,payload=96' ! rtph264depay ! avdec_h264 ! xvimagesink sync=0

For c:

  • Board :
    gst-launch-1.0 v4l2src device=/dev/video3 num-buffers=500 ! 'video/x-raw, width=640,height=480,framerate=(fraction)30/1' ! nvvidconv ! nvv4l2h264enc insert-sps-pps=1 idrinterval=15 ! h264parse ! rtph264pay name=pay0 pt=96 ! udpsink host=$CLIENT_PC port=5003 sync=false

  • Laptop :
    gst-launch-1.0 udpsrc port=5003 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! queue ! avdec_h264 ! xvimagesink sync=false async=false -e
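One way to back up the "0% GPU" observation is to read the GR3D_FREQ field that tegrastats prints, which reports the GPU load; the hardware H.264 encoder used by nvv4l2h264enc is a separate engine. The helper below is a minimal sketch: the name `gr3d_load` is hypothetical, and it assumes each tegrastats line contains a field of the form `GR3D_FREQ <n>%`, optionally followed by `@<MHz>` depending on the JetPack release.

```shell
#!/bin/sh
# gr3d_load (hypothetical helper): read tegrastats lines on stdin and print
# only the GPU load percentage from the "GR3D_FREQ <n>%" field of each line.
# Lines without that field are skipped.
gr3d_load() {
  sed -n 's/.*GR3D_FREQ \([0-9]*\)%.*/\1/p'
}
```

With the sender pipelines running, `sudo tegrastats --interval 1000 | gr3d_load` prints the GPU load for each tegrastats sample; values that stay at 0 mean the capture/encode/stream path is not using the GPU.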

However, when I both streamed and previewed the streams locally on the Jetson Xavier NX:

For a:

  • Terminal #1a :
    gst-launch-1.0 v4l2src device=/dev/video1 ! nvvidconv ! nvv4l2h264enc insert-vui=true insert-sps-pps=1 idrinterval=15 maxperf-enable=true bitrate=8000000 ! video/x-h264, stream-format=byte-stream, alignment=au ! h264parse ! rtph264pay name=pay0 pt=96 ! udpsink host=127.0.0.1 port=5001 sync=false async=false

  • Terminal #1b :
    gst-launch-1.0 udpsrc address=127.0.0.1 port=5001 ! application/x-rtp,media=video,clock-rate=90000,encoding-name=H264,payload=96 ! rtph264depay ! nvv4l2decoder disable-dpb=true ! nv3dsink sync=false async=false window-width=960 window-height=540

For b:

  • Terminal #2a :
    gst-launch-1.0 nvarguscamerasrc sensor-mode=0 ! 'video/x-raw(memory:NVMM), width=(int)3840, height=(int)2160, format=(string)NV12, framerate=(fraction)30/1' ! nvv4l2h264enc insert-sps-pps=1 idrinterval=15 ! h264parse ! rtph264pay name=pay0 pt=96 ! queue ! udpsink host=127.0.0.1 port=5002

  • Terminal #2b:
    gst-launch-1.0 udpsrc address=127.0.0.1 port=5002 ! application/x-rtp,media=video,clock-rate=90000,encoding-name=H264,payload=96 ! rtph264depay ! nvv4l2decoder disable-dpb=true ! nv3dsink sync=false async=false window-width=960 window-height=540

In this case, I did observe GPU load.

Could you tell me whether the GPU load comes from the on-board H.264 decoding, the on-board preview/visualization, or both?

Best Regards,
Khang
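To separate the two candidates, the same local receiver can be run with the renderer swapped for `fakesink`, which simply discards the decoded frames. nvv4l2decoder runs on the hardware video decoder (NVDEC) rather than on the GPU, so if GR3D_FREQ in tegrastats stays near 0% with this variant, the GPU load seen above comes from the nv3dsink preview. This is a sketch under that assumption; `decode_only` and the `DRY_RUN` switch are hypothetical helpers added for illustration.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# decode_only PORT (hypothetical helper): same receiver as Terminal #1b/#2b,
# but the decoded frames go to fakesink, so nothing is rendered.
decode_only() {
  port=$1
  run gst-launch-1.0 udpsrc address=127.0.0.1 port="$port" \
    ! application/x-rtp,media=video,clock-rate=90000,encoding-name=H264,payload=96 \
    ! rtph264depay ! nvv4l2decoder disable-dpb=true ! fakesink sync=false
}
```

Run `decode_only 5001` in place of Terminal #1b (and `decode_only 5002` in place of Terminal #2b) while watching tegrastats.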

Hi,
nv3dsink is implemented on top of EGL, so it uses the GPU. Please follow the steps in the developer guide to try nvdrmvideosink:

Accelerated GStreamer — NVIDIA Jetson Linux Developer Guide 1 documentation
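Following that suggestion, the local preview in Terminal #1b can be switched from nv3dsink to nvdrmvideosink. This is a sketch rather than a verified command: `preview_drm` and `DRY_RUN` are hypothetical helpers, the window-width/window-height options are dropped because it is not confirmed here that nvdrmvideosink accepts them, and the developer guide's notes on nvdrmvideosink (such as stopping the desktop display manager first) should be checked before running it.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# preview_drm PORT (hypothetical helper): Terminal #1b/#2b receiver with
# nvdrmvideosink as the renderer instead of the EGL-based nv3dsink.
preview_drm() {
  port=$1
  run gst-launch-1.0 udpsrc address=127.0.0.1 port="$port" \
    ! application/x-rtp,media=video,clock-rate=90000,encoding-name=H264,payload=96 \
    ! rtph264depay ! nvv4l2decoder disable-dpb=true ! nvdrmvideosink sync=false
}
```

Comparing GR3D_FREQ in tegrastats between `preview_drm 5001` and the original nv3dsink command shows how much of the GPU load the renderer accounts for.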


I also did an extreme test to see whether we could stay with the Jetson Xavier NX or should move to the Jetson Orin NX. My test case:

  1. Sony FCB (1920x1080) → /dev/video0, cloned to /dev/video10 (AI processing), /dev/video20 (H265 streaming at 1280x720), /dev/video30 (recording)
  2. IMX477 (1920x1080) → /dev/video1, cloned to /dev/video11 (AI processing), /dev/video21 (H265 streaming at 1280x720), /dev/video31 (recording)
  3. Webcam simulating USB Thermal Camera A (640x480) → /dev/video3, cloned to /dev/video13 (AI processing), /dev/video23 (H265 streaming), /dev/video33 (recording)
  4. Webcam simulating USB Thermal Camera B (640x480) → /dev/video5, cloned to /dev/video15 (AI processing), /dev/video25 (H265 streaming), /dev/video35 (recording)
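The device numbering in this list follows a fixed pattern: source /dev/videoN is cloned to /dev/video(10+N) for AI processing, /dev/video(20+N) for streaming and /dev/video(30+N) for recording. Assuming the loopback devices are created by loading the v4l2loopback module with its `devices` and `video_nr` parameters (the exact command is not shown in this post), the `video_nr` list can be generated from the source indices. `loopback_nrs` is a hypothetical helper.

```shell
#!/bin/sh
# loopback_nrs N... (hypothetical helper)
# For each source index N, emit the loopback numbers 10+N, 20+N and 30+N,
# joined by commas, in the same order as the list above.
loopback_nrs() {
  nrs=""
  for n in "$@"; do
    for base in 10 20 30; do
      # ${nrs:+,} adds a separating comma only once the list is non-empty.
      nrs="$nrs${nrs:+,}$((base + n))"
    done
  done
  printf '%s\n' "$nrs"
}
```

For the four sources above, `sudo modprobe v4l2loopback devices=12 video_nr=$(loopback_nrs 0 1 3 5)` should create /dev/video10, 20, 30, 11, 21, 31, 13, 23, 33, 15, 25 and 35.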

We used v4l2loopback to clone the devices. However, the CPU load is already critical (with only cloning and streaming running), while little GPU is used:


test_perf.txt (4.0 KB)
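For reference, a common way to fan one camera out to several loopback devices is a single gst-launch-1.0 pipeline that captures once and uses a tee with one queue and one v4l2sink per loopback device. The sketch below is only an illustration under assumptions: the exact cloning commands used in this test are not shown, `clone_camera`, `raw_mb_per_s` and the `DRY_RUN` switch are hypothetical helpers, and a given camera may need an explicit caps filter after v4l2src. The second helper gives a rough idea of why cloning alone costs CPU: assuming 2-byte-per-pixel YUV (YUY2/UYVY) at 30 fps, each raw stream moves width × height × 2 × 30 bytes per second, and each loopback clone is assumed to add at least one CPU-side copy of that data.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# clone_camera SRC DST... (hypothetical helper)
# Capture SRC once and fan it out through a tee to one v4l2sink per loopback device.
clone_camera() {
  src=$1
  shift
  branches=""
  for dst in "$@"; do
    # Each branch gets its own queue (and thus its own streaming thread) after the tee.
    branches="$branches t. ! queue ! v4l2sink device=$dst"
  done
  # $branches is left unquoted on purpose: each word becomes one gst-launch-1.0 argument.
  run gst-launch-1.0 -e v4l2src device="$src" ! tee name=t $branches
}

# raw_mb_per_s WIDTH HEIGHT BYTES_PER_PIXEL FPS (hypothetical helper)
# Raw video data rate in megabytes (10^6 bytes) per second, rounded down.
raw_mb_per_s() {
  echo $(( $1 * $2 * $3 * $4 / 1000000 ))
}
```

For example, `clone_camera /dev/video0 /dev/video10 /dev/video20 /dev/video30` corresponds to item 1 above. `raw_mb_per_s 1920 1080 2 30` gives 124 and `raw_mb_per_s 640 480 2 30` gives 18, so with three clones per camera, the two 1080p sources and the two webcams account for roughly 3 × (124 + 124 + 18 + 18) ≈ 850 MB/s of extra copying under the assumptions above.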

It seems that nvargus-daemon takes a fairly high CPU percentage compared to the benchmark in this example: GitHub - NVIDIA-AI-IOT/jetson-multicamera-pipelines
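To compare nvargus-daemon with other measurements on equal terms, its CPU usage can be sampled directly from /proc over a fixed interval. A minimal sketch, assuming a Linux /proc filesystem and `getconf CLK_TCK`; `proc_ticks` and `proc_cpu_percent` are hypothetical helpers, and 100% means one fully busy core (the same convention as top).

```shell
#!/bin/sh
# proc_ticks PID (hypothetical helper): CPU time (user + system) consumed so far, in clock ticks.
proc_ticks() {
  # Drop "pid (comm) " first, since comm may contain spaces; after that, field 1 is the
  # process state (field 3 of /proc/PID/stat), so utime (14) and stime (15) become $12 and $13.
  sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# proc_cpu_percent PID [SECONDS] (hypothetical helper)
# Average CPU usage of PID over the interval; 100 means one fully busy core, as in top.
proc_cpu_percent() {
  pid=$1
  secs=${2:-5}
  # Clock ticks per second; fall back to 100 (an assumption) if getconf is unavailable.
  hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
  t0=$(proc_ticks "$pid")
  sleep "$secs"
  t1=$(proc_ticks "$pid")
  echo $(( (t1 - t0) * 100 / (hz * secs) ))
}
```

On the board, `proc_cpu_percent "$(pidof nvargus-daemon)" 10` prints the daemon's average load over 10 seconds.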

Do you have any advice on this? Could we improve it further?

Best Regards,
Khang

Hi,
Some CPU usage is expected when using the Argus stack. We have some profiling data in
High CPU usage streaming from CSI2 cameras on Jetson NX - #19 by JerryChang

Please compare it with yours. If you see similar usage with your camera source, it is expected.

Hi @DaneLLL,

Thanks for the hint. I will compare and get back to you about the CPU consumption of the Argus stack. By the way, I switched to power mode 20W@6CORE and saw that the load was less critical than in the previous mode, 15W@6CORE.

I wonder why the load is not shared roughly equally between the cores; usually one core runs at peak load at any given time.

Best Regards,
Khang
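Power modes are switched with nvpmodel (`sudo nvpmodel -q` shows the current mode and `sudo nvpmodel -m <ID>` selects another one), and the mode IDs are not the same on every board and JetPack release. A small sketch for listing the IDs and names defined on a given board, assuming /etc/nvpmodel.conf declares each mode on a line of the form `< POWER_MODEL ID=<n> NAME=<name> >`; `list_power_modes` is a hypothetical helper.

```shell
#!/bin/sh
# list_power_modes [CONF] (hypothetical helper)
# Print "ID NAME" for each power mode declared in nvpmodel.conf.
list_power_modes() {
  conf=${1:-/etc/nvpmodel.conf}
  sed -n 's/.*POWER_MODEL ID=\([0-9]*\) NAME=\([A-Za-z0-9_]*\).*/\1 \2/p' "$conf"
}
```

Looking up the ID of the desired mode this way avoids guessing the number passed to `nvpmodel -m`.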

Hi,
On the developer kit, we see the load spread evenly across the cores: around 6% on each core, 34% in total (at 1420 MHz).
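To check the per-core distribution the same way, the CPU field of a tegrastats line can be split into one line per core. A sketch assuming the field looks like `CPU [6%@1420,5%@1420,off,...]`, where each entry is load@frequency in MHz and `off` marks an offline core; `cpu_cores` is a hypothetical helper.

```shell
#!/bin/sh
# cpu_cores (hypothetical helper): read tegrastats lines on stdin and print
# "cpuN LOAD% FREQMHz" (or "cpuN off") for every core listed in the CPU [...] field.
cpu_cores() {
  sed -n 's/.*CPU \[\([^]]*\)].*/\1/p' | awk -F, '{
    for (i = 1; i <= NF; i++) {
      if ($i == "off") { printf "cpu%d off\n", i - 1; continue }
      split($i, f, "%@")
      printf "cpu%d %s%% %sMHz\n", i - 1, f[1], f[2]
    }
  }'
}
```

Piping `sudo tegrastats --interval 1000` into `cpu_cores` makes a single core pinned near 100% stand out against an even spread like the one above.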

Your stats suggest that one process is heavily loading a single core. Are all 4 cameras going through Argus, or are some going through v4l2?

Only number 2 (IMX477) goes through Argus. Apart from that, there are only GStreamer commands that map the real /dev/videoX devices to the virtual /dev/videoY devices (created by v4l2loopback), and that stream over the network or record to local storage with H265 encoding.

Best Regards,
Khang
