ForgeAI / CV Payloads

AI / Computer Vision Payloads

Run object detection, tracking, and targeting on your drone's companion computer. From YOLO install to real-time inference at 30+ FPS on edge hardware.

Jargon Buster

YOLO

You Only Look Once. The dominant real-time object detection model family. YOLOv8/v11 from Ultralytics is the current standard. Runs on GPU (Jetson CUDA) or NPU (APB, Hailo). One forward pass = detect all objects in frame.

TensorRT

NVIDIA's inference optimizer. Takes a trained model (PyTorch/ONNX) and compiles it into a GPU-optimized engine. Gives 2-5x speedup on Jetson. You train on a big GPU, deploy the TensorRT engine on the drone.

NPU

Neural Processing Unit. Dedicated AI accelerator chip. The Orqa APB has a 2.25 TOPS NPU (NXP i.MX8M Plus). Hailo-8 is 26 TOPS. NPUs are more power-efficient than GPUs for inference but less flexible.

ONNX

Open Neural Network Exchange. A model format that works across frameworks. Train in PyTorch → export to ONNX → convert to TensorRT (Jetson) or HailoRT (Hailo) or TFLite (RPi). The universal translator for AI models.

Inference

Running a trained model on new data. "Inference at 30 FPS" = the model processes 30 camera frames per second and outputs detections. Training happens offline on big GPUs. Inference happens on the drone.
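The frame-rate figure translates directly into a per-frame time budget: at 30 FPS the whole capture → preprocess → inference → postprocess loop must fit in about 33 ms. A quick sanity-check sketch (the 12 ms inference time below is an illustrative number, not a benchmark):

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps

def max_fps(inference_ms: float) -> float:
    """Upper bound on frame rate if inference alone takes inference_ms."""
    return 1000.0 / inference_ms

# At 30 FPS, every step of the pipeline must fit in ~33 ms:
print(round(frame_budget_ms(30), 1))   # → 33.3

# A hypothetical model needing 12 ms/frame caps out near 83 FPS,
# before capture and overlay overhead:
print(round(max_fps(12), 1))   # → 83.3
```

If the budget check fails, drop to a smaller model size or a lower input resolution before buying bigger hardware.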

GStreamer

Multimedia framework for building video pipelines. On drones it carries frames through camera capture → resize → AI model → detection overlay → stream to GCS. The plumbing that connects camera to brain to screen.

What Runs What
| Hardware | AI Compute | YOLOv8n FPS | YOLOv8s FPS | Power | Best For |
|---|---|---|---|---|---|
| Jetson AGX Orin 64GB | 275 TOPS (INT8) | ~180 | ~95 | 15-60W | Multi-model, large resolution |
| Jetson Orin NX 16GB | 100 TOPS | ~120 | ~55 | 10-25W | Sweet spot for drones |
| Jetson Orin Nano 8GB | 40 TOPS | ~60 | ~30 | 7-15W | Lightweight builds |
| Hailo-8 (PCIe module) | 26 TOPS | ~80 | ~35 | 2.5W | Ultra low power, RPi add-on |
| Orqa APB (i.MX8M NPU) | 2.25 TOPS | ~8 | ~3 | 3W | Simple detection, classification |
| Raspberry Pi 5 (CPU) | CPU only | ~5 | ~2 | 5W | Prototyping only |
| ModalAI VOXL 2 (Adreno) | ~15 TOPS | ~40 | ~18 | 5W | Integrated with PX4 |
n vs s vs m vs l: YOLOv8 comes in sizes — nano (n), small (s), medium (m), large (l). Nano runs fast but misses small objects. Large catches everything but is slow. For drones at 30+ FPS: use nano on Orin Nano/Hailo/APB, small on Orin NX, medium on AGX Orin.
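That rule of thumb can be encoded in a few lines — a hypothetical helper (the hardware keys and default are my own naming, not part of any ForgeAI tooling):

```python
# Rule of thumb from the table: nano on Orin Nano / Hailo-8 / APB,
# small on Orin NX, medium on AGX Orin.
SIZE_FOR_HARDWARE = {
    "agx_orin": "m",
    "orin_nx": "s",
    "orin_nano": "n",
    "hailo8": "n",
    "apb": "n",
}

def yolo_weights(hardware: str) -> str:
    """Return the YOLOv8 weights file matching the 30+ FPS guidance."""
    size = SIZE_FOR_HARDWARE.get(hardware, "n")  # default to nano when unsure
    return f"yolov8{size}.pt"

print(yolo_weights("orin_nx"))   # → yolov8s.pt
```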
Install Ultralytics YOLOv8
```shell
# ═══ JETSON (JetPack 5.x / 6.x) ═══
# JetPack already has CUDA + cuDNN + TensorRT
pip3 install ultralytics

# Export model to TensorRT for max speed:
yolo export model=yolov8n.pt format=engine device=0
# Creates yolov8n.engine — optimized for YOUR specific Jetson GPU

# Run inference:
yolo predict model=yolov8n.engine source=0   # 0 = first camera
```
```shell
# ═══ ORQA APB (i.MX8M Plus NPU) ═══
# The NPU needs models in a specific format

# 1. Export to ONNX on your dev machine:
yolo export model=yolov8n.pt format=onnx

# 2. Convert to TFLite (NPU-compatible) via the NXP eIQ toolkit:
#    See: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment

# 3. On the APB, run via NNStreamer or GStreamer + TFLite delegate:
gst-launch-1.0 v4l2src device=/dev/video2 ! \
  video/x-raw,width=640,height=480 ! \
  tensor_converter ! \
  tensor_filter framework=tensorflow-lite model=yolov8n.tflite \
    custom=Delegate:NNAPI ! \
  tensor_sink

# The 2.25 TOPS NPU is enough for classification + simple detection
# For heavier models, add a Hailo-8 via the PCIe expansion slot
```
```shell
# ═══ HAILO-8 (on RPi 5 or any PCIe host) ═══
# Install HailoRT + Hailo TAPPAS (GStreamer plugins)
# See: https://github.com/hailo-ai/tappas

# Convert model: PyTorch → ONNX → Hailo DFC (via Hailo Dataflow Compiler)
hailo optimize yolov8n.onnx --hw-arch hailo8
hailo compile yolov8n.har

# Run:
gst-launch-1.0 v4l2src ! hailonet hef=yolov8n.hef ! autovideosink
```
ROS Integration — Detect → Act
Detection alone is useless if the drone can't act on it. Here's the pipeline: camera → YOLO → detection coordinates → MAVROS → FC.
```python
#!/usr/bin/env python3
# ═══ ROS NODE: YOLO DETECTION → MAVROS TARGETING ═══
import rospy, cv2, numpy as np
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseStamped
from cv_bridge import CvBridge
from ultralytics import YOLO

model = YOLO('yolov8n.engine')  # TensorRT engine
bridge = CvBridge()
target_pub = rospy.Publisher('/target/pose', PoseStamped, queue_size=1)

def image_callback(msg):
    frame = bridge.imgmsg_to_cv2(msg, 'bgr8')
    results = model(frame, verbose=False)
    for det in results[0].boxes:
        cls = int(det.cls[0])
        conf = float(det.conf[0])
        if conf > 0.6:  # confidence threshold
            x1, y1, x2, y2 = det.xyxy[0].cpu().numpy()
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            # Convert pixel coords to drone-relative angle
            # Then publish as target pose for FC
            rospy.loginfo(f"Detection: class={cls} conf={conf:.2f} "
                          f"center=({cx:.0f},{cy:.0f})")

rospy.init_node('yolo_detector')
rospy.Subscriber('/camera/image_raw', Image, image_callback)
rospy.spin()
```
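The pixel-to-angle conversion the node's comment glosses over can be sketched as follows, assuming a pinhole camera with known horizontal and vertical FOV (the function name and sign convention are illustrative, not from any ForgeAI API):

```python
import math

def pixel_to_angles(cx, cy, width, height, hfov_deg, vfov_deg):
    """Map a detection centre in pixels to angles off the camera boresight
    (pinhole approximation). Returns (azimuth, elevation) in degrees:
    positive azimuth = target right of centre, positive elevation = above.
    """
    # Normalised offset from image centre, in [-1, 1]
    nx = (cx - width / 2) / (width / 2)
    ny = (cy - height / 2) / (height / 2)
    # tan-space mapping stays accurate at wide FOV, unlike a linear scale
    az = math.degrees(math.atan(nx * math.tan(math.radians(hfov_deg / 2))))
    el = -math.degrees(math.atan(ny * math.tan(math.radians(vfov_deg / 2))))
    return az, el

# A target at the right edge of a 90° HFOV image sits 45° off boresight:
print(round(pixel_to_angles(640, 240, 640, 480, 90, 60)[0], 1))   # → 45.0
```

Feed the resulting angles into the `PoseStamped` (or a gimbal-pointing command) in the callback above; a real implementation must also account for camera mounting angle and current drone attitude.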
Thermal + visible fusion: If your drone has both a thermal camera (FLIR) and a visible camera, you can run YOLO on visible for classification and use thermal for detection in low light. ROS makes this easy — subscribe to both camera topics, fuse in your node.
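One simple way to do that fusion — a sketch, not a prescribed method — is to take hot-spot boxes from the thermal stream and attach a class label from the visible-light YOLO detections by intersection-over-union, after both are registered into the same image frame:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def fuse(thermal_boxes, visible_dets, thresh=0.3):
    """Attach a visible-light class label to each thermal detection.

    thermal_boxes: list of (x1, y1, x2, y2) hot spots from the thermal cam
    visible_dets:  list of ((x1, y1, x2, y2), class_name) from YOLO
    Both must already be registered into a common image frame.
    """
    fused = []
    for tb in thermal_boxes:
        best = max(visible_dets, key=lambda d: iou(tb, d[0]), default=None)
        label = best[1] if best and iou(tb, best[0]) >= thresh else "unknown"
        fused.append((tb, label))
    return fused

print(fuse([(100, 100, 200, 200)], [((110, 105, 210, 190), "person")]))
# → [((100, 100, 200, 200), 'person')]
```

In ROS this logic lives in a node that subscribes to both camera topics (typically synchronised with `message_filters`) and runs `fuse` on each matched pair of frames.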
Custom Model Training
Pre-trained YOLO detects 80 COCO classes (person, car, etc.). For custom targets (specific vehicles, equipment, animals), you need to train on your own data.
```shell
# ═══ TRAIN CUSTOM YOLO ON YOUR DATA ═══
# Do this on a desktop GPU (RTX 3090, etc.) — NOT on the drone

# 1. Label images with Roboflow or CVAT (free)
#    Export as YOLO format (txt files with bbox coords)

# 2. Train:
yolo train model=yolov8n.pt data=my_dataset.yaml epochs=100 imgsz=640

# 3. Export for drone:
yolo export model=runs/detect/train/weights/best.pt format=engine  # Jetson
yolo export model=runs/detect/train/weights/best.pt format=onnx    # Then convert for APB/Hailo

# 4. Copy .engine file to drone and update model path in ROS node
```
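The `my_dataset.yaml` referenced above follows the standard Ultralytics dataset layout — a minimal example (paths and class names below are placeholders for your own data):

```yaml
# my_dataset.yaml — Ultralytics YOLO dataset definition
path: /data/my_dataset      # dataset root
train: images/train         # relative to path
val: images/val
names:
  0: vehicle
  1: equipment
```

Each image in `images/train` needs a matching `.txt` in `labels/train` with one `class cx cy w h` line per box (normalised coordinates) — the format Roboflow and CVAT export as "YOLO".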
The full AI drone pipeline: Camera → GStreamer → ROS Image topic → YOLO node (TensorRT) → Detection topic → Tracking node → MAVROS targeting → FC. Add TAK bridge for live detection feed to ground operators. Add mesh radio for multi-drone collaborative detection.