AI / Computer Vision Payloads
Run object detection, tracking, and targeting on your drone's companion computer. From YOLO install to real-time inference at 30+ FPS on edge hardware.
Jargon Buster
YOLO
You Only Look Once. The dominant real-time object detection model family. YOLOv8/v11 from Ultralytics is the current standard. Runs on GPU (Jetson CUDA) or NPU (APB, Hailo). One forward pass = detect all objects in frame.
TensorRT
NVIDIA's inference optimizer. Takes a trained model (PyTorch/ONNX) and compiles it into a GPU-optimized engine. Gives 2-5x speedup on Jetson. You train on a big GPU, deploy the TensorRT engine on the drone.
NPU
Neural Processing Unit. Dedicated AI accelerator chip. The Orqa APB has a 2.25 TOPS NPU (NXP i.MX8M Plus). Hailo-8 is 26 TOPS. NPUs are more power-efficient than GPUs for inference but less flexible.
ONNX
Open Neural Network Exchange. A model format that works across frameworks. Train in PyTorch → export to ONNX → convert to TensorRT (Jetson) or HailoRT (Hailo) or TFLite (RPi). The universal translator for AI models.
Inference
Running a trained model on new data. "Inference at 30 FPS" = the model processes 30 camera frames per second and outputs detections. Training happens offline on big GPUs. Inference happens on the drone.
GStreamer
Multimedia framework for video pipelines. On drones, it captures camera → resizes → feeds to AI model → overlays detections → streams to GCS. The plumbing that connects camera to brain to screen.
What Runs What
| Hardware | AI Compute | YOLOv8n FPS | YOLOv8s FPS | Power | Best For |
| --- | --- | --- | --- | --- | --- |
| Jetson AGX Orin 64GB | 275 TOPS (INT8) | ~180 | ~95 | 15-60W | Multi-model, large resolution |
| Jetson Orin NX 16GB | 100 TOPS | ~120 | ~55 | 10-25W | Sweet spot for drones |
| Jetson Orin Nano 8GB | 40 TOPS | ~60 | ~30 | 7-15W | Lightweight builds |
| Hailo-8 (PCIe module) | 26 TOPS | ~80 | ~35 | 2.5W | Ultra low power, RPi add-on |
| Orqa APB (i.MX8M NPU) | 2.25 TOPS | ~8 | ~3 | 3W | Simple detection, classification |
| Raspberry Pi 5 (CPU) | 0 (CPU only) | ~5 | ~2 | 5W | Prototyping only |
| ModalAI VOXL 2 (Adreno) | ~15 TOPS | ~40 | ~18 | 5W | Integrated with PX4 |
n vs s vs m vs l: YOLOv8 comes in sizes — nano (n), small (s), medium (m), large (l). Nano runs fast but misses small objects. Large catches everything but is slow. For 30+ FPS on a drone: use nano on Orin Nano or Hailo-8, small on Orin NX, medium on AGX Orin. Note the APB tops out around 8 FPS even with nano, so treat it as a lower-rate detector or add a Hailo-8.
Install Ultralytics YOLOv8
# ═══ JETSON (JetPack 5.x / 6.x) ═══
# JetPack already has CUDA + cuDNN + TensorRT
pip3 install ultralytics
# Export model to TensorRT for max speed:
yolo export model=yolov8n.pt format=engine device=0
# Creates yolov8n.engine — optimized for YOUR specific Jetson GPU
# Run inference:
yolo predict model=yolov8n.engine source=0 # 0 = first camera
# ═══ ORQA APB (i.MX8M Plus NPU) ═══
# The NPU needs models in a specific format
# 1. Export to ONNX on your dev machine:
yolo export model=yolov8n.pt format=onnx
# 2. Convert to TFLite (NPU-compatible) via NXP eIQ toolkit:
# See: https://www.nxp.com/design/software/development-software/eiq-ml-development-environment
# 3. On APB, run via NNStreamer or GStreamer + TFLite delegate:
gst-launch-1.0 v4l2src device=/dev/video2 ! \
video/x-raw,width=640,height=480 ! \
tensor_converter ! \
tensor_filter framework=tensorflow-lite model=yolov8n.tflite \
custom=Delegate:NNAPI ! \
tensor_sink
# 2.25 TOPS NPU is enough for classification + simple detection
# For heavier models, add a Hailo-8 via PCIe expansion slot
# ═══ HAILO-8 (on RPi 5 or any PCIe host) ═══
# Install HailoRT + Hailo Tappas (GStreamer plugins)
# See: https://github.com/hailo-ai/tappas
# Convert model:
# PyTorch → ONNX → Hailo DFC (via Hailo Dataflow Compiler)
hailo optimize yolov8n.onnx --hw-arch hailo8
hailo compile yolov8n.har
# Run:
gst-launch-1.0 v4l2src ! hailonet hef=yolov8n.hef ! autovideosink
ROS Integration — Detect → Act
Detection alone is useless if the drone can't act on it. Here's the pipeline: camera → YOLO → detection coordinates → MAVROS → FC.
# ═══ ROS NODE: YOLO DETECTION → MAVROS TARGETING ═══
#!/usr/bin/env python3
import rospy, cv2, numpy as np
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseStamped
from cv_bridge import CvBridge
from ultralytics import YOLO

model = YOLO('yolov8n.engine')  # TensorRT engine
bridge = CvBridge()
target_pub = rospy.Publisher('/target/pose', PoseStamped, queue_size=1)

def image_callback(msg):
    frame = bridge.imgmsg_to_cv2(msg, 'bgr8')
    results = model(frame, verbose=False)
    for det in results[0].boxes:
        cls = int(det.cls[0])
        conf = float(det.conf[0])
        if conf > 0.6:  # confidence threshold
            x1, y1, x2, y2 = det.xyxy[0].cpu().numpy()
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            # Convert pixel coords to drone-relative angle
            # Then publish as target pose for FC
            rospy.loginfo(f"Detection: class={cls} conf={conf:.2f} "
                          f"center=({cx:.0f},{cy:.0f})")

rospy.init_node('yolo_detector')
rospy.Subscriber('/camera/image_raw', Image, image_callback)
rospy.spin()
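The "convert pixel coords to drone-relative angle" step the node above leaves as a comment can be sketched as a pure function. This assumes a pinhole camera model; the FOV values are hypothetical placeholders you would replace with your camera's actual specs:

```python
import math

def pixel_to_angles(cx, cy, frame_w, frame_h, hfov_deg, vfov_deg):
    """Map a detection center (pixels) to bearing/elevation (degrees)
    relative to the camera boresight. Positive bearing = target right of
    center; positive elevation = target above center (image y grows down).
    Pinhole model: angle = atan(normalized_offset * tan(FOV / 2))."""
    nx = (cx - frame_w / 2) / (frame_w / 2)   # -1 (left edge) .. +1 (right edge)
    ny = (frame_h / 2 - cy) / (frame_h / 2)   # -1 (bottom) .. +1 (top)
    bearing = math.degrees(math.atan(nx * math.tan(math.radians(hfov_deg / 2))))
    elevation = math.degrees(math.atan(ny * math.tan(math.radians(vfov_deg / 2))))
    return bearing, elevation

# A target dead-center maps to (0, 0); one at the right edge maps to
# roughly +hfov/2 of bearing.
print(pixel_to_angles(640, 240, 640, 480, 90, 60))
```

Feed the resulting angles into the PoseStamped (or a gimbal command) that the node publishes; the FC or gimbal controller closes the loop from there.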
Thermal + visible fusion: If your drone has both a thermal camera (FLIR) and a visible camera, you can run YOLO on visible for classification and use thermal for detection in low light. ROS makes this easy — subscribe to both camera topics, fuse in your node.
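The fusion logic itself can be sketched in pure Python, independent of ROS. This assumes the two cameras are already co-registered so boxes share one pixel frame (in practice you need an extrinsic calibration first); all function and field names here are illustrative, not from any library. Thermal supplies the detections, visible supplies the class labels via box overlap:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def fuse(thermal_boxes, visible_dets, iou_thresh=0.3):
    """thermal_boxes: [(x1,y1,x2,y2), ...] — hot spots, reliable in low light.
    visible_dets: [((x1,y1,x2,y2), class_name), ...] — YOLO on the RGB frame.
    Returns [(box, label)]: thermal provides the detection, visible provides
    the class label when the boxes overlap enough, else 'unknown'."""
    fused = []
    for tb in thermal_boxes:
        best = max(visible_dets, key=lambda v: iou(tb, v[0]), default=None)
        if best and iou(tb, best[0]) >= iou_thresh:
            fused.append((tb, best[1]))
        else:
            fused.append((tb, 'unknown'))
    return fused

thermal = [(100, 100, 160, 180), (400, 50, 440, 90)]
visible = [((95, 105, 165, 175), 'person')]
print(fuse(thermal, visible))  # first box labeled 'person', second 'unknown'
```

In the ROS version, `message_filters.ApproximateTimeSynchronizer` is the usual way to get roughly simultaneous frames from the two camera topics before calling something like `fuse()`.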
Custom Model Training
Pre-trained YOLO detects 80 COCO classes (person, car, etc.). For custom targets (specific vehicles, equipment, animals), you need to train on your own data.
# ═══ TRAIN CUSTOM YOLO ON YOUR DATA ═══
# Do this on a desktop GPU (RTX 3090, etc.) — NOT on the drone
# 1. Label images with Roboflow or CVAT (free)
# Export as YOLO format (txt files with bbox coords)
# 2. Train:
yolo train model=yolov8n.pt data=my_dataset.yaml epochs=100 imgsz=640
# 3. Export for drone:
yolo export model=runs/detect/train/weights/best.pt format=engine # Jetson
yolo export model=runs/detect/train/weights/best.pt format=onnx # Then convert for APB/Hailo
# 4. Copy .engine file to drone and update model path in ROS node
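The `my_dataset.yaml` referenced in step 2 follows the Ultralytics dataset config format; a minimal sketch, with placeholder paths and class names you'd replace with your own:

```yaml
# my_dataset.yaml — Ultralytics dataset config (placeholder paths/classes)
path: /data/my_dataset    # dataset root
train: images/train       # training images, relative to path
val: images/val           # validation images
names:
  0: truck
  1: excavator
```

Label files exported in YOLO format sit in parallel `labels/train` and `labels/val` directories, one .txt per image.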
The full AI drone pipeline: Camera → GStreamer → ROS Image topic → YOLO node (TensorRT) → Detection topic → Tracking node → MAVROS targeting → FC. Add TAK bridge for live detection feed to ground operators. Add mesh radio for multi-drone collaborative detection.
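The "Tracking node" stage in that pipeline can be as simple as a nearest-centroid tracker that keeps detection IDs stable across frames. A minimal sketch (no motion filtering or re-identification; production systems typically use SORT or ByteTrack instead):

```python
import math

class CentroidTracker:
    """Assigns persistent IDs to detections by matching each new centroid
    to the nearest previously tracked centroid within max_dist pixels."""
    def __init__(self, max_dist=80.0):
        self.max_dist = max_dist
        self.next_id = 0
        self.tracks = {}  # track id -> (cx, cy)

    def update(self, centroids):
        assigned = {}
        unmatched = dict(self.tracks)
        for c in centroids:
            if unmatched:
                tid, prev = min(unmatched.items(),
                                key=lambda kv: math.dist(c, kv[1]))
                if math.dist(c, prev) <= self.max_dist:
                    assigned[tid] = c   # continue an existing track
                    del unmatched[tid]
                    continue
            assigned[self.next_id] = c  # no close track: start a new one
            self.next_id += 1
        self.tracks = assigned  # tracks unseen this frame are dropped
        return assigned

tracker = CentroidTracker()
print(tracker.update([(100, 100)]))               # frame 1: new track, id 0
print(tracker.update([(110, 105)]))               # frame 2: same target keeps id 0
print(tracker.update([(115, 108), (400, 300)]))   # frame 3: new target gets id 1
```

Stable IDs are what let the targeting node keep commanding toward the same object across frames instead of jumping between detections.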