2025
Conference Paper
Title
Prompt Complexity and Plan Stability: Benchmarking On-Device Language Planning for Robots
Abstract
This paper investigates whether publicly available small- and medium-scale language models, used without task-specific fine-tuning, can serve as a language-based planning layer for robotic manipulation entirely on-device. We present a reproducible pipeline on an embedded GPU platform (NVIDIA Jetson AGX Orin with Ollama) that maps natural language instructions to a minimal action vocabulary and incorporates a validation/feedback loop with bounded retries. The framework records completion, latency, and corrective steps to support systematic analysis. For controlled evaluation, we construct a 100-prompt benchmark spanning sorting, stacking, and transport tasks, and introduce a Prompt Complexity Index that aggregates linguistic and procedural factors to stratify instructions into simple, medium, and complex tiers. Our cross-scale analysis characterizes feasibility and the inherent quality-latency trade-offs of language models operating at the edge, and suggests that stronger structural cues in prompts can improve plan stability.
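To make the described pipeline concrete, the sketch below illustrates one plausible shape of the validation/feedback loop with bounded retries: a local Ollama model proposes a plan, each step is checked against the minimal action vocabulary, and invalid plans trigger a corrective re-prompt. This is not the authors' code; the action names, prompt wording, model tag, and retry limit are all illustrative assumptions.

```python
# Minimal sketch, assuming the official Ollama Python client and a
# hypothetical action vocabulary; all names below are illustrative.
import ollama

ACTION_VOCABULARY = {"pick", "place", "stack", "push", "move_to"}  # assumed vocabulary
MAX_RETRIES = 3  # bounded retries, per the abstract; the actual limit is not specified


def plan(instruction: str, model: str = "llama3") -> list[str] | None:
    """Ask the model for a plan; re-prompt with corrective feedback when a
    step falls outside the action vocabulary. Returns None once retries
    are exhausted, so the caller can log a failed completion."""
    feedback = ""
    for _ in range(MAX_RETRIES):
        prompt = (
            f"Instruction: {instruction}\n"
            f"Respond with one action per line, chosen only from: "
            f"{sorted(ACTION_VOCABULARY)}.\n{feedback}"
        )
        reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
        steps = [s.strip() for s in reply["message"]["content"].splitlines() if s.strip()]
        # Validate each step's action name (text before any argument parenthesis).
        invalid = [s for s in steps if s.split("(")[0].strip() not in ACTION_VOCABULARY]
        if steps and not invalid:
            return steps  # plan validated against the vocabulary
        feedback = (
            f"Previous attempt used invalid actions {invalid}; "
            f"use only the allowed vocabulary."
        )
    return None
```

Logging the number of loop iterations taken before a valid plan (the "corrective steps" metric) and the wall-clock time per `ollama.chat` call would yield the completion, latency, and correction statistics the framework is described as recording.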
Author(s)