The logistics and construction industries face persistent challenges in automating pallet handling, especially in outdoor environments with variable payloads, inconsistencies in pallet quality and dimensions, and unstructured surroundings. In this paper, we tackle automation of a critical step in pallet transport: the pallet pick-up operation. Our work is motivated by labor shortages, safety concerns, and inefficiencies in manually locating and retrieving pallets under such conditions. We present Lang2Lift, a framework that leverages foundation models for natural language-guided pallet detection and 6D pose estimation, enabling operators to specify targets through intuitive commands such as “pick up the steel beam pallet near the crane.” The perception pipeline integrates Florence-2 and SAM2 for language-grounded segmentation with FoundationPose for robust pose estimation in cluttered, multi-pallet outdoor scenes under variable lighting. The resulting poses feed into a motion planning module for fully autonomous forklift operation. We validate Lang2Lift on the ADAPT autonomous forklift platform, achieving 0.76 mIoU pallet segmentation accuracy on a real-world test dataset. Timing and error analysis demonstrate the system’s robustness and confirm its feasibility for deployment in operational logistics and construction environments.
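For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of how a mean intersection-over-union (mIoU) score can be computed from binary masks. It is an illustration only, not the evaluation code used for Lang2Lift: the function names `mask_iou` and `mean_iou` are hypothetical, and averaging per-image IoU over the test set is an assumption, since the abstract does not state the averaging convention.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (1 = pixel belongs to the pallet).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average per-image IoU (assumed convention, see lead-in)."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally many, non-empty predictions and targets")
    scores = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[[1, 1, 0], [0, 1, 0]], [[1, 0], [0, 1]]]
    ground_truth = [[[1, 0, 0], [0, 1, 1]], [[1, 0], [0, 1]]]
    print(mean_iou(predicted, ground_truth))
```

In this toy input the first pair overlaps on 2 of 4 labeled pixels (IoU 0.5) and the second pair matches exactly (IoU 1.0), so the mean is 0.75.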
@misc{nguyen2025lang2liftframeworklanguageguidedpallet,
title={Lang2Lift: A Framework for Language-Guided Pallet Detection and Pose Estimation Integrated in Autonomous Outdoor Forklift Operation},
author={Huy Hoang Nguyen and Johannes Huemer and Markus Murschitz and Tobias Glueck and Minh Nhat Vu and Andreas Kugi},
year={2025},
eprint={2508.15427},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2508.15427},
}
This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.