We apply vision transformers (ViTs) in a distributed fiber optic sensing system to evaluate road traffic parameters in smart cities. Convolutional neural networks (CNNs) are also assessed for benchmarking. The experimental setup is based on a direct-detection phase-sensitive optical time-domain reflectometer implemented using a narrow-linewidth source. The monitored fibers are buried on the university campus, which serves as a smart city environment. Backscattered traces are consolidated into space-time matrices, which depict traffic patterns and enable analysis through image processing algorithms. Ground-truth traffic parameters are obtained by processing, with the YOLOv8 model, images from a video camera monitoring the same street. The results indicate that ViTs outperform CNNs in estimating the number of vehicles and the mean vehicle speed. Although the ViT requires a significantly larger number of parameters, its complexity is similar to that of the CNN in terms of multiply-accumulate operations and random access memory usage. The processed dataset has been made publicly available for benchmarking.
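As a rough illustration of how backscattered traces can be consolidated into a space-time matrix, the following minimal sketch stacks successive traces and takes the absolute difference between consecutive ones, so that perturbations along the fiber stand out. This is an assumption-laden sketch, not the authors' processing chain: the function name `space_time_matrix`, the differencing step, the peak normalization, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def space_time_matrix(traces):
    """Build a normalized space-time image from backscattered traces.

    traces: 2-D array-like, one row per acquired trace (time axis),
            one column per position along the fiber (space axis).
    Assumption: perturbations are highlighted by differencing
    consecutive traces; the source does not specify this step.
    """
    stacked = np.asarray(traces, dtype=float)
    if stacked.ndim != 2 or stacked.shape[0] < 2:
        raise ValueError("expected at least two traces of equal length")

    # Absolute change between consecutive traces at each fiber position.
    diff = np.abs(np.diff(stacked, axis=0))

    # Scale to [0, 1] so the matrix can be handled as a grayscale image.
    peak = diff.max()
    return diff / peak if peak > 0 else diff


# Synthetic example: a static baseline with one perturbation that moves
# one position along the fiber per trace (a stand-in for a vehicle).
demo_traces = np.ones((4, 5))
for t in range(4):
    demo_traces[t, t] += 1.0

demo_image = space_time_matrix(demo_traces)
print(demo_image.shape)
```

In the resulting matrix, a moving perturbation shows up as a diagonal streak whose slope relates to its speed along the fiber, which is what makes image-based models a natural fit for this kind of data.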
Funder Name
Fundação de Amparo à Pesquisa do Estado de São Paulo (2022/12917-5, 2022/07488-8, 2022/11596-0, 2021/06569-1, 2021/11380-5); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (88887.954253/2024-00); Conselho Nacional de Desenvolvimento Científico e Tecnológico (402081/2023-4, 405940/2022-0, 317133/2023-3)
Preprint ID
117592
Highlighter Commentary
The authors show that a vision transformer (ViT) outperforms MobileNetV2 in vehicle counting and average speed estimation. Over an eight-day test period, the ViT reliably captured daily traffic patterns, identifying peak times and event-related fluctuations. These results highlight the potential of distributed fiber optic sensing systems based on optical time-domain reflectometry to efficiently monitor traffic in smart cities.
-- Mousa Moradi, Research Fellow, Harvard Medical School, Harvard Ophthalmology AI Lab