Version 3 2025-02-28, 07:02
Version 2 2025-02-28, 04:33
Version 1 2025-02-27, 08:53
preprint
posted on 2025-02-28, 07:02, authored by Lin Wang, Hao Deng, Sisi Li, Cheng Liu
Following the success of Transformers in natural language processing and 2D computer vision, they have been applied to 3D point cloud understanding with promising results. However, point clouds are inherently unordered, so they cannot be easily partitioned into fixed-size patches the way 2D images can. Consequently, most existing point cloud classification methods apply attention only locally, neglecting the global context that attention mechanisms can provide. This limitation hinders the model's ability to capture long-range dependencies within the point cloud.
To overcome this challenge, we propose the Point Patch Transformer (PPT), which applies standard attention to the entire point cloud, enabling the capture of long-range contextual relationships. The method first partitions the irregular point cloud into non-overlapping patches via a novel Points2Patches (P2P) algorithm and treats these patches as tokens. The tokens are then processed by a standard attention block. Additionally, to strengthen local feature extraction, we introduce a position encoding based on local attention that aggregates information from neighboring points. Experimental results on the ScanObjectNN and ModelNet40 datasets demonstrate the effectiveness and superiority of the proposed approach.
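The abstract does not specify how Points2Patches works internally, but the overall pipeline it describes (partition the cloud into non-overlapping patches, pool each patch into a token, then run standard attention over all tokens) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the nearest-center assignment and mean-pooling used here are assumptions standing in for the actual P2P algorithm and feature embedding.

```python
import numpy as np

def points_to_patches(points, num_patches, seed=0):
    # Hypothetical stand-in for the paper's Points2Patches (P2P) step:
    # sample patch centers, assign every point to its nearest center,
    # giving non-overlapping patches that cover the whole cloud.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), num_patches, replace=False)]
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)  # patch index per point
    # One token per patch: mean of its member points (simple pooling,
    # an assumption; the paper uses a learned local-attention encoding).
    tokens = np.stack([points[assign == k].mean(axis=0)
                       if (assign == k).any() else centers[k]
                       for k in range(num_patches)])
    return tokens, assign

def attention(tokens):
    # Standard scaled dot-product self-attention over all patch tokens,
    # so every token attends to every other token (global context).
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ tokens

points = np.random.default_rng(1).normal(size=(1024, 3))
tokens, assign = points_to_patches(points, num_patches=16)
out = attention(tokens)
print(tokens.shape, out.shape)  # (16, 3) (16, 3)
```

Because every patch token attends to every other token, the attention block sees the full extent of the cloud at once, which is the long-range context that purely local attention schemes miss.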
History
Funder Name
Key Research and Development Program of Shaanxi Province of China (2024GX-YBXM-149)