Optica Open

PPT: A Point Patch Transformer for Point Cloud Classification

Version 3 2025-02-28, 07:02
Version 2 2025-02-28, 04:33
Version 1 2025-02-27, 08:53
Preprint posted on 2025-02-28, 07:02, authored by Lin Wang, Hao Deng, Sisi Li, and Cheng Liu.
With the success of Transformers in natural language processing and 2D computer vision, their application has extended to 3D point cloud understanding, yielding promising results. However, because point cloud data are inherently unordered, they cannot be partitioned into fixed-size patches as easily as 2D images. Consequently, most existing point cloud classification methods apply attention only locally, forgoing the global context that attention mechanisms can offer and hindering the model's ability to capture long-range dependencies within the point cloud. To overcome this challenge, we propose the Point Patch Transformer (PPT), which applies standard attention to the entire point cloud and thereby captures long-range contextual relationships. The method first partitions the irregular point cloud into non-overlapping patches via a novel Points2Patches (P2P) algorithm and treats these patches as tokens, which are then processed by a standard attention block. Additionally, to enhance local feature extraction, we introduce a position encoding based on local attention that aggregates information from neighboring points. Experimental results on the ScanObjectNN and ModelNet40 datasets demonstrate the effectiveness and superiority of the proposed approach.
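
A minimal PyTorch sketch of the pipeline described above is given below. It is only an illustration under assumptions: the abstract does not specify the Points2Patches (P2P) algorithm or the local-attention position encoding, so points_to_patches is a stand-in (a seed-and-nearest-neighbour grouping that, unlike the paper's non-overlapping P2P, may let patches overlap), LocalAttentionPosEnc is a guess at the spirit of the position encoding, and all names, layer sizes, and hyperparameters are placeholders rather than the authors' settings.

    # Hypothetical sketch of the PPT pipeline; stand-ins only, not the authors' code.
    import torch
    import torch.nn as nn

    def points_to_patches(xyz, num_patches, patch_size):
        """Assumed stand-in for the P2P step: pick random seed points and group
        each seed with its nearest neighbours into fixed-size patches.
        xyz: (B, N, 3) -> patches: (B, num_patches, patch_size, 3)"""
        B, N, _ = xyz.shape
        seed_idx = torch.stack([torch.randperm(N)[:num_patches] for _ in range(B)])
        seeds = torch.gather(xyz, 1, seed_idx.unsqueeze(-1).expand(-1, -1, 3))
        dist = torch.cdist(seeds, xyz)                           # (B, P, N)
        knn_idx = dist.topk(patch_size, largest=False).indices   # (B, P, K)
        idx = knn_idx.unsqueeze(-1).expand(-1, -1, -1, 3)
        patches = torch.gather(
            xyz.unsqueeze(1).expand(-1, num_patches, -1, -1), 2, idx)
        return patches, seeds

    class LocalAttentionPosEnc(nn.Module):
        """Assumed position encoding: attend over the points inside each patch
        and pool, so each token carries aggregated local geometry."""
        def __init__(self, dim):
            super().__init__()
            self.proj = nn.Linear(3, dim)
            self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

        def forward(self, patches):                    # (B, P, K, 3)
            B, P, K, _ = patches.shape
            x = self.proj(patches).reshape(B * P, K, -1)
            x, _ = self.attn(x, x, x)                  # local attention within a patch
            return x.mean(dim=1).reshape(B, P, -1)     # (B, P, dim)

    class PPTSketch(nn.Module):
        def __init__(self, dim=256, depth=4, num_classes=40,
                     num_patches=64, patch_size=32):
            super().__init__()
            self.num_patches, self.patch_size = num_patches, patch_size
            self.token_embed = nn.Sequential(nn.Linear(3, dim), nn.ReLU(),
                                             nn.Linear(dim, dim))
            self.pos_enc = LocalAttentionPosEnc(dim)
            layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, depth)   # global attention over tokens
            self.head = nn.Linear(dim, num_classes)

        def forward(self, xyz):                                   # (B, N, 3)
            patches, _ = points_to_patches(xyz, self.num_patches, self.patch_size)
            tokens = self.token_embed(patches).max(dim=2).values  # one token per patch
            tokens = tokens + self.pos_enc(patches)               # add local position encoding
            feats = self.encoder(tokens)                          # (B, P, dim)
            return self.head(feats.mean(dim=1))                   # classification logits

    logits = PPTSketch()(torch.rand(2, 1024, 3))   # e.g. 40 classes as in ModelNet40
    print(logits.shape)                            # torch.Size([2, 40])

The sketch is meant only to show the data flow the abstract outlines: once the cloud is grouped into a fixed number of patch tokens, a standard Transformer encoder can attend globally over the whole object rather than restricting attention to local neighborhoods.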

History

Funder Name

Key Research and Development Program of Shaanxi Province of China (2024GX-YBXM-149)

Preprint ID

121030
