Functional brain network (FBN) analysis based on fMRI has proven effective for classifying neurological and mental disorders. Traditional methods usually separate FBN construction from the subsequent classification task, which leads to suboptimal solutions. Recently, transformers, known for their attention mechanisms, have shown strong performance in various tasks, including brain disorder classification. However, existing methods treat subjects independently, which limits their ability to capture patterns shared across subjects. To address these issues, we propose GSAformer, a group sparse attention-based model for brain disorder diagnosis. Specifically, we first construct a brain connectivity matrix for each subject using Pearson's correlation, and then incorporate a group sparse prior into the transformer to explicitly model inter-subject relationships. Group sparsity is applied across attention matrices to reduce the number of parameters, improve generalization, and enhance interpretability. A maximum mean discrepancy (MMD) constraint is also introduced to ensure consistency between the learned attention matrices and the group sparse brain networks. The framework integrates population-level prior knowledge and supports end-to-end adaptive learning, while keeping computational complexity on par with the standard transformer and better capturing group sparse topological structures across the population. We evaluate GSAformer on three public datasets for brain disorder classification. Compared with the standard transformer, the proposed method improves classification performance by 3.8%, 4.1%, and 14.7% on the three datasets, respectively.
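
As a rough illustration of the three ingredients named above (Pearson-correlation connectivity, group sparsity across subjects, and an MMD term), the following minimal sketch may help. It is not the GSAformer implementation: the function names, the choice of an L2,1 norm as the group sparsity penalty, the RBF kernel, and the `gamma` parameter are all assumptions made for illustration only.

```python
import numpy as np


def pearson_connectivity(ts):
    """Pearson-correlation connectivity for one subject.

    ts: array of shape (n_timepoints, n_rois) holding ROI time series.
    Returns an (n_rois, n_rois) correlation matrix.
    """
    return np.corrcoef(ts, rowvar=False)


def group_sparsity_penalty(attn):
    """Assumed L2,1 penalty over a stack of per-subject matrices.

    attn: array of shape (n_subjects, n_rois, n_rois).
    Each edge (i, j) forms one group across subjects: take the L2 norm
    over the subject axis, then sum over all edges. Minimizing this
    pushes the same edges toward zero in every subject.
    """
    return np.sqrt((attn ** 2).sum(axis=0)).sum()


def rbf_mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD with an RBF kernel (assumed kernel).

    x: samples of shape (n, d); y: samples of shape (m, d).
    """
    def kernel(a, b):
        # Pairwise squared Euclidean distances, then the RBF kernel.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


# Hypothetical usage on synthetic data: 5 subjects, 100 time points, 8 ROIs.
rng = np.random.default_rng(0)
series = rng.standard_normal((5, 100, 8))
networks = np.stack([pearson_connectivity(s) for s in series])
penalty = group_sparsity_penalty(networks)
# Compare two sets of flattened matrices (here: the networks and a noisy copy).
flat = networks.reshape(len(networks), -1)
mmd = rbf_mmd2(flat, flat + 0.1 * rng.standard_normal(flat.shape), gamma=0.1)
```

In this sketch the penalty and the MMD value would be added to a classification loss as regularizers; how the actual model weights and combines such terms is not specified here.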