BACKGROUND
Antineutrophil cytoplasmic antibody (ANCA)-associated vasculitis is a heterogeneous autoimmune disease. While traditionally stratified into two conditions, granulomatosis with polyangiitis (GPA) and microscopic polyangiitis (MPA), the subclassification of ANCA-associated vasculitis is subject to continued debate. Here we aim to identify phenotypically distinct subgroups and develop a data-driven subclassification of ANCA-associated vasculitis, using a large real-world dataset.

METHODS
In the collaborative data reuse project FAIRVASC (Findable, Accessible, Interoperable, Reusable, Vasculitis), registry records of patients with ANCA-associated vasculitis were retrieved from six European vasculitis registries: the Czech Registry of ANCA-associated vasculitis (Czech Republic), the French Vasculitis Study Group Registry (FVSG; France), the Joint Vasculitis Registry in German-speaking Countries (GeVas; Germany), the Polish Vasculitis Registry (POLVAS; Poland), the Irish Rare Kidney Disease Registry (RKD; Ireland), and the Skåne Vasculitis Cohort (Sweden). We performed model-based clustering of 17 mixed-type clinical variables using a parsimonious mixture of two latent Gaussian variable models. The optimal cluster solution was validated clinically using summary statistics of each cluster's demographic, phenotypic, and serological characteristics, and outcomes. The predictive value of models featuring the cluster affiliations was compared with that of models based on clinical diagnosis and ANCA specificity. People with lived experience were involved throughout the FAIRVASC project.

FINDINGS
A total of 3868 patients diagnosed with ANCA-associated vasculitis between Nov 1, 1966, and March 1, 2023, were included in the study across the six registries (Czech Registry n=371, FVSG n=1780, GeVas n=135, POLVAS n=792, RKD n=439, and Skåne Vasculitis Cohort n=351). There were 2434 (62·9%) patients with GPA and 1434 (37·1%) with MPA.
Mean age at diagnosis was 57·2 years (SD 16·4); 2006 (51·9%) of 3867 patients were men and 1861 (48·1%) were women. We identified five clusters with distinct phenotypes, biochemical presentations, and disease outcomes. Three clusters were characterised by kidney involvement: one severe kidney cluster (555 [14·3%] of 3868 patients) with high C-reactive protein (CRP) and serum creatinine concentrations, and variable ANCA specificity (SK cluster); one myeloperoxidase (MPO)-ANCA-positive kidney involvement cluster (782 [20·2%]) with limited extrarenal disease (MPO-K cluster); and one proteinase 3 (PR3)-ANCA-positive kidney involvement cluster (683 [17·7%]) with widespread extrarenal disease (PR3-K cluster). Two clusters were characterised by relative absence of kidney involvement: one was a predominantly PR3-ANCA-positive cluster (1202 [31·1%]) with inflammatory multisystem disease (IMS cluster), and one was a cluster (646 [16·7%]) with predominantly ear-nose-throat involvement and low CRP, with mainly younger patients (YR cluster). Compared with models fitted with clinical diagnosis or ANCA status, cluster-assigned models demonstrated improved predictive power with respect to both patient and kidney survival.

INTERPRETATION
Our study reinforces the view that ANCA-associated vasculitis is not merely a binary construct. Data-driven subclassification of ANCA-associated vasculitis exhibits higher predictive value than current approaches for key outcomes.

FUNDING
European Union's Horizon 2020 research and innovation programme under the European Joint Programme on Rare Diseases.
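
As a reader's cross-check of the cluster sizes reported in the findings, the short sketch below recomputes each cluster's share of the 3868-patient cohort from the reported counts. It is not part of the study's analysis; the names `cluster_counts`, `total_patients`, and `cluster_shares` are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Cluster sizes as reported in the findings; variable names are illustrative only.
cluster_counts = {"SK": 555, "MPO-K": 782, "PR3-K": 683, "IMS": 1202, "YR": 646}
total_patients = 3868  # cohort size reported across the six registries


def cluster_shares(counts, total):
    """Return each cluster's share of the cohort as a percentage, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = cluster_shares(cluster_counts, total_patients)

# The five clusters should partition the whole cohort.
assert sum(cluster_counts.values()) == total_patients

for name, pct in shares.items():
    print(f"{name}: {cluster_counts[name]} patients ({pct}%)")
```

Under these assumptions the counts sum to the full cohort, and the recomputed shares agree with the percentages given in the text to one decimal place.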