In recent decades, the detection of diverse organic compounds (OCs) in the environment has exposed the limitations of conventional soil-water sorption models, which simplify complex experimental conditions and often overlook OCs with polyfunctional and ionizable structures. To address these shortcomings, we compiled a comprehensive soil-water sorption dataset of 20,945 data points covering 419 OCs with various functional groups and 1,037 different soils. Meta-analysis of the dataset revealed trends in soil sorption associated with OC substructures, soil properties, and solution conditions. Machine learning models based on the XGBoost algorithm, combined with MACCS fingerprints and experimental conditions, were developed to cover the full range of speciation: cationic, neutral, and anionic species. Among these, the individual models tailored to each species type achieved an overall root-mean-square error (RMSE) of 0.32 for log Kd. Model interpretation showed that the models captured the contributions of substructures such as multiple aromatic rings and nitrogen or oxygen atoms to sorption. The models also accurately captured isotherm nonlinearity and the effect of pH on the sorption of ionizable OCs. Finally, using soil properties from the Harmonized World Soil Database, the models were applied to predict the sorption of diverse OCs across global soils under simulated environmental scenarios.
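To make the reported error metric concrete, the following minimal sketch shows one way an overall root-mean-square error on log Kd could be pooled across cationic, neutral, and anionic predictions. It is an illustration under stated assumptions, not the procedure used in this work: the function names `rmse` and `overall_rmse` are hypothetical, pooling residuals across the three groups is an assumed definition of "overall", and the toy values are invented. The last line notes that an error of 0.32 log units corresponds to roughly a factor of 2.1 in Kd.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences of log Kd values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def overall_rmse(groups):
    """Pool residuals from several speciation-specific models into one RMSE.

    `groups` maps a species label to an (observed, predicted) pair of lists.
    Pooling all residuals is an assumption about how "overall" is defined.
    """
    observed, predicted = [], []
    for obs, pred in groups.values():
        observed.extend(obs)
        predicted.extend(pred)
    return rmse(observed, predicted)


# Toy log Kd values, invented for illustration only (not taken from the dataset).
toy = {
    "cationic": ([2.0, 3.0], [2.3, 2.6]),
    "neutral": ([1.0, 1.5], [1.0, 1.8]),
    "anionic": ([0.0], [-0.4]),
}

pooled = overall_rmse(toy)

# A 0.32 log-unit error expressed as a multiplicative factor in Kd.
fold_error = 10 ** 0.32
```

Because the error is computed on a logarithmic scale, it is symmetric in fold terms: an over-prediction and an under-prediction by the same factor contribute equally.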