Deep learning-based in silico methods have been shown to accelerate drug discovery and improve its success rates. Cyclin-dependent kinase 12 (CDK12) is a transcription-related cyclin-dependent kinase that may serve as a biomarker and therapeutic target in cancer. However, no highly selective CDK12 inhibitor is currently in clinical development, and identifying new specific CDK12 inhibitors has become increasingly challenging because of the similarity between CDK12 and CDK13. In this study, we developed a virtual screening workflow that combines deep learning with virtual screening tools and can be applied rapidly to millions of molecules. We designed a Transformer-based drug-target interaction (DTI) model with two branches built on self-supervised pre-trained molecular graph models and protein sequence models. The model produced satisfactory predictions for various targets, including CDK12, and yielded several novel hits. We screened a compound library of 4.5 million drug-like molecules and recommended a list of potential CDK12 inhibitors for experimental testing. In a kinase assay, the compounds CICAMPA-01, 02, and 03 inhibited CDK12 more effectively than the positive-control CDK12 inhibitor THZ531, by up to three times. The compounds CICAMPA-03, 05, 04, and 07 showed less inhibition of CDK13 than THZ531. In vitro, the IC50 values of CICAMPA-01, 04, 05, 06, and 09 were below 3 μM in BT-474, a HER2-positive breast cancer cell line with CDK12 amplification. Overall, this study provides a highly efficient, end-to-end deep learning protocol that, in conjunction with molecular docking, can be used to discover CDK12 inhibitors for cancer. Additionally, we disclose five novel CDK12 inhibitors. These results may accelerate the discovery of cancer drugs belonging to novel chemical classes.
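The screening workflow summarized above can be pictured as a two-stage funnel: a fast DTI model scores the whole library, and only the top-ranked molecules are passed on to the slower docking step. The Python sketch below is illustrative only and is not the authors' implementation. The function name `screen_library`, the callables `dti_score` and `docking_score`, the cutoffs `dti_top_k` and `final_top_k`, and the scoring conventions (a higher DTI score is better, a lower docking score is better) are all assumptions introduced for this example.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def screen_library(
    smiles_library: Iterable[str],
    dti_score: Callable[[str], float],      # hypothetical: higher = stronger predicted binding
    docking_score: Callable[[str], float],  # hypothetical: lower = better docking pose
    dti_top_k: int,
    final_top_k: int,
) -> List[Tuple[str, float, float]]:
    """Rank a library with a DTI model, then re-rank the shortlist by docking.

    Returns (smiles, dti_score, docking_score) tuples, best docking score first.
    """
    # Stage 1: score every molecule with the cheap model. heapq.nlargest
    # streams over the library and keeps only the best dti_top_k entries,
    # so a multi-million-molecule library never has to be sorted in full.
    shortlist = heapq.nlargest(
        dti_top_k,
        ((smi, dti_score(smi)) for smi in smiles_library),
        key=lambda pair: pair[1],
    )

    # Stage 2: run the expensive docking step on the shortlist only,
    # then sort ascending because lower docking scores are assumed better.
    docked = [(smi, score, docking_score(smi)) for smi, score in shortlist]
    docked.sort(key=lambda row: row[2])
    return docked[:final_top_k]
```

In practice `dti_score` would wrap a trained DTI model and `docking_score` a docking program; both are left as plain callables here so that the funnel logic can be read and tested independently of any particular tool.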