Drug-target interaction (DTI) prediction has recently gained widespread popularity, driven by advances in machine learning and by publicly available bioassay datasets such as BindingDB, PubChem and ChEMBL. Machine learning approaches generally frame DTI prediction as a discriminative supervised learning problem in which combined pairs of features derived from the ligand (drug) and protein (target) are classified as a binding (positive) or non-binding (negative) pair. These approaches, however, diverge in data representation, feature embedding strategies, training data quality thresholds, scale of the underlying datasets, data balance, use of negative training examples, testing protocols and optimization targets. While global comparisons of DTI prediction methods may be hampered by divergent goals and data sources, the individual merits of each proposed innovation can be measured and communicated with diligent experimental design. In this presentation, we review experimental design strategies for evaluating DTI prediction methods. Additionally, we discuss the unique considerations associated with DTI datasets and the cross-validation best practices that can better inform real-world performance.
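As one illustration of why cross-validation design matters for paired ligand-target data, the sketch below builds folds in which every target (or every ligand) appears on only one side of each train/test split. This is an assumed example of a grouped split, not the protocol reviewed in the presentation; the function name `grouped_k_fold`, the `(ligand_id, target_id, label)` tuple layout and the toy identifiers are all hypothetical.

```python
from collections import defaultdict


def grouped_k_fold(pairs, k, group_index):
    """Yield (train, test) index lists in which no group spans both sides.

    pairs: list of (ligand_id, target_id, label) tuples (hypothetical layout).
    group_index: 0 to group by ligand, 1 to group by target.
    """
    # Collect the row indices belonging to each ligand or target.
    groups = defaultdict(list)
    for i, pair in enumerate(pairs):
        groups[pair[group_index]].append(i)

    # Greedy balancing: place the largest groups first, always into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(k)]
    for key in sorted(groups, key=lambda g: (-len(groups[g]), str(g))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(groups[key])

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Toy data: three ligands, three targets, binary binding labels (made up).
pairs = [
    ("lig_a", "prot_1", 1),
    ("lig_b", "prot_1", 0),
    ("lig_a", "prot_2", 0),
    ("lig_c", "prot_2", 1),
    ("lig_b", "prot_3", 1),
    ("lig_c", "prot_3", 0),
]

# Group by target so each test fold contains only unseen proteins.
splits = list(grouped_k_fold(pairs, k=3, group_index=1))
for train, test in splits:
    print("train:", train, "test:", test)
```

Compared with a purely random split of pairs, grouping by target or ligand prevents the same protein or compound from appearing in both training and test data, which is one way an evaluation can be made to reflect predictions on previously unseen entities.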