Abstract

Objective
The aim of this study was to establish when a second‐stage diagnostic test may be of value in cases where a primary diagnostic test has given an uncertain diagnosis of the benign or malignant nature of an adnexal mass.

Methods
The diagnostic performance with regard to discrimination between benign and malignant adnexal masses for mathematical models including ultrasound variables and for subjective evaluation of ultrasound findings by an experienced ultrasound examiner was expressed as area under the receiver–operating characteristics curve (AUC), sensitivity and specificity. These were calculated for the total study population of 1938 patients with an adnexal mass as well as for subpopulations defined by the certainty with which the diagnosis of benignity or malignancy was made. The effect of applying a second‐stage test to the tumors where risk estimation was uncertain was determined.

Results
The best mathematical model (LR1) had an AUC of 0.95, sensitivity of 92% and specificity of 84% when applied to all tumors. When model LR1 was applied to the 10% of tumors in which the calculated risk fell closest to the risk cut‐off of the model, the AUC was 0.59, sensitivity 90% and specificity 21%. A strategy where subjective evaluation was used to classify these 10% of tumors for which LR1 performed poorly and where LR1 was used in the other 90% of tumors resulted in a sensitivity of 91% and specificity of 90%. Applying subjective evaluation to all tumors yielded an AUC of 0.95, sensitivity of 90% and specificity of 93%. Sensitivity was 81% and specificity 47% for those patients where the ultrasound examiner was uncertain about the diagnosis (n = 115; 5.9%). No mathematical model performed better than did subjective evaluation among the 115 tumors where the ultrasound examiner was uncertain.

Conclusion
When model LR1 is used as a primary test for discriminating between benign and malignant adnexal masses, the use of subjective evaluation of ultrasound findings by an experienced examiner as a second‐stage test in the 10% of cases for which the model yields a risk of malignancy closest to its risk cut‐off will improve specificity without substantially decreasing sensitivity. However, none of the models tested proved suitable as a second‐stage test in tumors where subjective evaluation yielded an uncertain result.

Copyright © 2010 ISUOG. Published by John Wiley & Sons, Ltd.
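The two-stage strategy described in the Results can be illustrated with a minimal Python sketch. It is not the study's implementation. The function names, the toy data and the cut-off value of 0.10 are hypothetical. The sketch also assumes that "closest to the risk cut-off" means the smallest absolute difference between the model risk and the cut-off, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple


def two_stage_classify(
    risks: Sequence[float],
    expert_calls: Sequence[bool],
    cutoff: float,
    uncertain_fraction: float = 0.10,
) -> List[bool]:
    """Return predictions (True = malignant) from a two-stage strategy.

    Assumption: the 'uncertain' cases are the fraction of tumors whose
    model risk has the smallest absolute distance to the cut-off.
    """
    n = len(risks)
    k = int(round(uncertain_fraction * n))
    # Rank cases by how close the model risk lies to the cut-off.
    by_distance = sorted(range(n), key=lambda i: abs(risks[i] - cutoff))
    uncertain = set(by_distance[:k])
    # Second-stage test (expert call) for uncertain cases, model otherwise.
    return [
        expert_calls[i] if i in uncertain else risks[i] >= cutoff
        for i in range(n)
    ]


def sensitivity_specificity(
    pred: Sequence[bool], truth: Sequence[bool]
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) with True meaning malignant."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented purely for illustration (not from the study).
CUTOFF = 0.10  # hypothetical cut-off; the abstract does not state one
risks = [0.02, 0.05, 0.11, 0.30, 0.60, 0.90, 0.95, 0.01, 0.03, 0.40]
truth = [False, False, False, True, True, True, True, False, False, True]
expert_calls = list(truth)  # assumes a perfect expert, for simplicity

model_only = [r >= CUTOFF for r in risks]
two_stage = two_stage_classify(risks, expert_calls, CUTOFF)

print("model only:", sensitivity_specificity(model_only, truth))
print("two stage: ", sensitivity_specificity(two_stage, truth))
```

In this toy example only one borderline case (risk 0.11) is referred to the second-stage test, so any change in specificity comes from that single reclassification. Real data would be needed to reproduce the figures reported above.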