Standard methods apply when all samples are retested with a second diagnostic test. However, valuable sample repositories can often be used more efficiently by retesting only a subsample with the new diagnostic test. As an example, we present agreement and symmetry estimates for two diagnostic tests that detect Chlamydia trachomatis, a sexually transmitted pathogen for which there is no gold standard diagnostic test. The standard test had already been performed on the samples and identified 827 positive and 4998 negative samples for chlamydia. Retesting all 4998 negative samples would have been prohibitively expensive and was not necessary for adequate estimation of the agreement statistics. Thus, only a subsample of about 8% of the 4998 negative samples was retested, which saved 4596 samples and $230,000 in testing costs. Nevertheless, as we show, the subsample retained enough statistical information to achieve all the objectives of the study.

In diagnostic accuracy simulations, the sampling design provided efficiency gains for estimating specificity and PPV (Table 6). Under the observed sampling fractions, only 670/2331 = 28.7% of the women received the gold standard, yet the variances of specificity and PPV are inflated by only a factor of 2 compared with verifying all women with the gold standard. This is because the design oversamples the few women who developed CIN2+.
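The retesting arithmetic in the Chlamydia example, and one way a subsample of negatives can still support agreement estimates for the whole repository, can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes the retested negatives are a simple random sample and applies inverse-probability (Horvitz-Thompson) weights to reweight the 2x2 table before computing overall agreement and Cohen's kappa. The function names and the retest counts in the usage example are hypothetical; only 827, 4998, 4596, and $230,000 come from the text. Symmetry tests and design-based variance estimation are not shown.

```python
def retest_arithmetic(n_neg_total, n_saved, dollars_saved):
    """Derive the retested count, sampling fraction, and implied cost per test."""
    n_retested = n_neg_total - n_saved
    return n_retested, n_retested / n_neg_total, dollars_saved / n_saved


def weighted_table(pos_pos, pos_neg, neg_pos, neg_neg, n_neg_total):
    """Reweight a 2x2 retest table to the full repository (hypothetical helper).

    Rows are the standard test: every standard positive was retested (weight 1).
    Columns are the new test. Retested negatives are ASSUMED to be a simple
    random sample, so each represents n_neg_total / (number retested) negatives.
    """
    w = n_neg_total / (neg_pos + neg_neg)
    return {"++": pos_pos, "+-": pos_neg, "-+": neg_pos * w, "--": neg_neg * w}


def agreement(t):
    """Overall percent agreement and Cohen's kappa from a (weighted) 2x2 table."""
    n = sum(t.values())
    p_obs = (t["++"] + t["--"]) / n
    p_std = (t["++"] + t["+-"]) / n  # proportion positive on the standard test
    p_new = (t["++"] + t["-+"]) / n  # proportion positive on the new test
    p_exp = p_std * p_new + (1 - p_std) * (1 - p_new)  # agreement expected by chance
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    n_retested, fraction, cost = retest_arithmetic(4998, 4596, 230_000)
    print(n_retested, round(fraction, 4), round(cost, 2))
    # Hypothetical retest results: 827 standard positives, 402 retested negatives.
    table = weighted_table(pos_pos=800, pos_neg=27, neg_pos=4, neg_neg=398,
                           n_neg_total=4998)
    p_obs, kappa = agreement(table)
    print(f"agreement={p_obs:.4f} kappa={kappa:.3f}")
```

The figures in the text imply that 402 negatives (about 8.0%) were retested, at roughly $50 per saved test. Weighting each retested negative by about 12.4 lets the reweighted table sum back to all 5825 samples.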
For the final design, which also verifies all HPV+ women ((670 + 102)/2331 = 33.1%), the variances of specificity and PPV are within 10% of the variances obtained when all women are verified with the gold standard. This is because this design would capture, on average, 34.2 of the estimated 38.5 CIN2+ cases in the full study. However, enriching the sample for CIN2+ has little effect on the variances of sensitivity and NPV. Instead, a sampling design that doubles the sampling fraction for women who are negative on all three screening tests improves efficiency for estimating sensitivity and NPV (but not specificity and PPV). These results illustrate the general result that cost-effective estimation of specificity (or PPV) with two-phase designs requires oversampling test positives (which tend to capture most true gold standard positives), whereas estimation of sensitivity (or NPV) requires oversampling test negatives. Dunet et al.1 use both Bland-Altman limits of agreement and Lin's concordance correlation coefficient to evaluate agreement between software packages. . . .
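The general result about two-phase designs, that oversampling test negatives is what helps sensitivity and NPV, can be checked with a small simulation. The sketch below is a toy illustration, not the simulation behind Table 6. It assumes a fixed hypothetical population (200 diseased, 9800 non-diseased, screening sensitivity and specificity of 0.9), stratified simple random sampling within the screen-positive and screen-negative strata, and inverse-probability weights. The cell counts, sampling fractions, and function names are illustrative assumptions.

```python
import random
import statistics


def make_population():
    """Fixed hypothetical population, split by screening result.

    Each list holds gold-standard disease indicators (1 = diseased) for one
    screening stratum. Counts are illustrative, not study data.
    """
    screen_pos = [1] * 180 + [0] * 980   # 180 true positives, 980 false positives
    screen_neg = [1] * 20 + [0] * 8820   # 20 false negatives, 8820 true negatives
    return screen_pos, screen_neg


def one_replicate(screen_pos, screen_neg, frac_pos, frac_neg, rng):
    """Verify a simple random sample of each screening stratum, then form
    inverse-probability-weighted estimates of accuracy measures."""
    n_pos = round(frac_pos * len(screen_pos))
    n_neg = round(frac_neg * len(screen_neg))
    d_pos = sum(rng.sample(screen_pos, n_pos))  # diseased among verified screen+
    d_neg = sum(rng.sample(screen_neg, n_neg))  # diseased among verified screen-
    w_pos = len(screen_pos) / n_pos             # design weight = 1 / sampling fraction
    w_neg = len(screen_neg) / n_neg
    tp, fp = w_pos * d_pos, w_pos * (n_pos - d_pos)  # estimated population cells
    fn, tn = w_neg * d_neg, w_neg * (n_neg - d_neg)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": d_pos / n_pos,                 # uses verified screen+ only
        "npv": (n_neg - d_neg) / n_neg,       # uses verified screen- only
    }


def design_variances(frac_pos, frac_neg, reps=300, seed=1):
    """Monte Carlo variance of each estimator under one sampling design."""
    rng = random.Random(seed)
    screen_pos, screen_neg = make_population()
    draws = [one_replicate(screen_pos, screen_neg, frac_pos, frac_neg, rng)
             for _ in range(reps)]
    return {k: statistics.pvariance([d[k] for d in draws]) for k in draws[0]}


if __name__ == "__main__":
    base = design_variances(frac_pos=1.0, frac_neg=0.10)
    doubled = design_variances(frac_pos=1.0, frac_neg=0.20)
    for k in base:
        print(f"{k:12s} var(10% neg)={base[k]:.2e}  var(20% neg)={doubled[k]:.2e}")
```

Because every screen-positive is verified in both designs, the PPV estimate does not vary at all across replicates. Doubling the negative sampling fraction reduces the variances of the sensitivity and NPV estimates, which depend on the rare false negatives found among verified screen-negatives.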