In 2006, Gupta et al. published an analysis of vaccine efficacy in humans for H3N2 influenza A [1]. We collected vaccine efficacy data from the epidemiological literature for the years between 1971 and 2003, obtaining 19 efficacy values in total. We determined the correlations between vaccine efficacy and four measures of antigenic distance. The first measure, psequence, is the fraction of amino acids that differ between the vaccine strain and the dominant circulating strain in the hemagglutinin HA1 sequence. The second measure, pepitope, is the fraction of amino acids that differ between the vaccine strain and the dominant circulating strain in the dominant epitope. The third measure, d1, is the base-2 logarithm of the ratio of homologous to heterologous titers [3]. The fourth measure, d2, is the square root of the ratio of homologous to heterologous titers [4]. The correlations of the different measures of antigenic distance with vaccine efficacy are shown in Table 1.
Table 1
Correlation of H3N2 Influenza A vaccine efficacy in humans with different measures of antigenic distance
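The four measures of antigenic distance defined above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in [1]: the function names, the toy sequences, the toy epitope positions, and the titers are all invented here. In particular, the text does not spell out how the dominant epitope is chosen, so the sketch assumes it is the epitope with the largest fraction of substitutions.

```python
import math


def fraction_differing(seq_a, seq_b, positions=None):
    """Fraction of the given aligned positions at which two sequences differ.

    With positions=None every position is compared (the psequence case).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    idx = list(range(len(seq_a))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to compare")
    return sum(seq_a[i] != seq_b[i] for i in idx) / len(idx)


def p_epitope(seq_a, seq_b, epitopes):
    """Largest per-epitope fraction of differing residues.

    ASSUMPTION: the dominant epitope is taken to be the epitope with the
    largest fraction of substitutions; the text does not define it.
    """
    return max(fraction_differing(seq_a, seq_b, pos) for pos in epitopes.values())


def d1(homologous_titer, heterologous_titer):
    """Base-2 logarithm of the homologous/heterologous titer ratio."""
    return math.log2(homologous_titer / heterologous_titer)


def d2(homologous_titer, heterologous_titer):
    """Square root of the homologous/heterologous titer ratio."""
    return math.sqrt(homologous_titer / heterologous_titer)


# Toy inputs invented for illustration (not real HA1 sequences or titers).
vaccine = "ACDEFGHIKL"
circulating = "ACDEYGHIKV"
toy_epitopes = {"A": [0, 1, 2, 3], "B": [4, 5, 6, 7]}  # hypothetical 0-based sites

print(fraction_differing(vaccine, circulating))       # psequence
print(p_epitope(vaccine, circulating, toy_epitopes))  # pepitope
print(d1(640, 80), d2(640, 80))                       # titer-based distances
```

In this toy case the two sequences differ at 2 of 10 positions overall but at 1 of 4 positions in the hypothetical epitope B, so pepitope exceeds psequence. This is the kind of difference between the first two measures that the definitions imply.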
In 2009, Ndifon et al. published an analysis of a subset of these data, 11 data points, with some modifications [2]. Four new data points were added: some early data from 1968 and 1969, data for 1980/1981, and recent data for 2004/2005. Data for 1971/1972, provided in [1], were omitted from [2].
There are a number of discrepancies in the data of Table 1 of Ndifon et al. [2]. In 1972/1973 the vaccine strain was listed as A/Hong Kong/1/68. The correct vaccine strain was A/Aichi/2/68 (also known as X31) [1,5]. A value of pepitope of 0.263 was listed in [2], rather than the correct 0.190 [1]. In 1994/1995, a dominant circulating strain of A/Shangdong/9/93 was listed [2], rather than a mixture of strains that more closely resemble A/Johannesburg/33/94 than A/Shangdong/9/93 [1,6], and which we represent by A/Johannesburg/33/94 [1]. A value of pepitope of 0 was listed in [2], rather than 0.105 [1]. In 1995/1996 the vaccine efficacy was listed as 42.0% [2], rather than 45% [1]. In 1996, vaccine and circulating strains of A/Wuhan/359/95 and A/Nanchang/933/95 were listed [2] instead of the correct A/Nanchang/933/95 vaccine and A/Wuhan/359/95 US CDC-determined circulating strain [1,7,8]. In 1997, the dominant circulating strain of A/Nanchang/933/95 was listed [2] instead of A/Wuhan/359/95 [1,8], leading to a pepitope of 0 [2] instead of 0.095 [1]. In 1997/1998 a pepitope value of 0.227 [2] was listed instead of the correct 0.238 [1] (an inconsistent definition of pepitope was used in [2]). The vaccine discrepancies in [2] stem from the incorrect assumption that the WHO “recommended” strain was administered, rather than the “like” vaccine strain that was actually manufactured and administered [5–8]. Finally, in 2003/2004 efficacy data for individuals vaccinated within 2 weeks of illness were removed from the dataset. Removing these 9 individuals from the data of Ref. 35 of [1] decreases the vaccine efficacy from 12% to 0.7%. This change is within the noise of the data and makes little difference to the results (see below).
Once these amendments are made, the 23 data points in aggregate from [1,2] give a correlation between vaccine efficacy and the pepitope theory of R2 = 0.76 (see Figure 1). We focus here on the difference between pepitope and the rAHM measure of antigenic distance, which was reported in [2] to correlate well with vaccine efficacy. We note that the definition of rAHM is identical to that of d2. In [2], only half of the data were used: those for which the vaccine and dominant circulating strain were distinct, a total of 11 data points. These 11 data points were used to test the pepitope, psequence, and d2 = rAHM measures of antigenic distance. With the corrections discussed above, 14 data points fit this criterion. If a large amount of vaccine efficacy data were available, removing a small subset of the data would not be problematic. Removing 50% of the data, however, so that no data remain for small to moderate antigenic distances, led to a number of artifacts in [2]. The first artifact is that the linear fit of rAHM to the 11 vaccine efficacy data points extrapolates to a vaccine efficacy of 18% when the vaccine is identical to the dominant circulating strain, with R2 = 0.56 (see Figure 1, left inset). While this R2 is sizable, the prediction of 18% is far from the average vaccine efficacy of 43% [1,2] observed when the vaccine is identical to the dominant circulating strain. That rAHM does not predict efficacy well at small to moderate antigenic distances is made clear when rAHM is fit to the data for all years, giving R2 = 0.54 instead of the R2 = 0.76 of the pepitope theory. It was also reported in [2] that the correlation of the standard d1 measure of antigenic distance used by vaccine designers with vaccine efficacy is R2 = 0.01. When pepitope is fit to the amended 14 data points, the linear fit extrapolates to a vaccine efficacy of 27% for an identical vaccine and dominant circulating strain (see Figure 1, right inset), with R2 = 0.27, not almost zero as reported in [2].
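The fit-and-extrapolate procedure discussed above can be sketched as follows: fit a line to the efficacy data for years with distinct vaccine and circulating strains, read off the intercept as the predicted efficacy at zero antigenic distance, and report R2. This is a minimal sketch assuming ordinary least squares, which may differ from the fitting procedure actually used in [1,2]. The numbers are invented placeholders, not the data of [1] or [2], and the function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# PLACEHOLDER values, invented for illustration only.
distance = [0.1, 0.2, 0.3, 0.4]       # antigenic distance, all nonzero
efficacy = [40.0, 25.0, 5.0, -10.0]   # vaccine efficacy in percent

slope, intercept, r2 = linear_fit(distance, efficacy)
zero_crossing = -intercept / slope    # distance at which the fitted efficacy is 0

print(f"efficacy extrapolated to zero distance: {intercept:.1f}%")
print(f"R^2 = {r2:.3f}, fitted efficacy reaches zero at distance {zero_crossing:.3f}")
```

Because every point in such a subset lies at nonzero distance, the intercept is an extrapolation outside the fitted range. A fit can therefore have a sizable R2 and still give a poor prediction for identical vaccine and circulating strains.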
The prediction of the pepitope theory is more accurate than that of the rAHM measure on these out-of-sample data, although the correlation coefficient is lower. We note that 4 out of the 6 points with pepitope > 0.19 have a negative efficacy. This predictive ability is rather similar to that of the rAHM data, for which 4 out of the 5 points with rAHM > 5 have negative vaccine efficacy [2]. In the 2004/2005 season, both A/California/7/2004 and A/Fujian/411/2002 were circulating strains, in addition to a substantial amount of circulating influenza B. The antigenic distance between A/Wyoming/3/2003 and A/California/7/2004 is pepitope = 0.286, and the antigenic distance between A/Wyoming/3/2003 and A/Fujian/411/2002 is pepitope = 0.095. Thus, while the predicted efficacy for the former is not positive, for the latter it is 20%. Antigenic distance for A/California/7/2004 alone cannot predict the expected vaccine efficacy against multiple nearly-dominant circulating strains. Indeed, the reported efficacy of 9.2% [2] is roughly the average of the 0% and 20% predicted efficacies from the pepitope theory. When the 2004/2005 data point is eliminated, the pepitope prediction extrapolates to 37% efficacy for identical vaccine and dominant circulating strain, with R2 = 0.46. If the efficacy of the 2003/2004 data point is changed from 12% to 0.7%, there is little change to this result: R2 = 0.44 with an extrapolation of 37% efficacy for pepitope = 0.
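The averaging argument for the 2004/2005 season can be written out explicitly. The sketch below combines the per-strain predicted efficacies quoted above (0% and 20%) into one expected efficacy. Equal weighting of the two circulating strains is an assumption made here for illustration, since the text gives no relative prevalences, and the function name is hypothetical.

```python
def mixture_efficacy(predicted, weights=None):
    """Weighted mean of per-strain predicted efficacies (in percent).

    ASSUMPTION: with weights=None the strains are weighted equally.
    """
    if weights is None:
        weights = [1.0] * len(predicted)
    if len(weights) != len(predicted) or sum(weights) <= 0:
        raise ValueError("weights must match predictions and sum to a positive value")
    return sum(w * e for w, e in zip(weights, predicted)) / sum(weights)


# Per-strain predictions quoted in the text for A/California/7/2004 and
# A/Fujian/411/2002, and the reported efficacy from [2].
predicted = [0.0, 20.0]
reported = 9.2

estimate = mixture_efficacy(predicted)
print(estimate, abs(estimate - reported))
```

With equal weights the estimate is 10%, within one percentage point of the reported 9.2%. Unequal weights, if prevalence data were available, would move the estimate between the two per-strain values.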
Figure 1
Vaccine efficacy versus the pepitope or rAHM measures of antigenic distance. The insets show the data for which the vaccine and dominant circulating strain are distinct.
In summary, the pepitope theory is more accurate and has a larger R2 value than the rAHM ferret animal model data when all the human H3N2 influenza A vaccine efficacy data are considered. When trained on half of the data, the pepitope theory predicts the out-of-sample efficacies at small and moderate antigenic distances more accurately than do the rAHM data. When the data point for the 2004/2005 season with multiple nearly-dominant circulating strains is removed, the pepitope theory and the rAHM data fit have similar R2 values on years for which the vaccine and dominant circulating strains are distinct. Both pepitope and rAHM predict that vaccine efficacy decreases to zero beyond a critical antigenic distance, which for pepitope is pepitope* = 0.19. We note that the pepitope theory requires only sequence information, whereas rAHM is constructed from hemagglutination inhibition data measured in ferrets.