Re-examination of the relationship between marine virus and microbial cell abundances

Marine viruses are critical drivers of ocean biogeochemistry, and their abundances vary spatiotemporally in the global oceans, with upper estimates exceeding 108 per ml. Over many years, a consensus has emerged that virus abundances are typically tenfold higher than microbial cell abundances. However, the true explanatory power of a linear relationship and its robustness across diverse ocean environments is unclear. Here, we compile 5,671 microbial cell and virus abundance estimates from 25 distinct marine surveys and find substantial variation in the virus-to-microbial cell ratio, in which a 10:1 model has either limited or no explanatory power. Instead, virus abundances are better described as nonlinear, power-law functions of microbial cell abundances. The fitted scaling exponents are typically less than 1, implying that the virus-to-microbial cell ratio decreases with microbial cell density, rather than remaining fixed. The observed scaling also implies that viral effect sizes derived from ‘representative’ abundances require substantial refinement to be extrapolated to regional or global scales.

V iruses of microbes have been linked to central processes across the global oceans, including biogeochemical cycling [1][2][3][4][5][6] and the maintenance and generation of microbial diversity 1,4,[7][8][9] . Virus propagation requires that virus particles both contact and subsequently infect cells. The per cell rate at which microbial cells-including bacteria, archaea and microeukaryotes-are contacted by viruses is assumed to be proportional to the product of virus and microbial abundances 10 . If virus and microbe abundances were related in a predictable way it would be possible to infer the rate of virus-cell contacts from estimates of microbial abundance alone.
Virus ecology underwent a transformation in the late 1980s with the recognition that virus abundances, as estimated using cultureindependent methods, were orders of magnitude higher than estimates via culture-based methods 11 . Soon thereafter, researchers began to report the 'virus to bacterium ratio' (VBR) as a statistical proxy for the strength of the relationship between viruses and their potential hosts in both freshwater and marine systems 12 . This ratio is more appropriately termed the 'virus-to-microbial cell ratio' (VMR), a convention that we use here (Supplementary Section 1).
Observations accumulated over the past 25 years have noted a wide variation in the VMR, yet there is a consensus that a suitable first approximation is that the VMR is equal to 10 (Supplementary Table 1). This ratio also reflects a consensus that typical microbial abundances are approximately 10 6 per ml and typical virus abundances are approximately 10 7 per ml 13,14 . Yet, the use of a fixed ratio carries with it another assumption-that of linearity-that is, if microbial abundance were to double, then viruses are expected to double as well. An alternative is that the relationship between virus and microbial abundances is better described in terms of a nonlinear relationship, for example, a power law.
In practice, efforts to predict the regional or global-scale effects of viruses on marine microbial mortality, turnover and even biogeochemical cycles depend critically on the predictability of the relative density of viruses and microbial cells. The expected community-scale contact rate, as inferred from the product of virus and microbial abundances, is a key factor for inferring virus-induced cell lysis rates at a site or sites (see, for example, ref. 15), which also depend on diversity 16 , latent infections 17 and virus-microbe infection networks 18 . Here, we directly query the nature of the relationship between virus and microbial densities via a large-scale compilation and re-analysis of abundance data across marine environments. to 3.2 × 10 6 per ml, and 95% of virus abundances range from roughly 3.6 × 10 5 to 6.5 × 10 7 per ml (Fig. 2a). Both microbial and virus concentrations generally decrease with depth, as reported previously (for example, see ref. 19). We separated the samples according to depth using an operational definition of the nearsurface and sub-surface, corresponding to samples taken at depths ≤100 m and >100 m, respectively. The cutoff of 100 m was chosen as a typical depth scale for the euphotic zone in systems with low to moderate chlorophyll 20 . The precise depth varies spatiotemporally. Our intent was to distinguish zones strongly shaped by active planktonic foodweb dynamics in well-lit waters (the 'near-surface') from dark mesopelagic waters shaped primarily by decaying particle fluxes with greater depth (the 'subsurface'). The median VMR for the near-surface samples (≤100 m) is 11.1, and the median VMR for the sub-surface samples (>100 m) is 16.0. In that sense, the consensus 10:1 ratio does accurately represent the median VMR for the surface data. We also observe substantial variation in the VMR, as has been noted in earlier surveys and reviews (Supplementary Table 1). Figure 2b shows that 95% of the variation in VMR in the near-surface ocean lies between 2.6 and 160, and between 3.9 and 74 in the sub-surface ocean. For the near-surface ocean, 50% of the VMR values are between 5 and 15, 13% are less than 5 and 37.5% exceed 15. This wide distribution, both for the near-surface and the sub-surface, demonstrates the potential limitations in using the 10:1 VMR, or any fixed ratio, as the basis for a predictive model of virus abundance derived from estimates of microbial abundance.
Virus abundance does not vary linearly with microbial abundance. Figure 3 shows two alternative predictive models of the relationship between logarithmically scaled virus and microbial abundances for water column samples. The models correspond to a fixed-ratio model and a power-law model. To clarify the interpretation of fitting in log-log space, consider a fixed-ratio model with a 12:1 ratio between virus and microbial abundance, V = 12 × B. Then, in log-log space the relationship is log 10 (V) = log 10 12 + log 10 B (1) which we interpret as a line with a y intercept of log 10 12 = 1.08 and a slope (change in log 10 V for a one-unit change in log 10 B) of 1. By the same logic, any fixed-ratio model will result in a line with slope 1 in the log-log plot, and the y intercept will vary logarithmically with VMR. The alternative predictive model is that of a power law: V = cB α 1 . In log-log space, the relationship is log 10 V = log 10 c + α 1 log 10 B log 10 V = α 0 + α 1 log 10 B The slope α 1 of a fitted line on log-transformed data denotes the power-law exponent that best describes the relationship between the variables. The intercept α 0 of a fitted line on log-transformed data denotes the logarithmically transformed pre-factor. The 10:1 line has residual squared errors of −6% and −26% in the surface and deep samples, respectively (Supplementary Table 3). In both cases, this result means that a 10:1 line explains less of the variation in virus abundance compared with a model in which virus abundance is predicted by its mean value across the data. To evaluate the generality of this result, we considered an ensemble of fixed-ratio models, each with a different VMR. In the near-surface samples, we find that fixed-ratio models, outside of a range between 11.7 and 15.6, explain less of the variation (that is, have negative values of R 2 ) than a 'model' in which virus abundance is predicted to be the global mean in the data set ( Supplementary Fig. 1). This reflects the failure of constant-ratio (that is, linear) models to capture the cluster of high VMRs at low microbial density apparent in the density contours of Fig. 2a and the shoulder of elevated high VMR frequency in Fig. 2b. The largest contributor to this cluster of points is the ARCTICSBI study (Table 1). In the sub-surface samples, fixed-ratio models in which the VMR varies between 12.4 and 23 do have positive explanatory power, but all perform worse than the power-law model ( Supplementary Fig. 1). In contrast, the best-fitting power-law model explains 19 and 64% of the variation in the data for nearand sub-surface samples, respectively (Supplementary Table 3). The best-fit power-law scaling exponents are 0.51 (with 95% confidence intervals (CIs) of (0.47,0.55)) for near-surface samples and 0.53 (with 95% CIs of (0.52,0.55)) for sub-surface samples. The difference between a linear and a power-law model can be understood, in part, by comparing predictions of viral abundances as a function of variation in microbial abundances. For example, doubling the microbial abundance along either regression line is not expected to lead to a doubling in virus abundance, but rather a 2 0.51 = 1.42-and 2 0.53 = 1.4-fold increase, respectively. The difference between models becomes more apparent with scale, for example, 10-and 100-fold increases in near-surface microbial abundances are predicted to be associated with 10 0.51 = 3.2-and 100 0.53 = 11-fold increases in viral abundances, respectively, given a power-law model. The power-law model is an improvement over the fixedratio model in both the near-and sub-surface, even when accounting for the increase in parameters (Supplementary Table 3). In the nearsurface, refitting the surface data without outliers improves the explanatory power to approximately R 2 = 0.34 from an R 2 = 0.19 for the sub-surface (Methods and Supplementary Fig. 2). The power-law exponents in the near-and sub-surface are qualitatively robust to variation in the choice of depth threshold, for example, as explored over the range between 50 and 150 m ( Supplementary Fig. 3). In summary, the predictive value of a power-law model is much stronger in the sub-surface than in the near-surface, where confidence in the interpretation of power-law exponents is limited.
Study-to-study measurement variation is unlikely to explain the intrinsic variability of virus abundances in the surface ocean. Next, we explored the possibility that the variation in methodologies affected the baseline offset of virus abundance measurements and thereby decreased the explanatory power of predicting virus abundances based on microbial abundances. That is, if V * is the true and unknown abundance of viruses, then it is possible that two studies would estimateV 1 = V * (1 + ϵ 1 ) and V 2 = V * (1 + ϵ 2 ), where |ε 1 | and |ε 2 | denote the relative magnitude of study-specific shifts. We constrain the relative variation in measurement, such that the measurement uncertainty is 50% or less (see 'Materials and methods'). The constrained regression model improves the explanatory power of the model (Supplementary Table 3), but, in doing so, the model forces 18 of the 22 studies to the maximum level of measurement variation permitted ( Supplementary Fig. 4). We do not expect differences in measurement protocols to explain the nearly two orders of magnitude variation in estimating virus abundance, given the same true virus abundance in a sample. Note that when subsurface samples were analysed through the constrained power-law model, there was only a marginal increase of 2% in R 2 and, moreover, 9 of the 12 studies were fit given the maximum level of measurement variation permitted ( Supplementary Fig. 4). The constrained intercept model results suggest that the observed variation in virus abundance in the surface oceans is not well explained strictly by the variation in measurement protocol between studies.
VMR decreases with increasing microbial abundance, a hallmark of power-law relationships. We next evaluate an ensemble of power-law models, V i = c i N α i , where index i denotes the use of distinct intercepts and power-law exponents for each survey. The interpretation of this model is that the nonlinear nature of the virus-to-microbial relationship may differ in distinct oceanic realms or due to underlying differences in sites or systems, rather than due to measurement differences. Figure 4 presents the results of fitting using the study-specific power-law model in the surface ocean samples. Study-specific power-law fits are significant in 16 of 22 cases in the surface ocean. The median power-law exponent for studies in the surface ocean is 0.48. Furthermore, of those significant power-law fits, the 95% distribution of the power-law exponent excludes a slope of one and is entirely less than one in 9 of 16 cases (Fig. 5). This model, in which the power-law exponent varies with study, is a significant improvement in terms of R 2 (Supplementary Table 3). For sub-surface samples, study-specific power-law fits are significant in 10 of 12 cases in the sub-surface ( Supplementary Fig. 5). The median power-law exponent for studies in the sub-surface is 0.67. Of those significant power-law fits, the central 95% distribution of the power-law exponent is less than one in 6 of 10 cases (Supplementary Fig. 6). A power-law exponent of less than one means that the virus abundance increases less than proportionately given increases in microbial abundance. This study-specific analysis extends the findings that nonlinear, rather than linear, models are more suited to describing the relationship between virus and microbial abundances. We find that the dominant trend in both near-surface and sub-surface samples is that the VMR decreases as microbial abundance increases. The increased explanatory power by study is stronger for near-surface than for sub-surface samples. This increase in R 2 comes with a caveat: study-specific models do not enable a priori predictions of virus abundance given a new environment or sample site, without further effort to disentangle the biotic and abiotic factors underlying the different scaling relationships.

Discussion
Viruses are increasingly considered in efforts to describe the factors controlling marine microbial mortality, productivity and biogeochemical cycles 3,4,33-36 . Quantitative estimates of virus-induced effects can be measured directly, but are often inferred indirectly using the relative abundance of viruses to microbial cells. To do so, there is a consensus that assuming the VMR is 10 in the global oceans-despite the observed variation-is a reasonable starting point. Here, we have re-analysed the relationship of virus to microbial abundances in 22 marine survey data sets. We find that  95% of the variation in VMR ranges from 2.6 to 160 in the nearsurface ocean and from 3.9 to 74 in the sub-surface. Although the 10:1 ratio accurately describes the median of the VMR in the surface ocean, the broad distribution of VMR implies that microbial abundance is a poor quantitative predictor of virus abundance. Moreover, increases in microbial abundance do not lead to proportionate increases in virus abundance. Instead, we propose that the virus to microbial abundance relationship is nonlinear and that the degree of nonlinearity-as quantified via a power-law exponent-is typically less than 1. This sublinear relationship can be interpreted to mean that the VMR decreases as an increasing function of microbial abundance and generalizes earlier observations 13 . Power-law relationships between virus and microbial abundance emerge from complex feedbacks involving both exogeneous and endogenous factors. The question of exogenous factors could be addressed, in part, by examining environmental covariates at survey sites. For example, if microbial and virus abundances varied systematically with another environmental co-factor during a transect, then this would potentially influence the inferred relationship between virus and microbial abundances. In that same way, variation in environmental correlates, including temperature and incident radiation, may directly modify virus life history traits 37,38 . Also, some of the marine survey data sets examined here constitute repeated measurements at the same location (for example, in the Bermuda Atlantic Time-series Study (BATS)). Time-varying environmental factors could influence the relative abundance of microbes and viruses. It is also interesting to note that viruses-induced mortality is considered to be more important at eutrophic sites 13 , where microbial abundance is higher, yet the observed decline in VMR with microbial abundance would suggest the opposite. It could also be the case that variation in endogenous factors determines total abundances. Endogenous factors can include the life history traits of viruses and microbes that determine which hosts are infected by which viruses 18 , as well as the quantitative rates of growth, defence and infection. For example, relative strain abundances are predicted to depend on niche differences according to the 'kill-the-winner' theory, which presupposes tradeoffs between growth and defence 1,39 . Similarly, the recent hypothesis of a complementary 'king-of-the-mountain' mechanism suggests that relative abundance relationships may depend on life history trait differences, even when tradeoffs are not strict 40 .
In both examples, total abundances may nonetheless depend on other factors, including the strength of grazing. The analysis of abundance relationships also requires a consideration of variation in time. As is well known, virus-microbe interactions can lead to intrinsic oscillatory dynamics. Indeed, previous observations of a declining relationship between VMR and microbial abundance have been attributed to changing ratios across phytoplankton bloom events, including possible virusinduced termination of blooms 13 . Similar arguments have been proposed in the analysis of tidal sediments 41 . Alternatively, observations of declining VMR with microbial density have been attributed to a variation in underlying diversity 42 . Another factor potentially complicating abundance predictions is that episodic events, including the induction of lysogenic populations, influence total microbial and viral counts. Varying degrees of lysogenic and co-infection relationships have been measured in marine virus-host systems 14,17,43 , the consequences of which may differ from those given interactions with lytic viruses, as is commonly the focus of model-and empirical-based studies. Whatever the mechanism(s), it is striking that virus abundances in some surveys can be strongly predicted via alternative power-law functions of microbial abundances. Mechanistic models are needed to further elucidate these emergent macroecological patterns and relationships, akin to recent efforts to explain emergent power laws between terrestrial predators and prey 44 .
The present analysis first separated the abundance data according to depth and then according to survey as a means to identify different relationships between virus and microbial abundances in the global oceans. The predictive value of total microbial abundance The confidence intervals are plotted using 'violin' plots that include the median (centre black line), 75% distribution (white bars) and 95% distribution (black line), with the distribution overlaid (blue shaded area). The numbers of points included as part of each study are displayed on the rightmost bar plots. Study labels in black indicate those studies for which the regression fit had a P value less than 0.002 = 0.05/22 (accounting for a multiple comparison correction given the analysis of 22 studies). Study labels in grey indicate a P value above this threshold.
is strong when considering sub-surface samples. In contrast, microbial abundance is not a strong predictor of virus abundance in near-surface samples, when using linear or nonlinear models. The predictive power of nonlinear models improved substantially in the near-surface when evaluating each marine survey separately. The minimal predictive value of microbial cell abundances for inferring viral abundances in the near-surface when aggregating across all surveys is problematic given that virus-microbe interactions have significant roles in driving microbial mortality and ecosystem functioning 3,5,33 . Indeed the aggregation of abundance measurements in terms of total microbial abundances may represent part of the problem. At a given site and time of sampling, each microbial cell in the community is potentially targeted by a subset of the total viral pool. In moving forward, understanding the variation in virus abundance and its relationship to microbial abundance requires a critical examination of correlations at functionally relevant temporal and spatial scales, that is, at the scale of interacting pairs of viruses and microbes. These scales will help inform comparisons of virus-microbe contact rates with viral-induced lysis rates, thereby linking abundance and process measurements. We encourage the research community to prioritize examination of these scales of interaction as part of efforts to understand the mechanisms underlying nonlinear virus-microbe abundance relationships in the global oceans.

Methods
Data source. Marine virus abundance data were aggregated from 22 studies (Table 1). A total of 5,508 data points were aggregated. The data collection dates ranged from 1996 to 2012. Data were primarily collected from coastal waters in the northern hemisphere, predominately during the summer months, with the notable exceptions of long-term coastal monthly monitoring sites, that is, the studies USC MO, BATS and MOVE.
Data processing. Analyses of the data were performed using R version 3.1.1. Scripts and original data are provided at https://github.com/WeitzGroup/ Virus_Microbe_Abundance. Power-law model. A power-law regression model used the log 10 of the predictor variable, microbial abundance per ml N and the log 10 of the outcome variable, virus abundance per ml V. The power-law regression was calculated using the equation log 10 V = α 0 + α 1 log 10 N. The α 0 and α 1 parameters were fit via ordinary least squares (OLS) regression to minimize the sum of square error.
Constrained variable-intercept model. The constrained model is a 'mixed-effects' regression model using the same predictor and outcome variables, log 10 of microbial abundance per ml and the log 10 virus abundance per ml, respectively. This model includes study-specific intercepts, which were constrained such that the values for any of the intercepts were restricted to one standard error above or below the intercept value taken from the power-law model. The standard error value for this model came from the power-law model. The equation for this model is V = α (i) 0 +α 1 N, where α (i) 0 is the study-specific intercept and α 1 is the slope common to all studies, N is the predictor variable and V is the outcome variable.
Variable slope and variable intercept model. A power-law model where the exponent and intercept varied with each study was evaluated using the same predictor variable, log 10 microbial abundance per ml, and the same outcome variable, log 10 virus abundance per ml. In this model, there was a study-specific α 0 and α 1 and an OLS regression calculated using the equation V = α (i) 0 +α (i) 1 N.
Bootstrapping model CIs. Bootstrap analyses of the power-law model and mixedeffects models were conducted to derive 95% CIs surrounding the parameters estimated by the models. For all models the original data set was sampled with replacement, by study, to arrive at a bootstrap sample data set; this process was repeated 10,000 times. Distributions for all parameters were generated and the 2.5, 50 and 97.5% points were identified from among the 10,000 parameter estimates.
Outlier identification. Outliers in the data were identified by calculating the top and bottom 2% of estimated VMRs amongst the entire 5,508 samples. The outliers corresponded to ratios below 2.72 and above 131.7. Those samples with VMRs that fell outside these bounds were considered outliers. There were 192 outlier samples taken at depths of ≤100 m and 30 outlier samples taken at depths of >100 m.
Depth cutoff robustness. The cutoff point for which data were partitioned into either the near-surface or the sub-surface was varied from 50 m to 150 m in 1 m increments. For each step, a power-law model was evaluated for both the nearsurface and the sub-surface.