Plot DGB Distribution and Observed Data

Creates a graphical representation of the DGB Distribution (Mansilla et al. (2007) doi:10.1016/j.joi.2007.01.001 ) model. It supports both linear and nls fits done in BC_param. Requires a function generated by BC_model.

Usage

BC_plot(
  df_abundance = NULL,
  column = NULL,
  BC_model_object = NULL,
  obs = TRUE,
  obs_shape = 16,
  obs_col = "#78a7ff",
  obs_size = 1,
  model = TRUE,
  model_col = "#000000",
  model_width = 0.5,
  model_extra = "MSE",
  confint = TRUE,
  confint_col = "#ed8666",
  confint_width = 1,
  confrange = TRUE,
  confrange_col = "#ffd078",
  gfx_alpha = 0.75,
  gfx_title = "Rank-Abundance Diagram",
  gfx_label = TRUE,
  gfx_label_coords = NULL,
  gfx_xy_trans = c("identity", "log10"),
  gfx_theme = ggplot2::theme_gray(),
  plot_silent = FALSE,
  ...
)

Arguments

df_abundance: A data frame that contains abundance data.
column: Either a string with the name of the column or the number of the column that stores the abundances in the data frame.
BC_model_object: Optional. A previous object generated by BC_model.
obs: Logical. Whether to plot the observed abundance data. Defaults to true.
obs_shape: Numerical. The shape of the plotted observed abundance data.
obs_col: The color for the observations.
obs_size: Numeric. The size for the observations.
model: Logical. Whether to show the models predicted data. Defaults to true.
model_col: Specify a color for the model.
model_width: Numeric. Changes the width of the lines to use for the model.
model_extra: String. Has to be one of: "MSE" (Mean Square Error),"S" (Standard error of the Estimate),"R2". Defaults to "MSE".
confint: Logical. Whether to add the confidence interval lines. Defaults to true.
confint_col: Specify a color for the confidence interval lines.
confint_width: Numeric. Changes the width of the confidence interval lines.
confrange: Logical. Whether to shade the area in the confidence interval. Defaults to true.
confrange_col: Specify a color to use for the confidence interval shading.
gfx_alpha: Numeric. Modifies all the graphed objects alpha. Default=0.75.
gfx_title: String. Changes the title of the graph.
gfx_label: Logical. Whether to show the parameters used and model_extra info.
gfx_label_coords: A vector that provides custom x and y values to move the label.
gfx_xy_trans: A vector with 2 strings that define the ggplot2 transformations to be applied to the x and y scales. Defaults to c("identity","log10").
gfx_theme: Provide a ggplot2 theme function to use. Defaults to theme_gray().
plot_silent: Logical. Whether to print to console the output list and plot the graph.
...: passes arguments to BC_model.

Value

A list with the following elements: The input data frame with added processed ranking data, model data and confidence interval data, the adjusted parameters, the confidence interval of the parameters, the linear model, a summary of the model, a generated function for use with raw numeric data and a ggplot2 object that shows the DGBD distribution and observed data, a model_extra vector with 2 elements, model_extra name and value.

Examples

plotWeblinks <- BC_plot(Weblinks, column=2,
 rank_threshold=4,confint=FALSE,confrange=FALSE,plot_silent=TRUE)
head(plotWeblinks[[1]])
#>   pre_numerator pre_denominator       lwr predicted_values       upr BC_rank
#> 1          5306               3  51485548         52114228  52750584       3
#> 2          5308               1 564836866        573002863 581286917       1
#> 3          5307               2 124625256        126250120 127896169       2
#> 4          5305               4  27497261         27816930  28140315       4
#> 5          5304               5  16904524         17093390  17284366       5
#> 6          5303               6  11359769         11482490  11606537       6
#>   degree abundance
#> 1      0  35159835
#> 2      1 106649769
#> 3      2  40711748
#> 4      3  22648832
#> 5      4  12617832
#> 6      5   8188854
plotWeblinks[2:8]
#> [[1]]
#>             A             a             b 
#>  7.357629e+08  2.182266e+00 -2.914983e-02 
#> 
#> [[2]]
#>                     2.5 %        97.5 %
#> (Intercept)  7.152981e+08  7.568132e+08
#> log_den      2.184312e+00  2.180219e+00
#> log_num     -3.119669e-02 -2.710297e-02
#> 
#> [[3]]
#> 
#> Call:
#> stats::lm(formula = log_abundance ~ log_den + log_num)
#> 
#> Coefficients:
#> (Intercept)      log_den      log_num  
#>    20.41642     -2.18227     -0.02915  
#> 
#> 
#> [[4]]
#> 
#> Call:
#> stats::lm(formula = log_abundance ~ log_den + log_num)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.68134 -0.03974 -0.00297  0.03298  0.16958 
#> 
#> Coefficients:
#>              Estimate Std. Error  t value Pr(>|t|)    
#> (Intercept) 20.416419   0.014389  1418.88   <2e-16 ***
#> log_den     -2.182266   0.001044 -2090.10   <2e-16 ***
#> log_num     -0.029150   0.001044   -27.92   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.05765 on 5305 degrees of freedom
#> Multiple R-squared:  0.9993,	Adjusted R-squared:  0.9993 
#> F-statistic: 3.706e+06 on 2 and 5305 DF,  p-value: < 2.2e-16
#> 
#> 
#> [[5]]
#> function (rank) 
#> {
#>     params["A"] * (max(t_frame[, "BC_rank"]) + 1 - rank)^params["b"]/(rank^params["a"])
#> }
#> <bytecode: 0x55894afcfc20>
#> <environment: 0x558959c5c100>
#> 
#> [[6]]

#> 
#> [[7]]
#> [1] "MSE"              "42417404085421.2"
#> 

plothmp_wgs <- BC_plot(DGBD::hmp_wgs,2,obs=FALSE,plot_silent=TRUE)
head(plothmp_wgs[[1]])
#>   pre_numerator pre_denominator          lwr predicted_values          upr
#> 1            12             189   0.00191929       0.00255945   0.00341313
#> 2            91             110  67.60535356      79.64205011  93.82180274
#> 3            52             149   4.28061625       5.03017084   5.91097571
#> 4           126              75 296.89124035     349.15112822 410.60999372
#> 5           199               2 228.90175484     409.49230806 732.55860566
#> 6           160              41 715.69574563     845.63422804 999.16375359
#>   BC_rank    PID    abundance
#> 1     189 M00171 8.052846e-03
#> 2     110 M00175 8.754249e+01
#> 3     149 M00174 1.134126e+01
#> 4      75 M00170 1.973639e+02
#> 5       2 M00178 1.193010e+03
#> 6      41 M00359 6.899719e+02
plothmp_wgs[2:8]
#> [[1]]
#>             A             a             b 
#>  2.018236e-10 -6.208510e-01  5.272359e+00 
#> 
#> [[2]]
#>                     2.5 %        97.5 %
#> (Intercept)  4.591541e-11  8.871265e-10
#> log_den     -4.345076e-01 -8.071944e-01
#> log_num      5.086016e+00  5.458703e+00
#> 
#> [[3]]
#> 
#> Call:
#> stats::lm(formula = log_abundance ~ log_den + log_num)
#> 
#> Coefficients:
#> (Intercept)      log_den      log_num  
#>    -22.3236       0.6209       5.2724  
#> 
#> 
#> [[4]]
#> 
#> Call:
#> stats::lm(formula = log_abundance ~ log_den + log_num)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -10.8372  -0.2644  -0.0805   0.2879   2.0649 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -22.32363    0.75078  -29.73  < 2e-16 ***
#> log_den       0.62085    0.09449    6.57 4.37e-10 ***
#> log_num       5.27236    0.09449   55.80  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.9312 on 197 degrees of freedom
#> Multiple R-squared:  0.9618,	Adjusted R-squared:  0.9614 
#> F-statistic:  2480 on 2 and 197 DF,  p-value: < 2.2e-16
#> 
#> 
#> [[5]]
#> function (rank) 
#> {
#>     params["A"] * (max(t_frame[, "BC_rank"]) + 1 - rank)^params["b"]/(rank^params["a"])
#> }
#> <bytecode: 0x55894afcfc20>
#> <environment: 0x558951347a30>
#> 
#> [[6]]

#> 
#> [[7]]
#> [1] "MSE"              "24311.9761699429"
#> 

plotBillionaires <- BC_plot(Billionaires, column= 2,
 gfx_xy_trans=c("log10","log10"),plot_silent=TRUE)
head(plotBillionaires[[1]])
#>   pre_numerator pre_denominator      lwr predicted_values      upr BC_rank rank
#> 1          2640               1 499598.6         509496.8 519591.1       1    1
#> 2          2639               2 299827.5         305121.5 310509.0       2    2
#> 3          2638               3 222405.7         226053.8 229761.6       3    3
#> 4          2637               4 179927.4         182719.2 185554.3       4    4
#> 5          2636               5 152648.8         154912.8 157210.3       5    5
#> 6          2635               6 133459.1         135363.9 137295.9       6    6
#>   abundance
#> 1    211000
#> 2    180000
#> 3    114000
#> 4    107000
#> 5    106000
#> 6    104000
plotBillionaires[2:8]
#> [[1]]
#>            A            a            b 
#> 1.946909e+05 7.396223e-01 1.221052e-01 
#> 
#> [[2]]
#>                    2.5 %       97.5 %
#> (Intercept) 1.872674e+05 2.024087e+05
#> log_den     7.427259e-01 7.365187e-01
#> log_num     1.190017e-01 1.252088e-01
#> 
#> [[3]]
#> 
#> Call:
#> stats::lm(formula = log_abundance ~ log_den + log_num)
#> 
#> Coefficients:
#> (Intercept)      log_den      log_num  
#>     12.1792      -0.7396       0.1221  
#> 
#> 
#> [[4]]
#> 
#> Call:
#> stats::lm(formula = log_abundance ~ log_den + log_num)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.88157 -0.03868  0.00415  0.03921  0.55573 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 12.179169   0.019826  614.31   <2e-16 ***
#> log_den     -0.739622   0.001583 -467.30   <2e-16 ***
#> log_num      0.122105   0.001583   77.15   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.06127 on 2637 degrees of freedom
#> Multiple R-squared:  0.9944,	Adjusted R-squared:  0.9944 
#> F-statistic: 2.355e+05 on 2 and 2637 DF,  p-value: < 2.2e-16
#> 
#> 
#> [[5]]
#> function (rank) 
#> {
#>     params["A"] * (max(t_frame[, "BC_rank"]) + 1 - rank)^params["b"]/(rank^params["a"])
#> }
#> <bytecode: 0x55894afcfc20>
#> <environment: 0x55895b2c7418>
#> 
#> [[6]]

#> 
#> [[7]]
#> [1] "MSE"              "48507852.7641689"
#>