A CATEGORICAL MODELLING APPROACH TO SITE AND GROWTH OF EUCALYPTUS STANDS IN BRAZILIAN EASTERN AMAZON

Theoretical framework: Site index cannot be generalized to different eucalyptus clonal stands, since each clone has a distinct growth and yield pattern, in which categorical variables may add site-specific effects to assess model's interregional variability. Objective : This study aimed to assess the statistical performance of site index, as well as growth and yield models in different configurations adding categorical variables. Method : The study was carried out in eucalyptus stands in Eastern Brazilian Amazon with three clones of different ages and a different number of trees. Traditional Schumacher’s site model was fitted with the addition of categorical clone variable. Beck-Della Bianca’s model was fitted by ordinary least squares (OLS) and two-stage least squares (2SLS), adding dominant height as site variable and including clone variable. Results and discussion : Schumacher’s clone model presented lower standard estimate error (9.50%) and higher adjusted coefficient of determination (0.61), correcting the lack of normality and homoscedasticity. 2SLS was more accurate than OLS for Beck-Della Bianca’s model. This model validation resulted in root-mean-squared error of 2.82% and bias of 0.03%. Research implications: Additive and multiplicative effects on site index resulted in polymorphism. Clone variable provided more parsimonious and accurate models to estimate site index and forest growth and yield, in which 2SLS was recommended for forest prognosis.


INTRODUCTION
Brazil is a forest production highlight due to its favorable environmental conditions and advanced forest management.Eucalyptus species are the most planted by forest companies, due to their adaptability to the environment (Castro et al., 2016;Colodette et al., 2014).However, appropriate growth and yield models for eucalyptus management are still needed (Carrijo et al., 2019), since this genus has a vast diversity of species and clones, which are constantly being update (De Assis et al., 2015;Ferreira et al., 2017;Freitas et al., 2020).In addition, the application of growth and yield models in pure forest plantations, it is necessary to know variables such as age, number of trees, and site index (Skovsgaard & Vanclay, 2008).
Despite the importance of the site index as a predictor in growth and yield models, it is common that site models present moderate statistical metrics (determination coefficient < 0.7 and standard estimative error > 10%) (Bueis et al., 2016;Magalhães, 2017;M. Wang et al., 2008;Q. Wang et al., 2007;Zhu et al., 2019).The accuracy of these models is affected by the use of site index fitted only by age (Socha et al., 2016;Yue et al., 2014).Therefore, other variables related with forest growth and yield are essential to improve the models' accuracy, such as soil nutrients, climate, species, topographic, and genetic materials (Bila et al., 2012;Skovsgaard & Vanclay, 2008;Q. Wang et al., 2007;Zhu et al., 2019).Another alternative is to replace the site index with the dominant height, in order to ensure that plots of a same age have different productivity values, which decreases the residual sum of squares (Leite & Andrade, 2003).3 The fitted method of growth and yield models is also related to the accuracy of estimates, since two variables -volume and basal area-are estimated, in which the ordinary least squares (OLS) may not be adequate if compared to simultaneous methods.In addition, there are modeling methods that add site-specific effects to assess the model's interregional variability (Fu et al., 2012;M. Wang et al., 2008).One form is the use of qualitative variables, quantified at 0 or 1, that indicate the presence or absence of an attribute, known as categorical, binary, or dummy variable (Anderson, 1984;Li et al., 2020;Suits, 1957;Zeng et al., 2017).
This study aimed to evaluate the statistical performance of site index as well as growth and yield models in different configurations by adding categorical variables.The following hypotheses were considered: a) eucalyptus clones as a categorical variable in site models provide accurate estimates, b) accurate site estimates can improve the growth and yield prognosis, c) dominant height per plot in replacement of site index allows more accurate growth and yield prognosis, and d) simultaneous fit provides higher accuracy of growth and yield estimates than the ordinary least squares method.

Study Area Characterization
The study was carried out in commercial clonal eucalyptus stands located in the State of Pará, Eastern Brazilian Amazon, at coordinates 3°21'00" S e 47°8'00" W. The region's climate is Aw (Köppen), with average annual rainfall of 1,800 mm and average annual temperature of 26 °C (Alvares et al., 2013).
Data from 39 permanent plots and 301 temporary plots with an area of 504 m² (21 m x 24 m) were used.Permanent plots were created in 2016 and measured in 2018, while temporary plots were sampled in 2018.The circumference at 1.3 m above the ground (CBH) and the total height of trees were measured by means of a diameter tape and a Haglöf digital clinometer, respectively.The trees volume was obtained with volume equations for each clone (Table 1), then the plot volume was calculated and converted to volume per hectares.

Fitting Site Index Models
The guide-curve method was applied to fit two configurations of Schumacher's site model (Schumacher, 1939) as in traditional form (C)equation 1and by adding a categorical clone variable (CL)equation 2with additive and multiplicative effects.The additive effect was the insertion of a new dummy variable into the model, while the multiplicative effect tested the interaction between clone variable and plot age.(1) Schumacher CL: Where: ℎ  is dominant height,  is age (years),   is clonal group, and   is regression coefficient.
The statistical metrics of adjusted coefficient of determination (R 2 adj.), standard estimative error (SEE%), Akaike information criterion (Akaike, 1974), and graphical analysis of residuals were applied to choose the best site index model.Also, normality and heteroscedasticity of residues were assessed by the Kolmogorov-Smirnov's and Goldfeld-Quandt's tests at the 5% significance level.The site models were fitted using data from 70% of plots, while 30% were applied for validation with the statistics root-mean-square error (RMSE) and Chi-square test at the 5% significance level.

Development of Site Index Curves
Site index curves were developed by the guide-curve method, considering three site classes.Class I plots were considered high productivity, reaching the asymptotic point earlier (stabilization of dominant height curve), while the other site classes represented moderate (II) and low (III) productivities.It should be noted that the traditional model (C) provides only one guide-curve, while the clonal model (CL) results in three guide-curves, with one for each clonal group (C1, C2, and C3).We highlight that the reference age was not standardized due to the different growth patterns of clones.

Fitting Growth and Yield Models
We chose the Beck-Della Bianca's growth and yield model, because it is a compatible model in volume and basal area estimates, and it is the Clutter's model with few modifications (Beck & Della-Bianca, 1972).Beck-Della Bianca's model was fitted for the permanent plots database using two fit methods: ordinary least squares (OLS) and simultaneous fit by two-stage ordinary least squares (2SLS).OLS did not consider a basal area model in the volume estimates, while 2SLS allowed the volume and basal area to be estimated simultaneously, considering the variables' error terms were correlated.In 2SLS, the future basal area ( 2 ) was fitted from the present basal area ( 1 ) and assumed as an independent variable for volume estimation ( 2 ).OLS: Where: 2 is future volume (m³ ha -1 ),  1 is present age (years),  2 is future age (years),  1 is present basal area (m² ha -1 ),  2 is future basal area (m² ha -1 ),  is site index conception,   is volume regression coefficients, and   is basal area regression coefficients.
Different site index conceptions ( ) were considered to fit Beck-Della Bianca's model, such as site index by traditional Schumacher's model (), site index by Schumacher's model with categorical clone variable () and dominant height value at reference age (ℎ  ).Also, the additive effect was added as a dummy variable, while the interaction between clone variable with site index conception and future basal area were evaluated by multiplicative effects.Thus, three site conception and six model configurations were tested, which were assessed using two fit methods that resulted in twelve Beck-Della Bianca's model configurations (Table 2 and Table 3).

A1
GroupC GroupC: Note:  is clonal group, colon punctuation (:) is multiplicative effect between variables, ℎ  is dominant height (m) at reference age,  2 is future basal area (m² ha -1 ),  is site index for all clones, and  is site index for each clone.Source: Prepared by the authors.
Source: Prepared by the authors.
The statistical metrics used for site models were also applied for OLS growth and yield models, while McElroy R 2 was implemented as a measure of good-of-fit of the volume and basal area system-equation fitted by 2SLS.Graphical analysis of volume and basal area residuals was additionally used to assess the 2SLS growth and yield models. )) (5) 7 Where: ⊗ is Kronecker product,  ̂ is fitted vector of error terms,  is vector of error terms, Ω ̂ is matrix of the terms,  is number of observations in each model,   is identity matrix of T dimension,  is vector of dependent variable, and ∑ ̂−1 is estimated covariance matrix of error terms.
Bootstrap resampling for regression was applied to validate the best growth and yield model, in which the data were resampled 1,000 times using R version 3.6.1 (R Core Team, 2019).In the bootstrap process, a random sample with replacement and an identical number of elements of original data are obtained and repeated  times (Efron, 1979;Stępień, 2016).This resampling method is recommended for data with few samples.
Volume variable (m³ ha -1 ) was determined for each sub-population (resample) based on the chosen model.Histograms of volume, confidence interval, McElroy R 2 , standard estimative error and bootstrap error were also performed to evaluate their distributions.Therefore, growth and yield volume curves were developed based on the chosen model.CAI (current annual increment), MAI (mean annual increment) and volume production curves considered three classes of site index conception used in the model.

RESULTS
Mean and median values and distributions of dominant height (ℎ  ) and volume were different for each clone and general data (Figure 1).ℎ  showed symmetric distribution for each dataset, with extreme values in C3 and general data, although with similar amplitude for all datasets (Figure 1a).In contrast, volume showed asymmetric distribution for C1, C2, and general data, with extreme values in C1 and C3 (Figure 1b).Although the Schumacher's model with categorical clone variable (CL) has more regression coefficients (  ), it presented more accurate estimates of dominant height (Table 2), with less standard estimate error (SEE) and higher adjusted coefficient of determination (R²adj).Traditional Schumacher's model © did not show normality of residues and homoscedasticity by Kolmogorov-Smirnov (KS) and Goldfelf-Quandt tests (GQ), in which the categorical clone variable allowed these assumptions of linear regression to be satisfied.
Additionally, CL model presented better statistical quality, considering the lower value of Akaike Information Criterion (Table 4).In the validation, CL model also showed superiority in relation to C model, with root-mean-square error (RMSE) of 10.59% and Bias (%) equal to 0.77%.However, both models do not show statistical differences between observed and estimated dominant heights by the Chi-square test (Table 4).Regression coefficients of categorical clone variable were statistically significant at the 5% level for additive ( 0 to  2 ) and multiplicative ( 3 to  5 ) effects (Table 4), in which each clone curve represents a different level and slope in a polymorphic behavior.The residual distributions showed that the traditional Schumacher's model (C) was not able to estimate dominant height values greater than 22 m (Figure 2), while the clonal model (CL) resulted in better residual dispersion and homoscedasticity.The categorical clone variable allowed a single model to represent three guide curves.The site index showed different values (Table 5) and curves for each clone (Figure 3).Beck-Della Bianca's model fitted by 2SLS method was more accurate than by OLS.In addition, the model with ℎ  was also more accurate than  and  site index conception.For ℎ  as site index conception, the model with clone variable (Model A1) showed higher accuracy compared to the same configuration without this variable (Model B1) (Table 6).-208.85 -68.75 -194.25 -66.98 -112.19 -31.51 -131.16 -38.83 -133.25 -42.79 -111.04 -31 The significance of regression coefficients at the 5% level showed additive effect for clone variables in Model A1, without multiplicative effect (Table 6).In contrast, the same configuration of this model estimated by OLS (A2) did not present clone variable effects.Models D1, D2, E1, and E2 with the clone variable did not show additive and multiplicative effects (Table 6).
The residuals demonstrated that the models fitted by 2SLS were more accurate than by OLS (Figure 4).These residual dispersions showed the lack of outliers and the presence of homoscedasticity in all models.The fit methods result more accurate were models A and B, however the volume estimated from 2SLS was more accurate than basal area estimate.Model A1 was chosen and validated, presenting root-mean-square error (RMSE) of 2.82% and bias equal to 0.03%.Estimates for the model A1 were statistically similar to those observed in bootstrap resampling by Chi-square test ( 2 = 4,167.95ns ).McElroy R² value of 0.97 and standard estimative error (SEE%) of 3.41% showed that the model A1 was accurate, with low bootstrap error (2.72%) and bootstrap mean (135.98 m³ ha-1) consistent with general mean.In addition, bootstrap volume showed similar distribution of volume estimated by model A1 (Figure 5a), with confidence interval between -3 and 3 deviations (Figure 5b).McElroy R² presented values between 0.92 and 0.99 (Figure 5c), while the standard estimative error ranged between 1.8% and 5% (Figure 5d).The maximum current annual increment (CAI) for all clones was observed at the second year, in which C3 had the highest production (Figure 6).At this same age, C1 presented 36.14 m³ ha -1 , while C2 and C3 showed 33.78 m³ ha -1 and 55.89 m³ ha -1 , respectively.The maximum mean annual increment (MAI) was obtained at the third year, with 25.52 m³ ha -1 for C1, 23.83 m³ ha -1 for C2, and 39.43 m³ ha -1 for C3.We observed technical cutoff point at 3.5 years-old, when volume growth and yield curves were projected up to the tenth year (Figure 6).In addition, different growth and yield patterns were obtained by eucalyptus clone through the configuration A of Beck-Della Bianca's model (Figure 6).

DISCUSSIONS
The addition of categorical variables often results in models with better statistical parameters and increased accuracy (Gonçalves et al., 2018;Zeng et al., 2011).Adding categorical clone variable improves the accuracy of site model to estimate dominant heights (Table 4 and Figure 2), in which only the covariate age was not able to estimate different site values for a stand age.Since genotype is a factor that also influences the tree growth and yield (Adams et al., 2006;Buford & Burkhart, 1987;Socha et al., 2016), categorical clone variable can be used without restriction to estimate site productivity.
Other studies obtained higher accuracy by adding variables in site models, in which qualitative variables were indicated for this purpose, since they are easily applied and directly include site variability in regression models (Magalhães, 2017).Wang et al. (2007) added categorical species variable in site index models and obtained R 2 adj. of 0.60, corroborating the results of this study.Also, Wang et al. (2008) developed an accurate model to estimate tree dominant height adding plot variables.
The accuracy of categorical models in Li et al. (2020) study was between 24% to 33% greater than traditional models, in which dummy variables resulted in RMSE reduction of 9% for biomass estimation.Also, Gonçalves et al. ( 2018) observed that dummy models increased R 2 adj. in 5.3%, in which lowest bias and highest accuracy of biomass estimates were supported by the categorical models.These results denote higher consistency by adding categorical variables compared to traditional models (Fu et al., 2012;M. Wang et al., 2008).
Additive and multiplicative effects in site model (Figure 2) showed the influence on the levels ( 0 ) and site curves slopes ( 1 ), which resulted in specific guide-curves for clone and polymorphic behavior.Polymorphism reflects the introduction of variables in the model (climate, topography, soil, and genotype, for example) in such a way that polymorphic models are more flexible to site variability (Cieszewski & Strub, 2000;Curt et al., 2001;Zhu et al., 2019).Polymorphic curves did not show the same pattern for site index (Clutter et al., 1983;Machado, 1980), in which site curves were more convex to site I, while for sites II and III they showed sigmoid form (Figure 3).In addition, polymorphic curves have a greater ability to model the real growth pattern for different sites, resulting in less biased estimates for site class (Schuchovski et al., 2019).These results were also observed by Zeng et al. (2011) andFu et al. (2012) for Pinus massoniana D. Don.
A general site index indicates a general reference age of 5 years, which did not occur when the analysis per clone was performed.The reference ages for C1 and C2 were 6 years, while for C3 it was 7 year (Figure 3).Therefore, a general site index implies in reducing the stand yield, while the clonal site index conception makes it possible to estimate the maximum yield for each clone (Table 5).
Beck-Della Bianca's model fitted by 2SLS method was more accurate than by OLS (Table 6 and Figure 4).Fitting growth and yield models by 2SLS is a simple method that can be used in forest management, since it is a technique on reduce-form equations from OLS, which allows to control the endogeneity and simultaneous equation bias (Parajuli et al., 2016).2SLS improves models' accuracy in relation to OLS, in which the simultaneous fitting is indicated for interrelated variables (Zeng et al., 2017) and considers the correlation between the system of equations' error (Ritchie & Hamann, 2008).
Other studies also presented successful results using the simultaneous fitting, such as for biomass estimation (Sanquetta et al. 2015, Coutinho et al. 2018, Sanquetta et al. 2019).The main conclusion is that the simultaneous models are thriftier and more consistent, although they considered the additivity of biomass compartments.Also, the 2SLS method is optimal to growth and yield, because it uses a time-series data as the continuous forest inventory (Parajuli et al., 2016).
Beck-Della Bianca's model A1 resulted in volume curves with the same levels for clones, however, with different curve slopes (Figure 6).The three genetic materials had the same initial growth, although the growth rate is different, and these results became feasible by adding the categorical clone variable (Figure 6).Therefore, the inclusion of these variable was effective, since genetic effects in a growth and yield model should indicate the species variability in the site (Adams et al., 2006).
In addition, the dominant height is a site index conception that provided the most accurate configuration of Beck-Della Bianca's model to represent growth and yield (Table 4 and Figure 6).The reason for this outcome is the correlation between site index residuals and stand age, which implies systematic errors in site index and results in biased estimates of site and growth and yield (Socha et al., 2016).Thus, dominant height can remove this bias and increase the accuracy of Beck-Della Bianca's model.

CONCLUSIONS
Categorical clone variable provides more accurate site index and growth and yield estimates for eucalyptus stands, which results in parsimonious models that satisfy the assumptions of normality of residues and homoscedasticity of linear regression.
The most recommended configuration of Beck-Della Bianca's model for modeling eucalyptus growth and yield is the two-stage least-square fit method with dominant height as the site productivity variable.

Figure 1 .
Figure 1.Distribution of dominant height and volume of Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.Source: Prepared by the authors.

Figure 2 .
Figure 2. Residuals dispersion of site index model and their guide-curves for Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.Source: Prepared by the authors.

Figure 3 .
Figure 3. Site curves of Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.Source: Prepared by the authors.

Figure 4 .
Figure 4. Residuals of Beck-Della Bianca's model configurations for volume prognosis of Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.Source: Prepared by the authors.

Figure 5 .
Figure 5. Bootstrap validation for Beck-Della Bianca's model A configuration of Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.Source: Prepared by the authors.

Figure 6 .
Figure 6.Growth and yield volume curves by dominant height class (site index) of Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.Source: Prepared by the authors.

Table 1 .
Volume equations to Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.

Table 2 .
Modified variables in the Beck-Della Bianca's model fitted to Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.

Table 3 .
Beck-Della Bianca's model configurations fitted to Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.

Table 4 .
Schumacher's site model fitted to Eucalyptus spp.Clonal stands in Eastern Brazilian Amazon.

Table 5 .
Site index by stratification of Eucalyptus spp.clonal stands in Eastern Brazilian Amazon.Prepared by the authors.Class I is site index for highest productivity class, Class II is site index for moderate productivity class, Class III is site index for lowest productivity class, and ℎ  is dominant height on reference age (m).
Dataset Class I Class II Class III Class Interval Reference age (years)

Table 6 .
Beck-Della Bianca's model configurations to Eucalyptus spp.clonal stands in Eastern Brazilian R² is coefficient of determination (McElroy R² for 2SLS and R²adj.for OLS), SEE is standard estimative error, AIC is Akaike information criterion,   is regression coefficient for volume model;   is regression coefficient for basal area model, ns is non-significance, and * is significance at the 5% level.Source: Prepared by the authors.