Title: OLS, Moderated, Logistic, and Count Regressions Made Simple
Description: Provides SPSS- and SAS-like output for least squares multiple regression, logistic regression, and count variable regressions. Detailed output is also provided for OLS moderated regression, interaction plots, and Johnson-Neyman regions of significance. The output includes standardized coefficients, partial and semi-partial correlations, collinearity diagnostics, plots of residuals, and detailed information about simple slopes for interactions. The output for some functions includes Bayes Factors and, if requested, regression coefficients from Bayesian Markov Chain Monte Carlo analyses. There are numerous options for model plots. The REGIONS_OF_SIGNIFICANCE function also provides Johnson-Neyman regions of significance and plots of interactions for both lm and lme models.
Authors: Brian P. O'Connor [aut, cre]
Maintainer: Brian P. O'Connor <[email protected]>
License: GPL (>= 2)
Version: 0.2.1
Built: 2025-02-11 03:12:57 UTC
Source: https://github.com/cran/SIMPLE.REGRESSION
Provides SPSS- and SAS-like output for least squares multiple regression,
logistic regression, and count variable regressions. Detailed output is also provided for
OLS moderated regression, interaction plots, and Johnson-Neyman
regions of significance. The output includes standardized
coefficients, partial and semi-partial correlations, collinearity diagnostics,
plots of residuals, and detailed information about simple slopes for interactions.
The output for some functions includes Bayes Factors and, if requested,
regression coefficients from Bayesian Markov Chain Monte Carlo (MCMC) analyses.
There are numerous options for model plots.
The REGIONS_OF_SIGNIFICANCE function also provides
Johnson-Neyman regions of significance and plots of interactions for both lm
and lme models (lme models are from the nlme package).
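As a quick orientation, here is a minimal sketch of a typical workflow, assembled from the worked examples later in this document (it assumes the package and its bundled example datasets are available):

library(SIMPLE.REGRESSION)

# OLS regression with forced (simultaneous) entry
OLS_REGRESSION(data = data_Green_Salkind_2014, DV = 'injury',
               forced = c('quads','gluts','abdoms','arms','grip'))

# moderated regression with an interaction plot,
# followed by Johnson-Neyman regions of significance
model_Lorah <- MODERATED_REGRESSION(data = data_Lorah_Wong_2018, DV = 'suicidal',
                                    IV = 'burden', MOD = 'belong_thwarted',
                                    COVARS = 'depression', plot_type = 'interaction')
REGIONS_OF_SIGNIFICANCE(model = model_Lorah)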
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel
regression: Inferential and graphical techniques. Multivariate Behavioral
Research, 40(3), 373-400.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied
multiple regression/correlation analysis for the behavioral sciences (3rd ed.).
Lawrence Erlbaum Associates.
Darlington, R. B., & Hayes, A. F. (2017). Regression analysis and linear models:
Concepts, applications, and implementation. Guilford Press.
Dunn, P. K., & Smyth, G. K. (2018). Generalized linear models
with examples in R. Springer.
Hayes, A. F. (2018a). Introduction to mediation, moderation, and conditional
process analysis: A regression-based approach (2nd ed.). Guilford Press.
Huitema, B. (2011). The analysis of covariance and alternatives: Statistical
methods for experiments, quasi-experiments, and single-case studies. John Wiley & Sons.
Johnson, P. O., & Fay, L. C. (1950). The Johnson-Neyman technique, its theory, and
application. Psychometrika, 15, 349-367.
Lorah, J. A., & Wong, Y. J. (2018). Contemporary applications of moderation
analysis in counseling psychology. Journal of Counseling Psychology, 65(5), 629-640.
Orme, J. G., & Combs-Orme, T. (2009). Multiple regression with discrete
dependent variables. Oxford University Press.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation
and prediction. (3rd ed.). Wadsworth Thomson Learning.
Provides SPSS- and SAS-like output for count data regression, including Poisson, quasi-Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial models. The output includes model summaries, classification tables, omnibus tests of the model coefficients, overdispersion tests, model effect sizes, the model coefficients, the correlation matrix for the model coefficients, collinearity statistics, and casewise regression diagnostics.
COUNT_REGRESSION(data, DV, forced = NULL, hierarchical = NULL,
                 family = 'poisson', offset = NULL,
                 plot_type = 'residuals', CI_level = 95,
                 MCMC = FALSE, Nsamples = 4000, verbose = TRUE)
data |
A dataframe where the rows are cases and the columns are the variables. |
DV |
The name of the dependent variable.
|
forced |
(optional) A vector of the names of the predictor variables for a forced/simultaneous
entry regression. The variables can be numeric or factors.
|
hierarchical |
(optional) A list with the names of the predictor variables for each step of a
hierarchical regression. The variables can be numeric or factors.
|
family |
(optional) The name of the error distribution to be used in the model. The options are 'poisson' (the default), 'quasipoisson', 'negbin' (negative binomial), 'zinfl_poisson' (zero-inflated Poisson), and 'zinfl_negbin' (zero-inflated negative binomial).
Example: family = 'quasipoisson' |
offset |
(optional) The name of the offset variable, if there is one. This variable
should be in the desired metric (e.g., log). No transformation of an
offset variable is performed internally (a brief sketch of creating a log
offset appears after this arguments list).
|
plot_type |
(optional) The kind of plots, if any. The options include 'residuals' (the default) and 'diagnostics'.
Example: plot_type = 'diagnostics' |
CI_level |
(optional) The confidence interval for the output, in whole numbers. The default is 95. |
MCMC |
(logical) Should Bayesian MCMC analyses be conducted? The default is FALSE. |
Nsamples |
(optional) The number of samples for MCMC analyses. The default is 4000. |
verbose |
(optional) Should detailed results be displayed in console? |
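As noted for the offset argument above, no transformation of the offset is performed internally, so an exposure-based offset should be put on the log scale before fitting. A minimal, hypothetical sketch (the data frame mydat and the variables exposure_years, n_events, predictor1, and predictor2 are placeholders, not objects from this package):

# create the offset on the log scale before fitting (hypothetical data)
mydat$log_exposure <- log(mydat$exposure_years)

COUNT_REGRESSION(data = mydat, DV = 'n_events',
                 forced = c('predictor1', 'predictor2'),
                 offset = 'log_exposure')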
This function uses the glm function from the stats package and the negative.binomial function from the MASS package, and supplements the output with additional statistics, formatted to resemble SPSS and SAS output. The predictor variables can be numeric or factors.
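Because the fitting is delegated to the glm function, the basic coefficients can be cross-checked against a plain glm call; a minimal sketch using the data_Kremelburg_2011 variables from the Examples below (a comparison point only, not a replacement for the function's formatted output):

glm_fit <- stats::glm(OVRJOYED ~ AGE + EDUC + REALRINC + SEX_factor,
                      family = poisson, data = data_Kremelburg_2011)
summary(glm_fit)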
The analyses for the zero-inflated Poisson and zero-inflated negative binomial models are conducted using the pscl package (Zeileis, Kleiber, & Jackman, 2008).
Predicted values, for selected levels of the predictor variables, can be produced and plotted using the PLOT_MODEL function in this package.
The Bayesian MCMC analyses can be time-consuming for larger datasets. The MCMC analyses are conducted using functions, and their default settings, from the rstanarm package (Goodrich, Gabry, Ali, & Brilleman, 2024). family = 'quasipoisson' analyses are currently not possible for the MCMC analyses; family = 'poisson' is used instead. The Bayesian MCMC analyses are also currently not available for zero-inflated Poisson and zero-inflated negative binomial models.
The MCMC results can be verified using the model checking functions in the rstanarm package (e.g., Muth, Oravecz, & Gabry, 2018).
Good sources for interpreting count data regression residuals and diagnostics plots are listed in the References below.
An object of class "COUNT_REGRESSION". The object is a list containing the following possible components:
modelMAIN |
All of the glm function output for the regression model. |
modelMAINsum |
All of the summary.glm function output for the regression model. |
modeldata |
All of the predictor and outcome raw data that were used in the model, along with regression diagnostic statistics for each case. |
collin_diags |
Collinearity diagnostic coefficients for models without interaction terms. |
Brian P. O'Connor
Atkins, D. C., & Gallop, R. J. (2007). Rethinking how family researchers
model infrequent outcomes: A tutorial on count regression and zero-inflated
models. Journal of Family Psychology, 21(4), 726-735.
Beaujean, A. A., & Grant, M. B. (2019). Tutorial on using regression
models with count outcomes using R. Practical Assessment,
Research, and Evaluation: Vol. 21, Article 2.
Coxe, S., West, S.G., & Aiken, L.S. (2009). The analysis of count data:
A gentle introduction to Poisson regression and its alternatives.
Journal of Personality Assessment, 91, 121-136.
Dunn, P. K., & Smyth, G. K. (2018). Generalized linear models
with examples in R. Springer.
Hardin, J. W., & Hilbe, J. M. (2007). Generalized linear models
and extensions. Stata Press.
Muth, C., Oravecz, Z., & Gabry, J. (2018). User-friendly Bayesian regression
modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods
for Psychology, 14(2), 99-119.
https://doi.org/10.20982/tqmp.14.2.p099
Orme, J. G., & Combs-Orme, T. (2009). Multiple regression with discrete
dependent variables. Oxford University Press.
Rindskopf, D. (2023). Generalized linear models. In H. Cooper, M. N.
Coutanche, L. M. McMullen, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.),
APA handbook of research methods in psychology: Data analysis and
research publication, (2nd ed., pp. 201-218). American Psychological Association.
Zeileis, A., Kleiber, C., & Jackman, S. (2008). Regression Models for Count Data in R.
Journal of Statistical Software, 27(8). https://www.jstatsoft.org/v27/i08/.
COUNT_REGRESSION(data=data_Kremelburg_2011, DV='OVRJOYED',
                 forced=c('AGE','EDUC','REALRINC','SEX_factor'))

COUNT_REGRESSION(data=data_Kremelburg_2011, DV='OVRJOYED',
                 forced=c('AGE','EDUC','REALRINC','SEX_factor'), family = 'negbin')

# negative binomial regression
COUNT_REGRESSION(data=data_Kremelburg_2011, DV='HURTATWK',
                 forced=c('AGE','EDUC','REALRINC','SEX_factor'), family = 'negbin',
                 plot_type = 'diagnostics')

# with an offset variable
COUNT_REGRESSION(data=data_Orme_2009_5, DV='NumberAdopted', forced=c('Married'),
                 offset='lnYearsFostered')

# zero-inflated poisson regression
COUNT_REGRESSION(data=data_Kremelburg_2011, DV='HURTATWK',
                 forced=c('AGE','EDUC','REALRINC','SEX_factor'), family = 'zinfl_poisson',
                 plot_type = 'diagnostics')

# zero-inflated negative binomial regression
COUNT_REGRESSION(data=data_Kremelburg_2011, DV='HURTATWK',
                 forced=c('AGE','EDUC','REALRINC','SEX_factor'), family = 'zinfl_negbin',
                 plot_type = 'diagnostics')
Multilevel moderated regression data from Bauer and Curran (2005).
data(data_Bauer_Curran_2005)
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40(3), 373-400.
head(data_Bauer_Curran_2005)

HSBmod <- nlme::lme(MathAch ~ Sector + CSES + CSES:Sector,
                    data = data_Bauer_Curran_2005,
                    random = ~1 + CSES|School, method = "ML")
summary(HSBmod)

REGIONS_OF_SIGNIFICANCE(model=HSBmod,
        plot_title='Johnson-Neyman Regions of Significance',
        Xaxis_label='Child SES',
        Yaxis_label='Slopes of School Sector on Math achievement')
Moderated regression data used by Bodner (2016) to illustrate the tumble graphs method of plotting interactions. The data were also used by Bauer and Curran (2005).
data(data_Bodner_2016)
Bodner, T. E. (2016). Tumble Graphs: Avoiding misleading end point
extrapolation when graphing interactions from a moderated multiple
regression analysis.
Journal of Educational and Behavioral Statistics, 41(6), 593-604.
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and
multilevel regression: Inferential and graphical techniques.
Multivariate Behavioral Research, 40(3), 373-400.
head(data_Bodner_2016)

# replicates p 599 of Bodner (2016)
MODERATED_REGRESSION(data=data_Bodner_2016, DV='math90', IV='Anti90', IV_range='tumble',
                     MOD='Hyper90', MOD_levels='quantiles',
                     quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                     COVARS=c('age90month','female','grade90','minority'),
                     center = FALSE, plot_type = 'interaction')
Moderated regression data from Chapman and Little (2016).
data(data_Chapman_Little_2016)
Chapman, D. A., & Little, B. (2016). Climate change and disasters: How framing affects justifications for giving or withholding aid to disaster victims. Social Psychological and Personality Science, 7, 13-20.
head(data_Chapman_Little_2016)

# the data used by Hayes (2018, Introduction to Mediation, Moderation, and
# Conditional Process Analysis: A Regression-Based Approach), replicating p. 239
MODERATED_REGRESSION(data=data_Chapman_Little_2016, DV='justify', IV='frame',
                     IV_range='tumble', MOD='skeptic', MOD_levels='AikenWest',
                     quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                     center = FALSE, plot_type = 'regions')
Moderated regression data for a continuous predictor and a continuous moderator from Cohen, Cohen, West, & Aiken (2003, Chapter 7).
data(data_Cohen_Aiken_West_2003_7)
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates.
head(data_Cohen_Aiken_West_2003_7)

# replicates p 276 of Chapter 7 of Cohen, Cohen, West, & Aiken (2003)
MODERATED_REGRESSION(data=data_Cohen_Aiken_West_2003_7, DV='yendu', IV='xage',
                     IV_range='tumble', MOD='zexer', MOD_levels='AikenWest',
                     quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                     center = TRUE, plot_type = 'regions')
Moderated regression data for a continuous predictor and a categorical moderator from Cohen, Cohen, West, & Aiken (2003, Chapter 9).
data(data_Cohen_Aiken_West_2003_9)
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates.
head(data_Cohen_Aiken_West_2003_9)

# replicates p 376 of Chapter 9 of Cohen, Cohen, West, & Aiken (2003)
MODERATED_REGRESSION(data=data_Cohen_Aiken_West_2003_9, DV='SALARY', IV='PUB',
                     IV_range='tumble', MOD='DEPART_f', MOD_type = 'factor',
                     MOD_levels='AikenWest',
                     quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                     center = TRUE, plot_type = 'regions')
Multiple regression data from Green and Salkind (2014).
data(data_Green_Salkind_2014)
Green, S. B., & Salkind, N. J. (2014). Lesson 34: Multiple linear regression (pp. 257-269). In Using SPSS for Windows and Macintosh: Analyzing and understanding data. New York, NY: Pearson.
head(data_Green_Salkind_2014)

# forced (simultaneous) entry; replicating the output on p. 263
OLS_REGRESSION(data=data_Green_Salkind_2014, DV='injury',
               forced=c('quads','gluts','abdoms','arms','grip'))

# hierarchical entry; replicating the output on p. 265-266
OLS_REGRESSION(data=data_Green_Salkind_2014, DV='injury',
               hierarchical = list( step1=c('quads','gluts','abdoms'),
                                    step2=c('arms','grip')) )
Logistic regression data from Halvorson et al. (2022, p. 291).
data(data_Halvorson_2022_log)
Halvorson, M. A., McCabe, C. J., Kim, D. S., Cao, X., & King, K. M. (2022). Making sense of some odd ratios: A tutorial and improvements to present practices in reporting and visualizing quantities of interest for binary and count outcome models. Psychology of Addictive Behaviors, 36(3), 284-295.
head(data_Halvorson_2022_log)

log_Halvorson <- LOGISTIC_REGRESSION(data=data_Halvorson_2022_log, DV='Y',
                                     forced=c('x1','x2'), plot_type = 'diagnostics')

# high & low values for x2
x2_high <- mean(data_Halvorson_2022_log$x1) + sd(data_Halvorson_2022_log$x1)
x2_low  <- mean(data_Halvorson_2022_log$x1) - sd(data_Halvorson_2022_log$x1)

PLOT_MODEL(model = log_Halvorson,
           IV_focal_1 = 'x1',
           IV_focal_2 = 'x2', IV_focal_2_values = c(x2_low, x2_high),
           bootstrap=FALSE, N_sims=1000, CI_level=95,
           ylim = c(0, 1),
           xlab = 'x1',
           ylab = 'Expected Probability',
           title = 'Probability of Y by x1 and x2 for Simulated Data Example')
Poisson regression data from Halvorson et al. (2022, p. 293).
data(data_Halvorson_2022_pois)
Halvorson, M. A., McCabe, C. J., Kim, D. S., Cao, X., & King, K. M. (2022). Making sense of some odd ratios: A tutorial and improvements to present practices in reporting and visualizing quantities of interest for binary and count outcome models. Psychology of Addictive Behaviors, 36(3), 284-295.
head(data_Halvorson_2022_pois)

# replicating Table 3, p 293
pois_Halvorson <- COUNT_REGRESSION(data=data_Halvorson_2022_pois, DV='Neg_OH_conseqs',
                      forced=c('Gender_factor','Positive_Urgency_new','Planning',
                               'Sensation_seeking'),
                      plot_type = 'diagnostics')

# replicating Figure 4, p 294
PLOT_MODEL(model = pois_Halvorson,
           IV_focal_1 = 'Positive_Urgency_new',
           IV_focal_2 = 'Gender_factor',
           bootstrap=FALSE, N_sims=1000, CI_level=95,
           ylim = c(0, 20),
           xlab = 'Positive Urgency',
           ylab = 'Expected Count of Alcohol Consequences',
           title = 'Expected Count of Alcohol Consequences by Positive Urgency and Gender')
Moderated regression data for a continuous predictor and a dichotomous moderator from Huitema (2011, p. 253).
data(data_Huitema_2011)
Huitema, B. (2011). The analysis of covariance and alternatives: Statistical methods for experiments, quasi-experiments, and single-case studies. Hoboken, NJ: Wiley.
head(data_Huitema_2011)

# replicating results on p. 255 for the Johnson-Neyman technique for a categorical moderator
MODERATED_REGRESSION(data=data_Huitema_2011, DV='Y', IV='X', IV_range='tumble',
                     MOD='D', MOD_type = 'factor', center = FALSE,
                     plot_type = 'interaction', JN_type = 'Huitema')
Logistic and Poisson regression data from Kremelburg (2011).
data(data_Kremelburg_2011)
Kremelburg, D. (2011). Chapter 6: Logistic, ordered, multinomial, negative binomial, and Poisson regression. Practical statistics: A quick and easy guide to IBM SPSS Statistics, STATA, and other statistical software. Sage.
head(data_Kremelburg_2011)

LOGISTIC_REGRESSION(data = data_Kremelburg_2011, DV='OCCTRAIN',
                    hierarchical=list( step1=c('AGE'), step2=c('EDUC','REALRINC')) )

COUNT_REGRESSION(data=data_Kremelburg_2011, DV='OVRJOYED',
                 forced=c('AGE','EDUC','REALRINC','SEX_factor'))
Moderated regression data from Lorah and Wong (2018).
data(data_Lorah_Wong_2018)
Lorah, J. A. & Wong, Y. J. (2018). Contemporary applications of moderation analysis in counseling psychology. Journal of Counseling Psychology, 65(5), 629-640.
head(data_Lorah_Wong_2018)

model_Lorah <- MODERATED_REGRESSION(data=data_Lorah_Wong_2018, DV='suicidal',
                                    IV='burden', IV_range='tumble',
                                    MOD='belong_thwarted', MOD_levels='quantiles',
                                    quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                                    COVARS='depression',
                                    center = TRUE, plot_type = 'regions')

REGIONS_OF_SIGNIFICANCE(model=model_Lorah,
        plot_title='Johnson-Neyman Regions of Significance',
        Xaxis_label='Thwarted Belongingness',
        Yaxis_label='Slopes of Burdensomeness on Suicidal Ideation',
        legend_label=NULL)
Logistic regression data from Meyers et al. (2013).
data(data_Meyers_2013)
Meyers, L. S., Gamst, G. C., & Guarino, A. J. (2013). Chapter 30: Binary logistic regression. Performing data analysis using IBM SPSS. Hoboken, NJ: Wiley.
head(data_Meyers_2013)

LOGISTIC_REGRESSION(data= data_Meyers_2013, DV='graduated',
                    forced= c('sex','family_encouragement'))
Moderated regression data from O'Connor and Dvorak (2001).
A data frame with scores for 131 male adolescents on resiliency, maternal harshness, and aggressive behavior. The data are from O'Connor and Dvorak (2001, p. 17) and are provided as trial moderated regression data for the MODERATED_REGRESSION and REGIONS_OF_SIGNIFICANCE functions.
O'Connor, B. P., & Dvorak, T. (2001). Conditional associations between parental behavior and adolescent problems: A search for personality-environment interactions. Journal of Research in Personality, 35, 1-26.
head(data_OConnor_Dvorak_2001)

mharsh_agg <- MODERATED_REGRESSION(data=data_OConnor_Dvorak_2001,
                                   DV='Aggressive_Behavior',
                                   IV='Maternal_Harshness', IV_range=c(1,7.7),
                                   MOD='Resiliency', MOD_levels='AikenWest',
                                   quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                                   center = FALSE, plot_type = 'interaction',
                                   DV_range = c(1,6),
                                   Xaxis_label='Maternal Harshness',
                                   Yaxis_label='Adolescent Aggressive Behavior',
                                   legend_label='Resiliency')

REGIONS_OF_SIGNIFICANCE(model=mharsh_agg,
        plot_title='Slopes of Maternal Harshness on Aggression by Resiliency',
        Xaxis_label='Resiliency',
        Yaxis_label='Slopes of Maternal Harshness on Aggressive Behavior')
Logistic regression data from Orme and Combs-Orme (2009), Chapter 2.
data(data_Orme_2009_2)
Orme, J. G., & Combs-Orme, T. (2009). Multiple Regression With Discrete Dependent Variables. Oxford University Press.
LOGISTIC_REGRESSION(data = data_Orme_2009_2, DV='ContinueFostering', forced= c('zResources', 'Married'))
Data for count regression from Orme and Combs-Orme (2009), Chapter 5.
data(data_Orme_2009_5)
Orme, J. G., & Combs-Orme, T. (2009). Multiple Regression With Discrete Dependent Variables. Oxford University Press.
COUNT_REGRESSION(data=data_Orme_2009_5, DV='NumberAdopted', forced=c('Married','zParentRole'))
Moderated regression data for a continuous predictor and a dichotomous moderator from Pedhazur (1997, p. 588).
data(data_Pedhazur_1997)
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction. (3rd ed.). Fort Worth, Texas: Wadsworth Thomson Learning.
head(data_Pedhazur_1997)

# replicating results on p. 594 for the Johnson-Neyman technique for a categorical moderator
MODERATED_REGRESSION(data=data_Pedhazur_1997, DV='Y', IV='X', IV_range='tumble',
                     MOD='Directive', MOD_type = 'factor', MOD_levels='quantiles',
                     quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                     center = FALSE, plot_type = 'interaction', JN_type = 'Pedhazur')
Logistic regression data from Pituch and Stevens (2016), Chapter 11.
data(data_Pituch_Stevens_2016)
Pituch, K. A., & Stevens, J. P. (2016). Applied multivariate statistics for the social sciences: Analyses with SAS and IBM's SPSS (6th ed.). Routledge.
LOGISTIC_REGRESSION(data = data_Pituch_Stevens_2016, DV='Health', forced= c('Treatment','Motivation'))
Logistic regression analyses with SPSS- and SAS-like output. The output includes model summaries, classification tables, omnibus tests of model coefficients, the model coefficients, likelihood ratio tests for the predictors, overdispersion tests, model effect sizes, the correlation matrix for the model coefficients, collinearity statistics, and casewise regression diagnostics.
LOGISTIC_REGRESSION(data, DV, forced = NULL, hierarchical = NULL,
                    ref_category = NULL, family = 'binomial',
                    plot_type = 'residuals', CI_level = 95,
                    MCMC = FALSE, Nsamples = 4000, verbose = TRUE)
data |
A dataframe where the rows are cases and the columns are the variables. |
DV |
The name of the dependent variable.
|
forced |
(optional) A vector of the names of the predictor variables for a forced/simultaneous
entry regression. The variables can be numeric or factors.
|
hierarchical |
(optional) A list with the names of the predictor variables for each step of a
hierarchical regression. The variables can be numeric or factors.
|
ref_category |
(optional) The reference category for DV.
|
family |
(optional) The name of the error distribution to be used in the model. The options are 'binomial' (the default) and 'quasibinomial'.
Example: family = 'quasibinomial' |
plot_type |
(optional) The kind of plots, if any. The options include 'residuals' (the default) and 'diagnostics'.
Example: plot_type = 'diagnostics' |
CI_level |
(optional) The confidence interval for the output, in whole numbers. The default is 95. |
MCMC |
(logical) Should Bayesian MCMC analyses be conducted? The default is FALSE. |
Nsamples |
(optional) The number of samples for MCMC analyses. The default is 4000. |
verbose |
(optional) Should detailed results be displayed in console? |
This function uses the glm function from the stats package and supplements the output with additional statistics, formatted to resemble SPSS and SAS output. The predictor variables can be numeric or factors.
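Because the fitting is delegated to the glm function, the basic coefficients can be cross-checked against a plain glm call; a minimal sketch using the data_Meyers_2013 variables from the Examples below (a comparison point only, not a replacement for the function's formatted output):

glm_fit <- stats::glm(graduated ~ sex + family_encouragement,
                      family = binomial, data = data_Meyers_2013)
summary(glm_fit)
exp(cbind(OR = coef(glm_fit), confint.default(glm_fit)))  # odds ratios with Wald CIs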
Predicted values for this model, for selected levels of the predictor variables, can be produced and plotted using the PLOT_MODEL function in this package.
The Bayesian MCMC analyses can be time-consuming for larger datasets. The MCMC analyses are conducted using functions, and their default settings, from the rstanarm package (Goodrich, Gabry, Ali, & Brilleman, 2024). The MCMC results can be verified using the model checking functions in the rstanarm package (e.g., Muth, Oravecz, & Gabry, 2018).
Good sources for interpreting logistic regression residuals and diagnostics plots are listed in the References below.
An object of class "LOGISTIC_REGRESSION". The object is a list containing the following possible components:
modelMAIN |
All of the glm function output for the regression model. |
modelMAINsum |
All of the summary.glm function output for the regression model. |
modeldata |
All of the predictor and outcome raw data that were used in the model, along with regression diagnostic statistics for each case. |
collin_diags |
Collinearity diagnostic coefficients for models without interaction terms. |
Brian P. O'Connor
Dunn, P. K., & Smyth, G. K. (2018). Generalized linear models
with examples in R. Springer.
Field, A., Miles, J., & Field, Z. (2012).
Discovering statistics using R. Los Angeles, CA: Sage.
Goodrich, B., Gabry, J., Ali, I., & Brilleman, S. (2024). rstanarm:
Bayesian applied regression modeling via Stan. R package version 2.32.1,
https://mc-stan.org/rstanarm/.
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2014).
Multivariate data analysis, (8th ed.).
Lawrence Erlbaum Associates.
Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013).
Applied logistic regression (3rd ed.). John Wiley & Sons.
Muth, C., Oravecz, Z., & Gabry, J. (2018). User-friendly Bayesian regression
modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods
for Psychology, 14(2), 99-119.
https://doi.org/10.20982/tqmp.14.2.p099
Orme, J. G., & Combs-Orme, T. (2009). Multiple regression with discrete
dependent variables. Oxford University Press.
Pituch, K. A., & Stevens, J. P. (2016).
Applied multivariate statistics for the social sciences: Analyses with
SAS and IBM's SPSS, (6th ed.). Routledge.
Rindskopf, D. (2023). Generalized linear models. In H. Cooper, M. N.
Coutanche, L. M. McMullen, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.),
APA handbook of research methods in psychology: Data analysis and
research publication, (2nd ed., pp. 201-218). American Psychological Association.
# forced (simultaneous) entry
LOGISTIC_REGRESSION(data = data_Meyers_2013, DV='graduated',
                    forced=c('sex','family_encouragement'),
                    plot_type = 'diagnostics')

# hierarchical entry, and using family = "quasibinomial"
LOGISTIC_REGRESSION(data = data_Kremelburg_2011, DV='OCCTRAIN',
                    hierarchical=list( step1=c('AGE'), step2=c('EDUC','REALRINC')),
                    family = "quasibinomial")
Conducts moderated regression analyses for two-way interactions with extensive options for interaction plots, including Johnson-Neyman regions of significance. The output includes the Anova Table (Type III tests), standardized coefficients, partial and semi-partial correlations, collinearity statistics, casewise regression diagnostics, plots of residuals and regression diagnostics, and detailed information about simple slopes. The output includes Bayes Factors and, if requested, regression coefficients from Bayesian Markov Chain Monte Carlo (MCMC) analyses.
MODERATED_REGRESSION(data, DV, IV, MOD,
                     IV_type = 'numeric', IV_range = 'tumble',
                     MOD_type = 'numeric', MOD_levels = 'quantiles', MOD_range = NULL,
                     quantiles_IV = c(.1, .9), quantiles_MOD = c(.25, .5, .75),
                     COVARS = NULL,
                     center = TRUE, CI_level = 95,
                     MCMC = FALSE, Nsamples = 10000,
                     plot_type = 'residuals', plot_title = NULL, DV_range = NULL,
                     Xaxis_label = NULL, Yaxis_label = NULL, legend_label = NULL,
                     JN_type = 'Huitema', verbose = TRUE)
data |
A dataframe where the rows are cases and the columns are the variables. |
DV |
The name of the dependent variable.
|
IV |
The name of the independent variable.
|
MOD |
The name of the moderator variable.
|
IV_type |
(optional) The type of independent variable. The
options are 'numeric' (the default) or 'factor'.
|
IV_range |
(optional) The independent variable range for a moderated regression plot. The options are 'tumble' (the default, for tumble graphs; Bodner, 2016), 'AikenWest', or a vector of two user-provided values (e.g., IV_range = c(1, 7.7)).
Example: IV_range = 'AikenWest' |
MOD_type |
(optional) The type of moderator variable. The
options are 'numeric' (the default) or 'factor'.
|
MOD_levels |
(optional) The levels of the moderator variable to be used if MOD is continuous. The options are 'quantiles' (the default, using the quantiles_MOD values), 'AikenWest' (the mean of MOD and one SD below and above the mean; a numeric sketch of this convention appears after this arguments list), or a vector of user-provided values.
Example: MOD_levels = c(1, 10) |
MOD_range |
(optional) The range of the MOD values to be used in the Johnson-Neyman regions
of significance analyses. The options are:
NULL (the default), in which case the minimum and maximum MOD values will be used; and
a vector of two user-provided values.
|
quantiles_IV |
(optional) The quantiles of the independent variable to be used as the IV range for
a moderated regression plot.
|
quantiles_MOD |
(optional) The quantiles of the moderator variable to be used as the MOD simple slope
values in the moderated regression analyses.
|
COVARS |
(optional) The name(s) of possible covariates.
|
center |
(optional) Logical, indicating whether the IV and MOD variables should be centered
(default = TRUE).
|
CI_level |
(optional) The confidence interval for the output, in whole numbers. The default is 95. |
MCMC |
(logical) Should Bayesian MCMC analyses be conducted? The default is FALSE. |
Nsamples |
(optional) The number of samples for MCMC analyses. The default is 10000. |
plot_type |
(optional) The kind of plot, if any. The options include 'residuals' (the default), 'diagnostics', 'interaction' (for an interaction plot), and 'regions' (for a Johnson-Neyman regions of significance plot).
Example: plot_type = 'diagnostics' |
plot_title |
(optional) The plot title.
|
DV_range |
(optional) The range of Y-axis values for the plot.
|
Xaxis_label |
(optional) A label for the X axis to be used in the requested plot.
|
Yaxis_label |
(optional) A label for the Y axis to be used in the requested plot.
|
legend_label |
(optional) A legend label for the plot.
|
JN_type |
(optional) The formula to be used in computing the critical F value for the
Johnson-Neyman regions of significance analyses. The options are 'Huitema' (the default),
or 'Pedhazur'.
|
verbose |
Should detailed results be displayed in console? The options are: TRUE (default) or FALSE. If TRUE, plots of residuals are also produced. |
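As referenced for the MOD_levels argument above, the 'AikenWest' convention is commonly implemented as the moderator mean and one standard deviation below and above it. A minimal numeric sketch of the convention (an illustration only, not the function's internal code), using the data_Lorah_Wong_2018 moderator from this document:

m <- mean(data_Lorah_Wong_2018$belong_thwarted, na.rm = TRUE)
s <- sd(data_Lorah_Wong_2018$belong_thwarted, na.rm = TRUE)
c(low = m - s, mean = m, high = m + s)   # candidate simple-slope levels for MOD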
The Bayesian MCMC analyses can be time-consuming for larger datasets. The MCMC analyses are conducted using functions, and their default settings, from the BayesFactor package (Morey & Rouder, 2024). The MCMC results can be verified using the model checking functions in the rstanarm package (e.g., Muth, Oravecz, & Gabry, 2018).
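For readers who want to see the kind of model that underlies this output, here is a generic product-term sketch in plain lm, using the data_Lorah_Wong_2018 variables from the Examples below (an illustration of the standard moderated-regression setup, not necessarily the exact model matrix that MODERATED_REGRESSION constructs internally):

d <- data_Lorah_Wong_2018
d$burden_c <- d$burden - mean(d$burden, na.rm = TRUE)                    # center the IV
d$belong_c <- d$belong_thwarted - mean(d$belong_thwarted, na.rm = TRUE)  # center the MOD
lm_int <- lm(suicidal ~ depression + burden_c * belong_c, data = d)      # '*' adds the product term
summary(lm_int)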
An object of class "MODERATED_REGRESSION". The object is a list containing the following possible components:
modelMAINsum |
All of the summary.lm function output for the regression model without interaction terms. |
anova_table |
Anova Table (Type III tests). |
mainRcoefs |
Predictor coefficients for the model without interaction terms. |
modeldata |
All of the predictor and outcome raw data that were used in the model, along with regression diagnostic statistics for each case. |
collin_diags |
Collinearity diagnostic coefficients for models without interaction terms. |
modelXNsum |
Regression model statistics with interaction terms. |
RsqchXn |
Rsquared change for the interaction. |
fsquaredXN |
fsquared change for the interaction. |
xnRcoefs |
Predictor coefficients for the model with interaction terms. |
simslop |
The simple slopes. |
simslopZ |
The standardized simple slopes. |
plotdon |
The plot data for a moderated regression. |
JN.data |
The Johnson-Neyman results for a moderated regression. |
ros |
The Johnson-Neyman regions of significance for a moderated regression. |
Brian P. O'Connor
Bodner, T. E. (2016). Tumble graphs: Avoiding misleading end point extrapolation when
graphing interactions from a moderated multiple regression analysis.
Journal of Educational and Behavioral Statistics, 41, 593-604.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied
multiple regression/correlation analysis for the behavioral sciences (3rd ed.).
Lawrence Erlbaum Associates.
Darlington, R. B., & Hayes, A. F. (2017). Regression analysis and linear models:
Concepts, applications, and implementation. Guilford Press.
Hayes, A. F. (2018a). Introduction to mediation, moderation, and conditional process
analysis: A regression-based approach (2nd ed.). Guilford Press.
Hayes, A. F., & Montoya, A. K. (2016). A tutorial on testing, visualizing, and probing
an interaction involving a multicategorical variable in linear regression analysis.
Communication Methods and Measures, 11, 1-30.
Lee M. D., & Wagenmakers, E. J. (2014) Bayesian cognitive modeling: A practical
course. Cambridge University Press.
Morey, R. & Rouder, J. (2024). BayesFactor: Computation of Bayes Factors for
Common Designs. R package version 0.9.12-4.7,
https://github.com/richarddmorey/bayesfactor.
Muth, C., Oravecz, Z., & Gabry, J. (2018). User-friendly Bayesian regression
modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods
for Psychology, 14(2), 99-119.
https://doi.org/10.20982/tqmp.14.2.p099
O'Connor, B. P. (1998). All-in-one programs for exploring interactions in moderated
multiple regression. Educational and Psychological Measurement, 58, 833-837.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation
and prediction. (3rd ed.). Wadsworth Thomson Learning.
# moderated regression -- with IV_range = 'AikenWest'
MODERATED_REGRESSION(data=data_Lorah_Wong_2018, DV='suicidal', IV='burden',
                     MOD='belong_thwarted', IV_range='AikenWest',
                     MOD_levels='quantiles',
                     quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                     center = TRUE, COVARS='depression',
                     plot_type = 'interaction', plot_title=NULL, DV_range = c(1,1.25))

# moderated regression -- with IV_range = 'tumble'
MODERATED_REGRESSION(data=data_Lorah_Wong_2018, DV='suicidal', IV='burden',
                     MOD='belong_thwarted', IV_range='tumble',
                     MOD_levels='quantiles',
                     quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                     center = TRUE, COVARS='depression',
                     plot_type = 'interaction', plot_title=NULL, DV_range = c(1,1.25))

# moderated regression -- with numeric values for IV_range & MOD_levels='AikenWest'
MODERATED_REGRESSION(data=data_OConnor_Dvorak_2001, DV='Aggressive_Behavior',
                     IV='Maternal_Harshness', MOD='Resiliency',
                     IV_range=c(1,7.7),
                     MOD_levels='AikenWest', MOD_range=NULL,
                     quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                     center = FALSE, plot_type = 'interaction', DV_range = c(1,6),
                     Xaxis_label='Maternal Harshness',
                     Yaxis_label='Adolescent Aggressive Behavior',
                     legend_label='Resiliency')
Provides SPSS- and SAS-like output for ordinary least squares simultaneous entry regression and hierarchical entry regression. The output includes the Anova Table (Type III tests), standardized coefficients, partial and semi-partial correlations, collinearity statistics, casewise regression diagnostics, plots of residuals and regression diagnostics. The output includes Bayes Factors and, if requested, regression coefficients from Bayesian Markov Chain Monte Carlo (MCMC) analyses.
OLS_REGRESSION(data, DV, forced=NULL, hierarchical=NULL, COVARS=NULL,
               plot_type = 'residuals', CI_level = 95,
               MCMC = FALSE, Nsamples = 10000, verbose=TRUE, ...)
data |
A dataframe where the rows are cases and the columns are the variables. |
DV |
The name of the dependent variable.
|
forced |
(optional) A vector of the names of the predictor variables for a forced/simultaneous
entry regression. The variables can be numeric or factors.
|
hierarchical |
(optional) A list with the names of the predictor variables for each step of
a hierarchical regression. The variables can be numeric or factors.
|
COVARS |
(optional) The name(s) of possible covariate variables.
|
plot_type |
(optional) The kind of plots, if any. The options include 'residuals' (the default) and 'diagnostics'.
Example: plot_type = 'diagnostics' |
CI_level |
(optional) The confidence interval for the output, in whole numbers. The default is 95. |
MCMC |
(logical) Should Bayesian MCMC analyses be conducted? The default is FALSE. |
Nsamples |
(optional) The number of samples for MCMC analyses. The default is 10000. |
verbose |
Should detailed results be displayed in console? The options are: TRUE (default) or FALSE. If TRUE, plots of residuals are also produced. |
... |
(dots, for internal purposes only at this time.) |
This function uses the lm function from the stats package, supplements the output with additional statistics, and formats the output so that it resembles SPSS and SAS regression output. The predictor variables can be numeric or factors.
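Because the fitting is delegated to the lm function, the basic coefficients can be cross-checked against a plain lm call; a minimal sketch using the data_Green_Salkind_2014 variables from the Examples below:

lm_fit <- stats::lm(injury ~ quads + gluts + abdoms + arms + grip,
                    data = data_Green_Salkind_2014)
summary(lm_fit)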
The Bayesian MCMC analyses can be time-consuming for larger datasets. The MCMC analyses are conducted using functions, and their default settings, from the BayesFactor package (Morey & Rouder, 2024). The MCMC results can be verified using the model checking functions in the rstanarm package (e.g., Muth, Oravecz, & Gabry, 2018).
Good sources for interpreting residuals and diagnostics plots are listed in the References below.
An object of class "OLS_REGRESSION". The object is a list containing the following possible components:
modelMAIN |
All of the lm function output for the regression model without interaction terms. |
modelMAINsum |
All of the summary.lm function output for the regression model without interaction terms. |
anova_table |
Anova Table (Type III tests). |
mainRcoefs |
Predictor coefficients for the model without interaction terms. |
modeldata |
All of the predictor and outcome raw data that were used in the model, along with regression diagnostic statistics for each case. |
collin_diags |
Collinearity diagnostic coefficients for models without interaction terms. |
Brian P. O'Connor
Bodner, T. E. (2016). Tumble graphs: Avoiding misleading end point extrapolation when
graphing interactions from a moderated multiple regression analysis.
Journal of Educational and Behavioral Statistics, 41, 593-604.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied
multiple regression/correlation analysis for the behavioral sciences (3rd ed.).
Lawrence Erlbaum Associates.
Darlington, R. B., & Hayes, A. F. (2017). Regression analysis and linear models:
Concepts, applications, and implementation. Guilford Press.
Hayes, A. F. (2018a). Introduction to mediation, moderation, and conditional process
analysis: A regression-based approach (2nd ed.). Guilford Press.
Hayes, A. F., & Montoya, A. K. (2016). A tutorial on testing, visualizing, and probing
an interaction involving a multicategorical variable in linear regression analysis.
Communication Methods and Measures, 11, 1-30.
Lee M. D., & Wagenmakers, E. J. (2014) Bayesian cognitive modeling: A practical
course. Cambridge University Press.
Morey, R. & Rouder, J. (2024). BayesFactor: Computation of Bayes Factors for
Common Designs. R package version 0.9.12-4.7,
https://github.com/richarddmorey/bayesfactor.
Muth, C., Oravecz, Z., & Gabry, J. (2018). User-friendly Bayesian regression
modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods
for Psychology, 14(2), 99-119.
https://doi.org/10.20982/tqmp.14.2.p099
O'Connor, B. P. (1998). All-in-one programs for exploring interactions in moderated
multiple regression. Educational and Psychological Measurement, 58, 833-837.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation
and prediction. (3rd ed.). Wadsworth Thomson Learning.
# forced (simultaneous) entry
head(data_Green_Salkind_2014)

OLS_REGRESSION(data=data_Green_Salkind_2014, DV='injury',
               forced = c('quads','gluts','abdoms','arms','grip'))

# hierarchical entry
OLS_REGRESSION(data=data_Green_Salkind_2014, DV='injury',
               hierarchical = list( step1=c('quads','gluts','abdoms'),
                                    step2=c('arms','grip')) )
Produces standardized regression coefficients, partial correlations, and semi-partial correlations for a correlation matrix in which one variable is a dependent or outcome variable and the other variables are independent or predictor variables.
PARTIAL_COEFS(cormat, modelRsq=NULL, verbose=TRUE)
cormat |
A correlation matrix. The DV (the dependent or outcome variable) must be in
the first row/column of cormat.
|
modelRsq |
(optional) The model Rsquared, which makes the computations slightly faster
when it is available.
|
verbose |
Should detailed results be displayed in console? |
A data.frame containing the standardized regression coefficients (betas), the Pearson correlations, the partial correlations, and the semi-partial correlations for each variable with the DV.
Brian P. O'Connor
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates.
PARTIAL_COEFS(cormat = cor(data_Green_Salkind_2014))
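When a fitted model is already available, its R-squared can be passed through modelRsq to speed the computations slightly; a minimal sketch, assuming (as the example above implies) that the DV 'injury' is the first column of data_Green_Salkind_2014:

fit <- lm(injury ~ ., data = data_Green_Salkind_2014)   # regress the DV on all other columns
PARTIAL_COEFS(cormat = cor(data_Green_Salkind_2014),
              modelRsq = summary(fit)$r.squared)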
Plots predicted values of the outcome variable for specified levels of predictor variables for OLS_REGRESSION, MODERATED_REGRESSION, LOGISTIC_REGRESSION, and COUNT_REGRESSION models from this package.
PLOT_MODEL(model, IV_focal_1, IV_focal_1_values=NULL,
           IV_focal_2=NULL, IV_focal_2_values=NULL,
           IVs_nonfocal_values = NULL,
           bootstrap=FALSE, N_sims=100, CI_level=95,
           xlim=NULL, xlab=NULL, ylim=NULL, ylab=NULL,
           title = NULL, verbose=TRUE)
model |
The returned output from the OLS_REGRESSION, MODERATED_REGRESSION, LOGISTIC_REGRESSION, or COUNT_REGRESSION functions in this package. |
IV_focal_1 |
The name of the focal, varying predictor variable.
|
IV_focal_1_values |
(optional) Values for IV_focal_1, for which predictions of the
outcome will be produced and plotted.
IV_focal_1_values will appear on the x-axis in the plot.
If IV_focal_1 is numeric and IV_focal_1_values is not provided,
then a sequence based on the range of the model data values for IV_focal_1 will be used.
If IV_focal_1 is a factor & IV_focal_1_values is not provided, then the
factor levels from the model data values for IV_focal_1 will be used.
|
IV_focal_2 |
(optional) If desired, the name of a second focal predictor variable for the plot.
|
IV_focal_2_values |
(optional) Values for IV_focal_2 for which predictions of the
outcome will be produced and plotted.
If IV_focal_2 is numeric and IV_focal_2_values is not provided, then
the following three values for IV_focal_2_values, derived from the model data,
will be used for plotting: the mean, one SD below the mean, and one SD above the mean.
If IV_focal_2 is a factor & IV_focal_2_values is not provided, then the
factor levels from the model data values for IV_focal_2 will be used.
|
IVs_nonfocal_values |
(optional) A list with the desired constant values for the non focal predictors,
if any. If IVs_nonfocal_values is not provided, then the mean values of numeric non focal
predictors and the baseline values of factors will be used as the defaults.
It is also possible to specify values for only some of the IVs_nonfocal variables
on this argument.
|
bootstrap |
(optional) Should bootstrapping be used for the confidence intervals? The options are TRUE or FALSE (the default). |
N_sims |
(optional) The number of bootstrap simulations.
|
CI_level |
(optional) The desired confidence interval, in whole numbers.
|
xlim |
(optional) The x-axis limits for the plot.
|
xlab |
(optional) A x-axis label for the plot.
|
ylim |
(optional) The y-axis limits for the plot.
|
ylab |
(optional) A y-axis label for the plot.
|
title |
(optional) A title for the plot.
|
verbose |
Should detailed results be displayed in console? |
Plots predicted values of the outcome variable for specified levels of predictor variables for OLS_REGRESSION, MODERATED_REGRESSION, LOGISTIC_REGRESSION, and COUNT_REGRESSION models from this package.
A plot with both IV_focal_1 and IV_focal_2 predictor variables will look like an interaction plot, but it is a true interaction plot only if the required product term(s) were entered as predictors when the model was created.
A matrix with the levels of the variables that were used for the plot along with the predicted values, confidence intervals, and se.fit values.
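Because the function returns the plotted levels and predicted values as described above, the result can be captured for further use; a minimal sketch using the logmod_Meyers model created in the Examples below:

plot_dat <- PLOT_MODEL(model = logmod_Meyers, IV_focal_1 = 'family_encouragement')
head(plot_dat)   # levels used for the plot, predicted values, confidence intervals, se.fit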
Brian P. O'Connor
ols_GS <- OLS_REGRESSION(data=data_Green_Salkind_2014, DV='injury',
                         hierarchical = list( step1=c('age','quads','gluts','abdoms'),
                                              step2=c('arms','grip')) )

PLOT_MODEL(model = ols_GS,
           IV_focal_1 = 'gluts', IV_focal_1_values=NULL,
           IV_focal_2='age', IV_focal_2_values=NULL,
           IVs_nonfocal_values = NULL,
           bootstrap=TRUE, N_sims=100, CI_level=95,
           ylim=NULL, ylab=NULL, title=NULL, verbose=TRUE)

ols_LW <- MODERATED_REGRESSION(data=data_Lorah_Wong_2018, DV='suicidal', IV='burden',
                               MOD='belong_thwarted', IV_range='tumble',
                               MOD_levels='quantiles',
                               quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                               COVARS='depression',
                               plot_type = 'interaction', DV_range = c(1,1.25))

PLOT_MODEL(model = ols_LW,
           IV_focal_1 = 'burden', IV_focal_1_values=NULL,
           IV_focal_2='belong_thwarted', IV_focal_2_values=NULL,
           bootstrap=TRUE, N_sims=100, CI_level=95)

logmod_Meyers <- LOGISTIC_REGRESSION(data= data_Meyers_2013, DV='graduated',
                                     forced= c('sex','family_encouragement') )

PLOT_MODEL(model = logmod_Meyers,
           IV_focal_1 = 'family_encouragement', IV_focal_1_values=NULL,
           IV_focal_2=NULL, IV_focal_2_values=NULL,
           bootstrap=FALSE, N_sims=100, CI_level=95)

pois_Krem <- COUNT_REGRESSION(data=data_Kremelburg_2011, DV='OVRJOYED', forced=NULL,
                              hierarchical= list( step1=c('AGE','SEX_factor'),
                                                  step2=c('EDUC','REALRINC','DEGREE')) )

PLOT_MODEL(model = pois_Krem,
           IV_focal_1 = 'AGE',
           IV_focal_2='DEGREE',
           IVs_nonfocal_values = list( EDUC = 5, SEX_factor = '2'),
           bootstrap=FALSE, N_sims=100, CI_level=95)
Plots of Johnson-Neyman regions of significance for interactions in moderated multiple regression, for both MODERATED_REGRESSION models (which are produced by this package) and for lme models (from the nlme package).
REGIONS_OF_SIGNIFICANCE(model, IV_range=NULL, MOD_range=NULL,
                        plot_title=NULL, Xaxis_label=NULL, Yaxis_label=NULL,
                        legend_label=NULL, names_IV_MOD=NULL)
model |
The name of a MODERATED_REGRESSION model, or of an lme model from the nlme package. |
IV_range |
(optional) The range of the IV to be used in the plot.
|
MOD_range |
(optional) The range of the MOD values to be used in the plot.
|
plot_title |
(optional) The plot title.
|
Xaxis_label |
(optional) A label for the X axis to be used in the plot.
|
Yaxis_label |
(optional) A label for the Y axis to be used in the plot.
|
legend_label |
(optional) The legend label.
|
names_IV_MOD |
(optional) For lme/nlme models only. Use this argument to ensure that the IV and MOD variables are correctly identified for the plot. There are three scenarios in particular that may require specification of this argument (a brief sketch for an lme model appears after this arguments list).
Example: names_IV_MOD = c('IV name', 'MOD name') |
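A minimal sketch of supplying names_IV_MOD for an lme model, using the HSBmod model from the Examples below; the ordering assumes that Sector is treated as the IV and CSES as the moderator, as in that example's plot labels:

REGIONS_OF_SIGNIFICANCE(model = HSBmod,
                        names_IV_MOD = c('Sector', 'CSES'))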
A list with the following possible components:
JN.data |
The Johnson-Neyman results for a moderated regression. |
ros |
The Johnson-Neyman regions of significance for a moderated regression. |
Brian P. O'Connor
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel
regression: Inferential and graphical techniques. Multivariate Behavioral
Research, 40(3), 373-400.
Huitema, B. (2011). The analysis of covariance and alternatives: Statistical
methods for experiments, quasi-experiments, and single-case studies. John Wiley & Sons.
Johnson, P. O., & Neyman, J. (1936). Tests of certain linear hypotheses and their
application to some educational problems. Statistical Research Memoirs, 1, 57-93.
Johnson, P. O., & Fay, L. C. (1950). The Johnson-Neyman technique, its theory, and
application. Psychometrika, 15, 349-367.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation
and prediction. (3rd ed.). Wadsworth Thomson Learning.
Rast, P., Rush, J., Piccinin, A. M., & Hofer, S. M. (2014). The identification of regions of
significance in the effect of multimorbidity on depressive symptoms using longitudinal data: An
application of the Johnson-Neyman technique. Gerontology, 60, 274-281.
head(data_Cohen_Aiken_West_2003_7)

CAW_7 <- MODERATED_REGRESSION(data=data_Cohen_Aiken_West_2003_7, DV='yendu', IV='xage',
                              IV_range='tumble', MOD='zexer', MOD_levels='quantiles',
                              quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                              plot_type = 'interaction')

REGIONS_OF_SIGNIFICANCE(model=CAW_7)

head(data_Bauer_Curran_2005)

HSBmod <- nlme::lme(MathAch ~ Sector + CSES + CSES:Sector,
                    data = data_Bauer_Curran_2005,
                    random = ~1 + CSES|School, method = "ML")
summary(HSBmod)

REGIONS_OF_SIGNIFICANCE(model=HSBmod,
        plot_title='Johnson-Neyman Regions of Significance',
        Xaxis_label='Child SES',
        Yaxis_label='Slopes of School Sector on Math achievement')

# moderated regression -- with numeric values for IV_range & MOD_levels='AikenWest'
mharsh_agg <- MODERATED_REGRESSION(data=data_OConnor_Dvorak_2001, DV='Aggressive_Behavior',
                                   IV='Maternal_Harshness', IV_range=c(1,7.7),
                                   MOD='Resiliency', MOD_levels='AikenWest',
                                   quantiles_IV=c(.1, .9), quantiles_MOD=c(.25, .5, .75),
                                   center = FALSE, plot_type = 'interaction',
                                   DV_range = c(1,6),
                                   Xaxis_label='Maternal Harshness',
                                   Yaxis_label='Adolescent Aggressive Behavior',
                                   legend_label='Resiliency')

REGIONS_OF_SIGNIFICANCE(model=mharsh_agg,
        plot_title='Johnson-Neyman Regions of Significance',
        Xaxis_label='Resiliency',
        Yaxis_label='Slopes of Maternal Harshness on Aggressive Behavior')