# QBA
 Simple linear regression
 Analysis of the equational relationship between X and Y. Regression SS/Total SS gives R2, the share of the variation in Y that the regression explains.
 Slope, B1
 has a t distribution, which standardizes its value to test whether it is significantly different from 0. When the p-value of the slope is greater than the level of significance, the slope is not significantly different from 0, and one should expect the correlation coefficient to be close to 0.
 Y-intercept
 B0, which is not interpreted
 Correlation Coefficient
 r, indicates the nature (direction) and strength of the linear relationship between variables
 Coefficient of determination
 R2, the ratio of explained variation in Y to the total variation.
 Least Squares Criteria
 minimizes the sum of the squared vertical distances between the points and the regression line, resulting in the line of best fit.
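The criterion above can be illustrated with a minimal sketch. The data and the helper names `fit_line` and `sum_squared_errors` are assumptions made up for illustration; the formulas are the standard closed-form least squares solution for one X.

```python
def fit_line(xs, ys):
    """Return (b0, b1) of the least squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b0, b1


def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, b0, b1)

# Nudging the intercept or slope away from the fitted values
# never produces a smaller sum of squares.
nudged = [
    sum_squared_errors(xs, ys, b0 + db0, b1 + db1)
    for db0 in (-0.5, 0.0, 0.5)
    for db1 in (-0.2, 0.0, 0.2)
]
print(b0, b1, best, min(nudged))
```

For this data the fitted line is roughly y = 2.2 + 0.6x, and every nudged line has a sum of squares at least as large as the fitted one.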
 Standard Error
 will be smaller for a better predictive equation. If the observed values of y vary widely about the regression line, the standard error of the slope will be large. It is the square root of the MS residual.
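As a rough sketch of how these quantities fit together, the block below computes R2 as Regression SS/Total SS and the standard error as the square root of the MS residual. The data is hypothetical, and the residual degrees of freedom are taken as n - 2, which applies to simple linear regression with one X.

```python
import math

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
ss_regression = ss_total - ss_residual

r_squared = ss_regression / ss_total         # explained / total variation
ms_residual = ss_residual / (n - 2)          # residual mean square
standard_error = math.sqrt(ms_residual)      # standard error of the estimate
se_slope = standard_error / math.sqrt(sxx)   # standard error of the slope

print(r_squared, standard_error, se_slope)
```

A wider scatter of y about the line raises `ss_residual`, which raises both `standard_error` and `se_slope`.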
 Correlation analysis
 is the study of the nature and degree of the relationship between variables. A correlation coefficient of +1 or -1 means x and y are perfectly, linearly related. An r value of 0 indicates no linear relationship; a curved relationship can still exist.
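A small sketch of the correlation coefficient, using made-up data and a hypothetical helper named `correlation`. It shows r = +1 and r = -1 for perfectly linear data, and r = 0 for a perfectly curved pattern, since r only measures linear association.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_up = correlation([1, 2, 3], [2, 4, 6])        # perfect positive line
r_down = correlation([1, 2, 3], [6, 4, 2])      # perfect negative line
# y = x^2 on x values symmetric about 0: an exact curve with no linear trend.
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_up, r_down, r_curve)
```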
 Extrapolation
 is using Xs beyond the range of the given Xs to predict Y. This can cause large errors in prediction.
 Relationship of slope to the correlation coefficient
 signs are the same.
 Multicollinearity
 when Xs are highly correlated with each other; this gives redundant information.
 Heteroscedasticity
 non-constant variance in the residuals
 Homoscedasticity
 constant variance in the residuals
 Outliers
 atypical values in a data set (anomalies)
 Non-Linear relationships
 CURVILINEAR patterns or LOGARITHMIC relationships
 Multiple regression analysis includes
 one dependent variable and more than one independent variable
 Stepwise
 adds or removes variables one step at a time (rather than literally trying every combination) and produces the best predictors in order of their predictive power.
 Artificially inflated R-squared occurs when..
 there are too many predictors and not enough samples.
 Rule of thumb..
 you should have at least 10 times as many observations as predictor variables.
 Normal Probability plots
 should produce a nearly straight line without outliers.
 T distribution vs. F Distribution
 T is used to test the individual coefficients, whereas F tests the overall or “global” model.
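A minimal sketch of the two test statistics for a simple linear regression on hypothetical data. With only one X the two tests coincide, so F equals t squared; with several Xs they answer different questions. P-values are left out because they need a t or F table or a statistics library.

```python
import math

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

ss_residual = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ss_regression = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
ms_residual = ss_residual / (n - 2)   # n - 2 residual degrees of freedom
ms_regression = ss_regression / 1     # 1 degree of freedom for one X

t_slope = b1 / math.sqrt(ms_residual / sxx)   # tests the individual slope
f_model = ms_regression / ms_residual         # tests the overall model

print(t_slope, f_model, t_slope ** 2)
```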
 Residuals
 are the differences between the observed value of Y at a given X and the predicted value. Standardized residuals with absolute values between 2 and 3 are usually just suspicious, while those with absolute values over 3 are severe.
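The cutoffs above can be sketched as follows. This is a simplified illustration: each residual is divided by the standard error of the estimate, which is an assumption here and is not the leverage-adjusted studentized residual that statistical packages report. The data, including the planted outlier, is made up.

```python
import math


def standardized_residuals(xs, ys):
    """Residuals of the least squares line, each divided by sqrt(MS residual)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s for e in residuals]


# Hypothetical data: a line with small alternating noise and one planted outlier.
xs = list(range(1, 21))
ys = [3 + 2 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
ys[10] += 15  # outlier at x = 11

z = standardized_residuals(xs, ys)
suspicious = [x for x, v in zip(xs, z) if 2 <= abs(v) <= 3]
severe = [x for x, v in zip(xs, z) if abs(v) > 3]

print("suspicious at x =", suspicious)
print("severe at x =", severe)
```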
 Studentized Residuals
 should fall within +/-3 in order to be considered normal values.
 Transform Y and/or X when…
 any of the assumptions are violated
 In simple linear regression the use of regression lines is to …
 predict the average value of y that can be expected to occur at a given value of x.
 A high correlation between x and y..
 does NOT prove that x causes y
 Dependent variable plotted on.. independent variable plotted on…
 vertical axis; horizontal axis
 If the confidence interval on the slope contains 0…
 there is no significant relationship between x and y
 When R^2 is positive…
 you CANNOT assume that the slope is also positive.
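A short sketch of why this holds, on made-up data: R2 is r squared, so squaring discards the sign. The sign of r has to be taken from the slope.

```python
import math

# Hypothetical data with a downward trend.
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 5, 4, 2]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx                  # slope (negative here)
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient (same sign as slope)
r_squared = r ** 2              # always between 0 and 1

# Recover r from R^2 by borrowing the sign of the slope.
r_from_r_squared = math.copysign(math.sqrt(r_squared), b1)

print(b1, r, r_squared, r_from_r_squared)
```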
 The slope of the regression line represents…
 the amount of change that is expected to take place in y when x increases by one unit.
 Extrapolation
 using values beyond the range of the given Xs to predict Y
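A minimal sketch of the risk, assuming hypothetical data that actually follows y = x^2 but was only observed for x from 1 to 5. A straight line fitted to that range predicts tolerably inside it and badly outside it.

```python
# Hypothetical data: the true relationship is y = x^2, observed only for x = 1..5.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar


def predict(x):
    """Prediction from the fitted straight line."""
    return b0 + b1 * x


inside_error = abs(predict(3) - 3 ** 2)     # x = 3 is inside the observed range
outside_error = abs(predict(10) - 10 ** 2)  # x = 10 is far outside it

print(predict(3), inside_error)
print(predict(10), outside_error)
```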
 If the null hypothesis (slope = 0) is rejected…
 there is a significant relationship between x and y.
 if no correlation between two variables…
 the regression line will be horizontal
 A large value for the slope does not necessarily imply a large value for the…
 correlation coefficient.
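One way to see this, sketched on made-up data: the slope depends on the units of measurement, while r does not. Recording the same y values in cents instead of dollars multiplies the slope by 100 and leaves r unchanged. The helper name `slope_and_r` is hypothetical.

```python
import math


def slope_and_r(xs, ys):
    """Return (slope, correlation coefficient) for the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical data: the same y values in dollars and in cents.
xs = [1, 2, 3, 4, 5]
ys_dollars = [2, 4, 5, 4, 5]
ys_cents = [100 * y for y in ys_dollars]

slope_dollars, r_dollars = slope_and_r(xs, ys_dollars)
slope_cents, r_cents = slope_and_r(xs, ys_cents)

print(slope_dollars, r_dollars)
print(slope_cents, r_cents)
```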
 Test the individual coefficients to see which Xs are good predictors.
 only test these if the overall model had at least one good predictor.
 As we add more predictors…
 R2 increases (it never decreases), even when the added predictors are poor.
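A rough sketch of this effect, assuming made-up data and hypothetical helpers `solve` and `r_squared` that fit a multiple regression through the normal equations. Adding an arbitrary extra column cannot lower R2, which is why a high R2 alone does not show that a predictor is useful.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, ys):
    """R^2 of a least squares fit of ys on an intercept plus the given columns."""
    n = len(ys)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y for row, y in zip(design, ys)) for p in range(k)]
    coefs = solve(xtx, xty)
    y_bar = sum(ys) / n
    sse = sum((y - sum(c * v for c, v in zip(coefs, row))) ** 2
              for row, y in zip(design, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data: x1 is a sensible predictor, extra is an arbitrary column.
x1 = [1, 2, 3, 4, 5, 6]
extra = [3, 1, 4, 1, 5, 9]
ys = [2, 4, 5, 4, 5, 7]

r2_one = r_squared([x1], ys)
r2_two = r_squared([x1, extra], ys)
print(r2_one, r2_two)
```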
 When you re-run a model after taking out the poor predictor variables…
 you have reduced the model
 When choosing between two models, both with good predictors for y…
 choose the one with the smaller standard error.
 Check the correlation matrix to make sure the X variables…
 are not correlated with each other
 check the signs of the coefficients…
 to make sure they are logical.
 Never say x causes y unless it was…
 a designed experiment.
 Qualitative variables in multiple regression are called..
 dummy variables. Do not interpret their coefficients.
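A minimal sketch of dummy coding, with a hypothetical helper `dummy_code` and a made-up `regions` variable. A qualitative variable with k levels becomes k - 1 columns of 0s and 1s; one level is left out as the baseline so the columns are not perfectly collinear with the intercept.

```python
def dummy_code(values, baseline):
    """Return (levels, rows): one 0/1 column per level other than the baseline."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical qualitative variable with three levels.
regions = ["north", "south", "west", "south", "north"]
levels, rows = dummy_code(regions, baseline="north")

print(levels)
for region, row in zip(regions, rows):
    print(region, row)
```

Baseline observations get a row of all zeros.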
 If there is a curve in the scatter diagram for any x,y chart or the residuals use…
 a quadratic equation… use x and x^2
 If you think two x variables may work together at different levels to affect y…
 then try an interaction term.
 Only interpret the coefficients of…
 good predictors and first order terms. First order terms are linear terms.
 Squared Xs and interacted Xs are called…
 second order terms.
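The quadratic and interaction terms described above are just extra columns built from the original Xs. A minimal sketch, with a hypothetical helper `with_second_order_terms` and made-up values:

```python
def with_second_order_terms(x1, x2):
    """Return rows of [x1, x2, x1^2, x2^2, x1*x2] for two predictors."""
    return [[a, b, a * a, b * b, a * b] for a, b in zip(x1, x2)]


# Hypothetical predictor values.
x1 = [1, 2, 3]
x2 = [10, 20, 30]

rows = with_second_order_terms(x1, x2)
for row in rows:
    # first order terms: row[0], row[1]; second order terms: row[2], row[3], row[4]
    print(row)
```

The expanded rows would then be used as the X columns of a multiple regression in place of the original two.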
 The use of regression lines is to..
 predict the average value of y that can be expected to occur at a given value of x.
 The study of the equational relationship between variables is called…
 regression analysis.