Simple linear regression
Analysis of the equational relationship between X and Y. The share of the variation in Y that the equation explains is Regression SS / Total SS.
Slope, B1
has a t distribution, which standardizes its value to see if it is significantly different from 0. When the p-value of the slope is greater than the level of significance, the slope is not significantly different from 0, and the correlation coefficient will likewise not be significantly different from 0.
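The link between the slope test and the correlation coefficient can be checked numerically. The sketch below is illustrative only: the data and the function names are made up, and it stops at the t statistic, which would then be compared with a t table at n - 2 degrees of freedom to get a p-value.

```python
import math

def _sums(xs, ys):
    """Return n, the means, and the centered sums of squares and products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return n, x_bar, y_bar, sxx, syy, sxy

def slope_t_statistic(xs, ys):
    """t = b1 / SE(b1), the standardized slope."""
    n, x_bar, y_bar, sxx, syy, sxy = _sums(xs, ys)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))      # square root of the MS residual
    se_b1 = s / math.sqrt(sxx)        # standard error of the slope
    return b1 / se_b1

def correlation_t_statistic(xs, ys):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), the test of r against 0."""
    n, _, _, sxx, syy, sxy = _sums(xs, ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up sample data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t_slope = slope_t_statistic(xs, ys)
t_corr = correlation_t_statistic(xs, ys)
print(round(t_slope, 4), round(t_corr, 4))
```

In simple linear regression the two statistics coincide, which is why a non-significant slope goes hand in hand with a non-significant correlation.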
Intercept, B0
is not interpreted
Correlation Coefficient
R, indicates the nature (direction) and strength of the linear relationship between variables
Coefficient of determination
R2, the ratio of explained variation in Y to the total variation.
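As a rough illustration of that ratio, the sketch below computes Regression SS and Total SS for a small made-up data set and divides one by the other. The function name and the data are assumptions for the example only.

```python
def r_squared(xs, ys):
    """Return Regression SS / Total SS for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Explained variation: how far the fitted values sit from the mean of y.
    regression_ss = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
    # Total variation: how far the observed values sit from the mean of y.
    total_ss = sum((y - y_bar) ** 2 for y in ys)
    return regression_ss / total_ss

# Made-up sample data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(round(r2, 4))
```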
Least Squares Criteria
minimizes the sum of the squared vertical distances between the points and the regression line, resulting in the line of best fit.
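A minimal sketch of the criterion, using the standard closed-form formulas for the slope and intercept. The data and function names are made up for illustration; the last lines nudge the fitted line to show that any other line has a larger sum of squared vertical distances.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up sample data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, b0, b1)

# Any nudged line does worse than the fitted one.
nudged_slope = sum_squared_errors(xs, ys, b0, b1 + 0.1)
nudged_intercept = sum_squared_errors(xs, ys, b0 + 0.1, b1)
print(round(b0, 4), round(b1, 4), round(best, 4))
```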
Standard Error
will be smaller for a better predictive equation. If the observed values of y vary widely about the regression line, the standard error of the slope will be large.
Square root of the MS residual
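The "square root of the MS residual" can be sketched as follows for simple linear regression, where the residual degrees of freedom are n - 2. The data and function name are made up for illustration.

```python
import math

def standard_error_of_estimate(xs, ys):
    """Return sqrt(MS residual) = sqrt(SSE / (n - 2)) for the fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ms_residual = sse / (n - 2)   # residual sum of squares per degree of freedom
    return math.sqrt(ms_residual)

# Made-up sample data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = standard_error_of_estimate(xs, ys)
print(round(s, 4))
```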
Correlation analysis
is the study of the nature and degree of the relationship between variables. A correlation coefficient of +1 or -1 means x and y are perfectly, linearly related. An r value of 0 indicates no linear relationship (a non-linear relationship may still exist).
Extrapolation
is using Xs beyond the range of the given Xs to predict Y. This can cause large errors in prediction.
Relationship of slope to the correlation coefficient
signs are the same.
Multicollinearity
when Xs are highly correlated; this gives redundant information.
Heteroscedasticity
non-constant variance in the residuals
Homoscedasticity
constant variance in the residuals
Outliers
atypical values in a data set (anomalies)
Non-Linear relationships
CURVILINEAR patterns or LOGARITHMIC relationships
Multiple regression analysis includes
one dependent variable and more than one independent
tries all combinations of variables and produces the best predictors in order of their predictive power.
Artificially inflated R-squared occurs when..
there are too many predictors and not enough samples.
Rule of thumb..
you should have at least 10 times as many observations as predictor variables.
Normal Probability plots
should produce a nearly straight line without outliers.
T distribution vs. F Distribution
T is used to test the individual coefficients, whereas F tests the overall or “global” model.
Residuals
are the differences between the observed value of Y at a given X and the predicted value. Standardized residuals with absolute values between 2 and 3 are usually just suspicious, while those with absolute values over 3 are severe.
Studentized Residuals
should fall within +/-3 in order to be considered normal values.
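The 2-to-3 and over-3 cutoffs above can be turned into a small screening helper. This is a simplified sketch: it divides each residual by the standard error of the estimate, whereas true studentized residuals also adjust for each point's leverage. The data, function names, and labels are made up for illustration.

```python
import math

def standardized_residuals(xs, ys):
    """Return each residual divided by sqrt(MS residual) (no leverage adjustment)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [e / s for e in residuals]

def classify(z):
    """Label a standardized residual using the 2 and 3 cutoffs."""
    size = abs(z)
    if size > 3:
        return "severe"
    if size >= 2:
        return "suspicious"
    return "ok"

# Made-up sample data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
z_scores = standardized_residuals(xs, ys)
labels = [classify(z) for z in z_scores]
print(labels)
```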
Transform Y and/or X when…
any of the assumptions are violated
In simple linear regression the use of regression lines is to …
predict the average value of y that can be expected to occur at a given value of x.
A high correlation between x and y..
does NOT prove that x causes y
dependent variable plotted..
vertical axis
independent variable plotted…
horizontal axis
If the confidence interval on the slope contains 0…
there is no significant relationship between x and y
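A sketch of that check, assuming the critical t value is looked up in a table and passed in by hand (here 3.182, the two-sided 95% value for 3 degrees of freedom). The data and function name are made up for illustration.

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Return (low, high) for b1 +/- t_crit * SE(b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    margin = t_crit * se_b1
    return b1 - margin, b1 + margin

# Made-up sample data; n = 5, so df = n - 2 = 3.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
low, high = slope_confidence_interval(xs, ys, t_crit=3.182)  # table value, df = 3
contains_zero = low <= 0 <= high
print(round(low, 3), round(high, 3), contains_zero)
```

For this small sample the interval straddles 0, so the slope would not be judged significant at the 5% level.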
When R^2 is positive…
you CANNOT assume that the slope is also positive.
The slope of the regression line represents…
the amount of change that is expected to take place in y when x increases by one unit.
Extrapolation
using values beyond the range of the given Xs to predict Y
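One way to guard against predicting beyond the range of the given Xs is to refuse such requests outright. The helper below is a hypothetical sketch; the coefficients and data are made up, and raising an error (rather than warning) is just one possible design choice.

```python
def predict_within_range(x_new, xs, b0, b1):
    """Predict y at x_new, but only inside the observed range of xs."""
    lowest, highest = min(xs), max(xs)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{lowest}, {highest}]"
        )
    return b0 + b1 * x_new

# Made-up data and coefficients for illustration.
xs = [1, 2, 3, 4, 5]
b0, b1 = 2.2, 0.6
inside = predict_within_range(3, xs, b0, b1)   # inside the range: allowed

try:
    predict_within_range(10, xs, b0, b1)       # beyond the range: rejected
    rejected = False
except ValueError:
    rejected = True
```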
if null hypothesis is rejected…
there is a relationship between x and y.
if no correlation between two variables…
the regression line will be horizontal
A large value for the slope does not necessarily imply a large value for the…
correlation coefficient.
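A quick way to see this is to change the units of y. In the made-up example below, multiplying every y by 1000 makes the slope 1000 times larger while the correlation coefficient stays the same. The function name and data are assumptions for illustration.

```python
import math

def slope_and_r(xs, ys):
    """Return (slope, correlation coefficient) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Made-up sample data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope_small, r_small = slope_and_r(xs, ys)
# Same data with y expressed in units 1000 times smaller.
slope_big, r_big = slope_and_r(xs, [1000 * y for y in ys])
print(round(slope_small, 4), round(slope_big, 4), round(r_small, 4), round(r_big, 4))
```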
Test the individual coefficients to see which Xs are good predictors.
only test these if the overall F test indicates that the model has at least one good predictor.
As we add more predictors…
R2 increases (or at least never decreases), even when the added predictors are poor.
When you re-run a model after taking out the poor predictor variables…
you have reduced the model
When choosing between two models, both with good predictors for y…
choose the one with the smallest standard error.
Check the correlation matrix to make sure the X variables…
are not correlated with each other
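A sketch of that check: compute the correlation for every pair of X variables and list the pairs that look too closely related. The 0.8 cutoff, the variable names, and the data are all assumptions for illustration, not a fixed rule.

```python
import math
from itertools import combinations

def pearson_r(a, b):
    """Correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    saa = sum((v - a_bar) ** 2 for v in a)
    sbb = sum((v - b_bar) ** 2 for v in b)
    sab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def correlated_pairs(columns, threshold=0.8):
    """Return the pairs of X names whose |r| is at or above the threshold."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(columns, 2)
        if abs(pearson_r(columns[name_a], columns[name_b])) >= threshold
    ]

# Made-up predictor columns for illustration.
columns = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],   # nearly a multiple of x1
    "x3": [5, 1, 4, 2, 3],
}
flagged = correlated_pairs(columns)
print(flagged)
```

Here only x1 and x2 are flagged, so one of them is likely redundant as a predictor.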
check the signs of the coefficients…
to make sure they are logical.
Never say x causes y unless it was…
a designed experiment.
Qualitative variables in multiple regression are called..
dummy variables. Do not interpret their coefficients.
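As an illustration of how a qualitative variable becomes dummy variables, the sketch below codes each category as a 0/1 column and leaves one category out as the baseline, which is the usual practice when the model has an intercept. The function name, column naming, and data are made up.

```python
def dummy_encode(values, baseline):
    """Return one 0/1 column per category, skipping the baseline category."""
    levels = sorted(set(values) - {baseline})
    return {
        f"is_{level}": [1 if value == level else 0 for value in values]
        for level in levels
    }

# Made-up qualitative variable for illustration.
regions = ["north", "south", "east", "south"]
dummies = dummy_encode(regions, baseline="north")
print(dummies)
```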
If there is a curve in the scatter diagram for any x,y chart or the residuals use…
a quadratic equation… use x and x^2
If you think two x variables may work together at different levels to affect y…
then try an interaction term.
Only interpret the coefficients of…
good predictors and first order terms. First order terms are linear terms.
Squared Xs and interacted Xs are called…
second order terms.
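Second order terms are simply extra columns built from the original Xs before the regression is run. A minimal sketch, with made-up column names and data:

```python
def add_second_order_terms(x1, x2):
    """Return the first order columns plus squared and interaction columns."""
    return {
        "x1": list(x1),                              # first order term
        "x2": list(x2),                              # first order term
        "x1_sq": [a * a for a in x1],                # squared term for curvature
        "x2_sq": [b * b for b in x2],                # squared term for curvature
        "x1_x2": [a * b for a, b in zip(x1, x2)],    # interaction term
    }

# Made-up predictor values for illustration.
terms = add_second_order_terms([1, 2, 3], [4, 5, 6])
print(terms["x1_sq"], terms["x1_x2"])
```

The resulting columns would then be passed to whatever multiple regression routine is in use.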
The use of regression lines is to..
predict the average value of y that can be expected to occur at a given value of x.
The study of the equational relationship between variables is called…
regression analysis.
