
CWRU Regression Project Report
OPRE 433
Tianao Zhang
12/5/2011

Introduction

The data set I received contains 6578 values, arranged in 13 columns and 506 rows. All the explanatory variables are continuous, as is the dependent variable; there are no categorical variables. My goal is to build a regression model that predicts the average of Y, or a particular Y, for a given X. The analysis addresses four questions:

1. Do the regression assumptions (constant variance, normality, independence and correct functional form) hold for the model? I test this with residual analysis.
2. Is there any relationship among the explanatory variables? I test this with a multicollinearity check.
3. What are the confidence interval for the average Y and the prediction interval for a particular Y?
4. How useful is the model, and how strong is the relationship between X and Y? To judge this I consider several statistics:
   i. Multiple coefficient of determination (R² and adjusted R²)
   ii. Durbin-Watson test (DWT)
   iii. F Ratio
   iv. VIF value
   v. P (probability) value

Method of analysis

1. Find the important variables
Use “Stepwise” to eliminate unimportant independent variables.
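To make the idea behind stepwise elimination concrete, here is a minimal, dependency-free sketch of forward selection. It is not JMP's procedure: JMP enters and removes terms using p-value thresholds, whereas this sketch swaps in a simpler rule (an assumption for illustration) and keeps adding the column that most improves adjusted R² until the gain falls below `min_gain`. The function names and the `min_gain` parameter are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_select(columns, y, min_gain=0.01):
    """Greedy forward selection; returns the chosen column indices in entry order.

    Assumption: the stopping rule is a minimum gain in adjusted R^2,
    not the p-value thresholds that JMP's Stepwise uses.
    """
    n = len(y)
    chosen, best = [], 0.0  # the intercept-only model has adjusted R^2 = 0
    remaining = list(range(len(columns)))
    while remaining:
        scored = []
        for j in remaining:
            trial = chosen + [j]
            r2 = r_squared([columns[i] for i in trial], y)
            scored.append((adjusted_r_squared(r2, n, len(trial)), j))
        adj, j = max(scored)
        if adj - best < min_gain:
            break  # no candidate improves the fit enough
        chosen.append(j)
        remaining.remove(j)
        best = adj
    return chosen
```

Calling `forward_select(columns, y)` with a list of predictor columns returns the indices that entered, which plays the same role as the "Entered" marks in JMP's output.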

Analysis > Fit Model > Stepwise
After running “Stepwise”, JMP indicates that column 3 and column 7 should be removed. The remaining columns have a strong relationship with the dependent variable.

2. Checking VIF value
If any variable's VIF is greater than 10, there is multicollinearity in the model. Fortunately, no strong correlation exists between any of the independent variables, so in this step I keep all of them in the model.

3. Building model with the selected variables
I get the model
y = 668.1274 - 0.108416*X1 + 0.0458433*X2 + 2.7188168*X4 - 17.37683*X5 + 3.8015829*X6 - 1.492708*X8 + 0.2996025*X9 - 0.011777*X10 - 0.946554*X11 + 0.0092905*X12 - 0.52255*X13

4. Check violation of the regression assumption
According to the Durbin-Watson test, the assumption of independence is violated. So this is not the right model, and I need to build a new one.

5. Build new model through independent variable transformation
Let X* = ΔX = X_{n+1} - X_n, find the related variables, and return to Step 1. The resulting adjusted R² is not acceptable, so this new model is still not good enough.

6. Build new model through dependent variable transformation
Let Y* = ΔY = Y_{n+1} - Y_n, find the related variables, and return to Step 1. The adjusted R² is again too low to be accepted.

7. Build new model through dependent and independent variable transformation
Let X* = ΔX = X_{n+1} - X_n and Y* = ΔY = Y_{n+1} - Y_n, find the related variables, and return to Step 1. “Stepwise” deletes two columns (X2 and X3). There is no sign of multicollinearity.

8. Check violation of the regression assumption
I. Check the Durbin-Watson test. The value is 2.67, which is acceptable.
II. Check the “Residual by predicted plot”.
III. Check the “Residual by row plot”.
IV. Check the residual distribution to see whether it is normally distributed.
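The differencing transformation and the Durbin-Watson check used in steps 5 to 8 can be sketched as follows. This is an illustrative sketch, not the JMP computation itself; the function names are hypothetical. The statistic is the sum of squared successive residual differences divided by the sum of squared residuals. It ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation, values toward 0 suggesting positive autocorrelation, and values toward 4 suggesting negative autocorrelation.

```python
def first_differences(values):
    """Return the series of differences v[n+1] - v[n]; one element shorter than the input."""
    return [nxt - cur for cur, nxt in zip(values, values[1:])]


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e[t] - e[t-1])^2) / sum(e[t]^2)."""
    numerator = sum((nxt - cur) ** 2 for cur, nxt in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up numbers, only to show the calls.
column = [1.0, 4.0, 9.0, 16.0]
print(first_differences(column))            # differenced column, ready for refitting
print(durbin_watson([0.5, -0.3, 0.2, -0.4]))
```

Every column of the data is differenced the same way before the model is refitted, and the statistic is then computed on the residuals of the refitted model, taken in row order.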

There is no significant violation, so the assumptions hold.

9. Check the influence of outliers
Cook’s D values show that no influential outliers exist.

Conclusion

The final model Y* is
Δy = -0.041005*ΔX1 - 1.60423*ΔX4 - 19.45753*ΔX5 + 5.0317852*ΔX6 - 0.044223*ΔX7 - 1.251392*ΔX8 + 0.2991917*ΔX9 - 0.024105*ΔX10 - 0.369488*ΔX11 + 0.010828*ΔX12 - 0.266692*ΔX13
where ΔX = X_{n+1} - X_n and ΔY = Y_{n+1} - Y_n, with n indexing the row.

Appendix

1. We can use “Stepwise” as a preliminary tool to identify which independent variables have a strong relationship with the dependent variable.

SSE         DFE   RMSE        RSquare   RSquare Adj   Cp          p    AICc       BIC
3202503.2   494   80.515837   0.7406    0.7348        10.114485   12   5891.676   5945.882

Current Estimates
Lock   Entered   Parameter   Estimate     nDF   SS         “F Ratio”   “Prob>F”
X      X         Intercept   668.127404   1     0          0.000       1
       X         Column 1    -0.1084159   1     70915.94   10.939      0.00101
       X         Column 2    0.04584328   1     74504.93   11.493      0.00075
                 Column 3    0            1     727.1941   0.112       0.73805
       X         Column 4    2.71881677   1     65669.55   10.130      0.00155
       X         Column 5    -17.376832   1     156624.3   24.160      1.21e-6
       X         Column 6    3.80158287   1     567500.6   87.539      2.9e-19
                 Column 7    0            1     18.08276   0.003       0.95794
       X         Column 8    -1.4927079   1     418750.2   64.594      6.8e-15
       X         Column 9    0.29960252   1     144760     22.330      3e-6
       X         Column 10   -0.0117774   1     79068.23   12.197      0.00052
       X         Column 11   -0.9465544   1     348680.7   53.786      9.2e-13
       X         Column 12   0.00929048   1     78260.83   12.072      0.00056
       X         Column 13   -0.5225496   1     787078.3   121.410     2.1e-25

Step History
Step   Parameter   Action    “Sig Prob”   Seq SS     RSquare   Cp       p    AICc      BIC
1      Column 13   Entered   0.0000       6717491    0.5441    362.76   2    6156.23   6168.87
2      Column 6    Entered   0.0000       1165558    0.6386    185.65   3    6040.83   6057.66
3      Column 11   Entered   0.0000       494572.6   0.6786    111.65   4    5983.43   6004.44
4      Column 8    Entered   0.0000       144235.5   0.6903    91.487   5    5966.74   5991.93
5      Column 5    Entered   0.0000       219531.7   0.7081    59.752   6    5938.87   5968.23
6      Column 4    Entered   0.0003       94874.88   0.7158    47.173   7    5927.44   5960.96
7      Column 12   Entered   0.0008       78843.4    0.7222    37.058   8    5918.01   5955.69
8      Column 2    Entered   0.0047       54888.16   0.7266    30.623   9    5911.93   5953.75
9      Column 1    Entered   0.0446       27373.3    0.7288    28.417   10   5909.9    5955.86
10     Column 9    Entered   0.0017       66069.41   0.7342    20.265   11   5901.91   5952
11     Column 10   Entered   0.0005       79068.23   0.7406    10.114   12   5891.68   5945.88
12     All         Removed   .            .          0.0000    1393     1    6551.72   6560.15

Steps 13 to 23 and 25 to 35 re-enter the same eleven columns in the same order with identical statistics; step 24 repeats step 12.

According to the result from “Stepwise”, two independent variables (column 3 and column 7) are eliminated. The remaining independent variables are strongly related to the dependent variable.

2. Checking VIF value
Examine the VIF value of each parameter to decide whether there is multicollinearity.

Parameter Estimates
Term        Estimate    Std Error   t Ratio
Intercept   668.1274    94.98709    7.03
Column 1    -0.108416   0.032779    -3.31

Prob>|t| entries that survive in the source, with their rows not recoverable: 0.0008*, 0.0016*
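For readers without JMP, the VIF check can be sketched in a few lines. The VIF of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing column j on all the other predictors; the report's rule of thumb flags values above 10. The sketch below is dependency-free and illustrative only: the function names are hypothetical and the sample numbers are made up, not taken from the report's data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(columns):
    """Variance inflation factor of each column: 1 / (1 - R_j^2)."""
    factors = []
    for j, target in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        factors.append(1.0 / (1.0 - r_squared(others, target)))
    return factors


# Made-up example: the second column nearly duplicates the first.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [1.1, 1.9, 3.2, 3.9, 5.1]
flagged = [j for j, v in enumerate(vif([a, b])) if v > 10]
print(flagged)  # indices of columns that exceed the VIF threshold of 10
```

A perfectly collinear column would make 1 - R_j² zero, so a production version would need to guard against that division; a statistics library would normally be used instead of a hand-written solver.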
