
(Finance and Insurance) Financial Data Mining and Applications: Course Assignment

Source: 小奈知识网
Financial Data Mining and Applications: Course Assignment

Data Analysis Based on GLM (General Linear Model)

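The GLM procedure in SAS is widely used in practice and applies to a broad range of data. Trend surface analysis (trend analysis) is a prediction and statistical technique built on multiple regression. It fits polynomial regressions in the spatial coordinates and selects the best-fitting regression model among them. When it is unclear whether the relationship in the data is linear or nonlinear, trend surface analysis can be used to find the statistical prediction model that fits the data best.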
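To make the idea of fitting polynomials of increasing order concrete, here is a minimal Python sketch (not part of the original assignment) that lists the monomials of a full trend surface up to a given order. The function names `trend_terms` and `term_label` are hypothetical helpers introduced only for illustration.

```python
def trend_terms(order):
    """Return exponent pairs (i, j) for every monomial x**i * y**j
    with 1 <= i + j <= order, i.e. a full polynomial trend surface
    without the intercept."""
    terms = []
    for total in range(1, order + 1):
        # For a fixed total degree, go from x**total down to y**total.
        for i in range(total, -1, -1):
            terms.append((i, total - i))
    return terms


def term_label(i, j):
    """Write a monomial in the x*x*y style used in SAS MODEL statements."""
    return "*".join(["x"] * i + ["y"] * j)


for order in (1, 2, 3, 4):
    labels = [term_label(i, j) for i, j in trend_terms(order)]
    print(order, len(labels), " ".join(labels))
```

A full surface of order 1, 2, 3, or 4 has 2, 5, 9, or 14 terms respectively, plus the intercept, so the number of parameters grows quickly relative to a small sample.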

This assignment applies PROC GLM to a small data set.

I. Data and Requirements

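The data cover 15 smokers with different levels of consumption. For each person, the daily number of cigarettes smoked (x), the daily amount of beer drunk (y, in liters), and an electrocardiogram index (zb) were recorded. The task is to establish the relationship between zb and the two predictors x and y.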

No.   Group   x (cigarettes/day)   y (liters/day)   zb (ECG index)
  1     1             30                 10              280
  2     1             25                 11              260
  3     1             35                 13              330
  4     1             40                 14              400
  5     1             45                 14              410
  6     2             20                 12              270
  7     2             18                 11              210
  8     2             25                 12              280
  9     2             25                 13              300
 10     2             23                 13              290
 11     3             40                 14              410
 12     3             45                 15              420
 13     3             48                 16              425
 14     3             50                 18              450
 15     3             55                 19              470

II. Trend Surface Analysis with the GLM Procedure

1. GLM program for the trend analysis

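data beer;
  input obsn x y zb;
  cards;
01 30 10 280
02 25 11 260
03 35 13 330
04 40 14 400
05 45 14 410
06 20 12 270
07 18 11 210
08 25 12 280
09 25 13 300
10 23 13 290
11 40 14 410
12 45 15 420
13 48 16 425
14 50 18 450
15 55 19 470
;
run;

/* First-order trend model; the P option prints observed, predicted, and residual values */
proc glm data=beer;
  model zb = x y / p;
run;

/* Second-order trend model */
proc glm data=beer;
  model zb = x y x*x x*y y*y / p;
run;

/* Third-order trend model; as specified here, the second-order terms are not included */
proc glm data=beer;
  model zb = x y x*x*x x*x*y x*y*y y*y*y / p;
run;

/* Fourth-order trend model; again specified without the second-order terms */
proc glm data=beer;
  model zb = x y x*x*x x*x*y x*y*y y*y*y
             x*x*x*x x*x*x*y x*x*y*y x*y*y*y y*y*y*y / p;
run;
quit;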
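As a cross-check outside SAS, the first-order model can also be fitted by ordinary least squares through the normal equations. The sketch below is an illustration in plain Python, not the procedure SAS uses internally; the helper names `solve` and `fit_first_order` are hypothetical. With the data from Section I it should come out close to the first-order estimates that PROC GLM reports (intercept about 64.05, slopes about 5.384 and 6.942).

```python
# Data transcribed from the table in Section I.
x = [30, 25, 35, 40, 45, 20, 18, 25, 25, 23, 40, 45, 48, 50, 55]
y = [10, 11, 13, 14, 14, 12, 11, 12, 13, 13, 14, 15, 16, 18, 19]
zb = [280, 260, 330, 400, 410, 270, 210, 280, 300, 290,
      410, 420, 425, 450, 470]


def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - known) / m[r][r]
    return beta


def fit_first_order(xs, ys, zs):
    """Least squares for z = b0 + b1*x + b2*y via the normal equations."""
    rows = [[1.0, xi, yi] for xi, yi in zip(xs, ys)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, zs)) for i in range(3)]
    return solve(xtx, xtz)


b0, b1, b2 = fit_first_order(x, y, zb)
predicted = [b0 + b1 * xi + b2 * yi for xi, yi in zip(x, y)]
residuals = [zi - pi for zi, pi in zip(zb, predicted)]
sse = sum(e * e for e in residuals)  # error sum of squares
print(round(b0, 4), round(b1, 4), round(b2, 4), round(sse, 2))
```

The normal equations are adequate for this small first-order fit. For the higher-order models the raw powers of x and y are strongly correlated, so a numerically more careful method would be a safer choice there.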

2. Results of the four models

(1) First-order trend model

Dependent Variable: zb

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2       90615.20993    45307.60497     127.19    <.0001
Error              12        4274.79007      356.23251
Corrected Total    14       94890.00000

R-Square    Coeff Var    Root MSE    zb Mean
0.954950     5.439228    18.87412    347.000

Source    DF      Type I SS    Mean Square    F Value    Pr > F
x          1    89541.56558    89541.56558     251.36    <.0001
y          1     1073.64435     1073.64435       3.01    0.1081

Source    DF    Type III SS    Mean Square    F Value    Pr > F
x          1    14652.24351    14652.24351      41.13    <.0001
y          1     1073.64435     1073.64435       3.01    0.1081

Parameter       Estimate    Standard Error    t Value    Pr > |t|
Intercept    64.04999380       33.06539919       1.94      0.0766
x             5.38385565        0.83947567       6.41      <.0001
y             6.94199869        3.99872078       1.74      0.1081

Observation       Observed      Predicted       Residual
          1    280.0000000    294.9856503    -14.9856503
          2    260.0000000    275.0083707    -15.0083707
          3    330.0000000    342.7309246    -12.7309246
          4    400.0000000    376.5922015     23.4077985
          5    410.0000000    403.5114798      6.4885202
          6    270.0000000    255.0310911     14.9689089
          7    210.0000000    237.3213811    -27.3213811
          8    280.0000000    281.9503694     -1.9503694
          9    300.0000000    288.8923681     11.1076319
         10    290.0000000    278.1246568     11.8753432
         11    410.0000000    376.5922015     33.4077985
         12    420.0000000    410.4534785      9.5465215
         13    425.0000000    433.5470441     -8.5470441
         14    450.0000000    458.1987528     -8.1987528
         15    470.0000000    492.0600298    -22.0600298

Sum of Residuals                         -0.000000
Sum of Squared Residuals               4274.790069
Sum of Squared Residuals - Error SS      -0.000000
First Order Autocorrelation               0.235461
Durbin-Watson D                           1.362704
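The summary statistics in the first-order output follow directly from the ANOVA table. The short Python sketch below, added here as an illustration, recomputes them from the printed sums of squares; the variable names are chosen for this example only.

```python
import math

# Values copied from the first-order ANOVA table.
ss_model, df_model = 90615.20993, 2
ss_error, df_error = 4274.79007, 12
ss_total = 94890.0
zb_mean = 347.0

ms_model = ss_model / df_model
ms_error = ss_error / df_error

r_square = ss_model / ss_total        # share of total variation explained
f_value = ms_model / ms_error         # overall F statistic of the model
root_mse = math.sqrt(ms_error)        # residual standard deviation
coeff_var = 100 * root_mse / zb_mean  # Root MSE as a percentage of the mean

print(f"R-Square  = {r_square:.6f}")
print(f"F Value   = {f_value:.2f}")
print(f"Root MSE  = {root_mse:.5f}")
print(f"Coeff Var = {coeff_var:.6f}")
```

The same relations hold for the second-, third-, and fourth-order outputs, so any of the four summary rows can be checked this way.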

(2) Second-order trend model

Dependent Variable: zb

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5       93330.83580    18666.16716     107.75    <.0001
Error               9        1559.16420      173.24047
Corrected Total    14       94890.00000

R-Square    Coeff Var    Root MSE    zb Mean
0.983569     3.793108    13.16208    347.0000

Source    DF      Type I SS    Mean Square    F Value    Pr > F
x          1    89541.56558    89541.56558     516.86    <.0001
y          1     1073.64435     1073.64435       6.20    0.0345
x*x        1     1892.86626     1892.86626      10.93    0.0091
x*y        1      772.91658      772.91658       4.46    0.0638
y*y        1       49.84303       49.84303       0.29    0.6047

Source    DF    Type III SS    Mean Square    F Value    Pr > F
x          1    965.2913631    965.2913631       5.57    0.0426
y          1    127.4395437    127.4395437       0.74    0.4133
x*x        1     43.6622972     43.6622972       0.25    0.6277
x*y        1    242.0343234    242.0343234       1.40    0.2675
y*y        1     49.8430316     49.8430316       0.29    0.6047

Parameter        Estimate    Standard Error    t Value    Pr > |t|
Intercept    -262.7664793       109.1074817      -2.41      0.0394
x              16.0699779         6.8078620       2.36      0.0426
y              23.5391327        27.4449867       0.86      0.4133
x*x             0.0638773         0.1272383       0.50      0.6277
x*y            -1.1651016         0.9857119      -1.18      0.2675
y*y             1.1673362         2.1762982       0.54      0.6047

Observation       Observed      Predicted       Residual
          1    280.0000000    279.4168700      0.5831300
          2    260.0000000    258.6814596      1.3185404
          3    330.0000000    351.0997183    -21.0997183
          4    400.0000000    388.1251282     11.8748718
          5    410.0000000    414.0657505     -4.0657505
          6    270.0000000    255.1256024     14.8743976
          7    210.0000000    216.6773768     -6.6773768
          8    280.0000000    279.9417834      0.0582166
          9    300.0000000    303.5367795     -3.5367795
         10    290.0000000    295.5572467     -5.5572467
         11    410.0000000    388.1251282     21.8748718
         12    420.0000000    419.0280585      0.9719415
         13    425.0000000    436.4318573    -11.4318573
         14    450.0000000    453.7554706     -3.7554706
         15    470.0000000    465.4317699      4.5682301

Sum of Residuals                         -0.000000
Sum of Squared Residuals               1559.164195
Sum of Squared Residuals - Error SS      -0.000000
First Order Autocorrelation              -0.354205
Durbin-Watson D                           2.694808
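Because the first-order model is nested in the second-order model, the joint contribution of the three quadratic terms can be judged with a partial F test built from the two error sums of squares. The sketch below is a supplementary illustration, not part of the original analysis. The function name `partial_f` is hypothetical, and the 5% critical value of F(3, 9) is taken from a standard F table because the Python standard library has no F distribution.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic for the extra terms of a full model against a nested
    reduced model, using their error sums of squares and error DF."""
    extra_df = df_reduced - df_full
    return ((sse_reduced - sse_full) / extra_df) / (sse_full / df_full)


# Error SS and error DF copied from the first- and second-order ANOVA tables.
f_quadratic = partial_f(4274.79007, 12, 1559.16420, 9)

# Assumption of this sketch: tabulated 5% critical value of F(3, 9).
F_CRIT_3_9 = 3.86

print(round(f_quadratic, 2), f_quadratic > F_CRIT_3_9)
```

The numerator uses the drop in error SS, 2715.62587, which equals the sum of the Type I SS for x*x, x*y, and y*y in the second-order output. The same function applies to any pair of nested models, given a critical value for the matching degrees of freedom.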


(3) Third-order trend model

Dependent Variable: zb

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6       93393.46414    15565.57736      83.21    <.0001
Error               8        1496.53586      187.06698
Corrected Total    14       94890.00000

R-Square    Coeff Var    Root MSE    zb Mean
0.984229     3.941569    13.67724    347.0000

Source    DF      Type I SS    Mean Square    F Value    Pr > F
x          1    89541.56558    89541.56558     478.66    <.0001
y          1     1073.64435     1073.64435       5.74    0.0435
x*x*x      1     2078.77664     2078.77664      11.11    0.0103
x*x*y      1      508.85526      508.85526       2.72    0.1377
x*y*y      1       17.50614       17.50614       0.09    0.7675
y*y*y      1      173.11616      173.11616       0.93    0.3642

Source    DF    Type III SS    Mean Square    F Value    Pr > F
x          1    1643.347081    1643.347081       8.78    0.0180
y          1     197.474017     197.474017       1.06    0.3343
x*x*x      1     105.516422     105.516422       0.56    0.4741
x*x*y      1     113.710330     113.710330       0.61    0.4580
x*y*y      1     146.610010     146.610010       0.78    0.4018
y*y*y      1     173.116161     173.116161       0.93    0.3642

Parameter        Estimate    Standard Error    t Value    Pr > |t|
Intercept    -166.0074589       82.37772231      -2.02      0.0786
x              11.1382598        3.75795233       2.96      0.0180
y              15.7784340       15.35703905       1.03      0.3343
x*x*x          -0.0154132        0.02052250      -0.75      0.4741
x*x*y           0.1203187        0.15432333       0.78      0.4580
x*y*y          -0.3416786        0.38595313      -0.89      0.4018
y*y*y           0.3134894        0.32587614       0.96      0.3642

Observation       Observed      Predicted       Residual
          1    280.0000000    281.0906363     -1.0906363
          2    260.0000000    256.0483783      3.9516217
          3    330.0000000    351.8935219    -21.8935219
          4    400.0000000    390.5707896      9.4292104
          5    410.0000000    409.2309652      0.7690348
          6    270.0000000    257.9983490     12.0016510
          7    210.0000000    220.0483966    -10.0483966
          8    280.0000000    275.0160368      4.9839632
          9    300.0000000    299.4709973      0.5290027
         10    290.0000000    295.8228899     -5.8228899
         11    410.0000000    390.5707896     19.4292104
         12    420.0000000    420.5758580     -0.5758580
         13    425.0000000    437.4437284    -12.4437284
         14    450.0000000    455.6875798     -5.6875798
         15    470.0000000    463.5310833      6.4689167

Sum of Residuals                         -0.000000
Sum of Squared Residuals               1496.535862
Sum of Squared Residuals - Error SS      -0.000000
First Order Autocorrelation              -0.357545
Durbin-Watson D                           2.686333
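The last two lines of the residual summary can be recomputed from the residual column. The Python sketch below is an illustration with hypothetical function names. It applies the usual Durbin-Watson formula and a lag-1 autocorrelation normalized by the full sum of squared residuals, which appears to be the convention behind the printed values.

```python
def durbin_watson(res):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(res, res[1:]))
    den = sum(e * e for e in res)
    return num / den


def lag1_autocorrelation(res):
    """Sum of products of neighbouring residuals over sum of squared residuals."""
    num = sum(a * b for a, b in zip(res, res[1:]))
    den = sum(e * e for e in res)
    return num / den


# Residuals of the third-order model, copied from the output in observation order.
residuals_order3 = [
    -1.0906363, 3.9516217, -21.8935219, 9.4292104, 0.7690348,
    12.0016510, -10.0483966, 4.9839632, 0.5290027, -5.8228899,
    19.4292104, -0.5758580, -12.4437284, -5.6875798, 6.4689167,
]

dw = durbin_watson(residuals_order3)
r1 = lag1_autocorrelation(residuals_order3)
print(round(dw, 4), round(r1, 4))
```

A Durbin-Watson value near 2 indicates little first-order autocorrelation; values well above 2 point to negative autocorrelation and values well below 2 to positive autocorrelation. The statistic depends on the order of the observations, which here is simply the listing order of the 15 subjects.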


(4) Fourth-order trend model

Dependent Variable: zb

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              11       94480.31919     8589.11993      62.90    0.0029
Error               3         409.68081      136.56027
Corrected Total    14       94890.00000

R-Square    Coeff Var    Root MSE    zb Mean
0.995683     3.367695    11.68590    347.0000

Source     DF      Type I SS    Mean Square    F Value    Pr > F
x           1    89541.56558    89541.56558     655.69    0.0001
y           1     1073.64435     1073.64435       7.86    0.0676
x*x*x       1     2078.77664     2078.77664      15.22    0.0299
x*x*y       1      508.85526      508.85526       3.73    0.1491
x*y*y       1       17.50614       17.50614       0.13    0.7440
y*y*y       1      173.11616      173.11616       1.27    0.3421
x*x*x*x     1       52.91566       52.91566       0.39    0.5777
x*x*x*y     1      193.81980      193.81980       1.42    0.3192
x*x*y*y     1      452.42798      452.42798       3.31    0.1663
x*y*y*y     1       40.32879       40.32879       0.30    0.6246
y*y*y*y     1      347.36281      347.36281       2.54    0.2090

Source     DF    Type III SS    Mean Square    F Value    Pr > F
x           1     53.8347354     53.8347354       0.39    0.5746
y           1     18.4422458     18.4422458       0.14    0.7376
x*x*x       1    707.3985134    707.3985134       5.18    0.1073
x*x*y       1    688.7276032    688.7276032       5.04    0.1104
x*y*y       1    669.2155979    669.2155979       4.90    0.1137
y*y*y       1    614.9897506    614.9897506       4.50    0.1239
x*x*x*x     1     73.5254957     73.5254957       0.54    0.5162
x*x*x*y     1     21.5720987     21.5720987       0.16    0.7176
x*x*y*y     1    150.8940383    150.8940383       1.10    0.3704
x*y*y*y     1    264.7516451    264.7516451       1.94    0.2581
y*y*y*y     1    347.3628138    347.3628138       2.54    0.2090

Parameter        Estimate    Standard Error    t Value    Pr > |t|
Intercept    -748.5352475       602.9093096      -1.24      0.3026
x              21.5268501        34.2855706       0.63      0.5746
y              63.4532525       172.6669316       0.37      0.7376
x*x*x           1.1129083         0.4889782       2.28      0.1073
x*x*y          -7.8466442         3.4939960      -2.25      0.1104
x*y*y          17.6919599         7.9919932       2.21      0.1137
y*y*y         -12.8173180         6.0398396      -2.12      0.1239
x*x*x*x        -0.0052895         0.0072088      -0.73      0.5162
x*x*x*y        -0.0339628         0.0854515      -0.40      0.7176
x*x*y*y         0.4218127         0.4012785       1.05      0.3704
x*y*y*y        -1.0952733         0.7866207      -1.39      0.2581
y*y*y*y         0.8411079         0.5273783       1.59      0.2090

Observation       Observed      Predicted       Residual
          1    280.0000000    280.6428697     -0.6428697
          2    260.0000000    254.9148649      5.0851351
          3    330.0000000    336.2353148     -6.2353148
          4    400.0000000    399.8451524      0.1548476
          5    410.0000000    409.0029100      0.9970900
          6    270.0000000    265.5623644      4.4376356
          7    210.0000000    212.0079405     -2.0079405
          8    280.0000000    287.4716063     -7.4716063
          9    300.0000000    292.6701245      7.3298755
         10    290.0000000    295.8090433     -5.8090433
         11    410.0000000    399.8451524     10.1548476
         12    420.0000000    428.1747562     -8.1747562
         13    425.0000000    422.5228478      2.4771522
         14    450.0000000    450.5733972     -0.5733972
         15    470.0000000    469.7216557      0.2783443

Sum of Residuals                          0.0000000
Sum of Squared Residuals                409.6807042
Sum of Squared Residuals - Error SS      -0.0001104
First Order Autocorrelation              -0.6992027
Durbin-Watson D                           3.3972074

III. Analysis of Results

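The main statistics of the four analyses are collected in the table below. The columns are the p-value of the overall F test (Pr > F), the coefficient of determination (R-Square), the root mean square error (Root MSE), the coefficient of variation (Coeff Var), and the Durbin-Watson statistic used to check residual independence (D-W D).

Trend order    Pr > F     R-Square    Root MSE    Coeff Var    D-W D
First          <0.0001    0.954950    18.87412    5.439228     1.362704
Second         <0.0001    0.983569    13.16208    3.793108     2.694808
Third          <0.0001    0.984229    13.67724    3.941569     2.686333
Fourth          0.0029    0.995683    11.68590    3.367695     3.3972074

When all the p-values are significant (below α = 0.05), first look for the models with the smallest p-value, which rules out the fourth-order model here. Among the remaining models, choose the one with the larger coefficient of determination, which here is the third-order model.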

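The third-order R-Square (0.984229) is slightly larger than the second-order value (0.983569). Its drawbacks are that its Root MSE and coefficient of variation are both somewhat larger than those of the second-order model, and that the residual independence check is not fully satisfactory: its Durbin-Watson D is 2.686333, whereas a value near 2 indicates independent residuals.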
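R-Square never decreases when terms are added to a model, so comparisons across models with different numbers of terms are often supplemented by the adjusted R-Square, which penalizes lost error degrees of freedom. The sketch below is a supplementary check that the original assignment does not include; it uses only the error SS and error DF printed in the four ANOVA tables, and the names are hypothetical.

```python
SS_TOTAL, DF_TOTAL = 94890.0, 14  # corrected total, same for all four models

# (error SS, error DF) copied from the four ANOVA tables.
models = {
    "first": (4274.79007, 12),
    "second": (1559.16420, 9),
    "third": (1496.53586, 8),
    "fourth": (409.68081, 3),
}


def adjusted_r_square(ss_error, df_error):
    """1 minus the ratio of the error mean square to the total mean square."""
    return 1 - (ss_error / df_error) / (SS_TOTAL / DF_TOTAL)


adjusted = {name: adjusted_r_square(*stats) for name, stats in models.items()}
for name, value in adjusted.items():
    print(f"{name:<7} {value:.4f}")
```

Because the adjusted R-Square is a decreasing function of the error mean square, it ranks the models in the same order as Root MSE. It therefore agrees with the remark that the third-order fit has a slightly larger Root MSE than the second-order fit.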

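Therefore, the third-order regression is adopted for these data, and the prediction model, with coefficients rounded from the parameter estimates, is:

ECG index (zb) = -166 + 11.14x + 15.78y - 0.015x^3 + 0.12x^2 y - 0.34xy^2 + 0.313y^3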
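The coefficients in the prediction model are rounded, and with cubic terms the rounding matters more than it might appear. The Python sketch below is an illustration with a hypothetical function name. It evaluates the model for observation 1 (x = 30, y = 10), once with the rounded coefficients and once with the full-precision estimates from the third-order parameter table.

```python
def cubic_model(x, y, coef):
    """Evaluate zb = b0 + bx*x + by*y + bxxx*x^3 + bxxy*x^2*y
    + bxyy*x*y^2 + byyy*y^3 for one (x, y) pair."""
    b0, bx, by, bxxx, bxxy, bxyy, byyy = coef
    return (b0 + bx * x + by * y + bxxx * x**3
            + bxxy * x**2 * y + bxyy * x * y**2 + byyy * y**3)


# Rounded coefficients as written in the prediction model.
ROUNDED = (-166, 11.14, 15.78, -0.015, 0.12, -0.34, 0.313)

# Full-precision estimates copied from the third-order parameter table.
FULL = (-166.0074589, 11.1382598, 15.7784340,
        -0.0154132, 0.1203187, -0.3416786, 0.3134894)

x1, y1 = 30, 10  # observation 1, observed zb = 280
pred_rounded = cubic_model(x1, y1, ROUNDED)
pred_full = cubic_model(x1, y1, FULL)
print(round(pred_rounded, 2), round(pred_full, 2))
```

The full-precision coefficients give about 281.09, in line with the predicted value listed for observation 1, while the rounded coefficients give about 294.0. When the model is used for prediction, it is safer to keep more digits in the higher-order coefficients.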
