SIXTH EDITION
Applied Multivariate
Statistical Analysis
RICHARD A. JOHNSON
University of Wisconsin—Madison
DEAN W. WICHERN
Texas A&M University
Pearson Education International
Contents
PREFACE xv
1 ASPECTS OF MULTIVARIATE ANALYSIS 1
1.1 Introduction 1
1.2 Applications of Multivariate Techniques 3
1.3 The Organization of Data 5
Arrays, 5
Descriptive Statistics, 6
Graphical Techniques, 11
1.4 Data Displays and Pictorial Representations 19
Linking Multiple Two-Dimensional Scatter Plots, 20
Graphs of Growth Curves, 24
Stars, 26
Chernoff Faces, 27
1.5 Distance 30
1.6 Final Comments 37
Exercises 37
References 47
2 MATRIX ALGEBRA AND RANDOM VECTORS 49
2.1 Introduction 49
2.2 Some Basics of Matrix and Vector Algebra 49
Vectors, 49
Matrices, 54
2.3 Positive Definite Matrices 60
2.4 A Square-Root Matrix 65
2.5 Random Vectors and Matrices 66
2.6 Mean Vectors and Covariance Matrices 68
Partitioning the Covariance Matrix, 73
The Mean Vector and Covariance Matrix
for Linear Combinations of Random Variables, 75
Partitioning the Sample Mean Vector
and Covariance Matrix, 77
2.7 Matrix Inequalities and Maximization 78
Supplement 2A: Vectors and Matrices: Basic Concepts 82
Vectors, 82
Matrices, 87
Exercises 103
References 110
3 SAMPLE GEOMETRY AND RANDOM SAMPLING 111
3.1 Introduction 111
3.2 The Geometry of the Sample 111
3.3 Random Samples and the Expected Values of the Sample Mean and
Covariance Matrix 119
3.4 Generalized Variance 123
Situations in which the Generalized Sample Variance Is Zero, 129
Generalized Variance Determined by |R|
and Its Geometrical Interpretation, 134
Another Generalization of Variance, 137
3.5 Sample Mean, Covariance, and Correlation
As Matrix Operations 137
3.6 Sample Values of Linear Combinations of Variables 140
Exercises 144
References 148
4 THE MULTIVARIATE NORMAL DISTRIBUTION 149
4.1 Introduction 149
4.2 The Multivariate Normal Density and Its Properties 149
Additional Properties of the Multivariate
Normal Distribution, 156
4.3 Sampling from a Multivariate Normal Distribution
and Maximum Likelihood Estimation 168
The Multivariate Normal Likelihood, 168
Maximum Likelihood Estimation of μ and Σ, 170
Sufficient Statistics, 173
4.4 The Sampling Distribution of X̄ and S 173
Properties of the Wishart Distribution, 174
4.5 Large-Sample Behavior of X̄ and S 175
4.6 Assessing the Assumption of Normality 177
Evaluating the Normality of the Univariate Marginal Distributions, 177
Evaluating Bivariate Normality, 182
4.7 Detecting Outliers and Cleaning Data 187
Steps for Detecting Outliers, 189
4.8 Transformations to Near Normality 192
Transforming Multivariate Observations, 195
Exercises 200
References 208
5 INFERENCES ABOUT A MEAN VECTOR 210
5.1 Introduction 210
5.2 The Plausibility of μ₀ as a Value for a Normal
Population Mean 210
5.3 Hotelling's T² and Likelihood Ratio Tests 216
General Likelihood Ratio Method, 219
5.4 Confidence Regions and Simultaneous Comparisons
of Component Means 220
Simultaneous Confidence Statements, 223
A Comparison of Simultaneous Confidence Intervals
with One-at-a-Time Intervals, 229
The Bonferroni Method of Multiple Comparisons, 232
5.5 Large Sample Inferences about a Population Mean Vector 234
5.6 Multivariate Quality Control Charts 239
Charts for Monitoring a Sample of Individual Multivariate Observations
for Stability, 241
Control Regions for Future Individual Observations, 247
Control Ellipse for Future Observations, 248
T²-Chart for Future Observations, 248
Control Charts Based on Subsample Means, 249
Control Regions for Future Subsample Observations, 251
5.7 Inferences about Mean Vectors
when Some Observations Are Missing 251
5.8 Difficulties Due to Time Dependence
in Multivariate Observations 256
Supplement 5A: Simultaneous Confidence Intervals and Ellipses
as Shadows of the p-Dimensional Ellipsoids 258
Exercises 261
References 272
6 COMPARISONS OF SEVERAL MULTIVARIATE MEANS 273
6.1 Introduction 273
6.2 Paired Comparisons and a Repeated Measures Design 273
Paired Comparisons, 273
A Repeated Measures Design for Comparing Treatments, 279
6.3 Comparing Mean Vectors from Two Populations 284
Assumptions Concerning the Structure of the Data, 284
Further Assumptions When n₁ and n₂ Are Small, 285
Simultaneous Confidence Intervals, 288
The Two-Sample Situation When Σ₁ ≠ Σ₂, 291
An Approximation to the Distribution of T² for Normal Populations
When Sample Sizes Are Not Large, 294
6.4 Comparing Several Multivariate Population Means
(One-Way MANOVA) 296
Assumptions about the Structure of the Data for One-Way MANOVA, 296
A Summary of Univariate ANOVA, 297
Multivariate Analysis of Variance (MANOVA), 301
6.5 Simultaneous Confidence Intervals for Treatment Effects 308
6.6 Testing for Equality of Covariance Matrices 310
6.7 Two-Way Multivariate Analysis of Variance 312
Univariate Two-Way Fixed-Effects Model with Interaction, 312
Multivariate Two-Way Fixed-Effects Model with Interaction, 315
6.8 Profile Analysis 323
6.9 Repeated Measures Designs and Growth Curves 328
6.10 Perspectives and a Strategy for Analyzing
Multivariate Models 332
Exercises 337
References 358
7 MULTIVARIATE LINEAR REGRESSION MODELS 360
7.1 Introduction 360
7.2 The Classical Linear Regression Model 360
7.3 Least Squares Estimation 364
Sum-of-Squares Decomposition, 366
Geometry of Least Squares, 367
Sampling Properties of Classical Least Squares Estimators, 369
7.4 Inferences About the Regression Model 370
Inferences Concerning the Regression Parameters, 370
Likelihood Ratio Tests for the Regression Parameters, 374
7.5 Inferences from the Estimated Regression Function 378
Estimating the Regression Function at z₀, 378
Forecasting a New Observation at z₀, 379
7.6 Model Checking and Other Aspects of Regression 381
Does the Model Fit?, 381
Leverage and Influence, 384
Additional Problems in Linear Regression, 384
7.7 Multivariate Multiple Regression 387
Likelihood Ratio Tests for Regression Parameters, 395
Other Multivariate Test Statistics, 398
Predictions from Multivariate Multiple Regressions, 399
7.8 The Concept of Linear Regression 401
Prediction of Several Variables, 406
Partial Correlation Coefficient, 409
7.9 Comparing the Two Formulations of the Regression Model 410
Mean Corrected Form of the Regression Model, 410
Relating the Formulations, 412
7.10 Multiple Regression Models with Time Dependent Errors 413
Supplement 7A: The Distribution of the Likelihood Ratio
for the Multivariate Multiple Regression Model 418
Exercises 420
References 428
8 PRINCIPAL COMPONENTS 430
8.1 Introduction 430
8.2 Population Principal Components 430
Principal Components Obtained from Standardized Variables, 436
Principal Components for Covariance Matrices
with Special Structures, 439
8.3 Summarizing Sample Variation by Principal Components 441
The Number of Principal Components, 444
Interpretation of the Sample Principal Components, 448
Standardizing the Sample Principal Components, 449
8.4 Graphing the Principal Components 454
8.5 Large Sample Inferences 456
Large Sample Properties of λ̂ᵢ and êᵢ, 456
Testing for the Equal Correlation Structure, 457
8.6 Monitoring Quality with Principal Components 459
Checking a Given Set of Measurements for Stability, 459
Controlling Future Values, 463
Supplement 8A: The Geometry of the Sample Principal
Component Approximation 466
The p-Dimensional Geometrical Interpretation, 468
The n-Dimensional Geometrical Interpretation, 469
Exercises 470
References 480
9 FACTOR ANALYSIS AND INFERENCE
FOR STRUCTURED COVARIANCE MATRICES 481
9.1 Introduction 481
9.2 The Orthogonal Factor Model 482
9.3 Methods of Estimation 488
The Principal Component (and Principal Factor) Method, 488
A Modified Approach—the Principal Factor Solution, 494
The Maximum Likelihood Method, 495
A Large Sample Test for the Number of Common Factors, 501
9.4 Factor Rotation 504
Oblique Rotations, 512
9.5 Factor Scores 513
The Weighted Least Squares Method, 514
The Regression Method, 516
9.6 Perspectives and a Strategy for Factor Analysis 519
Supplement 9A: Some Computational Details
for Maximum Likelihood Estimation 527
Recommended Computational Scheme, 528
Maximum Likelihood Estimators of ρ = L_z L_z′ + Ψ_z, 529
Exercises 530
References 538
10 CANONICAL CORRELATION ANALYSIS 539
10.1 Introduction 539
10.2 Canonical Variates and Canonical Correlations 539
10.3 Interpreting the Population Canonical Variables 545
Identifying the Canonical Variables, 545
Canonical Correlations as Generalizations
of Other Correlation Coefficients, 547
The First r Canonical Variables as a Summary of Variability, 548
A Geometrical Interpretation of the Population Canonical
Correlation Analysis, 549
10.4 The Sample Canonical Variates and Sample
Canonical Correlations 550
10.5 Additional Sample Descriptive Measures 558
Matrices of Errors of Approximations, 558
Proportions of Explained Sample Variance, 561
10.6 Large Sample Inferences 563
Exercises 567
References 574
11 DISCRIMINATION AND CLASSIFICATION 575
11.1 Introduction 575
11.2 Separation and Classification for Two Populations 576
11.3 Classification with Two Multivariate Normal Populations 584
Classification of Normal Populations When Σ₁ = Σ₂ = Σ, 584
Scaling, 589
Fisher's Approach to Classification with Two Populations, 590
Is Classification a Good Idea?, 592
Classification of Normal Populations When Σ₁ ≠ Σ₂, 593
11.4 Evaluating Classification Functions 596
11.5 Classification with Several Populations 606
The Minimum Expected Cost of Misclassification Method, 606
Classification with Normal Populations, 609
11.6 Fisher's Method for Discriminating
among Several Populations 621
Using Fisher's Discriminants to Classify Objects, 628
11.7 Logistic Regression and Classification 634
Introduction, 634
The Logit Model, 634
Logistic Regression Analysis, 636
Classification, 638
Logistic Regression with Binomial Responses, 640
11.8 Final Comments 644
Including Qualitative Variables, 644
Classification Trees, 644
Neural Networks, 647
Selection of Variables, 648
Testing for Group Differences, 648
Graphics, 649
Practical Considerations Regarding Multivariate Normality, 649
Exercises 650
References 669
12 CLUSTERING, DISTANCE METHODS, AND ORDINATION 671
12.1 Introduction 671
12.2 Similarity Measures 673
Distances and Similarity Coefficients for Pairs of Items, 673
Similarities and Association Measures
for Pairs of Variables, 677
Concluding Comments on Similarity, 678
12.3 Hierarchical Clustering Methods 680
Single Linkage, 682
Complete Linkage, 685
Average Linkage, 690
Ward's Hierarchical Clustering Method, 692
Final Comments—Hierarchical Procedures, 695
12.4 Nonhierarchical Clustering Methods 696
K-means Method, 696
Final Comments—Nonhierarchical Procedures, 701
12.5 Clustering Based on Statistical Models 703
12.6 Multidimensional Scaling 706
The Basic Algorithm, 708
12.7 Correspondence Analysis 716
Algebraic Development of Correspondence Analysis, 718
Inertia, 725
Interpretation in Two Dimensions, 726
Final Comments, 726
12.8 Biplots for Viewing Sampling Units and Variables 726
Constructing Biplots, 727
12.9 Procrustes Analysis: A Method
for Comparing Configurations 732
Constructing the Procrustes Measure of Agreement, 733
Supplement 12A: Data Mining 740
Introduction, 740
The Data Mining Process, 741
Model Assessment, 742
Exercises 747
References 755
APPENDIX 757
DATA INDEX 764
SUBJECT INDEX 767