Chapter 3:
Linear Regression
Bias-Variance Tradeoff
Prediction of continuous variables

Billionaire says: Wait, that’s not what I meant!
You say: Chill out, dude.
He says: I want to predict a continuous variable from continuous inputs: I want to predict salaries from GPA.
You say: I can regress that…
The regression problem

Instances: ⟨x_j, t_j⟩
Learn: mapping from x to t(x)
Hypothesis space:
  H = { h : h(x) = Σ_i w_i φ_i(x) }
  Given basis functions φ_1,…,φ_k
  Find coefficients w = {w_1,…,w_k}
Why is this called linear regression???
  The model is linear in the parameters w
Precisely, minimize the residual squared error:
  w* = argmin_w Σ_j ( t_j − Σ_i w_i φ_i(x_j) )²
The regression problem in matrix notation

Collect the basis-function values into the N×k design matrix Φ, with Φ_ji = φ_i(x_j), and the targets into the vector t = (t_1,…,t_N)ᵀ. Then the residual squared error is

  w* = argmin_w (Φw − t)ᵀ(Φw − t)
Regression solution = simple matrix operations

Setting the gradient of the squared error to zero gives the normal equations, solved by

  w* = (ΦᵀΦ)⁻¹ Φᵀ t

  where ΦᵀΦ is a k×k matrix and Φᵀt is a k-vector
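The closed-form solution w* = (ΦᵀΦ)⁻¹Φᵀt can be checked numerically. A minimal sketch, assuming NumPy and toy data of my own (one-dimensional input with basis functions 1 and x):

```python
import numpy as np

# Toy data of my own: predict t from one input x with basis
# functions phi_0(x) = 1 and phi_1(x) = x (true w = [2, 3]).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
t = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=50)

# Design matrix Phi: one column per basis function.
Phi = np.column_stack([np.ones_like(x), x])

# Closed-form solution of the normal equations: (Phi^T Phi) w = Phi^T t.
w_closed = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# The same fit via a numerically stabler least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(Phi, t, rcond=None)

print(w_closed)  # close to the true coefficients [2, 3]
print(np.allclose(w_closed, w_lstsq))
```

In practice a least-squares routine such as `np.linalg.lstsq` is preferred over forming ΦᵀΦ explicitly, since the normal equations square the condition number of Φ.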
But, why?

Billionaire (again) says: Why sum squared error???
You say: Gaussians…
Model: prediction is a linear function plus Gaussian noise:
  t(x) = Σ_i w_i φ_i(x) + ε,  ε ~ N(0, σ²)
Maximizing the log-likelihood of the data over w:
Least-squares linear regression is MLE for Gaussians!!!
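To spell out the step from likelihood to squared error: under the Gaussian-noise model, the log-likelihood of N independent observations is

```latex
\ln p(\mathcal{D} \mid \mathbf{w}, \sigma)
  = \sum_{j=1}^{N} \ln \mathcal{N}\!\left(t_j \,\Big|\, \textstyle\sum_i w_i \phi_i(x_j),\ \sigma^2\right)
  = -\frac{N}{2}\ln\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2} \sum_{j=1}^{N} \Bigl(t_j - \sum_i w_i \phi_i(x_j)\Bigr)^{2}
```

Only the last term depends on w, and it enters with a negative sign, so maximizing the log-likelihood over w is exactly minimizing the residual squared error.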
Applications Corner 1

Predict stock value over time from:
  past values
  other relevant variables
    e.g., weather, demand, etc.
Applications Corner 2
Measure temperatures at some locations
Predict temperatures throughout the environment
Bias-Variance tradeoff – Intuition

Model too “simple” → does not fit the data well
  A biased solution
Model too complex → small changes to the data change the solution a lot
  A high-variance solution
(Squared) Bias of learner

Given a dataset D with m samples, learn a function h(x)
If you sample different datasets D, you will learn different h(x)
Expected hypothesis: E_D[h(x)]
Bias: difference between what you expect to learn and the truth:
  bias²(x) = ( E_D[h(x)] − g(x) )²
Measures how well you expect to represent the true solution
Decreases with a more complex model
Variance of learner

Given a dataset D with m samples, you learn a function h(x)
If you sample different datasets D, you will learn different h(x)
Variance: difference between what you expect to learn and what you learn from a particular dataset:
  variance(x) = E_D[ ( h(x) − E_D[h(x)] )² ]
Measures how sensitive the learner is to the specific dataset
Decreases with a simpler model
Bias-Variance Tradeoff

Choice of hypothesis class introduces learning bias
  More complex class → less bias
  More complex class → more variance

Collect some data, and learn a function h(x)
What are the sources of prediction error?
Sources of error 1 – noise

What if we have a perfect learner and infinite data?
If our learned solution h(x) satisfies h(x) = g(x),
we still have a remaining, unavoidable error of σ² due to the noise ε
Sources of error 2 – finite data

What if we have an imperfect learner, or only m training examples?
What is our expected squared error per example?
  Expectation taken over random training sets D of size m, drawn from the distribution P(X,T):
  E_D[ ( t − h_D(x) )² ]
Bias-Variance Decomposition of Error

Assume target function: t = f(x) = g(x) + ε

Then the expected squared error over fixed-size training sets D drawn from P(X,T) can be expressed as the sum of three components:

  E_D[ ( t − h_D(x) )² ] = σ² + bias²(x) + variance(x)

Where:
  σ² = E[ε²] is the irreducible noise
  bias²(x) = ( g(x) − E_D[h_D(x)] )²
  variance(x) = E_D[ ( h_D(x) − E_D[h_D(x)] )² ]
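The three-way split can be checked empirically by resampling training sets. A sketch under assumptions of my own: the truth g(x) = x² is quadratic, the learner fits a straight line (so the bias is deliberately nonzero), and everything is evaluated at one test point x0:

```python
import numpy as np

# Empirical check of  E_D[(t - h_D(x))^2] = sigma^2 + bias^2(x) + variance(x)
# at one test point x0, for a misspecified learner (toy setup of my own).
rng = np.random.default_rng(1)
sigma, m, trials = 0.3, 30, 10000
g = lambda x: x ** 2   # true function g(x); the learner fits a line
x0 = 0.8

preds, sq_errors = [], []
for _ in range(trials):
    # Sample a fresh training set D of size m from P(X, T).
    x = rng.uniform(-1, 1, size=m)
    t = g(x) + rng.normal(scale=sigma, size=m)
    # Learn h_D(x) = w0 + w1 * x by least squares.
    w = np.polyfit(x, t, deg=1)
    h_x0 = np.polyval(w, x0)
    preds.append(h_x0)
    # Squared error against a fresh noisy target t = g(x0) + eps.
    t0 = g(x0) + rng.normal(scale=sigma)
    sq_errors.append((t0 - h_x0) ** 2)

preds = np.array(preds)
total = np.mean(sq_errors)
bias2 = (np.mean(preds) - g(x0)) ** 2
variance = np.var(preds)
print(total, sigma ** 2 + bias2 + variance)  # the two sides should nearly match
```

The left side averages actual squared errors; the right side adds up noise, squared bias, and variance estimated separately. Up to Monte Carlo noise, the two agree.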
Bias-Variance Tradeoff

Choice of hypothesis class introduces learning bias
  More complex class → less bias
  More complex class → more variance
Training set error

Given a dataset (training data)
Choose a loss function
  e.g., squared error (L₂) for regression
Training set error: for a particular set of parameters w, the loss function on the training data:
  error_train(w) = (1/N_train) Σ_j ( t_j − Σ_i w_i φ_i(x_j) )²
Training set error as a function of model complexity
Prediction error

Training set error can be a poor measure of the “quality” of the solution
Prediction error: we really care about the error over all possible input points, not just the training data:
  error_true(w) = ∫ ( t(x) − Σ_i w_i φ_i(x) )² p(x) dx
Prediction error as a function of model complexity
Computing prediction error

Computing the prediction error requires a hard integral
May not know t(x) for every x
Monte Carlo integration (sampling approximation):
  Sample a set of i.i.d. points {x_1,…,x_M} from p(x)
  Approximate the integral with the sample average:
  error_true(w) ≈ (1/M) Σ_j ( t(x_j) − Σ_i w_i φ_i(x_j) )²
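The sampling approximation can be illustrated on a toy problem where t(x) is known exactly, so the Monte Carlo estimate can be compared with the true integral. The target, hypothesis, and density below are my own choices:

```python
import numpy as np

# Monte Carlo approximation of the prediction error
#   error_true = integral of (t(x) - h(x))^2 p(x) dx.
# Toy choices of my own: t(x) = sin(x), h(x) = x, p(x) = Uniform(0, 1);
# the exact integral works out to about 0.0037.
rng = np.random.default_rng(2)
t = lambda x: np.sin(x)   # true target function
h = lambda x: x           # some learned hypothesis

# Sample M i.i.d. points from p(x) and average the squared error.
M = 200_000
xs = rng.uniform(0.0, 1.0, size=M)
mc_error = np.mean((t(xs) - h(xs)) ** 2)
print(mc_error)  # close to the exact integral, about 0.0037
```

With M in the hundreds of thousands the sample average is an excellent estimate; the catch in practice is that t(x) is usually not available at arbitrary x, which is what motivates the held-out test set below.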
Why doesn’t training set error approximate prediction error?

Sampling approximation of the prediction error:
  error_true(w) ≈ (1/M) Σ_j ( t(x_j) − Σ_i w_i φ_i(x_j) )², with x_j drawn i.i.d. from p(x)
Training error:
  error_train(w) = (1/N_train) Σ_j ( t_j − Σ_i w_i φ_i(x_j) )²
Very similar equations!!!
Why is the training set a bad measure of prediction error???
Because you cheated! The training error is a good estimate for a fixed w, but you optimized w with respect to the training error, so w is tuned to look good on exactly those samples.
Test set error

Given a dataset, randomly split it into two parts:
  Training data – {x1,…, xNtrain}
  Test data – {x1,…, xNtest}
Use the training data to optimize the parameters w
Test set error: for the final solution w*, evaluate the error using the test data:
  error_test(w*) = (1/N_test) Σ_j ( t_j − Σ_i w_i* φ_i(x_j) )²
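A minimal sketch of the split, assuming NumPy and toy linear data of my own:

```python
import numpy as np

# Train/test split sketch (toy linear data of my own): fit w on the
# training part only, then estimate the error of w* on held-out points.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=100)
t = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=100)

# Random split: 70 training points, 30 test points.
idx = rng.permutation(100)
train, test = idx[:70], idx[70:]

# Optimize the parameters w on the training data only.
w = np.polyfit(x[train], t[train], deg=1)

# Test set error of the final solution w* on the held-out points;
# with a well-specified model it hovers near the noise variance sigma^2.
test_err = np.mean((t[test] - np.polyval(w, x[test])) ** 2)
print(test_err)
```

Because the test points never influence w, the test error is an unbiased sample-average estimate of the prediction error, exactly the Monte Carlo approximation from the previous slide.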
Test set error as a function of model complexity
Overfitting

Overfitting: a learning algorithm overfits the training data if it outputs a solution w when there exists another solution w’ such that:
  error_train(w) < error_train(w’)  and  error_true(w’) < error_true(w)
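The definition can be made concrete with a small experiment (toy setup of my own: the truth is linear plus noise, and a high-degree polynomial is compared against a straight-line fit on the same training set):

```python
import numpy as np

# Overfitting in action (toy setup of my own): the truth is linear plus
# noise; compare a degree-9 polynomial with a degree-1 fit on the same
# 15 training points.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=15)
t = x + rng.normal(scale=0.3, size=15)

# Fresh points from the same distribution approximate the true error.
x_new = rng.uniform(-1, 1, size=5000)
t_new = x_new + rng.normal(scale=0.3, size=5000)

def errors(deg):
    w = np.polyfit(x, t, deg=deg)
    train = np.mean((t - np.polyval(w, x)) ** 2)
    true = np.mean((t_new - np.polyval(w, x_new)) ** 2)
    return train, true

train1, true1 = errors(1)   # simple model
train9, true9 = errors(9)   # complex model
# The complex model typically has lower training error but higher true
# error than the simple one: it overfits in exactly the sense defined above.
print(train1, true1)
print(train9, true9)
```

Here w is the degree-9 fit and w’ the degree-1 fit: the former wins on the training data while the latter wins on fresh data, which is precisely the definition of overfitting.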
How many points to use for training/testing?

Very hard question to answer!
  Too few training points → the learned w is bad
  Too few test points → you never know whether you reached a good solution
Bounds, such as Hoeffding’s inequality, can help:
  e.g., for a loss bounded in [0,1], P( |error_true(w) − error_test(w)| ≥ ε ) ≤ 2 exp(−2 N_test ε²)
  More on this later this semester, but still hard to answer
Typically:
  If you have a reasonable amount of data, pick a test set “large enough” for a “reasonable” estimate of error, and use the rest for learning
  If you have little data, then …
Error estimators
Error as a function of the number of training examples for a fixed model complexity
What you need to know

Regression
  Basis functions = features
  Optimizing sum squared error
  Relationship between regression and Gaussians
Bias-variance trade-off
  Play with the applet
True error, training error, test error
  Never learn on the test data
Overfitting