
# Lecture 7

Notes for Coursera Machine Learning, taught by **Andrew Ng**.

## Regularization

### The problem of overfitting

#### Overfitting example (Intro)

**Overfitting**: If we have too many features, the learned hypothesis may fit the training set very well ($J(\theta) \approx 0$), but fail to generalize to new examples (e.g. predicting prices for houses it has not seen).

Example of overfitting in a classification problem.
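Not from the lecture itself, but a quick numerical illustration of the idea (hypothetical data): a polynomial with many parameters drives the training error down, even though the underlying truth is a simple quadratic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic ground truth, y = x^2 + noise.
x_train = np.linspace(-1, 1, 10)
y_train = x_train**2 + rng.normal(0, 0.1, x_train.size)
x_test = np.linspace(-1, 1, 50)
y_test = x_test**2 + rng.normal(0, 0.1, x_test.size)

def fit_errors(deg):
    # Least-squares fit of a degree-`deg` polynomial on the training set.
    coefs = np.polyfit(x_train, y_train, deg)
    train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train, test

train2, test2 = fit_errors(2)   # sensible model
train9, test9 = fit_errors(9)   # 10 parameters for 10 points: overfits
print(f"deg 2: train={train2:.4f} test={test2:.4f}")
print(f"deg 9: train={train9:.4f} test={test9:.4f}")
```

Because the degree-9 family contains every degree-2 polynomial, its training error can only be lower; it essentially interpolates the noise, which is exactly the failure mode described above.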

#### Addressing overfitting

For example, suppose we have an overfitting hypothesis with 100 features. In order to address it, we have the following options:

- Reduce the number of features.
  - Manually select which features to keep.
  - Model selection algorithm (later in course).
- Regularization.
  - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
  - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

##### Link to coursera section

https://www.coursera.org/learn/machine-learning/supplement/VTe37/the-problem-of-overfitting

### Cost function

#### Intuition

- Suppose we penalize $\theta_3$ and $\theta_4$ and make them really small, e.g. by minimizing

  $$\min_\theta\; \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$

  where the two large penalty terms force $\theta_3 \approx 0$ and $\theta_4 \approx 0$.

Then the above process flattens the quartic hypothesis into an (almost) quadratic one (see the pink curve in the figure).

#### Regularization

- Small values for the parameters $\theta_0, \theta_1, \ldots, \theta_n$:
  - "Simpler" hypothesis.
  - Less prone to overfitting.
- Housing example:
  - Features: $x_1, x_2, \ldots, x_{100}$
  - Parameters: $\theta_0, \theta_1, \ldots, \theta_{100}$ (example in the intuition part)

**Below is the cost function with the regularization term**

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

- The $\lambda$, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated; it is used here to make sure we keep the magnitudes/values of the parameters small.
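A minimal NumPy sketch of the regularized linear-regression cost (function and variable names are my own; the design matrix is assumed to have a leading column of ones):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) for linear regression with an L2 regularization term.

    X: (m, n+1) design matrix whose first column is all ones.
    theta: (n+1,) parameter vector; theta[0] is the bias and is
    NOT regularized, so the penalty sum starts at index 1.
    """
    m = X.shape[0]
    residual = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)          # skips theta_0
    return (residual @ residual + penalty) / (2 * m)

# Tiny example: two training points, one feature.
X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
# theta = [0, 1] fits the data exactly, so only the penalty remains:
# J = (0 + 1 * 1^2) / (2 * 2) = 0.25
print(regularized_cost(np.array([0.0, 1.0]), X, y, lam=1.0))  # 0.25
```

Note how a larger `lam` inflates the cost of any non-zero $\theta_j$ ($j \ge 1$), which is exactly the pressure toward small parameters discussed above.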

#### What if $\lambda$ is set to an extremely large value?

- It will cause underfitting: all parameters $\theta_1, \ldots, \theta_n$ are penalized toward zero, so the hypothesis reduces to roughly the constant $h_\theta(x) = \theta_0$, a flat line that fails to fit even the training data.

##### Link to coursera section

https://www.coursera.org/learn/machine-learning/supplement/1tJlY/cost-function

### Regularized linear regression

#### Gradient descent

**Note!**

The regularization sum runs from $j = 1$, i.e. it excludes the **bias term ($\theta_0$)**, so $\theta_0$ gets the ordinary, unregularized update:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\,\theta_j\right] \qquad j \in \{1, \ldots, n\}$$

The second update can be rewritten as $\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$, which shows that regularization shrinks $\theta_j$ a little on every iteration.

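A NumPy sketch of one gradient-descent step for regularized linear regression (the learning rate, iteration count, and toy data are illustrative choices of mine):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One step of gradient descent for regularized linear regression.

    theta[0] (the bias) gets the ordinary update; every other theta_j
    is additionally shrunk by the (lambda / m) * theta_j penalty term.
    """
    m = X.shape[0]
    grad = X.T @ (X @ theta - y) / m      # unregularized gradient
    grad[1:] += (lam / m) * theta[1:]     # add penalty, skipping theta_0
    return theta - alpha * grad

# Fit y = 1 + 2x on a tiny noiseless dataset.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 1.0 + 2.0 * np.arange(5.0)
theta = np.zeros(2)
for _ in range(5000):
    theta = gradient_descent_step(theta, X, y, alpha=0.1, lam=0.0)
print(theta)  # converges close to [1, 2] when lam = 0
```

With `lam > 0` the fitted slope would be pulled slightly below 2, trading a bit of training fit for smaller parameters.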

#### Normal equation
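The regularized normal equation from the lecture is $\theta = \left(X^TX + \lambda L\right)^{-1}X^Ty$, where $L$ is the $(n+1)\times(n+1)$ identity matrix with its top-left entry set to 0, so that $\theta_0$ is not regularized. A minimal NumPy sketch (names are my own):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Closed-form solution of regularized linear regression."""
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0                 # do not regularize the bias term
    # Solve (X^T X + lam * L) theta = X^T y instead of forming an inverse.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Exactly linear data, y = 1 + 2x.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 1.0 + 2.0 * np.arange(5.0)
print(normal_equation(X, y, lam=0.0))   # recovers [1, 2] exactly
print(normal_equation(X, y, lam=10.0))  # slope shrunk below 2
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the standard numerically safer choice.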

#### Normal equation non-invertibility

- If $m \le n$, then $X^TX$ is non-invertible (singular). However, regularization takes care of this: for $\lambda > 0$, the matrix $X^TX + \lambda L$ is always invertible.
- For a proof and more details, see:

https://web.mit.edu/zoya/www/linearRegression.pdf

#### Link to coursera section

https://www.coursera.org/learn/machine-learning/supplement/pKAsc/regularized-linear-regression

### Regularized logistic regression

The regularized cost function is

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

- The second sum, $\sum_{j=1}^{n}\theta_j^2$, means to explicitly exclude the bias term, $\theta_0$. I.e. the $\theta$ vector is indexed from 0 to n (holding n+1 values, $\theta_0$ through $\theta_n$), and this sum explicitly skips $\theta_0$ by running from 1 to n.
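A NumPy sketch of this regularized logistic cost (names are my own); note that the penalty sum starts at index 1, skipping the bias:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost J(theta)."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)  # skips theta_0
    return cross_entropy + penalty

# Sanity check: at theta = 0 every prediction is 0.5, so the
# cross-entropy is log(2) for any data, and the penalty is 0.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
print(logistic_cost(np.zeros(2), X, y, lam=1.0))  # ~0.6931
```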

#### Gradient descent
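The gradient-descent update looks identical to the regularized linear-regression case, except that $h_\theta$ is now the sigmoid of $\theta^Tx$. A minimal sketch (learning rate, iteration count, and toy data are my own illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for logistic regression."""
    m = X.shape[0]
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]     # penalty skips theta_0
    return theta - alpha * grad

# Toy 1-D problem: x < 2.5 is class 0, x > 2.5 is class 1.
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = (np.arange(6.0) >= 2.5).astype(float)
theta = np.zeros(2)
for _ in range(2000):
    theta = logistic_gd_step(theta, X, y, alpha=0.5, lam=0.1)
pred = sigmoid(X @ theta) >= 0.5
print(pred.astype(int))  # separates the two classes on this toy set
```

With `lam > 0` the weights stay bounded even though the toy data is linearly separable; without regularization they would grow without limit.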

#### Advanced optimization

#### Link to coursera section

https://www.coursera.org/learn/machine-learning/supplement/v51eg/regularized-logistic-regression