Cost Function in Machine Learning

The goal of the cost function in machine learning is to start at a random point and find the global minimum, where the slope of the curve is almost zero. The gradient at a point is the vector of partial derivatives (∂J/∂m, ∂J/∂c), and its direction is the direction of the greatest rate of increase of the function. Therefore, starting at a point on the surface, to move towards the minimum we should move in the negative direction of the gradient at that point.

To train the model, we predict a new value for the given independent features; those features also have real (actual) values in the dataset. In regression, the model whose predicted values are closest to the corresponding real values is the optimal model. The cost function measures how close the predicted values are to the real values, and the gradient descent method is used to minimize the cost function.
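The idea of "measuring how close predicted values are to real values" can be sketched with a mean-squared-error cost. The data below is made up purely for illustration:

```python
# Mean squared error as a cost: how far predictions are from real values.
# The 1/(2m) scaling matches the regression cost function used later.
def mse_cost(predicted, actual):
    m = len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / (2 * m)

predictions = [2.5, 4.0, 5.5]   # hypothetical model outputs
real_values = [3.0, 4.0, 5.0]   # hypothetical actual values
print(mse_cost(predictions, real_values))  # smaller value => closer fit
```

A perfect model would score exactly zero; gradient descent tries to push this number as low as possible.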

Gradient-based and gradient-free algorithms are the two types of algorithms used to solve model optimization problems.

The gradient descent method uses three steps to optimize the model:

  1. Search Direction.
  2. Step Size.
  3. Convergence check.
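The three steps above can be sketched on a toy one-variable cost, J(t) = (t − 3)², whose minimum is at t = 3 (the cost and starting values here are assumptions for illustration):

```python
# Gradient descent loop on the toy cost J(t) = (t - 3)**2.
# Its derivative dJ/dt = 2*(t - 3) gives the search direction.
def gradient_descent(t=0.0, alpha=0.1, tol=1e-6, max_iters=1000):
    for _ in range(max_iters):
        grad = 2 * (t - 3)      # 1. search direction (negative gradient)
        step = alpha * grad     # 2. step size, scaled by learning rate
        t -= step
        if abs(step) < tol:     # 3. convergence check
            break
    return t

print(gradient_descent())  # converges close to 3.0
```

Each iteration moves against the gradient; when the step shrinks below the tolerance, the loop declares convergence.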

As we know, the slope of a line is represented by the equation below:

y = mx + c

In regression, we represent the model as a hypothesis, which is developed from the slope-of-a-line formula:

hθ(x) = θ₀ + θ₁x

In regression, we measure the accuracy of the model with the cost function:

J(θ₀, θ₁) = (1/2m) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

The goal is to minimize this cost value.


To minimize the cost function J, we take partial derivatives with respect to the parameters θ₀ and θ₁.


In the equation below, we use the gradient descent update formula and replace the cost J(θ₀, θ₁) with the hypothesis equation:

θⱼ := θⱼ − α ∂/∂θⱼ J(θ₀, θ₁)

θⱼ := θⱼ − α ∂/∂θⱼ (1/2m) Σᵢ₌₁ᵐ (θ₀ + θ₁x⁽ⁱ⁾ − y⁽ⁱ⁾)²


When we take the partial derivative with respect to θ₁, we treat θ₀ as a constant, and the derivative of a constant becomes zero.


Likewise, when we differentiate with respect to θ₀, any term that does not contain θ₀ is treated as a constant, and its derivative becomes zero.



α (alpha) is the learning rate.

m is the number of rows (training examples).

n is the number of columns (features).

After differentiation, the update rules become:

θ₀ := θ₀ − α (1/m) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)

θ₁ := θ₁ − α (1/m) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
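These two update rules can be applied simultaneously in a loop. The dataset and learning rate below are assumptions for illustration (y = 2x, so the loop should find θ₀ ≈ 0, θ₁ ≈ 2):

```python
# Simultaneous update of theta0 and theta1 on a toy dataset (y = 2x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def gradient_step(theta0, theta1, alpha=0.05):
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = sum(errors) / m                             # dJ/d(theta0)
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/d(theta1)
    # Both updates use the same old thetas (simultaneous update).
    return theta0 - alpha * d_theta0, theta1 - alpha * d_theta1

t0, t1 = 0.0, 0.0
for _ in range(2000):
    t0, t1 = gradient_step(t0, t1)
print(t0, t1)  # close to (0.0, 2.0)
```

Note that both partial derivatives are computed from the same current parameters before either is updated; updating them one at a time would change the gradient mid-step.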


At the start, the cost value is high because the model picks random θ values. It then starts tuning the cost function by differentiation: with each new θ value, the cost is reduced and the model tries to reach the bottom of the slope. If the error decreases, the model is hopefully approaching the global minimum (the convergence point); if the error increases, the model is overshooting and moving away from the global minimum.

At that point, if we adjust our learning rate α, the model will quickly reach the global minimum.

If the learning rate is very low (case 1), each update reduces θ only a little, so the model takes very small steps and needs a long time to reach the global minimum.

In the other case, if the learning rate is high (case 2), the step size crosses the global minimum and lands on the other side of the curve; repeating this also takes a long time to converge. So we need to choose the α value such that the model reaches the global minimum quickly; a commonly used starting value is 0.1.
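The two cases can be demonstrated on the simple cost J(t) = t², whose minimum is at t = 0. The learning rates and starting point below are assumptions chosen only to show the contrast:

```python
# Effect of the learning rate on the toy cost J(t) = t**2 (minimum at 0).
def run(alpha, iters=50, t=5.0):
    for _ in range(iters):
        t -= alpha * 2 * t   # gradient of t**2 is 2*t
    return abs(t)            # distance left to the global minimum

slow = run(alpha=0.001)  # case 1: tiny steps, barely moves in 50 iterations
good = run(alpha=0.1)    # well-chosen rate: converges quickly
bad = run(alpha=1.2)     # case 2: overshoots and diverges
print(slow, good, bad)
```

With the too-small rate the distance to the minimum barely shrinks, the moderate rate converges in a few dozen steps, and the too-large rate bounces across the minimum with growing error.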

[Figure: model predictions improving as θ approaches the global minimum]

As the screenshot above shows, at the start, when the θ values are random, the model's predictions are very poor; as the θ values are updated with a proper learning rate α, model performance improves. When θ is near the global minimum, the predicted values are almost the same as the actual values, and we have the best-fit model.

Wrapping Up: The cost function in machine learning measures how close the model's predicted values are to the real values. The gradient descent method is used to minimize the cost function.
