Students learn why calculating full gradients is infeasible for large datasets and introduce the stochastic approximation. They analyze the noise introduced by minibatches and its effect on convergence.

Similar Lessons