$$X\beta = y, \qquad X = U S V^T, \qquad S = \operatorname{diag}(s_i),$$

$$\beta_\text{OLS} = V S^{-1} U^T y.$$
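A minimal NumPy sketch of this SVD-based solution (the data here is a hypothetical toy example; `np.linalg.svd` with `full_matrices=False` gives the thin SVD):

```python
import numpy as np

# Hypothetical toy problem: X and beta_true are made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true

# Thin SVD: X = U S V^T with S = diag(s_i).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# beta_OLS = V S^{-1} U^T y  (dividing by s applies diag(1/s_i))
beta_ols = Vt.T @ ((U.T @ y) / s)
```

Since the toy data is noiseless and `X` has full column rank, `beta_ols` recovers `beta_true` exactly here.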
However, this approach fails as soon as a single singular value is zero (the inverse then does not exist). Moreover, even if no $s_i$ is exactly zero, numerically small singular values can render the matrix ill-conditioned and lead to a solution that is highly susceptible to errors.
Ridge regression and PCA are two methods to avoid these problems. Ridge regression replaces $S^{-1}$ in the above equation for $\beta$ by
$$S^{-1}_\text{ridge} = \operatorname{diag}\!\left(\frac{s_i}{s_i^2 + \alpha}\right), \qquad \beta_\text{ridge} = V S^{-1}_\text{ridge} U^T y.$$
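A sketch of the ridge filter in NumPy (hypothetical toy data; the value of `alpha` is an arbitrary choice for illustration):

```python
import numpy as np

# Hypothetical toy data; alpha is an arbitrary regularization strength.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
alpha = 0.1

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# S^{-1}_ridge = diag(s_i / (s_i^2 + alpha)); no term blows up as s_i -> 0.
s_inv_ridge = s / (s**2 + alpha)
beta_ridge = Vt.T @ (s_inv_ridge * (U.T @ y))
```

This agrees with the familiar normal-equation form $(X^T X + \alpha I)^{-1} X^T y$, since $X^T X + \alpha I = V(S^2 + \alpha I)V^T$.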
PCA replaces $S^{-1}$ by

$$S^{-1}_\text{PCA} = \operatorname{diag}\!\left(\frac{\theta(s_i - \gamma)}{s_i}\right), \qquad \beta_\text{PCA} = V S^{-1}_\text{PCA} U^T y,$$
where $\theta$ is the step function and $\gamma$ is the threshold parameter.
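The hard cutoff can be sketched as follows; to make the effect visible, this hypothetical example constructs `X` with chosen singular values (3.0, 1.0, and a small 0.1), so the threshold discards exactly one direction:

```python
import numpy as np

# Build a toy X with known singular values (hypothetical construction):
# Q1 and Q2 have orthonormal columns, so the diagonal entries below are
# exactly the singular values of X.
rng = np.random.default_rng(2)
Q1, _ = np.linalg.qr(rng.normal(size=(10, 3)))
Q2, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = Q1 @ np.diag([3.0, 1.0, 0.1]) @ Q2.T
y = rng.normal(size=10)
gamma = 0.5  # threshold: directions with s_i <= gamma are discarded

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# S^{-1}_PCA = diag(theta(s_i - gamma) / s_i): hard cutoff on small s_i
s_inv_pca = np.where(s > gamma, 1.0 / s, 0.0)
beta_pca = Vt.T @ (s_inv_pca * (U.T @ y))
```

Here the $s_3 = 0.1$ direction is zeroed out entirely, while the two retained directions are inverted exactly.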
Both methods thus weaken the impact of subspaces corresponding to small singular values. PCA does so in a hard way, while ridge is a smoother approach.
More abstractly, feel free to come up with your own regularization scheme
$$S^{-1}_\text{myReg} = \operatorname{diag}(R(s_i)),$$
where $R(x)$ is a function that should approach zero for $x \to 0$ and approach $x^{-1}$ for large $x$. But remember, there's no free lunch.
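As one illustration of such an $R$ (a hypothetical choice, not one from the text), take the smooth filter $R(s) = (1 - e^{-(s/\gamma)^2})/s$, which vanishes like $s/\gamma^2$ as $s \to 0$ and behaves like $1/s$ for $s \gg \gamma$:

```python
import numpy as np

def my_reg(s, gamma=0.5):
    """Hypothetical regularized inverse R(s) = (1 - exp(-(s/gamma)^2)) / s.

    R(s) ~ s / gamma^2 as s -> 0 and R(s) -> 1/s for s >> gamma.
    """
    s = np.asarray(s, dtype=float)
    out = np.zeros_like(s)
    nz = s > 0
    # expm1 keeps the small-s regime numerically accurate.
    out[nz] = -np.expm1(-(s[nz] / gamma) ** 2) / s[nz]
    return out
```

Plugging `diag(my_reg(s))` in place of $S^{-1}$ then yields the corresponding regularized $\beta$, exactly as in the ridge and PCA cases above.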