I needed an algorithm that has
- little branching (hopefully CMOVs)
- no trigonometric function calls
- high numerical accuracy even with 32-bit floats
We want to calculate c1, s1, c2, s2, σ1 and σ2 as follows:

A = USV, which can be expanded like:

    [ a  b ]   [ c1   s1 ] [ σ1  0  ] [ c2  -s2 ]
    [ c  d ] = [ -s1  c1 ] [ 0   σ2 ] [ s2   c2 ]
The main idea is to find a rotation matrix V that diagonalizes A^T A, that is, V(A^T A)V^T = D is diagonal.
Recall that

    USV = A
    US = AV^-1 = AV^T                                  (since V is orthogonal)
    V(A^T A)V^T = (AV^T)^T (AV^T) = (US)^T (US) = S^T U^T U S = D
Multiplying on the left by S^-T and on the right by S^-1 we get

    (S^-T S^T) U^T U (S S^-1) = U^T U = S^-T D S^-1

Since D is diagonal, setting S to √D gives S^-T D S^-1 = D^(-1/2) D D^(-1/2) = I, and therefore U^T U = I. This means U is a rotation matrix, S is a diagonal matrix, V is a rotation matrix and USV = A, just what we are looking for.
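As a quick sanity check of the derivation (a throwaway numeric sketch over an arbitrary sample matrix, not part of the algorithm below), we can diagonalize A^T A with a rotation, set S = √D, recover U = AV^T S^-1 and measure how far U^T U is from the identity:

```cpp
#include <cmath>

// Numeric check of the derivation above: build a rotation V that
// diagonalizes A^T*A, set S = sqrt(D), recover U = A*V^T*S^-1 and
// return the largest deviation of U^T*U from the identity.
// The sample matrix here is arbitrary (any invertible A will do).
double utu_error() {
    double a = 3, b = 1, c = 2, d = -1;            // A = [a b; c d]
    double alpha = a*a + c*c;                      // (A^T A)(0,0)
    double beta  = b*b + d*d;                      // (A^T A)(1,1)
    double gamma = a*b + c*d;                      // off-diagonal entry
    // Jacobi angle: tan(2*theta) = 2*gamma/(alpha-beta) zeroes the
    // off-diagonal of V*(A^T A)*V^T
    double th = 0.5 * std::atan2(2*gamma, alpha - beta);
    double cv = std::cos(th), sv = std::sin(th);   // V = [cv sv; -sv cv]
    // Diagonal of D = V*(A^T A)*V^T
    double d1 = cv*cv*alpha + 2*cv*sv*gamma + sv*sv*beta;
    double d2 = sv*sv*alpha - 2*cv*sv*gamma + cv*cv*beta;
    double s1 = std::sqrt(d1), s2 = std::sqrt(d2); // S = sqrt(D)
    // U = A * V^T * S^-1 (S^-1 just scales the columns)
    double u00 = (a*cv + b*sv)/s1, u01 = (-a*sv + b*cv)/s2;
    double u10 = (c*cv + d*sv)/s1, u11 = (-c*sv + d*cv)/s2;
    // Largest entry of |U^T*U - I|
    double e00 = u00*u00 + u10*u10 - 1;
    double e11 = u01*u01 + u11*u11 - 1;
    double e01 = u00*u01 + u10*u11;
    return std::fmax(std::fabs(e01),
                     std::fmax(std::fabs(e00), std::fabs(e11)));
}
```

For the sample matrix the deviation comes out on the order of machine epsilon.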
Calculating the diagonalizing rotation can be done by solving the following equation:

    t2^2 - ((β - α)/γ)·t2 - 1 = 0

where

    A^T A = [ a  c ] [ a  b ]   [ a^2+c^2  ab+cd   ]   [ α  γ ]
            [ b  d ] [ c  d ] = [ ab+cd    b^2+d^2 ] = [ γ  β ]

and t2 is the tangent of the angle of V. This can be derived by expanding V(A^T A)V^T and making its off-diagonal elements equal to zero (they are equal to each other, since the product is symmetric).
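With ζ = (β - α)/γ, the roots of that quadratic are (ζ ± √(ζ^2 + 4))/2, and the smaller-magnitude one should be computed in rationalized form so that no two nearly-equal numbers get subtracted (the same |ζ| + √(ζ^2 + 4) denominator shows up in Svd2x2Helper below). A standalone sketch of that:

```cpp
#include <cmath>

// Smaller-magnitude root of t^2 - zeta*t - 1 = 0, in cancellation-free
// rationalized form:
//   t = -2*sign(zeta) / (|zeta| + sqrt(zeta^2 + 4))
// The naive (zeta - sqrt(zeta^2 + 4))/2 variant loses all precision for
// large positive zeta, where the two terms nearly cancel.
double stable_tangent(double zeta) {
    double s = (zeta >= 0) ? 1.0 : -1.0;
    return -2*s / (std::fabs(zeta) + std::sqrt(zeta*zeta + 4));
}
```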
The problem with this method is that it loses significant floating point precision when calculating β - α and γ for certain matrices, because of the subtractions involved. The solution is to do an RQ decomposition (A = RQ, R upper triangular, Q orthogonal) first, then use the algorithm to factorize USV' = R. This gives A = RQ = (USV')Q = US(V'Q), so V = V'Q. Notice how setting c to 0 (as in the upper triangular R) eliminates some of the additions/subtractions. (The RQ decomposition is fairly trivial from the expansion of the matrix product.)
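The RQ step is easy to check in isolation: building Q as the Givens rotation from the second row of A (the same c2 = d/√(c^2+d^2), s2 = -c/√(c^2+d^2) as in Rq2x2Helper below, minus its overflow guard) zeroes the lower-left entry of R = AQ^T, and R·Q reconstructs A. A sketch over an arbitrary sample matrix:

```cpp
#include <cmath>

// 2x2 RQ decomposition check: Q = [c2 s2; -s2 c2] built from the second
// row (c, d) of A makes R = A*Q^T upper triangular, so A = R*Q.
// Returns the worst error over R(1,0) and the reconstruction R*Q - A.
// The sample matrix is arbitrary (this sketch assumes c != 0).
double rq_check() {
    double a = 3, b = 1, c = 2, d = -1;   // A = [a b; c d]
    double den = 1/std::sqrt(c*c + d*d);
    double s2 = -c*den;
    double c2 =  d*den;
    // R = A*Q^T, with Q^T = [c2 -s2; s2 c2]
    double x   =  a*c2 + b*s2;            // R(0,0)
    double y   = -a*s2 + b*c2;            // R(0,1)
    double r10 =  c*c2 + d*s2;            // R(1,0), zero by construction
    double z   = -c*s2 + d*c2;            // R(1,1)
    // Reconstruct R*Q and compare against A
    double e00 = (x*c2 - y*s2) - a;
    double e01 = (x*s2 + y*c2) - b;
    double e10 = (r10*c2 - z*s2) - c;
    double e11 = (r10*s2 + z*c2) - d;
    double err = std::fabs(r10);
    err = std::fmax(err, std::fabs(e00));
    err = std::fmax(err, std::fabs(e01));
    err = std::fmax(err, std::fabs(e10));
    err = std::fmax(err, std::fabs(e11));
    return err;
}
```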
The algorithm naively implemented this way has some numerical and logical anomalies (e.g. should S be +√D or -√D?), which I fixed in the code below.
I threw about two billion randomized matrices at the code, and the largest numerical error produced was around 6⋅10^-7 (with 32-bit floats, error = ||USV - M|| / ||M||). The algorithm runs in about 340 clock cycles (MSVC 19, Ivy Bridge).
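The code below references a Matrix type and an impl::sign_nonzero helper whose definitions aren't shown; minimal stand-in definitions (a sketch — any equivalent row-major matrix and branch-free sign helper will do) look like this:

```cpp
#include <cmath>

// Minimal stand-ins for the scaffolding the code below relies on.
// (Sketch only; not necessarily the original definitions.)
template <class T, int Rows, int Cols>
class Matrix {
public:
    // Row-major storage; A(i, j) addresses row i, column j.
    T& operator()(int i, int j) { return data_[i*Cols + j]; }
    const T& operator()(int i, int j) const { return data_[i*Cols + j]; }
private:
    T data_[Rows*Cols] = {};
};

namespace impl {
    // Branch-free sign that never returns 0: +1 for x >= +0, -1 for
    // negative x (and -0). copysign keeps this CMOV/bit-op friendly.
    template <class T>
    T sign_nonzero(T x) {
        return std::copysign(T(1), x);
    }
}
```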
#include <algorithm>  // std::max
#include <cmath>      // std::abs, std::sqrt, std::hypot

template <class T>
void Rq2x2Helper(const Matrix<T, 2, 2>& A, T& x, T& y, T& z, T& c2, T& s2) {
    T a = A(0, 0);
    T b = A(0, 1);
    T c = A(1, 0);
    T d = A(1, 1);

    if (c == 0) {
        x = a;
        y = b;
        z = d;
        c2 = 1;
        s2 = 0;
        return;
    }

    // Scale the second row by its largest element to avoid overflow/underflow
    T maxden = std::max(std::abs(c), std::abs(d));
    T rcmaxden = 1/maxden;
    c *= rcmaxden;
    d *= rcmaxden;

    T den = 1/std::sqrt(c*c + d*d);

    T numx = (-b*c + a*d);
    T numy = (a*c + b*d);
    x = numx * den;
    y = numy * den;
    z = maxden/den;

    s2 = -c * den;
    c2 = d * den;
}
template <class T>
void Svd2x2Helper(const Matrix<T, 2, 2>& A, T& c1, T& s1, T& c2, T& s2, T& d1, T& d2) {
    // Calculate RQ decomposition of A
    T x, y, z;
    Rq2x2Helper(A, x, y, z, c2, s2);

    // Calculate tangent of rotation on R[x,y;0,z] to diagonalize R^T*R
    T scaler = T(1)/std::max(std::abs(x), std::abs(y));
    T x_ = x*scaler, y_ = y*scaler, z_ = z*scaler;
    T numer = ((z_-x_)*(z_+x_)) + y_*y_;
    T gamma = x_*y_;
    gamma = numer == 0 ? 1 : gamma;
    T zeta = numer/gamma;

    T t = 2*impl::sign_nonzero(zeta)/(std::abs(zeta) + std::sqrt(zeta*zeta+4));

    // Calculate sines and cosines
    c1 = T(1) / std::sqrt(T(1) + t*t);
    s1 = c1*t;

    // Calculate U*S = R*R(c1,s1)
    T usa = c1*x - s1*y;
    T usb = s1*x + c1*y;
    T usc = -s1*z;
    T usd = c1*z;

    // Update V = R(c1,s1)^T*Q
    t = c1*c2 + s1*s2;
    s2 = c2*s1 - c1*s2;
    c2 = t;

    // Separate U and S
    d1 = std::hypot(usa, usc);
    d2 = std::hypot(usb, usd);
    T dmax = std::max(d1, d2);
    T usmax1 = d2 > d1 ? usd : usa;
    T usmax2 = d2 > d1 ? usb : -usc;

    T signd1 = impl::sign_nonzero(x*z);
    dmax *= d2 > d1 ? signd1 : 1;
    d2 *= signd1;
    T rcpdmax = 1/dmax;

    c1 = dmax != T(0) ? usmax1 * rcpdmax : T(1);
    s1 = dmax != T(0) ? usmax2 * rcpdmax : T(0);
}
Ideas from:
http://www.cs.utexas.edu/users/inderjit/public_papers/HLA_SVD.pdf
http://www.math.pitt.edu/~sussmanm/2071Spring08/lab09/index.html
http://www.lucidarme.me/singular-value-decomposition-of-a-2x2-matrix/