What is an example of perfect multicollinearity?

12

What is an example of perfect collinearity in terms of the design matrix $X$?

I would like an example where $\hat \beta = (X'X)^{-1}X'Y$ cannot be estimated because $(X'X)$ is not invertible.

— TsTeaTime

I have looked through the post suggested for collinearity and feel that it is enough for an understanding, but a simple example using data would add clarity.

— TsTeaTime

2

What do you mean by "in terms of X and Y"? Collinearity exists among the X variables; Y has nothing to do with it.

— gung - Reinstate Monica

1

I have tweaked the question to be more specific with regard to $X$.

— TsTeaTime

1

Since multicollinearity of $X$ shows up as singularity of $X'X$, you might also like to read this question: stats.stackexchange.com/q/70899/3277 .

— ttnphns

10

Here is an example with 3 variables, $y$, $x_1$ and $x_2$, related by the equation

$y = x_1 + x_2 + \varepsilon$

where $\varepsilon \sim N(0,1)$.

The particular data are

         y x1 x2
1 4.520866  1  2
2 6.849811  2  4
3 6.539804  3  6

So it is evident that $x_2$ is a multiple of $x_1$, hence we have perfect collinearity.

We can write the model as

$Y = X \beta + \varepsilon$

where:

$Y = \begin{bmatrix}4.52 \\6.85 \\6.54\end{bmatrix}$

$X = \begin{bmatrix}1 & 1 & 2\\1 & 2 & 4 \\1 & 3 & 6\end{bmatrix}$

So we have

$XX' = \begin{bmatrix}1 & 1 & 2\\1 & 2 & 4 \\1 & 3 & 6\end{bmatrix} \begin{bmatrix}1 & 1 & 1\\1 & 2 & 3 \\2 & 4 & 6\end{bmatrix} = \begin{bmatrix}6 & 11 & 16\\11 & 21 & 31 \\16 & 31 & 46\end{bmatrix}$

Now we calculate the determinant of $XX'$:

$\det XX' = 6\begin{vmatrix}21 & 31 \\31 & 46\end{vmatrix} - 11 \begin{vmatrix}11 & 31 \\16 & 46\end{vmatrix} + 16\begin{vmatrix}11 & 21 \\16 & 31\end{vmatrix}= 0$
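Expanding each $2 \times 2$ minor makes the cancellation explicit:

$\det XX' = 6(21 \cdot 46 - 31 \cdot 31) - 11(11 \cdot 46 - 31 \cdot 16) + 16(11 \cdot 31 - 21 \cdot 16) = 6(5) - 11(10) + 16(5) = 30 - 110 + 80 = 0$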

In R we can show this as follows:

> x1 <- c(1,2,3)

create x2, a multiple of x1

> x2 <- x1*2

create y, a linear combination of x1, x2 and some randomness

> y <- x1 + x2 + rnorm(3,0,1)

observe that

> summary(m0 <- lm(y~x1+x2))

fails to estimate a value for the x2 coefficient:

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.9512     1.6457   2.401    0.251
x1            1.0095     0.7618   1.325    0.412
x2                NA         NA      NA       NA

Residual standard error: 0.02583 on 1 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:  0.9999 
F-statistic: 2.981e+04 on 1 and 1 DF,  p-value: 0.003687

The model matrix $X$ is:

> (X <- model.matrix(m0))

  (Intercept) x1 x2
1           1  1  2
2           1  2  4
3           1  3  6

So $XX'$ is

> (XXdash <- X %*% t(X))
   1  2  3
1  6 11 16
2 11 21 31
3 16 31 46

which is not invertible, as shown by

> solve(XXdash)
Error in solve.default(XXdash) : 
  Lapack routine dgesv: system is exactly singular: U[3,3] = 0

Or:

> det(XXdash)
[1] 0
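Strictly, the estimator $\hat \beta = (X'X)^{-1}X'Y$ involves $X'X$ rather than $XX'$. Because $X$ happens to be square in this example, the two are singular together, but it may be reassuring to check $X'X$ directly. A minimal sketch, rebuilding the design matrix by hand (the name `XtX` is only illustrative):

```r
# Rebuild the 3 x 3 design matrix from the example: intercept, x1, x2 = 2 * x1
x1 <- c(1, 2, 3)
x2 <- 2 * x1
X <- cbind(1, x1, x2)

# Cross-product matrix that appears in the OLS formula
XtX <- t(X) %*% X

# Rank is smaller than the number of columns, so XtX is singular
qr(XtX)$rank
ncol(XtX)

# Determinant is zero up to floating-point rounding
round(det(XtX), digits = 9)
```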

— Robert Long

21

Here are a few fairly common scenarios that produce perfect multicollinearity, i.e. situations in which the columns of the design matrix are linearly dependent. Recall from linear algebra that this means there is a linear combination of the columns of the design matrix (with coefficients that are not all zero) which equals zero. I have included some practical examples to help explain why this pitfall strikes so often; I have encountered almost all of them!

(1) One variable is a multiple of another, regardless of whether there is an intercept term: perhaps because you have recorded the same variable twice using different units (e.g. "length in centimetres" is exactly 100 times larger than "length in metres"), or because you have recorded a variable once as a raw number and once as a proportion or percentage, when the denominator is fixed (e.g. "area of petri dish colonized" and "percentage of petri dish colonized" will be exact multiples of each other if the area of each petri dish is the same). We have collinearity because if $w_i = ax_i$, where $w$ and $x$ are variables (columns of your design matrix) and $a$ is a scalar constant, then $1(\vec w) - a(\vec x)$ is a linear combination of the variables that equals zero.

(2) There is an intercept term and one variable differs from another by a constant: this will happen if you center a variable ($w_i = x_i - \bar x$) and include both raw $x$ and centered $w$ in your regression. It will also happen if your variables are measured in unit systems that differ by a constant, e.g. if $w$ is "temperature in kelvin" and $x$ is "temperature in °C" then $w_i = x_i + 273.15$. If we regard the intercept term as a variable that is always $1$ (represented as a column of ones, $\vec 1_n$, in the design matrix) then having $w_i = x_i + k$ for some constant $k$ means that $1(\vec w) - 1(\vec x) - k(\vec 1_n)$ is a linear combination of the $w$, $x$ and $1$ columns of the design matrix that equals zero.

(3) There is an intercept term and one variable is given by an affine transformation of another: i.e. you have variables $w$ and $x$, related by $w_i = ax_i + b$ where $a$ and $b$ are constants. For instance, this happens if you standardize a variable as $z_i = \frac{x_i - \bar x}{s_x}$ and include both raw $x$ and standardized $z$ variables in your regression. It also happens if you record $w$ as "temperature in °F" and $x$ as "temperature in °C", since those unit systems do not share a common zero but are related by $w_i = 1.8x_i + 32$. Or in a business context, suppose there is a fixed cost $\$b$ (e.g. covering delivery) for each order, as well as a cost $\$a$ per unit sold; then if $\$w_i$ is the cost of order $i$ and $x_i$ is the number of units ordered, we have $w_i = ax_i + b$. The linear combination of interest is $1(\vec w) - a(\vec x) - b(\vec 1_n) = \vec 0$. Note that if $a=1$, then (3) includes (2) as a special case; if $b=0$, then (3) includes (1) as a special case.

(4) There is an intercept term and the sum of several variables is fixed (e.g. in the famous "dummy variable trap"): for example, if you have "percentage of satisfied customers", "percentage of dissatisfied customers" and "percentage of customers neither satisfied nor dissatisfied" then these three variables will always (barring rounding error) sum to 100. One of these variables (or alternatively, the intercept term) needs to be dropped from the regression to prevent collinearity. The "dummy variable trap" occurs when you use indicator variables (more commonly but less usefully called "dummies") for every possible level of a categorical variable. For instance, suppose vases are produced in red, green or blue color schemes. If you recorded the categorical variable "color" by three indicator variables (red, green and blue would be binary variables, stored as 1 for "yes" and 0 for "no") then for each vase only one of the variables would be a one, and hence red + green + blue = 1. Since there is a vector of ones for the intercept term, the linear combination 1(red) + 1(green) + 1(blue) - 1(1) = 0. The usual remedy here is either to drop the intercept, or to drop one of the indicators (e.g. leave out red), which becomes a baseline or reference level. In this case, the regression coefficient for green would indicate the change in the mean response associated with switching from a red vase to a green one, holding other explanatory variables constant.

(5) There are at least two subsets of variables, each having a fixed sum, regardless of whether there is an intercept term: suppose the vases in (4) were produced in three sizes, and the categorical variable for size was stored as three additional indicator variables. We would have large + medium + small = 1. Then we have the linear combination 1(large) + 1(medium) + 1(small) - 1(red) - 1(green) - 1(blue) = 0, even when there is no intercept term. The two subsets need not share the same sum, e.g. if we have explanatory variables $u, v, w, x$ such that every $u_i + v_i = k_1$ and $w_i + x_i = k_2$, then $k_2(\vec u) + k_2(\vec v) - k_1(\vec w) - k_1(\vec x) = \vec 0$.

(6) One variable is a linear combination of others, regardless of whether there is an intercept term: for instance, if you record the length $l$, width $w$ and perimeter $p$ of each rectangle, then $p_i = 2l_i + 2w_i$, so we have the linear combination $1(\vec p) - 2(\vec l) - 2(\vec w) = \vec 0$. An example with an intercept term: suppose a mail-order business has two product lines, and we record that order $i$ consisted of $u_i$ of the first product at unit cost $\$a$ and $v_i$ of the second at unit cost $\$b$, with fixed delivery charge $\$c$. If we also include the order cost $\$x$ as an explanatory variable, then $x_i = a u_i + b v_i + c$ and so $1(\vec x) - a(\vec u) - b(\vec v) -c(\vec 1_n) = \vec 0$. This is an obvious generalization of (3). It also gives us a different way of thinking about (4): once we know all bar one of the subset of variables whose sum is fixed, then the remaining one is their complement, so it can be expressed as a linear combination of them and their sum. If we know 50% of customers were satisfied and 20% were dissatisfied, then 100% - 50% - 20% = 30% must be neither satisfied nor dissatisfied; if we know the vase is not red (red=0) and it is green (green=1) then we know it is not blue (blue = 1(1) - 1(red) - 1(green) = 1 - 0 - 1 = 0).

(7) One variable is constant and zero, regardless of whether there is an intercept term: in an observational study, a variable will be constant if your sample does not exhibit sufficient (any!) variation. There may be variation in the population that is not captured in your sample, e.g. if there is a very common modal value: perhaps your sample size is too small and was therefore unlikely to include any values that differed from the mode, or your measurements were insufficiently accurate to detect small variations from the mode. Alternatively, there may be theoretical reasons for the lack of variation, particularly if you are studying a sub-population. In a study of new-build properties in Los Angeles, it would not be surprising that every data point has AgeOfProperty = 0 and State = California! In an experimental study, you may have measured an independent variable that is under experimental control. Should one of your explanatory variables $x$ be both constant and zero, then we have immediately that the linear combination $1(\vec x)$ (with coefficient zero for any other variables) is $\vec 0$.

(8) There is an intercept term and at least one variable is constant: if $x$ is constant so that each $x_i = k \neq 0$, then the linear combination $1(\vec x) - k(\vec 1_n) = \vec 0$.

(9) At least two variables are constant, regardless of whether there is an intercept term: if each $w_i = k_1 \neq 0$ and $x_i = k_2 \neq 0$, then the linear combination $k_2(\vec w) - k_1(\vec x) = \vec 0$.

(10) Number of columns of design matrix, $k$, exceeds number of rows, $n$: even when there is no conceptual relationship between your variables, it is mathematically necessitated that the columns of your design matrix will be linearly dependent when $k > n$. It simply isn't possible to have $k$ linearly independent vectors in a space with a number of dimensions lower than $k$: for instance, while you can draw two independent vectors on a sheet of paper (a two-dimensional plane, $\mathbb R^2$), any further vector drawn on the page must lie within their span, and hence be a linear combination of them. Note that an intercept term contributes a column of ones to the design matrix, so counts as one of your $k$ columns. (This scenario is often called the "large $p$, small $n$" problem: see also this related CV question.)

Data examples with R code

Each example gives a design matrix $X$ , the matrix $X'X$ (note this is always square and symmetrical) and $\det (X'X)$ . Note that if $X'X$ is singular (zero determinant, hence not invertible) then we cannot estimate $\hat \beta = (X'X)^{-1}X'y$ . The condition that $X'X$ be non-singular is equivalent to the condition that $X$ has full rank so its columns are linearly independent: see this Math SE question, or this one and its converse.
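One way to see the full-rank condition on data is to compare the rank of $X$ with its number of columns. A hedged sketch: `has_full_column_rank` is a hypothetical helper (not a base R function), and `qr()` decides rank using a numerical tolerance, so nearly collinear columns may be flagged as well. The two matrices are the ones used in examples (1) and (2).

```r
# Hypothetical helper: TRUE when the columns of X are linearly independent
has_full_column_rank <- function(X) {
  qr(X)$rank == ncol(X)
}

# Second column is twice the first: rank deficient
X_bad <- matrix(c(2, 4, 1, 2, 3, 6, 2, 4), ncol = 2, byrow = TRUE)

# Columns differ by a constant, but there is no intercept column: full rank
X_ok <- matrix(c(2, 4, 1, 3, 3, 5, 0, 2), ncol = 2, byrow = TRUE)

has_full_column_rank(X_bad)  # FALSE
has_full_column_rank(X_ok)   # TRUE
```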

(1) One column is multiple of another

# x2 = 2 * x1
# Note no intercept term (column of 1s) is needed
X <- matrix(c(2, 4, 1, 2, 3, 6, 2, 4), ncol = 2, byrow=TRUE)

X
#     [,1] [,2]
#[1,]    2    4
#[2,]    1    2
#[3,]    3    6
#[4,]    2    4


t(X) %*% X
#     [,1] [,2]
#[1,]   18   36
#[2,]   36   72

round(det(t(X) %*% X), digits = 9)
#0

(2) Intercept term and one variable differs from another by constant

# x1 represents intercept term
# x3 = x2 + 2
X <- matrix(c(1, 2, 4, 1, 1, 3, 1, 3, 5, 1, 0, 2), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    2    4
#[2,]    1    1    3
#[3,]    1    3    5
#[4,]    1    0    2


t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    4    6   14
#[2,]    6   14   26
#[3,]   14   26   54

round(det(t(X) %*% X), digits = 9)
#0

# NB if we drop the intercept, cols now linearly independent
# x2 = x1 + 2 with no intercept column
X <- matrix(c(2, 4, 1, 3, 3, 5, 0, 2), ncol = 2, byrow=TRUE)

X
#     [,1] [,2]
#[1,]    2    4
#[2,]    1    3
#[3,]    3    5
#[4,]    0    2


t(X) %*% X
#     [,1] [,2]
#[1,]   14   26
#[2,]   26   54
# Can you see how this matrix is related to the previous one, and why?

round(det(t(X) %*% X), digits = 9)
#80
# Non-zero determinant so X'X is invertible

(3) Intercept term and one variable is affine transformation of another

# x1 represents intercept term
# x3 = 2*x2 - 3
X <- matrix(c(1, 2, 1, 1, 1, -1, 1, 3, 3, 1, 0, -3), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    2    1
#[2,]    1    1   -1
#[3,]    1    3    3
#[4,]    1    0   -3


t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    4    6    0
#[2,]    6   14   10
#[3,]    0   10   20

round(det(t(X) %*% X), digits = 9)
#0

# NB if we drop the intercept, cols now linearly independent
# x2 = 2*x1 - 3 with no intercept column
X <- matrix(c(2, 1, 1, -1, 3, 3, 0, -3), ncol = 2, byrow=TRUE)

X
#     [,1] [,2]
#[1,]    2    1
#[2,]    1   -1
#[3,]    3    3
#[4,]    0   -3


t(X) %*% X
#     [,1] [,2]
#[1,]   14   10
#[2,]   10   20
# Can you see how this matrix is related to the previous one, and why?

round(det(t(X) %*% X), digits = 9)
#180
# Non-zero determinant so X'X is invertible

(4) Intercept term and sum of several variables is fixed

# x1 represents intercept term
# x2 + x3 = 10
X <- matrix(c(1, 2, 8, 1, 1, 9, 1, 3, 7, 1, 0, 10), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    2    8
#[2,]    1    1    9
#[3,]    1    3    7
#[4,]    1    0   10


t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    4    6   34
#[2,]    6   14   46
#[3,]   34   46  294

round(det(t(X) %*% X), digits = 9)
#0

# NB if we drop the intercept, then columns now linearly independent
# x1 + x2 = 10 with no intercept column
X <- matrix(c(2, 8, 1, 9, 3, 7, 0, 10), ncol = 2, byrow=TRUE)

X
#     [,1] [,2]
#[1,]    2    8
#[2,]    1    9
#[3,]    3    7
#[4,]    0   10

t(X) %*% X
#     [,1] [,2]
#[1,]   14   46
#[2,]   46  294
# Can you see how this matrix is related to the previous one, and why?

round(det(t(X) %*% X), digits = 9)
#2000
# Non-zero determinant so X'X is invertible

(4a) Intercept term with dummy variable trap

# x1 represents intercept term
# x2 + x3 + x4 = 1
X <- matrix(c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0), ncol = 4, byrow=TRUE)

X
#     [,1] [,2] [,3] [,4]
#[1,]    1    0    0    1
#[2,]    1    1    0    0
#[3,]    1    0    1    0
#[4,]    1    1    0    0
#[5,]    1    0    1    0

t(X) %*% X
#     [,1] [,2] [,3] [,4]
#[1,]    5    2    2    1
#[2,]    2    2    0    0
#[3,]    2    0    2    0
#[4,]    1    0    0    1
# This matrix has a very natural interpretation - can you work it out?

round(det(t(X) %*% X), digits = 9)
#0

# NB if we drop the intercept, then columns now linearly independent
# x1 + x2 + x3 = 1 with no intercept column
X <- matrix(c(0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0), ncol = 3, byrow=TRUE)  

X
#     [,1] [,2] [,3]
#[1,]    0    0    1
#[2,]    1    0    0
#[3,]    0    1    0
#[4,]    1    0    0
#[5,]    0    1    0

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    2    0    0
#[2,]    0    2    0
#[3,]    0    0    1
# Can you see how this matrix is related to the previous one?

round(det(t(X) %*% X), digits = 9)
#4
# Non-zero determinant so X'X is invertible
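
In practice, R's formula interface sidesteps the trap when the categorical variable is stored as a factor. A small sketch, assuming a made-up `color` factor for five vases (the data and object names are invented for illustration); with the default treatment contrasts, `model.matrix()` drops the first level as the baseline.

```r
# Made-up data: five vases in three colors, with "red" as the first level
color <- factor(c("blue", "red", "green", "red", "green"),
                levels = c("red", "green", "blue"))

# Intercept plus indicators for every level except the baseline ("red")
X_ref <- model.matrix(~ color)
colnames(X_ref)

# No intercept: one indicator per level
X_all <- model.matrix(~ color - 1)
colnames(X_all)

# Both designs have 3 columns and rank 3
qr(X_ref)$rank
qr(X_all)$rank

# Adding a column of ones to the full set of indicators recreates the trap:
# 4 columns, but the rank stays at 3
X_trap <- cbind(1, X_all)
qr(X_trap)$rank
```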

(5) Two subsets of variables with fixed sum

# No intercept term needed
# x1 + x2 = 1
# x3 + x4 = 1
X <- matrix(c(0,1,0,1,1,0,0,1,0,1,1,0,1,0,0,1,1,0,1,0,0,1,1,0), ncol = 4, byrow=TRUE)

X
#     [,1] [,2] [,3] [,4]
#[1,]    0    1    0    1
#[2,]    1    0    0    1
#[3,]    0    1    1    0
#[4,]    1    0    0    1
#[5,]    1    0    1    0
#[6,]    0    1    1    0

t(X) %*% X
#     [,1] [,2] [,3] [,4]
#[1,]    3    0    1    2
#[2,]    0    3    2    1
#[3,]    1    2    3    0
#[4,]    2    1    0    3
# This matrix has a very natural interpretation - can you work it out?

round(det(t(X) %*% X), digits = 9)
#0

(6) One variable is linear combination of others

# No intercept term
# x3 = x1 + 2*x2
X <- matrix(c(1,1,3,0,2,4,2,1,4,3,1,5,1,2,5), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    1    3
#[2,]    0    2    4
#[3,]    2    1    4
#[4,]    3    1    5
#[5,]    1    2    5

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]   15    8   31
#[2,]    8   11   30
#[3,]   31   30   91

round(det(t(X) %*% X), digits = 9)
#0

(7) One variable is constant and zero

# No intercept term
# x3 = 0
X <- matrix(c(1,1,0,0,2,0,2,1,0,3,1,0,1,2,0), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    1    0
#[2,]    0    2    0
#[3,]    2    1    0
#[4,]    3    1    0
#[5,]    1    2    0

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]   15    8    0
#[2,]    8   11    0
#[3,]    0    0    0

round(det(t(X) %*% X), digits = 9)
#0

(8) Intercept term and one constant variable

# x1 is intercept term, x3 = 5
X <- matrix(c(1,1,5,1,2,5,1,1,5,1,1,5,1,2,5), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    1    5
#[2,]    1    2    5
#[3,]    1    1    5
#[4,]    1    1    5
#[5,]    1    2    5

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]    5    7   25
#[2,]    7   11   35
#[3,]   25   35  125

round(det(t(X) %*% X), digits = 9)
#0

(9) Two constant variables

# No intercept term, x2 = 2, x3 = 5
X <- matrix(c(1,2,5,2,2,5,1,2,5,1,2,5,2,2,5), ncol = 3, byrow=TRUE)

X
#     [,1] [,2] [,3]
#[1,]    1    2    5
#[2,]    2    2    5
#[3,]    1    2    5
#[4,]    1    2    5
#[5,]    2    2    5

t(X) %*% X
#     [,1] [,2] [,3]
#[1,]   11   14   35
#[2,]   14   20   50
#[3,]   35   50  125

round(det(t(X) %*% X), digits = 9)
#0

(10) $k > n$

# Design matrix has 4 columns but only 3 rows
X <- matrix(c(1,1,1,1,1,2,4,8,1,3,9,27), ncol = 4, byrow=TRUE)

X
#     [,1] [,2] [,3] [,4]
#[1,]    1    1    1    1
#[2,]    1    2    4    8
#[3,]    1    3    9   27

t(X) %*% X
#     [,1] [,2] [,3] [,4]
#[1,]    3    6   14   36
#[2,]    6   14   36   98
#[3,]   14   36   98  276
#[4,]   36   98  276  794

round(det(t(X) %*% X), digits = 9)
#0
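
The zero determinant here follows from a rank bound: the rank of $X$ cannot exceed its number of rows, so with 3 rows the 4 columns cannot all be independent. A brief sketch reusing the same matrix (only base R's `qr()` is assumed):

```r
# Same 3 x 4 design matrix as in example (10)
X <- matrix(c(1, 1, 1, 1, 1, 2, 4, 8, 1, 3, 9, 27), ncol = 4, byrow = TRUE)

# Rank is bounded by the smaller dimension, here the 3 rows
qr(X)$rank
min(dim(X))

# Fewer independent columns than columns, so X'X cannot be inverted
qr(X)$rank < ncol(X)
```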

— Silverfish

4

Some trivial examples to help intuition:

$\mathbf{x_1}$ is height in centimeters. $\mathbf{x_2}$ is height in meters. Then:
- $\mathbf{x_1} = 100 \mathbf{x_2}$, and your design matrix $X$ will not have linearly independent columns.
$\mathbf{x_1} = \mathbf{1}$ (i.e. you include a constant in your regression), $\mathbf{x_2}$ is temperature in Fahrenheit, and $\mathbf{x_3}$ is temperature in Celsius. Then:
- $\mathbf{x_2} = \frac{9}{5}\mathbf{x_3} + 32 \mathbf{x_1}$, and your design matrix $X$ will not have linearly independent columns.
Everyone starts school at age 5, $\mathbf{x_1} = \mathbf{1}$ (i.e. constant value of 1 across all observations), $\mathbf{x_2}$ is years of schooling, $\mathbf{x_3}$ is age, and no one has left school. Then:
- $\mathbf{x_2} = \mathbf{x_3} - 5\mathbf{x_1}$, and your design matrix $X$ will not have linearly independent columns.

There are a multitude of ways in which one column of data can be a linear function of your other data. Some of them are obvious (e.g. meters vs. centimeters) while others can be more subtle (e.g. age and years of schooling for younger children).

Notational notes: Let $\mathbf{x_1}$ denote the first column of $X$, $\mathbf{x_2}$ the second column, etc., and let $\mathbf{1}$ denote a vector of ones, which is what is included in the design matrix $X$ if you include a constant in your regression.
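
The schooling example can be checked numerically. A sketch with invented numbers (the ages and the response `y` are made up purely for illustration): because years of schooling equals age minus 5 for every child, `lm()` cannot separate the two variables and reports `NA` for one of the coefficients.

```r
# Invented data: every child started school at age 5 and is still enrolled
age <- c(6, 7, 9, 10, 12)
schooling <- age - 5
y <- c(2.1, 2.9, 5.2, 5.8, 8.1)  # arbitrary response values

# Design matrix with a constant: 3 columns, but schooling = age - 5 * constant
X <- cbind(1, schooling, age)
qr(X)$rank

# lm() keeps the intercept and schooling, and leaves the age coefficient as NA
fit <- lm(y ~ schooling + age)
coef(fit)
```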

— Matthew Gunn

1

The schooling and age example is very good, though it is worth pointing out that the relationship holds only while everyone is still in school! The logical extension of that is when you have age, years of schooling and years of work, which can continue the relationship beyond graduation. (Of course in practice such multicollinearity rarely tends to be perfect: there are always exceptions, like children who started school at a different age because they came from a different country. But it is often quite severe.)

— Silverfish

@Silverfish good points! I just made some edits/corrections.

— Matthew Gunn