What does interaction depth mean in GBM?


30

I had a question on the interaction depth parameter in gbm in R. This may be a noob question, for which I apologize, but how does the parameter, which I believe denotes the number of terminal nodes in a tree, basically indicate X-way interaction among the predictors? I am just trying to understand how that works. Additionally, I get pretty different models if I have a dataset with two different factor variables versus the same dataset except that those two factor variables are combined into a single factor (e.g. X levels in factor 1, Y levels in factor 2, and the combined variable has X * Y levels). The latter is significantly more predictive than the former. I had thought that increasing the interaction depth would pick this relationship up.

Answers:


22

Both of the previous answers are wrong. The gbm package uses the interaction.depth parameter as the number of splits it has to perform on a tree (starting from a single node). As each split increases the total number of nodes by 3 and the number of terminal nodes by 2 (node → {left node, right node, NA node}), the total number of nodes in the tree will be 3N+1 and the number of terminal nodes 2N+1, where N is the number of splits. This can be verified by having a look at the output of the pretty.gbm.tree function.

The behaviour is rather misleading, as the user indeed expects the depth to be the depth of the resulting tree. It is not.
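
One way to check these counts is to fit a single tree and count the rows that pretty.gbm.tree returns. The following is only a sketch under stated assumptions: the data frame `toy` and the helper `count_nodes` are made up for illustration, and the exact counts may depend on the installed gbm version, so no expected output is shown.

```r
# Synthetic, purely illustrative data (not from the question).
set.seed(1)
n <- 500
toy <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
toy$y <- 2 * toy$x1 + toy$x2 * toy$x3 + rnorm(n, sd = 0.1)

# Hypothetical helper: fit one tree and count its nodes.
count_nodes <- function(depth) {
  fit <- gbm::gbm(y ~ ., data = toy, distribution = "gaussian",
                  n.trees = 1, shrinkage = 0.01, bag.fraction = 1,
                  interaction.depth = depth, verbose = FALSE)
  tree <- gbm::pretty.gbm.tree(fit, i.tree = 1)
  # Terminal nodes are the rows with SplitVar equal to -1.
  c(total = nrow(tree), terminal = sum(tree$SplitVar == -1))
}

# Only run the check when the gbm package is installed.
if (requireNamespace("gbm", quietly = TRUE)) {
  for (d in 1:3) print(count_nodes(d))
}
```

Comparing the printed totals with 3N+1 and 2N+1 for N = 1, 2, 3 shows whether the installed version still creates the third (NA) child on every split.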


What is N here: number of nodes, interaction.depth, or something else?
Julian

It is the number of splits performed, starting from a single node (that is, the interaction depth).
random

1
I think each split only increases the number of terminal nodes by 1. Suppose a tree has only one split, so it has 2 terminal nodes. If you now split one of those terminal nodes, there are 3 terminal nodes, so the increment is just 1. Do I have this right, or did I misunderstand something?
Lily Long

1
@LilyLong It might not be immediately clear, but gbm actually splits each node in three, with the third child grouping NA values (i.e. those that cannot be directly compared to the split value). That means that each split increases the number of terminal nodes by two. The package might have evolved since I last used it to avoid creating this third child, so please double-check this by running the pretty.gbm.tree function.
random

2

I had a question on the interaction depth parameter in gbm in R. This may be a noob question, for which I apologize, but how does the parameter, which I believe denotes the number of terminal nodes in a tree, basically indicate X-way interaction among the predictors?

Link between interaction.depth and the number of terminal nodes

One has to see interaction.depth as the number of split nodes. An interaction.depth fixed at k will result in trees with k+1 terminal nodes (omitting the NA nodes), so we have:

$\text{interaction.depth} = \#\{\text{TerminalNodes}\} - 1$

Link between interaction.depth and the interaction order

The link between interaction.depth and interaction order is more tedious.

Instead of reasoning with the interaction.depth, let's reason with the number of terminal nodes, which we will call J.

Example: Let's say you have J=4 terminal nodes (interaction.depth=3). You can either:

  1. do the first split on the root, then the second split on the left node of the root and the third split on the right node of the root. The interaction order for this tree will be 2.
  2. do the first split on the root, then the second split on the left (respectively right) node of the root, and a third split on this very left (respectively right) node. The interaction order for this tree will be 3.
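
The two shapes above can be sketched in base R. This is only an illustration under an assumption the example leaves implicit: every split uses a different variable. The variable names x1 to x3 and both helper functions are hypothetical and are not part of gbm.

```r
# A leaf is NULL; an internal node is a list with a split variable and two children.
split_node <- function(var, left = NULL, right = NULL) {
  list(var = var, left = left, right = right)
}

# Largest number of distinct variables met on any root-to-leaf path.
interaction_order <- function(node, seen = character(0)) {
  if (is.null(node)) return(length(unique(seen)))
  seen <- c(seen, node$var)
  max(interaction_order(node$left, seen), interaction_order(node$right, seen))
}

# Shape 1: split the root, then split both of its children.
balanced <- split_node("x1", left = split_node("x2"), right = split_node("x3"))
# Shape 2: split the root, its left child, and then that child's left child.
chain <- split_node("x1", left = split_node("x2", left = split_node("x3")))

interaction_order(balanced)  # 2
interaction_order(chain)     # 3
```

Both trees use three splits and have four leaves, yet the longest path differs, which is why the interaction order cannot be read off interaction.depth alone.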

So you cannot know in advance what will be the interaction order between your features in a given tree. However it is possible to upper bound this value. Let P be the interaction order of the features in a given tree. We have :

$P \le \min(J-1, n)$
with n being the number of observations. For more details, see Section 7 of Friedman's original article.

1

Previous answer is not correct.

Stumps will have an interaction.depth of 1 (and have two leaves). But interaction.depth=2 gives three leaves.

So: NumberOfLeaves = interaction.depth + 1


0

Actually, the previous answers are incorrect.

Let K be the interaction.depth, then the number of nodes N and leaves L (i.e. terminal nodes) are respectively given by the following:

$N = 2^{K+1} - 1, \qquad L = 2^{K}$
The previous 2 formulas can easily be demonstrated: a tree of depth K can be seen as having K+1 levels k ranging from 0 (root level) to K (leaf level).

Each of these levels has $2^k$ nodes. And the tree's total number of nodes is the sum of the number of nodes at each level.

In mathematical terms:

$N = \sum_{k=0}^{K} 2^{k}$

which is equivalent to:

$N = 2^{K+1} - 1$
(as per the formula of the sum of the terms of a geometrical progression).
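
As a quick worked instance of these formulas, assuming a complete binary tree of depth K = 2 (the value of K is chosen here only for illustration):

```latex
\begin{align*}
N &= \sum_{k=0}^{2} 2^{k} = 1 + 2 + 4 = 7 = 2^{2+1} - 1, \\
L &= 2^{2} = 4 .
\end{align*}
```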

0

You can try

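library(gbm)
# TrainingData stands for your own data frame with a response column y.
fit <- gbm(y ~ ., data = TrainingData, distribution = "gaussian", verbose = FALSE,
           n.trees = 1, shrinkage = 0.01, bag.fraction = 1, interaction.depth = 1)
table(predict(fit, newdata = TrainingData, n.trees = 1))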

and see that there are only 2 unique predicted values. Setting interaction.depth = 2 will get you 3 distinct predicted values, so you can convince yourself.


Not clear how this answers the question.
Michael R. Chernick
Licensed under cc by-sa 3.0 with attribution required.