XGBRegressor vs. xgboost.train: huge speed difference?

If I train my model using the following code:

import xgboost as xg
params = {'max_depth':3,
'min_child_weight':10,
'learning_rate':0.3,
'subsample':0.5,
'colsample_bytree':0.6,
'obj':'reg:linear',
'n_estimators':1000,
'eta':0.3}

features = df[feature_columns]
target = df[target_columns]
dmatrix = xg.DMatrix(features.values,
                     target.values,
                     feature_names=features.columns.values)
clf = xg.train(params, dmatrix)

it finishes in about 1 minute.

If I train my model using the scikit-learn wrapper instead:

import xgboost as xg
max_depth = 3
min_child_weight = 10
subsample = 0.5
colsample_bytree = 0.6
objective = 'reg:linear'
num_estimators = 1000
learning_rate = 0.3

features = df[feature_columns]
target = df[target_columns]
clf = xg.XGBRegressor(max_depth=max_depth,
                min_child_weight=min_child_weight,
                subsample=subsample,
                colsample_bytree=colsample_bytree,
                objective=objective,
                n_estimators=num_estimators,
                learning_rate=learning_rate)
clf.fit(features, target)

it takes over 30 minutes.

I would have thought the underlying code was nearly identical (i.e. XGBRegressor calls xg.train) - what is going on here?

— user1566200

xgboost.train ignores the n_estimators parameter, whereas xgboost.XGBRegressor accepts it. In xgboost.train, the number of boosting iterations (i.e. what n_estimators means in the scikit-learn wrapper) is controlled by num_boost_round (default: 10).

In your case, the first snippet runs 10 boosting iterations (the default), while the second runs 1000. The timings should be roughly the same if you change clf = xg.train(params, dmatrix) to clf = xg.train(params, dmatrix, 1000).
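As a minimal sketch of this point, the snippet below moves n_estimators out of the parameter dictionary and passes it to xgboost.train as num_boost_round, so both APIs would run the same number of iterations. The helper names split_boosting_rounds and train_native are hypothetical (they are not part of xgboost), and the fallback of 10 rounds simply mirrors the default mentioned above.

```python
def split_boosting_rounds(sklearn_style_params, default_rounds=10):
    """Separate n_estimators from the booster parameters.

    Hypothetical helper: xgboost.train does not read n_estimators from
    the params dict, so we return it separately as the round count.
    """
    booster_params = dict(sklearn_style_params)  # copy, do not mutate the input
    num_boost_round = booster_params.pop("n_estimators", default_rounds)
    return booster_params, num_boost_round


def train_native(features, target, sklearn_style_params):
    """Train with the native API using the same number of rounds as
    XGBRegressor(n_estimators=...) would use."""
    # Imported here so the helper above can be used without xgboost installed.
    import xgboost as xg

    booster_params, num_boost_round = split_boosting_rounds(sklearn_style_params)
    dmatrix = xg.DMatrix(features, label=target)
    # The round count is passed explicitly instead of inside params.
    return xg.train(booster_params, dmatrix, num_boost_round=num_boost_round)
```

With the question's parameters, split_boosting_rounds would yield 1000 rounds, and train_native(features.values, target.values, params) should then take roughly as long as the XGBRegressor fit.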

References

http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.train

http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBRegressor

— Băng giá