Hello, Habr!
At the laboratory for modeling natural systems at the National Center for Cognitive Development at ITMO University, we actively research the use of automated machine learning for a variety of tasks. In this article, we discuss how AutoML can be used for efficient time series forecasting and how this is implemented in the open-source framework FEDOT. This is the second article in a series devoted to this framework (the first one can be found here).
All the details are under the cut!
Automated machine learning (AutoML)
Data science has become one of the most popular areas of IT. Specialists collect data, clean it, try different models, validate them, and choose the best ones, all in order to give the business the solution that brings the most value. At the same time, more and more stages of this process are automated every year. As a rule, automation covers the most routine parts, which frees up the specialists' time for more important tasks.
So, let's imagine that a specialist needs to build a machine learning model and "wrap" it in a web service so that the model does useful work, that is, predicts something. But before the model can be trained, several steps have to be completed, including:
- collect data from many sources, clean it;
- , , ;
- , ;
- .
, , . , , , , . , - , . β MLFlow, Apache AirFlow . β - workflow management system (WMS) . .
, ?
, ββ, . ββ ML .
. , , open-source, TPOT, AutoGluon, MLJAR H2O. AutoML β , ( ) β. , . ( ) , : TPOT FEDOT.
SaaS-, DataRobot, GoogleAutoTables, Amazon SageMaker, ML , AutoML.
, AutoML : . , , . , .
, , open-source . , β ( ).
, . : , . : , , β . AutoML legacy β , ( β β) : , .
, - , β . open-source β AutoTS. ββ β AR ARIMA. ββ , ( ), . , , pmdarima.
β AutoML- . . , , H2O, . , open-source , , , . .
AutoML-?
, , . :
- (, , β , β );
- ;
- () ;
- ( -);
- in-sample out-of-sample ;
- β ?
, , , β .
AutoML . , , data-driven , .. , .. - .
AutoML, . , open-source AutoML , β FEDOT.
, , β . , , , (, ). .
, , . , , .. , , , . , , baseline .
, , (, , ).
. , , . FEDOT, , .
, FEDOT , :
- β , : (, , ) , ;
- β , . . Primary , Secondary β ;
- β , . FEDOT ( Chain).
:
, FEDOT
, , , .
. ββ, β . β . β ββ AutoML. , . , β .
FEDOT
, . FEDOT- . ? , , ? ?
! .
. β , . , . , , :
βlagged-β . FEDOT βlaggedβ. β .
1 . , . muli-target . :
. 3 lagged
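A lagged transformation turns a one-dimensional series into a table: each row holds the values inside a sliding window, and the target holds the values that follow it. The snippet below is a minimal sketch of that idea, assuming NumPy and a window of 3. The function name `lagged_table` and its parameters are hypothetical and are not FEDOT's implementation.

```python
import numpy as np


def lagged_table(series, window, horizon=1):
    """Build a (features, targets) pair from a 1-D series with a sliding window.

    Each feature row holds `window` consecutive values; the matching target
    row holds the next `horizon` values (horizon > 1 gives a multi-target table).
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - window - horizon + 1
    if n_rows < 1:
        raise ValueError("series is too short for this window and horizon")
    features = np.array([series[i:i + window] for i in range(n_rows)])
    targets = np.array(
        [series[i + window:i + window + horizon] for i in range(n_rows)]
    )
    return features, targets


# Window of 3, forecasting 1 step ahead
X, y = lagged_table([1, 2, 3, 4, 5, 6], window=3)
print(X)  # rows: [1 2 3], [2 3 4], [3 4 5]
print(y)  # rows: [4], [5], [6]
```

Any tabular regression model can then be fitted on `X` and `y`, which is what makes this transformation convenient for time series.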
. , AR ARIMA. , .
: β . β . β .
, , . β , , . , .
14 . , β .
, . , . , , jupyter notebookβ.
β . FEDOT :
- , ;
- ;
- .
, , , . . . . .
, ,
lagged-, , ridge- (. ), ββ.
, β ββ. , , .
. , , . , , . ( ) . β β . . , , , . ( ) , β .
:
! ββ . . . () (). FEDOT.
, :
(, β )
.
AutoML. FEDOT , API.
, ββ. AutoML :
- β . , , . , , ββ . ;
- β , , . , .
:
.
Forecast quality metrics: mean absolute error (MAE) and root mean square error (RMSE): MAE = 100.52, RMSE = 120.42.
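For reference, both metrics can be computed in a few lines. This is a minimal sketch assuming NumPy; the helper names `mae` and `rmse` and the sample numbers are illustrative and are not taken from the experiment.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def rmse(actual, predicted):
    """Root mean square error: the square root of the mean squared difference."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Illustrative values only
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 6.0]
print(mae(actual, predicted))   # 1.0
print(rmse(actual, predicted))  # about 1.732
```

RMSE penalizes large individual errors more strongly than MAE, so a forecast with a few big misses can have a moderate MAE and a noticeably higher RMSE.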
, : ?
: . . 14 . 14 ( 42). in-sample .
, out-of-sample in-sample :
. in-sample out-of-sample
, 14 . 28 β 2 14 . , (out-of-sample).
, in-sample . ( ). , , .
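The difference between the two modes can be sketched in code. This is a minimal illustration under stated assumptions: the model is any callable `predict_block` that takes the known history and returns the next 14 values, and both function names are hypothetical rather than part of FEDOT. In out-of-sample mode the model's own predictions are appended to the history, so 28 steps are obtained as 2 blocks of 14. In in-sample mode the actual values are appended after each block, which is what makes it suitable for validation.

```python
def out_of_sample_forecast(history, predict_block, horizon, block=14):
    """Forecast `horizon` steps by feeding predictions back into the history."""
    extended = list(history)
    forecast = []
    while len(forecast) < horizon:
        step = list(predict_block(extended))[:block]
        forecast.extend(step)
        extended.extend(step)  # the model now "sees" its own predictions
    return forecast[:horizon]


def in_sample_forecast(series, predict_block, train_len, n_blocks, block=14):
    """Forecast `n_blocks` blocks, extending the history with actual values."""
    forecast = []
    for b in range(n_blocks):
        known = list(series[:train_len + b * block])  # only real observations
        forecast.extend(list(predict_block(known))[:block])
    return forecast


def toy_model(history):
    """Stand-in model (an assumption): continue the series with a slope of 1."""
    return [history[-1] + k for k in range(1, 15)]


print(len(out_of_sample_forecast([0], toy_model, horizon=28)))  # 28
print(len(in_sample_forecast(list(range(100)), toy_model,
                             train_len=10, n_blocks=3)))        # 42
```

With three validation blocks of 14 elements, the in-sample forecast covers 42 elements in total, while every prediction is still made for only 14 steps ahead.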
FEDOT β 3- 14 . . . , , .
14 .
. , , .
( ). , ,
, . , , , . , .
β K- , . , . K- . β .
, , , .
, ββ ,
, β , 1000 β , 0. . (, ) . K-nn . , , : MAE β 88.19 RMSE β 177.31.
, - . 5 . , , .
: . , , . , -, ( ). : . , FEDOT , . !
FEDOT open-source β AutoTS pmdarima. Jupyter notebook , , . , . 3 , . ( - ):
Library | MAE ± SD | RMSE ± SD |
---|---|---|
pmdarima | 155 ± 1 | 196 ± 1 |
AutoTS | 198 ± 22 | 236 ± 41 |
FEDOT | 110 ± 14 | 170 ± 26 |
:
, FEDOT β β.
, , AutoML. , ML-, .
AutoML FEDOT: , . FEDOT .
:
AutoML, FEDOT!
This article was prepared by Mikhail Sarafanov, Pavel Vychuzhanin, and Nikolai Nikitin.