Generative Pretrained Hierarchical Transformer for Time Series Forecasting

Recent efforts have been dedicated to enhancing time series forecasting accuracy by introducing advanced network architectures and self-supervised pretraining strategies. Nevertheless, existing approaches still exhibit two critical drawbacks. First, these methods often rely on a single dataset for training, which limits the model's generalizability due to the restricted scale of the training data. Second, the one-step generation scheme is widely followed, which necessitates a customized forecasting head, overlooks the temporal dependencies in the output series, and also leads to increased training costs under different horizon length settings. To address these issues, we propose a novel generative pretrained hierarchical transformer architecture for forecasting, named GPHT. There are two key designs in GPHT. On the one hand, we advocate constructing a mixed dataset under the channel-independent assumption for pretraining our model, comprising various datasets from diverse data scenarios. This approach significantly expands the scale of the training data, allowing our model to uncover commonalities in time series data and facilitating improved transfer to specific datasets. On the other hand, GPHT employs an auto-regressive forecasting approach, effectively modeling temporal dependencies in the output series. Importantly, no customized forecasting head is required, enabling a single model to forecast at arbitrary horizon settings. We conduct extensive experiments on eight datasets against mainstream self-supervised pretraining models and supervised models. The results demonstrate that GPHT surpasses the baseline models across various fine-tuning and zero/few-shot learning settings in the traditional long-term forecasting task. We make our code publicly available at https://github.com/icantnamemyself/GPHT.
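To make the two ideas above concrete, the following is a minimal sketch (not the authors' implementation; the patch length, the stand-in backbone, and the rollout loop are all illustrative assumptions) of how channel-independent inputs and auto-regressive patch-by-patch generation let a single model, with no horizon-specific head, serve arbitrary forecast horizons:

```python
# Minimal sketch of channel independence + auto-regressive rollout.
# Hypothetical names throughout; not the GPHT architecture itself.
import torch
import torch.nn as nn

PATCH = 24  # hypothetical patch length used as the auto-regressive "token"

class TinyPatchForecaster(nn.Module):
    """Stand-in backbone: predicts the next patch from the history of patches."""
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(PATCH, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        # One shared projection used at every step; no horizon-specific head.
        self.next_patch = nn.Linear(d_model, PATCH)

    def forward(self, patches):               # patches: (batch, n_patches, PATCH)
        h, _ = self.encoder(self.embed(patches))
        return self.next_patch(h[:, -1])       # (batch, PATCH): the next patch

@torch.no_grad()
def forecast(model, history, horizon):
    """Auto-regressively roll the model forward until `horizon` steps are produced."""
    # Channel independence: treat each variate as its own univariate series,
    # folding (batch, length, channels) into (batch*channels, length).
    b, length, c = history.shape
    series = history.permute(0, 2, 1).reshape(b * c, length)
    patches = series.unfold(-1, PATCH, PATCH)   # (batch*channels, n_patches, PATCH)
    outputs = []
    while sum(p.shape[-1] for p in outputs) < horizon:
        nxt = model(patches)                    # predict one patch
        outputs.append(nxt)
        patches = torch.cat([patches, nxt.unsqueeze(1)], dim=1)  # feed prediction back
    pred = torch.cat(outputs, dim=-1)[:, :horizon]
    return pred.reshape(b, c, horizon).permute(0, 2, 1)  # back to (batch, horizon, channels)

# Usage: the same model yields 96- or 720-step forecasts without a new head.
model = TinyPatchForecaster()
hist = torch.randn(8, 336, 7)                  # e.g. a 7-variate input window of length 336
print(forecast(model, hist, 96).shape)          # torch.Size([8, 96, 7])
print(forecast(model, hist, 720).shape)         # torch.Size([8, 720, 7])
```

This contrasts with the one-step generation scheme criticized above, where a dedicated head maps the representation to a fixed-length output and must be retrained for each horizon setting.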
