Nice study on the inner workings of time series MoE foundation models.
Moirai-MoE: Making time series models smarter by letting experts handle specific patterns automatically.
https://arxiv.org/abs/2410.10469
Original Problem 🎯:
Time series foundation models struggle to handle diverse data patterns effectively. Current approaches rely on frequency-based specialization or dataset-level categorization, which are too rigid and miss important pattern similarities that exist across different frequencies.
-----
Solution in this Paper 🛠️:
→ Introduces Moirai-MoE, a mixture-of-experts time series foundation model that uses a single input/output projection layer and delegates the modeling of diverse patterns to sparse, specialized experts (a sketch of such a block follows this list)
→ Implements token-level specialization instead of frequency-level specialization, allowing similar patterns to share parameters regardless of their sampling frequency
→ Uses a novel gating function that scores tokens against cluster centroids derived from a pretrained model's representations, yielding more accurate expert assignments
→ Adopts a decoder-only training objective so that a single sequence provides parallel training signal across many context lengths (see the second sketch below)
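To make the token-level routing concrete, here is a minimal PyTorch sketch of a sparse mixture-of-experts block whose gate scores each token against fixed cluster centroids (in the paper's scheme, obtained by clustering token representations of a pretrained dense model). The class name `CentroidGatedMoE`, the expert count, and the top-k value are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CentroidGatedMoE(nn.Module):
    """Sparse MoE feed-forward block with centroid-based token routing (illustrative sketch)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One feed-forward expert per cluster; only top_k experts are activated per token.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ]
        )
        # Cluster centroids used as gating keys. Random placeholders here; in the paper's
        # scheme they come from clustering a pretrained model's token representations.
        self.register_buffer("centroids", torch.randn(num_experts, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); each token is routed independently.
        b, s, d = x.shape
        tokens = x.reshape(-1, d)
        scores = tokens @ self.centroids.t()              # similarity of each token to each centroid
        weights, idx = scores.topk(self.top_k, dim=-1)    # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape(b, s, d)
```

Because only top-k experts run per token, similar patterns can share an expert even when they come from series with different frequencies, which is the point of token-level specialization.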
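And a hedged sketch of the decoder-only objective: each patch position predicts the next patch under a causal mask, so one forward pass trains the model on every context length in the sequence at once. The MSE head and the `attn_mask` keyword are simplifying assumptions; the actual model predicts the parameters of a mixture distribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def decoder_only_patch_loss(model: nn.Module, patches: torch.Tensor) -> torch.Tensor:
    """Next-patch prediction under a causal mask (hypothetical MSE variant).

    patches: (batch, num_patches, patch_dim). With causal attention, position t
    only sees patches[:, : t + 1], so a single sequence yields training signal
    for context lengths 1 .. num_patches - 1 in parallel.
    """
    inputs, targets = patches[:, :-1], patches[:, 1:]
    seq_len = inputs.size(1)
    # Boolean causal mask: True marks positions a token may NOT attend to.
    causal_mask = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=patches.device), diagonal=1
    )
    preds = model(inputs, attn_mask=causal_mask)  # assumed model call signature
    return F.mse_loss(preds, targets)
```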
-----
Key Insights 🔍:
→ Frequency is not always a reliable indicator of time series patterns - similar patterns can exist across different frequencies
→ Non-stationarity in real-world time series needs more fine-grained modeling than frequency-level specialization
→ Token-level specialization through mixture of experts provides better pattern recognition than rigid frequency-based approaches
-----
Results 📊:
→ Delivers up to 17% performance improvement over the baseline Moirai model at the same model size
→ Outperforms other time series foundation models with up to 65× fewer activated parameters
→ Achieves the best zero-shot performance in evaluations spanning 39 datasets
→ Maintains inference speed similar to the baseline while delivering these substantial gains