Abstract
In the post-Dennard era, optimizing embedded systems requires navigating complex trade-offs between energy efficiency and latency. Traditional heuristic tuning is often inefficient in such high-dimensional, non-smooth landscapes. In this work, we propose a Bayesian Optimization framework using Gaussian Processes to automate the search for optimal scheduling configurations on heterogeneous multi-core architectures. We explicitly address the multi-objective nature of the problem by approximating the Pareto Frontier between energy and time. Furthermore, by incorporating Sensitivity Analysis (fANOVA) and comparing different covariance kernels (e.g., Matérn vs. RBF), we provide physical interpretability to the black-box model, revealing the dominant hardware parameters driving system performance.
Introduction
This work addresses the challenge of optimizing scheduling on heterogeneous multi-core processors to balance energy consumption and latency via a Gaussian Process-based Bayesian Optimization (GP-BO) framework, complemented by Sensitivity Analysis for interpretability. The contributions of our work are summarized as follows:
Methodological Validation for Non-Smooth Landscapes: We demonstrate that the scheduling landscape is inherently non-smooth. Through rigorous kernel benchmarking, we establish that the Matérn 5/2 kernel outperforms the standard RBF kernel, as it correctly models the sharp performance cliffs associated with discrete core allocations.
Discovery of "Race-to-Idle" Physics: By analyzing the energy-latency trade-off, our model autonomously rediscovers the "Race-to-Idle" phenomenon. We provide empirical evidence that activating high-frequency big cores often yields superior energy efficiency compared to leakage-prone low-frequency execution.
Structural Decoupling of Heterogeneity: Our multi-objective analysis reveals a functional decoupling in hardware resources. The optimizer learns to map latency-critical tasks to big cores while leveraging little cores for energy conservation, effectively disentangling the conflicting objectives of the heterogeneous system.
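The "Race-to-Idle" finding above follows from simple power arithmetic: dynamic power grows roughly cubically with frequency (voltage scales with frequency), but leakage power is paid for as long as the core is on, so finishing faster can cost less total energy. The sketch below illustrates this with toy constants (`k_dyn`, `p_leak`, and the workload size are illustrative values, not measurements from our experiments):

```python
# Illustrative "Race-to-Idle" arithmetic with toy constants.
# Assumes dynamic power ~ k * f^3 (voltage tracks frequency) and a
# constant leakage floor p_leak whenever the core is powered.

def energy_joules(freq_ghz, work_gcycles=10.0, k_dyn=0.1, p_leak=2.0):
    """Energy to finish a fixed workload at a given frequency.

    t = work / freq;  E = (P_dyn + P_leak) * t,  with P_dyn = k * f^3.
    """
    t = work_gcycles / freq_ghz          # execution time in seconds
    p_dyn = k_dyn * freq_ghz ** 3        # dynamic power in watts (toy model)
    return (p_dyn + p_leak) * t

low = energy_joules(0.5)   # slow run: leakage dominates the long runtime
high = energy_joules(2.0)  # fast "race-to-idle" run
print(f"E @ 0.5 GHz: {low:.2f} J, E @ 2.0 GHz: {high:.2f} J")
# → E @ 0.5 GHz: 40.25 J, E @ 2.0 GHz: 14.00 J
```

With these constants the high-frequency run uses well under half the energy of the leakage-prone slow run, matching the qualitative behaviour the optimizer rediscovers.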
Simulation Framework
We adopt SimPy, a process-based discrete-event simulation framework, as our backend environment. It supports process waiting and interruption as well as shared resources. Our simulator is organized into the Task, Processors, Schedulers, and Simulators packages, as described above. For efficient verification and debugging, we log every event of the simulation. For details, please refer to our paper.
Power and Performance Model
Following the derivation in the paper, the aggregated latency is modelled as the priority-weighted turnaround time of all tasks, and the energy consumption as the total energy of all cores over the scheduling period. Both active (dynamic) power and idle (leakage) power are considered, while system power (IO, memory, peripherals) is ignored for simplicity. We visualize the power and performance model below.
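These two metrics can be sketched directly from the definitions above. In this sketch the latency is normalized as a priority-weighted average (the paper's exact aggregation may differ, e.g. an unnormalized weighted sum), and the field names are illustrative:

```python
def aggregate_metrics(tasks, cores, t_end):
    """Priority-weighted latency and total energy over a scheduling period.

    tasks: dicts with 'priority', 'arrival', 'finish' (turnaround = finish - arrival)
    cores: dicts with 'p_active', 'p_idle' (watts) and 'busy_time' (seconds)
    t_end: length of the scheduling period in seconds
    """
    total_w = sum(t["priority"] for t in tasks)
    latency = sum(t["priority"] * (t["finish"] - t["arrival"])
                  for t in tasks) / total_w
    # Each core burns active power while busy and leakage power while idle.
    energy = sum(c["p_active"] * c["busy_time"]
                 + c["p_idle"] * (t_end - c["busy_time"]) for c in cores)
    return latency, energy

tasks = [{"priority": 2, "arrival": 0.0, "finish": 4.0},
         {"priority": 1, "arrival": 1.0, "finish": 3.0}]
cores = [{"p_active": 3.0, "p_idle": 0.5, "busy_time": 6.0}]
lat, en = aggregate_metrics(tasks, cores, t_end=10.0)
```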
Emulation and Results
To comprehensively evaluate the proposed BO framework, we designed four distinct experimental scenarios. Each scenario targets a specific aspect of the system's behaviour and the optimizer's capability.
(I) Surrogate Model Calibration
In this phase, we benchmark different kernels (RBF vs. Matérn) under a standard workload (λ=1.0). The objective is to determine which kernel best captures the discrete and non-smooth landscape of the CPU scheduling problem. The selected kernel is then used as the baseline for subsequent experiments.
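A minimal version of this calibration can be run with scikit-learn's GP implementation: fit each candidate kernel on a toy objective with a sharp "performance cliff" (standing in for a discrete core-allocation boundary; the data here is synthetic, not from our simulator) and compare the fitted log marginal likelihoods:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

rng = np.random.default_rng(0)
# Toy non-smooth objective: a step ("performance cliff") at x = 0.5.
X = rng.uniform(0, 1, size=(40, 1))
y = np.where(X[:, 0] < 0.5, 1.0, 3.0) + 0.05 * rng.standard_normal(40)

fits = {}
for name, kern in [("RBF", RBF(length_scale=1.0)),
                   ("Matern52", Matern(length_scale=1.0, nu=2.5))]:
    gp = GaussianProcessRegressor(kernel=kern, normalize_y=True).fit(X, y)
    fits[name] = gp.log_marginal_likelihood_value_  # higher is better
print(fits)
```

The same comparison on the real scheduling landscape is what selects the baseline kernel for the later experiments.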
(II) Preference Sensitivity Analysis
Since the cost function 𝓛 is a weighted sum of energy and time, system behaviour is highly sensitive to the weights β (energy) and γ (time). We vary these coefficients to simulate different user priorities:
(1) Performance-First: high penalty on time (γ > β),
(2) Energy-First: high penalty on energy (β > γ),
(3) Balanced: equal weights (β = γ).
The objective is to verify whether the optimizer correctly shifts the hardware configuration (e.g., scaling frequencies or core counts) to align with the specified high-level preferences.
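The three preference regimes reduce to different weightings of the same scalar objective. A minimal sketch (the specific weight values and the sample energy/time pair are illustrative; in practice the two objectives would typically be normalized to comparable scales before weighting):

```python
def scalar_cost(energy, time_, beta=1.0, gamma=1.0):
    """Weighted-sum objective L = beta * E + gamma * T."""
    return beta * energy + gamma * time_

# One candidate configuration: E = 10.0 J, T = 2.0 s.
perf_first   = scalar_cost(10.0, 2.0, beta=0.2, gamma=1.8)  # penalize time
energy_first = scalar_cost(10.0, 2.0, beta=1.8, gamma=0.2)  # penalize energy
balanced     = scalar_cost(10.0, 2.0)                        # beta = gamma = 1
print(perf_first, energy_first, balanced)  # → 5.6 18.4 12.0
```

The same configuration scores very differently under the two extreme regimes, which is exactly why the optimizer's chosen hardware configuration should shift with the weights.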
(III) Workload Robustness Testing
We evaluate the optimizer's robustness by varying the task arrival rate λ (from low load λ=0.5 to high load λ=5.0). The objective is to investigate how the optimal architectural configuration evolves under pressure. Specifically, we aim to observe if the system automatically scales up resources (e.g., activating big cores) to prevent latency cliffs during peak loads.
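The load sweep assumes a Poisson arrival stream: inter-arrival gaps are exponential with rate λ, so raising λ from 0.5 to 5.0 multiplies the expected task count tenfold. A small sketch of the arrival generator (the horizon and seed are illustrative):

```python
import random

def poisson_arrivals(lam, horizon, seed=0):
    """Sample task arrival times on [0, horizon) with rate lam
    (i.i.d. exponential inter-arrival gaps)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam)   # next exponential gap
        if t >= horizon:
            return times
        times.append(t)

light = poisson_arrivals(0.5, 100.0)   # low load: ~50 tasks expected
heavy = poisson_arrivals(5.0, 100.0)   # high load: ~500 tasks expected
print(len(light), len(heavy))
```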
(IV) Multi-Objective Pareto Exploration
Finally, to overcome the limitations of fixed weights, we decouple the objectives and perform Multi-Objective Optimization (MOO). Instead of minimizing a scalar loss, we aim to approximate the Pareto Frontier. The objective is to uncover the intrinsic trade-off curve between Energy and Time, providing a set of non-dominated solutions that allow system administrators to make a posteriori decisions without manually tuning weights.
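The non-dominated set can be extracted from evaluated (energy, time) pairs with a simple dominance filter; the sketch below uses a quadratic-time check for clarity (the candidate points are illustrative, and practical MOO tools use faster algorithms and dedicated acquisition functions such as expected hypervolume improvement):

```python
def pareto_front(points):
    """Return the non-dominated subset of (energy, time) pairs,
    minimizing both objectives."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

# Illustrative candidates: (energy in J, time in s).
candidates = [(10.0, 2.0), (8.0, 3.0), (9.0, 2.5), (12.0, 1.5), (11.0, 2.5)]
front = pareto_front(candidates)
print(front)
```

Here `(11.0, 2.5)` is dropped because `(9.0, 2.5)` is at least as good in both objectives; the remaining four points form the trade-off curve from which an administrator can pick a posteriori.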
Further Analysis
We utilize sensitivity analysis to interpret the results and investigate the Pareto Frontier along the following four additional aspects. (1) We evaluate the impact of the Gaussian Process covariance kernel on optimization performance, comparing the Matérn 5/2, Matérn 3/2, and RBF kernels under the standard balanced metric defined by Equation 4, with hyperparameters β = 1 and γ = 1.
Future work
Although this study successfully demonstrates the efficacy of Bayesian Optimization for offline parameter tuning, several avenues remain for extending the framework's applicability to dynamic, real-world environments. First, future research will focus on transitioning from static offline configuration to online run-time adaptation. By integrating lightweight surrogate models, such as Contextual Bandits, the system could dynamically adjust voltage and frequency (DVFS) settings in response to real-time traffic bursts, rather than relying on a fixed schedule.
Additionally, we aim to relax the assumption of independent tasks by incorporating task dependency models, specifically Directed Acyclic Graphs (DAGs). Handling inter-task dependencies introduces additional challenges regarding communication overhead and pipeline stalling, which are critical for complex embedded applications like video processing.
Acknowledgements
We sincerely appreciate Professor Carl Henrik Ek for organizing the exciting module L48 Machine Learning and the Physical World at the Department of Computer Science and Technology, University of Cambridge, and for providing consistent feedback on this project during the proposal and viva phases.
Citation
If you found the paper or code useful, please consider citing:
@misc{HuShi2026mlcpusched,
title={Machine Learning for Energy-Performance-aware Scheduling},
author={Zheyuan Hu and Yifei Shi},
year={2026},
eprint={2601.23134},
archivePrefix={arXiv},
primaryClass={cs.AR},
url={https://arxiv.org/abs/2601.23134},
}
The website template was inspired by M3ashy.