O's section

════════════════════════════════════════

Slide 1: Title and Introduction

════════════════════════════════════════

Hello everyone. Today, I'll be presenting on "Architecture-aware, Constraint-first, Energy-smart Scheduling," which is a key method for managing tasks on Heterogeneous Distributed Embedded Systems.
The main problem we need to solve is: How can we minimize the system's energy consumption while still maintaining the required performance and ensuring the system operates correctly even if some components fail?
In this presentation, I will divide the content into 3 core principles, each helping us achieve this goal of efficient energy savings.

════════════════════════════════════════

Slide 2: The Core Problem - Energy vs. Performance Balance

════════════════════════════════════════

Let's look at the main problem. These distributed embedded systems have three special characteristics:
First: "Heterogeneous Cores." The system has multiple different processors. Some are General-Purpose Processors (GPPs) that can do various tasks, while others are Digital Signal Processors (DSPs) specialized for signal processing. Each has different energy consumption characteristics.
Second: "DVFS," or Dynamic Voltage and Frequency Scaling. This means each processor can adjust its speed (frequency) and voltage as needed. But the faster it runs, the more power it consumes.
Third: "NoC Communication." The various processors communicate via a Network-on-Chip, and sending data across multiple 'hops' also uses more energy.
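The DVFS trade-off above can be made concrete with a minimal sketch. The constants and the simple dynamic-power model (power proportional to capacitance × voltage² × frequency) are illustrative assumptions, not figures from the talk:

```python
# Illustrative DVFS energy model (hypothetical constants, not from the talk).
# Dynamic power scales roughly with C * V^2 * f; execution time with cycles / f.

def task_energy(cycles, freq_hz, voltage, switched_cap=1e-9):
    """Energy (J) to run `cycles` at (freq_hz, voltage): power * time."""
    power = switched_cap * voltage**2 * freq_hz   # dynamic power, watts
    time = cycles / freq_hz                       # seconds
    return power * time

# Two operating points for the same 1e9-cycle task:
fast = task_energy(1e9, 2.0e9, 1.2)   # high frequency, high voltage
slow = task_energy(1e9, 1.0e9, 0.9)  # half the speed, lower voltage
assert slow < fast  # slower at lower voltage consumes less total energy
```

Note that energy only drops because the voltage drops with the frequency; at a fixed voltage, this model gives the same energy regardless of speed.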
The problem is: We must decide which task should go to which processor, and when it should run at what frequency, to achieve conflicting goals: saving energy while maintaining performance.
Our primary goal is to minimize the total energy consumption (including both computation and communication) while maintaining performance requirements and fault tolerance.

════════════════════════════════════════

Slide 3: Principle 1 - Architecture-aware Mapping

════════════════════════════════════════

The first principle is "Architecture-aware Mapping." This means: Don't just schedule tasks randomly; you must understand the actual hardware "floorplan."
Why is this important? Because communication energy can be very high. If we place two tasks that communicate frequently at opposite corners of the chip, like Core 0 and Core 15, the data has to travel through many NoC hops, wasting a massive amount of energy.
You can see it in this diagram. We see a 4x4 grid of processors. If we place Task 1 at Core 0 and Task 2 at Core 15, the data must travel 6 hops, using high energy.
But if we place Task 1 and Task 2 close together, the data travels only 1-2 hops, and the energy is reduced significantly.
Therefore, the principle is: When mapping tasks, we must try to place tasks that communicate heavily "close together" on the NoC to reduce the hop count and lower communication energy.
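The hop-count argument above can be sketched in a few lines. The per-byte-per-hop energy figure is a placeholder; real values are platform-specific:

```python
# Hop-count sketch for the 4x4 mesh NoC from the slide, assuming XY routing.
# The energy-per-byte-per-hop constant is an invented placeholder.
MESH_W = 4

def hops(core_a, core_b, width=MESH_W):
    """Manhattan distance between two cores in a width x width mesh."""
    ax, ay = core_a % width, core_a // width
    bx, by = core_b % width, core_b // width
    return abs(ax - bx) + abs(ay - by)

def comm_energy(bytes_sent, core_a, core_b, energy_per_byte_hop=1.0):
    return bytes_sent * hops(core_a, core_b) * energy_per_byte_hop

# Core 0 (one corner) to Core 15 (opposite corner): 3 + 3 = 6 hops.
assert hops(0, 15) == 6
# Adjacent placement cuts the same transfer to a single hop.
assert hops(0, 1) == 1
assert comm_energy(100, 0, 15) == 6 * comm_energy(100, 0, 1)
```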

════════════════════════════════════════

Slide 4: Principle 2 - Constraint-first Optimization

════════════════════════════════════════

The second principle is "Constraint-first Optimization." This means we must define what the most critical constraint is, and then optimize based on that constraint.
In this case, we choose "Throughput-first," which means: no matter what happens, even if some processors fail, the system must still meet its target speed.
We use a method called "TConEMin," which stands for "Throughput-Constrained Energy Minimization."
This method works in two phases:
First phase - Design-time: We pre-calculate the best mapping plan (one that maintains throughput and saves the most energy) for various failure scenarios—like if Core 1 fails, if Core 2 fails, or if Cores 1 and 3 fail, etc.—and we store these plans.
Second phase - Run-time: If a core actually fails, the system retrieves the prepared plan for that specific scenario and performs a Task Migration immediately.
The advantage of this method: the system guarantees that throughput is maintained even with core failures, and energy consumption is reduced by 22% compared to other methods.
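The two-phase idea can be sketched as a table lookup keyed by the set of failed cores. The task names and mapping tables below are invented placeholders; only the lookup structure reflects the slide:

```python
# Sketch of TConEMin's run-time phase as described on the slide: mappings are
# precomputed per failure scenario, so reacting to a fault is a table lookup.
# The mapping tables here are invented placeholders.

precomputed = {
    frozenset():       {"T1": 0, "T2": 1, "T3": 2},   # no failures
    frozenset({1}):    {"T1": 0, "T2": 2, "T3": 2},   # core 1 failed
    frozenset({1, 3}): {"T1": 0, "T2": 2, "T3": 0},   # cores 1 and 3 failed
}

def on_core_failure(failed_cores):
    """Retrieve the stored mapping for this scenario; no re-optimization."""
    return precomputed[frozenset(failed_cores)]

assert on_core_failure({1}) == {"T1": 0, "T2": 2, "T3": 2}
```

Using `frozenset` as the key makes the lookup independent of the order in which failures are reported.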

════════════════════════════════════════

Slide 5: TConEMin - Design-time Fault-aware Mapping

════════════════════════════════════════

Let's look at the details of Design-time Optimization.
As you can see from this graph: the X-axis shows the number of failed cores, and the Y-axis shows the number of Fault Scenarios to consider.
For example, if the system has 8 cores:
No core failure: 1 scenario
1 core failure: 8 scenarios (choosing 1 of 8)
2 core failures: 28 scenarios (choosing 2 of 8)
3 core failures: 56 scenarios
4 core failures: 70 scenarios
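The scenario counts above are just binomial coefficients (choosing which k of the 8 cores fail), which can be checked directly:

```python
# The scenario counts on the slide are binomial coefficients: C(8, k)
# ways to choose which k of the 8 cores fail.
from math import comb

counts = [comb(8, k) for k in range(5)]
assert counts == [1, 8, 28, 56, 70]
# Total mappings to precompute for up to 4 simultaneous failures:
assert sum(counts) == 163
```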
For each of these scenarios, we must calculate the optimal mapping that still maintains throughput.
Looking at the blue line (Pre-calculated Optimal Mappings), we see that it increases with the number of fault scenarios.
But looking at the light blue line (Runtime Lookup Time), we see the time to find a mapping remains very low, even with many scenarios, because it's just a table lookup.
The advantage is: The system can respond to core failures very quickly, without needing to calculate a new mapping while the system is running.

════════════════════════════════════════

Slide 7: Self-timed Scheduling - Reducing Resource Usage

════════════════════════════════════════

Now, let's talk about a very important part of TConEMin called "Self-timed Scheduling."
The problem with traditional scheduling is: you have to store the "exact time" for every task. For example, Task A starts at 5ms, Task B at 10ms, Task C at 15ms, and so on. If the system has 100 tasks, you have to store 100 timestamps. This uses a lot of storage and is complicated to create at runtime.
The Self-timed Scheduling method says: Don't store exact times. Just store the "order" of execution.
For example, on Core 1, just execute Task A → Task C → Task B, in that order.
When Task A finishes, it sends a signal to Task C saying, "I'm done. Check if you're ready." If Task C is ready (it has all its data), it can start immediately.
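This order-only idea can be sketched as follows. The task names, dependencies, and single-core order are illustrative, and the hardware "I'm done" signal is replaced by a simple readiness check:

```python
# Self-timed execution sketch: each core stores only a task *order*; a task
# fires as soon as it is next in its core's order and its inputs are done.
# Tasks, dependencies, and the order are invented for illustration.
deps = {"A": [], "B": ["A", "C"], "C": ["A"]}
core_order = {1: ["A", "C", "B"]}   # order only -- no timestamps stored

def run(core_order, deps):
    done, trace = set(), []
    for core, order in core_order.items():
        for task in order:
            # In hardware this is the "I'm done, check if you're ready"
            # signal; here we just verify all inputs have finished.
            assert all(d in done for d in deps[task]), f"{task} not ready"
            done.add(task)
            trace.append(task)
    return trace

assert run(core_order, deps) == ["A", "C", "B"]
```

Storing one short list per core instead of a timestamp per task is what produces the storage and construction-time savings discussed next.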
And the result? Look at this graph.
Schedule construction time is reduced by 95%—from 100 units down to 5.
Schedule storage space is reduced by 92%—from 100 units down to 8.
This is a massive improvement, making the TConEMin method practical for use in real systems.

King's section

This section covers energy-constrained scheduling with ESMM (2025), so that battery-powered systems finish fast while staying within the energy budget.
We'll cover four topics:
Energy-budget-first;
ESMM's three-stage mechanism;
results of ESMM vs. HEFT, and why HEFT is fast but often infeasible under a strict budget;
and the Energy-smart wrap-up: TConEMin vs. ESMM, choosing the method to match your constraint.
Slide 10 — Constraint-first: Energy-budget-first (second condition)
For the second condition, we focus on energy-budget-first. We set the hard constraint (total energy must not exceed the budget), then minimize makespan within that limit. This fits battery-powered systems: feasibility under the budget comes first; speed is optimized second.
(Bridge → Slide 11)
“Because the budget is the red line, we need a mechanism that’s fast-yet-within-budget. Next, the three stages of ESMM.”
Slide 11 — “ESMM uses a three-stage list-based flow:
1. Design-time: Pre-compute mappings for all core-failure cases—preserve throughput with minimal energy for fast lookup at runtime.
2. Energy pre-assignment: Split the global budget into fair per-task sub-budgets so early tasks can’t starve later ones.
3. Allocation (P–f): For each ready task, pick the processor–frequency pair yielding the earliest finish time while respecting the task’s energy cap.”
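Stages 2 and 3 can be sketched as follows. The even budget split, the candidate (finish time, energy) pairs, and all numbers are invented for illustration; only the cap-then-earliest-finish selection logic reflects the description above:

```python
# Sketch of ESMM's stages 2-3 as described above: split the global budget
# into per-task caps, then pick the processor-frequency pair with the
# earliest finish time that stays under the task's cap. Numbers are invented.

def pre_assign(budget, n_tasks):
    """Stage 2: fair per-task sub-budgets so early tasks can't starve later ones."""
    return [budget / n_tasks] * n_tasks

def allocate(task_cap, pf_options):
    """Stage 3: among (finish_time, energy) options, earliest finish within cap."""
    feasible = [(t, e) for t, e in pf_options if e <= task_cap]
    return min(feasible) if feasible else None

caps = pre_assign(budget=30.0, n_tasks=3)        # -> 10.0 per task
# (finish_time_ms, energy_mJ) for each candidate processor-frequency pair:
options = [(5.0, 14.0), (8.0, 9.5), (12.0, 6.0)]
# The fastest pair busts the 10 mJ cap, so the 8 ms / 9.5 mJ pair wins.
assert allocate(caps[0], options) == (8.0, 9.5)
```

This is also where ESMM diverges from HEFT: HEFT would take the 5 ms option regardless of its energy cost.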
(Bridge → Slide 12) “Since the per-task caps are clear, the outcomes differ sharply from methods that ignore the budget. Next, ESMM vs HEFT.”
Slide 12 — “Against HEFT: HEFT often yields the shortest makespan but ignores the budget, making it infeasible. ESMM keeps the schedule short yet always within budget, which matters more in practice. In a sample benchmark, HEFT violates the energy cap while ESMM meets it with comparable completion time.”