The Embedded New Testament

The "Holy Bible" for embedded engineers


Project maintained by theEmbeddedGeorge Hosted on GitHub Pages — Theme by mattgraham

Performance Profiling for Embedded Systems

Understanding performance profiling through concepts, not just code. Learn why performance matters and how to think about system optimization.


Concept → Why it matters → Minimal example → Try it → Takeaways

Concept: Performance profiling is like being a detective investigating why your embedded system isn’t running as fast or efficiently as it should be. It’s about measuring what’s actually happening rather than guessing.

Why it matters: In embedded systems, performance directly affects battery life, responsiveness, and whether you can meet real-time deadlines. Without profiling, you’re optimizing blindly and might waste time on the wrong things.

Minimal example: Consider a simple LED-blink loop that should run every 100 ms but sometimes takes 150 ms. Profiling reveals that a sensor-read function occasionally takes too long and delays the loop.

Try it: Start with a simple program and measure its performance, then add complexity and observe how performance changes.

Takeaways: Performance profiling gives you data to make informed decisions about optimization, ensuring you focus on the real bottlenecks rather than perceived problems.


📋 Quick Reference: Key Facts

Performance Profiling Fundamentals

Profiling Techniques

Key Performance Metrics

Common Bottlenecks


🧠 Core Concepts

What is Performance Profiling?

Performance profiling is the systematic measurement and analysis of how your system behaves in terms of speed, memory usage, and resource consumption. It’s like having a dashboard that shows you exactly what’s happening under the hood.

┌─────────────────────────────────────────────────────────────┐
│                    Performance Profiling Overview            │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    │
│  │   System    │───▶│  Profiling  │───▶│   Analysis  │    │
│  │  Running    │    │   Tools     │    │   Results   │    │
│  └─────────────┘    └─────────────┘    └─────────────┘    │
│         │                   │                   │          │
│         ▼                   ▼                   ▼          │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    │
│  │   CPU Time  │    │  Memory     │    │   Timing    │    │
│  │   Usage     │    │  Usage      │    │   Data      │    │
│  └─────────────┘    └─────────────┘    └─────────────┘    │
│                                                           │
│  The goal: Find bottlenecks and optimization opportunities │
└─────────────────────────────────────────────────────────────┘

Why Profile Instead of Guess?

Guessing versus profiling:

┌─────────────────────────────────────────────────────────────┐
│                    Guessing vs Profiling                   │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  ❌ Guessing:                                              │
│  "I think the problem is in the sensor reading function"  │
│                                                           │
│  • Spend hours optimizing sensor code                     │
│  • Performance improves by 5%                             │
│  • Real bottleneck was elsewhere                          │
│  • Wasted time and effort                                 │
│                                                           │
│  ✅ Profiling:                                             │
│  "Let me measure where the time is actually spent"        │
│                                                           │
│  • Identify actual bottlenecks                            │
│  • Focus optimization efforts                             │
│  • Measure real improvements                              │
│  • Efficient use of time                                  │
└─────────────────────────────────────────────────────────────┘

Performance Metrics That Matter

Different types of profiling give you different insights:

┌─────────────────────────────────────────────────────────────┐
│                    Performance Metrics                     │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │   CPU       │  │   Memory    │  │   Timing    │        │
│  │ Profiling   │  │ Profiling   │  │ Profiling   │        │
│  └─────────────┘  └─────────────┘  └─────────────┘        │
│                                                           │
│  • Function execution time                                │
│  • CPU utilization                                        │
│  • Call frequency                                         │
│                                                           │
│  • Memory allocation                                      │
│  • Memory leaks                                           │
│  • Fragmentation                                          │
│                                                           │
│  • Response time                                          │
│  • Jitter                                                 │
│  • Deadline compliance                                     │
│                                                           │
│  Each metric tells a different story about performance    │
└─────────────────────────────────────────────────────────────┘

🔍 Profiling Techniques

Instrumentation vs Sampling

There are two main approaches to profiling:

Instrumentation (Code Insertion):

┌─────────────────────────────────────────────────────────────┐
│                    Instrumentation Profiling               │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  Original Code:                                           │
│  void readSensor() {                                      │
│      sensor_value = read_adc();                           │
│      process_data(sensor_value);                          │
│  }                                                        │
│                                                           │
│  Instrumented Code:                                       │
│  void readSensor() {                                      │
│      uint32_t start_time = get_timer();                   │
│      sensor_value = read_adc();                           │
│      uint32_t adc_time = get_timer() - start_time;        │
│      update_profile("ADC_READ", adc_time);                │
│                                                           │
│      start_time = get_timer();                            │
│      process_data(sensor_value);                          │
│      uint32_t process_time = get_timer() - start_time;    │
│      update_profile("PROCESS", process_time);              │
│  }                                                        │
│                                                           │
│  ✅ Precise measurements                                  │
│  ❌ Code overhead                                         │
│  ❌ Perturbs timing (probe effect)                        │
└─────────────────────────────────────────────────────────────┘
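
The instrumented code in the box above can be fleshed out into a small host-runnable sketch. The `profile_entry_t` table, the `sim_advance()` helper, and the tick costs are assumptions made for illustration. `get_timer()` reads a simulated counter here; on real hardware it would wrap a free-running timer or cycle counter instead.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PROFILE_ENTRIES 8

/* One accumulator per label passed to update_profile(). */
typedef struct {
    const char *name;
    uint32_t calls;   /* number of measurements recorded */
    uint32_t total;   /* sum of elapsed ticks */
    uint32_t max;     /* worst-case elapsed ticks */
} profile_entry_t;

static profile_entry_t profile_table[MAX_PROFILE_ENTRIES];

/* Simulated free-running tick counter (assumption: stands in for a
   hardware timer so the sketch runs on a host). */
static uint32_t sim_ticks;
uint32_t get_timer(void) { return sim_ticks; }
void sim_advance(uint32_t ticks) { sim_ticks += ticks; }

/* Find the entry for a label, claiming the first free slot if needed. */
profile_entry_t *find_profile(const char *name) {
    for (int i = 0; i < MAX_PROFILE_ENTRIES; i++) {
        if (profile_table[i].name == NULL) {
            profile_table[i].name = name;
            return &profile_table[i];
        }
        if (strcmp(profile_table[i].name, name) == 0) {
            return &profile_table[i];
        }
    }
    return NULL; /* table full: the measurement is dropped */
}

void update_profile(const char *name, uint32_t elapsed) {
    profile_entry_t *e = find_profile(name);
    if (e == NULL) return;
    e->calls++;
    e->total += elapsed;
    if (elapsed > e->max) e->max = elapsed;
}

/* Hypothetical workloads: each one just burns a fixed number of ticks. */
int read_adc(void) { sim_advance(40); return 512; }
void process_data(int value) { (void)value; sim_advance(25); }

static int sensor_value;

void readSensor(void) {
    uint32_t start_time = get_timer();
    sensor_value = read_adc();
    /* Unsigned subtraction stays correct across one counter wraparound. */
    update_profile("ADC_READ", get_timer() - start_time);

    start_time = get_timer();
    process_data(sensor_value);
    update_profile("PROCESS", get_timer() - start_time);
}
```

Keeping a call count, a total, and a maximum per label costs a few words of RAM each, and the maximum is usually the figure that matters for deadlines.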

Sampling (Statistical):

┌─────────────────────────────────────────────────────────────┐
│                    Sampling Profiling                     │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │   Timer     │  │   Sample    │  │   Analyze   │        │
│  │  Interrupt  │  │   Current   │  │   Results   │        │
│  └─────────────┘  └─────────────┘  └─────────────┘        │
│                                                           │
│  Every 1ms:                                               │
│  • Check what function is running                          │
│  • Increment counter for that function                     │
│  • Continue normal execution                               │
│                                                           │
│  ✅ Minimal overhead                                      │
│  ✅ No code changes                                       │
│  ❌ Less precise                                          │
│  ❌ May miss short functions                              │
└─────────────────────────────────────────────────────────────┘
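
The sampling loop in the box above might look like the following sketch. The address ranges are made up for illustration (in a real project they would come from the linker map file), and `profiler_sample()` is a hypothetical hook that the periodic timer interrupt would call with the interrupted program counter.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uintptr_t start;  /* first address of the function */
    uintptr_t end;    /* one past the last address */
    uint32_t hits;    /* samples that landed in this range */
} sample_bucket_t;

/* Hypothetical address ranges, chosen only for this example. */
static sample_bucket_t buckets[] = {
    { "read_sensor",  0x08000100u, 0x08000200u, 0 },
    { "process_data", 0x08000200u, 0x08000380u, 0 },
    { "send_data",    0x08000380u, 0x08000400u, 0 },
};
#define NUM_BUCKETS (sizeof buckets / sizeof buckets[0])

static uint32_t total_samples;
static uint32_t unknown_samples;

/* Called once per sampling tick with the interrupted program counter. */
void profiler_sample(uintptr_t pc) {
    total_samples++;
    for (size_t i = 0; i < NUM_BUCKETS; i++) {
        if (pc >= buckets[i].start && pc < buckets[i].end) {
            buckets[i].hits++;
            return;
        }
    }
    unknown_samples++; /* PC was outside every known function */
}

/* Estimated share of CPU time for one bucket, in whole percent. */
uint32_t profiler_percent(size_t index) {
    if (index >= NUM_BUCKETS || total_samples == 0) return 0;
    return (buckets[index].hits * 100u) / total_samples;
}
```

The estimate only becomes trustworthy after many samples, and a function that always finishes between two ticks may never be seen at all.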

When to Use Each Technique

Choose Instrumentation When:

Choose Sampling When:


CPU Profiling

What CPU Profiling Tells You

CPU profiling reveals where your program spends its time:

┌─────────────────────────────────────────────────────────────┐
│                    CPU Profiling Results                   │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  Function Name          │ Calls │ Total Time │ % of Total │
│  ─────────────────────────────────────────────────────────│
│  read_sensor()          │  1000 │     50ms   │    50%     │
│  process_data()         │  1000 │     30ms   │    30%     │
│  send_data()            │   100 │     15ms   │    15%     │
│  main_loop()            │  1000 │      5ms   │     5%     │
│                                                           │
│  Insights:                                                │
│  • read_sensor() is the biggest time consumer (50%)       │
│  • process_data() is the second biggest (30%)             │
│  • send_data() runs 10x less often, costs most per call   │
│  • main_loop() overhead is minimal (5%)                   │
│                                                           │
│  Optimization Strategy:                                   │
│  • Focus on read_sensor() first                          │
│  • Then optimize process_data()                          │
│  • Consider batching send_data() calls                   │
└─────────────────────────────────────────────────────────────┘
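
The totals in the table above hide the per-call cost, which is what a batching decision depends on. The sketch below derives it from the same figures; the `cpu_row_t` type and the helper names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t calls;
    uint32_t total_us;  /* total time in microseconds */
} cpu_row_t;

/* Figures copied from the table above (milliseconds converted to us). */
static const cpu_row_t rows[] = {
    { "read_sensor",  1000, 50000 },
    { "process_data", 1000, 30000 },
    { "send_data",     100, 15000 },
    { "main_loop",    1000,  5000 },
};
#define NUM_ROWS (sizeof rows / sizeof rows[0])

/* Average cost of a single call, in microseconds. */
uint32_t per_call_us(const cpu_row_t *r) {
    return r->calls ? r->total_us / r->calls : 0;
}

/* Share of the total measured time, in whole percent. */
uint32_t share_percent(const cpu_row_t *r) {
    uint32_t sum = 0;
    for (size_t i = 0; i < NUM_ROWS; i++) sum += rows[i].total_us;
    return sum ? (r->total_us * 100u) / sum : 0;
}

/* Row with the highest average cost per call. */
const cpu_row_t *most_expensive_call(void) {
    const cpu_row_t *best = &rows[0];
    for (size_t i = 1; i < NUM_ROWS; i++) {
        if (per_call_us(&rows[i]) > per_call_us(best)) best = &rows[i];
    }
    return best;
}
```

With these numbers `send_data()` averages 150 us per call against 50 us for `read_sensor()`, which is why batching its calls is worth considering even though it accounts for only 15% of the total.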

Common CPU Bottlenecks

I/O Operations:

Computational Complexity:

Memory Access Patterns:

System Calls:


💾 Memory Profiling

What Memory Profiling Reveals

Memory profiling shows how your program uses memory over time:

┌─────────────────────────────────────────────────────────────┐
│                    Memory Usage Over Time                  │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  Memory Usage (bytes)                                     │
│  ┌────────────────────────────────────────────────────┐   │
│  │                                                    │   │
│  │                                        ████████████│   │
│  │                              ██████████████████████│   │
│  │                    ████████████████████████████████│   │
│  │          ██████████████████████████████████████████│   │
│  │████████████████████████████████████████████████████│   │
│  │                                                    │   │
│  └────────────────────────────────────────────────────┘   │
│  ↑           ↑           ↑           ↑                    │
│  0s          10s         20s         30s                  │
│                                                           │
│  ❌ Memory leak detected!                                 │
│  • Memory usage keeps growing                            │
│  • No apparent reason for increase                        │
│  • System will eventually run out of memory              │
└─────────────────────────────────────────────────────────────┘
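
One simple way to turn "memory usage keeps growing" into a check is to sample heap usage at the same point in every cycle and flag a run of consecutive increases. The function below is a heuristic sketch, not a definitive leak detector; `leak_suspected()` and its `window` parameter are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if usage rose on every one of the last `window` steps.
   `samples` holds heap usage in bytes, oldest first. */
bool leak_suspected(const uint32_t *samples, size_t count, size_t window) {
    /* Need window + 1 samples to observe `window` steps. */
    if (window == 0 || count < window + 1) return false;

    for (size_t i = count - window; i < count; i++) {
        if (samples[i] <= samples[i - 1]) {
            return false; /* usage held steady or dropped at least once */
        }
    }
    return true;
}
```

A cache or buffer that is still warming up can also produce a rising run, so a positive result is a prompt to investigate rather than proof of a leak.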

Memory Profiling Metrics

Allocation Patterns:

Memory Leaks:

Fragmentation:


⏱️ Timing Profiling

Real-Time Performance

In embedded systems, timing is often more critical than raw speed:

┌─────────────────────────────────────────────────────────────┐
│                    Timing Requirements                     │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │   Task A    │  │   Task B    │  │   Task C    │        │
│  │  100ms      │  │  500ms      │  │  1000ms     │        │
│  └─────────────┘  └─────────────┘  └─────────────┘        │
│                                                           │
│  Timing Constraints:                                       │
│  • Task A must complete within 100ms                      │
│  • Task B must complete within 500ms                      │
│  • Task C must complete within 1000ms                     │
│                                                           │
│  Performance Goal:                                         │
│  • Meet all deadlines consistently                        │
│  • Minimize jitter (timing variation)                     │
│  • Predictable response times                             │
└─────────────────────────────────────────────────────────────┘
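
The deadlines in the box above can be checked at run time with a small per-task record. This is a minimal sketch: the `task_stats_t` type and function names are assumptions, and the elapsed times would come from whatever timer the target provides.

```c
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t deadline_ms;  /* task must complete within this time */
    uint32_t worst_ms;     /* longest execution seen so far */
    uint32_t runs;
    uint32_t misses;       /* executions that exceeded the deadline */
} task_stats_t;

/* Deadlines taken from the box above. */
static task_stats_t tasks[] = {
    { "Task A",  100, 0, 0, 0 },
    { "Task B",  500, 0, 0, 0 },
    { "Task C", 1000, 0, 0, 0 },
};

/* Record one completed execution of a task. */
void task_record(task_stats_t *t, uint32_t elapsed_ms) {
    t->runs++;
    if (elapsed_ms > t->worst_ms) t->worst_ms = elapsed_ms;
    if (elapsed_ms > t->deadline_ms) t->misses++;
}

/* Margin between the deadline and the worst case seen so far.
   A negative value means the deadline has been missed at least once. */
int32_t task_slack_ms(const task_stats_t *t) {
    return (int32_t)t->deadline_ms - (int32_t)t->worst_ms;
}
```

Tracking the worst case rather than the average matches the goal stated above: a task that is fast on average can still miss its deadline.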

Jitter Analysis

Jitter is the variation in timing from one cycle to the next. It often matters more than average performance, because a deadline is missed by the worst case, not by the average:

┌─────────────────────────────────────────────────────────────┐
│                    Jitter Analysis                        │
├─────────────────────────────────────────────────────────────┤
│                                                           │
│  Low Jitter (Good):                                       │
│  ┌─┐  ┌─┐  ┌─┐  ┌─┐  ┌─┐  ┌─┐  ┌─┐  ┌─┐              │
│  │█│  │█│  │█│  │█│  │█│  │█│  │█│  │█│              │
│  └─┘  └─┘  └─┘  └─┘  └─┘  └─┘  └─┘  └─┘              │
│  Consistent 100ms intervals                               │
│                                                           │
│  High Jitter (Bad):                                       │
│  ┌─┐    ┌─┐  ┌─┐      ┌─┐    ┌─┐  ┌─┐                  │
│  │█│    │█│  │█│      │█│    │█│  │█│                  │
│  └─┘    └─┘  └─┘      └─┘    └─┘  └─┘                  │
│  Variable intervals: 80ms, 120ms, 90ms, 130ms            │
│                                                           │
│  High jitter can cause:                                   │
│  • Missed deadlines                                       │
│  • Unpredictable behavior                                 │
│  • System instability                                     │
└─────────────────────────────────────────────────────────────┘
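
The intervals in the box above can be reduced to a few numbers. The sketch below assumes the intervals have already been captured into an array; `jitter_stats_t` and `jitter_analyze()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_ms;
    uint32_t max_ms;
    uint32_t mean_ms;
    uint32_t peak_to_peak_ms;  /* max - min */
    uint32_t max_error_ms;     /* largest deviation from nominal */
} jitter_stats_t;

jitter_stats_t jitter_analyze(const uint32_t *intervals, size_t count,
                              uint32_t nominal_ms) {
    jitter_stats_t s = {0};
    if (count == 0) return s;

    uint64_t sum = 0;
    s.min_ms = s.max_ms = intervals[0];
    for (size_t i = 0; i < count; i++) {
        uint32_t v = intervals[i];
        if (v < s.min_ms) s.min_ms = v;
        if (v > s.max_ms) s.max_ms = v;
        sum += v;

        /* Absolute deviation from the intended period. */
        uint32_t err = (v > nominal_ms) ? v - nominal_ms : nominal_ms - v;
        if (err > s.max_error_ms) s.max_error_ms = err;
    }
    s.mean_ms = (uint32_t)(sum / count);
    s.peak_to_peak_ms = s.max_ms - s.min_ms;
    return s;
}
```

For the high-jitter intervals shown (80, 120, 90, 130 ms against a nominal 100 ms) the mean is 105 ms, which looks close to target, while the peak-to-peak jitter is 50 ms and the worst deviation is 30 ms.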

🧪 Guided Labs

Lab 1: Basic Timing Measurement

Objective: Understand how to measure basic performance.

Setup: Create a simple program that performs a repetitive task.

Steps:

  1. Create a function that does some work (e.g., mathematical calculations)
  2. Measure how long it takes to execute
  3. Run it multiple times and observe timing variations
  4. Identify sources of timing variation

Expected Outcome: Understanding of basic timing measurement and the concept of jitter.
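
A possible starting point for this lab, written so it runs on a host PC: it uses the standard C `clock()` function as a stand-in for a hardware timer, and `do_work()` is a hypothetical workload. On a target you would swap in the timer your platform provides.

```c
#include <stdint.h>
#include <time.h>

typedef struct {
    clock_t min_ticks;
    clock_t max_ticks;
    clock_t total_ticks;
    uint32_t runs;
} timing_stats_t;

/* Work under test. The volatile accumulator keeps the compiler from
   optimizing the loop away. */
uint32_t do_work(uint32_t iterations) {
    volatile uint32_t acc = 0;
    for (uint32_t i = 0; i < iterations; i++) {
        acc += i * i;
    }
    return acc;
}

/* Run the workload `runs` times and record the spread of timings. */
timing_stats_t measure_work(uint32_t runs, uint32_t iterations) {
    timing_stats_t s = {0};
    for (uint32_t r = 0; r < runs; r++) {
        clock_t start = clock();
        (void)do_work(iterations);
        clock_t elapsed = clock() - start;

        if (r == 0 || elapsed < s.min_ticks) s.min_ticks = elapsed;
        if (elapsed > s.max_ticks) s.max_ticks = elapsed;
        s.total_ticks += elapsed;
        s.runs++;
    }
    return s;
}
```

The gap between `min_ticks` and `max_ticks` is the timing variation step 3 asks you to observe; dividing by `CLOCKS_PER_SEC` converts ticks to seconds.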

Lab 2: Function Profiling

Objective: Learn to profile individual functions.

Setup: Create a program with multiple functions of different complexities.

Steps:

  1. Implement simple profiling for each function
  2. Run the program and collect timing data
  3. Analyze which functions take the most time
  4. Identify optimization opportunities

Expected Outcome: Understanding of how to identify performance bottlenecks in code.
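
One way to approach steps 1 to 3 is a wrapper macro that charges each call to a slot, followed by a sort that ranks the slots by total time. This is a host-side sketch using `clock()` and `qsort()` from the C standard library; `PROFILE_CALL`, `func_profile_t`, and the two workloads are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
    const char *name;
    uint32_t calls;
    clock_t total;  /* accumulated clock() ticks */
} func_profile_t;

/* Time one call expression and charge it to the given slot. */
#define PROFILE_CALL(slot, call)               \
    do {                                       \
        clock_t start_ = clock();              \
        call;                                  \
        (slot)->total += clock() - start_;     \
        (slot)->calls++;                       \
    } while (0)

/* qsort comparator: largest total first. */
int by_total_desc(const void *a, const void *b) {
    const func_profile_t *pa = a;
    const func_profile_t *pb = b;
    if (pa->total < pb->total) return 1;
    if (pa->total > pb->total) return -1;
    return 0;
}

/* Reorder the slots so the biggest time consumer comes first. */
void rank_profiles(func_profile_t *slots, size_t count) {
    qsort(slots, count, sizeof slots[0], by_total_desc);
}

/* Hypothetical workloads of different complexity. */
void light_work(void) {
    volatile uint32_t x = 0;
    for (uint32_t i = 0; i < 1000u; i++) x += i;
}

void heavy_work(void) {
    volatile uint32_t x = 0;
    for (uint32_t i = 0; i < 2000000u; i++) x += i;
}
```

A call site would look like `PROFILE_CALL(&slots[0], light_work());`. After `rank_profiles()`, the first slot is the first candidate for optimization.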

Lab 3: Memory Usage Analysis

Objective: Learn to profile memory usage.

Setup: Create a program that allocates and frees memory.

Steps:

  1. Implement memory allocation tracking
  2. Run the program and monitor memory usage
  3. Introduce a memory leak and observe the effect
  4. Fix the leak and verify the fix

Expected Outcome: Understanding of memory profiling and leak detection.
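
For step 1, allocation tracking can be as simple as wrapping `malloc()` and `free()` and storing each block's size in a small header. The sketch below assumes a standard C heap is available; the `tracked_*` names are hypothetical, and it is not thread-safe or interrupt-safe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every block. The extra union members only
   force an alignment that is safe for any payload type. */
typedef union {
    size_t size;
    long double align_ld;
    long long align_ll;
    void *align_ptr;
} alloc_header_t;

static size_t current_bytes;  /* bytes allocated and not yet freed */
static size_t peak_bytes;     /* highest value current_bytes reached */
static size_t live_blocks;    /* allocations not yet freed */

void *tracked_malloc(size_t size) {
    alloc_header_t *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;

    h->size = size;
    current_bytes += size;
    live_blocks++;
    if (current_bytes > peak_bytes) peak_bytes = current_bytes;
    return h + 1; /* payload starts right after the header */
}

void tracked_free(void *ptr) {
    if (ptr == NULL) return;

    alloc_header_t *h = (alloc_header_t *)ptr - 1;
    current_bytes -= h->size;
    live_blocks--;
    free(h);
}

size_t tracked_current(void) { return current_bytes; }
size_t tracked_peak(void)    { return peak_bytes; }
size_t tracked_live(void)    { return live_blocks; }
```

For steps 3 and 4, skip one `tracked_free()` call and watch `tracked_live()` stay above zero at a point where every block should have been released; restoring the call should bring it back to zero.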


Check Yourself

Understanding Check

Application Check

Analysis Check


Further Reading

Industry Standards