The "Holy Bible" for embedded engineers
Understanding performance profiling through concepts, not just code. Learn why performance matters and how to think about system optimization.
Concept: Performance profiling is like being a detective investigating why your embedded system isn’t running as fast or efficiently as it should be. It’s about measuring what’s actually happening rather than guessing.
Why it matters: In embedded systems, performance directly affects battery life, responsiveness, and whether you can meet real-time deadlines. Without profiling, you’re optimizing blindly and might waste time on the wrong things.
Minimal example: An LED blink loop that should run every 100ms but sometimes takes 150ms per iteration. Profiling reveals that a sensor-reading function occasionally takes too long.
Try it: Start with a simple program and measure its performance, then add complexity and observe how performance changes.
Takeaways: Performance profiling gives you data to make informed decisions about optimization, ensuring you focus on the real bottlenecks rather than perceived problems.
Performance profiling is the systematic measurement and analysis of how your system behaves in terms of speed, memory usage, and resource consumption. It’s like having a dashboard that shows you exactly what’s happening under the hood.
┌─────────────────────────────────────────────────────────────┐
│ Performance Profiling Overview │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ System │───▶│ Profiling │───▶│ Analysis │ │
│ │ Running │ │ Tools │ │ Results │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ CPU Time │ │ Memory │ │ Timing │ │
│ │ Usage │ │ Usage │ │ Data │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ The goal: Find bottlenecks and optimization opportunities │
└─────────────────────────────────────────────────────────────┘
Guessing vs. Profiling:
┌─────────────────────────────────────────────────────────────┐
│ Guessing vs Profiling │
├─────────────────────────────────────────────────────────────┤
│ │
│ ❌ Guessing: │
│ "I think the problem is in the sensor reading function" │
│ │
│ • Spend hours optimizing sensor code │
│ • Performance improves by 5% │
│ • Real bottleneck was elsewhere │
│ • Wasted time and effort │
│ │
│ ✅ Profiling: │
│ "Let me measure where the time is actually spent" │
│ │
│ • Identify actual bottlenecks │
│ • Focus optimization efforts │
│ • Measure real improvements │
│ • Efficient use of time │
└─────────────────────────────────────────────────────────────┘
Different types of profiling give you different insights:
┌─────────────────────────────────────────────────────────────┐
│ Performance Metrics │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ CPU │ │ Memory │ │ Timing │ │
│ │ Profiling │ │ Profiling │ │ Profiling │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ CPU profiling: │
│ • Function execution time │
│ • CPU utilization │
│ • Call frequency │
│ │
│ Memory profiling: │
│ • Memory allocation │
│ • Memory leaks │
│ • Fragmentation │
│ │
│ Timing profiling: │
│ • Response time │
│ • Jitter │
│ • Deadline compliance │
│ │
│ Each metric tells a different story about performance │
└─────────────────────────────────────────────────────────────┘
There are two main approaches to profiling:
Instrumentation (Code Insertion):
┌─────────────────────────────────────────────────────────────┐
│ Instrumentation Profiling │
├─────────────────────────────────────────────────────────────┤
│ │
│ Original Code: │
│ void readSensor() { │
│ sensor_value = read_adc(); │
│ process_data(sensor_value); │
│ } │
│ │
│ Instrumented Code: │
│ void readSensor() { │
│ uint32_t start_time = get_timer(); │
│ sensor_value = read_adc(); │
│ uint32_t adc_time = get_timer() - start_time; │
│ update_profile("ADC_READ", adc_time); │
│ │
│ start_time = get_timer(); │
│ process_data(sensor_value); │
│ uint32_t process_time = get_timer() - start_time; │
│ update_profile("PROCESS", process_time); │
│ } │
│ │
│ ✅ Precise measurements │
│ ❌ Code overhead │
│ ❌ Changes program behavior │
└─────────────────────────────────────────────────────────────┘
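The instrumented code above calls `update_profile()` without showing what it does. The following is a minimal sketch of one way such a helper could work, assuming a small fixed-size table keyed by label. The names `profile_entry_t`, `PROFILE_MAX_ENTRIES`, and `find_profile` are hypothetical and do not come from the original text or any particular library, and `get_timer()` remains platform-specific.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROFILE_MAX_ENTRIES 8  /* hypothetical table size */

typedef struct {
    const char *label;  /* e.g. "ADC_READ"; assumed to be a string literal */
    uint32_t calls;     /* number of measurements recorded */
    uint32_t total;     /* sum of elapsed timer ticks */
    uint32_t max;       /* largest single measurement seen */
} profile_entry_t;

static profile_entry_t profile_table[PROFILE_MAX_ENTRIES];
static size_t profile_count;

/* Linear search by label; fine for a handful of entries. */
static profile_entry_t *lookup(const char *label)
{
    for (size_t i = 0; i < profile_count; i++) {
        if (strcmp(profile_table[i].label, label) == 0) {
            return &profile_table[i];
        }
    }
    return NULL;
}

/* Read-only access for reporting; returns NULL if the label is unknown. */
const profile_entry_t *find_profile(const char *label)
{
    return lookup(label);
}

/* Record one measurement. New labels are dropped once the table is full. */
void update_profile(const char *label, uint32_t elapsed)
{
    profile_entry_t *e = lookup(label);
    if (e == NULL) {
        if (profile_count == PROFILE_MAX_ENTRIES) {
            return;
        }
        e = &profile_table[profile_count++];
        e->label = label;
        e->calls = 0;
        e->total = 0;
        e->max = 0;
    }
    e->calls++;
    e->total += elapsed;
    if (elapsed > e->max) {
        e->max = elapsed;
    }
}
```

Tracking the maximum alongside the total is a deliberate choice: the average is `total / calls`, while `max` exposes the occasional slow run that an average would hide. The string comparison itself adds overhead on every call, which is one concrete form of the "code overhead" listed in the box.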
Sampling (Statistical):
┌─────────────────────────────────────────────────────────────┐
│ Sampling Profiling │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Timer │ │ Sample │ │ Analyze │ │
│ │ Interrupt │ │ Current │ │ Results │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Every 1ms: │
│ • Check what function is running │
│ • Increment counter for that function │
│ • Continue normal execution │
│ │
│ ✅ Minimal overhead │
│ ✅ No code changes │
│ ❌ Less precise │
│ ❌ May miss short functions │
└─────────────────────────────────────────────────────────────┘
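The sampling loop in the box can be sketched as a tick handler that maps the interrupted program counter to a function and bumps a counter. This is only an illustration under stated assumptions: the address ranges are invented, `sample_bucket_t`, `sample_tick`, and `sample_percent` are hypothetical names, and how the interrupted program counter is obtained inside the timer interrupt depends on the target architecture.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t start;  /* first address of the function */
    uint32_t end;    /* one past the last address */
    uint32_t hits;   /* samples that landed in this range */
} sample_bucket_t;

/* Hypothetical address ranges; a real table would be built from the
 * linker map file for the firmware image. */
static sample_bucket_t buckets[] = {
    { "read_sensor",  0x1000u, 0x1100u, 0 },
    { "process_data", 0x1100u, 0x1400u, 0 },
    { "send_data",    0x1400u, 0x1500u, 0 },
};
#define BUCKET_COUNT (sizeof buckets / sizeof buckets[0])

static uint32_t total_samples;
static uint32_t unknown_samples;  /* samples outside every known range */

/* Intended to be called from the periodic timer interrupt with the
 * program counter that was interrupted. */
void sample_tick(uint32_t pc)
{
    total_samples++;
    for (size_t i = 0; i < BUCKET_COUNT; i++) {
        if (pc >= buckets[i].start && pc < buckets[i].end) {
            buckets[i].hits++;
            return;
        }
    }
    unknown_samples++;
}

/* Estimated share of CPU time for bucket i, as an integer percentage. */
uint32_t sample_percent(size_t i)
{
    if (total_samples == 0 || i >= BUCKET_COUNT) {
        return 0;
    }
    return (buckets[i].hits * 100u) / total_samples;
}
```

A function that runs for much less than one sampling period may never be caught by a tick, which is why the box lists "may miss short functions" as a drawback. The estimate improves as the number of samples grows.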
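Choose Instrumentation When: you need precise timings for specific code sections and can tolerate the added overhead and code changes.
Choose Sampling When: you want an overview of the whole program with minimal overhead and no code changes, and can accept lower precision.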
CPU profiling reveals where your program spends its time:
┌─────────────────────────────────────────────────────────────┐
│ CPU Profiling Results │
├─────────────────────────────────────────────────────────────┤
│ │
│ Function Name │ Calls │ Total Time │ % of Total │
│ ────────────────────────────────────────────────────────── │
│ read_sensor() │ 1000 │ 50ms │ 50% │
│ process_data() │ 1000 │ 30ms │ 30% │
│ send_data() │ 100 │ 15ms │ 15% │
│ main_loop() │ 1000 │ 5ms │ 5% │
│ │
│ Insights: │
│ • read_sensor() is the biggest time consumer │
│ • process_data() is the second biggest │
│ • send_data() is called less often but costs the most per call │
│ • main_loop() overhead is minimal │
│ │
│ Optimization Strategy: │
│ • Focus on read_sensor() first │
│ • Then optimize process_data() │
│ • Consider batching send_data() calls │
└─────────────────────────────────────────────────────────────┘
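The percentages and insights in the results table follow from simple arithmetic on the raw counters. The sketch below reproduces that arithmetic using the numbers from the table; the struct and function names are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t calls;
    uint32_t total_ms;
} cpu_profile_row_t;

/* Values copied from the results table above. */
static const cpu_profile_row_t rows[] = {
    { "read_sensor",  1000, 50 },
    { "process_data", 1000, 30 },
    { "send_data",     100, 15 },
    { "main_loop",    1000,  5 },
};
#define ROW_COUNT (sizeof rows / sizeof rows[0])

/* Sum of the time attributed to all rows. */
uint32_t profile_total_ms(void)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < ROW_COUNT; i++) {
        sum += rows[i].total_ms;
    }
    return sum;
}

/* Share of the total for row i, as an integer percentage. */
uint32_t percent_of_total(size_t i)
{
    uint32_t total = profile_total_ms();
    return (total == 0) ? 0 : (rows[i].total_ms * 100u) / total;
}

/* Average cost of one call for row i, in microseconds. */
uint32_t avg_us_per_call(size_t i)
{
    return (rows[i].calls == 0) ? 0 : (rows[i].total_ms * 1000u) / rows[i].calls;
}

/* Index of the row with the largest total time. */
size_t hottest_row(void)
{
    size_t best = 0;
    for (size_t i = 1; i < ROW_COUNT; i++) {
        if (rows[i].total_ms > rows[best].total_ms) {
            best = i;
        }
    }
    return best;
}
```

With these numbers, `read_sensor` has the largest total (50 of 100 ms), while `send_data` averages 150 µs per call against 50 µs for `read_sensor`. That per-call cost is what makes batching `send_data()` calls worth considering even though its total is smaller.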
I/O Operations:
Computational Complexity:
Memory Access Patterns:
System Calls:
Memory profiling shows how your program uses memory over time:
┌─────────────────────────────────────────────────────────────┐
│ Memory Usage Over Time │
├─────────────────────────────────────────────────────────────┤
│ │
│ Memory Usage (bytes) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ██████████████████████████████████████████████ │ │
│ │ ████████████████████████████████████████████████ │ │
│ │ ██████████████████████████████████████████████████ │ │
│ │ ██████████████████████████████████████████████████ │ │
│ │ ██████████████████████████████████████████████████ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ ↑ ↑ ↑ ↑ │
│ 0s 10s 20s 30s │
│ │
│ ❌ Memory leak detected! │
│ • Memory usage keeps growing │
│ • No apparent reason for increase │
│ • System will eventually run out of memory │
└─────────────────────────────────────────────────────────────┘
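A growth curve like the one above needs some way to count the bytes currently allocated. One common approach is to wrap the allocator, sketched here under the assumption that the system uses the standard `malloc`/`free` heap and a C11 compiler (for `max_align_t`). The names `tracked_malloc`, `tracked_free`, and the counters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of every block so tracked_free() knows its size.
 * The union keeps the pointer handed to the caller suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} alloc_header_t;

static size_t bytes_in_use;      /* bytes currently allocated */
static size_t peak_bytes;        /* high-water mark of bytes_in_use */
static size_t live_allocations;  /* blocks allocated and not yet freed */

void *tracked_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(alloc_header_t)) {
        return NULL;  /* request too large to add a header */
    }
    alloc_header_t *h = malloc(sizeof *h + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    bytes_in_use += size;
    live_allocations++;
    if (bytes_in_use > peak_bytes) {
        peak_bytes = bytes_in_use;
    }
    return h + 1;  /* caller's memory starts right after the header */
}

void tracked_free(void *p)
{
    if (p == NULL) {
        return;
    }
    alloc_header_t *h = (alloc_header_t *)p - 1;
    bytes_in_use -= h->size;
    live_allocations--;
    free(h);
}
```

Logging `bytes_in_use` at a fixed interval gives the data for a usage-over-time plot. If the value keeps climbing while the workload is steady, or `live_allocations` never returns to its baseline, that points to a leak. The counters are not thread-safe or interrupt-safe as written.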
Allocation Patterns:
Memory Leaks:
Fragmentation:
In embedded systems, timing is often more critical than raw speed:
┌─────────────────────────────────────────────────────────────┐
│ Timing Requirements │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Task A │ │ Task B │ │ Task C │ │
│ │ 100ms │ │ 500ms │ │ 1000ms │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Timing Constraints: │
│ • Task A must complete within 100ms │
│ • Task B must complete within 500ms │
│ • Task C must complete within 1000ms │
│ │
│ Performance Goal: │
│ • Meet all deadlines consistently │
│ • Minimize jitter (timing variation) │
│ • Predictable response times │
└─────────────────────────────────────────────────────────────┘
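Deadline compliance for the three tasks above can be tracked with a per-task record of the worst run and the number of misses. This is a minimal sketch; `task_timing_t` and `record_run` are hypothetical names, and the elapsed time is assumed to be measured elsewhere in milliseconds.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t deadline_ms;  /* task must complete within this time */
    uint32_t worst_ms;     /* longest run observed so far */
    uint32_t misses;       /* runs that exceeded the deadline */
} task_timing_t;

/* Deadlines taken from the timing requirements above. */
static task_timing_t tasks[] = {
    { "Task A",  100, 0, 0 },
    { "Task B",  500, 0, 0 },
    { "Task C", 1000, 0, 0 },
};

/* Record one completed run; returns true if the deadline was met. */
bool record_run(task_timing_t *t, uint32_t elapsed_ms)
{
    if (elapsed_ms > t->worst_ms) {
        t->worst_ms = elapsed_ms;
    }
    if (elapsed_ms > t->deadline_ms) {
        t->misses++;
        return false;
    }
    return true;
}
```

Comparing `worst_ms` with `deadline_ms` shows how much margin a task has left, which is more useful for a real-time system than its average run time.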
Jitter is the variation in timing from one cycle to the next. In real-time systems it often matters more than average performance:
┌─────────────────────────────────────────────────────────────┐
│ Jitter Analysis │
├─────────────────────────────────────────────────────────────┤
│ │
│ Low Jitter (Good): │
│ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ │
│ │█│ │█│ │█│ │█│ │█│ │█│ │█│ │█│ │
│ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ │
│ Consistent 100ms intervals │
│ │
│ High Jitter (Bad): │
│ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ │
│ │█│ │█│ │█│ │█│ │█│ │█│ │
│ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ │
│ Variable intervals: 80ms, 120ms, 90ms, 130ms │
│ │
│ High jitter can cause: │
│ • Missed deadlines │
│ • Unpredictable behavior │
│ • System instability │
└─────────────────────────────────────────────────────────────┘
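Jitter can be quantified from a list of measured intervals. The sketch below computes a few simple statistics; `jitter_stats_t` and `jitter_analyze` are hypothetical names, and peak-to-peak spread and worst deviation from the nominal period are only two of several possible jitter measures.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_ms;
    uint32_t max_ms;
    uint32_t mean_ms;             /* integer mean, rounded down */
    uint32_t peak_to_peak_ms;     /* max_ms - min_ms */
    uint32_t worst_deviation_ms;  /* largest distance from the nominal period */
} jitter_stats_t;

/* Summarize n measured intervals against the intended period. */
jitter_stats_t jitter_analyze(const uint32_t *intervals, size_t n,
                              uint32_t nominal_ms)
{
    jitter_stats_t s = {0};
    if (n == 0) {
        return s;
    }
    uint32_t sum = 0;
    s.min_ms = intervals[0];
    s.max_ms = intervals[0];
    for (size_t i = 0; i < n; i++) {
        uint32_t v = intervals[i];
        if (v < s.min_ms) {
            s.min_ms = v;
        }
        if (v > s.max_ms) {
            s.max_ms = v;
        }
        uint32_t dev = (v > nominal_ms) ? v - nominal_ms : nominal_ms - v;
        if (dev > s.worst_deviation_ms) {
            s.worst_deviation_ms = dev;
        }
        sum += v;
    }
    s.mean_ms = sum / (uint32_t)n;
    s.peak_to_peak_ms = s.max_ms - s.min_ms;
    return s;
}
```

For the high-jitter intervals in the diagram (80, 120, 90, 130 ms against a 100 ms nominal period), the mean is 105 ms, which looks close to target, yet the peak-to-peak spread is 50 ms and the worst deviation is 30 ms. This is why the average alone can hide a timing problem.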
Objective: Understand how to measure basic performance.
Setup: Create a simple program that performs a repetitive task.
Steps:
Expected Outcome: Understanding of basic timing measurement and the concept of jitter.
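One detail worth sketching for basic timing measurement is how to turn two timer readings into an elapsed time. The helpers below assume a free-running 32-bit up-counter; `elapsed_ticks`, `ticks_to_us`, and the `timer_hz` parameter are hypothetical names, and reading the counter itself is platform-specific.

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running 32-bit up-counter.
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct
 * even if the counter rolled over once between the two readings. */
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert ticks to microseconds. timer_hz is the counter frequency and
 * must be nonzero. The 64-bit intermediate avoids overflow. */
uint64_t ticks_to_us(uint32_t ticks, uint32_t timer_hz)
{
    return ((uint64_t)ticks * 1000000u) / timer_hz;
}
```

For example, with a 16 MHz counter, 16000 ticks correspond to 1000 µs. Intervals longer than one full counter period cannot be measured this way, because a second rollover is indistinguishable from the first.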
Objective: Learn to profile individual functions.
Setup: Create a program with multiple functions of different complexities.
Steps:
Expected Outcome: Understanding of how to identify performance bottlenecks in code.
Objective: Learn to profile memory usage.
Setup: Create a program that allocates and frees memory.
Steps:
Expected Outcome: Understanding of memory profiling and leak detection.