The Embedded New Testament

The "Holy Bible" for embedded engineers


Project maintained by theEmbeddedGeorge Hosted on GitHub Pages — Theme by mattgraham

Error Handling and Logging

Quick Reference: Key Facts

Overview

Error handling and logging are fundamental aspects of robust embedded system design. This guide covers comprehensive error detection, classification, handling strategies, and logging mechanisms that enable developers to build reliable, debuggable, and maintainable embedded systems.

Table of Contents

  1. Core Concepts
  2. Error Classification System
  3. Error Detection Mechanisms
  4. Error Handling Strategies
  5. Logging System Design
  6. Error Recovery Mechanisms
  7. Implementation Examples
  8. Common Pitfalls
  9. Best Practices
  10. Interview Questions

Core Concepts

What is Error Handling?

Error handling is a systematic approach to:

Error Handling Lifecycle

Error Handling Flow:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Error     │───▶│   Detection │───▶│  Analysis   │───▶│   Response  │
│  Occurs     │    │   & Capture │    │ & Logging   │    │ & Recovery  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
       │                   │                   │                   │
       ▼                   ▼                   ▼                   ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Monitor   │    │   Context   │    │   Severity  │    │   Escalate  │
│  & Prevent  │    │  Collection │    │ Assessment  │    │  if Needed  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

Error Handling Principles


Error Handling Lifecycle

Error Handling Flow
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Error     │───▶│   Detection │───▶│  Analysis   │───▶│   Response  │
│  Occurs     │    │   & Capture │    │ & Logging   │    │ & Recovery  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
       │                   │                   │                   │
       │                   │                   │                   │
       ▼                   ▼                   ▼                   ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Monitor   │    │   Context   │    │   Severity  │    │   Escalate  │
│  & Prevent  │    │  Collection │    │ Assessment  │    │  if Needed  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

Error Severity Hierarchy

Error Severity Levels
┌─────────────────────────────────────────────────────────────┐
│ FATAL: System failure, requires immediate attention        │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ System reset, safe mode entry, external notification   │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ CRITICAL: Severe error, system may be unstable            │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Safe mode, component restart, external notification     │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ ERROR: Operation failed, functionality impaired            │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Retry, fallback, graceful degradation                   │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ WARNING: Potential issue, operation continues             │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Monitor, log, user notification                         │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ INFO: Normal operation information                        │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Log for monitoring and debugging                         │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ DEBUG: Detailed debugging information                     │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Log only when debugging enabled                          │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Error Response Decision Tree

Error Response Decision Tree
┌─────────────────────────────────────────────────────────────┐
│ Error Occurs                                               │
│         │                                                  │
│         ▼                                                  │
│ ┌─────────────────┐                                        │
│ │ Analyze Error   │                                        │
│ │ Severity &      │                                        │
│ │ Category        │                                        │
│ └─────────────────┘                                        │
│         │                                                  │
│         ▼                                                  │
│ ┌─────────────────┐                                        │
│ │ Fatal Error?    │───Yes───▶│ System Reset │              │
│ └─────────────────┘         └──────────────┘              │
│         │                                                  │
│         │ No                                               │
│         ▼                                                  │
│ ┌─────────────────┐                                        │
│ │ Critical Error? │───Yes───▶│ Safe Mode    │              │
│ └─────────────────┘         └──────────────┘              │
│         │                                                  │
│         │ No                                               │
│         ▼                                                  │
│ ┌─────────────────┐                                        │
│ │ Retry Possible? │───Yes───▶│ Retry        │              │
│ └─────────────────┘         └──────────────┘              │
│         │                                                  │
│         │ No                                               │
│         ▼                                                  │
│ ┌─────────────────┐                                        │
│ │ Fallback        │                                        │
│ │ Available?      │                                        │
│ └─────────────────┘                                        │
└─────────────────────────────────────────────────────────────┘

Logging System Architecture

Structured Logging System
┌─────────────────────────────────────────────────────────────┐
│ Application Layer                                          │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ LOG(ERROR, "Operation failed")                         │ │
│ │ LOG_WITH_CONTEXT(CRITICAL, "Memory error", data, size) │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Logging Middleware                                         │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Timestamp, Component, Function, Line, Context          │ │
│ │ Severity filtering, Formatting, Routing                │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Output Destinations                                        │
│ ┌─────────────┬─────────────┬─────────────┬─────────────┐ │
│ │ Console     │ File        │ External    │ Network     │ │
│ │ Output      │ Storage     │ System      │ Logging     │ │
│ └─────────────┴─────────────┴─────────────┴─────────────┘ │
└─────────────────────────────────────────────────────────────┘

Error Classification System

Error Severity Levels

// Error severity classification
typedef enum {
    ERROR_SEVERITY_DEBUG = 0,          // Debug information only
    ERROR_SEVERITY_INFO,               // Informational messages
    ERROR_SEVERITY_WARNING,            // Warning conditions
    ERROR_SEVERITY_ERROR,              // Error conditions
    ERROR_SEVERITY_CRITICAL,           // Critical errors
    ERROR_SEVERITY_FATAL               // Fatal errors (system failure)
} error_severity_t;

// Error severity descriptions
static const char* severity_strings[] = {
    "DEBUG",
    "INFO", 
    "WARN",
    "ERROR",
    "CRITICAL",
    "FATAL"
};

// Get severity string
const char* get_severity_string(error_severity_t severity) {
    if (severity < ERROR_SEVERITY_DEBUG || severity > ERROR_SEVERITY_FATAL) {
        return "UNKNOWN";
    }
    return severity_strings[severity];
}

Error Categories and Types

// Error categories
typedef enum {
    ERROR_CATEGORY_SYSTEM = 0,         // System-level errors
    ERROR_CATEGORY_HARDWARE,           // Hardware-related errors
    ERROR_CATEGORY_SOFTWARE,           // Software-related errors
    ERROR_CATEGORY_COMMUNICATION,      // Communication errors
    ERROR_CATEGORY_TIMING,             // Timing and deadline errors
    ERROR_CATEGORY_MEMORY,             // Memory-related errors
    ERROR_CATEGORY_POWER,              // Power management errors
    ERROR_CATEGORY_SECURITY,           // Security-related errors
    ERROR_CATEGORY_USER,               // User input errors
    ERROR_CATEGORY_EXTERNAL            // External system errors
} error_category_t;

// Error types within categories
typedef enum {
    ERROR_TYPE_GENERAL = 0,            // General errors
    ERROR_TYPE_TIMEOUT,                // Timeout errors
    ERROR_TYPE_INVALID_PARAMETER,      // Invalid parameter errors
    ERROR_TYPE_RESOURCE_UNAVAILABLE,   // Resource unavailable
    ERROR_TYPE_PERMISSION_DENIED,      // Permission denied
    ERROR_TYPE_COMMUNICATION_FAILURE,  // Communication failure
    ERROR_TYPE_HARDWARE_FAULT,         // Hardware fault
    ERROR_TYPE_SOFTWARE_BUG,           // Software bug
    ERROR_TYPE_CONFIGURATION_ERROR,    // Configuration error
    ERROR_TYPE_STATE_ERROR             // Invalid state error
} error_type_t;

// Error category descriptions
static const char* category_strings[] = {
    "SYSTEM",
    "HARDWARE", 
    "SOFTWARE",
    "COMMUNICATION",
    "TIMING",
    "MEMORY",
    "POWER",
    "SECURITY",
    "USER",
    "EXTERNAL"
};

Comprehensive Error Information Structure

// Complete error information structure
typedef struct {
    uint32_t error_id;                 // Unique error identifier
    error_severity_t severity;         // Error severity level
    error_category_t category;         // Error category
    error_type_t type;                 // Error type
    uint32_t timestamp;                // Error occurrence time
    uint32_t task_id;                  // Task that detected the error
    uint32_t line_number;              // Source code line number
    const char *file_name;             // Source file name
    const char *function_name;         // Function name
    uint32_t error_code;               // Specific error code
    char     description[128];         // Error description
    uint32_t context_data[8];          // Additional context data
    uint32_t system_state;             // System state when error occurred
    uint32_t user_data;                // User-defined data
} error_info_t;

// Error context data structure
typedef struct {
    uint32_t cpu_usage;                // CPU usage percentage
    uint32_t memory_usage;             // Memory usage percentage
    uint32_t free_heap;                // Free heap memory
    uint32_t task_count;               // Number of active tasks
    uint32_t uptime_ms;                // System uptime
    uint32_t temperature;              // System temperature
    uint32_t voltage;                  // System voltage
    uint32_t clock_frequency;          // Current clock frequency
} system_context_t;

Error Detection Mechanisms

Runtime Error Detection

// Runtime error detection macros
#define ERROR_CHECK(condition, error_code, description) \
    do { \
        if (!(condition)) { \
            report_error(ERROR_SEVERITY_ERROR, ERROR_CATEGORY_SOFTWARE, \
                        ERROR_TYPE_GENERAL, error_code, description, \
                        __LINE__, __FILE__, __FUNCTION__); \
            return -1; \
        } \
    } while(0)

#define ERROR_CHECK_WARN(condition, error_code, description) \
    do { \
        if (!(condition)) { \
            report_error(ERROR_SEVERITY_WARNING, ERROR_CATEGORY_SOFTWARE, \
                        ERROR_TYPE_GENERAL, error_code, description, \
                        __LINE__, __FILE__, __FUNCTION__); \
        } \
    } while(0)

#define ERROR_CHECK_CRITICAL(condition, error_code, description) \
    do { \
        if (!(condition)) { \
            report_error(ERROR_SEVERITY_CRITICAL, ERROR_CATEGORY_SOFTWARE, \
                        ERROR_TYPE_GENERAL, error_code, description, \
                        __LINE__, __FILE__, __FUNCTION__); \
            system_panic(); \
        } \
    } while(0)

// Parameter validation macro
#define VALIDATE_PARAMETER(param, error_code) \
    ERROR_CHECK(param != NULL, error_code, "Invalid parameter: " #param " is NULL")

#define VALIDATE_RANGE(value, min, max, error_code) \
    ERROR_CHECK((value) >= (min) && (value) <= (max), error_code, \
                "Value " #value " out of range [" #min ", " #max "]")

Memory Error Detection

// Memory corruption detection
typedef struct {
    uint32_t start_magic;              // Start magic number
    uint32_t end_magic;                // End magic number
    uint32_t checksum;                 // Data checksum
    uint32_t size;                     // Data size
    uint32_t allocation_id;            // Unique allocation identifier
    uint32_t allocation_time;          // Allocation timestamp
} memory_guard_t;

#define MEMORY_GUARD_MAGIC_START      0xDEADBEEF
#define MEMORY_GUARD_MAGIC_END        0xCAFEBABE

// Add memory guards to allocated memory
void* add_memory_guards(void *data, uint32_t size) {
    if (data == NULL || size == 0) {
        return NULL;
    }
    
    // Calculate total size with guards
    uint32_t total_size = size + 2 * sizeof(memory_guard_t);
    
    // Allocate memory with guards
    uint8_t *guarded_memory = malloc(total_size);
    if (guarded_memory == NULL) {
        return NULL;
    }
    
    // Setup start guard
    memory_guard_t *start_guard = (memory_guard_t*)guarded_memory;
    start_guard->start_magic = MEMORY_GUARD_MAGIC_START;
    start_guard->size = size;
    start_guard->allocation_id = generate_allocation_id();
    start_guard->allocation_time = get_system_time_ms();
    
    // Setup end guard
    memory_guard_t *end_guard = (memory_guard_t*)(guarded_memory + total_size - sizeof(memory_guard_t));
    end_guard->end_magic = MEMORY_GUARD_MAGIC_END;
    end_guard->size = size;
    
    // Calculate checksum for data
    uint32_t checksum = calculate_checksum(
        guarded_memory + sizeof(memory_guard_t), size);
    start_guard->checksum = checksum;
    end_guard->checksum = checksum;
    
    // Return pointer to data area
    return guarded_memory + sizeof(memory_guard_t);
}

// Check memory integrity
bool check_memory_integrity(void *data) {
    if (data == NULL) {
        return false;
    }
    
    // Get guard pointers
    memory_guard_t *start_guard = (memory_guard_t*)((uint8_t*)data - sizeof(memory_guard_t));
    memory_guard_t *end_guard = (memory_guard_t*)((uint8_t*)data + start_guard->size);
    
    // Check magic numbers
    if (start_guard->start_magic != MEMORY_GUARD_MAGIC_START ||
        end_guard->end_magic != MEMORY_GUARD_MAGIC_END) {
        return false;
    }
    
    // Check size consistency
    if (start_guard->size != end_guard->size) {
        return false;
    }
    
    // Check checksum
    uint32_t calculated_checksum = calculate_checksum(data, start_guard->size);
    if (calculated_checksum != start_guard->checksum ||
        calculated_checksum != end_guard->checksum) {
        return false;
    }
    
    return true;
}

Stack Overflow Detection

// Stack overflow detection
typedef struct {
    uint32_t stack_start;              // Stack start address
    uint32_t stack_end;                // Stack end address
    uint32_t stack_size;               // Stack size
    uint32_t watermark_low;            // Low watermark threshold
    uint32_t watermark_high;           // High watermark threshold
    uint32_t pattern;                  // Pattern for corruption detection
} stack_monitor_t;

// Initialize stack monitoring
int init_stack_monitor(stack_monitor_t *monitor, uint32_t stack_start, 
                      uint32_t stack_size) {
    monitor->stack_start = stack_start;
    monitor->stack_end = stack_start + stack_size;
    monitor->stack_size = stack_size;
    monitor->watermark_low = stack_start + (stack_size * 10) / 100;  // 10% threshold
    monitor->watermark_high = stack_start + (stack_size * 90) / 100; // 90% threshold
    monitor->pattern = 0xDEADBEEF;
    
    // Fill stack with pattern
    uint32_t *stack_ptr = (uint32_t*)stack_start;
    for (uint32_t i = 0; i < stack_size / 4; i++) {
        stack_ptr[i] = monitor->pattern;
    }
    
    return 0;
}

// Check for stack overflow
bool check_stack_overflow(stack_monitor_t *monitor) {
    uint32_t current_sp = __get_MSP(); // Current stack pointer
    
    // Check if stack pointer is within valid range
    if (current_sp < monitor->stack_start || current_sp > monitor->stack_end) {
        return true; // Stack overflow detected
    }
    
    // Check low watermark
    if (current_sp < monitor->watermark_low) {
        log_warning("Stack usage high: %d bytes remaining", 
                   current_sp - monitor->stack_start);
    }
    
    // Check pattern corruption
    uint32_t *stack_ptr = (uint32_t*)monitor->stack_start;
    for (uint32_t i = 0; i < (monitor->stack_end - current_sp) / 4; i++) {
        if (stack_ptr[i] != monitor->pattern) {
            return true; // Stack corruption detected
        }
    }
    
    return false;
}

Error Handling Strategies

Error Handler Registration System

// Error handler function type
typedef int (*error_handler_t)(error_info_t *error);

// Error handler registration
typedef struct {
    error_category_t category;          // Category to handle
    error_severity_t min_severity;     // Minimum severity to handle
    error_type_t type;                 // Specific error type
    error_handler_t handler;            // Handler function
    uint32_t priority;                 // Handler priority (lower = higher)
    bool     enabled;                  // Handler enabled flag
} error_handler_registration_t;

// Error handler registry
#define MAX_ERROR_HANDLERS 32
static error_handler_registration_t error_handlers[MAX_ERROR_HANDLERS];
static uint32_t error_handler_count = 0;

// Register error handler
int register_error_handler(error_category_t category, error_severity_t min_severity,
                          error_type_t type, error_handler_t handler, uint32_t priority) {
    if (error_handler_count >= MAX_ERROR_HANDLERS) {
        return -1; // No more slots available
    }
    
    // Find insertion point to maintain priority order
    uint32_t insert_index = 0;
    for (uint32_t i = 0; i < error_handler_count; i++) {
        if (priority < error_handlers[i].priority) {
            insert_index = i;
            break;
        }
        insert_index = i + 1;
    }
    
    // Shift existing handlers
    for (uint32_t i = error_handler_count; i > insert_index; i--) {
        error_handlers[i] = error_handlers[i - 1];
    }
    
    // Insert new handler
    error_handlers[insert_index].category = category;
    error_handlers[insert_index].min_severity = min_severity;
    error_handlers[insert_index].type = type;
    error_handlers[insert_index].handler = handler;
    error_handlers[insert_index].priority = priority;
    error_handlers[insert_index].enabled = true;
    
    error_handler_count++;
    return 0;
}

Error Response and Recovery

// Error response actions
typedef enum {
    ERROR_ACTION_NONE = 0,             // No action required
    ERROR_ACTION_LOG,                  // Log error only
    ERROR_ACTION_RETRY,                // Retry operation
    ERROR_ACTION_FALLBACK,             // Use fallback method
    ERROR_ACTION_RESTART,              // Restart component
    ERROR_ACTION_RESET,                // Reset system
    ERROR_ACTION_PANIC                 // System panic
} error_action_t;

// Error response configuration
typedef struct {
    error_severity_t severity;         // Error severity
    error_action_t action;             // Required action
    uint32_t retry_count;             // Number of retry attempts
    uint32_t retry_delay_ms;          // Delay between retries
    uint32_t timeout_ms;              // Action timeout
    bool     escalate_on_failure;      // Escalate if action fails
} error_response_config_t;

// Execute error response
int execute_error_response(error_info_t *error, error_response_config_t *config) {
    int result = 0;
    
    switch (config->action) {
        case ERROR_ACTION_NONE:
            break;
            
        case ERROR_ACTION_LOG:
            log_error_info(error);
            break;
            
        case ERROR_ACTION_RETRY:
            result = retry_operation(error, config->retry_count, config->retry_delay_ms);
            break;
            
        case ERROR_ACTION_FALLBACK:
            result = execute_fallback_operation(error);
            break;
            
        case ERROR_ACTION_RESTART:
            result = restart_component(error);
            break;
            
        case ERROR_ACTION_RESET:
            result = reset_system(error);
            break;
            
        case ERROR_ACTION_PANIC:
            system_panic(error);
            break;
    }
    
    // Check if escalation is needed
    if (result != 0 && config->escalate_on_failure) {
        escalate_error(error);
    }
    
    return result;
}

Logging System Design

Log Entry Structure

// Log entry structure
typedef struct {
    uint32_t log_id;                   // Unique log identifier
    uint32_t timestamp;                // Log timestamp
    error_severity_t severity;         // Log severity level
    error_category_t category;         // Log category
    uint32_t task_id;                  // Source task identifier
    uint32_t line_number;              // Source line number
    const char *file_name;             // Source file name
    const char *function_name;         // Source function name
    char     message[256];             // Log message
    uint32_t context_data[4];          // Additional context data
    uint32_t sequence_number;          // Sequential log number
} log_entry_t;

// Log level configuration
typedef struct {
    error_severity_t min_level;        // Minimum log level
    bool     enable_timestamp;          // Enable timestamp logging
    bool     enable_location;           // Enable source location logging
    bool     enable_context;            // Enable context data logging
    uint32_t max_entries;              // Maximum log entries
    uint32_t buffer_size;              // Log buffer size
} log_config_t;

Logging Implementation

// Logging system
typedef struct {
    log_entry_t *entries;              // Log entries buffer
    uint32_t entry_count;              // Number of entries
    uint32_t max_entries;              // Maximum entries
    uint32_t write_index;              // Current write index
    uint32_t read_index;               // Current read index
    bool     buffer_full;              // Buffer full flag
    log_config_t config;               // Logging configuration
    uint32_t sequence_counter;         // Sequence counter
} logging_system_t;

// Initialize logging system
int init_logging_system(logging_system_t *logger, log_config_t *config) {
    // Allocate log buffer
    logger->entries = malloc(config->max_entries * sizeof(log_entry_t));
    if (logger->entries == NULL) {
        return -1;
    }
    
    // Initialize logger state
    logger->entry_count = 0;
    logger->max_entries = config->max_entries;
    logger->write_index = 0;
    logger->read_index = 0;
    logger->buffer_full = false;
    logger->config = *config;
    logger->sequence_counter = 0;
    
    // Clear buffer
    memset(logger->entries, 0, config->max_entries * sizeof(log_entry_t));
    
    return 0;
}

// Add log entry
int add_log_entry(logging_system_t *logger, error_severity_t severity,
                  error_category_t category, const char *message, 
                  uint32_t line_number, const char *file_name, 
                  const char *function_name) {
    // Check if logging is enabled for this severity
    if (severity < logger->config.min_level) {
        return 0; // Logging disabled for this level
    }
    
    // Create log entry
    log_entry_t *entry = &logger->entries[logger->write_index];
    entry->log_id = generate_log_id();
    entry->timestamp = get_system_time_ms();
    entry->severity = severity;
    entry->category = category;
    entry->task_id = get_current_task_id();
    entry->line_number = line_number;
    entry->file_name = file_name;
    entry->function_name = function_name;
    entry->sequence_number = logger->sequence_counter++;
    
    // Copy message (truncate if too long)
    strncpy(entry->message, message, sizeof(entry->message) - 1);
    entry->message[sizeof(entry->message) - 1] = '\0';
    
    // Update buffer state
    if (logger->buffer_full) {
        // Buffer is full, move read index
        logger->read_index = (logger->read_index + 1) % logger->max_entries;
    } else {
        logger->entry_count++;
    }
    
    // Move write index
    logger->write_index = (logger->write_index + 1) % logger->max_entries;
    
    // Check if buffer is now full
    if (logger->write_index == logger->read_index) {
        logger->buffer_full = true;
    }
    
    return 0;
}

Log Output and Formatting

// Log output formats
typedef enum {
    LOG_FORMAT_SIMPLE,                 // Simple text format
    LOG_FORMAT_DETAILED,               // Detailed format with context
    LOG_FORMAT_JSON,                   // JSON format
    LOG_FORMAT_CSV,                    // CSV format
    LOG_FORMAT_BINARY                  // Binary format for analysis
} log_format_t;

// Format log entry for output
int format_log_entry(logging_system_t *logger, log_entry_t *entry, 
                    log_format_t format, char *output_buffer, uint32_t buffer_size) {
    switch (format) {
        case LOG_FORMAT_SIMPLE:
            return format_log_simple(entry, output_buffer, buffer_size);
            
        case LOG_FORMAT_DETAILED:
            return format_log_detailed(entry, output_buffer, buffer_size);
            
        case LOG_FORMAT_JSON:
            return format_log_json(entry, output_buffer, buffer_size);
            
        case LOG_FORMAT_CSV:
            return format_log_csv(entry, output_buffer, buffer_size);
            
        case LOG_FORMAT_BINARY:
            return format_log_binary(entry, output_buffer, buffer_size);
            
        default:
            return -1;
    }
}

// Simple text format
int format_log_simple(log_entry_t *entry, char *output_buffer, uint32_t buffer_size) {
    const char *severity_str = get_severity_string(entry->severity);
    const char *category_str = category_strings[entry->category];
    
    int written = snprintf(output_buffer, buffer_size,
                          "[%s] %s: %s\n",
                          severity_str, category_str, entry->message);
    
    return (written < buffer_size) ? written : -1;
}

// Detailed format
int format_log_detailed(log_entry_t *entry, char *output_buffer, uint32_t buffer_size) {
    const char *severity_str = get_severity_string(entry->severity);
    const char *category_str = category_strings[entry->category];
    
    int written = snprintf(output_buffer, buffer_size,
                          "[%08X] %s [%s] %s: %s\n"
                          "  File: %s:%d\n"
                          "  Function: %s\n"
                          "  Task: %d\n"
                          "  Time: %d ms\n",
                          entry->log_id, severity_str, category_str, 
                          entry->message, entry->file_name, entry->line_number,
                          entry->function_name, entry->task_id, entry->timestamp);
    
    return (written < buffer_size) ? written : -1;
}

Error Recovery Mechanisms

Automatic Error Recovery

// Error recovery manager
typedef struct {
    uint32_t recovery_attempts;         // Number of recovery attempts
    uint32_t max_recovery_attempts;     // Maximum recovery attempts
    uint32_t recovery_cooldown_ms;      // Cooldown period between attempts
    uint32_t last_recovery_time;        // Timestamp of last recovery
    bool     automatic_recovery_enabled; // Automatic recovery flag
    error_action_t default_action;      // Default recovery action
} error_recovery_manager_t;

// Perform automatic error recovery
int perform_automatic_recovery(error_recovery_manager_t *manager, 
                              error_info_t *error) {
    uint32_t current_time = get_system_time_ms();
    
    // Check if enough time has passed since last recovery
    if (current_time - manager->last_recovery_time < manager->recovery_cooldown_ms) {
        log_warning("Recovery cooldown period not expired");
        return -1;
    }
    
    // Check if maximum attempts exceeded
    if (manager->recovery_attempts >= manager->max_recovery_attempts) {
        log_error("Maximum recovery attempts exceeded");
        return -1;
    }
    
    // Determine recovery action based on error
    error_action_t action = determine_recovery_action(error);
    if (action == ERROR_ACTION_NONE) {
        action = manager->default_action;
    }
    
    // Execute recovery action
    int result = execute_error_response(error, get_response_config(action));
    
    if (result == 0) {
        manager->recovery_attempts++;
        manager->last_recovery_time = current_time;
        log_info("Automatic recovery successful (attempt %d/%d)", 
                manager->recovery_attempts, manager->max_recovery_attempts);
    }
    
    return result;
}

Error Escalation System

// Error escalation levels
typedef enum {
    ESCALATION_LEVEL_NONE = 0,         // No escalation
    ESCALATION_LEVEL_LOG,              // Log error
    ESCALATION_LEVEL_NOTIFY,           // Notify operator
    ESCALATION_LEVEL_ALERT,            // Send alert
    ESCALATION_LEVEL_EMERGENCY,        // Emergency response
    ESCALATION_LEVEL_SHUTDOWN          // System shutdown
} escalation_level_t;

// Escalation configuration
typedef struct {
    error_severity_t severity;         // Error severity for escalation
    escalation_level_t level;          // Escalation level
    uint32_t timeout_ms;               // Escalation timeout
    bool     require_acknowledgment;    // Require operator acknowledgment
    void     (*escalation_handler)(error_info_t *error); // Escalation function
} escalation_config_t;

// Escalate error
int escalate_error(error_info_t *error) {
    escalation_config_t *config = get_escalation_config(error->severity);
    if (config == NULL) {
        return -1;
    }
    
    // Execute escalation handler
    if (config->escalation_handler != NULL) {
        config->escalation_handler(error);
    }
    
    // Log escalation
    log_error("Error escalated to level %d: %s", 
              config->level, error->description);
    
    return 0;
}

Implementation Examples

Complete Error Handling System

// Complete error handling system
typedef struct {
    logging_system_t logger;            // Logging system
    error_recovery_manager_t recovery;  // Error recovery manager
    error_handler_registration_t handlers[MAX_ERROR_HANDLERS]; // Error handlers
    uint32_t handler_count;             // Number of registered handlers
    bool     system_enabled;            // System enabled flag
} error_handling_system_t;

// Initialize error handling system
int init_error_handling_system(error_handling_system_t *system) {
    int result = 0;
    
    // 1. Initialize logging system
    log_config_t log_config = {
        .min_level = ERROR_SEVERITY_INFO,
        .enable_timestamp = true,
        .enable_location = true,
        .enable_context = true,
        .max_entries = 1000,
        .buffer_size = 64000
    };
    
    result = init_logging_system(&system->logger, &log_config);
    if (result != 0) {
        return result;
    }
    
    // 2. Initialize recovery manager
    system->recovery.recovery_attempts = 0;
    system->recovery.max_recovery_attempts = 5;
    system->recovery.recovery_cooldown_ms = 10000;
    system->recovery.automatic_recovery_enabled = true;
    system->recovery.default_action = ERROR_ACTION_RESTART;
    
    // 3. Initialize handler registry
    system->handler_count = 0;
    
    // 4. Register default error handlers
    register_default_error_handlers(system);
    
    system->system_enabled = true;
    
    log_info(&system->logger, "Error handling system initialized successfully");
    return 0;
}

// Main error handling function
int handle_error(error_handling_system_t *system, error_info_t *error) {
    if (!system->system_enabled) {
        return -1;
    }
    
    // 1. Log the error
    add_log_entry(&system->logger, error->severity, error->category,
                  error->description, error->line_number, error->file_name,
                  error->function_name);
    
    // 2. Execute registered handlers
    for (uint32_t i = 0; i < system->handler_count; i++) {
        error_handler_registration_t *handler = &system->handlers[i];
        
        if (handler->enabled && 
            (handler->category == ERROR_CATEGORY_SYSTEM || 
             handler->category == error->category) &&
            error->severity >= handler->min_severity &&
            (handler->type == ERROR_TYPE_GENERAL || 
             handler->type == error->type)) {
            
            handler->handler(error);
        }
    }
    
    // 3. Attempt automatic recovery if enabled
    if (system->recovery.automatic_recovery_enabled) {
        perform_automatic_recovery(&system->recovery, error);
    }
    
    // 4. Escalate if necessary
    if (error->severity >= ERROR_SEVERITY_CRITICAL) {
        escalate_error(error);
    }
    
    return 0;
}

Common Pitfalls

1. Insufficient Error Context

2. Poor Error Classification

3. Inadequate Recovery Mechanisms

4. Memory-Intensive Logging


Best Practices

1. Comprehensive Error Information

2. Systematic Error Classification

3. Progressive Recovery Strategy

4. Efficient Logging System

5. Testing and Validation


Interview Questions

Basic Level

  1. What is the purpose of error handling in embedded systems?
    • Detect failures, classify errors, enable recovery, prevent system failures
  2. What are the main components of an error handling system?
    • Error detection, classification, logging, recovery, escalation
  3. How do you classify errors by severity?
    • Use severity levels (DEBUG, INFO, WARN, ERROR, CRITICAL, FATAL)

Intermediate Level

  1. How would you implement a comprehensive error logging system?
    • Circular buffer, multiple formats, context capture, efficient storage
  2. What are the challenges in implementing automatic error recovery?
    • Error classification, recovery strategy selection, escalation management
  3. How do you prevent error handling from consuming too much memory?
    • Circular buffers, log level filtering, efficient data structures

Advanced Level

  1. How would you design an error handling system for a distributed embedded network?
    • Centralized logging, error propagation, coordinated recovery, network awareness
  2. What are the performance implications of different error handling strategies?
    • Memory usage, CPU overhead, recovery time, system availability
  3. How do you implement fault-tolerant error handling in safety-critical systems?
    • Redundant systems, voting mechanisms, fail-safe operation, formal verification

Practical Coding Questions

  1. Implement a basic error logging system with circular buffer
  2. Design an error classification and handling system
  3. Create an automatic error recovery manager
  4. Implement error escalation with timeout handling
  5. Design a multi-format log output system

Guided Labs

Lab 1: Error Classification System

  1. Design: Error classification scheme for your system
  2. Implement: Error severity levels and categories
  3. Add: Error code mapping and descriptions
  4. Test: Error classification with different scenarios

Lab 2: Fail-Safe Error Handling

  1. Implement: Error response strategies for different severities
  2. Add: Automatic retry and fallback mechanisms
  3. Create: Safe mode entry and system reset functions
  4. Test: Error handling with intentional failures

Lab 3: Structured Logging System

  1. Design: Log entry structure with context information
  2. Implement: Logging functions with severity filtering
  3. Add: Multiple output destinations (console, file, external)
  4. Test: Logging system with different scenarios and levels

Check Yourself

Understanding Check

Application Check

Analysis Check

Conclusion

Error handling and logging are fundamental to building robust embedded systems. A well-designed error handling system provides:

The key to successful error handling implementation lies in:

By following these principles and implementing the techniques discussed in this guide, developers can create robust, reliable, and maintainable embedded systems that gracefully handle errors and provide excellent debugging capabilities.