Alert Troubleshooting

This guide covers common alert scenarios and their resolutions.

Critical Alert Playbooks

Device Offline (CONN_001)

Immediate Actions:

  1. Check Console - Verify the device shows as offline
  2. Ping the device (if you are on the same network)
    ping {device-ip}
  3. Check the hardware - Power, network cable, indicator lights

If device unreachable:

  • Check network switch/router
  • Verify VLAN configuration
  • Check PoE power budget

If device reachable but still showing offline:

  • Access camera web UI
  • Check Anava ACAP status (Apps → Anava Agent)
  • Restart ACAP if stopped
  • Check MQTT configuration hasn't changed

Certificate Validation Failed (SEC_001)

This is a security-critical alert. Investigate immediately.

  1. Check certificate expiry

    Console → Devices → [Device] → Certificates
  2. Verify CA chain

    • Device should trust Anava root CA
    • Check for CA mismatch
  3. Investigate network path

    • Could indicate MITM attempt
    • Check for proxy interception
    • Verify DNS resolution
  4. If legitimate expiry:

    • Initiate certificate rotation
    • Console → Devices → Actions → Rotate Certificate
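
If the device certificate and the root CA are available as PEM files, the expiry and CA-chain checks above can also be run with the standard openssl CLI. This is a sketch under that assumption: how the files are exported is not covered here, and the function names and file arguments are hypothetical.

```shell
# expires_within_days <cert-pem> <days>
# Returns 0 if the certificate expires (or has already expired) within
# <days>, 1 if it stays valid longer, 2 if the file cannot be read.
expires_within_days() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  # openssl exits 0 when the certificate is still valid after the given
  # number of seconds, non-zero otherwise.
  if openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null; then
    return 1
  fi
  return 0
}

# trusted_by_ca <ca-pem> <cert-pem>
# Returns 0 if <cert-pem> verifies against the CA in <ca-pem>.
trusted_by_ca() {
  openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# To see which certificate a host actually presents on the network path
# (useful when proxy interception is suspected), replace the placeholders:
#   openssl s_client -connect <host>:<port> -showcerts </dev/null
```

For example, expires_within_days device.pem 30 && echo "rotate soon" flags a certificate that is inside a 30-day window. A certificate that fails trusted_by_ca against the expected root points to a CA mismatch rather than a plain expiry.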

Unauthorized Broker Attempt (SEC_002)

Security incident - treat as high priority.

  1. Check ConfigGuardian logs

    Console → Devices → [Device] → Logs → Filter: ConfigGuardian
  2. Identify the attempted broker

    • Alert context shows attempted hostname/IP
  3. Determine source:

    • Manual configuration change?
    • Network-level redirect?
    • Malicious tampering?
  4. Response:

    • Verify device is now connected to correct broker
    • Check for other affected devices
    • Review network security

Common Scenarios

Cluster of Devices Goes Offline

Pattern: Multiple devices in same location offline simultaneously

Likely causes:

  • Network switch/router failure
  • PoE switch overload
  • ISP/WAN outage
  • DHCP server issue

Investigation:

  1. Check if all devices are in same subnet/location
  2. Verify network infrastructure status
  3. Check DHCP lease availability
  4. Contact network team
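
For step 1, a quick way to see whether the offline devices share a subnet is to group their IP addresses. The sketch below assumes a list of IPv4 addresses, one per line, and assumes /24 subnets; adjust the grouping if your site uses different masks. The function name count_by_subnet is hypothetical.

```shell
# count_by_subnet
# Reads IPv4 addresses from stdin (one per line) and prints
# "<count> <subnet>" per /24, busiest subnet first.
count_by_subnet() {
  awk -F. 'NF == 4 { n[$1 "." $2 "." $3 ".0/24"]++ }
           END { for (s in n) print n[s], s }' |
    sort -k1,1nr -k2,2
}
```

Usage: count_by_subnet < offline-ips.txt. If one subnet accounts for nearly all offline devices, a switch, PoE, or DHCP problem at that location is more likely than a WAN outage.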

Repeated Configuration Conflicts (CFG_003)

Pattern: Same device shows CFG_002 (healed) followed by CFG_003 (conflict) repeatedly

Likely causes:

  • Someone manually changing camera settings
  • Third-party integration modifying MQTT config
  • Script or automation fighting ConfigGuardian

Resolution:

  1. Identify who/what is making changes
  2. If legitimate: Update golden config through Console
  3. If unauthorized: Investigate access, review camera audit logs
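
To confirm the heal-then-conflict pattern before investigating, count how often CFG_003 directly follows CFG_002 in a device's alert history. This sketch assumes the alert codes have been exported in chronological order, one code per line; that export format is an assumption, and the function name is hypothetical.

```shell
# count_heal_conflict_cycles
# Reads alert codes from stdin in chronological order (one per line) and
# prints how many times CFG_003 immediately follows CFG_002.
count_heal_conflict_cycles() {
  awk '$1 == "CFG_003" && prev == "CFG_002" { c++ }
       { prev = $1 }
       END { print c + 0 }'
}
```

A count that keeps growing between checks suggests something is still rewriting the configuration after each heal.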

Memory Slowly Increasing

Pattern: RES_002 alerts appearing on same device over days/weeks

Likely causes:

  • Memory leak in ACAP
  • Increasing number of concurrent operations
  • Camera firmware issue

Resolution:

  1. Check ACAP version - upgrade if outdated
  2. Review skill configuration - reduce if excessive
  3. Schedule regular ACAP restart as workaround
  4. Report to support with device diagnostics

Certificate Expiry Wave

Pattern: Multiple SEC_003 alerts across fleet

Likely causes:

  • Devices provisioned at same time, certs expiring together
  • Certificate rotation not running

Resolution:

  1. Check Console → Settings → Certificates → Auto-rotation status
  2. If disabled, enable auto-rotation
  3. Manually rotate affected devices if urgent
  4. Review certificate lifecycle policy

Diagnostic Commands

From Anava Console

Action                 Path
View device logs       Devices → [Device] → Logs
Download diagnostics   Devices → [Device] → Actions → Download Diagnostics
Check connectivity     Devices → [Device] → Connection Status
View alert history     Events → Filter by device
Restart ACAP           Devices → [Device] → Actions → Restart Application

From Camera Web UI

Action                  Path
View ACAP status        Apps → Anava Agent
Check ACAP logs         System → Logs → Application Log
View network settings   System → Network
Check MQTT config       Settings → MQTT (if accessible)

Direct API Queries

# Get device status
curl -H "Authorization: Bearer $TOKEN" \
"https://api.anava.ai/v1/devices/ACCC8EF12345/status"

# Get recent alerts for device
curl -H "Authorization: Bearer $TOKEN" \
"https://api.anava.ai/v1/devices/ACCC8EF12345/alerts?hours=24"

# Get alert details
curl -H "Authorization: Bearer $TOKEN" \
"https://api.anava.ai/v1/alerts/{alertId}"

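When several devices need checking, the status query above can be wrapped in a loop. This is a minimal sketch that uses only the status endpoint shown above and the same $TOKEN variable; the helper names device_url and fetch_status_all are hypothetical, and the response format is not assumed.

```shell
API_BASE="https://api.anava.ai/v1"

# device_url <device-id> <resource>
# Builds the URL for a per-device resource such as "status".
device_url() {
  printf '%s/devices/%s/%s\n' "$API_BASE" "$1" "$2"
}

# fetch_status_all
# Reads device IDs from stdin (one per line) and prints each raw response.
# --fail makes curl return non-zero on HTTP errors so failures are visible.
fetch_status_all() {
  while IFS= read -r device_id; do
    [ -n "$device_id" ] || continue
    echo "== $device_id =="
    curl --fail --silent --show-error \
      -H "Authorization: Bearer $TOKEN" \
      "$(device_url "$device_id" status)" ||
      echo "request failed for $device_id" >&2
    echo
  done
}
```

Usage: fetch_status_all < device-ids.txt
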
Alert Noise Reduction

Too Many Alerts?

  1. Adjust thresholds

    Console → Settings → Alert Rules → Thresholds
    • Increase the latency threshold if it triggers false positives
    • Adjust memory warning level for known high-usage devices
  2. Use alert rules

    rule:
      name: "Suppress INFO on test devices"
      condition:
        group: "test-lab"
        severity: "INFO"
      action:
        suppress: true
  3. Configure quiet hours

    Console → Settings → Notifications → Quiet Hours
    • Suppress non-critical alerts during off-hours

Not Enough Alerts?

  1. Check notification settings

    • Verify email/Slack channels configured
    • Check spam folders
  2. Verify alert rules

    • Ensure no rules suppressing needed alerts
  3. Test alert pipeline

    Console → Devices → [Device] → Actions → Send Test Alert

Escalation Guide

When to Escalate to Anava Support

Scenario                      Priority   Information to Include
Security incident (SEC_002)   P1         Device ID, timestamps, network logs
Fleet-wide outage             P1         Affected device list, network topology
Repeated ACAP crashes         P2         Device diagnostics bundle, ACAP version
Unexplained alerts            P3         Alert IDs, patterns observed

How to Collect Diagnostics

  1. Via Console:

    Devices → [Device] → Actions → Download Diagnostics
  2. Via API:

    curl -H "Authorization: Bearer $TOKEN" \
    "https://api.anava.ai/v1/devices/ACCC8EF12345/diagnostics" \
    -o diagnostics.zip
  3. Include:

    • Device ID and firmware version
    • ACAP version
    • Timestamps of issues
    • Steps to reproduce (if applicable)
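
The collection steps above can be combined so that the diagnostics bundle and the accompanying details end up in one folder. This is a sketch only: it reuses the diagnostics endpoint and $TOKEN variable shown above, while the function names, the support-<device-id> folder, and the notes.txt layout are hypothetical choices, not a format Anava Support requires.

```shell
# bundle_notes <device-id> <firmware-version> <acap-version> <issue-timestamp>
# Prints the details listed under "Include" as a plain-text note.
bundle_notes() {
  printf 'Device ID: %s\n' "$1"
  printf 'Firmware version: %s\n' "$2"
  printf 'ACAP version: %s\n' "$3"
  printf 'Issue timestamp: %s\n' "$4"
  printf 'Steps to reproduce: (fill in if applicable)\n'
}

# collect_bundle <device-id> <firmware-version> <acap-version> <issue-timestamp>
# Writes notes.txt and downloads diagnostics.zip into support-<device-id>/.
collect_bundle() {
  mkdir -p "support-$1" || return 1
  bundle_notes "$@" > "support-$1/notes.txt"
  curl --fail --silent --show-error \
    -H "Authorization: Bearer $TOKEN" \
    "https://api.anava.ai/v1/devices/$1/diagnostics" \
    -o "support-$1/diagnostics.zip"
}
```

Fill in the reproduction steps in notes.txt by hand before attaching the folder to a support request.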

Last updated: December 2025