Troubleshooting your Deployments with Lumu

Note: Learn more about the Deployment Monitoring feature in our official documentation.

Lumu alerts identify the exact component that failed, but the root cause can vary depending on the type of component. Most issues stem from the local network environment, third-party configurations, or licensing constraints.

Here is an expanded guide to the most common failure points, broken down by component. Because a failure can have more than one cause, we recommend reviewing every cause listed for the affected component before settling on a solution.

Virtual Appliances

  • Loss of Source Data: The firewall or network element configured to forward logs stops sending data. 
    • Recommended action: This can happen for many reasons that are outside Lumu’s control. We recommend checking the element's status and routing rules directly to ensure logs are actively directed to the appliance's IP address.
  • Altered Log Formats: The Virtual Appliance expects a specific log format. If that format changes, the appliance cannot parse the new data and fails.
    • Recommended action: Reconfigure the virtual appliance to its default state to revert the log format to the Lumu supported standard. Check out the Configure Virtual Appliances documentation.
  • Resource Exhaustion: Although it is not a common occurrence, the machine can run out of memory. 
    • Recommended action: Allocate additional RAM/CPU to the virtual machine in your hypervisor and reboot the instance.
  • Connectivity issues: The virtual appliance loses outbound internet access or its IP assignment changes unexpectedly after a reboot. 
    • Recommended action: Verify the VA's network adapter settings, IP allocation, and local firewall rules to restore outbound internet access.
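
As a quick way to confirm outbound access from the appliance or any other collector host, the following minimal Python sketch attempts a TCP connection and reports whether it succeeded. The function name `can_reach` and the default port 443 are assumptions made for illustration; this article does not list the endpoints Lumu components connect to, so substitute the host and port from your own deployment documentation.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts,
        # and unreachable networks.
        return False
```

For example, `can_reach("example.com")` returning False from the appliance while returning True from another machine on the same network would point toward the appliance's adapter, IP allocation, or local firewall rules rather than a general outage.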

Log Forwarders

  • Loss of Source Data: The element forwarding logs to the Log Forwarder stops data transmission. 
    • Recommended action: Verify the sending device's syslog configuration and status, and ensure that it is correctly pointing to the Log Forwarder.
  • Altered Log Formats: The source log format is modified, breaking Lumu's parsing logic. 
    • Recommended action: Restore the original log export format on the sending device.
  • Connectivity issues: The machine hosting the Log Forwarder loses its connection to the internet. 
    • Recommended action: Check the host machine's network connection, routing, and proxy settings.

Collector Agents

  • Process Interruption: The host device’s operating system may have stopped the Lumu Agent process unexpectedly. 
    • Recommended action: Review the host device to make sure it is not stopping the Lumu Agent, and whitelist the agent in the host's endpoint protection mechanisms to prevent it from happening again.
  • Connectivity issues: If the host device loses internet connectivity, a failure alert is triggered. 
    • Recommended action: Check the host device connectivity status and restore it.

Tip: For all Network Metadata collectors, there is an easy way to determine whether the component is broken or simply not receiving any data. Access the machine directly and use command-line tools like tcpdump or nc (Netcat) to verify whether the logs are actually reaching the collector's interface.
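
If tcpdump or nc is not available on the host, a small Python listener can serve the same purpose. The sketch below assumes the source sends logs over UDP (common for syslog, but not confirmed by this article) and that the chosen port is free, which means either temporarily pointing the source at a spare port or running it while the collector service is stopped. The helper names `open_listener` and `collect_datagrams` are hypothetical.

```python
import socket


def open_listener(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on host:port and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect_datagrams(sock: socket.socket, max_packets: int = 5,
                      timeout: float = 10.0) -> list:
    """Read up to max_packets UDP datagrams from a bound socket.

    Stops early if no datagram arrives within `timeout` seconds.
    Returns a list of (sender_ip, text_preview) tuples.
    """
    sock.settimeout(timeout)
    received = []
    try:
        while len(received) < max_packets:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            preview = data[:120].decode("utf-8", errors="replace")
            received.append((sender_ip, preview))
    except socket.timeout:
        pass  # nothing (more) arrived within the timeout
    return received
```

An empty result suggests the logs are not reaching the machine at all (a source or routing problem), while a populated result shows which sender is delivering data and what the first characters of each message look like.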

Hardware Appliances

  • Loss of Source Data: The connected switch or physical SPAN port stops mirroring or sharing traffic with the Lumu appliance.
    • Recommended action: Verify the SPAN port configuration on your switch to ensure traffic is actively being mirrored to the correct physical interface on the appliance. 
  • Network configuration changes: The network interface goes down or loses its configuration, which most commonly occurs immediately following a system reboot or power cycle. 
    • Recommended action: Access the hardware appliance to verify the network interface status and reapply the correct IP assignments and routing configurations.

Custom Collectors

  • Expired Credentials: Across all platforms, the most frequent failure is caused by expired API keys, rotated passwords, or invalidated tokens.
    • Recommended action: Generate new credentials or tokens in the third-party platform and update them in your Lumu Integration.
  • Connectivity issues: If the host device loses internet connectivity, a failure alert is triggered.
    • Recommended action: Check the host device connectivity status and restore it.

General integration issues

  • Expired Credentials: Across all platforms, the most frequent failure is caused by expired API keys, rotated passwords, or invalidated tokens.
    • Recommended action: Generate new credentials or tokens in the third-party platform and update them in your Lumu Integration.

Entra ID integration

  • Revoked Permissions: The integration authorizes Lumu to query your tenant. This authorization can break when an administrator accidentally removes the required permissions, or when the account that granted them is deleted, which automatically revokes the permissions.
    • Recommended action: Re-authorize the Lumu integration in Entra ID, preferably using a dedicated service account.

SentinelOne XDR Integration

  • Removed Permissions: To facilitate setup, the SentinelOne API requires broad permissions. Security teams sometimes remove some of those permissions over security concerns, which breaks the integration.
    • Recommended action: You must restore the exact permissions for the scope you selected during creation. If you created the integration for a Site, you can safely remove Group and Account permissions, but you must keep the Site permissions. Most importantly, you can never remove the blocklist permission, or the integration will instantly fail. Follow the instructions of the integration guide to correctly set up your integration.

AWS, Sophos, Vision One, and Cisco Umbrella integrations

  • API Quota Limits: All of these platforms enforce strict limits on the number of indicators they accept. For example, AWS has a hard limit of just 40 indicators. High-noise environments (like Guest Wi-Fi networks or firewalls performing massive amounts of DNS resolutions) quickly exhaust these quotas, causing the integration to reject further data and trigger a failure in Lumu. 
    • Recommended action: Do not delete and recreate the integration; this rarely clears the indicators fast enough and does not solve the root cause. Instead, manage the noise at the source. Identify the noisy element (e.g., the IP of the firewall or the Guest Wi-Fi range) and add it to your exclusions list in Lumu so it stops triggering massive indicator pushes.
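
To identify the noisy element, it can help to rank source IPs by how often they appear in whatever activity data you can export. The following is a minimal sketch, assuming you already have a plain list of source IP strings; that input format, and the function names `noisiest_sources` and `is_excluded`, are hypothetical and not part of any Lumu API. The exclusions themselves are still configured in Lumu.

```python
from collections import Counter
import ipaddress


def noisiest_sources(source_ips, top_n=3):
    """Return the top_n (ip, count) pairs, most frequent first."""
    return Counter(source_ips).most_common(top_n)


def is_excluded(ip, excluded_networks):
    """Return True if ip falls inside any CIDR range in excluded_networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in excluded_networks)
```

Running `noisiest_sources` over the exported list and then checking each top entry with `is_excluded` against your planned exclusion ranges gives a quick sanity check that the ranges you intend to add actually cover the sources generating the volume.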

Bitdefender and Microsoft Defender integrations   

  • Missing License Tiers: Both platforms require a specific license tier to work. The integration can be created successfully in the Lumu portal even with the wrong license, but it will fail as soon as it attempts to push data. 
    • Recommended action: Create the integration using an account with the correct licensing, or upgrade the account used in the integration. Check out the Bitdefender and Microsoft Defender integration guides for more information. 

Cisco Meraki, Infoblox, and Zscaler integrations   

  • Deleted Vendor Infrastructure: These integrations function by mapping Lumu to specific blocklists or templates configured on the vendor's platform. Users often forget this link and delete or rename the list or template on the vendor's side. Lumu will not attempt to automatically recreate these lists, leaving the integration stranded. This is especially common in Meraki, where users map many templates or networks and later delete them.
    • Recommended action: You must delete the broken integration in Lumu and recreate it, mapping it to a currently existing list or network.

WatchGuard integration

  • Incorrect Deployment Type: During setup, users mistakenly select Cloud when they have an On-Premise architecture, or vice versa, which causes a conflict when the integration tries to push data.
    • Recommended action: Recreate the WatchGuard integration, ensuring you select the correct deployment type. Follow the instructions in the integration guide.
  • Invalid Device IDs: When configuring the integration, you must set the device ID. For cloud integrations, this ID must be strictly numeric. Users frequently paste the entire alphanumeric string, causing the initial data push to fail.
    • Recommended action: Recreate the WatchGuard integration, ensuring you enter the correct device ID. Follow the instructions in the integration guide.
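
Before recreating a cloud integration, a simple pre-check can catch a pasted alphanumeric string. This sketch only tests the "strictly numeric" rule stated above; the function name is hypothetical, and any further constraints on the ID (such as its length) are not covered by this article and are not checked.

```python
import re


def looks_like_cloud_device_id(value: str) -> bool:
    """True if value contains only ASCII digits after trimming whitespace."""
    return re.fullmatch(r"[0-9]+", value.strip()) is not None
```

A value that fails this check should be compared against the integration guide to find the numeric portion that belongs in the device ID field.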

FAQ

Can I opt out of these alerts or customize which users receive them?

Currently, Deployment Monitoring alerts are mandatory and cannot be disabled. To guarantee there are no blind spots in your network visibility, these alerts are automatically routed to all users with the Administrator role (and Supervisors for MSP accounts).

I received an alert that my collector is offline, but when I check the Lumu portal, the traffic graph looks completely normal and doesn't show a drop. Why?

The traffic graphs in the portal display data over a longer timeframe (such as the last 7 or 30 days). Because Lumu triggers an alert after just 3 hours of inactivity, a 3-hour gap in traffic is often too small to be visually noticeable on a multi-day graph. If you receive an alert, do not rely solely on the visual graph to confirm the issue. Instead, we highly recommend checking the local service logs on your host machine or firewall to verify the real-time status of the data transmission.
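
The arithmetic behind this is straightforward. The short sketch below computes what share of a 7-day or 30-day window a 3-hour gap occupies; the function name is illustrative only.

```python
def gap_share_of_window(gap_hours: float, window_days: float) -> float:
    """Fraction of a graph's time window taken up by a gap."""
    return gap_hours / (window_days * 24)


for days in (7, 30):
    print(f"{days}-day graph: {gap_share_of_window(3, days):.2%}")
# 7-day graph: 1.79%
# 30-day graph: 0.42%
```

A gap covering less than 2% of the horizontal axis is easy to miss, which is why the local service logs are a more reliable check than the graph.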

My Lumu Portal shows that my Virtual Appliance is sending traffic, but I still received a failure email. Is this a false positive?

This is likely an isolated failure. A single Virtual Appliance often runs multiple embedded collectors at once. If just one of those collectors stops sending data, the appliance's overall 30-day traffic graph may still look normal because the other collectors are working. Always check the alert email, which specifies the exact name of the individual collector (e.g., Forcepoint Proxy) that failed.

I just created a new collector, but it immediately showed an Alerted status. Is it not working?

No, this is expected behavior. Because the system continuously monitors for data flow, a newly created collector is immediately recognized as not yet receiving any data, hence the Alerted status. As soon as you finish the configuration on your network side and the collector begins receiving traffic, this status will automatically change to Online. Please note that if the collector remains without data for 3 full hours after creation, the system will also trigger a standard failure email notification.

How quickly will Lumu notify me if my deployment goes offline?

It depends on the architecture of the component:

  • Third-party integrations: You will be alerted in approximately 30 minutes (triggered after 6 consecutive internal check failures).
  • Network Collectors & Appliances: You will be alerted after 3 continuous hours of total inactivity.
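
The consecutive-failure rule for third-party integrations can be illustrated with a small sketch. It is only a model of the behavior described above, not Lumu's implementation; the function name is hypothetical. Note that 6 checks over roughly 30 minutes implies an interval of about 5 minutes per check, which is an inference rather than a documented value.

```python
def check_that_triggers_alert(results, threshold=6):
    """Return the 1-based index of the check that triggers an alert.

    results: iterable of booleans, True for a passed check.
    Returns None if the threshold of consecutive failures is never reached.
    """
    consecutive = 0
    for index, passed in enumerate(results, start=1):
        # A single passing check resets the failure streak.
        consecutive = 0 if passed else consecutive + 1
        if consecutive >= threshold:
            return index
    return None
```

Under this model, five failures followed by one success do not raise an alert, because the streak resets; only six failures in a row do.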

My integration failed because it hit an API quota limit. Should I just delete and recreate the integration in Lumu to clear the queue?

No. Deleting and recreating the integration does not fix this issue; address the root cause instead. This issue happens when your network is too noisy (e.g., sending massive amounts of DNS resolutions or Guest Wi-Fi traffic). The proper resolution is to manage the noise by excluding those specific noisy IP addresses or networks directly within your Lumu network exclusions.

I created an integration successfully, but just hours later I received an email alert. Why is this happening?

This typically occurs due to a configuration error made during the initial setup. For many third-party integrations (such as WatchGuard), Lumu waits for an actual security incident to occur before it attempts to push the first batch of indicators. Because of this, the setup may appear successful on the Lumu portal initially, but it will immediately fail the moment Lumu actually tries to send the data. Please refer to the Troubleshooting section for your specific vendor to check for common misconfigurations.

