Processors Overview

George Alpizar

Overview

A processor lets you apply various analytical, statistical, and machine learning-based algorithms to incoming data. Specifically, a processor's configuration populates the Anomalies page and the Insights page.

Note

To learn more about Anomalies, see Anomalies.

To learn more about Insights, see Insights.


Review Supported Processor Types

The Edge Delta App supports the following processor types:

Anomaly

This processor combines metrics from multiple collocated agents, such as agents running on containers or servers in the same data center.

This processor is used in aggregator agent mode to follow trends and detect anomalies that take place on local clusters.

To learn more, see Anomaly Processors.

Cluster

This processor type finds patterns in logs, and then groups (or clusters) these patterns based on similarities.

To learn more, see Cluster Processors.

Dimension Counter (Regexes)

This processor:

  • Extracts a specified part of the log, then
  • Considers the part as the dimension (the key), then
  • Counts matching logs for each distinct dimension value, and then
  • Generates count and anomaly metrics for each unique dimension value.

To learn more, see Dimension Counter (Regexes) Processors.
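The steps above can be sketched in Python (an illustrative sketch only, not Edge Delta's implementation; the log lines and the named regex group are hypothetical):

```python
import re
from collections import Counter

# Hypothetical logs; the named group "method" acts as the dimension (the key).
logs = [
    "GET /api/users 200",
    "POST /api/users 500",
    "GET /api/orders 200",
    "GET /api/users 404",
]

pattern = re.compile(r"(?P<method>GET|POST|PUT|DELETE)")

# Count matching logs per distinct dimension value.
counts = Counter(
    m.group("method") for line in logs if (m := pattern.search(line))
)

print(counts)  # Counter({'GET': 3, 'POST': 1})
```

Each distinct value of the captured dimension gets its own count, which is the series the anomaly metrics are then generated from.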

Dimension Numeric Capture (Regexes)

This processor: 

  • Monitors a specific numerical field, such as latency, per unique dimension value, such as api_path
  • Automatically generates statistics, such as counts and averages
  • Detects anomalies, based on the aggregate values grouped by dimension

To learn more, see Dimension Numeric Capture (Regexes) Processors.
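A rough sketch of this behavior in Python (illustrative only, not Edge Delta's implementation; the log format, field names, and statistics shown are hypothetical):

```python
import re
from collections import defaultdict

# Hypothetical logs with an api_path dimension and a numeric latency field.
logs = [
    "path=/users latency=120",
    "path=/users latency=80",
    "path=/orders latency=300",
]

pattern = re.compile(r"path=(?P<api_path>\S+) latency=(?P<latency>\d+)")

# Group the captured numeric values by dimension value.
samples = defaultdict(list)
for line in logs:
    m = pattern.search(line)
    if m:
        samples[m.group("api_path")].append(int(m.group("latency")))

# Per-dimension statistics; anomaly detection would compare new values
# against aggregates like these.
stats = {k: {"count": len(v), "avg": sum(v) / len(v)} for k, v in samples.items()}
print(stats)
```

The key difference from a plain dimension counter is that a numeric capture group is aggregated per dimension, rather than only counting matches.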

Numeric Capture (Regexes)

This processor type:

  • Checks logs for matches against a regular expression with a numeric capture group, then
  • Counts the matching logs, and then 
  • Generates anomaly scores.

To learn more, see Numeric Capture (Regexes) Processors.

Ratio

This processor takes one success regex pattern and one failure regex pattern, and then calculates a success ratio.

To learn more, see Ratio Processors.
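The ratio calculation can be illustrated in Python (a hypothetical sketch; the patterns and the exact ratio definition are assumptions, not Edge Delta's implementation):

```python
import re

success_re = re.compile(r"status=2\d\d")  # hypothetical "success" pattern
failure_re = re.compile(r"status=5\d\d")  # hypothetical "failure" pattern

logs = [
    "status=200 ok",
    "status=200 ok",
    "status=503 upstream timeout",
    "status=200 ok",
]

successes = sum(1 for line in logs if success_re.search(line))
failures = sum(1 for line in logs if failure_re.search(line))

# Success ratio over all matched events.
ratio = successes / (successes + failures)
print(ratio)  # 0.75
```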

Simple Keyword Match (Regexes)

This processor type:

  • Checks for basic regex matches in logs, then
  • Counts the matching logs, and then 
  • Generates anomaly scores.

To learn more, see Simple Keyword Match (Regexes) Processors.

Top-K

This processor monitors the top K records, such as k=10, where records are identified by one or more named regex group values combined together.

To learn more, see Top-K Processors.
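The idea of combining named group values into a record key and keeping the top K can be sketched in Python (illustrative; the log format, separator, and group names are hypothetical):

```python
import re
from collections import Counter

# Records are identified by combining the named regex group values.
pattern = re.compile(r"user=(?P<user>\w+) endpoint=(?P<endpoint>\S+)")

logs = [
    "user=alice endpoint=/login",
    "user=alice endpoint=/login",
    "user=bob endpoint=/login",
    "user=alice endpoint=/home",
]

counter = Counter()
for line in logs:
    m = pattern.search(line)
    if m:
        # Combine the named group values into a single record key.
        counter["|".join(m.group("user", "endpoint"))] += 1

K = 2
top_k = counter.most_common(K)
print(top_k)
```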

Trace

This processor is useful to track events that have a unique ID and clear start and end logs. 

To learn more, see Trace Processors.

 


Create and Manage a Processor 

At a high level, there are 2 ways to manage Processors:

  • If you need to create a new configuration, then you can use the visual editor to populate a YAML file, as well as make changes directly in the YAML file.
  • If you already have an existing configuration, then you can update the configuration in the YAML file.

To access the visual editor for a new configuration:

  1. In the Edge Delta App, on the left-side navigation, click Data Pipeline, and then click Agent Settings.

  2. Click Create Configuration.

  3. Click Visual.

  4. On the right side, select Processors.

  5. Review the list of options.

To access the YAML file for an existing configuration:

  1. In the Edge Delta App, on the left-side navigation, click Data Pipeline, and then click Agent Settings.

  2. Locate the desired configuration, then under Actions, click the vertical ellipsis, and then click Edit.

  3. Review the YAML file.


Learn about Clustered Invariants 

You can use this section to learn how Edge Delta calculates similarities between invariants for clustering purposes. 

When a new log passes through the pipeline:

  • Variants are identified via a proprietary Ragel FSM-based tokenization process
  • The identified variants are stripped from the log and replaced with wildcards
  • The remaining invariant components are compared to existing pattern sets to calculate similarities
    • Similarities are calculated over invariant components so that the invariants can be transformed and clustered into structured log messages.

There are 2 ways to calculate similarities: 

  • Drain
  • Levenshtein distance

Learn about Drain

Drain is the default log parsing algorithm used to cluster logs. This algorithm is based on a parse tree, with a fixed depth to guide the log group search process. This workflow helps to avoid a deep and unbalanced tree. 

When a new raw log message arrives, Edge Delta processes the message with the Ragel FSM-based tokenization process. Then, Edge Delta searches for a log group through the nodes of the tree, based on the token prefix. 

If a suitable log group is found, then Edge Delta also calculates the similarity between the log message and the log event stored in the log group. If the similarity rate is above a certain threshold, then the log message is matched with the log event stored in that log group. If not, a new log group is created based on the log message.

To accelerate this process, Edge Delta designs a parse tree with a fixed depth, and nodes with fixed children to guide the log group search. This helps to limit the number of log groups that a raw log message needs to be compared to. 

Since Edge Delta uses a Drain parse tree for clustering based on a common prefix, Edge Delta can easily merge clusters by using their ancestors in the tree. The merge level determines how many levels Edge Delta goes up in the tree.
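The fixed-depth search described above can be sketched roughly as follows (a heavily simplified, assumption-laden illustration in Python; the real algorithm's tokenization, depth, and similarity threshold differ):

```python
# Logs are routed first by token count, then by a short leading-token
# prefix, and compared only against the few templates stored at that leaf.

def similarity(tokens, template):
    # Fraction of positions where the tokens match the template
    # ("*" is a wildcard standing in for a variant position).
    same = sum(1 for a, b in zip(tokens, template) if b == "*" or a == b)
    return same / len(template)

def insert(tree, log, depth=2, threshold=0.5):
    tokens = log.split()
    # Level 1: token count; levels 2..depth: leading token prefix.
    key = (len(tokens), *tokens[:depth - 1])
    leaf = tree.setdefault(key, [])
    for template in leaf:
        if similarity(tokens, template) >= threshold:
            # Merge: replace mismatched positions with wildcards.
            for i, (a, b) in enumerate(zip(tokens, template)):
                if a != b:
                    template[i] = "*"
            return template
    leaf.append(tokens[:])  # no suitable group: start a new one
    return leaf[-1]

tree = {}
insert(tree, "connect from 10.0.0.1 ok")
group = insert(tree, "connect from 10.0.0.2 ok")
print(group)  # ['connect', 'from', '*', 'ok']
```

The fixed depth and fixed branching keep the number of candidate groups per incoming log small, which is what makes the search fast.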

Drain algorithm visual overview

Learn about Levenshtein Distance

Levenshtein distance is a string metric that measures the difference between 2 sequences. The Levenshtein distance between 2 words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.

Levenshtein algorithm visual overview

When a new raw log message arrives, Edge Delta processes the message with the Ragel FSM-based tokenization process.

Then, Edge Delta uses the Levenshtein distance algorithm to calculate similarities between tokens. If there is a similarity above a certain threshold, then Edge Delta will determine that these logs belong to the same log group. 

The similarity calculation is based on the minimum number of operations required to make 2 tokens the same. If the required number of operations is below a certain threshold, then the 2 tokens are considered similar and grouped into the same log group. Otherwise, a new log group is created based on the log message.
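The distance and the threshold decision can be illustrated in Python (the normalization and threshold value here are illustrative assumptions, not Edge Delta's actual parameters):

```python
def levenshtein(a, b):
    # Dynamic-programming edit distance: the minimum number of
    # single-character insertions, deletions, or substitutions
    # required to change a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def same_group(tok1, tok2, threshold=0.7):
    # Normalize distance into a similarity ratio (illustrative choice).
    dist = levenshtein(tok1, tok2)
    sim = 1 - dist / max(len(tok1), len(tok2))
    return sim >= threshold

print(levenshtein("kitten", "sitting"))                   # 3
print(same_group("request-timeout", "request-timeouts"))  # True
```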

