Overview
A processor lets you specify the analytical, statistical, and machine learning-based algorithms to apply to your incoming data. A processor's configuration populates the Anomalies page and the Insights page.
Note
To learn more about Anomalies, see Anomalies.
To learn more about Insights, see Insights.
Review Supported Processor Types
The Edge Delta App supports the following processor types:
Processor Type | Description |
Anomaly | This processor combines metrics from multiple collocated agents, such as agents running on containers or servers in the same data center. It is used in aggregator agent mode to follow trends and detect anomalies that take place on local clusters. To learn more, see Anomaly Processors. |
Cluster | This processor finds patterns in logs, and then groups (or clusters) these patterns based on similarities. To learn more, see Cluster Processors. |
Dimension Counter (Regexes) | To learn more, see Dimension Counter (Regexes) Processors. |
Dimension Numeric Capture (Regexes) | To learn more, see Dimension Numeric Capture (Regexes) Processors. |
Numeric Capture (Regexes) | To learn more, see Numeric Capture (Regexes) Processors. |
Ratio | This processor takes one successful regex pattern and one failed regex pattern to calculate a success ratio. To learn more, see Ratio Processors. |
Simple Keyword Match (Regexes) | To learn more, see Simple Keyword Match (Regexes) Processors. |
Top-K | This processor monitors the top K records, such as K=10, where each record is identified by one or more named regex group values combined together. To learn more, see Top-K Processors. |
Trace | This processor is useful for tracking events that have a unique ID and clear start and end logs. To learn more, see Trace Processors. |
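To make the Top-K description more concrete, the following is a minimal Python sketch of the general idea: count records keyed by the combined values of named regex groups, then keep the K most frequent. It is not Edge Delta's implementation or configuration syntax. The pattern, the group names `method` and `status`, and the function `top_k` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: the named groups "method" and "status" identify a record.
PATTERN = re.compile(r"(?P<method>GET|POST|PUT|DELETE) \S+ (?P<status>\d{3})")


def top_k(lines, k):
    """Return the k most frequent (method, status) combinations."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Combine the named group values into a single record key.
            counts[(match["method"], match["status"])] += 1
    return counts.most_common(k)


lines = [
    "GET /a 200", "GET /b 200", "POST /a 500",
    "GET /c 404", "POST /b 500", "GET /d 200",
]
print(top_k(lines, 2))
```

Lines that do not match the pattern are ignored, so only records with all named groups present contribute to the counts.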
Create and Manage a Processor
At a high level, there are 2 ways to manage Processors:
- If you need to create a new configuration, then you can use the visual editor to populate a YAML file, as well as make changes directly in the YAML file.
- If you already have an existing configuration, then you can update the configuration in the YAML file.
To access the visual editor for a new configuration:
- In the Edge Delta App, on the left-side navigation, click Data Pipeline, and then click Agent Settings.
- Click Create Configuration.
- Click Visual.
- On the right side, select Processors.
- Review the list of options.
  - To review available processor types and corresponding parameters, see Review Supported Processor Types.
To access the YAML file for an existing configuration:
- In the Edge Delta App, on the left-side navigation, click Data Pipeline, and then click Agent Settings.
- Locate the desired configuration, then under Actions, click the vertical ellipsis, and then click Edit.
- Review the YAML file.
  - To review available processor types and corresponding parameters, see Review Supported Processor Types.
Learn about Clustered Invariants
You can use this section to learn how Edge Delta calculates similarities between invariants for clustering purposes.
When a new log passes through the pipeline:
- Variants are identified via a proprietary Ragel FSM-based tokenization process.
- The identified variants are stripped from the log and replaced with wildcards.
- The remaining invariant components are compared to existing pattern sets to calculate similarities.
- Based on these similarities, the invariants are transformed and clustered into structured log messages.
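The variant-stripping step can be illustrated with a short Python sketch. Edge Delta's tokenizer is proprietary and Ragel FSM-based, so this example swaps in plain regular expressions as a stand-in. The variant shapes it recognizes (IPv4 addresses, hexadecimal values, and numbers) and the name `mask_variants` are assumptions for illustration only.

```python
import re

# Assumed variant shapes; the real tokenizer is proprietary and not shown here.
VARIANT = re.compile(
    r"\b\d{1,3}(?:\.\d{1,3}){3}\b"  # IPv4 addresses
    r"|\b0x[0-9a-fA-F]+\b"          # hexadecimal values
    r"|\b\d+(?:\.\d+)?\b"           # integers and decimals
)


def mask_variants(line: str) -> str:
    """Replace each recognized variant with a wildcard, keeping the invariant text."""
    return VARIANT.sub("*", line)


print(mask_variants("Connected to 10.0.0.5 port 8080 in 35 ms"))
# Connected to * port * in * ms
```

After masking, two logs that differ only in their variants produce the same invariant string, which is what makes the later similarity comparison meaningful.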
There are 2 ways to calculate similarities:
- Drain
- Levenshtein distance
Learn about Drain
Drain is the default log parsing algorithm used to cluster logs. This algorithm is based on a parse tree with a fixed depth, which guides the log group search and helps to avoid a deep and unbalanced tree.
When a new raw log message arrives, Edge Delta processes the message with the Ragel FSM-based tokenization process. Then, Edge Delta searches for a log group through the nodes of the tree, based on the token prefix.
If a suitable log group is found, then Edge Delta also calculates the similarity between the log message and the log event stored in the log group. If the similarity rate is above a certain threshold, then the log message is matched with the log event stored in that log group. If not, a new log group is created based on the log message.
To accelerate this process, Edge Delta uses a parse tree with a fixed depth, and nodes with fixed children, to guide the log group search. This limits the number of log groups that a raw log message needs to be compared to.
Because Edge Delta uses a Drain parse tree to cluster logs based on a common prefix, Edge Delta can easily merge clusters by using their ancestors in the tree. The merge level determines how many levels up the tree Edge Delta will go.
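The search-and-match flow described above can be sketched in Python as follows. This is a simplified illustration of a Drain-style fixed-depth tree, not Edge Delta's implementation. It assumes whitespace tokenization of already-masked messages, a first tree level keyed by token count, a similarity defined as the fraction of matching token positions, and hypothetical parameters `prefix_len` and `threshold`. Cluster merging by merge level is omitted.

```python
from dataclasses import dataclass, field

WILDCARD = "*"


@dataclass
class LogGroup:
    template: list
    count: int = 1


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    groups: list = field(default_factory=list)


class DrainSketch:
    def __init__(self, prefix_len=1, threshold=0.5):
        self.root = Node()
        self.prefix_len = prefix_len  # leading tokens used as tree levels (assumed)
        self.threshold = threshold    # minimum similarity to join a group (assumed)

    def _leaf(self, tokens):
        # First level: token count. Next levels: the token prefix.
        node = self.root.children.setdefault(len(tokens), Node())
        for tok in tokens[: self.prefix_len]:
            node = node.children.setdefault(tok, Node())
        return node

    @staticmethod
    def _similarity(template, tokens):
        # Fraction of positions where the template and the message agree.
        same = sum(1 for a, b in zip(template, tokens) if a == b)
        return same / len(tokens) if tokens else 1.0

    def add(self, message):
        tokens = message.split()
        leaf = self._leaf(tokens)
        best, best_sim = None, 0.0
        for group in leaf.groups:  # only groups under this leaf are compared
            sim = self._similarity(group.template, tokens)
            if sim > best_sim:
                best, best_sim = group, sim
        if best is not None and best_sim >= self.threshold:
            # Positions that differ become wildcards in the stored template.
            best.template = [a if a == b else WILDCARD
                             for a, b in zip(best.template, tokens)]
            best.count += 1
            return best
        group = LogGroup(template=tokens)
        leaf.groups.append(group)
        return group


drain = DrainSketch()
drain.add("user alice logged in from *")
group = drain.add("user bob logged in from *")
print(" ".join(group.template), group.count)
# user * logged in from * 2
```

Because each message is routed by token count and prefix before any similarity is computed, it is compared only against the few groups stored under one leaf rather than against every known group.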
Learn about Levenshtein Distance
Levenshtein distance is a string metric that measures the difference between 2 sequences. The Levenshtein distance between 2 words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.
When a new raw log message arrives, Edge Delta processes the message with the Ragel FSM-based tokenization process.
Then, Edge Delta uses the Levenshtein distance algorithm to calculate similarities between tokens. If the similarity is above a certain threshold, then Edge Delta determines that these logs belong to the same log group.
The similarity calculation is based on the minimum number of operations required to make 2 tokens the same. If the required number of operations is below a certain threshold, then the 2 tokens are considered similar and the log message is placed in the same log group. Otherwise, a new log group is created based on the log message.
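For reference, the Levenshtein distance itself can be computed with the standard dynamic-programming approach, sketched below in Python. The `levenshtein` function follows the textbook definition. The helper `is_similar` and its `max_edits` threshold are hypothetical and only illustrate the thresholding idea; the actual threshold and how Edge Delta applies it to tokens are not specified here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def is_similar(a: str, b: str, max_edits: int = 2) -> bool:
    """Hypothetical check: tokens are similar if few edits separate them."""
    return levenshtein(a, b) <= max_edits


print(levenshtein("kitten", "sitting"))  # 3
print(is_similar("timeout", "timeouts"))  # True
```

Only two rows of the distance table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.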