diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 00000000..982afd5d
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,38 @@
+# Contributing
+
+We want everyone to feel that they can contribute to the Siembol Project. Whether you have an idea to share, an issue to report, or a pull request of the finest, most scintillating code ever, we want you to participate!
+
+In the sections below, you'll find some easy ways to connect with other Siembol developers as well as some useful informational links on our license and community policies.
+
+Looking forward to building Siembol with you!
+
+
+## GitHub
+
+The main project fork lives here on GitHub:
+
+* [https://github.com/G-Research/siembol](https://github.com/G-Research/siembol)
+
+If you spot a bug, then please raise an issue in our main GitHub project:
+
+* [https://github.com/G-Research/siembol/issues](https://github.com/G-Research/siembol/issues)
+
+Likewise, if you have developed a cool new feature or improvement, then send us a pull request! Please read the Siembol developer guide first:
+
+* [https://github.com/G-Research/siembol/blob/master/docs/introduction/how-tos/how_to_develop.md](https://github.com/G-Research/siembol/blob/master/docs/introduction/how-tos/how_to_develop.md)
+
+If you want to brainstorm a potential new feature, hop on over to our Gitter room, listed below.
+
+## Gitter
+
+Sometimes, it's good to hash things out in real time. We have a Gitter community where you can chat with other Siembol developers and get feedback and support (even if it's just moral support!).
+
+To join our Gitter community, use this link:
+
+* [https://gitter.im/siembolProject/community](https://gitter.im/siembolProject/community)
+
+## License
+
+Siembol is licensed under the Apache 2.0 license. You can find it published here:
+
+* [https://github.com/G-Research/siembol/blob/master/LICENSE](https://github.com/G-Research/siembol/blob/master/LICENSE)
\ No newline at end of file
diff --git a/docs/User Guide/Siembol_Alert_User_Guide.md b/docs/User Guide/Siembol_Alert_User_Guide.md
deleted file mode 100644
index c836bb6b..00000000
--- a/docs/User Guide/Siembol_Alert_User_Guide.md
+++ /dev/null
@@ -1,123 +0,0 @@
-# Siembol Alert User Guide
-## Overview
-Siembol Alert is a detection engine used to filter matching events from an incoming data stream based on a configurable rule set. Rules can be built, modified, and tested through the Siembol UI.
-
-Rules are JSON configs generated by the UI. The raw JSON can be viewed through the UI as necessary or can be seen in the store repo for Siembol Alert.
-
-## Accessing the UI
-From the Siembol home page, click the alert manager button to load the alert manager interface:
-
-![alert manager button](images/alert/alert_manager_button.PNG)
-
-## Alert Manager UI
-The Alert Manager UI is split into two sections. On the left-hand side you can see all the rules which are stored in Siembol Alert. On the right-hand side you can see all the rules which are deployed. Rules which are deployed still show up in the store section.
-
-[//]: # (TODO add image of store/deploy)
-
-### Alert Rule Editor UI - Tabs
-There are 3 different tabs in the Alert Rule Editor UI:
-
-- Edit config: allows you to edit the rule
-- Test config: allows you to:
-  - Provide a raw event to test that your rule works as expected
-  - See the output from the event parsing through Siembol Alert
-  - (n.b. these tests are not persistent and just provide a point-in-time verification that the rule works for a specific event)
-- Test cases: allows you to write persistent tests to verify the rule works
-
-[//]: # (TODO show tabs alert rule editor UI)
-
-### Alert Rule Editor UI - Edit Config
-At the top there is a text input that allows you to provide a name for the rule. Use a descriptive name to make it easier to identify later in the pipeline.
-
-[//]: # (TODO add image for the config name)
-
-There are 5 different tabs within the edit config UI:
-
-1. [Rule Description](#rule-description)
-2. [Source Type](#source-type)
-3. [Matchers](#matchers)
-4. [Tags](#tags)
-5. [Rule Protection](#rule-protection)
-
-#### Rule Description
-This section contains a single text input that allows you to set a description for the alert. This should be a short, helpful comment that allows anyone to identify the purpose of this alert.
-
-[//]: # (TODO add image of rule description)
-
-#### Source Type
-This section allows you to determine the type of data you want to match on. It is essentially a matcher for the "source_type" field. This field does not support regex - however, using * as an input matches all source types.
-
-The source_type field is set during parsing and is equal to the name of the parser config which was used to parse the event.
-
-[//]: # (TODO add image of sourcetype)
-
-```
-Tip: if you want to match on multiple data sources, set the source type to be * and add a regex matcher (in the matcher section) to filter down to your desired source types.
-```
-
-#### Matchers
-Matchers allow you to select the events you want the rule to alert on.
-
-To add a matcher, click the "Add to Matchers" button:
-
-[//]: # (TODO add image of add to matchers button)
-
-For guidance on how to use matchers, see the [General Guide](Siembol_General_Guide.md#matchers).
-
-[//]: # (TODO add image of is_in_set matcher)
-
-#### Tags
-Tags are optional but recommended as they allow you to categorise your rules.
-
-To add a tag, click the "Add to Tags" button:
-
-[//]: # (TODO add image of "add to tags button")
-
-Each tag is a key-value pair. Both the key and the value inputs are completely freeform, allowing you to tag your rules in the way which works best for your organisation.
-
-You can use substitution in the value input to set the tag value equal to the value of a field from the event. The syntax for this is `${field_name}`, e.g.:
-
-[//]: # (TODO add image of substitution in tag value)
-
-#### Rule Protection
-Rule Protection allows you to prevent a noisy alert from flooding the components downstream. You can set the maximum number of times an alert can fire per hour and per day. If either limit is exceeded, then any event that matches is filtered and not sent on to Siembol Response until the threshold is reset.
-
-Rule Protection is optional. If it is not configured for a rule, the rule will get the global defaults applied (global defaults are set during the deployment process - see below).
-
-[//]: # (TODO add image of rule protection page)
-
-### Alert Rule Editor UI - Test Config
-The use of the Test Config UI is covered in the [Test\_Config\_User\_Guide](#).
-
-[//]: # (TODO add link to test config guide)
-
-### Alert Rule Editor UI - Test Cases
-The use of the Test Cases UI is covered in the [Test\_Cases\_User\_Guide](#).
-
-[//]: # (TODO add link to test cases guide)
-
-### Saving a config
-You can save the config at any time (provided the config is valid) by clicking the submit button found in the edit config tab. This will commit it to the store.
-
-[//]: # (TODO add image of the submit button)
-
-## Deploying a config - Siembol Alert-specific options
-When you click the deploy button in the Alert Manager UI, you are presented with a box which allows you to set two things:
-
-1. [Global Tags](#global-tags)
-2. [Global Rule Protection](#global-rule-protection)
-
-[//]: # (TODO add image of deploy options)
-
-### Global Tags
-You can add tags in the same way as you would for an individual rule. Clicking the "Add to Tags" button allows you to set the key and value fields for the rules. This tag is applied to all rules which are being deployed.
-
-Adding global tags is optional.
-
-[//]: # (TODO add image of global tags)
-
-### Global Rule Protection
-Setting the global rule protection limits is mandatory. These protections will apply to any rule which doesn't have rule protection configured already. See the [Rule Protection](#rule-protection) section above for more details.
-
-Once both of these sections are complete, you can press "Validate" to ensure all of your rules are complete and free from syntax errors. Provided all validations pass, you can then press "Deploy" to create a pull request against the release repo.
diff --git a/docs/User Guide/Siembol_Correlation_Alert_User_Guide.md b/docs/User Guide/Siembol_Correlation_Alert_User_Guide.md
deleted file mode 100644
index 4e2cfcec..00000000
--- a/docs/User Guide/Siembol_Correlation_Alert_User_Guide.md
+++ /dev/null
@@ -1,67 +0,0 @@
-# Siembol Correlation Alert User Guide
-## Overview
-The correlation alert allows you to group several detections together before raising an alert. The primary use case for this is when you have a group of detections which individually shouldn't be alerted on (e.g. high-volume detections or detections with a high false positive rate); grouping several together gives you more reliable alerts.
-
-Rules are JSON configs generated by the UI. The raw JSON can be viewed through the UI as necessary or can be seen in the store repo for Siembol Alert.
-
-## Accessing the UI
-From the Siembol home page, click the correlation alert editor button to load the UI.
-
-[//]: # (Add picture of link to correlationalert)
-
-## Creating / Editing a rule
-### Correlation alert editor - tabs
-The correlation alert editor has 4 tabs:
-- Rule description: allows you to add a description for the rule
-- Correlation attributes: allows you to configure the correlation of detections
-- Tags: allows you to add metadata tags to the alert
-- Rule Protection: allows you to configure options to prevent alert flooding
-
-[//]: # (Add picture of tabs - as seen from rule description page)
-
-#### Rule description
-This tab simply allows you to enter a string description providing some context around the rule - e.g. what it does or which events it affects. This field is optional but recommended.
-
-[//]: # (Probably don't need a picture in here provided the one above is of the whole rule description page)
-
-#### Correlation attributes
-The correlation attributes tab allows you to configure which detections to correlate together.
-
-To add this configuration, click the "Add to Correlation Attributes" button.
-
-[//]: # (Add picture of add to correlation attributes button)
-
-This opens up the configuration options.
-
-The "Time Unit" field allows you to configure the time unit to use. This is a fixed option with the choices:
-- hours
-- minutes
-- seconds
-
-This is used in conjunction with the "Time Window" field to set the time window for the correlation. The "Time Window" field accepts an integer value. The requirements within this time window vary depending on the options you select - this is discussed further below.
-
-You can also configure how the time window is calculated using the "Time computation type" field. There are two values for this:
-- event_time: the time window is calculated using the timestamp field in the events; the timestamps need to be inside the time window for an alert to trigger
-- processing_time: the time window is calculated using the current time (when an alert is matched); the events need to be processed by the correlation alert component within the time window
-
-[//]: # (Add picture of time fields)
-
-The alerts threshold allows you to configure how many detections (you can specify which detections later) need to trigger in the time window for the alert to trigger. This field accepts an integer value; if it is left empty, then all detections need to trigger before an alert is created.
-
-To configure which detections to correlate on, click the "Add to Alerts" button.
-
-[//]: # (Add picture of add to alerts button)
-
-For each detection you have three fields:
-- Alert name: string, the name of the detection (as named in the alert component)
-- Threshold: integer, the number of times the detection has to trigger in the time window
-- Mandatory: checkbox, whether the detection has to trigger within the time window in order for the rule to match
-
-If the mandatory field is checked, the detection has to trigger before an alert is created - even if the alerts threshold has already been reached.
-
-If the threshold field is set to more than 1, then once that condition is fulfilled it only counts as 1 detection towards the total alerts threshold.
-
-#### Tags
-This is the same as the tags section in the [alert editor](Siembol_Alert_User_Guide.md#tags).
-#### Rule Protection
-This is the same as the rule protection section in the [alert editor](Siembol_Alert_User_Guide.md#rule-protection).
\ No newline at end of file
diff --git a/docs/User Guide/Siembol_Enrichment_User_Guide.md b/docs/User Guide/Siembol_Enrichment_User_Guide.md
deleted file mode 100644
index c809e17a..00000000
--- a/docs/User Guide/Siembol_Enrichment_User_Guide.md
+++ /dev/null
@@ -1,79 +0,0 @@
-# Siembol Enrichment User Guide
-## Overview
-Siembol Enrichment is an enrichment engine used to add useful data to events to assist in detection and investigations. As with the other components, enrichment rules can be created in the Siembol UI. Each rule is JSON and can be seen in the UI or in the enrichment store repo.
-
-The data that is used to enrich events is stored in JSON files in HDFS in the following format:
-```
-{
-  "key" :
-  {
-    "column1":"value",
-    "column2":"value2",
-    ...
-  }
-}
-```
-
-When creating a rule you can specify the table to use, the column to join on, and the column to add to the event.
-
-## Accessing the Enrichment Editor
-The enrichment editor can be accessed from the home page of the Siembol UI.
-
-[//]: # (Add photo of enrichment editor link on home page)
-
-## Enrichment Editor UI
-Similar to other components, the enrichment editor UI is split into two sections: the left-hand side is the store, while the right-hand side is the deployment section. A deployed rule will show up in both sections.
-
-For a more detailed insight into this UI, read the [Editor Page](Siembol_General_Guide.md#editor-page) section of the general guide.
-
-## Creating / Editing a rule
-### Enrichment rule editor - tabs
-There are four tabs:
-- Rule Description: allows you to provide a brief text description of what the rule does
-- Source Type: allows you to specify the source types to apply the enrichment on
-- Matchers: allows you to create matchers that define which events to enrich
-- Table Mappings: allows you to configure the enrichment to perform
-
-[//]: # (Add photo of tabs)
-
-#### Rule Description
-This tab simply allows you to enter a string description providing some context around the rule - e.g. what it does or which events it affects. This field is optional but recommended.
-
-[//]: # (Add photo of rule description tab)
-
-#### Source Type
-This tab allows you to specify the source type of events to apply the enrichment to. Essentially this is a literal string matcher for the source_type field of an event.
-
-```
-Tip: if you want to match multiple source types, select * in the source type tab and then add a matcher on the source_type field in the matchers tab to match only the source types you want.
-```
-
-#### Matchers
-Matchers allow you to further filter the events that the enrichment will be applied to.
-
-There are two types of matchers, as described in the [General Guide](Siembol_General_Guide.md#matchers).
-
-To add a new matcher, click the "Add to Matchers" button.
-
-[//]: # (Add picture of add to matchers button)
-
-#### Table Mapping
-The table mapping tab is where you configure the enrichment you want to perform.
-
-The "Table Name" field should be the name of the table which contains the data you want to enrich the event with.
-
-The "Joining Key" field should be the string used to join the event with the table (the key json field). This field supports substitution, e.g. `${field_name}` or `http://${host_field_name}/${path_field_name}`. This is used to filter the key field of the table.
-
-[//]: # (Add picture of table name and joining key fields)
-
-To add data from the table to the event, click the "Add to Enriching Fields" button. You will then have two fields to fill:
-- Table field name: the column in the enrichment table that you want to add
-- Event field name: the name you want the field to have in the event
-
-You can add as many enriching fields as you want.
-
-```
-Note: you can only enrich from one table per rule. If you want to enrich the same event from multiple tables, you need to create multiple rules.
-```
-
-[//]: # (Add picture of enriching field section)
\ No newline at end of file
diff --git a/docs/User Guide/Siembol_General_Guide.md b/docs/User Guide/Siembol_General_Guide.md
deleted file mode 100644
index 61e9a47f..00000000
--- a/docs/User Guide/Siembol_General_Guide.md
+++ /dev/null
@@ -1,115 +0,0 @@
-# Siembol General User Guide
-## Overview
-This document gives advice on how to use some of the common UI elements that are used in multiple Siembol components.
-
-## Siembol Home Page
-The Siembol home page provides access to the editors for each Siembol component.
-
-[//]: # (TODO add image of home page)
-
-Each component has its own block with 3 buttons:
-1. Editor: this button takes you to the config editor for that component
-2. Store repo: this takes you to the git repo for the config store
-3. Release repo: this takes you to the git repo for the released config for that component
-
-## Editor Page
-### Filtering and Searching
-
-There is a search bar at the top of the Editor page to allow you to filter through stored configs by name or tag.
-
-[//]: # (TODO add image of search bar/search results)
-
-There are also checkboxes for commonly used filters. These allow you to select any combination of:
-
-* rules you've edited
-* undeployed rules
-* rules which have an undeployed upgrade
-
-[//]: # (TODO add image of filtering checkboxes)
-
-### Config block
-Each config consists of a UI block containing:
-
-1. The version number
-2. The last author
-3. Config name
-4. Config description
-5. Config tags
-
-[//]: # (TODO add annotated image of a config block)
-
-Hovering over the right side of the box allows you to see 3 further options:
-
-1. Modify the config - this opens the config in the create config UI (discussed in the guide for each component) with the config details pre-populated
-2. View the config's raw json
-3. Move the config to the deployment section
-
-[//]: # (TODO add annotated image of config block ft hidden options)
-
-### Change history
-The change history for an individual config can be seen by hovering over its version number.
-
-[//]: # (TODO add image of individual config change history)
-
-The change history for the deployment config can be seen by hovering over the time icon in the top-right corner of the deployment section.
-
-[//]: # (TODO add image of deployment change history)
-
-
-### Creating a new config
-To the right-hand side of the filter checkboxes in the Editor UI there is a blue cross. Clicking this button allows you to create a new config and changes the view to the config editor mode.
-
-[//]: # (TODO add image of new parser button)
-
-### Deploying a config
-
-[//]: # (TODO move this section to the general guide)
-
-Once a config is in the store, it can be deployed from the Editor UI.
-
-### Deploying a config for the first time
-If a config only exists in the store, it can be added to the deployment section by clicking the deployment arrow on the right-hand side of the config block.
-
-Once a config is in the deployment section, it can be committed to the deployment repo by clicking the deploy button at the top of the deployment section.
-
-[//]: # (TODO add image of the deploy button)
-
-Rules are stored in individual config files in the store. When the deploy button is pressed, all rules in the deployment section are combined together to create one deployment config. Therefore, unless you want to un-deploy them, all configs need to remain in the deployment section.
-
-### Upgrading a config which is already deployed
-If you make changes to a config which is already deployed and commit them to the store, then an upgrade button will appear in the config block in the deployment section.
-
-To deploy your changes:
-
-1. click the upgrade button
-2. click the deploy button at the top of the deployment section
-
-[//]: # (TODO add image of the upgrade button)
-
-## Matchers
-In multiple components you have the option to use matchers to search and filter events.
-
-You can add as many matchers as you want.
-
-There are two types of matchers:
-
-##### 1) REGEX_MATCH matcher
-A regex_match matcher allows you to use a regex statement to match a specified field. There are two string inputs:
-- Field: the name of the field to compare to the regex statement
-- Data: the regex statement
-
-There is an "is_negated" checkbox - this means that if the regex statement doesn't match the value of the field, then the matcher will return true.
-
-Named capture groups in the regex are added as fields in the event. They are available from the next matcher onwards and are included in the output event.
-
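-For example, a hypothetical REGEX_MATCH matcher on a `url` field could use a named capture group to extract the host into a new `host_name` field for later matchers to use (the field names here are illustrative, and the raw JSON shape may differ slightly between components):
-
-```
-{
-  "matcher_type": "REGEX_MATCH",
-  "is_negated": false,
-  "field": "url",
-  "data": "^https?://(?<host_name>[^/]+)/"
-}
-```
-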
-Siembol uses Java regex. For guidance on how to write this, see the Java documentation here: [Java Regular Expressions](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html)
-
-[//]: # (TODO add image of regex matcher)
-
-##### 2) IS_IN_SET matcher
-An "is_in_set" matcher compares the value of a field to a set of strings; if the value is in the set, then the matcher returns true.
-There are also two string inputs for this type of matcher:
-- Field: the name of the field to compare with
-- Data: a list of strings to compare the value to. New line delimited. Does not support regex - each line must be a literal match; however, field substitution is supported in this field.
-
-The "is_negated" checkbox is the same as for the regex_match matcher.
-The "case_insensitive" checkbox means that the case of the strings in the data field is ignored.
\ No newline at end of file
diff --git a/docs/User Guide/TODO b/docs/User Guide/TODO
deleted file mode 100644
index febc6d46..00000000
--- a/docs/User Guide/TODO
+++ /dev/null
@@ -1,3 +0,0 @@
-# TODO Add images in to Siembol_Alert_User_Guide
-# TODO Move generic items out of the alert ug into a common ug
-# TODO Proof read alert ug
diff --git a/docs/User Guide/images/alert/alert_manager_button.PNG b/docs/User Guide/images/alert/alert_manager_button.PNG
deleted file mode 100644
index 993bb58e..00000000
Binary files a/docs/User Guide/images/alert/alert_manager_button.PNG and /dev/null differ
diff --git a/docs/deployment/deployment.md b/docs/deployment/deployment.md
new file mode 100644
index 00000000..d3c3ae6b
--- /dev/null
+++ b/docs/deployment/deployment.md
@@ -0,0 +1,6 @@
+# Deployment
+## Build artifacts
+## Infrastructure dependencies
+## Deployment scenarios
+## Helm charts
+
diff --git a/docs/deployment/how-tos/how_to_set_up_kerberos_for_external_dependencies.md b/docs/deployment/how-tos/how_to_set_up_kerberos_for_external_dependencies.md
new file mode 100644
index 00000000..e0fe9fde
--- /dev/null
+++ b/docs/deployment/how-tos/how_to_set_up_kerberos_for_external_dependencies.md
@@ -0,0 +1,4 @@
+# How to set up kerberos for external dependencies
+## Kafka
+## Zookeeper
+## Storm API
\ No newline at end of file
diff --git a/docs/deployment/how-tos/how_to_set_up_zookeper_nodes.md b/docs/deployment/how-tos/how_to_set_up_zookeper_nodes.md
new file mode 100644
index 00000000..8211c084
--- /dev/null
+++ b/docs/deployment/how-tos/how_to_set_up_zookeper_nodes.md
@@ -0,0 +1,5 @@
+# How to set up zookeeper nodes
+## Zookeeper nodes for configuration deployments
+### Admin configuration settings
+### Config editor rest application properties
+## Zookeeper nodes for storm topology manager
diff --git a/docs/deployment/how-tos/how_to_setup_github_webhook.md b/docs/deployment/how-tos/how_to_setup_github_webhook.md
new file mode 100644
index 00000000..15c42313
--- /dev/null
+++ b/docs/deployment/how-tos/how_to_setup_github_webhook.md
@@ -0,0 +1,35 @@
+# How to set up a github webhook
+The synchronisation of service configurations, which are stored in git repositories, with zookeeper nodes is implemented in the siembol config editor rest service.
+## Siembol config editor rest endpoint for webhooks
+Find the hostname of siembol config editor rest and prepare the url.
+### Url parameters
+- serviceNames - Comma-separated list of service names, or ```all``` if the hook is for all services
+- syncType - The type of synchronisation that should be triggered by the hook
+  - one of ```ALL```, ```RELEASE```, ```ADMIN_CONFIG```
+
+```
+Example of url:
+https://config-editor/api/v1/sync/webhook?serviceNames=alert&syncType=ALL
+```
+## Setting a webhook in a github repository
+For a git repository you should identify:
+- the services whose configurations are stored in the git repository
+- the type of configurations that are stored in the git repository
+### Prepare and test the url
+You can prepare the url using the example above or via swagger.
+```
+Example of swagger url:
+https://config-editor-rest/swagger-ui.html
+```
+Ensure that the prepared url is accessible from the github server.
+### Set a webhook url in github
+To set up a webhook, go to the settings page of your repository or organization in github. From there, click Webhooks, then Add webhook, and register the url with the push event.
+### Set the content type
+Set the content type as ```application/json```
+### Set the github secret (optional)
+Setting a webhook secret allows you to ensure that POST requests sent to siembol are from GitHub.
+#### Set the secret for verification in config editor rest application properties
+The webhook signature is verified only if the secret is set in the application properties of config editor rest. Otherwise, this check is skipped.
+```
+config-editor.gitWebhookSecret=your secret provided in github
+```
\ No newline at end of file
diff --git a/docs/deployment/how-tos/how_to_tune_performance_of_storm_topologies.md b/docs/deployment/how-tos/how_to_tune_performance_of_storm_topologies.md
new file mode 100644
index 00000000..41d26e4b
--- /dev/null
+++ b/docs/deployment/how-tos/how_to_tune_performance_of_storm_topologies.md
@@ -0,0 +1,5 @@
+# How to tune the performance of storm topologies
+### Parallelism
+### Kafka spout properties
+### Storm configuration settings
+### Kafka batch writer properties
\ No newline at end of file
diff --git a/docs/introduction/how-tos/how_to_contribute.md b/docs/introduction/how-tos/how_to_contribute.md
new file mode 100644
index 00000000..131a98ee
--- /dev/null
+++ b/docs/introduction/how-tos/how_to_contribute.md
@@ -0,0 +1,9 @@
+# How to contribute
+## General
+## Issues
+## How to contribute to siembol java project
+### How to compile
+### Naming conventions
+## How to contribute to config editor UI project
+### How to compile
+
diff --git a/docs/introduction/how-tos/how_to_try_siembol.md b/docs/introduction/how-tos/how_to_try_siembol.md
new file mode 100644
index 00000000..941e0ba3
--- /dev/null
+++ b/docs/introduction/how-tos/how_to_try_siembol.md
@@ -0,0 +1,8 @@
+# How to try siembol
+## Requirements
+## Prepare github repository
+## Deploy dependencies
+## Deploy siembol ui
+## Deploy testing parsing application
+## Create testing alerting rule
+## Test matching alert
diff --git a/docs/introduction/images/architecture.svg b/docs/introduction/images/architecture.svg
new file mode 100644
index 00000000..2069f48e
--- /dev/null
+++ b/docs/introduction/images/architecture.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/introduction/images/pipelines.svg b/docs/introduction/images/pipelines.svg
new file mode 100644
index 00000000..af794d8f
--- /dev/null
+++ b/docs/introduction/images/pipelines.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/introduction/introduction.md b/docs/introduction/introduction.md
new file mode 100644
index 00000000..d56558f9
--- /dev/null
+++ b/docs/introduction/introduction.md
@@ -0,0 +1,56 @@
+# Siembol
+Siembol provides a scalable, advanced security analytics framework based on open-source big data technologies. Siembol normalizes, enriches, and alerts on data from various sources, which allows security teams to respond to attacks before they become incidents.
+## History
+Siembol is an in-house developed security data processing application, forming the core of the GR Security Data Platform. Following our experience of using Splunk and Apache Metron, it was clear that we needed a highly efficient, real-time event processing engine with the features that mattered to GR. We were early adopters of Apache Metron and recognised its limits and missing features, which we aimed to implement in siembol.
+## How Siembol improves upon Metron
+### Components for alert escalation
+- Security teams can easily create a rule-based alert from a single data source, or they can create advanced correlation rules that combine various data sources
+- We are planning to prepare a tool for translating the Sigma rule specification (a generic and open signature format for SIEM alerting, [https://github.com/SigmaHQ/sigma](https://github.com/SigmaHQ/sigma)) into the siembol alerting rule engine
+### Component for integration with other systems – siembol response
+- An easy way to integrate siembol with other systems such as Jira, The Hive, Cortex, ELK and LDAP
+- Functionality to provide additional enrichments about an alert, such as ELK searches or LDAP searches, with the possibility to filter the alert as part of an automatic incident response
+- A plugin interface that allows you to implement custom integrations with other systems used in incident response
+- We are planning to publish a collection of plugins that we are using internally at GR and provide space for collecting plugins from the siembol community
+### Advanced parsing framework for building fault-tolerant parsers
+- An originally designed framework for normalising logs (parsing), including chaining of extractors and transformations, that allows you to
+  - extract json, csv structures, key value pairs and timestamps
+  - parse timestamps using standard formatters to an epoch form
+  - transform messages by renaming fields, filtering fields, or even filtering the whole message
+- Supporting use cases for advanced log ingestion using multiple parsers and routing logic
+- Supporting a generic text parser, syslog, BSD syslog and a Netflow v9 binary parser
+### Advanced enrichment component
+- Defining rules for selecting enrichment logic, joining enrichment tables and defining how to enrich the processed log
+
+### Configurations and rules are defined in a web application, siembol ui
+- All configurations are stored in json format and edited via web forms in order to avoid mistakes and speed up creation and learning time
+- Configurations are stored in git repositories
+- Supporting high-integrity use cases with protected github main branches for deploying configurations
+- Supporting validation and testing of configurations. Moreover, siembol ui supports creating and evaluating test cases
+- Siembol prefers a declarative json language rather than a script language like Stellar. We consider a declarative language with testing and validation less error-prone and simpler to understand
+- Supporting oauth2/oidc for authentication and authorisation in siembol ui
+- All siembol services can have multiple instances with authorisation based on oidc group membership. This allows multitenant usage without the need to deploy multiple instances of siembol
+- We are planning to test and tune oauth2/oidc integration with popular identity providers
+### Easy installation to try it with prepared docker images and helm charts
+- Siembol supports deployment on an external hadoop cluster to ensure high performance, which is how we use it at GR. However, we provide k8s helm charts for all deployment dependencies in order to try siembol in a development environment.
+## Use-Cases
+### SIEM log collection using open source technologies
+- Siembol can be used for centralised collecting and monitoring of security logs from different sources. The format of logs is usually not under our complete control, since we need to collect and inspect logs from third-party tools. It is therefore important for a SIEM to support normalisation of logs into a standardised format with common fields such as a timestamp. It is often useful to enrich a log with metadata provided by a CMDB or other internal systems that are important for building detections. For example, data repositories can be enriched with a data classification, network devices with a network zone, usernames with an active directory group, etc. The CSIRT team is using siembol for building detections on top of normalised logs using the siembol alerting services. Alerts triggered by the detections are integrated into the incident response defined and evaluated by the siembol response service. This allows integration of siembol with systems such as Jira, The Hive and Cortex, and provides additional enrichments by searching ELK or doing LDAP queries. TODO: provide basic stats about siembol at GR
+### Detection tool for detecting leaks and attacks on infrastructure
+- Siembol can be used as a tool for detecting attacks or leaks by teams responsible for a system platform. The Big Data team at GR is using siembol for detecting leaks and attacks on the Hadoop platform. These detections are then used as another data source in the siembol SIEM log collection for the CSIRT team, which handles these incidents.
+## High Level Architecture
+### Data Pipelines
+![pipelines](images/pipelines.svg)
+### Services
+- Parsing - normalising logs into messages with one layer of key/value pairs
+- Enrichment - adding useful data to events to assist in detection and investigations
+- Alerting - filtering matching events from an incoming stream of events based on a configurable rule set. The correlation alerting allows you to group several detections together before raising an alert
+- Response - integrating siembol with other systems as part of an automatic incident response
+### Infrastructure dependencies
+- Kafka - message broker for data pipelines
+- Storm - stream processing framework for services except siembol response, which is integrated using kafka streaming
+- GitHub - store for service configurations used in siembol ui
+- Zookeeper - synchronisation cache for updating service configurations from git to services
+- k8s cluster - environment to deploy siembol ui and related microservices for management and orchestration of siembol service configurations
+- Identity provider - an identity provider (oauth2/oidc) used for siembol ui. It allows the use of oidc groups for managing authorisation to services
+### Architecture
+![architecture](images/architecture.svg)
\ No newline at end of file
diff --git a/docs/services/how-tos/how_to_set_up_enrichment_table.md b/docs/services/how-tos/how_to_set_up_enrichment_table.md
new file mode 100644
index 00000000..dbe44d7f
--- /dev/null
+++ b/docs/services/how-tos/how_to_set_up_enrichment_table.md
@@ -0,0 +1,4 @@
+# How to set up an enrichment table
+## The structure of an enrichment table
+## The structure of a zookeeper update message
+
diff --git a/docs/services/how-tos/how_to_set_up_service_in_config_editor_rest.md b/docs/services/how-tos/how_to_set_up_service_in_config_editor_rest.md
new file mode 100644
index 00000000..4bb189d1
--- /dev/null
+++ b/docs/services/how-tos/how_to_set_up_service_in_config_editor_rest.md
@@ -0,0 +1,11 @@
+# How to set up a service in config editor rest
+## Application properties
+### Git repositories settings
+### Ui layout file name
+### Synchronisation settings
+#### Synchronisation type
+#### Release zookeeper settings for service deployment
+#### Topology image for topology deployment
+### Authorisation
+#### Authorisation for service users
+#### Authorisation for service administrators
diff --git a/docs/services/how-tos/how_to_setup_netflow_v9_parsing.md b/docs/services/how-tos/how_to_setup_netflow_v9_parsing.md
new file mode 100644
index 00000000..70b71cc1
--- /dev/null
+++ b/docs/services/how-tos/how_to_setup_netflow_v9_parsing.md
@@ -0,0 +1,7 @@
+# How to set up netflow v9 parsing
+## Collect netflow v9 from network devices
+## Limitations of the netflow v9 protocol
+## Use a key to identify a device source in a kafka message
+## Create a parser configuration
+## Create a parsing application
+## Deploy the parsing application
\ No newline at end of file
diff --git a/docs/services/how-tos/how_to_write_response_plugin.md b/docs/services/how-tos/how_to_write_response_plugin.md
new file mode 100644
index 00000000..f87ebf67
--- /dev/null
+++ b/docs/services/how-tos/how_to_write_response_plugin.md
@@ -0,0 +1,7 @@
+# How to write a response plugin
+## Prepare a maven project with siembol dependencies
+## Implement responding evaluators
+### Implement the RespondingEvaluatorFactory interface
+### Implement the ResponsePlugin interface
+## Build the package
+## Copy the package with application properties into siembol response
\ No newline at end of file
diff --git a/docs/services/images/parser_flow.svg b/docs/services/images/parser_flow.svg
new file mode 100644
index 00000000..455c913f
--- /dev/null
+++ b/docs/services/images/parser_flow.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/services/images/response_evaluation.svg b/docs/services/images/response_evaluation.svg
new file mode 100644
index 00000000..b100b6bc
--- /dev/null
+++ b/docs/services/images/response_evaluation.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/services/images/router_parsing.svg b/docs/services/images/router_parsing.svg
new file mode 100644
index 00000000..7d119ac8
--- /dev/null
+++ b/docs/services/images/router_parsing.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/services/siembol_alerting_services.md b/docs/services/siembol_alerting_services.md
new file mode 100644
index 00000000..51bc5714
--- /dev/null
+++ b/docs/services/siembol_alerting_services.md
@@ -0,0 +1,88 @@
+# Siembol Alerting Services
+## Overview
+Siembol alert is a detection engine used to filter matching events from an incoming data stream based on a configurable rule set. The correlation alert allows you to group several detections together before raising an alert.
+## Alert service
+### Common rule fields
+The fields that are common to alert and correlation alert rules:
+- `rule_name` - Rule name that uniquely identifies the rule
+- `rule_author` - The author of the rule - the user who last modified the rule
+- `rule_version` - The version of the rule
+- `rule_description` - This field contains a single text input that allows you to set a description for the alert. This should be a short, helpful comment that allows anyone to identify the purpose of this alert.
+- `tags` - Tags are optional but recommended as they allow you to add tags to the event after matching the rule. Each tag is a key-value pair. Both the key and the value inputs are completely free form, allowing you to tag your rules in the way which works best for your organisation. You can use substitution in the value input to set the tag value equal to the value of a field from the event. The syntax for this is `${field_name}`
+  - `tag_name` - The name of the tag
+  - `tag_value` - The value of the tag
+
+```
+Note: if you want to correlate an alert in the correlation engine, then use the tag with the name "correlation_key". This alert will be silent if you do not set the tag with the name "correlation_alert_visible"
+```
+- `rule_protection` - Rule Protection allows you to prevent a noisy alert from flooding the components downstream. You can set the maximum number of times an alert can fire per hour and per day. If either limit is exceeded, then any event that matches is sent to the error topic instead of the output topic until the threshold is reset. Rule Protection is optional. If it is not configured for a rule, the rule will get the global defaults applied.
+  - `max_per_hour` - Maximum alerts allowed per hour
+  - `max_per_day` - Maximum alerts allowed per day
+
+### Alert rule
+- `source_type` - This field allows you to determine the type of data you want to match on. It is essentially a matcher for the "source_type" field. This field does not support regex - however, using `*` as an input matches all source types. The `source_type` field is set during parsing and is equal to the name of the last parser which was used to parse the log.
+
+```
+Tip: if you want to match on multiple data sources, set the source type to be * and add a regex matcher (in the matcher section) to filter down to your desired source types.
+```
+
+#### Matchers
+Matchers allow you to select the events you want the rule to alert on.
+- `matcher_type` - Type of matcher, either `REGEX_MATCH` or `IS_IN_SET`
+- `is_negated` - The matcher is negated
+- `field` - The name of the field on which the matcher will be evaluated
+
+There are two types of matchers:
+- `REGEX_MATCH` - A regex_match matcher allows you to use a regex statement to match a specified field:
+  - `data` - The regex statement in Java syntax, see [https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html), except that underscores are also allowed in the names of captured groups. Named capture groups in the regex are added as fields in the event. They are available from the next matcher onwards and are included in the output event.
+- `IS_IN_SET` - An "is_in_set" matcher compares the value of a field to a set of strings defined in `data`. If the value is in the set, then the matcher returns true.
+  - `data` - A list of strings to compare the value to. New line delimited. Does not support regex - each line must be a literal match; however, field substitution is supported in this field.
+### Global Tags and Rule Protection
+Global tags and global rule protection are defined in the deployment of the rules. These are added to the alert after matching unless they are overridden by individual rule settings. The global tag with the name `detection_source` is used to identify the detection engine that triggers the alert.
+## Correlation Rule
+### Overview
+The correlation alert allows you to group several detections together before raising an alert. The primary use case for this is when you have a group of detections which individually shouldn't be alerted on (e.g. high-volume detections or detections with a high false positive rate); grouping several together gives you more reliable alerts.
+### Correlation alert rule
+The `correlation_attributes` field allows you to configure which detections to correlate together.
+  - `time_unit` - A field that allows you to configure the time unit to use; this is a fixed option with the choices:
+    - `hours`
+    - `minutes`
+    - `seconds`
+  - `time_window` - A field to set the time window in the selected time unit for the correlation
+  - `time_computation_type` - Configures how the time window is calculated
+    - `event_time` - The time window is calculated using the `timestamp` field in the events; the `timestamp` field is usually computed during parsing from the log
+    - `processing_time` - The time window is calculated using the current time (when an alert is evaluated); the events need to be processed by the correlation alert component within the time window
+  - `max_time_lag_in_sec` - An event with a timestamp older than the current time minus the lag (in seconds) will be discarded
+  - `alerts_threshold` - The alerts threshold allows you to configure how many detections (you can specify which detections later) need to trigger in the time window for the alert to trigger. This field accepts an integer value; if it is left empty, then all detections need to trigger before an alert is created
+  - `alerts` - The list of alerts for correlation
+    - `alert` - The alert name used for correlation
+    - `threshold` - The number of times the alert has to trigger in the time window
+    - `mandatory` - The alert must pass the threshold for the rule to match
+## Admin config
+### Common admin config fields
+- `alerts.topology.name` - The name of the storm topology
+- `alerts.input.topics` - The list of kafka input topics for reading messages
+- `kafka.error.topic` - The kafka error topic for error messages
+- `alerts.output.topic` - The kafka output topic for publishing alerts
+- `alerts.correlation.output.topic` - The kafka topic for alerts used for correlation by correlation rules
+- `kafka.producer.properties` - Defines kafka producer properties, see [https://kafka.apache.org/0102/documentation.html#producerconfigs](https://kafka.apache.org/0102/documentation.html#producerconfigs)
+- `zookeeper.attributes` - The zookeeper attributes for updating the rules
+  - `zk.url` - Zookeeper servers url. Multiple servers are separated by commas
+  - `zk.path` - Path to a zookeeper node
+- `storm.attributes` - Storm attributes for the alerting topology
+  - `bootstrap.servers` - Kafka brokers url. Multiple servers are separated by commas
+  - `first.pool.offset.strategy` - Defines how the kafka spout seeks the offset to be used in the first poll to kafka
+  - `kafka.spout.properties` - Defines kafka consumer attributes for the kafka spout such as `group.id`, `protocol`, see [https://kafka.apache.org/0102/documentation.html#consumerconfigs](https://kafka.apache.org/0102/documentation.html#consumerconfigs)
+  - `poll.timeout.ms` - Kafka consumer parameter `poll.timeout.ms` used in the kafka spout
+  - `offset.commit.period.ms` - Specifies the period of time (in milliseconds) after which the spout commits to Kafka, see [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/storm-moving-data/content/tuning_kafkaspout_performance.html](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/storm-moving-data/content/tuning_kafkaspout_performance.html)
+  - `max.uncommitted.offsets` - Defines the maximum number of polled offsets (records) that can be pending commit before another poll can take place
+  - `storm.config` - Defines storm attributes for a topology, see [https://storm.apache.org/releases/current/Configuration.html](https://storm.apache.org/releases/current/Configuration.html)
+- `kafka.spout.num.executors` - The number of executors for reading from the kafka input topic
+- `alerts.engine.bolt.num.executors` - The number of executors for evaluating alerting rules
+- `kafka.writer.bolt.num.executors` - The number of executors for producing alerts to the output topic
+### Alert admin config
+- `alerts.engine` - This field should be set to `siembol_alerts`
+### Correlation alert admin config
+- `alerts.engine` - This field should be set to `siembol_correlation_alerts`
+- `alerts.engine.clean.interval.sec` - The period in seconds for regular cleaning of rule correlation data that are no longer needed for further rule evaluation
diff --git a/docs/services/siembol_enrichment_service.md b/docs/services/siembol_enrichment_service.md
new file mode 100644
index 00000000..98f66683
--- /dev/null
+++ b/docs/services/siembol_enrichment_service.md
@@ -0,0 +1,83 @@
+# Siembol Enrichment Service
+## Overview
+Siembol Enrichment is an enrichment engine used to add useful data to events to assist in detection and investigations.
+
+The data that is used to enrich events is stored in JSON files in a file store in the following format:
+```
+{
+  "key" :
+  {
+    "column1":"value",
+    "column2":"value2",
+    ...
+  }
+}
+```
+
+When creating a rule you can specify the table to use, the column to join on, and the column to add to the event.
+
+### Enrichment rule
+- `rule_name` - Rule name that uniquely identifies the rule
+- `rule_author` - The author of the rule, i.e., the user who last modified the rule
+- `rule_version` - The version of the rule
+- `rule_description` - This field contains a single text input that allows you to set a description for the rule. This should be a short, helpful comment that allows anyone to identify the purpose of this rule
+- `source_type` - This field allows you to determine the type of data you want to match on. It is essentially a matcher for the `source_type` field. This field does not support regex - however, using `*` as an input matches all source types. The source_type field is set during parsing and is equal to the name of the last parser which was used to parse the log
+- `matchers` - Matchers allow you to further filter the events that the enrichment will be applied to
+- `table_mapping` - Mappings for enriching events
+
+#### Matchers
+Matchers allow you to further filter the events that the enrichment will be applied to. You can add as many matchers as you want.
+- `matcher_type` - Type of matcher, either `REGEX_MATCH` or `IS_IN_SET`
+- `is_negated` - The matcher is negated
+- `field` - The name of the field on which the matcher will be evaluated
+
+There are two types of matchers:
+- `REGEX_MATCH` - A regex_match matcher allows you to use a regex statement to match a specified field:
+  - `data` - The regex statement in Java syntax, see [https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html), except that underscores are also allowed in the names of captured groups. Named capture groups in the regex are added as fields in the event. They are available from the next matcher onwards and are included in the output event
+
+- `IS_IN_SET` - It compares the value of a field to a set of strings defined in `data`. If the value is in the set, then the matcher returns true.
+  - `data` - A list of strings to compare the value to. New line delimited. Does not support regex - each line must be a literal match; however, field substitution is supported in this field
+
+#### Table Mapping
+The table mapping section is where you configure the enrichment you want to perform.
+
+- `table_name` - The name of the table which contains the data you want to enrich the event with
+- `joining_key` - The string used to join the event with the table (the key json field). This field supports substitution, e.g. `${field_name}` or `http://${host_field_name}/${path_field_name}`. This is used to filter the key field of the table
+- `tags` - Tags are added into the event after successfully joining the table with the joining key. You can add as many tags as you want
+  - `tag_name` - The name of the tag
+  - `tag_value` - The value of the tag
+
+- `enriching_fields` - Fields from the enriching table that are added after successfully joining the table with the joining key. You can add as many enriching fields as you want
+  - `table_field_name` - The column in the enrichment table that you want to add
+  - `event_field_name` - The name you want the field to have in the event after enriching
+```
+Note: you can only enrich from one table per rule. If you want to enrich the same event from multiple tables, you need to create multiple rules.
+```
+## Admin config
+- `topology.name` - The name of the storm topology
+- `kafka.spout.num.executors` - The number of executors for the kafka spout
+- `enriching.engine.bolt.num.executors` - The number of executors for the enriching rule engine
+- `memory.enriching.bolt.num.executors` - The number of executors for memory enrichments from tables
+- `merging.bolt.num.executors` - The number of executors for merging enriched fields
+- `kafka.writer.bolt.num.executors` - The number of executors for producing output messages
+- `enriching.rules.zookeeper.attributes` - The zookeeper attributes for updating enrichment rules
+  - `zk.url` - Zookeeper servers url. Multiple servers are separated by commas
Multiple servers are separated by comma + - `zk.path` - Path to a zookeeper node +- `enriching.tables.zookeeper.attributes` - The zookeeper attributes for notifying the update of enrichment tables + - `zk.url` - Zookeeper servers url. Multiple servers are separated by comma + - `zk.path` - Path to a zookeeper node +- `kafka.batch.writer.attributes` - Kafka batch writer attributes for producing output messages + - `batch.size` - The max size of batch used for producing messages + - `producer.properties` - Defines kafka producer properties, see [https://kafka.apache.org/0102/documentation.html#producerconfigs](https://kafka.apache.org/0102/documentation.html#producerconfigs) +- `storm.attributes` - Storm attributes for the enrichment topology +- `bootstrap.servers` - Kafka brokers servers url. Multiple servers are separated by comma + - `first.pool.offset.strategy` - Defines how the kafka spout seeks the offset to be used in the first poll to kafka + - `kafka.spout.properties` - Defines kafka consumer attributes for kafka spout such as `group.id`, `protocol`, see [https://kafka.apache.org/0102/documentation.html#consumerconfigs](https://kafka.apache.org/0102/documentation.html#consumerconfigs) + - `poll.timeout.ms`- Kafka consumer parameter `poll.timeout.ms` used in kafka spout + - `offset.commit.period.ms` - Specifies the period of time (in milliseconds) after which the spout commits to Kafka, see [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/storm-moving-data/content/tuning_kafkaspout_performance.html](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/storm-moving-data/content/tuning_kafkaspout_performance.html) + - `max.uncommitted.offsets`- Defines the maximum number of polled offsets (records) that can be pending commit before another poll can take place + - `storm.config` - Defines storm attributes for a topology, see [https://storm.apache.org/releases/current/Configuration.html](https://storm.apache.org/releases/current/Configuration.html) +- `enriching.input.topics`- The list of kafka input topics for reading messages +- `enriching.output.topic` - Output kafka topic name for correctly processed messages +- `enriching.error.topic` - Output kafka topic name for error messages +- `enriching.tables.hdfs.uri` - The url for hdfs cluster where enriching tables are stored \ No newline at end of file diff --git a/docs/services/siembol_parsing_services.md b/docs/services/siembol_parsing_services.md new file mode 100644 index 00000000..3a1f05cb --- /dev/null +++ b/docs/services/siembol_parsing_services.md @@ -0,0 +1,151 @@ +# Siembol Parsing Services +## Overview +Siembol provides parsing services for normalising logs into messages with one layer of key/value pairs. Clean normalised data is very important for further processing such as alerting. 
+### Key concepts +- `Parser` is a siembol configuration that defines how to normalise a log +- `Parsing app` is a stream application (storm topology) that combines one or multiple parsers, reads logs from kafka topics and produces normalised logs to output kafka topics +### Common fields +These common fields are included in all siembol messages after parsing: +- `original_string` - The original log before normalisation +- `timestamp` - Timestamp extracted from the log in milliseconds since the UNIX epoch +- `source_type` - Data source - the siembol parser that was used for parsing the log +- `guid` - Unique identification of the message +## Parser config +The configuration defines how the log is normalised +- `parser_name` - Name of the parser +- `parser_version` - Version of the parser +- `parser_author` - Author of the parser +- `parser_description`- Description of the parser +### Parser Attributes +- `parser_type` - The type of the parser + - Netflow v9 parser - parses a netflow payload and produces a list of normalised messages. Netflow v9 parsing is based on templates and the parser is learning templates while parsing messages. + - Generic parser - Creates two fields + - `original_string` - The log copied from the input + - `timestamp` - Current epoch time of parsing in milliseconds. This timestamp can be overwritten in further parsing + - Syslog Parser + - `syslog_version` - Expected version of the syslog message - `RFC_3164`, `RFC_5424`, `RFC_3164, RFC_5424` + - `merge_sd_elements` - Merge SD elements of the syslog message into one parsed object + - `time_formats` - Time formats used for time formatting. Syslog default time formats are used if not provided + - `timezone` - Time zone used in syslog default time formats +### Parser Exctractors +Extractors are used for further extracting and normalising parts of the message. +![parser_flow](images/parser_flow.svg) +#### Overview +An extractor reads an input field and produces the set of key value pairs extacted from the field. Each extractor is called in the chain and its produced messages are merged into the parsed message after finishing the extraction. This way the next extractor in the chain can use the outputs of the previous ones. If the input field of the extractor is not part of the parsed message then its execution is skipped and the next one in the chain is called. A preprocessing function of the extractor is called before the extraction in order to normalise and clean the input field. Post-processing functions are called on extractor outputs in order to normalise its output messages. +#### Common extractor attributes +- `name` - The name of the extractor +- `field` - The field on which the extractor is applied +- `pre_processing_function` - The pre-processing function applied before the extraction + - `string_replace` - Replace the first occurrence of `string_replace_target` by `string_replace_replacement` + - `string_replace_all` - Replace all occurrences of `string_replace_target` by `string_replace_replacement`. 
You can use a regular expression in `string_replace_target`
+- `post_processing_functions` - The list of post-processing functions applied after the extractor
+  - `convert_unix_timestamp` - Convert `timestamp_field` from a unix epoch timestamp in seconds to milliseconds
+  - `format_timestamp` - Convert `timestamp_field` using `time_formats`
+    - `validation_regex` - Validation regular expression for checking the format of the timestamp; if there is no match, the next formatter from the list is tried
+    - `time_format` - Time format using syntax from [https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html)
+    - `timezone` - Time zone used by the time formatter
+  - `convert_to_string` - Convert all extracted fields to strings except the fields listed in `conversion_exclusions`
+- `extractor_type` - The extractor type - one of `pattern_extractor`, `key_value_extractor`, `csv_extractor`, `json_extractor`
+- flags
+  - `should_overwrite_fields` - The extractor should overwrite an existing field with the same name, otherwise it creates a new field with the prefix `duplicate`
+  - `should_remove_field` - The extractor should remove the input field after extraction
+  - `remove_quotes` - The extractor removes quotes in the extracted values
+  - `skip_empty_values` - The extractor will remove empty strings after the extraction
+  - `thrown_exception_on_error` - The extractor throws an exception on error (recommended for testing), otherwise it skips further processing
+#### Pattern extractor
+The pattern extractor extracts key value pairs by matching a list of regular expressions with named-capturing groups, where the names of the groups are used for naming fields. Siembol supports the syntax from [https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html), except that it also allows underscores in the names of captured groups
+- `regular_expressions` - The list of regular expressions
+- `dot_all_regex_flag` - The regular expression `.` matches any character - including a line terminator
+- `should_match_pattern` - At least one pattern should match, otherwise the extractor throws an exception
+#### Key value extractor
+The key value extractor extracts values from a field which has the form `key1=value1 ... keyN=valueN`
+- `word_delimiter` - Word delimiter used for splitting words, by default ` `
+- `key_value_delimiter` - Key-value delimiter used for splitting key value pairs, by default `=`
+- `escaped_character` - Character for escaping quotes, delimiters, brackets, by default `\\`
+- `quota_value_handling` - Handling of quotes during parsing
+- `next_key_strategy` - Strategy for key value extraction where the key-value delimiter is found first and then the word delimiter is searched backwards
+- `escaping_handling` - Handling of escaping during parsing
+#### CSV extractor
+- `column_names` - Specification for selecting column names, where `skipping_column_name` is a name that can be used to exclude a column with this name from the parsed message
+- `word_delimiter` - Word delimiter used for splitting words
+#### Json extractor
+The json extractor extracts a valid json message and unfolds the json into flat json key value pairs. 
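+As an illustration (assuming `nested_separator` is set to `:` and `path_prefix` is empty - both attributes are described below), an input field containing:
+```
+{"proxy": {"action": "blocked", "port": 8080}}
+```
+would be unfolded into the flat key value pairs `proxy:action` with value `blocked` and `proxy:port` with value `8080`.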
+- `path_prefix` - The prefix added to the extracted field names after json parsing
+- `nested_separator` - The separator added during unfolding of nested json objects
+### Parser Transformations
+#### Overview
+All key value pairs generated by parsers and extractors can be modified by a chain of transformations. This stage allows the parser to clean data by renaming fields, removing fields or even filtering the whole message.
+#### field name string replace
+Replace the first occurrence of `string_replace_target` in field names with `string_replace_replacement`

+#### field name string replace all
+Replace all occurrences of `string_replace_target` in field names with `string_replace_replacement`. You can use a regular expression in `string_replace_target`
+#### field name string delete all
+Delete all occurrences of `string_replace_target` in field names. You can use a regular expression in `string_replace_target`
+#### field name change case
+Change the case of all field names to `case_type`
+#### rename fields
+Rename fields according to the mapping in `field_rename_map`, where you specify pairs of `field_to_rename`, `new_name`
+#### delete fields
+Delete fields according to the filter in `fields_filter`, where you specify the lists of patterns for `including_fields`, `excluding_fields`
+#### trim value
+Trim values in the fields according to the filter in `fields_filter`, where you specify the lists of patterns for `including_fields`, `excluding_fields`
+#### lowercase value
+Lowercase values in the fields according to the filter in `fields_filter`, where you specify the lists of patterns for `including_fields`, `excluding_fields`
+#### uppercase value
+Uppercase values in the fields according to the filter in `fields_filter`, where you specify the lists of patterns for `including_fields`, `excluding_fields`
+#### chomp value
+Remove the new line ending from values in the fields according to the filter in `fields_filter`, where you specify the lists of patterns for `including_fields`, `excluding_fields`
+#### filter message
+Filter logs that match the `message_filter`, where `matchers` are specified for filtering
+## Parsing application
+### Overview
+Parsers are integrated into a stream application (storm topology) that combines one or multiple parsers, reads logs from input kafka topics and produces normalised logs to output kafka topics when parsing is successful, or to an error topic on error.
+- `parsing_app_name` - The name of the parsing application
+- `parsing_app_version` - The version of the parsing application
+- `parsing_app_author` - The author of the parsing application
+- `parsing_app_description` - Description of the parsing application
+- `parsing_app_settings` - Parsing application settings
+  - `parsing_app_type` - The type of the parsing application - `router_parsing` or `single_parser`
+  - `input_topics` - The kafka topics for reading messages for parsing
+  - `error_topic` - The kafka topic for publishing error messages
+  - `input_parallelism` - The number of parallel executors for reading messages from the input kafka topics
+  - `parsing_parallelism` - The number of parallel executors for parsing messages
+  - `output_parallelism` - The number of parallel executors for publishing parsed messages to kafka
+  - `parse_metadata` - Parse json metadata from the keys of input records, with `metadata_prefix` added to metadata field names, by default `metadata_`
+- `parsing_settings` - The parsing settings, which depend on the parsing application type
+### Single Parser
+The application integrates a single parser. 
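+A minimal sketch of a single parser application config (values are illustrative and the exact json nesting of `parsing_settings` is an assumption; `parser_name` and `output_topic` are described below):
+```
+{
+  "parsing_app_name": "syslog-app",
+  "parsing_app_version": 1,
+  "parsing_app_author": "siembol-dev",
+  "parsing_app_description": "Parsing application for syslog logs",
+  "parsing_app_settings": {
+    "parsing_app_type": "single_parser",
+    "input_topics": ["syslog-input"],
+    "error_topic": "parsing-errors",
+    "input_parallelism": 1,
+    "parsing_parallelism": 2,
+    "output_parallelism": 1
+  },
+  "parsing_settings": {
+    "single_parser": {
+      "parser_name": "syslog-parser",
+      "output_topic": "parsed-syslog"
+    }
+  }
+}
+```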
+- `parser_name` - The name of the parser from the parser configurations
+- `output_topic` - The kafka topic for publishing parsed messages
+### Router parsing
+![router_parsing](images/router_parsing.svg)
+The application integrates multiple parsers. First, the router parser parses the input message; the routing field is then extracted from its output and used to select the next parser from the list of parsers by pattern matching. The parsers are evaluated in order and only one is selected per log.
+- `router_parser_name` - The name of the parser that will be used for routing
+- `routing_field` - The field of the message parsed by the router that will be used for selecting the next parser
+- `routing_message` - The field of the message parsed by the router that will be routed to the next parser
+- `merged_fields` - The fields from the message parsed by the router that will be merged into a message parsed by the next parser
+- `default_parser` - The parser that should be used if no other parser is selected, specified with `parser_name` and `output_topic`
+- `parsers` - The list of parsers for further parsing
+  - `routing_field_pattern` - The pattern for selecting the parser
+  - `parser_properties` - The properties of the selected parser with `parser_name` and `output_topic`
+## Admin Config
+- `topology.name.prefix` - The prefix that will be used to create a topology name using the application name, by default `parsing`
+- `client.id.prefix` - The prefix that will be used to create a kafka producer client id using the application name
+- `group.id.prefix` - The prefix that will be used to create a kafka group id for readers using the application name
+- `zookeeper.attributes` - Zookeeper attributes for updating parser configurations
+  - `zk.url` - Zookeeper servers URL. Multiple servers are separated by a comma
+  - `zk.path` - Path to a zookeeper node
+- `kafka.batch.writer.attributes` - Global settings for the kafka batch writer, used if they are not overridden
+  - `batch.size` - The maximum size of a batch used for producing messages
+  - `producer.properties` - Defines kafka producer properties, see [https://kafka.apache.org/0102/documentation.html#producerconfigs](https://kafka.apache.org/0102/documentation.html#producerconfigs)
+- `storm.attributes` - Global settings for storm attributes, used if they are not overridden
+  - `bootstrap.servers` - Kafka brokers URL. 
Multiple servers are separated by a comma
+  - `first.pool.offset.strategy` - Defines how the kafka spout seeks the offset to be used in the first poll to kafka
+  - `kafka.spout.properties` - Defines the kafka consumer attributes for the kafka spout such as `group.id`, `protocol`, see [https://kafka.apache.org/0102/documentation.html#consumerconfigs](https://kafka.apache.org/0102/documentation.html#consumerconfigs)
+  - `poll.timeout.ms` - Kafka consumer parameter `poll.timeout.ms` used in the kafka spout
+  - `offset.commit.period.ms` - Specifies the period of time (in milliseconds) after which the spout commits to Kafka, see [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/storm-moving-data/content/tuning_kafkaspout_performance.html](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/storm-moving-data/content/tuning_kafkaspout_performance.html)
+  - `max.uncommitted.offsets` - Defines the maximum number of polled offsets (records) that can be pending commit before another poll can take place
+  - `storm.config` - Defines storm attributes for a topology, see [https://storm.apache.org/releases/current/Configuration.html](https://storm.apache.org/releases/current/Configuration.html)
+- `overridden_applications` - The list of overridden settings for individual parsing applications. The application to override is selected by `application.name`, and its `kafka.batch.writer.attributes` and `storm.attributes` are overridden
diff --git a/docs/services/siembol_response_service.md b/docs/services/siembol_response_service.md
new file mode 100644
index 00000000..b6809f43
--- /dev/null
+++ b/docs/services/siembol_response_service.md
@@ -0,0 +1,105 @@
+# Siembol Response Service
+## Overview
+Siembol response is a service for defining a response to an alert. It provides the following functionality:
+- To integrate siembol with other systems such as jira, ldap, elk, the hive, cortex, etc.
+- To enrich an alert, with the possibility of filtering it, in order to speed up the incident response process and to reduce false positives
+- To support alert throttling in order to reduce noise from alerts that are usually triggered in bulk
+- To provide a pluggable interface that allows you to easily develop and integrate plugins with custom evaluators if needed
+## Siembol Response Rule
+### Evaluation
+The rules are ordered and evaluated similarly to a firewall table - the first match stops further evaluation. Each rule can return:
+- `match` - The alert was matched by the rule and the evaluation of the alert has been finished
+- `no_match` - The alert was not matched by the rule and the next rule in the list will be evaluated
+- `filtered` - The alert was filtered by the rule and the evaluation of the alert has been finished

+A rule contains a list of evaluators which are evaluated during the rule evaluation. An evaluator can return the same values as a rule - `match`, `no_match`, `filtered`. A rule returns `match` if all its evaluators return `match`.
+![response_evaluation](images/response_evaluation.svg)
+### Response Rule
+- `rule_name` - Rule name that uniquely identifies the rule
+- `rule_author` - The author of the rule - the user who last modified the rule
+- `rule_version` - The version of the rule
+- `rule_description` - This field allows you to set a description for the rule. This should be a short, helpful comment that allows anyone to identify the purpose of the rule
+- `evaluators` - The list of evaluators for the rule evaluation. 
Each evaluator contains:
+  - `evaluator_type` - The type of the response evaluator
+  - `evaluator_attributes` - The attributes of the evaluator
+### Provided evaluators
+#### Fixed result
+The fixed result evaluator always returns the evaluation result from its attributes.
+- `evaluator_type` - The type equals to `fixed_result`
+- `evaluator_attributes`
+  - `evaluation_result` - The evaluation result returned by the evaluator
+#### Matching
+The matching evaluator evaluates its matchers and returns the evaluation result from its attributes.
+- `evaluator_type` - The type equals to `matching`
+- `evaluator_attributes`
+  - `evaluation_result` - The evaluation result returned by the evaluator after matching - one of `match`, `filtered`, `filtered_when_no_match`
+  - `matchers` - You can add as many matchers as you want
+    - `matcher_type` - Type of matcher, either `REGEX_MATCH` or `IS_IN_SET`
+    - `is_negated` - The matcher is negated
+    - `field` - The name of the field on which the matcher will be evaluated

+There are two types of matchers:
+- `REGEX_MATCH` - A regex_match allows you to use a regex statement to match a specified field:
+  - `data` - The regex statement in Java syntax [https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html), except that underscores are allowed in the names of captured groups. Named capture groups in the regex are added as fields in the event. They are available from the next matcher onwards and are included in the output event

+- `IS_IN_SET` - It compares the value of a field to a set of strings defined in `data`. If the value is in the set then the matcher returns true.
+  - `data` - A list of strings to compare the value to. New line delimited. Does not support regex - each line must be a literal match; however, field substitution is supported in this field
+#### Json path assignment
+The json path assignment evaluator allows you to assign the values from a json path evaluation of the current alert into a field from its attributes.
+- `evaluator_type` - The type equals to `json_path_assignment`
+- `evaluator_attributes`
+  - `assignment_type` - The type of the assignment, which defines the return value based on the json path evaluation - one of `match_always`, `no_match_when_empty`, `error_match_when_empty`
+  - `field_name` - The name of the field in which the non-empty result of the json path evaluation will be stored
+  - `json_path` - Json path for evaluation using syntax from [https://github.com/json-path/JsonPath](https://github.com/json-path/JsonPath)
+#### Markdown table formatter
+The markdown table formatter evaluator formats a json object into a markdown table that can be used in the description of tickets in tracking systems such as Jira or the Hive. 
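+As an illustration (assuming `fields_column_name` is set to `field` and `values_column_name` to `value` - both attributes are described below), the table stored in `field_name` might look like:
+```
+| field  | value            |
+|--------|------------------|
+| host   | host.example.com |
+| action | blocked          |
+```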
+- `evaluator_type` - The type equals to `markdown_table_formatter`
+- `evaluator_attributes`
+  - `field_name` - The name of the field in which the computed markdown table will be stored
+  - `table_name` - The name of the table
+  - `fields_column_name` - The name of the column of the generated table with key names
+  - `values_column_name` - The name of the column of the generated table with object values
+  - `field_filter` - The field filter used for defining the alert fields in the table, where you specify the lists of patterns for `including_fields`, `excluding_fields`
+#### Array markdown table formatter
+The array markdown table formatter evaluator formats a json array into a markdown table that can be used in the description of tickets in tracking systems such as Jira or the Hive.
+- `evaluator_type` - The type equals to `array_markdown_table_formatter`
+- `evaluator_attributes`
+  - `field_name` - The name of the field in which the computed markdown table will be stored
+  - `table_name` - The name of the table
+  - `array_field` - The array field of the alert that will be formatted in the table
+  - `field_filter` - The field filter used for defining the alert fields in the table, where you specify the lists of patterns for `including_fields`, `excluding_fields`
+#### Array reducer
+The array reducer evaluator allows you to reduce a json array into fields. Json arrays are usually generated by evaluators for searching databases, performing API calls, etc.
+- `evaluator_type` - The type equals to `array_reducer`
+- `evaluator_attributes`
+  - `array_reducer_type` - The type of the array reducer - one of `first_field`, `concatenate_fields`
+  - `array_field` - The field name of the array that will be used for reducing
+  - `prefix_name` - The prefix for creating field names
+  - `field_name_delimiter` - The delimiter that is used for generating a field name from the prefix and an array field name
+  - `field_filter` - The field filter used for defining fields for computation, where you specify the lists of patterns for `including_fields`, `excluding_fields`
+#### Alert throttling
+The alert throttling evaluator allows you to filter similar alerts within a defined time window. This can help to reduce noise from alerts that are usually triggered in bulk.
+- `evaluator_type` - The type equals to `alert_throttling`
+- `evaluator_attributes`
+  - `suppressing_key` - The key for suppressing alerts in the specified time window
+  - `time_unit_type` - The type of time unit - one of `minutes`, `hours`, `seconds`
+  - `suppression_time` - The time for which an alert is suppressed, in the given time units
+#### Sleep
+The sleep evaluator postpones further evaluation of the rule by the time provided in its attributes. 
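+A minimal sketch of a sleep evaluator entry as it might appear in a rule's `evaluators` list (values are illustrative; the attributes are described below):
+```
+{
+  "evaluator_type": "sleep",
+  "evaluator_attributes": {
+    "time_unit_type": "seconds",
+    "sleeping_time": 5
+  }
+}
+```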
+- `evaluator_type` - The type equals to `sleep`
+- `evaluator_attributes`
+  - `time_unit_type` - The type of time unit - one of `seconds`, `milli_seconds`
+  - `sleeping_time` - The sleeping time in the given time units
+## Plugins
+### Plugin architecture
+### Evaluators implemented internally at GR that we are planning to open source
+#### Elk search
+#### Elk store
+#### The hive alert
+#### The hive case
+#### Ldap search
+#### Cortex analysis
+#### Jira search
+#### Jira create issue
+#### Papermill notebook
+## Application Properties
\ No newline at end of file
diff --git a/docs/siembol_ui/how-tos/how_to_add_links_to_siembol_ui_home_page.md b/docs/siembol_ui/how-tos/how_to_add_links_to_siembol_ui_home_page.md
new file mode 100644
index 00000000..527af653
--- /dev/null
+++ b/docs/siembol_ui/how-tos/how_to_add_links_to_siembol_ui_home_page.md
@@ -0,0 +1,4 @@
+# How to add links to siembol ui home page
+## Prepare links
+## Prepare icons
+## Edit config editor ui properties
diff --git a/docs/siembol_ui/how-tos/how_to_add_new_config_in_siembol_ui.md b/docs/siembol_ui/how-tos/how_to_add_new_config_in_siembol_ui.md
new file mode 100644
index 00000000..97cb5b3b
--- /dev/null
+++ b/docs/siembol_ui/how-tos/how_to_add_new_config_in_siembol_ui.md
@@ -0,0 +1,4 @@
+# How to add new config in siembol ui
+## Add new config
+## Clone existing config
+## Validate and Submit config to Store
diff --git a/docs/siembol_ui/how-tos/how_to_deploy_configurations_in_siembol_ui.md b/docs/siembol_ui/how-tos/how_to_deploy_configurations_in_siembol_ui.md
new file mode 100644
index 00000000..9d58d37c
--- /dev/null
+++ b/docs/siembol_ui/how-tos/how_to_deploy_configurations_in_siembol_ui.md
@@ -0,0 +1,8 @@
+# How to deploy configurations in siembol ui
+## Upgrade a config
+## Remove a config
+## Change order of configs in the deployment
+## Create Pull Request with new deployment in siembol ui
+### Validate deployment syntax
+### Add additional metadata (when applicable)
+### Test deployment (when applicable)
\ No newline at end of file
diff --git a/docs/siembol_ui/how-tos/how_to_modify_ui_layout.md b/docs/siembol_ui/how-tos/how_to_modify_ui_layout.md
new file mode 100644
index 00000000..b16cd634
--- /dev/null
+++ b/docs/siembol_ui/how-tos/how_to_modify_ui_layout.md
@@ -0,0 +1,8 @@
+# How to modify ui layout
+## Prepared layout files per service type
+## How to change
+### Title
+### Description
+### Adding help link
+### Adding regular expression for validating a field
+## How to change layout file per service
\ No newline at end of file
diff --git a/docs/siembol_ui/how-tos/how_to_setup_oauth2_oidc_in_siembol_ui.md b/docs/siembol_ui/how-tos/how_to_setup_oauth2_oidc_in_siembol_ui.md
new file mode 100644
index 00000000..eab5632c
--- /dev/null
+++ b/docs/siembol_ui/how-tos/how_to_setup_oauth2_oidc_in_siembol_ui.md
@@ -0,0 +1,13 @@
+# How to setup oauth2/oidc in siembol ui
+## Scopes
+## Claims
+### groups
+### email
+## Supported flows
+### PKCE with authorization code grant
+## Config editor UI properties
+## Config editor rest application properties
+## Authorisation of services based on groups claim
+### Authorisation to a service for service users
+### Authorisation to a service for service administrators
+
diff --git a/docs/siembol_ui/how-tos/how_to_submit_config_in_siembol_ui.md b/docs/siembol_ui/how-tos/how_to_submit_config_in_siembol_ui.md
new file mode 100644
index 00000000..5f07ec9c
--- /dev/null
+++ b/docs/siembol_ui/how-tos/how_to_submit_config_in_siembol_ui.md
@@ -0,0 +1,3 @@
+# How to submit config in siembol 
ui
+## Edit Config
+## Validate and Submit config to Config Store
diff --git a/docs/siembol_ui/how-tos/how_to_test_config_in_siembol_ui.md b/docs/siembol_ui/how-tos/how_to_test_config_in_siembol_ui.md
new file mode 100644
index 00000000..8c00a341
--- /dev/null
+++ b/docs/siembol_ui/how-tos/how_to_test_config_in_siembol_ui.md
@@ -0,0 +1,10 @@
+# How to test config in siembol ui
+## Test Config
+## Test specification
+## Test Output
+## Test Case
+### Overview
+### Evaluate list of test cases
+### Add new test case
+### Edit existing test case
+### Run and evaluate test case
diff --git a/docs/siembol_ui/how-tos/how_to_test_deployment_in_siembol_ui.md b/docs/siembol_ui/how-tos/how_to_test_deployment_in_siembol_ui.md
new file mode 100644
index 00000000..c453ace1
--- /dev/null
+++ b/docs/siembol_ui/how-tos/how_to_test_deployment_in_siembol_ui.md
@@ -0,0 +1,6 @@
+# How to test deployment in siembol ui
+## Overview
+## See differences in deployments
+## Test Specification
+## Running the test
+## The test output
diff --git a/docs/siembol_ui/siembol_ui.md b/docs/siembol_ui/siembol_ui.md
new file mode 100644
index 00000000..b217f480
--- /dev/null
+++ b/docs/siembol_ui/siembol_ui.md
@@ -0,0 +1,17 @@
+# Siembol UI
+## Authentication
+## Home page
+### Services
+### Recently visited
+### Explore Siembol
+
+## Service configurations view
+### Config Store
+### Deployment
+### Filtering
+### File History
+## Admin configuration view
+## Editing service config
+### Config Editor
+### Testing Configuration
+### Test Cases
diff --git a/siembol-common/src/main/java/uk/co/gresearch/siembol/common/model/StormAttributesDto.java b/siembol-common/src/main/java/uk/co/gresearch/siembol/common/model/StormAttributesDto.java
index a37047f4..cd2c48bf 100644
--- a/siembol-common/src/main/java/uk/co/gresearch/siembol/common/model/StormAttributesDto.java
+++ b/siembol-common/src/main/java/uk/co/gresearch/siembol/common/model/StormAttributesDto.java
@@ -12,7 +12,7 @@ import java.util.List;
 @Attributes(title = "storm attributes", description = "Attributes for storm configuration")
 public class StormAttributesDto {
     @JsonProperty("bootstrap.servers")
-    @Attributes(required = true, description = "Kafka brokers servers url. Multiple servers are separated by coma")
+    @Attributes(required = true, description = "Kafka brokers servers url. Multiple servers are separated by comma")
     private String bootstrapServers;
     @SchemaIgnore
     @JsonIgnore
diff --git a/siembol-common/src/main/java/uk/co/gresearch/siembol/common/model/ZookeeperAttributesDto.java b/siembol-common/src/main/java/uk/co/gresearch/siembol/common/model/ZookeeperAttributesDto.java
index cf644e98..45443770 100644
--- a/siembol-common/src/main/java/uk/co/gresearch/siembol/common/model/ZookeeperAttributesDto.java
+++ b/siembol-common/src/main/java/uk/co/gresearch/siembol/common/model/ZookeeperAttributesDto.java
@@ -8,7 +8,7 @@ import java.io.Serializable;
 @Attributes(title = "zookeeper attributes", description = "Zookeeper attributes for node cache")
 public class ZookeeperAttributesDto implements Serializable {
     @JsonProperty("zk.url")
-    @Attributes(required = true, description = "Zookeeper servers url. Multiple servers are separated by coma")
+    @Attributes(required = true, description = "Zookeeper servers url. Multiple servers are separated by comma")
    private String zkUrl;
    @Attributes(required = true, description = "Path to a zookeeper node")
    @JsonProperty("zk.path")