Categories

Categories are data patterns that are searched for in a response body. A category can be built from regexes, raw strings, excluding rules, correlate rules, or accelerated native matchers.

The following matching strategies are considered individually, and a match by any matching strategy constitutes a match of the category:

  • raw (String),
  • raw_insensitive (String),
  • regex (Regex),
  • internal (native matcher). A complete list of native matchers can be found here.

The following have special properties that will override the behavior of other rules in the category:

  • and (list of other match rules),
  • except_regex (Regex),
  • except (String),
  • except_insensitive (String),
  • correlate (correlate rule).

Writing regexes

Since policies are written in YAML and the regexes here are double-quoted strings, every backslash in a regex must be escaped: \d is an invalid escape sequence, whereas \\d matches a digit as expected. In some cases it is more readable to spell out a character range ([0-9] instead of \\d).
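As a minimal sketch of the escaping rule above, the two rules below express the same intent, once with an escaped shorthand class and once with an explicit range. The category name five_digits is hypothetical, and either rule alone would suffice; both are shown only for comparison. Whether \d and [0-9] accept exactly the same characters depends on the regex engine, since some engines treat \d as Unicode-aware.

```yaml
categories:
  five_digits:              # hypothetical category name, for illustration only
    - regex: "\\d{5}"       # backslash doubled inside a double-quoted YAML string
    - regex: "[0-9]{5}"     # same intent as a character range, no escaping needed
```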

Email regex with an ignored email

This example matches email addresses but ignores the specific address someone@example.com.

categories:
  email:
    - regex: "[a-zA-Z0-9_.+-]{2,}@[a-zA-Z0-9-]{3,}\\.[a-zA-Z0-9-.]{2,}"
    - except: someone@example.com
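A possible variant of the example above ignores every address at one domain rather than a single address. This is a sketch that assumes except_regex is written the same way as except. The text here does not say whether the exclusion pattern is tested against the whole match or a substring of it, so the pattern is written to cover a complete address.

```yaml
categories:
  email:
    - regex: "[a-zA-Z0-9_.+-]{2,}@[a-zA-Z0-9-]{3,}\\.[a-zA-Z0-9-.]{2,}"
    # Assumed usage: skip any address at example.com (the dot is escaped for YAML and regex)
    - except_regex: "[a-zA-Z0-9_.+-]+@example\\.com"
```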

Suspicious strings

This example matches some suspicious string literals. These are usually used in JSON keys, or other structured data formats.

categories:
  suspicious_strings:
    - raw: credit_card
    - raw: social_security_number
    - raw: password_hash

In practice, these strings might be split across multiple categories for more detailed reporting. For instance, the previous example can be rewritten as follows.

categories:
  personal_information:
    - raw: credit_card
    - raw: social_security_number
  security_data:
    - raw: password_hash

Case Insensitive Suspicious strings

Similar to above, this matches suspicious string literals while ignoring ASCII case.

categories:
  suspicious_strings:
    - raw_insensitive: credit_card
    - raw_insensitive: social_security_number
    - raw_insensitive: password_hash

Unformatted phone numbers

This example matches 10-digit unformatted phone numbers.

categories:
  phone_number:
    - regex: "[^0-9][0-9]{10}[^0-9]"
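Because [^0-9] must consume one character on each side, the pattern above cannot match a number that sits at the very start or end of the response body, and the regex match spans the two neighboring characters as well as the digits. One possible alternative, sketched here with a hypothetical category name, uses the \\b word-boundary assertion that also appears in the SSN example in this document. The two are not equivalent: \\b also rejects digits that directly touch a letter or underscore, so which form fits depends on the data being scanned.

```yaml
categories:
  phone_number_boundary:            # hypothetical category name, for illustration only
    - regex: "\\b[0-9]{10}\\b"      # zero-width boundaries: no neighboring characters consumed
```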

Correlate rules

A correlate rule contains a secondary rule group and signals a match only if the parent group and the secondary group match within a certain distance of one another. This affects the behavior of the entire category.

Unformatted phone numbers near "phone"

This example matches 10-digit unformatted phone numbers within 16 bytes of the string "phone", as set by max_distance. The interest field denotes which of the two groups should be reported as interesting: primary (the default), secondary, or all, which reports every character between the two matches.

categories:
  phone_number_near_label:
    - regex: "[^0-9][0-9]{10}[^0-9]"
    - correlate:
        interest: all
        max_distance: 16
        matches:
          - raw: phone

Notably, the secondary group can also be a reference to another category, given by name with match_group:

categories:
  phone_number:
    - regex: "[^0-9][0-9]{10}[^0-9]"
  phone_number_near_label:
    - raw: "number"
    - correlate:
        interest: secondary
        max_distance: 16
        match_group: phone_number

Multiple Correlates

You can specify multiple correlates within the same category; they are evaluated separately from each other. Nesting correlate rules inside other correlate rules is not supported and causes an error during policy deserialization.

categories:
  ssn:
    - regex: "\\b\\d{3}[ .-]\\d{2}[ .-]\\d{4}\\b"
    - correlate:
        interest: primary
        max_distance: 16
        matches:
          - raw_insensitive: ssn
    - correlate:
        interest: secondary
        max_distance: 16
        matches:
          - raw_insensitive: social
          - raw_insensitive: security

Category Tags

There is an alternative form of parsing for categories that allows you to set a tag for the corresponding matchers. If used, the tag is added to the metadata of any matches produced.

The example below sets "tag": "routing" in the metadata field of any matches produced by routing or routing_2. In contrast, the phone_number category still uses the normal form of category parsing.

categories:
  phone_number:
    - regex: "[^0-9][0-9]{10}[^0-9]"
  routing:
    matchers: !internal routing_number
    tag: "routing"
  routing_2:
    matchers:
      - regex: "\\b\\d{9}\\b"
      - regex: "\\b\\d{5}\\b"
      - and: !internal routing_number
    tag: "routing"