ONYPHE Processing Pipeline
Once you have installed the CLI tools, you have access to the onyphe program, but also to the opp program. OPP stands for ONYPHE Processing Pipeline; it is like jq, but easier to use.
OPP works with the concept of the UNIX pipeline and was designed to be very similar in philosophy to the Splunk Processing Language. Each pipe processes JSON, applying transforms to the input to output another view of the original JSON data.
Each of these pipes runs one procedure, called a proc in OPP terminology.
Sample usage
Output a unique list of values taken from the domain field:
# From a standalone JSON file:
opp 'uniq domain' < /tmp/input.json
# From ONYPHE Search API
onyphe -search 'protocol:http | uniq domain'
Output the same unique list, and add a count to the results:
# From a standalone JSON file:
opp 'uniq domain | addcount' < /tmp/input.json
# From ONYPHE Search API
onyphe -search 'protocol:http | uniq domain | addcount'
Then, output to the console in a human-readable text format:
# From a standalone JSON file:
opp 'uniq domain | addcount | output' < /tmp/input.json
# From ONYPHE Search API
onyphe -search 'protocol:http | uniq domain | addcount | output'
Output a list of deduplicated Web site-related data:
# From a standalone JSON file:
opp 'dedup ip,port,forward | fields ip,port,forward | addcount' < /tmp/input.json
# From ONYPHE Search API
onyphe -search 'protocol:http | dedup ip,port,forward | fields ip,port,forward | addcount'
List of procs
Addcount
Will add a count field to the JSON output. This can be seen as a count of processed JSON documents.
| addcount
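A minimal sketch against a standalone file (the input document below is made up for the example):
echo '{"domain":"example.com"}' > /tmp/input.json
opp 'addcount' < /tmp/input.json
# Expected: the document is kept, with an added count field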
Count
Will count processed JSON documents. The output will only be the count of processed documents; you will lose the input data.
| count
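A minimal sketch, assuming opp reads one JSON document per line (input data is made up):
printf '%s\n' '{"ip":"1.1.1.1"}' '{"ip":"2.2.2.2"}' '{"ip":"3.3.3.3"}' > /tmp/input.json
opp 'count' < /tmp/input.json
# Only the count (3) remains; the three input documents are lost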
Dedup
Will dedup input documents based on a dedup key. That key may be composed of multiple fields. The output won't be changed otherwise: only the latest entry will be rendered, and the oldest one won't be kept.
| dedup domain
| dedup ip,port
| dedup ip,port,forward
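For instance, with made-up line-delimited input, two documents sharing the same ip and port should collapse into one:
printf '%s\n' \
  '{"ip":"1.2.3.4","port":80,"forward":"a.example.com"}' \
  '{"ip":"1.2.3.4","port":80,"forward":"b.example.com"}' > /tmp/input.json
opp 'dedup ip,port' < /tmp/input.json
# Only one document should remain for the 1.2.3.4:80 pair (the latest one)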
Expand
Some fields may hold an array of values. Sometimes, you want to explode, or expand, such a field to create as many documents as there are unique values in the array. You can only expand one field at a time, but you can pipe into another expand proc as needed:
| expand domain
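A sketch with a made-up input document holding two domain values:
echo '{"ip":"1.2.3.4","domain":["example.com","example.org"]}' > /tmp/input.json
opp 'expand domain' < /tmp/input.json
# Expected: two documents, one with domain example.com, one with example.org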
Fieldcount
As fields may hold an array of values, you may want to count the number of values for a given field. This proc adds a new field on output, automatically named by appending count to the field name. For instance, fieldcount domain will add a domaincount field on output:
| fieldcount domain
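For instance, with a made-up document holding three domain values:
echo '{"ip":"1.2.3.4","domain":["a.example.com","b.example.com","c.example.com"]}' > /tmp/input.json
opp 'fieldcount domain' < /tmp/input.json
# The output document should gain a domaincount field set to 3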
Fields
When you want to reduce the number of fields kept on output, you can use the fields proc to reduce the volume of data. Perfect for integration into a SIEM where pricing is based on the volume of processed data:
| fields ip,port,domain,protocol
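For instance, keeping only two fields out of a larger made-up document:
echo '{"ip":"1.2.3.4","port":443,"protocol":"http","organization":"EXAMPLE-ORG"}' > /tmp/input.json
opp 'fields ip,port' < /tmp/input.json
# Expected: {"ip":"1.2.3.4","port":443}; all other fields are dropped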
Filter
The filter proc lets you apply filters based on integer values. For instance, you can use fieldcount on the domain field and decide to keep documents only above a given threshold:
| filter domaincount:>2
| fieldcount domain | filter domaincount:>2
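Putting the two procs together on made-up input, only documents with more than two domain values should pass:
printf '%s\n' \
  '{"ip":"1.2.3.4","domain":["a.example.com","b.example.com","c.example.com"]}' \
  '{"ip":"5.6.7.8","domain":["x.example.org"]}' > /tmp/input.json
opp 'fieldcount domain | filter domaincount:>2' < /tmp/input.json
# Only the first document should be kept (domaincount 3 > 2)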
Flatten
JSON documents can be complex. We like to describe them as 3D entries, unlike a 2D entry in, for instance, CSV format. The flatten proc lets you switch back to a 2D entry when possible:
| flatten
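A sketch with a made-up nested document, assuming flattened keys are joined with dots, consistent with field names like app.http.bodymd5 used later in this section:
echo '{"ip":"1.2.3.4","app":{"http":{"bodymd5":"d41d8cd98f00b204e9800998ecf8427e"}}}' > /tmp/input.json
opp 'flatten' < /tmp/input.json
# Expected 2D form: {"ip":"1.2.3.4","app.http.bodymd5":"d41d8cd98f00b204e9800998ecf8427e"}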
Noop
Just does nothing. It may still be useful sometimes, as you will see:
| noop
Output
Formats JSON documents as human-readable text. Perfect for creating text input files to be used with some of ONYPHE's APIs, like the Discovery API.
| output
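For instance, to build a plain-text list of domains for later use (file paths are just examples):
opp 'uniq domain | output' < /tmp/input.json > /tmp/domains.txt
# /tmp/domains.txt should now hold one plain-text line per unique domain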
Splitsubnet
Some subnets may be huge, like a /8. In ONYPHE APIs, you can search by CIDR, but you are limited to subnets no larger than a /16. If you want to search against a /15, the splitsubnet proc will split it into two /16 CIDR documents:
| splitsubnet
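A sketch with a made-up document; note the subnet field name is an assumption for this example:
echo '{"subnet":"10.0.0.0/15"}' > /tmp/input.json
opp 'splitsubnet' < /tmp/input.json
# Expected: two documents covering 10.0.0.0/16 and 10.1.0.0/16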
Top
Statistics. You may want to identify the top organizations found in the output of your search. Just pipe into the top proc with the wanted field:
| top organization
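For instance, against a standalone file:
opp 'top organization' < /tmp/input.json
# Should list the most frequent organization values found in the input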
Uniq
The uniq proc is like the dedup proc, but it will only output the values of the given field. Other JSON input fields will be lost after this pipe.
| uniq domain
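For instance, with made-up input where the same domain appears twice:
printf '%s\n' \
  '{"ip":"1.2.3.4","domain":"example.com"}' \
  '{"ip":"5.6.7.8","domain":"example.com"}' \
  '{"ip":"9.9.9.9","domain":"example.org"}' > /tmp/input.json
opp 'uniq domain' < /tmp/input.json
# Expected: only the two unique domain values; the ip fields are lost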
Lookup
The lookup proc is used to enrich JSON documents from a local inventory. For instance, you may want to add an email contact for a given domain or IP network block.
NOTE: you can only look up one field at a time; no AND matching is possible.
echo domain,contact > /tmp/lookup.csv
echo example.com,contact@example.com >> /tmp/lookup.csv
| lookup /tmp/lookup.csv
The same with an IP network block:
echo ip,contact > /tmp/lookup.csv
echo 8.8.8.0/24,contact@example.com >> /tmp/lookup.csv
| lookup /tmp/lookup.csv
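A complete run against a standalone file (the input document is made up; since 8.8.8.8 falls within 8.8.8.0/24, it should gain the contact field):
echo '{"ip":"8.8.8.8","port":53}' > /tmp/input.json
opp 'lookup /tmp/lookup.csv' < /tmp/input.json
# The document should be enriched with contact@example.com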
Allowlist
The allowlist proc keeps only those JSON documents that match a set of field/value pairs. For instance, you may want to keep only documents from a specific domain & organization after a call to the pivots proc:
echo domain,organization > /tmp/allowlist.csv
echo example.com,BLOOMINGDALE-COMMUNICATIONS >> /tmp/allowlist.csv
| allowlist /tmp/allowlist.csv
The same with an IP network block:
echo domain,ip > /tmp/allowlist.csv
echo example.com,192.222.25.0/24 >> /tmp/allowlist.csv
| allowlist /tmp/allowlist.csv
Blocklist
blocklist does the opposite of allowlist. The syntax is exactly the same, but results are stripped when matches are found.
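For instance, to strip documents from a specific domain & organization, mirroring the allowlist example above:
echo domain,organization > /tmp/blocklist.csv
echo example.com,BLOOMINGDALE-COMMUNICATIONS >> /tmp/blocklist.csv
| blocklist /tmp/blocklist.csv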
ONYPHE specific procs
ONYPHE specific procs use some of the ONYPHE APIs to do their work. That means you can use them to perform correlation searches with the ONYPHE Query Language, for instance. It is like using the output from a previous proc to execute more targeted searches, with placeholders as values.
Pivots
pivots is one of the latest additions for discovering an attack surface. It merges some fields to create points of interest, or pivots, which you can use to expand your view of a given organization.
| pivots
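For instance, reusing the tag:cac40 query from the fuller example later in this section:
onyphe -search 'tag:cac40 | pivots'
# Outputs pivot documents you can feed into the procs below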
Discovery
Will leverage the Discovery API to execute searches:
| discovery category:vulnscan -exists:cve
| discovery category:datascan tag:open device.class:database
For a more complete example using the ONYPHE CLI as input:
onyphe -search 'tag:cac40 | pivots | discovery category:datascan tag:open device.class:database | expand domain | expand cpe | fields ip,port,domain,cpe | output'
That’s starting to look very neat, right?
Search
Use results from a previous search to feed a more specific search. Correlation for the win. Here, we want to search for exposed protocol:modbus devices that also have an exposed Web interface. The output will be the results of this new search; data from before the pipe will be lost:
| search category:datascan protocol:http ip:$ip
The complete example:
onyphe -search 'category:datascan protocol:modbus | search category:datascan protocol:http ip:$ip | expand domain | fields ip,port,domain,protocol | output'
Where
This one is like the search proc, except the output after the pipe will be the JSON documents found before the pipe, kept only when a match is found:
| where category:datascan protocol:http ip:$ip
The complete example:
onyphe -search 'category:datascan protocol:modbus | where category:datascan protocol:http ip:$ip | expand domain | fields ip,port,domain,protocol | output'
Merge
This one is awesome. It merges JSON document fields into a single JSON document, based on data from before and after the pipe.
| merge category:datascan protocol:http ip:$ip
First example, to merge protocol:modbus with protocol:http data from a correlation search:
onyphe -search 'category:datascan protocol:modbus | merge category:datascan protocol:http ip:$ip | expand domain | fields ip,port,domain,protocol | output'
Second example, uncloak .onion Web sites when they also expose their service on the clear Web:
onyphe -search 'category:onionscan protocol:http !tag:default status:200 | merge category:datascan app.http.bodymd5:$app.http.bodymd5 | expand domain | dedup ip,onion | fields ip,onion'