
Plugins

A typical plugin runs the Event Processing loop shown below. During this process the plugin continually queries dispatcher for new jobs, which dispatcher acquires from Kafka topics. Internally, dispatcher maintains a consumer that tracks how far through each topic the plugin has progressed. Dispatcher then reads the pending messages from each topic and filters them against the plugin's capabilities, so that only the most relevant messages are handed back to the plugin.
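A minimal sketch of that dispatcher-side flow is shown below. The names (Message, matches_capabilities, poll_for_plugin) are illustrative only and are not taken from the real dispatcher code; the point is "poll each topic, filter by plugin capabilities, return only the relevant messages".

```python
# Hypothetical sketch of the dispatcher-side flow described above; the real
# dispatcher/Kafka interfaces differ, these names are illustrative only.
from dataclasses import dataclass

@dataclass
class Message:
    topic: str
    file_type: str        # e.g. "pe", "elf", "zip"
    payload: bytes

def matches_capabilities(message: Message, capabilities: set) -> bool:
    # A plugin's "capabilities" are modelled here as the file types it handles.
    return message.file_type in capabilities

def poll_for_plugin(buffered_messages: list, capabilities: set) -> list:
    # Dispatcher keeps a consumer that remembers where the plugin is up to in
    # each topic; here that state is simplified to a pre-fetched buffer.
    return [m for m in buffered_messages if matches_capabilities(m, capabilities)]

if __name__ == "__main__":
    buffered = [
        Message("binaries", "pe", b"..."),
        Message("binaries", "zip", b"..."),
    ]
    print(poll_for_plugin(buffered, capabilities={"pe"}))  # only the PE message survives
```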

Event Processing

Plugins within Azul register themselves when they start up, providing their version number. This uniquely identifies which plugin, and which specific version of it, generated a given piece of data or feature.
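As a rough illustration of what that registration might look like: the endpoint name and payload fields below are assumptions made for this sketch, not the actual azul-runner or dispatcher API.

```python
# Hypothetical registration call; the endpoint path and field names are
# assumptions for illustration only.
import json
from urllib import request

def register_plugin(dispatcher_url: str, name: str, version: str) -> None:
    payload = json.dumps({"name": name, "version": version}).encode()
    req = request.Request(
        f"{dispatcher_url}/register",          # assumed endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)

# e.g. register_plugin("http://dispatcher:8080", "azul-plugin-entropy", "1.4.2")
```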

The plugin then runs an infinite loop, continually polling for new information from a topic (process shown in the diagram).

A plugin retrieves events one at a time (in case it crashes). It can publish multiple output events as a result of a single input event.

The dispatcher retrieves multiple events at once, so it can avoid always hitting Kafka (not shown in diagram).
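A condensed sketch of that plugin loop is below. The get_next_event, process and publish_event callables are stand-ins for whatever azul-runner actually provides; the sketch only illustrates "poll forever, take one event at a time, emit zero or more outputs per input".

```python
# Illustrative plugin loop: one event in, zero or more events out.
# The three callables are placeholders, not the real azul-runner API.
import time

def run_plugin(get_next_event, process, publish_event, idle_seconds: float = 1.0):
    while True:                                  # infinite poll loop
        event = get_next_event()                 # one event at a time, so a crash loses at most one
        if event is None:
            time.sleep(idle_seconds)             # nothing new on the topic yet
            continue
        for output in process(event):            # a single input may yield many outputs
            publish_event(output)
```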

Normal Plugins

The most common kind of plugin is a binary plugin. These plugins are written in Python and inherit from BinaryPlugin, found in azul-runner. Other plugins inherit from Plugin, also found in azul-runner.
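Conceptually, a binary plugin looks something like the sketch below. This is a hedged illustration only: the import path, method name and return type are assumptions, and the real BinaryPlugin interface in azul-runner may differ.

```python
# Conceptual shape of a binary plugin. The import path, method name and
# return value are assumptions; consult azul-runner for the real interface.
from azul_runner import BinaryPlugin   # assumed import path

class EntropyExample(BinaryPlugin):
    VERSION = "0.1.0"                  # reported to dispatcher at registration

    def process(self, binary: bytes):
        # Return features extracted from the binary; the real return type is
        # whatever azul-runner expects.
        return {"size": len(binary)}
```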

Plugins that are not binary plugins but still behave similarly to binary plugins include:

  • azul-plugin-netinfo
  • azul-plugin-office
  • azul-plugin-unbox

Note: azul-plugin-unbox is a significant plugin: it extracts archives and resubmits the extracted files to Azul for processing. Otherwise it behaves the same as all other plugins.

Special Python Plugins

There are also Python plugins that expose their own REST API to Azul and, as a result, behave differently from other plugins (a minimal sketch of this pattern follows the list). These are:

  • azul-plugin-assemblyline: Ingests data from Assemblyline and feeds it into Azul (still in development at the time of writing)

  • azul-plugin-retrohunt: Retrohunt indexes data so that it can be searched with bigyara; this indexing follows the normal plugin lifecycle. It then exposes a REST API which enables users to use bigyara (biggrep internally) to search over the indexed data.

  • azul-report-feeds (in development): Streams data from external sources and converts the source data into a format appropriate for Azul, mapping the external feature data into the Azul Feature format.
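The sketch below shows the general "plugin plus its own REST API" pattern using only the Python standard library: an HTTP server runs on a background thread while the normal event-processing loop keeps running. The /search-style route and its behaviour are placeholders, not the real retrohunt interface.

```python
# Placeholder shape for a plugin that also serves its own HTTP API while the
# normal plugin loop keeps running. The route and its behaviour are
# illustrative only.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"results": []}).encode()   # a real plugin would query its index here
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def start_api(port: int = 8000) -> HTTPServer:
    server = HTTPServer(("0.0.0.0", port), SearchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# start_api() would be called once at startup, alongside the normal
# event-processing loop shown earlier.
```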

GoLang Plugins

There are three plugins written in GoLang and these use the same API endpoints to communicate with dispatcher:

  • azul-plugin-virustotal: Uses the appropriate API to acquire metadata from VirusTotal or VirusLocal and converts the ingested data into a format that Azul can use.
  • azul-plugin-entropy: Behaves just like a normal binary plugin, but is written in Go.
  • azul-plugin-goinfo: Behaves like a normal binary plugin, but is written in Go.

Job Processing Diagram

A job is considered to be a single event processed by a single plugin. Hence, each plugin will traverse this state diagram for the same event in Kafka.

  • Kafka: Event sitting in Kafka, waiting for the dispatcher plugin pointer to catch up
  • Dispatcher: Event read by the dispatcher plugin pointer and evaluated against the plugin's filters
  • Skip: Event failed the dispatcher plugin filter check
  • Plugin: Event being executed by the plugin
  • Completed: Event execution resulted in produced data (OK, SUCCESS and COMPLETED are all the same)
  • Exception: Event execution resulted in a standard Python stack trace (plugin or library error)
  • Critical Failure: Event execution resulted in a very serious error (core Azul code error)
  • Optout: Plugin determined the event was not valid for this plugin
  • Store: The event's produced data being stored (in Kafka via dispatcher)
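The job states above can be summarised as a small enumeration. This is just a restatement of the list for reference, not code taken from Azul.

```python
# Restatement of the job states above; not actual Azul code.
from enum import Enum, auto

class JobState(Enum):
    KAFKA = auto()             # event waiting for the dispatcher plugin pointer to catch up
    DISPATCHER = auto()        # event read by dispatcher and evaluated against plugin filters
    SKIP = auto()              # event failed the dispatcher filter check
    PLUGIN = auto()            # event being executed by the plugin
    COMPLETED = auto()         # execution produced data (OK / SUCCESS / COMPLETED)
    EXCEPTION = auto()         # execution raised a standard Python stack trace
    CRITICAL_FAILURE = auto()  # execution hit a very serious core Azul error
    OPTOUT = auto()            # plugin decided the event was not valid for it
    STORE = auto()             # produced data being stored in Kafka via dispatcher
```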

Event Processing - Detail

A plugin initially registers itself with the dispatcher (this event is detected by the ingestor-plugin).

A plugin retrieves events one at a time (in case it crashes). It can publish multiple output events as a result of a single input event.

The dispatcher retrieves multiple events at once, so it can avoid always hitting Kafka (not shown in diagram).

Plugin Consuming Events from ConsumerGroups (CGs)

Plugins continually request a single event from dispatcher, which they then process. Dispatcher creates a ConsumerGroup that tracks how far the plugin has progressed through each topic.

The ConsumerGroups are actually stored on Kafka under a unique name in a custom naming format that indicates the name of the consumer and which stream it is consuming, e.g. historic/live/expedite. Because this format is standardised between dispatcher instances, they all watch the same consumer groups.

Also noteworthy: the topics under the Historical and Live consumer groups are the same; however, Historical subscribes to the topics starting at the earliest offset, while Live subscribes starting at the latest offset (Expedite also starts at latest).
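A sketch of how such consumer groups might be set up with a generic Kafka client is below. The group-name format is a guess at "consumer name plus stream type" and the topic name is a placeholder; the real Azul naming convention is not shown here. The key point it illustrates is that historic starts at the earliest offset while live and expedite start at the latest.

```python
# Illustration only: the group-name format is hypothetical and the topic is a
# placeholder. Historic starts at the earliest offset; live/expedite at latest.
from confluent_kafka import Consumer

def make_consumer(plugin_name: str, stream: str) -> Consumer:
    offset = "earliest" if stream == "historic" else "latest"
    return Consumer({
        "bootstrap.servers": "kafka:9092",
        "group.id": f"{plugin_name}.{stream}",    # hypothetical naming format
        "auto.offset.reset": offset,
    })

historic = make_consumer("azul-plugin-entropy", "historic")
live = make_consumer("azul-plugin-entropy", "live")
for consumer in (historic, live):
    consumer.subscribe(["events"])                # both streams watch the same topics
```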

The diagram below displays how dispatcher selects which events to provide to a new client or plugin (noting that a client can ignore historic events):