Skip to main content

Azul Plugin Unbox

Unbox is a common package and abstraction used by Azul for plugins that handle archive and compression formats. It is also extended to handle some executable packing formats like UPX.

It includes a password guessing option for formats that support password protection. This is seeded from:

  • Common known password lists via config
  • Password dictionaries supplied in events from upstream plugins/services

Development Installation

To install azul-plugin-unbox for development run the command (from the root directory of this project):

Different plugins have their own system requirements for required application/tools. See: unbox install for more information or check out the Dockerfile.

# install system dependencies
apt-get install -y --no-install-recommends $(grep -vE "^\s*(#|$)" debian.txt | tr "\n" " ")
# install non apt dependencies
sudo ./install-custom-packages.sh
# install package
pip install -e .

Usage: azul-plugin-unbox

Acts as a multiplugin, it's multiplugins handle the following:

7zip

Currently configured to process:

  • ZIP (handles more compression modes than alternate zip plugin (AES))
  • 7ZIP
  • ISO
  • Windows Installer

Example Output:

----- SevenZip results -----
OK

Output features:
box_compression: test_dir/test_subdir_file.txt - LZMA:16 7zAES:19
test_file_1.txt - LZMA:16 7zAES:19
test_file_2.txt - LZMA:16 7zAES:19
box_password: password
password: password
box_insertdate: test_file_1.txt - 2016-07-14 06:08:01
test_file_2.txt - 2016-07-14 06:08:22
test_dir/test_subdir_file.txt - 2016-07-18 03:37:31
box_type: sevenzip
box_count: 3
box_filepath: test_dir/test_subdir_file.txt
test_file_1.txt
test_file_2.txt

Generated child entities (3):
{'action': 'extracted'} <binary: 77f285fd549654a7f9a54ec4dfc7597fcfd66b6622c3b31b60047204f0a213f0>
content: 45 bytes
{'action': 'extracted'} <binary: 816ac3e617f55165dd748c5b6ff16fd05b79bf5104bfd41274635019b239d02b>
content: 50 bytes
{'action': 'extracted'} <binary: c6eedc388ac70e3b6962f0626434fd664e2c4a6105cc544d36ab11d006bdd5a0>
content: 39 bytes

Feature key:
box_compression: Compression method used on this file entry
box_password: Password used to unbox this binary
password: Password used to unbox this binary
box_insertdate: Date the file was inserted into the archive
box_type: The binary is of this box type
box_count: Number of items found in the box
box_filepath: This entity contains this filepath

arj

Currently configured to process:

  • ARJ Supports password protected archives.

Example Output:

----- Arj results -----
OK

Output features:
password: password
box_count: 3
box_type: arj
box_filepath: test_file_1.txt
test_file_2.txt
test_subdir_file.txt
box_password: password

Generated child entities (3):
{'action': 'extracted'} <binary: 77f285fd549654a7f9a54ec4dfc7597fcfd66b6622c3b31b60047204f0a213f0>
content: 45 bytes
{'action': 'extracted'} <binary: 816ac3e617f55165dd748c5b6ff16fd05b79bf5104bfd41274635019b239d02b>
content: 50 bytes
{'action': 'extracted'} <binary: c6eedc388ac70e3b6962f0626434fd664e2c4a6105cc544d36ab11d006bdd5a0>
content: 39 bytes

Feature key:
password: Password used to unbox this binary
box_count: Number of items found in the box
box_type: The binary is of this box type
box_filepath: This entity contains this filepath
box_password: Password used to unbox this binary

cabinet

Processes Microsoft Cabinet Format (.CAB) files including self-extracting EXE's. Currently configured to process:

  • CAB

Example Output:

----- Cab results -----
OK

Output features:
box_count: 2
box_type: cab
box_filepath: test1.txt
test2.txt
box_insertdate: test2.txt - 2018-03-12 00:40:32
test1.txt - 2018-03-12 00:40:36

Generated child entities (2):
{'action': 'extracted'} <binary: a4df9c5a55aa25e967a45401b3fe6955dccc381403c2574b6ef1ef6a9136e063>
content: 9 bytes
{'action': 'extracted'} <binary: 837ea69644a4435aacb379c9b3b14087576d5cbeabe8442a35f592e71d42ca72>
content: 17 bytes

Feature key:
box_count: Number of items found in the box
box_filepath: This entity contains this filepath
box_type: The binary is of this box type
box_insertdate: Date the file was inserted into the archive

chm

Processes Microsoft Compiled HTML Help Files, extracting any embedded files. Currently configured to process:

  • CHM

Example Output:

----- CHM results -----
OK

Output features:
box_type: chm
box_filepath: /Content/Main.htm
/Content/Page.htm
/Project.hhc
/Project.hhk
/_#_README_#_
box_count: 5

Generated child entities (5):
{'action': 'extracted'} <binary: 348773b69aeb3549b7dca28e899adb488b50c9958e99ab26b494eb02646f3d3b>
content: 136 bytes
{'action': 'extracted'} <binary: 5e47de2c21ac971e405fcd0bc54888e080a9e317bd0d1737bcac52c1601f5f92>
content: 449 bytes
{'action': 'extracted'} <binary: 83302c10e4838a67ceb39d3f11250251135e56e221701f8eecf5263d6de30577>
content: 379 bytes
{'action': 'extracted'} <binary: 5edf10501797afcc8c8612a83f847c1f9f0a5c4eac401cab9a9ffab8e01a76c3>
content: 109 bytes
{'action': 'extracted'} <binary: 2f6ea5d512de1d24baac526aa837371e7a1b15c5f3f31edb52f88ded4eba57f5>
content: 78 bytes

Feature key:
box_type: The binary is of this box type
box_filepath: This entity contains this filepath
box_count: Number of items found in the box

pdf

Processes PDF files, extracting their child streams and handling password decryption. It utilises the qpdf tool to produce a decrypted version of the PDF. Owner passwords (permission restrictions) are trivially stripped and user passwords guessed based on supplied password dictionaries.

Currently configured to process:

  • PDF

Example Output:

----- Pdf results -----
OK

Output features:
password:
box_password:
pdf_object_dictionary: 10 - << /BBox [ -112 420 708 420.1 ] /Filter /FlateDecode /Group << /CS /DeviceRGB /K true /S /Transparency >> /Length 8 /Subtype /Form /Type /XObject >>
6 - << /Filter /FlateDecode /Length 236 >>
13 - << /Filter /FlateDecode /Length 319 >>
14 - << /Filter /FlateDecode /Length 8210 /Length1 12652 >>
box_type: pdf
box_count: 5

Generated child entities (4):
{'action': 'extracted'} <binary: e6b611d975aae6bbee8e87751f94eafb009ca3ac102f549e3249a96a3f91dec3>
content: 11084 bytes
{'action': 'extracted', 'object_id': '6', 'filter': 'FlateDecode'} <binary: 0a1d13ef4359b4f9458911df6e3a27639561ef86ad397702e9903f8cde86a6cb>
content: 411 bytes
{'action': 'extracted', 'object_id': '13', 'filter': 'FlateDecode'} <binary: 310f2f065725beace3f3b8bb249c5cfcb597d9c491f2e85be79a51ca7fead6e0>
content: 570 bytes
{'action': 'extracted', 'object_id': '14', 'filter': 'FlateDecode'} <binary: 1c84e399ca23ff59969af26a938c85aa92490ef38db571d58a845e3c05924617>
content: 12652 bytes

Feature key:
password: Password used to unbox this binary
box_password: Password used to unbox this binary
pdf_object_dictionary: Object dictionary/id for the extracted PDF stream
box_type: The binary is of this box type
box_count: Number of items found in the box

rar

Handles extracting files from RAR archive format. Currently configured to process:

  • RAR Supports password protected archives.

Example Output:

----- Rar results -----
OK

Output features:
rar_compression: test_file_1.txt - 51
test_file_2.txt - 51
password: password
box_filepath: test_file_1.txt
test_file_2.txt
box_insertdate: test_file_1.txt - 2016-07-14 16:08:01
test_file_2.txt - 2016-07-14 16:08:22
box_password: password
box_count: 2
box_type: rar

Generated child entities (2):
{'action': 'unrar'} <binary: 77f285fd549654a7f9a54ec4dfc7597fcfd66b6622c3b31b60047204f0a213f0>
content: 45 bytes
{'action': 'unrar'} <binary: 816ac3e617f55165dd748c5b6ff16fd05b79bf5104bfd41274635019b239d02b>
content: 50 bytes

Feature key:
rar_compression: Compression used on the contained file
password: Password used to unbox this binary
box_filepath: This entity contains this filepath
box_insertdate: Date the file was inserted into the archive
box_password: Password used to unbox this binary
box_count: Number of items found in the box
box_type: The binary is of this box type

Usage: azul-unixarchive

This plugin handles unix system archive and compression formats. Currently configured to process:

  • GZIP
  • TAR
  • BZIP2

Example Output:

----- UnixArchive results -----
OK

Output features:
box_type: archive
box_filepath: testdir/.testing
testdir/test.yaml
testdir/README.md
box_count: 3
archive_encoding: utf-8

Generated child entities (27):
{'action': 'extracted'} <binary: 89829064945e65947a902e8b0bb8cb3b58b0d469ac291a62a3058ae9ff266556>
content: 463 bytes
{'action': 'extracted'} <binary: 88fbd1ef10e1c27809297180d1ae0960f0b5bf1f52f826566d87bb7c6a408731>
content: 2183 bytes
{'action': 'extracted'} <binary: 1c0008dbcd3883f86fc4aa9c53f0cc4a7c5a146e731a58d1a337518d3539d9de>
content: 275 bytes

Feature key:
box_count: Number of items found in the box
archive_encoding: Character Encoding used by this archive
box_type: The binary is of this box type
box_filepath: This entity contains this filepath

upx

Unpacks UPX packed executables for several OSes (Windows, Linux, MacOS). Currently configured to process:

  • Win32 EXE
  • Win32 DLL
  • DOS EXE
  • ELF
  • Mach-O

Example Output:

----- UPX results -----
OK

Output features:
upx_version: 3.94
box_count: 1
box_type: upx

Generated child entities (1):
{'action': 'unpacked'} <binary: 38a241ffbc8665eca72bbbd15e1e04d79f745fec7e3c31c3b12c1eaf820abb1c>
content: 161792 bytes

Feature key:
box_count: Number of items found in the box
box_type: The binary is of this box type
upx_version: Detected upx version used to pack executable

Automated usage in system:

azul-upx --server http://azul-dispatcher.localnet/

zip

Extracts contents of zip files using python's inbuilt zipfile package. Currently configured to process:

  • ZIP Supports password protected archives. (more robust than 7zip and handles some files that 7zip won't)

Note: This does not support all possible encryption/compression modes and as such, it is generally recommended to use 7zip in preference, which has more comprehensive support.

Example Output:

----- Zip results -----
OK

Output features:
box_type: zip
password: password
box_password: password
box_count: 2
box_filepath: test_file_1.txt
test_file_2.txt
box_insertdate: test_file_1.txt - 2016-07-14 16:08:02
test_file_2.txt - 2016-07-14 16:08:22
zip_compression: 0

Generated child entities (2):
{'action': 'unzipped'} <binary: 77f285fd549654a7f9a54ec4dfc7597fcfd66b6622c3b31b60047204f0a213f0>
content: 45 bytes
{'action': 'unzipped'} <binary: 816ac3e617f55165dd748c5b6ff16fd05b79bf5104bfd41274635019b239d02b>
content: 50 bytes

Feature key:
box_type: The binary is of this box type
password: Password used to unbox this binary
box_password: Password used to unbox this binary
box_count: Number of items found in the box
box_filepath: This entity contains this filepath
box_insertdate: Date the file was inserted into the archive
zip_compression: Compression used on this zip file

Python Package management

This python package is managed using a setup.py and pyproject.toml file.

Standardisation of installing and testing the python package is handled through tox. Tox commands include:

# Run all standard tox actions
tox
# Run linting only
tox -e style
# Run tests only
tox -e test

Dependency management

Dependencies are managed in the requirements.txt, requirements_test.txt and debian.txt file.

The requirements files are the python package dependencies for normal use and specific ones for tests (e.g pytest, black, flake8 are test only dependencies).

The debian.txt file manages the debian dependencies that need to be installed on development systems and docker images.

Sometimes the debian.txt file is insufficient and in this case the Dockerfile may need to be modified directly to install complex dependencies.