Entropy Across Multiple Documents

The entropy across multiple documents can be used to analyze unknown binary file headers.
The entropy across multiple network packets can be used to analyze protocol data.


The previous graph makes it easy to distinguish the header, which is similar across all documents (low entropy and low dispersion), from the data, which is specific to each document (high entropy and high dispersion).
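
As a rough illustration of what such a graph plots, the sketch below computes one entropy value and one dispersion value per byte offset across several documents. It is not the tool's implementation: the function names are hypothetical, and because the text does not define dispersion, the sketch assumes it means the number of distinct byte values seen at an offset.

```python
import math
from collections import Counter


def position_entropy(documents, length):
    """Shannon entropy (in bits) of the byte at each offset, across documents."""
    result = []
    for offset in range(length):
        # Only documents long enough to have a byte at this offset contribute.
        values = [doc[offset] for doc in documents if offset < len(doc)]
        if not values:
            result.append(0.0)
            continue
        total = len(values)
        counts = Counter(values)
        result.append(-sum((c / total) * math.log2(c / total) for c in counts.values()))
    return result


def position_dispersion(documents, length):
    """Number of distinct byte values at each offset (assumed definition)."""
    return [
        len({doc[offset] for doc in documents if offset < len(doc)})
        for offset in range(length)
    ]


# Four made-up documents sharing a 4-byte header followed by differing data.
docs = [b"HDR\x01abcd", b"HDR\x01wxyz", b"HDR\x01a1b2", b"HDR\x01qrst"]
entropy = position_entropy(docs, 8)
dispersion = position_dispersion(docs, 8)
```

With these sample documents, the first four offsets have zero entropy and a single value each, while the remaining offsets vary, which is the header/data boundary the graph is meant to reveal.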

Note: the header can still show high entropy and dispersion if it contains a checksum. The typical case is high entropy for a single value, which spans 1, 2, 4 or 8 consecutive bytes depending on the checksum size.

To see the possible values at a specific index, move the mouse cursor over the dispersion graph.
The values are displayed in a tooltip with the percentage of usage for each value, ordered by percentage of usage (highest first).
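
The content of such a tooltip can be approximated with a frequency count at one offset. This is only a sketch under assumptions: `value_usage` is a hypothetical helper, and the sample documents are made up.

```python
from collections import Counter


def value_usage(documents, index):
    """Return (byte value, percent of usage) pairs for one offset, most used first."""
    values = [doc[index] for doc in documents if index < len(doc)]
    total = len(values)
    if total == 0:
        return []
    # Counter.most_common() yields entries sorted by descending count.
    return [
        (value, 100.0 * count / total)
        for value, count in Counter(values).most_common()
    ]


docs = [b"\x01\x00", b"\x01\x10", b"\x02\x20", b"\x01\x30"]
usage = value_usage(docs, 0)
```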

Because many protocols and file headers store information in single bits rather than whole bytes (see the IP, UDP and TCP headers for example), cross-document bit entropy and dispersion are also computed, to refine the analysis once the header size has been found.
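
The bit-level idea can be sketched as a binary entropy per bit position, computed over the header bytes only. The function name, the most-significant-bit-first ordering and the choice to skip documents shorter than the header are assumptions made for this example, not a description of the tool.

```python
import math


def bit_entropy(documents, header_size):
    """Binary entropy (0.0 to 1.0) of every bit in the first header_size bytes."""
    # Documents shorter than the header are skipped (an assumption of this sketch).
    docs = [d for d in documents if len(d) >= header_size]
    result = []
    for offset in range(header_size):
        for bit in range(7, -1, -1):  # most significant bit first
            ones = sum((d[offset] >> bit) & 1 for d in docs)
            p = ones / len(docs) if docs else 0.0
            if p in (0.0, 1.0):
                # A bit that never changes carries no information.
                result.append(0.0)
            else:
                result.append(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return result


# Made-up first bytes: 0x45 and 0x46 differ only in their two lowest bits.
docs = [bytes([0x45]), bytes([0x46]), bytes([0x45]), bytes([0x46])]
bits = bit_entropy(docs, 1)
```

Here the six high bits are constant across the samples and only the two low bits vary, a distinction that byte-level entropy would hide.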


Network Protocol Analysis

Entropy across multiple packets can be computed for different network layers, so you can analyze data for a specific network layer.
The supported layers are: Raw, Ethernet, IP (v4 or v6), UDP, TCP, ICMP echo request and ICMP echo reply (for IPv4 or IPv6).

Filtering ensures that only the packets you are interested in are analyzed. Note: IP filters support both IPv4 and IPv6 formats.
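
One way to compare a packet address with a filter for both address families is to compare parsed addresses rather than strings, so that equivalent IPv6 spellings match. The sketch below uses Python's standard `ipaddress` module; `ip_matches` is a hypothetical helper, and how the tool itself parses filters is not stated here.

```python
import ipaddress


def ip_matches(packet_ip, filter_ip):
    """Compare two addresses by value, for IPv4 or IPv6 text forms."""
    return ipaddress.ip_address(packet_ip) == ipaddress.ip_address(filter_ip)


# Compressed and fully expanded IPv6 forms denote the same address.
same = ip_matches("2001:db8::1", "2001:0db8:0000:0000:0000:0000:0000:0001")
```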

An important step in network protocol analysis is to treat queries and responses separately, because they may have different headers.

Queries and responses can be split by filtering on source IP and destination IP and, for UDP and TCP, on source port and destination port.
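
The splitting can be sketched as two filters over already-decoded packets. Everything named here is an assumption for illustration: `Packet` is a hypothetical record type rather than the tool's internal one, and the addresses and payloads are made-up sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    """Hypothetical decoded packet; ports are None for protocols without ports."""
    src_ip: str
    dst_ip: str
    src_port: Optional[int]
    dst_port: Optional[int]
    payload: bytes


def split_by_direction(packets, server_ip, server_port):
    """Separate packets sent to the server (queries) from packets it sent back."""
    queries = [
        p for p in packets
        if p.dst_ip == server_ip and p.dst_port == server_port
    ]
    responses = [
        p for p in packets
        if p.src_ip == server_ip and p.src_port == server_port
    ]
    return queries, responses


packets = [
    Packet("192.0.2.10", "192.0.2.53", 40000, 53, b"\x12\x34\x01\x00"),
    Packet("192.0.2.53", "192.0.2.10", 53, 40000, b"\x12\x34\x81\x80"),
    Packet("192.0.2.11", "192.0.2.53", 40001, 53, b"\xab\xcd\x01\x00"),
]
queries, responses = split_by_direction(packets, "192.0.2.53", 53)
```

Each group's payloads can then be analyzed on their own, so that a field present only in responses does not blur the query header.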

Currently, only Wireshark packet capture files are supported (pcap and pcapng).
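
The two formats can be told apart from the first four bytes of the file. The sketch below checks the well-known pcap magic numbers (in either byte order, with microsecond or nanosecond timestamps) and the pcapng Section Header Block type. `capture_format` is a hypothetical helper, and whether the tool detects formats this way is an assumption.

```python
# pcap magic number 0xa1b2c3d4 (microseconds) or 0xa1b23c4d (nanoseconds),
# stored in the byte order of the machine that wrote the file.
PCAP_MAGICS = {
    b"\xa1\xb2\xc3\xd4",
    b"\xd4\xc3\xb2\xa1",
    b"\xa1\xb2\x3c\x4d",
    b"\x4d\x3c\xb2\xa1",
}
# pcapng files start with a Section Header Block of type 0x0A0D0D0A.
PCAPNG_MAGIC = b"\x0a\x0d\x0d\x0a"


def capture_format(first_bytes):
    """Return "pcap", "pcapng", or None based on the first four bytes."""
    magic = first_bytes[:4]
    if magic in PCAP_MAGICS:
        return "pcap"
    if magic == PCAPNG_MAGIC:
        return "pcapng"
    return None
```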




File Header Analysis

The entropy across multiple files can be computed for a list of files or for a directory.
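
A minimal sketch of gathering the input, assuming that a directory is read without recursing into subdirectories and that only the first bytes of each file are needed. The helper names and the 64-byte default are hypothetical.

```python
from pathlib import Path


def collect_files(inputs):
    """Expand a mix of file and directory paths into a flat list of files."""
    files = []
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Non-recursive, sorted for a stable order (assumptions of this sketch).
            files.extend(sorted(p for p in path.iterdir() if p.is_file()))
        elif path.is_file():
            files.append(path)
    return files


def read_headers(files, size=64):
    """Read the first `size` bytes of each file."""
    headers = []
    for path in files:
        with open(path, "rb") as handle:
            headers.append(handle.read(size))
    return headers
```

The resulting list of byte strings is the kind of input a per-offset entropy computation would take.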