How does ssDeep work?

Table of Contents

ssDeep [1] is a fuzzy hashing algorithm which employs a similarity digest in order to determine whether the hashes that represent two files have similarities. For instance, if a single byte of a file is modified, the ssDeep hashes of the original file and the modified file are considered highly similar.

What is ssDeep in Python?

This is a straightforward Python wrapper for ssdeep by Jesse Kornblum, which is a library for computing context triggered piecewise hashes (CTPH). Also called fuzzy hashes, CTPH can match inputs that have homologies.

What is fuzzy hashing?

Fuzzy hashing is a type of compression function for calculating the similarity between digital files. It attempts to automate the process of grouping similar malware. Fuzzy hash functions hold a certain tolerance for changes, and can tell how different two files are by comparing the similarity of their outputs.

What is ssdeep in Linux?

ssdeep is a tool for recursive computing and matching of Context Triggered Piecewise Hashing (aka Fuzzy Hashing). Fuzzy hashing is a method for comparing similar but not identical files.

What is ssdeep value?

SSDEEP creates a hash value that attempts to detect the level of similarity between two files at the binary level. This is different from a cryptographic hash (like SHA1) because a cryptographic hash can check exact matches (or non-matches).

What is the context triggered piecewise hashing method used for?

Context triggered piecewise hashing is a powerful new method for computer forensics. It will enable examiners to associate files that previously would have been lost in vast quantities of data that now make up an investigation.

What is Ssdeep hash?

Introduction. ssdeep is a program for computing context triggered piecewise hashes (CTPH). Also called fuzzy hashes, CTPH can match inputs that have homologies. Such inputs have sequences of identical bytes in the same order, although bytes in between these sequences may be different in both content and length.

What is malware hashing?

Hashing is a common method used to uniquely identify malware. The malicious software is run through a hashing program that produces a unique hash that identifies that malware (a sort of fingerprint).

What is djb2?

If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes. you are not likely to do better with one of the “well known” functions such as PJW, K&R[1], etc. Also see tpop pp.

How many Ssdeep hashes are needed for clustering?

Furthermore, clustering (or grouping) based on ssDeep requires every ssDeep hash to be compared against every other hash. This means that if you are clustering 1,000 ssDeep hashes, 499,500 (the number of pairs among 1,000 elements) ssDeep comparison function calls are required.

How do you use Ssdeep?

ssDeep is useful when searching for similar files. For instance, two malware samples generated by the same builder which inserts configuration statically into a stub sample, may be easy to identify as having a high similarity. In the past, I have used ssDeep to preprocess a large number of samples.

What is Ssdeep hashing algorithm?

How do I optimize Ssdeep comparisons at scale?

My methodology for optimizing ssDeep comparisons at scale focuses on reducing the number of ssDeep hashes that need to be compared, which reduces the search space. This methodology avoids the need for a custom-developed library to conduct ssDeep comparisons.