*(Images: Lena and a brightened copy of Lena, the two example images compared throughout this post.)*

I did some research on image similarity comparison recently. In the end, I found a fast and simple method: dHash.

## Introduction

There are basically three approaches to measuring the similarity between two images:

- Keypoint matching, e.g. SIFT/SURF
- Histogram comparison, e.g. color histograms
- Image hashing, e.g. aHash/dHash/pHash

### Keypoint matching (Feature detection)

Compute an abstraction of the image information and make a local decision at every image point as to whether an image feature of the given type exists at that point. Then compare the detected features between the two images.

These methods can match images under different scales, rotations, and lighting.

One downside is the running time of a naive implementation: O(n^2m), where n is the number of keypoints in each image, and m is the number of images in the database.

### Histogram comparison

One of the simplest and fastest methods. The idea is that a forest will have a lot of green, and the sea will have a lot of blue. So, if you compare two pictures of forests, you'll get some similarity between the histograms, because both contain a lot of green.

It is too simplistic, though. A banana and a beach can look the same, since both are mostly yellow.
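As a concrete illustration, here is a minimal sketch of histogram comparison using histogram intersection. The `histogram` and `intersection` helpers and the pixel lists are hypothetical; real code would extract pixel values from decoded image files.

```python
# Histogram intersection: a simple similarity score between two images.
# Sketch only -- real code would build histograms from decoded image pixels.

def histogram(pixels, bins=16):
    """Count 0-255 pixel values into equal-width bins, normalized to sum to 1."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

def intersection(h1, h2):
    """1.0 means identical histograms, 0.0 means no overlap at all."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two "images" as flat pixel lists (made-up data: dark forests, bright beach).
forest_a = [30, 40, 35, 45, 50, 38, 42, 47]
forest_b = [32, 41, 36, 44, 52, 39, 40, 46]
beach    = [200, 210, 220, 215, 205, 225, 230, 218]

print(intersection(histogram(forest_a), histogram(forest_b)))  # high
print(intersection(histogram(forest_a), histogram(beach)))     # low
```

The two forest histograms overlap heavily, while forest and beach occupy disjoint brightness bins, so their intersection is zero.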

### Image hash

The main idea is that each image is reduced down to a small hash code or ‘fingerprint’ by identifying salient features in the original image file and hashing a compact representation of those features (rather than hashing the image data directly). Finally, count the number of bit positions that are different (Hamming distance).
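Comparing two hashes is then just an XOR plus a bit count. A quick sketch, using the hash value that appears later in this post (the second value is made up to differ in exactly one bit):

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Number of bit positions that differ between two 64-bit hashes."""
    return bin(h1 ^ h2).count("1")

# The second hash is hypothetical: the first with its bit 3 flipped.
print(hamming_distance(0x8A0303769B3EC8CD, 0x8A0303769B3EC8C5))  # 1
```

A distance of 0 means (very likely) the same image; small distances suggest near-duplicates.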

The pipeline (Origin → Reduce size → Reduce color → Hash) gives the following hashes for the two images:

Image | Hash |
---|---|
Lena | 00011000010101001100…1010100110001000000000 |
Bright Lena | 00011000010100001100…1010100110001000000000 |

## aHash

Average hash: for each pixel, output 1 if the pixel value is greater than or equal to the average, and 0 otherwise.

### Reduce size

The fastest way to remove high frequencies and detail is to shrink the image. In this case, shrink it to 8x8 so that there are 64 total pixels. Don’t bother keeping the aspect ratio, just crush it down to fit an 8x8 square. This way, the hash will match any variation of the image, regardless of scale or aspect ratio.

### Reduce color

The tiny 8x8 picture is converted to grayscale. This changes the hash input from 64 pixels (64 red, 64 green, and 64 blue values) to 64 grayscale values.

Average the colors. Compute the mean value of the 64 colors.

### Compute the bits

Each bit is simply set based on whether the color value is above or below the mean.

### Construct the hash

Set the 64 bits into a 64-bit integer. The order does not matter, just as long as you are consistent.
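The four steps above can be sketched in a few lines, assuming the image has already been shrunk to 8x8 and converted to grayscale (real code would use an imaging library such as Pillow for that part). The synthetic test image is made up for illustration:

```python
def ahash(pixels):
    """Average hash of an 8x8 grayscale image given as 64 values (0-255).

    Returns a 64-bit integer: bit i is 1 if pixel i >= the mean.
    """
    assert len(pixels) == 64
    mean = sum(pixels) / 64
    h = 0
    for p in pixels:
        # Shift bits in row-major order; any consistent order works.
        h = (h << 1) | (1 if p >= mean else 0)
    return h

# A synthetic image: left half dark, right half bright.
img = ([10] * 4 + [200] * 4) * 8
print(f"{ahash(img):016x}")  # 0f0f0f0f0f0f0f0f
```

Each row contributes the byte `0x0F` (four dark pixels below the mean, four bright ones above), which makes the bit order easy to verify by eye.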

## pHash

Perceptual hash: does the same as aHash, but first applies a Discrete Cosine Transform (DCT) and works in the frequency domain.

### Reduce size

Like Average Hash, pHash starts with a small image. However, the image is larger than 8x8; 32x32 is a good size.

This is really done to simplify the DCT computation, not because it is needed to reduce the high frequencies.

### Reduce color

The image is reduced to a grayscale just to further simplify the number of computations.

The result is a 32x32 matrix that looks like this:

$$

\left[

\begin{matrix}

88 & 85 & \cdots & 72 \\

87 & 87 & \cdots & 114 \\

\vdots & \vdots & \ddots & \vdots \\

12 & 76 & \cdots & 21 \\

\end{matrix}

\right]

$$

### Compute the DCT

The DCT separates the image into a collection of frequencies and scalars.

The result is again a 32x32 matrix and looks like this:

$$

\left[

\begin{matrix}

8365 & -1554 & \cdots & -17 \\

-254 & 338 & \cdots & 13 \\

\vdots & \vdots & \ddots & \vdots \\

-424 & 74 & \cdots & 1.4 \\

\end{matrix}

\right]

$$
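For reference, a naive orthonormal 2D DCT-II can be written straight from its definition. In practice you would use a library routine (e.g. SciPy's `dct`); this O(N⁴) sketch just makes the math explicit, and the flat test image is made up:

```python
import math

def dct_2d(matrix):
    """Naive orthonormal 2D DCT-II of an N x N matrix (O(N^4), fine for 32x32)."""
    n = len(matrix)

    def alpha(k):
        # Normalization factors for the orthonormal DCT-II.
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (matrix[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

# A flat (solid-color) image puts all its energy in the DC term out[0][0]:
flat = [[100] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))  # 400 (= 100 * N under orthonormal scaling)
print(round(coeffs[1][2]))  # 0
```

The large top-left (DC) coefficient and near-zero everything else for a flat image is exactly why pHash later excludes the DC term from its average.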

### Reduce the DCT

This is the magic step. While the DCT is 32x32, just keep the top-left 8x8. Those represent the lowest frequencies in the picture.

### Compute the average value

Like the Average Hash, compute the mean DCT value (using only the 8x8 DCT low-frequency values and excluding the first term since the DC coefficient can be significantly different from the other values and will throw off the average).

The DCT hash is based on the low-frequency 2D DCT coefficients starting at the second-lowest, leaving out the first (DC) term. This excludes completely flat image information (i.e. solid colors) from the hash description.

### Further reduce the DCT

Set each of the 64 hash bits to 0 or 1 depending on whether the corresponding DCT value is above or below the average. The result doesn't tell us the actual low frequencies; it only tells us the very rough relative scale of each frequency to the mean. The result will not vary as long as the overall structure of the image remains the same; this can survive gamma and color histogram adjustments without a problem.
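Putting the averaging and the bit assignment together, here is a sketch of this step. The `dct_low` input is assumed to be the 8x8 low-frequency block from the previous step, and the test values below are made up:

```python
def phash_bits(dct_low):
    """Turn an 8x8 block of low-frequency DCT values into a 64-bit hash.

    The mean excludes the DC term dct_low[0][0], so a solid-color offset
    does not skew the average.
    """
    flat = [dct_low[i][j] for i in range(8) for j in range(8)]
    mean = (sum(flat) - flat[0]) / 63  # exclude the DC coefficient
    h = 0
    for v in flat:
        # Strictly above the mean -> 1; any consistent convention works.
        h = (h << 1) | (1 if v > mean else 0)
    return h

# Hypothetical 8x8 block: a large DC term plus one strong coefficient.
block = [[0.0] * 8 for _ in range(8)]
block[0][0] = 1000.0  # DC term, excluded from the mean
block[3][4] = 63.0    # mean of the remaining 63 values is 1.0
print(f"{phash_bits(block):016x}")  # 8000000800000000
```

Only the DC slot and the one strong coefficient exceed the mean of 1.0, so exactly two bits are set.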

### Construct the hash

Set the 64 bits into a 64-bit integer. The order does not matter, as long as you are consistent. To see what this fingerprint looks like, set the values (+255 or -255 depending on whether each bit is 1 or 0) and convert the 32x32 DCT (with zeros for the high frequencies) back into a 32x32 image. The resulting hash: `8a0303769b3ec8cd`

At first glance, this might look like some random blobs… but look closer. There is a dark ring around her head and the dark horizontal line in the background (right side of the picture) appears as a dark spot.

`0001100001010100110001001100111010110100001010100110001000000000`

## dHash

Gradient (difference) hash: compute the difference between each pair of adjacent pixels and set a bit according to the direction of the gradient.

As far as speed goes, dHash is as fast as aHash, with very few false positives.

### Reduce size

The fastest way to remove high frequencies and detail is to shrink the image. In this case, shrink it to 9x8 so that there are 72 total pixels.

### Reduce color

Convert the image to a grayscale picture. This changes the hash from 72 pixels to a total of 72 colors. (For optimal performance, either reduce color before scaling or perform the scaling and color reduction at the same time.)

### Compute the difference

The dHash algorithm works on the difference between adjacent pixels. This identifies the relative gradient direction. In this case, the 9 pixels per row yield 8 differences between adjacent pixels. Eight rows of eight differences become 64 bits.

### Assign bits

Each bit is simply set based on whether the left pixel is brighter than the right pixel. The order does not matter, just as long as you are consistent.
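The whole dHash computation can be sketched as follows, assuming the image has already been shrunk to 9x8 and converted to grayscale (real code would use an imaging library for that part). The gradient test image is made up:

```python
def dhash(pixels):
    """Difference hash of a 9x8 grayscale image (9 columns, 8 rows).

    pixels: a list of 8 rows, each a list of 9 values (0-255).
    A bit is 1 when the left pixel is brighter than the right one.
    """
    h = 0
    for row in pixels:
        assert len(row) == 9
        # 9 pixels per row -> 8 adjacent-pixel differences.
        for left, right in zip(row, row[1:]):
            h = (h << 1) | (1 if left > right else 0)
    return h

# A gradient that gets darker left-to-right sets every bit:
rows = [[255 - 20 * c for c in range(9)] for _ in range(8)]
print(f"{dhash(rows):016x}")  # ffffffffffffffff
```

A left-to-right darkening gradient makes every left pixel brighter than its right neighbor, so all 64 bits come out as 1; the reverse gradient gives all zeros.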