The Gabor-jet model is a tool used to compute the psychophysical dissimilarity between images. This webpage is designed to serve as an interactive guided tour of the model. Interested in learning more? Jump ahead to the detailed discussion section below.

To test the Gabor-jet model yourself, please upload your own images below. To use our default images instead, please select "Use our defaults" at Step 1.

Please note that the model's prediction of psychophysical dissimilarity will be excellent if the stimuli vary only metrically. To the extent that there are qualitative, nonaccidental differences among the stimuli (e.g., one face has glasses and the other doesn't, or one chair has a straight back and the other a curved back), psychophysical predictability will be reduced: the heightened sensitivity to these nonaccidental differences is presumably computed in cortical areas later than the early visual stages on which the Gabor-jet model is based.

Upload your images by clicking the "Choose File" buttons below, or click the "Use our defaults" button to use our default images instead. Once you finish uploading the images, a button will appear prompting you to grayscale them, since the model needs a single intensity value for each pixel. **Uploaded images with non-square dimensions will be resized to 256x256 pixels.**

A grid of red dots will appear overlaid on the grayscaled images. These dots denote the locations in the image space from which comparison values will later be extracted. Please press "Select Kernels" to continue to the next step.
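The sampling grid can be sketched as follows. This is an illustrative assumption about the layout: a 10x10 grid of evenly spaced points over a 256x256 image, with the exact spacing and margins chosen here for demonstration rather than taken from the applet.

```python
import numpy as np

# Illustrative 10x10 grid of sample points over a 256x256 image.
# The exact spacing and margins used by the applet are assumptions here.
size = 256
n = 10
coords = np.linspace(size / (n + 1), size - size / (n + 1), n)  # keep away from edges
rows, cols = np.meshgrid(coords, coords, indexing="ij")
grid = np.stack([rows.ravel(), cols.ravel()], axis=1)  # 100 (row, col) pairs
print(grid.shape)  # (100, 2)
```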

In this implementation of the Gabor-jet model, 40 Gabor wavelet kernels (5 scales x 8 orientations) are generated for each location on the image (marked by red dots in the previous step). Because there are 100 locations, that gives us 4000 kernels per image! To reduce visual clutter in the next step of our demonstration, we ask that you select five of these kernels (referred to as kernels A-E) using the menu below. Don't worry: we'll still use all 4000 kernels for our calculations, but we'll only show the values obtained with the five you select.
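The bookkeeping above can be made concrete with a short sketch that enumerates the kernel bank as (location, scale, orientation) triples; the index ranges come straight from the counts in the text:

```python
import itertools

# 100 grid locations x 5 scales x 8 orientations = 4000 kernels per image.
n_locations, n_scales, n_orientations = 100, 5, 8
kernels = list(itertools.product(range(n_locations),
                                 range(n_scales),
                                 range(n_orientations)))
print(len(kernels))  # 4000
```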

For each kernel, please specify the row and column, the orientation of the kernel, and the kernel's receptive field size (diameter in pixels). The fields are pre-populated with kernels clustered near the middle of the image at a variety of scales and orientations, but we encourage you to experiment with different combinations.

Below are the three uploaded images with your selected Gabor patches superimposed.

Next, we need to perform the kernel convolutions and extract values at the 100 red-dot grid locations. Please click the buttons in the order that they appear. With each button press, the corresponding magnitudes for the five kernels selected above will appear in the bar graph below. When you finish, the compute button will appear, allowing you to generate the dissimilarity rankings!

The units along the ordinate axis are arbitrary, and kernel magnitudes should be considered in relation to each other. For example, if two values for a given kernel are nearly identical, it is likely that the images had similar gradients at that position, scale, and orientation.

Kernel | Image 1 | Image 2 | Image 3
---|---|---|---
Kernel A | 0 | 0 | 0
Kernel B | 0 | 0 | 0
Kernel C | 0 | 0 | 0
Kernel D | 0 | 0 | 0
Kernel E | 0 | 0 | 0

All possible pairs were generated from the three images provided. For each image, we calculated a vector of 4000 values, one for each kernel. The Euclidean distance between two vectors is taken as the perceptual distance between the corresponding images. The most dissimilar pair is presented first, followed by pairs of decreasing dissimilarity. To start over with new images, please click the button provided or reload this page in your browser.
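The ranking step can be sketched as follows, with random vectors standing in for each image's 4000 kernel magnitudes (the image names and values are placeholders for illustration):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
# Stand-ins for the 4000-value magnitude vector computed for each image.
vecs = {f"Image {i + 1}": rng.random(4000) for i in range(3)}

# Euclidean distance for every pair, sorted most dissimilar first.
pairs = sorted(
    ((np.linalg.norm(vecs[a] - vecs[b]), a, b)
     for a, b in combinations(vecs, 2)),
    reverse=True)
for d, a, b in pairs:
    print(f"{a} vs {b}: {d:.2f}")
```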


Imagine two familiar faces; they could be friends, family, or celebrities! Got it? Now think: how similar are those faces? Can you quantify that similarity?

One issue plaguing research on face perception has been the lack of a quantitative metric for the perceived difference between faces. Our implementation of the Gabor-jet model (Lades et al., 1993) computes a single value that describes how dissimilar two faces really are. In a recent study (Yue et al., 2012), the Image Understanding Lab found that these dissimilarity values explain over 92% of the variance in human responses on a face match-to-sample discrimination task.

The Gabor-jet model (Lades et al., 1993) is designed to capture the response properties of simple cells in V1 hypercolumns. The model computes a single value that represents the similarity of two images with respect to V1 simple and complex cell filtering. These values have been shown to almost perfectly predict psychophysical similarity in discriminating metrically varying complex visual stimuli such as faces, blobs (resembling teeth), and simple volumes (geons) (Yue et al., 2012; Amir et al., 2012). Under the assumption that V1 captures metric variation, the sensitivity to the "qualitative" differences of complex stimuli, such as nonaccidental properties (NAPs) vs. metric properties (MPs), or differences in facial identity vs. expression (which are presumably rendered explicit in later stages), can be more rigorously evaluated. Here we introduce a new tool to conveniently explain and test this model in what is designed to be an engaging, interactive context. The tool can thus serve both methodological and didactic functions.

Gabor-like filters develop as part of the linear decomposition of natural images (Olshausen and Field, 1996), so the Gabor-like filtering characteristic of V1 simple cells is not unexpected. Such basis sets emerge in the first layer of leading convolutional neural networks for image recognition (e.g., Krizhevsky et al., 2012), or are simply assumed, as in the GIST model of Oliva and Torralba (2001), which uses multi-scale, multi-orientation Gabor filters to create a sparse description of image locations, much as each jet in the Gabor-jet model is composed of a set of Gabor filters at different scales and orientations sharing a common center in the image space. Similarly, the first layer of HMAX (Riesenhuber and Poggio, 1999) convolves image pixels with oriented Gabor filters before pooling responses (and then repeats those operations). So although the Gabor-jet model was developed almost a quarter of a century ago, its explicit measure of V1-based image similarity remains relevant given the widespread incorporation of Gabor filtering as the input stage of contemporary neurocomputational models of vision.

Our implementation of the Gabor-jet model follows that of Lades et al. in which each Gabor “jet” is modeled as a set of Gabor wavelet kernels at 5 scales and 8 orientations, with the center of their receptive fields tuned to the same point in the visual field (Figure 1). We employ a 10x10 square grid—and therefore 100 jets—to mark the positions in the image space from which kernel convolution values are extracted.
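A single jet can be sketched in Python as a complex Gabor wavelet bank sharing one center. The wavelength and bandwidth parameters below are illustrative assumptions chosen for readability, not the applet's actual values:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """A spatial-domain complex Gabor wavelet centered in a square patch.
    Parameter choices are illustrative, not the applet's exact settings."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # Gaussian envelope
    carrier = np.exp(1j * 2 * np.pi * x_theta / wavelength)  # complex sinusoid
    return envelope * carrier

# One "jet": 5 scales x 8 orientations, all sharing a common center.
jet = [gabor_kernel(64, wavelength=4 * 2**s, theta=o * np.pi / 8, sigma=2 * 2**s)
       for s in range(5) for o in range(8)]
print(len(jet))  # 40
```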

Given kernels of sufficiently large scale and a grid of sufficient density, some kernels' receptive fields (RFs) will overlap, as shown to the right. An Image Understanding Lab study (Xu et al., 2014) demonstrated that these large RFs account for the face configural effect "in which a difference in a single part appears more distinct in the context of a face than it does by itself." Example image used with permission from Xu et al.

After performing a 2D Fourier transform on the image, we multiply the Fourier-transformed image pixel-wise with these Gabor wavelet kernels (also in the Fourier domain) before transforming the image back out of the frequency domain. Next, we extract the magnitude and phase of the resultant image at given positions in the visual field; in this case, we use the 10x10 grid to mark the positions from which values are extracted. We concatenate these two values for each kernel in the jet, yielding a 100x80 matrix of values for each image, which can then be collapsed into a vector of 8000 values. We take the Euclidean distance between any two output vectors as the perceptual distance between the two face images that produced them, because the Euclidean distance has been shown to more closely reflect human behavior than other metrics, such as correlation (Yue et al., 2012). A MATLAB implementation of this model with increased functionality can be found here.
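The filter-and-extract pipeline can be sketched for a single kernel. This is a minimal illustration, using a random image as a stand-in for a grayscaled face and a generic frequency-domain Gaussian band as a stand-in for one Gabor kernel's Fourier transform:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((256, 256))  # stand-in for a grayscaled face

# Stand-in for one Gabor kernel in the frequency domain (a Gaussian band
# centered on a horizontal spatial frequency; values are illustrative).
fy, fx = np.meshgrid(np.fft.fftfreq(256), np.fft.fftfreq(256), indexing="ij")
kernel_f = np.exp(-((fx - 0.1)**2 + fy**2) / (2 * 0.02**2))

# Multiply in the frequency domain, then transform back to image space.
filtered = np.fft.ifft2(np.fft.fft2(image) * kernel_f)

# Extract magnitude and phase at one grid position, e.g. (row 128, col 128).
mag, phase = np.abs(filtered[128, 128]), np.angle(filtered[128, 128])
```

Repeating this for all 40 kernels at all 100 grid positions yields the 100x80 matrix of magnitudes and phases described above.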

If you use this web app in your own work, please cite as Margalit, E., Biederman, I., Herald, S. B., Yue, X., & von der Malsburg, C. (2016). An applet for the Gabor scaling of the differences between complex stimuli. Attention, Perception, & Psychophysics. 78(8), 2298-2306. doi:10.3758/s13414-016-1191-7.