DFT, DCT, and DST

Understanding it fast:

  • very good slides (in German) about DFT by Klaus Toennies with some examples of image → (amplitude/magnitude, phase) spectrum

What is this magnitude/amplitude? What is this phase?


Briefly, the MAGNITUDE tells “how much” of a certain frequency component is present and the PHASE tells “where” the frequency component is in the image

How to do a DCT with OpenCV?

There is a DFT sample in the OpenCV samples folder (see dft.cpp), but unfortunately no DCT sample :(

The reference docu about OpenCV's dct() function tells you how to call the method, but there are no details how to prepare the input image matrix :(

Here is an important note about the matrix type needed for the dct() function (CV_64F) which helped me a lot.

It took me an hour, but now I figured out how to do the DCT using OpenCV.

There are two important points:

  • image dimensions have to be a multiple of 2
  • image input matrix that goes into dct() has to be of type float

Here is the code:

// Read image
Mat img = imread("stanzi_and_me_bahamas.png", CV_LOAD_IMAGE_GRAYSCALE);
 
// Make sure the both image dimensions are a multiple of 2
Mat img2;
int w = img.cols;
int h = img.rows;
int w2,h2;
if (w % 2 == 0)
    w2 = w;
else
    w2 = w+1;
if (h % 2 == 0)
    h2 = h;
else
    h2 = h+1;
copyMakeBorder(img, img2, 0, h2-h, 0, w2-w, IPL_BORDER_REPLICATE);
 
// Grayscale image is 8bits per pixel,
// but dct() method wants float values!
Mat img3 = Mat( img2.rows, img2.cols, CV_64F);
img2.convertTo(img3, CV_64F);
imwrite("W:\\tmp\\orig.png", img3);
 
// Let's do the DCT now: image => frequencies
Mat freq;
dct(img3, freq);
 
// Save a visualization of the DCT coefficients
imwrite("W:\\tmp\\dct.png", freq);
 
for (int Start=100; Start>0; Start-=1)
{
    // Set some frequencies to 0
    for (int y=Start; y<freq.rows; y++)
    {
        for (int x=Start; x<freq.cols; x++)
        {
            freq.at<double>(y,x) = 0.0;
        }
    }
 
    // Do inverse DCT: (some low) frequencies => image
    Mat dst;
    idct(freq, dst);
 
    // Show frame nr
    char txt[100];
    sprintf(txt, "%04d", Start);
    cv::putText( dst, txt, Point(10,20),
     CV_FONT_HERSHEY_SIMPLEX, 0.5, CV_RGB(0,0,0) );
 
    // Save visualization of reconstructed image where
    //we have thrown away some of the high frequencies
    char fname[500];
    sprintf(fname, "W:\\tmp\\idct_%04d.png", Start);
    imwrite(fname, dst);
}

The original image

will be transformed to a frequency matrix which looks visualized using gray-values like this:

When we iteratively take more and more higher frequencies as input for the idct() function, we can see how we can see more and more details:

The more high frequency cosine (gray value intensity) waves we superimpose the more details of the original image are represented in the inverse (idct()) DCT'ed (based on frequency representation reconstructed) image!

These images were generated with the above code which should show you how to use OpenCV's dct() function.

Note that using only 100×100 = 10000 frequencies values of the original 614×462 = 283668 pixel gray values, i.e. using only 10000/283668 = 0.03 = 3% of the original data size, we get a quite good reconstructed image :)

So let's save only these 100×100 low frequencies. That's compression!

How to do a DFT with OpenCV?

Read here!

Starting with the original dft.cpp OpenCV example, I made some modifications (computing magnitude spectrum for the image coming from a webcam, showing a zoom-into the center of the magnitude spectrum, …)

Here you can see the result:

You need to install a Flash Player to watch this video!

You can see at least 3 things:

  • as you rotate your object, the magnitude spectrum rotates as well
  • shrinking an object → magnitude spectrum seems to zoom (near image big structures become far small structures, i.e. low frequencies are “shifted” to high frequencies). Ok, more or less ;-)
  • the different geometric objects produce typical magnitude spectrum patterns

And that's the corresponding code of the demo:

cv::VideoCapture videoCapture; // in case a video media is used, its manager is declared here
videoCapture.open(1);
 
int counter = 0;
while (true)
{
    // Get image from camera
    Mat camimg;
    videoCapture >> camimg;
 
    // Convert that image to grayscale
    Mat img;
    cvtColor(camimg,img,CV_RGB2GRAY);
 
    // Prepare grayscal image <img> for DFT
    int M = getOptimalDFTSize( img.rows );
    int N = getOptimalDFTSize( img.cols );
    Mat padded;    
    copyMakeBorder(img, padded, 0, M - img.rows, 0, N - img.cols, BORDER_REPLICATE, Scalar::all(0));
 
    // Prepare matrix for DFT result: <complexImg>
    Mat planes[] = {Mat_<float>(padded), Mat::zeros(padded.size(), CV_32F)};
    Mat complexImg;
    merge(planes, 2, complexImg);
 
    // DFT
    dft(complexImg, complexImg);
 
    // Get magnitude spectrum from complex result matrix
    // Compute log(1 + sqrt(Re(DFT(img))**2 + Im(DFT(img))**2))
    split(complexImg, planes);
    magnitude(planes[0], planes[1], planes[0]);    
    Mat mag = planes[0];
    mag += Scalar::all(1);
    log(mag, mag);
 
    // Crop the spectrum, if it has an odd number of rows or columns
    mag = mag(Rect(0, 0, mag.cols & -2, mag.rows & -2));
 
    // Rearrange the quadrants of Fourier image
    // so that the origin is at the image center
    int cx = mag.cols/2;
    int cy = mag.rows/2;
    Mat tmp;
    Mat q0(mag, Rect(0, 0, cx, cy));
    Mat q1(mag, Rect(cx, 0, cx, cy));
    Mat q2(mag, Rect(0, cy, cx, cy));
    Mat q3(mag, Rect(cx, cy, cx, cy));    
 
    q0.copyTo(tmp);
    q3.copyTo(q0);
    tmp.copyTo(q3);
 
    q1.copyTo(tmp);
    q2.copyTo(q1);
    tmp.copyTo(q2);
 
    // Get some region about the center of the magnitude spectrum
    int r = 10;
    Mat roi(mag, Rect(mag.cols/2-r, mag.rows/2-r, 2*r,2*r));
    Mat MagPartly;
    roi.copyTo(MagPartly);
 
    // padded is an 8UC1 image with values in the range 0-255
    // Make mag a 8UC1 image with values in that range as well
    // so that we can copy both images to a bigger one
    normalize(mag, mag, 0, 255, CV_MINMAX);
    mag.convertTo(mag, CV_8UC1);
 
    // Separately normalize mag and MagPartly
    normalize(MagPartly, MagPartly, 0, 255, CV_MINMAX);
    MagPartly.convertTo(MagPartly, CV_8UC1);
 
    // Zoom MagPartly
    Mat MagZoomed;
    resize(MagPartly, MagZoomed, Size(padded.cols,padded.rows));
 
    // Prepare one big output image with:
    // | grayscale image |        magnitude spectrum image |
    // |    empty        | zoomed magnitude spectrum image |
    int w = padded.cols * 2;
    int h = padded.rows * 2;
    Mat bigimg(h,w, CV_8UC1);
    Mat roi1(bigimg, Rect(0,0, padded.cols, padded.rows));
    padded.copyTo(roi1);
    Mat roi2(bigimg, Rect(padded.cols,0, padded.cols, padded.rows));
    mag.copyTo(roi2);
    Mat roi3(bigimg, Rect(padded.cols,padded.rows, padded.cols, padded.rows));
    MagZoomed.copyTo(roi3);
 
    // Show images    
    imshow("big img", bigimg);
 
    // Save images
    char fname[500];
    sprintf_s(fname, "W:\\tmp\\dft%05d.png", counter);
    imwrite(fname, bigimg);
    counter++;
 
    waitKey(1);
}

And here is the corresponding code for a DCT demo:

cv::VideoCapture videoCapture; // in case a video media is used, its manager is declared here
videoCapture.open(1);
 
int counter = 0;
while (true)
{
    // Get image from camera
    Mat camimg;
    videoCapture >> camimg;
 
    // Convert that image to grayscale
    Mat img;
    cvtColor(camimg,img,CV_RGB2GRAY);
 
    // Make sure the both image dimensions are a multiple of 2
    Mat img2;
    int w = img.cols;
    int h = img.rows;
    int w2,h2;
    if (w % 2 == 0)
        w2 = w;
    else
        w2 = w+1;
    if (h % 2 == 0)
        h2 = h;
    else
        h2 = h+1;
    copyMakeBorder(img, img2, 0, h2-h, 0, w2-w, IPL_BORDER_REPLICATE);
 
    // Grayscale image is 8bits per pixel,
    // but dct() method wants float values!
    Mat img3 = Mat( img2.rows, img2.cols, CV_64F);
    img2.convertTo(img3, CV_64F);
 
    // Let's do the DCT now: image => frequencies
    Mat freq;
    dct(img3, freq);
    freq += Scalar::all(1);
    log(freq, freq);
 
    // Get some region starting at the origin of the DCT spectrum
    int r = 25;
    Mat roi(freq, Rect(0,0,2*r,2*r));
    Mat FreqPartly;
    roi.copyTo(FreqPartly);
 
    normalize(freq, freq, 0, 255, CV_MINMAX);
    freq.convertTo(freq, CV_8UC1);
 
    normalize(FreqPartly, FreqPartly, 0, 255, CV_MINMAX);
    FreqPartly.convertTo(FreqPartly, CV_8UC1);
 
    // Zoom FreqPartly
    Mat FreqZoomed;
    resize(FreqPartly, FreqZoomed, Size(img2.cols,img2.rows));
 
    // Prepare one big output image
    int width_bigimg  = img2.cols * 2;
    int height_bigimg = img2.rows * 2;
    Mat bigimg(height_bigimg,width_bigimg, CV_8UC1);
    Mat roi1(bigimg, Rect(0,0, img2.cols, img2.rows));
    img2.copyTo(roi1);
    Mat roi2(bigimg, Rect(img2.cols,0, img2.cols, img2.rows));
    freq.copyTo(roi2);
    Mat roi3(bigimg, Rect(img2.cols,img2.rows, img2.cols, img2.rows));
    FreqZoomed.copyTo(roi3);
 
    // Show images    
    imshow("big img", bigimg);
 
    // Save images
    char fname[500];
    sprintf_s(fname, "W:\\tmp\\dct%05d.png", counter);
    imwrite(fname, bigimg);
    counter++;
 
    waitKey(1);
}

What is the difference between DFT & DCT?

Here I found some good answers regarding this question:


The Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT) perform similar functions: they both decompose a finite-length discrete-time vector into a sum of scaled-and-shifted basis functions. The difference between the two is the type of basis function used by each transform; the DFT uses a set of harmonically-related complex exponential functions, while the DCT uses only (real-valued) cosine functions.

The DFT is widely used for general spectral analysis applications that find their way into a range of fields. It is also used as a building block for techniques that take advantage of properties of signals' frequency-domain representation, such as the overlap-save and overlap-add fast convolution algorithms.

The DCT is frequently used in lossy data compression applications, such as the JPEG image format. The property of the DCT that makes it quite suitable for compression is its high degree of “spectral compaction;” at a qualitative level, a signal's DCT representation tends to have more of its energy concentrated in a small number of coefficients when compared to other transforms like the DFT. This is desirable for a compression algorithm; if you can approximately represent the original (time- or spatial-domain) signal using a relatively small set of DCT coefficients, then you can reduce your data storage requirement by only storing the DCT outputs that contain significant amounts of energy.


In particular, it is well known that any discontinuities in a function reduce the rate of convergence of the Fourier series…the smoother the function is, the fewer terms in its DFT or DCT are required to represent it accurately, and the more it can be compressed… However, the implicit periodicity of the DFT means that discontinuities usually occur at the boundaries… In contrast, a DCT where both boundaries are even always yields a continuous extension at the boundaries. This is why DCTs…generally perform better for signal compression than DFTs and DSTs. In practice, a type-II DCT is usually preferred for such applications, in part for reasons of computational convenience.


Cosine transforms are nothing more than shortcuts for computing the Fourier transform of a sequence with special symmetry (e.g. if the sequence represents samples from an even function).


The difference between a Discrete Fourier Transform and a Discrete Cosine transformation is that the DCT uses only real numbers, while a Fourier transform can use complex numbers. The most common use of a DCT is compression. It is equivalent to a FFT of twice the length.


From Wikipedia:

In particular, a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are common.

Differences:

  • DFT: gives you complex numbers
  • DCT: gives you only real numbers
  • DCT: better for compression applications than DFT
 
public/dft_dct_dst.txt · Last modified: 2012/02/22 01:08 (external edit) · []
Recent changes RSS feed Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki