Thursday, 26 March 2009

Canny Edge Detection Steps

As promised in my previous post and in response to a comment  , here is a step by step explanation of the Canny Edge Detector. I have some images from a previous post available here.

Firstly we need to apply a Gaussian Point Spread Function (PSF) to our image. This will make the image seem slightly blurry. Increasing the width of the filter will of course increase the blurriness of the image. A wider filter tends to cause the Canny Edge detector to detect longer smoother lines while a narrow filter allows it to find shorter lines.

I almost always use a Gaussian filter of 5x5 with a sigma of 1.4 which gives a matrix of:

2   4    5    4    2
4   9    12  9    4
5   12  15  12  5
4   9    12  9    4
2   4    5    4    2

I find it gives generally nice results and finds enough detail for my purposes. If you looking for more regular / longer features its worth trying a wider filter.

Don't forget to multiple your result by the reciprocal of the sum of the entries in your matrix. For this one its 1/159.  And remember to NOT write the result back to your source image while you are still busy calculating.

The Canny edge detector is quite sensitive to noise in an image so the above stage of making the image blurry does help in this regard.

Next step is to actual find some edges.  The sobel edge detector is probably the most commonly used one but you can use others.

The sobel matrix (kernel) for calculating an approximate to the derivative in the y direction is:

1      2     1
0      0     0
-1  -2   -1

after we convolve with the image (essentially mutiply with the source image and then add together and store the result) we call this Gy.  

Do the same in the x direction using the following kernel:

1     0     -1
2     0     -2
1     0     -1

(notice it is the same as the y kernel but is sensitive to changes in x direction) We call this Gy.

Basically we have estimated the rate of change of values in the underlying image (derivative) to get the edges which are areas where values change rapidly. This implies higher values in the result indicate regions of more rapid change.

So we now have a estimate of the first derivative for the x and y directions for each point, using this we can calculate the angle at the point using: 

theta = arctan (Gy/Gx)

And the gradient strength or magnitude (G) by:

G = sqrt ( Gx^2 + Gy^2)      ie the distance from 0 in both directions

Generally theta is rounded to the vertical, horizontal and diagonal directions but you dont need to do this if you are trying to make something more senstive to different angles.

Non maximal suppression can now be applied to the image to get rid of any additional "angle" noise. We do this by checking if the gradient magnitude is a local maximum in the gradient direction. Do this by passing a nxn window (typically 3x3) over the results from the previous stage.  This last step serves to make the edges more defined.

I hope this more in depth explanation helps explain the process a bit better. I have left out implementation / optimization as they are quite dependant on your underlying architecture.  A good article for optimizing a "blur" filter on a CUDA capable GPU is here.

Remember you can use any PSF (blur function) and any edge detector function. I have only shown the most common (Gaussian and Sobel) here.


  1. All the rest is easy but you gloss over the edge thinning problem. I tried doing an implementation of NMS based on descriptions I found on the net. Quite consistently I find that I end up either discarding too much information or not enough of it. Any pointers - or pseudo code, C code, or Delphi code, or VB code... - for NMS would be hugely appreciated.

  2. Non-Maximal Suppression is rather tricky as it is easy to discard too little information or too much.
    I can't go into too many details as it gets very close to what I do for work but I tend to base it on the document type (if known) or the "crispness" of the straight lines in the image. Luckily I only really deal with scanned text so my problem space is vastly reduced.

    I can recommend "Algorithms for Image Processing and Computer Vision" by J.R. Parker, although it is an older book and doesnt include a lot of the more recent developments it is a very good starting point and includes source code in C.

  3. can u pls tell me how will i run gaussian mask over the image.
    i mean do i have to run it in the same manner as derivative mask are run??

  4. Hi Ajay,

    In the simplist implementation you would take an element of your image, find the surrounding 24 points, apply the Gaussian equation to each point and sum. Divide by the factor you calculated for your mask and then save the value in another image. Repeat until done.

    Of course this is a very poor way of doing it as the Gaussian is a convolution and is therefore seperable. You will find sample Gaussian convolution code here:
    Here we are calculating the Gaussian for row elements at a time and when we have enough of them completed we compute the column.

    Hope this helps


  5. Hi Barrett
    Thanx for replying. I was thinking about 1st method which is inefficient as u told.
    The second method u suggested is better than previous.
    Can u pls explain as what are we exactly doing in the procedure??

  6. Hi Ajay,

    Sorry for delay in reply....

    I assume you have the CUDA kernel which does both x and y convolution at once. It is a rather tricky kernel to understand as it is quite optimized.

    The basic principle is to compute the x direction gaussian first using a 1D gaussian function (equation for it is here:

    once all the 1D gaussians have been computed in the x direction take their results and apply the 1D function again in the y direction. The difference is that you are now applying the function to the results obtained from the in the first step and not on the raw data.

    The result will be the same as if you had applied the 2D Gaussian to the data.

    The cuda kernel is a little different in that it starts computing the y direction as soon as enough data is available rather than waiting for the x direction to complete fully. This ensures a higher throughput with greater shared mem usage.


  7. hi barrett,
    i knew you will dont be sorry..
    i read ur reply.

    when we apply sobel operator. mask [1 2 1]' is used for x direction and then
    [1 0 -1] is applied to get final o/p as it would be obtained had we applied the whole mask at atime. you know what i mean.([1 2 1]' is tranpose of [1 2 1]).

    if u got my point in the above statement then pls tell me if i have to split the smoothing filter into x and y direction then what will be masks for respective directions.

    thanx Ajay

  8. Hi Ajay,

    I see what you mean now :)

    For the smoothing filter (Gaussian in this example) you can just apply the function in each direction.
    Once the result from both directions have been calculated I then multiply the Amplitude in and divide by the reciprocal of the sum of all the factors.
    You can do the Amplitude multiplication in each dir by then multiply by the sqrt of A. The same goes for the divide.
    If you look at the equation of the 2D gaussian and the 1D versions you will see why you need to handle the Amplitude differently.
    I just do the mul and divide at the end to save myself a few operations.


  9. hi barrett
    look at this URL
    see page no 7.i used the same concept ,of splitting 2D mask ,in the mask
    stated by you but did not get desired result.
    pls suggest something on this.

  10. Hi Ajay,

    On page 7 they are using a Difference of Gaussian - ie they compute different sized gaussian filters and then subtract them. Its not quite the same as discussed here. Differences of Gaussian are rather useful in smoothing out high freq noise. It may also even be used in our own neurons for visual processing. They are also rather useful for smoothing random noise out of image documents before ocr'ing although you may need to rebuild some letters afterwards (i's for example)

    On page 3 they are using a similar technique. Although they are using state machines it's essentially what I am doing in the cuda kernel. Note that they don't seem to state their A or sigma in computing the coefficients, that is why results may vary.
    They also only divide in the final stage, but the divisor varies from mine - probably due to differences in the co-efficients.


  11. hi barrett
    Gaussian blurs are separable into row and column operations.
    this is what the abstract of the URL says (give above).

    So what i was doing is that for horizontal direction i was taking
    as operative mask and then applied mask 5 12 15 12 5 to get o/p but it dint work.
    Can you pls tell what wrong iam doing???

  12. Hi
    I have completed steps in canny edge detection till non maximum suppression but as said non maximum suppression should make edge more defined but it is not the same in my case. The edges are not very clear. Can you pls tell me what is problem at my end??
    I followed this link to implement non maximum suppression.

    pls reply i need your help urgently.

  13. Hi Ajay,

    Without actually seeing the results of the individual steps of your Canny Edge I can only guess at possible problems.

    Is your Gaussian blur code correct? Ie dividing by the correct divisor for your mask - a rather easy mistake to make if you hand key your masks in.

    Have you tried changing your Gausian blur parameters? Ie changing the width and std deviation of your mask? The example I gave in the article works in a wide variety of cases but may not be perfect for you.

    It is quite easy to make an error in the non-maximum suppression code. The one mentioned in the wikipedia article should work, but there are other ways of doing it. Basically you are looking for a local maximum in the gradient direction you determined in the Canny step. The direction is the angle obtained by: theta = arctan (Gy/Gx). So if theta is 0 degrees see if the intensity of the pixels above and below are greater than or less than, if both less than we have a local maximum in the correct direction.

    Hope this helps you a bit.