@ George Madeley @ Electrical and Electronic Engineering @ 9/24/23
These are the notes that I, George Madeley, took whilst studying EE40054: Digital Image Processing during my final year studying my MEng Computer Systems Engineering at the University of Bath.
Section 1: Digital Image Processing
The human visual system is a complex biological system that enables human interpretation of electromagnetic radiation in the visible spectrum. It is 10^6^ years old! Why do we need to know about it?
- If the output of a DIP system is to be viewed by a human.
- For developing DIP systems (biologically inspired image processing).
- Rods -- approx. 100 million. Long and thin. Black and white/scotopic vision. Share neural connections.
- Cones -- approx. 6.5 million. Short and fat. Colour/photopic vision. Three types with peak response to green, yellow-green, and blue wavelengths. Single neural connection.
It has been found experimentally that there are three types of cone with different absorption characteristics: α, β, and γ ≈ Blue, Green, and Red respectively. This provides a basis for the tri-chromatic theory of vision.
Image formation acts like a pinhole camera, with point C being the optical centre. The image is inverted on the back of the eye (the retina). The image on the back of the eye is measured to be 2.55 mm.
The human eye can respond to a wide range of light, up to a 10^10^ range of intensity. Below the scotopic threshold there is nothing we can see: pitch black. The glare limit is the maximum brightness we can see: all white. The scotopic line adjusts depending on the brightness of the room. You can see this when you enter a dark room and your eyes slowly adjust to the low-level brightness.
Cones are less sensitive to light than rods. As a result, your low-light vision is not colour vision.
The Mach Band Effect is the effect that colours appear lighter or darker depending on the colours adjacent to them.
But why? Light entering the eye produces a neural signal that is transformed by a log, then multiplied by a weight and summed with the signals from its neighbours. The output is the weighted sum of the spatially adjacent rods. The weighting factors give us the impulse response of the eye (rods). See graph below. This is known as the Monochrome Visual Model.
It is a different system for cones, however. Each of our cones has its own neuron which is why our vision is sharpest with lots of colours.
- Image acquisition -- something sensitive to electromagnetic radiation (CCD array, IR, SAR, US)
- Display -- monitor, printer
- Processor -- general purpose PC
- Storage -- short term (framebuffer, RAM), medium term (disc), long term (magtape, CD, DVD, cloud)
- Communications
Interface that converts (analogue) camera signal into set of digital numbers. This involves DC restoration. The look-up table will perform a predefined conversion.
Pinhole model of perspective camera
Affine camera model and the perspective projection. Need to relate real world coordinates to image coordinates via consistent linear geometry. From figure:
- Nonlinear in $z_{p}$ and $f$.
Where
With the previous image geometry, the world co-ordinates are fixed to the image co-ordinates. In addition, translation and perspective are different mathematical functions. Homogeneous coordinates provide a linear and general coordinate system where image transformations are expressed as simple matrix operations.
Gives:
For a clockwise rotation about the z axis by angle θ, use the rotation matrix $R_{z}$.
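As a rough illustration of why homogeneous coordinates are convenient, the sketch below builds 4 × 4 matrices for a clockwise rotation about the z axis and for a translation, then chains them with a single matrix product. The function names (`rotation_z`, `translation`, `matmul`, `apply`) are hypothetical, and the sign convention for "clockwise" (x to the right, y upwards) is an assumption rather than something taken from the notes.

```python
import math

def rotation_z(theta):
    """Homogeneous matrix for a clockwise rotation by theta radians about z."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0, 0.0],
            [-s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def translation(tx, ty, tz):
    """Homogeneous matrix that shifts a point by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def apply(matrix, point):
    """Apply a homogeneous transform to a 3D point (x, y, z)."""
    v = [point[0], point[1], point[2], 1.0]
    out = [sum(matrix[r][k] * v[k] for k in range(4)) for r in range(4)]
    return tuple(out[i] / out[3] for i in range(3))

# Translate by (1, 0, 0) first, then rotate 90 degrees clockwise about z.
combined = matmul(rotation_z(math.pi / 2), translation(1.0, 0.0, 0.0))
print(apply(combined, (1.0, 0.0, 0.0)))  # approximately (0, -2, 0)
```

Because translation is now a matrix too, any chain of rotations and translations collapses into one matrix that is applied to every point.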
Image
Two components:
- Illumination $i(x,y)$, with $0 < i(x,y) < \infty$
- Reflectance $r(x,y)$, with $0 < r(x,y) < 1$
Discrete values of
After sampling and quantisation image looks like...
Sampling and quantization are two processes that convert a continuous image into a digital image. Sampling digitizes the position of each pixel, while quantization digitizes the intensity or colour of each pixel. Sampling and quantization affect the resolution, contrast, quality, size, and processing time of the digital image.
The effects of sampling and quantization are as follows:
Sampling determines the spatial resolution of the digital image, which is the number of pixels per unit area. The higher the sampling rate, the more details are captured in the image, but the larger the image file and the longer the processing time.
Quantization determines the number of grey levels or colours in the digital image, which affects the contrast and quality of the image. The higher the quantization level, the more shades or hues are available in the image, but the larger the image file and the longer the processing time.
Sampling and quantization can also affect the image compression and enhancement techniques, which are methods to reduce the size or improve the quality of the image. For example, lower sampling and quantization rates can lead to more lossy compression and less effective enhancement.
To improve subjective appeal of image (for some application). Two methods: point operators and group operators.
Negating Images
The x-axis is the input grey level and the y-axis is the output grey level, so a point operator is just a mapping. The mapping can give a binary output based on the levels. This is useful if some features have a certain grey level that requires highlighting, e.g., blood in a scan may require highlighting whilst leaving all non-blood pixels at the same value, so that the relative position of the blood is known.
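A minimal sketch of two such mappings, assuming an image stored as a list of rows of integer grey levels. The function names and the default of 256 levels are illustrative choices, not part of the notes.

```python
def negative(image, levels=256):
    """Image negative: grey level p maps to (levels - 1) - p."""
    return [[levels - 1 - p for p in row] for row in image]

def slice_levels(image, low, high, highlight=255):
    """Highlight grey levels in [low, high]; leave every other pixel unchanged."""
    return [[highlight if low <= p <= high else p for p in row] for row in image]

print(negative([[0, 100, 255]]))                # [[255, 155, 0]]
print(slice_levels([[10, 120, 200]], 100, 150)) # [[10, 255, 200]]
```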
You can split an 8-bit image up based on the position of each bit. This creates 8 binary images: one built from the most significant bit (MSB) of every pixel, one from the next bit, and so on down to the least significant bit (LSB).
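Bit-plane slicing can be sketched in a few lines, assuming integer pixel values and a list-of-rows image. `bit_planes` is a hypothetical helper name; index 0 of the result is the LSB plane and index 7 the MSB plane.

```python
def bit_planes(image, bits=8):
    """Return `bits` binary images; plane b holds bit b of every pixel."""
    return [[[(pixel >> b) & 1 for pixel in row] for row in image]
            for b in range(bits)]

image = [[0, 255], [128, 3]]
planes = bit_planes(image)
print(planes[7])  # MSB plane: [[0, 1], [1, 0]]
print(planes[0])  # LSB plane: [[0, 1], [0, 1]]
```

Summing `planes[b][r][c] << b` over all `b` recovers the original pixel, which is a quick way to check that nothing was lost.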
Histogram modification -- most common point operator. How to make a histogram? You just count up the number of pixels of each greyscale value.
- M is the number of columns
- N is the number of rows
R is a random variable representing grey levels in image.
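The counting step can be sketched as below, assuming integer grey levels in the range 0 to `levels - 1`. Dividing each count by the number of pixels (M × N) gives an estimate of the probability of each grey level; the function names are illustrative.

```python
def histogram(image, levels=256):
    """Count how many pixels take each grey level."""
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts

def normalised_histogram(image, levels=256):
    """Counts divided by M * N, so the values sum to 1."""
    n_rows, m_cols = len(image), len(image[0])
    return [count / (m_cols * n_rows) for count in histogram(image, levels)]

image = [[0, 1, 1], [2, 1, 0]]
print(histogram(image, levels=4))  # [2, 3, 1, 0]
```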
Histogram normalisation also known as contrast stretching. We do this if the maximum greyscale value in the image is low preventing us from seeing detail. Histogram normalization is a technique that changes the pixel intensity range in an image to a desired range. This can improve the contrast and quality of the image by making the histogram more even. Histogram normalization preserves the original brightness and appearance of the image.
- O is the old image with min referring to the minimum greyscale value and max referring to the maximum greyscale value
- N is the new image with min referring to the minimum greyscale value and max referring to the maximum greyscale value
- R is the original highest greyscale value in the image
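A sketch of contrast stretching, assuming the usual linear mapping from the old range [O_min, O_max] onto the new range [N_min, N_max]. The exact formula in the lecture is not reproduced here, so treat this form (and the name `stretch`) as an assumption.

```python
def stretch(image, new_min=0, new_max=255):
    """Linearly map the image's own [min, max] range onto [new_min, new_max]."""
    old_min = min(min(row) for row in image)
    old_max = max(max(row) for row in image)
    if old_max == old_min:
        # A flat image has no contrast to stretch.
        return [[new_min for _ in row] for row in image]
    span_new, span_old = new_max - new_min, old_max - old_min
    return [[round((p - old_min) * span_new / span_old) + new_min for p in row]
            for row in image]

print(stretch([[50, 100], [150, 50]], 0, 200))  # [[0, 100], [200, 0]]
```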
Histogram equalization is a technique that enhances the contrast of an image by redistributing the pixel intensities more uniformly. It uses a transformation function that depends on the histogram of the image, which is a graph that shows the frequency of each intensity level. Histogram equalization can improve the visual quality of low contrast images, but it may also create unrealistic effects. Produces a uniform pdf, whatever the original distribution => contrast enhancement.
Achieved by mapping
- W is a dummy variable
For discrete values:
-
M is the number of columns
-
N is the number of rows
This maps each pixel with grey level
e.g.,
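As a hedged illustration of the discrete mapping, the sketch below uses the cumulative histogram scaled to the available grey levels, i.e. it assumes the common form s_k = (L − 1) · Σ n_j / (M·N) for j ≤ k. The rounding step and the name `equalise` are my own choices.

```python
def equalise(image, levels=256):
    """Histogram equalisation via the scaled cumulative histogram."""
    n_rows, m_cols = len(image), len(image[0])
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    mapping, cumulative = [], 0
    for k in range(levels):
        cumulative += counts[k]
        # Assumed form: s_k = (L - 1) * (pixels with level <= k) / (M * N)
        mapping.append(round((levels - 1) * cumulative / (m_cols * n_rows)))
    return [[mapping[pixel] for pixel in row] for row in image]

print(equalise([[0, 0, 1]], levels=4))  # [[2, 2, 3]]
```

The dark input levels 0 and 1 are pushed up to 2 and 3, spreading the occupied levels towards the top of the available range.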
Use local neighbourhood, as defined by a template centred on a point of
interest which is convolved with image. Always use an odd sized
convolution matrix such as
What about edges? See MATLAB's "padarray" function, which pads the border of the image before the template is applied.
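The sketch below shows one way to handle the border: pad by repeating the edge pixels, then slide an odd-sized mask over the image. Replicate padding is only one of several possible choices, and `pad_replicate` and `convolve` are hypothetical names, not MATLAB functions.

```python
def pad_replicate(image, k):
    """Pad by k pixels on every side, repeating the nearest border value."""
    rows = [[row[0]] * k + list(row) + [row[-1]] * k for row in image]
    return [rows[0][:] for _ in range(k)] + rows + [rows[-1][:] for _ in range(k)]

def convolve(image, mask):
    """Convolve an image with a square, odd-sized mask (same-size output)."""
    size = len(mask)
    padded = pad_replicate(image, size // 2)
    flipped = [row[::-1] for row in mask[::-1]]  # convolution flips the mask
    out = []
    for r in range(len(image)):
        out_row = []
        for c in range(len(image[0])):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    acc += flipped[i][j] * padded[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out

mean_mask = [[1 / 9] * 3 for _ in range(3)]
smoothed = convolve([[5] * 4 for _ in range(3)], mean_mask)  # stays close to 5
```

Convolving a single bright pixel with a mask reproduces the mask itself, which is a handy sanity check for the flip.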
There are three types of filters:
- Linear -- used for smoothing and edge detection
- Nonlinear -- used for rank/order statistics and adaptive linear
- Filter Masks
A linear filter is a technique that applies a linear transformation to the pixel values of an image. A linear transformation is one that satisfies the superposition principle, which means that the output of the filter is the weighted sum of the inputs. A linear filter can be represented by a matrix, an impulse response, or a convolution operation. A linear filter can be used for smoothing, blurring, sharpening, or enhancing an image.
As seen from the example above, the convolution matrix is an averaging filter which blurs the image. There are other filter matrices such as the Gaussian lowpass filter which has been considered optimal for image smoothing. Based on the Gaussian function:
For σ = 1, mask is:
The general formulation:
Linear filters can smooth, but they all blur. They also have poor impulse noise performance.
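A Gaussian smoothing mask can be generated by sampling the 2D Gaussian on an integer grid, as sketched below. Normalising the mask so that it sums to 1 is an assumption on my part (it keeps the mean brightness unchanged); the mask shown in the lecture may use a different scaling.

```python
import math

def gaussian_mask(size, sigma):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on a size x size grid, normalised."""
    k = size // 2
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-k, k + 1)]
           for y in range(-k, k + 1)]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]

mask = gaussian_mask(5, 1.0)
for row in mask:
    print(" ".join(f"{value:.4f}" for value in row))
```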
A sharpening filter is a technique that enhances the edges and details of an image by boosting the high-frequency components of the image. A sharpening filter can be applied by using a linear or non-linear transformation function to the pixel values of the image. A common example of a sharpening filter is the Laplacian filter, which is a second-order derivative operator that detects the changes in intensity along both horizontal and vertical directions. Sharpening filters can improve the visual quality and contrast of an image, but they may also amplify the noise and create unrealistic effects.
Linear filters can be applied to images in the frequency domain. This is computationally more efficient than applying the filters in the spatial domain, particularly for large masks.
The Fourier Transform.
Images consist of real numbers, whose FT is generally complex.
If f(0), f(1), ... f(N-1) are a sequence of N uniformly spaced samples of a continuous function:
What about in the Discrete Fourier Transform function in the 2-dimensional plane?
- It is separable
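Separability means the 2D DFT can be computed as 1D DFTs along every row followed by 1D DFTs along every column. The sketch below does exactly that with a direct (slow) 1D DFT. No normalisation factor is applied in the forward transform, which is an assumption, since conventions differ on where the 1/N goes.

```python
import cmath

def dft(samples):
    """Direct 1D DFT: F(u) = sum over x of f(x) * exp(-2*pi*i*u*x / N)."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n))
            for u in range(n)]

def dft2(image):
    """2D DFT computed separably: rows first, then columns."""
    row_pass = [dft(row) for row in image]
    col_pass = [dft(list(col)) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]  # transpose back to row order

spectrum = dft2([[1, 2], [3, 4]])
print(spectrum[0][0])  # the DC term, approximately 10 (the sum of all pixels)
```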
The shift theorem for 2D discrete Fourier transform is a property that relates the effect of shifting a 2D signal in the spatial domain to the effect of multiplying the corresponding 2D spectrum by a complex exponential in the frequency domain. The shift theorem can be used to analyse the phase and magnitude of the 2D spectrum and to perform operations such as circular convolution, filtering, and translation.
If:
Then:
Special case:
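The direct Fourier Transform requires $N^2$ multiplications and $N(N-1)$ additions.
The Fast Fourier Transform reduces this to $N\log_2 N$.

| N | Direct FT | FFT | Comp. Advantage |
|------|--------|-------|--------|
| 2 | 4 | 2 | 2 |
| 4 | 16 | 8 | 2 |
| 16 | 256 | 64 | 4 |
| 4096 | 16777k | 49152 | 341.33 |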
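The figures in the comparison can be checked with a short calculation, taking the direct cost as N² and the FFT cost as N·log2(N). `operation_counts` is just an illustrative helper.

```python
import math

def operation_counts(n):
    """Return (direct cost, FFT cost, computational advantage) for length n."""
    direct = n * n
    fft = n * math.log2(n)
    return direct, fft, direct / fft

for n in (2, 4, 16, 4096):
    direct, fft, advantage = operation_counts(n)
    print(f"N={n}: direct={direct}, FFT={fft:.0f}, advantage={advantage:.2f}")
```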
Adaptive linear filters are techniques that use a linear filter with variable parameters to enhance or restore an image by removing noise without blurring the structures in the image. Adaptive linear filters adjust their parameters according to an optimization algorithm that minimizes the error between the filter output and the desired signal.
Use local image statistics to control aggressiveness of filter. E.g., unsharp masking filter.
- $x$ -- original value,
- $\bar{x}$ -- mean in window,
- $\hat{x}$ -- output value,
- $\sigma$ -- standard deviation in window,
- SNR = $\bar{x}/\sigma$
k varies from 0 to 1. It can be constant (not adaptive) or controlled by statistics within the window, e.g., SNR.
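A minimal sketch of the per-pixel calculation, assuming the filter takes the common form x̂ = x̄ + k(x − x̄), so that k = 0 returns the local mean (maximum smoothing) and k = 1 returns the original pixel. That form is an assumption, as is every function name below. How k should be derived from the SNR is left open here because the notes do not specify it.

```python
import math

def window_stats(window):
    """Mean and (population) standard deviation of a square window."""
    values = [v for row in window for v in row]
    mean = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, sigma

def snr(window):
    """Local SNR, taken as the window mean divided by sigma."""
    mean, sigma = window_stats(window)
    return math.inf if sigma == 0 else mean / sigma

def filtered_value(window, k):
    """Assumed form: x_hat = x_bar + k * (x - x_bar), with 0 <= k <= 1."""
    centre = window[len(window) // 2][len(window[0]) // 2]
    mean, _ = window_stats(window)
    return mean + k * (centre - mean)

window = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]
print(filtered_value(window, 0.0), filtered_value(window, 0.5), filtered_value(window, 1.0))
```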
Nonlinear filters are techniques that modify or enhance an image in a nonlinear way, meaning that the output is not a linear function of the input. Nonlinear filters can perform complex tasks such as noise removal, edge enhancement, signal restoration, or shape processing. Nonlinear filters do not follow the superposition principle, and they can change the structure and appearance of the image in more flexible ways than linear filters. Some examples of nonlinear filters are median filter, bilateral filter, Volterra filter, morphological filter, and anisotropic diffusion.
Order statistic filters are nonlinear filters that use the rank order information of a set of pixels to process an image. They are based on order statistics, which are mathematical tools derived from robust estimation theory. Order statistic filters have excellent robustness properties in the presence of impulsive or signal-dependent noise, and they tend to preserve the edges and fine details of an image better than linear filters. General formulation (linear combination of ordered values i.e., weight depends on position in ranked list)
Trimmed mean filter. Remove n lowest and n highest values. Find mean of remainder.
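Both ideas can be sketched on a flat list of window values: a general order statistic filter is a weighted sum of the sorted values, and the trimmed mean is the special case that drops the n smallest and n largest. The function names are illustrative.

```python
def order_statistic_filter(values, weights):
    """Weighted sum of the sorted values; weights[i] applies to the i-th smallest."""
    ordered = sorted(values)
    return sum(w * v for w, v in zip(weights, ordered))

def trimmed_mean(values, n):
    """Drop the n lowest and n highest values, then average the remainder."""
    if 2 * n >= len(values):
        raise ValueError("n removes every value in the window")
    kept = sorted(values)[n:len(values) - n]
    return sum(kept) / len(kept)

window = [1, 2, 3, 4, 100]            # 100 is an impulse
print(trimmed_mean(window, 0))         # 22.0, the plain mean
print(trimmed_mean(window, 1))         # 3.0, the impulse is discarded
```

With weights of zero everywhere except the middle position, `order_statistic_filter` reduces to the median.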
A median filter is a non-linear digital filtering technique that removes noise from an image or signal by replacing each pixel value with the median of its neighbouring pixel values. A median filter preserves the edges and fine details of an image better than a linear filter, and it is more robust to impulsive or signal-dependent noise:
- particularly good at removing impulse noise,
- reduces variance in image,
- changes things smaller than window,
- preserves edge location and shape,
- no new grey levels generated,
- changes mean intensity of image,
- deterministic properties determined by root signal,
- statistical properties by o/p pdf and breakdown probability.
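A small sketch of a median filter, assuming a list-of-rows image and handling the border by clamping coordinates to the nearest valid pixel (one of several reasonable choices). It shows the two headline properties: an impulse is removed and a step edge stays where it is.

```python
def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    k = size // 2
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [image[min(max(r + i, 0), rows - 1)][min(max(c + j, 0), cols - 1)]
                      for i in range(-k, k + 1)
                      for j in range(-k, k + 1)]
            out_row.append(sorted(window)[len(window) // 2])
        out.append(out_row)
    return out

impulse = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(impulse))  # the 255 impulse is replaced by 10

step = [[0, 0, 9, 9] for _ in range(3)]
print(median_filter(step))     # the step edge is unchanged
```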
A weighted median filter is a nonlinear filter that assigns different weights to the neighbouring pixels of a pixel before computing the median value. A weighted median filter can be seen as a generalization of the median filter, where each pixel has the same weight. A weighted median filter can be used to remove noise, smooth edges, or enhance details in an image. A weighted median filter can be designed by using different criteria, such as minimizing the mean absolute error, maximizing the signal-to-noise ratio, or preserving the image structure. A weighted median filter can also be adaptive, meaning that the weights can change according to the local characteristics of the image.
Centre weighted median filter has w > 1 at centre and w = 1 elsewhere.
- Weight depends on position in window.
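One simple way to realise a weighted median, assuming integer weights, is to repeat each value as many times as its weight and then take the ordinary median. The sketch below applies that to the centre weighted case; the helper names are hypothetical, and an odd centre weight keeps the total count odd.

```python
def weighted_median(values, weights):
    """Median of the list formed by repeating each value `weight` times."""
    expanded = sorted(v for v, w in zip(values, weights) for _ in range(w))
    return expanded[len(expanded) // 2]

def centre_weighted_median(window, centre_weight):
    """Weight w = centre_weight at the centre of the window, w = 1 elsewhere."""
    flat = [v for row in window for v in row]
    weights = [1] * len(flat)
    weights[len(flat) // 2] = centre_weight
    return weighted_median(flat, weights)

window = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
print(centre_weighted_median(window, 1))  # 1, the plain median
print(centre_weighted_median(window, 9))  # 9, the centre pixel now dominates
```

Raising the centre weight makes the filter less aggressive, so more fine detail (and more noise) survives.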
An adaptive weighted median filter is a nonlinear filter that combines the advantages of the median filter and the weighted mean filter to remove impulse noise from an image. An adaptive weighted median filter assigns different weights to the neighbouring pixels of a pixel according to an optimization criterion, and then computes the median value of the weighted pixels. An adaptive weighted median filter can adapt to the local characteristics of the image, such as the noise level, the edge strength, and the texture complexity. An adaptive weighted median filter can preserve the edges and details of the image better than a median filter or a weighted mean filter, and it can also remove the residual noise that may remain after applying a median filter.
Weight depends on image statistics and position in mask (d).
- $c$ -- constant,
- $d$ -- distance from centre of mask,
- $w_{i,j}$ -- weight at position (i, j),
- $W_{K + 1,K + 1}$ -- central weight.
Example using 5 × 5 mask.
When window is:
Weights are:
A truncated median filter is a nonlinear filter that modifies the median filter by discarding some of the extreme values in the neighbourhood of a pixel before computing the median. A truncated median filter can reduce the blurring effect of the median filter and preserve more details in the image. A truncated median filter can also remove impulse noise and smooth other types of noise. A truncated median filter can be implemented by using a threshold parameter that determines how many values are truncated from each end of the sorted neighbourhood.
Approximates the mode by manipulation of local intensity histogram.
Filter action also "crispens" edges.
Mathematical morphology is really just a type of filter. It is non-linear and linked with median and stack filters. It works by changing the shapes of objects, and it can be used to design idempotent filters.
Erosion is a morphological operation that reduces the size and removes the boundary pixels of objects in an image. Erosion is usually performed on binary images, where the pixels are either 0 (black) or 1 (white). Erosion uses a small shape called a structuring element to probe the image and remove the pixels that do not fit the shape. Erosion can be used to eliminate noise, separate objects, or simplify shapes in an image.
Dilation is a morphological operation that enlarges the size and adds pixels to the boundaries of objects in an image. Dilation is usually performed on binary images, where the pixels are either 0 (black) or 1 (white). Dilation uses a small shape called a structuring element to probe the image and add pixels that fit the shape. Dilation can be used to fill holes, connect gaps, or enhance features in an image. Dilation is the opposite of erosion, which removes pixels from the boundaries of objects.
Opening is a morphological operation that applies erosion followed by dilation on an image using the same structuring element. Opening is usually performed on binary images, where the pixels are either 0 (black) or 1 (white). The erosion removes objects and protrusions that the structuring element does not fit inside, and the dilation then restores the approximate size of the objects that survive. Opening can be used to eliminate noise, separate objects, or simplify shapes in an image. Opening is the dual of closing, which applies dilation followed by erosion.
Closing is a process that involves first applying dilation and then erosion on an image using the same structuring element. The purpose of closing is to smooth the contour of the distorted image and fuse back the narrow breaks and long thin gulfs. Closing is also used for getting rid of the small holes of the obtained image.
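The four binary operations can be sketched as follows, assuming a 3 × 3 square structuring element and treating everything outside the image as background (0). Both choices, and all the names, are assumptions made for the example.

```python
SQUARE_3X3 = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]

def pixel(image, r, c):
    """Pixel value, treating everything outside the image as background (0)."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0

def erode(image, se=SQUARE_3X3):
    """Keep a pixel only if the structuring element fits entirely inside the object."""
    return [[int(all(pixel(image, r + i, c + j) for i, j in se))
             for c in range(len(image[0]))] for r in range(len(image))]

def dilate(image, se=SQUARE_3X3):
    """Set a pixel if the (reflected) structuring element touches the object."""
    return [[int(any(pixel(image, r - i, c - j) for i, j in se))
             for c in range(len(image[0]))] for r in range(len(image))]

def opening(image, se=SQUARE_3X3):
    return dilate(erode(image, se), se)   # erosion then dilation

def closing(image, se=SQUARE_3X3):
    return erode(dilate(image, se), se)   # dilation then erosion

# A 3x3 block plus one isolated noise pixel in the corner.
image = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 1
image[0][6] = 1
opened = opening(image)  # the noise pixel disappears, the block survives
```

Because the outside is treated as background, closing can eat into objects that touch the image border; padding with a different value would change that behaviour.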
Take the following image of this shape:
Applying erosion then dilation (opening) results in the following:
Applying dilation then erosion (closing) results in the following:
Greyscale morphology is a technique used to process and analyse the structures within greyscale images. The image below involves the transformation of the original greyscale image (left) through certain operations that highlight specific shapes or structures, resulting in a modified image (right). The graph above the images represents the intensity profile along a line segment of the original image, highlighting variations in pixel values which correspond to different structural elements within the image.
All points above umbra stay at 0. All points below umbra stay at 1.
Opening/closing by reconstruction is a technique that involves applying a morphological opening or closing operation followed by a reconstruction operation. The reconstruction operation restores the original shape and size of the objects that are not completely removed by the opening or closing operation. The purpose of opening/closing by reconstruction is to smooth the contour of the image and eliminate small details without affecting the larger structures. Morphological reconstruction processes the marker image based on characteristics of (the mask) image.
- High points, or peaks, in marker image specify where processing begins,
- Processing continues until the image values stop changing.
Example:
- Image pre-processing,
- Enhancing object structure,
- Quantitative object description,
- Image noise reduction.
Skeletonization is a process that reduces foreground regions in a binary image to a skeletal remnant that largely preserves the extent and connectivity of the original region while throwing away most of the original foreground pixels. Skeletons are important shape descriptors in object representation and recognition. A skeleton that captures essential topology and shape information of the object in a simple form is extremely useful in solving various problems such as character recognition, 3D model matching and retrieval, and medical image analysis. Skeletonization can be performed by means of morphological operators, such as erosion, dilation, opening, and closing.
Subject to:
- Not removing end points,
- Not breaking connectivity,
- Not causing excessive erosion.
Greyscale sieve is a technique that simplifies greyscale images by scale. It involves finding the extrema (local maxima or minima) of the pixel values and removing them according to a predefined criterion, such as size, shape, or contrast. The result is a hierarchical representation of the image that preserves the scale-space causality property, meaning that larger structures are not affected by smaller details. Greyscale sieve is based on mathematical morphology and graph theory and can be used for various applications such as counting objects, detecting edges, or enhancing features.
Extrema are merged with their closest neighbour. Known as area opening and closing (AOC). Filter structures:
- An alternating filter (AF) is a type of morphological filter that applies a sequence of opening and closing operations to a grayscale image, using different structuring elements or sizes. An AF can be used to smooth the image, enhance the contrast, and remove noise or small details, while preserving the shape and size of the main objects:
An alternating sequential filter (ASF) is a type of morphological filter that applies a sequence of opening and closing operations to a grayscale image, using different structuring elements or sizes. An ASF can be used to smooth the image, enhance the contrast, and remove noise or small details, while preserving the shape and size of the main objects:
An area opening (resp. closing) removes all connected light (resp. dark)
structures of size
Respectively, where
For an area opening to an area equal to 2 with 4-connectivity, there are 4 structuring elements.
Uses a connected graph representation of the image.
Vincent's queue algorithm.
- Area opening:
    - Find local maxima,
    - Replace by maximum of connected neighbours.
- Area closing:
    - Find local minima,
    - Replace by minimum of connected neighbours.
- Greyscale sieve algorithm:
    - Identify all extremal regions,
    - Merge scale 1 extrema with neighbouring regions with closest intensity,
    - Repeat the previous step with increasing scale.
An edge is the boundary between two regions with relatively distinct grey-level properties. It is also a sharp intensity transition between neighbouring pixels. It is important because information can be concentrated at edges, and it is used in computer vision for feature extraction, segmentation etc. There can be step edges, ramp edges, and ridge edges.
An edge is a property attached to an individual pixel and is calculated from the image behaviour in a neighbourhood of the pixel.
Vector variable with two components:
- Magnitude (M),
- Direction ($\theta$)
Basic idea: to compute a local derivative operator
As images are discrete, derivatives must be approximated by differences. Simplest form:
Combining:
Alternative form is Centred Difference:
Prewitt templates are a set of 3x3 kernels that are used in image processing to approximate the gradient of the image intensity function. They are based on convolving the image with two filters, one for horizontal changes and one for vertical changes. The result of applying the Prewitt templates is either the corresponding gradient vector or the norm of this vector at each point in the image. The Prewitt templates are used for edge detection, as they can identify the regions where the image brightness changes sharply.
$$M_{x} = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix},\ \ \ M_{y} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}$$
These two responses can be combined to give edge magnitude
The Sobel templates weight the central row and column more heavily:
$$M_{x} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix},\ \ \ M_{y} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$$
The Roberts cross templates are 2 × 2:
$$M_{1} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},\ \ \ M_{2} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$
Binary edge maps are obtained by thresholding
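The sketch below applies the two Sobel templates, combines the responses into an edge magnitude and direction, and thresholds the magnitude to obtain a binary edge map. It assumes the magnitude is the Euclidean norm of the two responses and the direction is atan2 of them, and it applies the templates by correlation (no flip), which only affects the sign of the responses. Border pixels are simply left at zero.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def response(image, mask, r, c):
    """Sum of mask * image over the 3x3 neighbourhood centred on (r, c)."""
    return sum(mask[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

def sobel(image, threshold):
    """Return (magnitude, direction, binary edge map); the border stays at zero."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = response(image, SOBEL_X, r, c)
            gy = response(image, SOBEL_Y, r, c)
            magnitude[r][c] = math.hypot(gx, gy)
            direction[r][c] = math.atan2(gy, gx)
    edges = [[int(m > threshold) for m in row] for row in magnitude]
    return magnitude, direction, edges

step = [[0, 0, 10, 10] for _ in range(4)]   # vertical step edge
magnitude, direction, edges = sobel(step, threshold=20)
print(edges)
```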
Morphological gradient is a technique that involves finding the difference between the dilation and the erosion of a given image. It is an image where each pixel value (typically non-negative) indicates the contrast intensity in the close neighbourhood of that pixel. It is useful for edge detection and segmentation applications. Defined by the difference between a dilation and an erosion.
Sensitive to noise, a problem addressed by the Median Centred Difference edge detector.
The Laplacian operator is a second-order differential operator in n-dimensional Euclidean space, denoted as ∇². It is the divergence of the gradient of a function. In the context of image processing, this operator is applied to intensity functions of an image, which can be thought of as a two-dimensional signal with intensity values at each pixel. The Laplacian operator highlights regions of rapid intensity change and is therefore often used for edge detection.
Seldom used in practice as it:
- Is overly sensitive to noise,
- Produces a double response to a single edge,
- Cannot give edge direction.
The Laplacian of a Gaussian (LoG) is a technique that combines the Laplacian operator and the Gaussian filter to enhance the edges in an image. The LoG operation can be performed in two ways: either by applying the Gaussian filter first and then the Laplacian operator, or by convolving the image with a single kernel that approximates the LoG function. The LoG operation produces a zero-crossing image, where the edges are located at the zero-crossing points of the filtered image.
Assuming:
Therefore:
To normalize the Laplacian of a Gaussian (LoG) operator, we need to multiply the LoG output by the scale parameter sigma squared. This is done to make the LoG operator invariant to scales, meaning that it can detect edges or features at different levels of detail. The reason for multiplying by sigma squared is that it cancels out the scaling factor of the Gaussian smoothing filter that is applied before the Laplacian operator. This way, the LoG output is proportional to the second derivative of the image intensity function, which measures the rate of change of the gradient.
Binary edge map obtained by zero crossing detection
If the sum in the four quadrants has the same sign, then no zero crossing has occurred at the centre pixel. If there is a sign change, then the centre pixel is an edge point.
Threshold used to determine magnitude of zero crossing required to be edge point.
Canny edge detector is a technique that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Developed from first principles and is optimal for step edges corrupted by white noise. Aims to address the difficulty of choosing a single optimal threshold value that can detect all the true edges and eliminate all the false edges in an image. A single threshold value may be too high or too low for different regions of the image, resulting in missed edges or noisy edges. Optimality relates to three criteria:
- Good detection -- important edges should not be missed, and there should be no spurious responses.
- Good localisation -- distance between the actual and located position of the edge should be minimal.
- Single response -- minimizes multiple responses to a single edge.
In practice, the optimal detector can be approximated by the first derivative of a Gaussian, giving the first two steps of Canny's edge detector.
- Step 1 -- Convolve Gaussian mask with image,
- Step 2 -- Differentiate using first derivative edge detector,
- Step 3 -- Non-maximal suppression:
    - Edges give rise to ridges in the gradient magnitude image.
    - A thin edge can be produced by setting all pixels not on ridge top to zero.
    - How can this be achieved in practice?
- Step 4 -- Thresholding with hysteresis:
    - Reduces streaking (the breaking up of an edge contour caused by the operator output fluctuating above and below the threshold).
    - Two thresholds used, $T1$ and $T2$, with $T2 < T1$.
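Thresholding with hysteresis can be sketched as a region-growing pass: pixels at or above the high threshold $T1$ seed the edge map, and the edge then grows into any 8-connected neighbour at or above the low threshold $T2$. The use of 8-connectivity and a breadth-first queue are assumptions for this example.

```python
from collections import deque

def hysteresis(magnitude, t1, t2):
    """Binary edge map from a gradient magnitude image, with t2 < t1."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Seed with every pixel that clears the high threshold.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= t1:
                edges[r][c] = 1
                queue.append((r, c))
    # Grow along neighbours that clear the low threshold.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc] and magnitude[nr][nc] >= t2):
                    edges[nr][nc] = 1
                    queue.append((nr, nc))
    return edges

print(hysteresis([[0, 5, 9, 5, 0, 5]], t1=8, t2=4))  # [[0, 1, 1, 1, 0, 0]]
```

The weak response at the far right is rejected because it is not connected to any strong response, whereas the weak responses next to the 9 are kept, which is what reduces streaking.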
Many image processing techniques work on an individual pixel level. Major problem is computation of scale.
Segmentation is the initial stage of image analysis. Its goal is to split the image into parts that match the objects or regions in the real world. Fully automatic segmentation is an incredibly challenging problem in image processing/computer vision. Segmentation may be total or partial: total segmentation gives a collection of separate regions that each represent an image object, whereas partial segmentation gives regions that are not directly related to image objects but share a certain characteristic. Methods are classified into:
- Global,
- Edge (breaks),
- Region (likeness).
Feature extraction is a technique that aims to reduce the dimensionality and complexity of the image data by selecting and combining the most relevant and informative variables or features. Features are characteristics or attributes of the image that can describe its content, structure, or quality. Examples of features are edges, corners, blobs, colours, textures, shapes, etc. Feature extraction helps to improve the performance and efficiency of various image processing tasks, such as classification, segmentation, recognition, retrieval, etc. by eliminating redundant or noisy data and preserving the essential information.
If an image consists of known objects, whose size and shape may not be known, segmentation can be viewed as the problem of finding the objects within the image. i.e., searching for specific patterns.
Techniques to be studied:
- Simple line finding,
- Three advanced techniques:
    - Template Matching,
    - Hough Transform,
    - Active Contours.
Aim is to find image boundaries directly using convolution. This method uses a formula that involves applying a maximum operation on the convolution of the image with 14 different kernels. The kernels are matrices that can detect horizontal and vertical edges in the image.
This method improves the results of the previous method by applying edge
thinning and edge linking techniques. Edge thinning reduces the
thickness of the edges by using a low threshold for the gradient
magnitude. Edge linking connects the edge points by using a high
threshold for the gradient magnitude and filling the gaps. Edge points
rarely fully characterise image boundaries. Edge thinning (use low
threshold for
The notes provide some example matrices that demonstrate the effects of edge thinning and edge linking on a simple image. The matrices show the pixel values of the image before and after applying the techniques.
Edge linking (use high threshold for
Edge Linking based on
Template matching is a technique that involves finding small parts of an image that match a given template image. It can be used for various purposes, such as quality control, navigation, or edge detection. Template matching can be done in two ways: feature-based or template-based.
Feature-based template matching relies on extracting image features, such as shapes, textures, and colours, which match the target image or frame.
Template-based template matching involves finding the best match by minimizing the mean-squared error or maximizing the area correlation between the template and the image. This approach is often more effective for templates without strong features, or when the template image constitutes the matching image as a whole. Template-based template matching can be implemented efficiently using the Discrete Fourier Transform and phase correlation.
Convolve template with image and find best match.
Say
Goodness of match determined using similarity or dissimilarity measure or metric.
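As one example of a similarity measure, the sketch below slides a template over an image and scores each position with a normalised cross-correlation coefficient (mean-subtracted, divided by the product of the norms). This particular normalisation and the function names are assumptions, and the exhaustive search is the slow baseline that the speed-ups aim to improve on.

```python
import math

def ncc(patch, template):
    """Normalised cross-correlation coefficient between two equal-sized arrays."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    p_mean, t_mean = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - p_mean) * (b - t_mean) for a, b in zip(p, t))
    den = math.sqrt(sum((a - p_mean) ** 2 for a in p)
                    * sum((b - t_mean) ** 2 for b in t))
    return num / den if den else 0.0

def best_match(image, template):
    """Exhaustive search; returns ((row, col), score) of the best position."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

image = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 3, 4, 0],
         [0, 0, 0, 0]]
position, score = best_match(image, [[1, 2], [3, 4]])
print(position, score)
```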
Cross-correlation Coefficient
Where:
Speeding it up:
- Multi-resolutional search
- Frequency domain:
    - Take FFT of image and template,
    - Multiply,
    - Inverse FFT.
The Hough transform is a technique that uses a voting procedure to detect imperfect instances of objects within a certain class of shapes, such as lines, circles, or ellipses. The technique works by transforming the image space into a parameter space, where the shapes can be identified as local maxima in an accumulator space that is constructed by the algorithm. The Hough transform is useful for finding shapes that are distorted, incomplete, or partially hidden in the image.
Form of template matching for analytic shapes. Evidence gathering technique. Robust to noise and occlusions.
Hough Transform of a line:
Hough Transform procedure:
- Determine all possible pixels that lie on a line,
- For each point, cast line of votes in parameter space,
- Detect those positions with the most votes.
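The voting procedure can be sketched as below, assuming the polar line parameterisation ρ = x·cos θ + y·sin θ (a common choice, not confirmed by the notes) and a dictionary of counters as the accumulator. Each candidate point casts one vote per θ step.

```python
import math
from collections import Counter

def hough_lines(points, theta_steps=180, rho_step=1.0):
    """Accumulate votes over (rho bin, theta index) for a list of (x, y) points."""
    votes = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), t)] += 1
    return votes

# Five points on the horizontal line y = 3.
points = [(x, 3) for x in range(5)]
votes = hough_lines(points)
print(votes[(3, 90)])  # 5 votes at rho = 3, theta = 90 degrees
```

Neighbouring bins collect almost as many votes as the true peak, which is one reason why finding the peaks in the accumulator needs care.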
Computational attractiveness comes from subdivision of parameter space.
Number of votes
If two lines are perpendicular (
Hough Transform of a circle:
Hough Transform of an ellipse:
Parameter Space Decomposition
For lines:
For circles:
Parameter Decomposition for circles
The Generalised Hough Transform is a technique for finding arbitrary shapes. The geometry is similar to the circle case, and it uses an R-table. Final Hough Transform note: finding the peaks can be tricky!
Active contour models, also called snakes, are a framework that uses deformable curves or surfaces to find the boundaries of objects or regions in an image. Active contour models are influenced by internal forces that control the smoothness and shape of the curve or surface, and external forces that attract the curve or surface to the features of interest in the image, such as edges or lines. Active contour models can be classified into two categories: parametric active contours, which use energy minimization techniques, and geometric active contours, which use level set methods and curve evolution techniques. Active contour models are widely used for applications such as edge detection, segmentation, shape recognition, and object tracking.
A sophisticated approach to contour extraction. Used in image segmentation and understanding + suitable for dynamic or 3D data. A Snake is an energy minimizing spline:
-
Snake's energy depends on its shape and position in image,
-
Seeks local energy minima rather than global,
-
Local minima correspond to desired image properties,
-
Works on paradigm that presence of edge depends not only on gradient at specific point but also the spatial distribution of the gradient.
Snake as a contour. An open or closed contour can be described as:
Snake energy is the energy function that defines the shape and behaviour of an active contour model (snake).
Two terms, called functionals:
-
Internal Energy -- Controls natural behaviour of snake. Designed to minimize snake's curvature and make it behave in elastic manner.
-
Image Energy -- attracts snake to desired features in image, i.e., edges, lines etc.
Iterative process and solution given by local minimum.
-
If $\beta(s) = 0$ at a point $s$, the snake can form a corner at $s$,
-
If $\alpha(s) = 0$ and $\beta(s) = 0$ at $s$, the snake can become discontinuous at $s$.
Line function:
Edge function:
Termination Functional. Let
Then:
Scale-based edge operator (Marr-Hildreth). This technique uses a band-pass filter to detect edges at a specific scale in an image. The image is convolved with the Laplacian of Gaussian (LoG) function, a second-order derivative operator that highlights regions of rapid intensity change, and the zero crossings of the filtered result give the edges. The scale of the edges is controlled by adjusting the standard deviation of the Gaussian function.
Additional Energy Terms:
The greedy algorithm is a technique that works by making the locally optimal choice at each step, without considering the global consequences. It is often used for optimization problems, such as finding the best solution among a set of possible solutions. Greedy algorithms can be applied to various image processing problems, such as image compression, denoising, and segmentation. Greedy algorithms can be very fast and efficient, but they are not guaranteed to find the globally optimal solution.
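Applied to snakes, a greedy pass moves each contour vertex to the position in its 3 x 3 neighbourhood with the lowest local energy. The sketch below is a simplified illustration: it assumes a closed contour, a precomputed image energy map (for example the negated gradient magnitude), and it omits the per-neighbourhood normalisation of each term that a full implementation would normally include. The weights `alpha`, `beta` and `gamma` are assumed names for the spacing, curvature and image terms.

```python
def greedy_snake_step(points, energy, alpha=1.0, beta=1.0, gamma=1.0):
    """One greedy pass over a closed contour of (row, col) vertices.

    energy: 2D list of image energy; low values attract the snake.
    """
    n = len(points)
    rows, cols = len(energy), len(energy[0])
    # Mean spacing between neighbouring vertices, used by the continuity term.
    mean_d = sum(
        ((points[i][0] - points[i - 1][0]) ** 2
         + (points[i][1] - points[i - 1][1]) ** 2) ** 0.5
        for i in range(n)
    ) / n
    new_points = list(points)
    for i in range(n):
        prev_pt, next_pt = new_points[i - 1], new_points[(i + 1) % n]
        best, best_e = new_points[i], float("inf")
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = new_points[i][0] + dr, new_points[i][1] + dc
                if not (0 <= r < rows and 0 <= c < cols):
                    continue
                # Continuity: keep the spacing close to the mean spacing.
                d = ((r - prev_pt[0]) ** 2 + (c - prev_pt[1]) ** 2) ** 0.5
                e_cont = abs(mean_d - d)
                # Curvature: squared second difference along the contour.
                e_curv = ((prev_pt[0] - 2 * r + next_pt[0]) ** 2
                          + (prev_pt[1] - 2 * c + next_pt[1]) ** 2)
                e = alpha * e_cont + beta * e_curv + gamma * energy[r][c]
                if e < best_e:
                    best, best_e = (r, c), e
        new_points[i] = best
    return new_points

# A 5 x 5 energy map with a single strong attractor at (2, 2).
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = -10.0
contour = [(1, 1), (1, 3), (3, 2)]
print(greedy_snake_step(contour, grid))
```

Setting `alpha=0` removes the control by spacing and setting `beta=0` removes the curvature control, which is one way to experiment with the effect of each term.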
Effect of removing control by spacing (
Effect of removing low curvature control (
Morphological approach to segmentation. Combination of edge- and region-based. The watershed transform uses the metaphor of a topographic map to segment an image into different regions or objects. The image is treated as a surface where the pixel values represent height; the watershed lines run along the tops of the ridges and separate the catchment basins, or the areas drained by different rivers. The watershed transform can be applied to greyscale images or to the gradient of an image, and it can be computed using various algorithms, such as flooding, distance transform, or graph-based methods.
When applied to the image gradient, the catchment basins should theoretically correspond to the homogeneous image regions.
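A small sketch of the catchment basin idea, assuming a "rainfall" style variant in which every pixel follows the steepest 4-connected descent to a local minimum and takes that minimum's label. Plateaus are not handled specially (each flat pixel with no lower neighbour becomes its own minimum), so this is an illustration rather than a complete watershed implementation.

```python
def watershed_descent(height):
    """Label each pixel with the local minimum reached by steepest descent."""
    rows, cols = len(height), len(height[0])

    def lowest_neighbour(r, c):
        best = (r, c)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and height[rr][cc] < height[best[0]][best[1]]):
                best = (rr, cc)
        return best

    labels = [[0] * cols for _ in range(rows)]
    minima = {}  # maps the position of a minimum to its basin label
    for r in range(rows):
        for c in range(cols):
            pos = (r, c)
            while True:
                nxt = lowest_neighbour(*pos)
                if nxt == pos:  # no strictly lower neighbour: a local minimum
                    break
                pos = nxt
            labels[r][c] = minima.setdefault(pos, len(minima) + 1)
    return labels

# A one-row "gradient image" with two valleys separated by a ridge at index 3.
surface = [[3, 1, 2, 5, 3, 0, 4]]
labels = watershed_descent(surface)
print(labels)
```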
Why use colour? It simplifies object identification. The eye can only distinguish 20 to 30 shades of grey, but thousands of colours. Humans see in colour.
Can be full colour or pseudo-colour. Fundamentals:
-
All colours are combinations of primary colours,
-
The CIE (Commission Internationale de l'Éclairage, the International Commission on Illumination) defined in 1931: Red (700nm), Green (546.1nm), and Blue (435.8nm).
-
Secondary colours given by combinations of primary colours,
-
Colour can also be described by its brightness, hue, and saturation.
Tristimulus values:
Chromaticity diagram is a plot of
Each colour defined by primary spectral components (normalised to unit cube); Red, Green, Blue.
Similar to RGB but uses secondary colours, Cyan, Magenta, Yellow.
Used in television.
Again, the luminance
YUV variants:
Useful as intensity decoupled from chrominance and
Given R, G, and B with
-
Intensity $I = \frac{1}{3}(R + G + B)$,
-
If $I \neq 0$, then the saturation $S = 1 - \frac{3}{R + G + B} \cdot \min(R,\ G,\ B)$,
-
If $S \neq 0$, then the hue $H = \cos^{- 1}\left( \frac{\frac{1}{2}\left\lbrack (R - G) + (R - B) \right\rbrack}{\sqrt{(R - G)^{2} + (R - B)(G - B)}} \right)$,
-
If $\frac{B}{I} > \frac{G}{I}$, then correct the hue by setting $H = 360{^\circ} - H$.
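The RGB to HSI steps above translate fairly directly into code. The sketch below assumes RGB values normalised to [0, 1] and returns the hue in degrees; returning a hue of 0 where the hue is undefined (black or grey) is a convention chosen here, not something stated in the notes.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert RGB values in [0, 1] to (hue in degrees, saturation, intensity)."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0  # black: hue and saturation undefined, use 0
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if s == 0 or den == 0:
        return 0.0, s, i  # grey: hue undefined, use 0
    # Clamp to guard against rounding just outside [-1, 1].
    h = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    if b > g:  # same test as B/I > G/I because I > 0 here
        h = 360.0 - h
    return h, s, i

for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    print(name, rgb_to_hsi(*rgb))
```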
- Calculate $r$, $g$, $b$:
- Calculate $RGB$:
This image illustrates the concept of pseudo colour image processing, specifically intensity slicing. In this technique, different intensities of grey in a greyscale image are mapped to different colours to enhance the visibility of certain features. The graph on the left shows a slicing plane that selects a range of grey levels (intensities) in the original greyscale image. The middle and right images show the results of applying pseudo colouring to highlight specific intensity ranges, making certain features more visible and easier to analyse.
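Intensity slicing can be sketched in a few lines. The thresholds and colours below are arbitrary illustrative choices, and the image is assumed to be a nested list of 8-bit grey levels.

```python
def intensity_slice(grey, thresholds, colours):
    """Map each grey level to the colour of the slice it falls in.

    thresholds: ascending grey levels that separate the slices.
    colours:    one (R, G, B) tuple per slice, so
                len(colours) == len(thresholds) + 1.
    """
    def colour_of(level):
        for k, t in enumerate(thresholds):
            if level < t:
                return colours[k]
        return colours[-1]

    return [[colour_of(v) for v in row] for row in grey]

BLUE, GREEN, RED = (0, 0, 255), (0, 255, 0), (255, 0, 0)
grey = [[10, 100], [180, 250]]
sliced = intensity_slice(grey, thresholds=[85, 170], colours=[BLUE, GREEN, RED])
print(sliced)
```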
This image is showing how to transform a grey level image into a colour image using separate transformations for the red, green, and blue colour channels. Specifically, this involves applying different transformation functions to the original grey level values to obtain the respective colour channel values. The transformed images are then combined to create a full-colour version of the original grey level image. The image shows the following steps:
-
Red transformation: This process maps the grey level values to the red channel values using a function ($f_{R}(x,y)$). The function can be chosen based on the desired effect or the characteristics of the image.
-
Green transformation: This process maps the grey level values to the green channel values using a function ($f_{G}(x,y)$). The function can be chosen based on the desired effect or the characteristics of the image.
-
Blue transformation: This process maps the grey level values to the blue channel values using a function ($f_{B}(x,y)$). The function can be chosen based on the desired effect or the characteristics of the image.
-
Colour image: This is the result of combining the red, green, and blue channel images into a single colour image. The colour image exhibits enhanced details with different colours highlighting different features or areas.
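A minimal sketch of the three-channel transformation, assuming each channel's function depends only on the grey level at that pixel. The rectified sinusoids with different phases are just one possible choice of transfer function, and the helper names are hypothetical.

```python
import math

def grey_to_colour(grey, f_r, f_g, f_b):
    """Apply one transfer function per channel to a 2D list of grey levels."""
    return [[(f_r(v), f_g(v), f_b(v)) for v in row] for row in grey]

def sinusoid(phase):
    """Transfer function: a rectified sinusoid of the grey level, shifted by phase."""
    return lambda v: int(round(255 * abs(math.sin(math.pi * v / 255 + phase))))

colour = grey_to_colour(
    [[0, 128, 255]],
    sinusoid(0.0),          # red channel
    sinusoid(math.pi / 4),  # green channel
    sinusoid(math.pi / 2),  # blue channel
)
print(colour)
```

Changing the phase or frequency of each sinusoid changes which grey level ranges are emphasised in which colour.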
This image is showing how to apply frequency filtering to a greyscale image and convert it into a pseudo-colour image for better visualization and analysis. Frequency filtering is a technique that uses the Fourier transform to manipulate the frequency components of an image. The image shows the following steps:
-
Fourier transform -- This step converts the spatial domain image into the frequency domain, where each pixel represents a sinusoidal wave with a certain frequency, amplitude, and phase.
-
Filter -- This step applies one or more filters to the frequency domain image, such as low-pass, high-pass, band-pass, or notch filters, to enhance or suppress certain frequencies. Different filters can produce different effects on the image, such as smoothing, sharpening, or removing noise.
-
Inverse Fourier transform -- This step converts the filtered frequency domain image back into the spatial domain, where each pixel represents the intensity value of the image.
-
Other processing -- This step applies other processing techniques to the spatial domain image, such as contrast enhancement, edge detection, or segmentation, to improve the quality or extract the features of the image.
-
Colour display -- This step converts the processed spatial domain image into a pseudo-colour image, where different intensities of grey are mapped to different colours. Pseudo-colouring can help to highlight certain features or regions in the image that are not easily visible in greyscale.
Convert from RGB to HSI. Process the
Process in original colour domain. Two approaches can be used:
-
Process each channel independently and then combine,
-
Directly process colour pixels (as Vectors)
The results of the two approaches may or may not be identical, depending on the operation.
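A quick illustration of why the operation matters, using three hypothetical pixels. Averaging each channel separately gives exactly the vector mean, whereas taking the median of each channel separately can produce a colour that none of the input pixels has.

```python
from statistics import median

pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # pure red, green, blue

# Mean: averaging each channel separately equals averaging the vectors.
channel_mean = tuple(sum(p[k] for p in pixels) / len(pixels) for k in range(3))

# Median: the per-channel (marginal) median is not one of the input colours.
marginal_median = tuple(median(p[k] for p in pixels) for k in range(3))

print(channel_mean, marginal_median)
```

Here the per-channel median is black, which is why a median that treats each pixel as a vector is of interest.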
Spatial masks for greyscale and RGB images.
Vector mean:
What about the vector median?
The
For a vector difference:
The three most commonly used norms are:
-
$L_{1}$ norm (City block distance),
-
$L_{2}$ norm (Euclidean distance),
-
$L_{\infty}$ norm (Chessboard distance)
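The three norms of a vector difference can be written out as below; the two colour vectors are arbitrary example values.

```python
def l1_norm(a, b):
    """City block distance: sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_norm(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def linf_norm(a, b):
    """Chessboard distance: largest absolute component difference."""
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (10, 20, 30), (13, 16, 30)
print(l1_norm(a, b), l2_norm(a, b), linf_norm(a, b))
```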
A vector median filter is a nonlinear filter that operates on colour images, in which each pixel is a vector of values, one per colour channel. It replaces each pixel with the vector median of its neighbourhood: the neighbouring pixel vector whose summed distance to all the other vectors in the neighbourhood is smallest under some distance measure. Because the output is always one of the input vectors, the filter can preserve the edges and details of the image while removing noise and outliers. A common distance measure used for vector median filtering is the
The vector median of a set of
Vector median example using the
For
For
For
For
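A short sketch of the vector median of one filter window, assuming the Euclidean ($L_{2}$) distance; the window values are made up for illustration.

```python
def vector_median(vectors):
    """Return the vector whose summed L2 distance to all the others is smallest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    return min(vectors, key=lambda v: sum(dist(v, u) for u in vectors))

# Four similar grey-ish pixels and one impulsive outlier.
window = [(10, 10, 10), (12, 11, 10), (11, 12, 11), (250, 0, 0), (9, 10, 12)]
print(vector_median(window))
```

The result is always one of the pixels already in the window, so no new colours are introduced and the outlier is rejected.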
Greyscale edge detection only accounts for 90% of total colour edge points; colour edge detection is required to resolve the remaining 10%. This graph outlines a process for colour edge detection in digital images. It shows how to decompose an image into different colour channels, apply vector methods and model matching to each channel, and then recombine them to produce an edge map. Colour edge detection is a technique that uses the colour information of an image to enhance the edge detection results. It can be useful for images with complex textures, patterns, or shading. Here are some key points about the graph:
-
Image decomposition -- This step splits the image into its red, green, and blue components, which are then processed separately.
-
Vector methods -- This step applies multidimensional gradient methods to each colour component, such as the Sobel, Prewitt, or Roberts operators, to find the edge strength and direction.
-
Model matching -- This step compares the edge vectors of each colour component with a predefined model, such as a line, a curve, or a corner, to find the best match.
-
Image recombination -- This step merges the edge vectors of each colour component into a single vector, using a weighted average or a maximum operation.
-
Edge decision -- This step decides whether a pixel belongs to an edge or not, based on a threshold or a fuzzy logic rule.
-
Output fusion methods -- This step combines the edge decisions of each colour component into a final edge map, using a logical OR, AND, or XOR operation.
Vector order statistics colour edge detectors are a technique for detecting edges within colour images. They work by reduced ordering of the pixel vectors according to aggregate distances,
Reduced ordering is a technique for comparing multidimensional vectors without using a total order relation. It is based on the idea of reducing the dimensionality of the vectors by projecting them onto a lower-dimensional subspace, and then applying a scalar order relation on the projected values.
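One common form of reduced ordering, assumed here, ranks each vector by its aggregate (summed) distance to every vector in the window. The sketch then uses the distance between the highest and lowest ranked vectors as an edge strength, usually called the vector range; the function names are illustrative.

```python
def aggregate_order(vectors):
    """Sort vectors by summed L2 distance to every vector in the window."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    return sorted(vectors, key=lambda v: sum(dist(v, u) for u in vectors))

def vector_range(vectors):
    """Distance between the highest and lowest ranked vectors."""
    ordered = aggregate_order(vectors)
    lo, hi = ordered[0], ordered[-1]
    return sum((x - y) ** 2 for x, y in zip(lo, hi)) ** 0.5

flat = [(50, 50, 50)] * 9                           # uniform 3 x 3 window
edge = [(50, 50, 50)] * 6 + [(200, 50, 50)] * 3     # window straddling an edge
print(vector_range(flat), vector_range(edge))
```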
Reduced ordering according to aggregate distances,
Vectors ordered so that when
Example:
Minimum VR:
Min Vector Deviation:
For
The Morphological Gradient is a technique that measures the difference between the dilation and erosion of an image, resulting in an image that highlights the edges or boundaries of the objects. It can be defined as:
The Colour Morphological Gradient (CMG) is an extension of the Morphological Gradient to colour images. It does not require an explicit pixel ordering and is easily computed by using the maximum absolute difference between the colour components of the pixels in a neighbourhood. It can be defined as:
The CMG can be used to detect edges in colour images, especially when the intensity or hue information is not sufficient. It can also be used to enhance the contrast or colour distribution in the image.
Consider the CMG performance at a step edge corrupted by Gaussian noise. Gaussian noise is a type of noise that adds random variations to the pixel values of the image, making it less clear or sharp. The graph below shows how the CMG can detect the edge despite the noise, as indicated by the peak in the intensity curve. The graph also shows the effect of different neighbourhood sizes on the CMG.
Produces improved performance by rejecting outliers and finding median centred difference:
Where
Typical values:
-
$s = 1$ or $2$ for $3 \times 3$ mask,
-
$s = 8$ or $9$ for $5 \times 5$ mask.
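A sketch of the colour morphological gradient and a robust variant, under two assumptions: the CMG is taken as the largest pairwise $L_{2}$ distance between vectors in the window, and the robust version discards the $s$ most distant pairs before taking that maximum. The names `cmg` and `rcmg` are labels chosen here.

```python
from itertools import combinations

def _dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cmg(vectors):
    """Colour morphological gradient: largest pairwise distance in the window."""
    return max(_dist(a, b) for a, b in combinations(vectors, 2))

def rcmg(vectors, s):
    """Robust CMG: remove the s most distant pairs, then take the CMG of the rest.

    Requires len(vectors) - 2 * s >= 2 so that at least one pair remains.
    """
    remaining = list(vectors)
    for _ in range(s):
        a, b = max(combinations(remaining, 2), key=lambda pair: _dist(*pair))
        remaining.remove(a)
        remaining.remove(b)
    return cmg(remaining)

# A flat 3 x 3 window with one impulsive outlier.
window = [(50, 50, 50)] * 8 + [(255, 0, 255)]
print(cmg(window), rcmg(window, 1))
```

In this example the plain gradient responds strongly to the single outlier, while removing one pair is enough to suppress it.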
The Figure of Merit (FOM) measures the quality of a detected edge map against a reference edge map used as ground truth, for example to compare edge detectors at different noise levels. It rewards detected edge pixels that lie on or near a true edge and penalises missed, spurious, and displaced edge pixels. The formula for FOM is given by:
where
Gaussian Correlated Noise: This is a type of noise that adds random variations to the pixel values of the image, making it less clear or sharp. The noise is correlated, meaning that the noise values of neighbouring pixels are similar or dependent on each other. The graph shows how the FOM values of two methods (MNO and RCMG) change with different SNR values. SNR is a measure of the ratio of the signal power to the noise power in the image. A higher SNR means a lower noise level. The graph shows that the RCMG method performs better than the MNO method in reducing Gaussian correlated noise, as it has higher FOM values for all SNR values.
Impulsive Correlated Noise: This is a type of noise that adds random spikes or outliers to the pixel values of the image, making it appear distorted or corrupted. The noise is correlated, meaning that the noise values of neighbouring pixels are similar or dependent on each other. The graph shows how the FOM values of two methods (MNO and RCMG) change with different SNR values. SNR is a measure of the ratio of the signal power to the noise power in the image. A higher SNR means a lower noise level. The graph shows that the MNO method performs better than the RCMG method in reducing impulsive correlated noise, as it has higher FOM values for all SNR values.
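For reference, a sketch of how such a figure of merit can be computed. It assumes the widely used Pratt form, $\frac{1}{\max(N_I, N_A)} \sum_{i=1}^{N_A} \frac{1}{1 + \alpha d_i^2}$ with scaling constant $\alpha = 1/9$, where $N_I$ and $N_A$ are the numbers of ideal and detected edge pixels and $d_i$ is the distance from detected pixel $i$ to the nearest ideal edge pixel. Whether this matches the exact formula used in the lectures is an assumption.

```python
def figure_of_merit(detected, ideal, alpha=1.0 / 9.0):
    """Pratt-style figure of merit for two lists of (row, col) edge pixels."""
    if not detected or not ideal:
        return 0.0
    total = 0.0
    for r, c in detected:
        # Squared distance from this detected pixel to the nearest ideal edge pixel.
        d2 = min((r - ri) ** 2 + (c - ci) ** 2 for ri, ci in ideal)
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(len(detected), len(ideal))

ideal = [(r, 5) for r in range(10)]      # vertical edge at column 5
perfect = list(ideal)                    # detector output identical to the ideal
shifted = [(r, 8) for r in range(10)]    # same edge displaced by 3 pixels
print(figure_of_merit(perfect, ideal), figure_of_merit(shifted, ideal))
```

A value of 1 indicates a perfect edge map, and the score falls as edges are displaced, missed or added.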
Applications: Scene labelling, using constraint propagation. Assume that regions in image detected that correspond to objects and some form of inter-relationships exist between objects. The goal is to assign a label (meaning) to each image object to achieve image interpretation.
There are two main approaches:
-
Discrete Relaxation -- allows only one label for each object in final labelling, aims to achieve consistent labelling over entire image.
-
Probabilistic Relaxation -- allows multiple labels for each object with labels' probability giving confidence.
The difference is robustness. Discrete relaxation either finds a consistent labelling or detects that no consistent labelling is possible, so it often fails to produce an interpretation even if there are only a few errors. Probabilistic relaxation always gives an interpretation together with a confidence measure. This is normally better, even if the result is locally inconsistent.
Set of objects:
Finite set of labels:
Finite set of relations between objects. Compatibility function between interacting objects: look for local interactions and then use constraint propagation; using iterative scheme, local consistencies lead to global consistencies.
Example: six objects:
-
Window ($W$),
-
Table ($T$),
-
Drawer ($D$) (x2),
-
Phone ($P$),
-
Background ($B$)
Unary relationships:
-
$W$ is rectangular,
-
$T$ is rectangular,
-
$D$ is rectangular
Binary inter-relationships:
-
$W$ is located above $T$,
-
$P$ is located above $T$,
-
$D$ is located above $T$,
-
$B$ is adjacent to image border.
-
Step 1: Apply all labels to all objects and apply unary constraints,
-
Step 2: Repeat until globally consistent,
-
Step 3: If any object has no label, stop -- no consistent labelling found
-
Step 4: Select one object to have labels updated
-
Step 5: Modify (delete) inconsistent labels using binary constraints.
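The steps above can be sketched on a reduced version of the example with three regions and three labels. The scene description (`shape`, `touches_border`, `above`) is made up for illustration, and the rule that each label may be used only once is an extra assumption added to keep the toy problem unambiguous.

```python
def discrete_relaxation(objects, all_labels, unary_ok, binary_ok):
    """Delete unsupported labels until the labelling is stable (or fails)."""
    # Step 1: give every object every label, then apply the unary constraints.
    labels = {o: {l for l in all_labels if unary_ok(o, l)} for o in objects}
    changed = True
    while changed:  # Step 2: repeat until nothing changes
        changed = False
        for i in objects:  # Step 4: pick an object to update
            for lab in list(labels[i]):
                # A label survives only if every other object still holds
                # at least one label compatible with it.
                supported = all(
                    any(binary_ok(i, lab, j, other) for other in labels[j])
                    for j in objects if j != i
                )
                if not supported:  # Step 5: delete the inconsistent label
                    labels[i].discard(lab)
                    changed = True
            if not labels[i]:  # Step 3: no consistent labelling exists
                return None
    return labels

shape = {"top": "rect", "bottom": "rect", "outer": "irregular"}
touches_border = {"top": False, "bottom": False, "outer": True}
above = {("top", "bottom")}  # region "top" is located above region "bottom"

def unary_ok(obj, lab):
    if lab in ("W", "T"):
        return shape[obj] == "rect"
    return touches_border[obj]  # "B" must be adjacent to the image border

def binary_ok(i, li, j, lj):
    if li == lj:
        return False  # assumption: each label is used at most once
    if li == "W" and lj == "T":
        return (i, j) in above
    if li == "T" and lj == "W":
        return (j, i) in above
    return True

result = discrete_relaxation(["top", "bottom", "outer"], "WTB", unary_ok, binary_ok)
print(result)
```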
Let
The compatibility between assignments
For label
-
If $P_{hk}$ is high and $c(i,\ j;h,\ k)$ is positive, then $P_{ij}$ should increase,
-
If $P_{hk}$ is high and $c(i,\ j;h,\ k)$ is negative, then $P_{ij}$ should decrease,
-
If $P_{hk}$ is low and $c(i,\ j;h,\ k)$ is $\approx 0$, then $P_{ij}$ should not change significantly.
Therefore, the product of
Average net increment:
To make
Where:
-
Need initial estimates of $P_{ij}^{0}$,
-
Range of objects should be limited,
-
If $P^{r} = 0$ or $1$ it cannot change,
-
$P_{ij}^{r}$ tend to converge to 0 or 1.
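A sketch of one probabilistic relaxation iteration, assuming the classic update in which the average net increment $q_{ij}$ is the mean over the other objects of $\sum_k c(i, j; h, k) P_{hk}$, and the new probability is $P_{ij}(1 + q_{ij})$ renormalised over the labels of object $i$. The compatibility function `agree` is a toy example, and the code assumes the normalising sum is non-zero.

```python
def relaxation_step(P, c):
    """One probabilistic relaxation update.

    P[i][j]:       current probability that object i has label j.
    c(i, j, h, k): compatibility in [-1, 1] of label j on object i
                   with label k on object h.
    """
    n, m = len(P), len(P[0])
    new_P = []
    for i in range(n):
        # Average net increment for each label j of object i.
        q = [
            sum(c(i, j, h, k) * P[h][k]
                for h in range(n) if h != i
                for k in range(m)) / (n - 1)
            for j in range(m)
        ]
        weighted = [P[i][j] * (1.0 + q[j]) for j in range(m)]
        total = sum(weighted)  # normalise so the labels of object i sum to 1
        new_P.append([w / total for w in weighted])
    return new_P

# Toy compatibility: two objects support each other when they share a label.
agree = lambda i, j, h, k: 1.0 if j == k else -1.0
P = [[0.6, 0.4], [0.9, 0.1]]
P1 = relaxation_step(P, agree)
print(P1)
```

Because the update multiplies the current probability, a value of exactly 0 or 1 stays fixed, and repeated iterations push the remaining values towards 0 or 1.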
Example applications: solid mounds, dam walls, glaciers, clouds. Object characteristics: large, non rigid, inaccessible.
Correction surface:
Correlation/Relaxation Labelling motion estimation. Template matching used to establish multiple candidates.
Initial probabilities derived from correlation coefficients.
Where
Probability updating:
Where the support function:
Where





























































































































