Digital image processing is the use of a digital computer to process digital images through an algorithm. ~ Definition right off the page from Wikipedia
Let’s begin with a simple picture of what image processing is, then move on to image algorithms and the different processes behind them, and finally implement them using the PIL library (quite a popular Python package!).
Put simply, image processing means taking a digital image, whether greyscale, colored, or in any other mode, and performing operations on it such as scaling/resizing, rotation, cropping, and contrast adjustment.
Providing the entire content table so you’re free to choose which topic you want to read! Enjoy reading.
In this tutorial, we’re using the PIL Python library to perform different kinds of image processing on various images. PIL does not come pre-installed with Python, so we have to install Python, pip and, last and most important, PIL.
If you have anaconda installed on your machine, you don’t have to do any of the steps mentioned below.
When you install Python, pip is installed along with it.
For installing PIL, we will install Pillow (Pillow is a maintained fork of PIL):
$ pip install Pillow
If you want to code side by side along with this blog, I would suggest doing the following in your terminal:
$ mkdir pil-image-processing && cd pil-image-processing
$ mkdir images
We will be writing the programs within the pil-image-processing directory. You can use any code editor; however, I will make use of vim.
Also, if you want to use the same images as I have in this blog, you can download the youtube icon and sunset picture and place them in your images folder.
I will give a brief on what the PIL library is and what we’ll cover with it.
PIL (the Python Imaging Library) is quite a popular Python package that handles all the image processing operations I mentioned earlier. The original PIL is no longer maintained, so what we actually install is Pillow, its maintained fork, which is still imported under the name PIL.
Pillow is the “friendly” PIL fork by Alex Clark and Contributors. ~ As mentioned in the documentation
A pixel is the smallest and most important component of an image. A collection of pixels makes up an image altogether. In the 8-bit images we work with here, a pixel value ranges from 0-255.
An image always has a width (w) and a height (h), and it can be understood as a matrix with h rows and w columns.
We’ll move on a step further to understand the different modes of images. Earlier I mentioned an image can be represented through a matrix. A matrix is 2-dimensional and this single matrix forms a single channel of the image. That’s a new term alert!
When we fill the matrix in the single channel with values ranging from 0-255, it forms a greyscale image. Great, so now we have reached a level where we created a greyscale image. The size of a greyscale image is simply represented in width x height format.
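To make the matrix idea concrete, here is a minimal sketch in plain Python (no PIL needed). The nested list `grey` and its values are made up for illustration; each inner list is one row of the image.

```python
# A tiny greyscale image, 3 pixels wide and 2 pixels high, as a nested list.
# Each inner list is one row; each value is an 8-bit intensity (0-255).
grey = [
    [0, 128, 255],   # row 0
    [64, 192, 32],   # row 1
]

height = len(grey)      # number of rows
width = len(grey[0])    # number of columns

print(f"size: {width} x {height}")
print("pixel at row 0, column 2:", grey[0][2])
```

Note that the matrix is indexed as row first and column second, while the size is quoted as width x height.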
Let’s move on to form the colored digital images observed these days.
For understanding the colored images, we’ll go to legos. In legos, we build a tower by placing one lego over another. In this manner, each lego is like a single-channel. In RGB/colored images, we’ve 3 channels that are placed on top of each other to create a single image.
In RGB, we have the Red, Green, and Blue channels. Each of these channels is filled with values ranging from 0-255. When these channels are placed together, we get a colored image. It’s as simple as that theoretically. In practice, we know that an image has a width (w) represented as columns and a height (h) represented as rows, but in an RGB image, we also need to mention the channels (c).
The RGB image size is represented as width x height x channels.
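As a small sketch of how this looks in PIL, the snippet below builds an RGB image in memory (so it does not depend on any file) and reads back its width, height, and channels. The image size and fill color are arbitrary values chosen for illustration.

```python
from PIL import Image

# Build a small in-memory RGB image: 4 pixels wide, 2 pixels high, filled with orange.
img = Image.new("RGB", (4, 2), (255, 165, 0))

width, height = img.size         # Pillow reports size as (width, height)
channels = len(img.getbands())   # getbands() returns ('R', 'G', 'B') for an RGB image

print(width, height, channels)
print(img.getpixel((0, 0)))      # one pixel holds one value per channel
```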
There are various other modes apart from RGB & greyscale, such as CMYK and RGBA (and so many others).
In PIL, the documentation lists several short codes for these modes, which we will refer to when performing simple operations.
from PIL import Image
img = Image.open('images/image.png').convert('LA')
img.save('images/greyscale.png')
Now, this just converted our colored image to greyscale, damn! I will explain what this code does:
We import the Image module from the PIL library to use its built-in functions.
We open the image, convert it to the LA mode, and assign the result to the variable img.
LA is not Los Angeles, guys, but a mode specified by PIL in their documentation.
L (8-bit pixels, black and white)
LA (L with alpha)
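To see the difference between these modes, here is a short sketch that converts an in-memory RGB image (an arbitrary color, chosen only for illustration) to L and LA and prints the bands each one carries.

```python
from PIL import Image

# In-memory RGB image so the sketch does not depend on any file on disk.
rgb = Image.new("RGB", (2, 2), (200, 100, 50))

grey = rgb.convert("L")          # single greyscale channel
grey_alpha = rgb.convert("LA")   # greyscale channel plus an alpha channel

for im in (rgb, grey, grey_alpha):
    print(im.mode, im.getbands())
```

If you don’t need transparency, converting to L instead of LA gives a smaller single-channel image.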
Here, a colored image has 3 layers as mentioned before: Red, Green & Blue. What we’ll do is extract each channel separately from the image and save those individual channels. Simple!!
from PIL import Image
img = Image.open('images/sunset.png')
data = img.getdata()
Here, we open the image and assign it to variable img. After that, we’re extracting the data of a colored image. We also mentioned before that a channel of an image is formed through a matrix that encompasses the pixel values ranging from 0-255.
Now for one channel, we have one matrix filled with pixel values. But in a colored image, we have 3 channels (red, green & blue), so we have three matrices filled with pixel values.
Again, a matrix is represented as a 2-D array with rows and columns. Now we have to include the values of 3 matrices, so we do it using a 3-D array, and it comes in this format.
[
[[r(0,0), g(0,0), b(0,0)], [r(0,1), g(0,1), b(0,1)]],
[[r(1,0), g(1,0), b(1,0)], [r(1,1), g(1,1), b(1,1)]]
]
This format is better understood diagrammatically as shown:
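Thus, viewed as rows and columns, the data we extract from the image looks like this (getdata() itself returns the same pixel values flattened row by row into one long sequence of (r, g, b) values):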
[[[202 112 104]
[202 112 104]
[202 112 104]
...
[200 111 105]
[200 111 105]
[199 110 104]]
[[202 112 104]
[202 112 104]
[202 112 104]
...
[200 111 103]
[200 111 103]
[199 110 102]]]
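Here is a small sketch of how the flat sequence from getdata() relates to the row and column layout above. It uses a made-up 3 x 2 in-memory image rather than the sunset picture, and the regrouping into `rows` is just one simple way to do it.

```python
from PIL import Image

# A 3 x 2 in-memory RGB image with a distinct red value per pixel.
img = Image.new("RGB", (3, 2))
pixels = [(10, 0, 0), (20, 0, 0), (30, 0, 0),
          (40, 0, 0), (50, 0, 0), (60, 0, 0)]
img.putdata(pixels)

flat = list(img.getdata())   # one (r, g, b) tuple per pixel, row by row
width, height = img.size

# Regroup the flat sequence into rows to get the [row][column][channel] layout.
rows = [flat[y * width:(y + 1) * width] for y in range(height)]

print(len(flat))    # width * height entries
print(rows[1][0])   # pixel at row 1, column 0
```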
Let’s split this data so we can get the r pixel values for the red channel, g pixel values for the green channel, and b pixel values for the blue channel.
red = [(pixel[0], 0, 0) for pixel in data]
green = [(0, pixel[1], 0) for pixel in data]
blue = [(0, 0, pixel[2]) for pixel in data]
As we have seen, each pixel in the data is of the format [r, g, b], so let’s focus on extracting only the red matrix for now. We take the first value of each pixel to get the red pixel values and set the other two values to 0. We do this similarly for creating the green matrix and blue matrix.
Let’s use these 3 matrices and save them. We can then see the separate red, green, and blue channels of the original image.
img.putdata(red)
img.save('images/red_sunset.png')
img.putdata(blue)
img.save('images/blue_sunset.png')
img.putdata(green)
img.save('images/green_sunset.png')
Here, we use the built-in putdata() function to copy the matrix content into the original image and then save it.
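As an alternative sketch (not the approach used above), PIL also offers split() and Image.merge(), which can produce the same single-channel views without looping over pixels. The example below uses a small in-memory image with one of the pixel values shown earlier; the `zero` channel is a helper introduced here for illustration.

```python
from PIL import Image

# In-memory stand-in for the sunset image, filled with one of its pixel values.
img = Image.new("RGB", (2, 2), (202, 112, 104))

r, g, b = img.split()                # three single-channel ("L") images
zero = Image.new("L", img.size, 0)   # an all-zero channel to fill the gaps

red_only = Image.merge("RGB", (r, zero, zero))
green_only = Image.merge("RGB", (zero, g, zero))
blue_only = Image.merge("RGB", (zero, zero, b))

print(red_only.getpixel((0, 0)))
print(green_only.getpixel((0, 0)))
print(blue_only.getpixel((0, 0)))
```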
Image processing techniques such as rescaling/resizing and rotation are built on principles such as nearest neighbor, bilinear transformation, and a few others that I will explain in detail.
Let’s start with the simplest principle and climb up to the more complex ones. Just note that I will be explaining all these principles theoretically in the simplest form and not mathematically.
In this principle, we have an original image(A) and we apply an image processing technique to convert it to a modified image(B).
I will explain this by resizing an image from A (3x3) to B (9x9). The first step is to take a coordinate of image A and multiply it by 3, which gives us a coordinate in image B. This new coordinate in image B will have the same pixel value as the original coordinate in image A had.
A(0, 0) => 110
A(0, 1) => 120
>>> x3 <<<
B(0, 0) => 110
B(0, 3) => 120
Now, we need to find the pixel value B(0, 1), and to do that we do as follows:
B(0, 1) => ?
>>> /3 <<<
A(0, 1/3) => no such coordinate exists, right?
You’re right, no such coordinate exists, but the nearest neighbor principle asks whether A(0, 1/3) is closer to A(0, 0) or A(0, 1). Here, it is closer to A(0, 0), thus the pixel value of B(0, 1) will be the same as A(0, 0). Similarly, other coordinates are calculated as well! This is simple and fast but certainly not very accurate!
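The mapping described above can be sketched in plain Python. This follows the simplified coordinate rule from this section (divide by the scale, pick the closest original coordinate) and is not how PIL implements it internally; `resize_nearest` and the pixel values in `A` beyond 110 and 120 are made up for illustration.

```python
def resize_nearest(src, scale):
    """Upscale a 2-D list of pixel values by an integer factor (illustrative helper)."""
    src_h, src_w = len(src), len(src[0])
    dst = []
    for row in range(src_h * scale):
        # Map the row back into A, round to the nearest index, clamp at the edge.
        src_row = min(int(row / scale + 0.5), src_h - 1)
        dst_row = []
        for col in range(src_w * scale):
            src_col = min(int(col / scale + 0.5), src_w - 1)
            dst_row.append(src[src_row][src_col])
        dst.append(dst_row)
    return dst


A = [[110, 120, 130],
     [140, 150, 160],
     [170, 180, 190]]
B = resize_nearest(A, 3)

print(B[0])   # first row of the 9x9 result
```

Every value in B is copied straight from A, which is why the result looks blocky when enlarged.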
As you would have understood if we use the nearest neighbor principle to resize or rotate an image, it won’t produce an accurate output and will be pixelated. Thus, we use another principle - Bilinear Transformation.
Let’s take the same example of resizing an image A(3x3) to image B(9x9).
Before we understand bilinear transformation, we need to understand linear transformation (linear interpolation). We’ll take the same example as we did in the nearest neighbor. We need to find B(0, 1), and the way we do it is by finding A(0, 1/3) as follows:
We have the pixel values at A(0, 0) and A(0, 1), so first we find 1/3 * value @ A(0, 1) and 2/3 * value @ A(0, 0). Add those up and round the result, and you get the pixel value for A(0, 1/3), which is your value for B(0, 1). With the values from before, that is 2/3 * 110 + 1/3 * 120 = 113.33, which rounds to 113.
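The weighted sum above is ordinary linear interpolation. A minimal sketch, where `lerp` is a helper name chosen here for illustration:

```python
def lerp(v0, v1, t):
    """Linear interpolation: t=0 gives v0, t=1 gives v1, values in between are blended."""
    return (1 - t) * v0 + t * v1


# A(0, 0) = 110 and A(0, 1) = 120; we want the value one third of the way along.
value = lerp(110, 120, 1 / 3)

print(round(value))   # pixel value used for B(0, 1)
```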
We use this method to find any coordinate between 2 given coordinates. In bilinear, we make use of the 4 neighboring coordinates in the original image A that surround the point we want. With these coordinates, we calculate an ideal coordinate in image A and scale it to a coordinate in image B.
Once we find the coordinate of the triangle block, we just map it to image B.
So that’s the basic idea of how bilinear works. In this manner, we can find any coordinate between 2 coordinates in our image B. With this principle, we can get much more accurate pixel values and a smoother modified image.
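Here is a rough sketch of the bilinear idea in plain Python: interpolate along the top pair of neighbors, along the bottom pair, and then between those two results. The function name `bilinear_sample` and the 2x2 sample values are assumptions for illustration, and this is a simplified picture rather than PIL’s actual implementation.

```python
import math


def lerp(v0, v1, t):
    """Linear interpolation between v0 and v1."""
    return (1 - t) * v0 + t * v1


def bilinear_sample(img, row, col):
    """Estimate the value at a fractional (row, col) from its 4 surrounding pixels."""
    h, w = len(img), len(img[0])
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)   # clamp at the image edge
    tr, tc = row - r0, col - c0                       # fractional offsets

    top = lerp(img[r0][c0], img[r0][c1], tc)      # interpolate along the top pair
    bottom = lerp(img[r1][c0], img[r1][c1], tc)   # interpolate along the bottom pair
    return lerp(top, bottom, tr)                  # then between the two results


A = [[110, 120],
     [130, 140]]

print(bilinear_sample(A, 0.5, 0.5))       # centre of the 4 pixels
print(round(bilinear_sample(A, 0, 1 / 3)))
```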
In bicubic interpolation, we make use of a cubic equation. However, as I have mentioned before we’ll focus on the theoretical rather than the mathematical understanding of the principle.
Let’s look back at linear/bilinear interpolation to transform from image A to image B.
The diagram below in the 2-D plane shows that between any 2 given coordinates there is just a line, and you can find any coordinate between those 2 using the linear interpolation method:
However, this linear interpolation doesn’t always provide a highly accurate or smooth result. Thus, we make use of bicubic.
From this diagram, what the bicubic method implies is that, given 2 coordinates and the line joining them, we use the tangent/derivative at those 2 coordinates to draw a curve and find any coordinate between those 2 on that curve.
For calculating the tangent at those 2 given coordinates, we need the neighboring coordinates of those 2 as well! So easy, right?
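To give a feel for this in one dimension, here is a sketch of cubic interpolation between two values using tangents estimated from their neighbors. The tangent estimate used here (the Catmull-Rom choice) is an assumption made for illustration; it is one common option and is not claimed to be the exact kernel PIL uses. `cubic_interpolate` is a hypothetical helper name.

```python
def cubic_interpolate(p0, p1, p2, p3, t):
    """Interpolate between p1 and p2 (0 <= t <= 1) using neighbours p0 and p3.

    The tangents at p1 and p2 are estimated from the neighbouring values
    (Catmull-Rom style), then combined with the cubic Hermite basis functions.
    """
    m1 = (p2 - p0) / 2   # tangent at p1
    m2 = (p3 - p1) / 2   # tangent at p2

    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return h00 * p1 + h10 * m1 + h01 * p2 + h11 * m2


# Four consecutive pixel values along one row; we interpolate between the middle two.
print(cubic_interpolate(100, 110, 120, 130, 1 / 3))
print(cubic_interpolate(0, 0, 100, 100, 0.5))
```

Bicubic applies the same idea in both directions, which is why it needs a 4x4 neighborhood instead of the 2x2 used by bilinear.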
We use the linear interpolation method to calculate the ideal coordinate in bilinear interpolation using 4 coordinates.
If we want to find the ideal coordinate between those 4 using bicubic, we’ll need to use the extension as shown in the diagram.
For bicubic, we’ll do as follows:
The coordinate of the triangle block is scaled to the coordinate in image B. This gives a much smoother transformation as compared to bilinear interpolation.
Let’s cover our last principle before we jump to implementation using PIL.
This is an important principle! It’s important ‘cause today principles like bicubic and bilinear are themselves based upon the convolution principle.
Let’s begin by understanding convolution. In convolution let’s say we have an image I(7x7) and what we’ll do here is slide a kernel window K(3x3) over the image I such that we’ll get a new image J(5x5) = I(7x7) * K(3x3)
In the diagram, we see that the kernel K slides over image I. When the kernel K is over a section of image I, we multiply each pixel value of K with the pixel value of I beneath it and then add all the products up, and we get one convolved image pixel value. I know this sounds confusing, so we’ll understand this with an example.
In that image, we have a section of image I and we have our kernel K. From the diagram, it’s understandable how the 9 pixel values result in 1 convolved pixel value. In this manner, we continue to slide the kernel K over the entire image I and get a convolved image J (I * K).
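The sliding-window process can be sketched in plain Python as follows. As in the description above, the kernel is applied without flipping it and without padding the image, so a 7x7 image and a 3x3 kernel give a 5x5 result. The function name `convolve_valid` and the sample values are assumptions for illustration.

```python
def convolve_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for top in range(ih - kh + 1):
        out_row = []
        for left in range(iw - kw + 1):
            total = 0
            for kr in range(kh):
                for kc in range(kw):
                    total += image[top + kr][left + kc] * kernel[kr][kc]
            out_row.append(total)
        out.append(out_row)
    return out


# A 7x7 image with values 0..48 and a 3x3 kernel of ones (a simple box sum).
I = [[r * 7 + c for c in range(7)] for r in range(7)]
K = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]
J = convolve_valid(I, K)

print(len(J), "x", len(J[0]))   # size of the convolved image
print(J[0][0])                  # sum of the top-left 3x3 section of I
```

The four nested loops show why this is a heavy process: every output pixel needs one multiplication per kernel entry.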
This is a heavy process, don’t let anyone tell you different!
Before moving further, I would like to clarify that these methods that I specified are used as filters to perform the transformation in images.
According to the documentation of PIL:
Image resizing methods resize() and thumbnail() take a resample argument, which tells which filter should be used for resampling. Possible values are PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC and PIL.Image.ANTIALIAS.
~
The upscaling performance of the LANCZOS filter has remained the same. For the BILINEAR filter, it has improved by 1.5 times and for BICUBIC by four times.
From those 2 excerpts from the documentation, we know what Nearest, Bilinear, and Bicubic are, so that’s cool! Now, what are Antialias and Lanczos? Lanczos is a high-quality filter based on convolutions (we learned this!), and Antialias is just an alias of Lanczos. So Lanczos and Antialias are based on convolution. Note that the ANTIALIAS name was removed in Pillow 10, so on recent versions use LANCZOS instead.
Let’s implement resizing using these filters with PIL.
According to the documentation, when using the filters for resizing and rotation we can either use their names explicitly or the numbers assigned to them.
NEAREST = NONE = 0
BILINEAR = LINEAR = 2
BICUBIC = CUBIC = 3
LANCZOS = ANTIALIAS = 1
We’re going to re-use the sunset image that we used while splitting the image into different channels. Let’s get coding!
from PIL import Image
img = Image.open('images/sunset.png')
size = (350, 550)
nearest_resize = img.resize(size, 0)
lanczos_resize = img.resize(size, 1)
bilinear_resize = img.resize(size, 2)
bicubic_resize = img.resize(size, 3)
nearest_resize.save('images/nearest_sunset.png')
lanczos_resize.save('images/lanczos_sunset.png')
bilinear_resize.save('images/bilinear_sunset.png')
bicubic_resize.save('images/bicubic_sunset.png')
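If you would like to compare the filters without any image file, here is a small sketch that upscales a 3x3 in-memory greyscale image to 9x9, this time passing the filters by name instead of by number. The pixel values are made up for illustration, and the exact interpolated values you see may differ between Pillow versions.

```python
from PIL import Image

# A 3x3 greyscale image built in memory (values chosen for illustration).
src = Image.new("L", (3, 3))
src.putdata([110, 120, 130,
             140, 150, 160,
             170, 180, 190])

filters = {
    "nearest": Image.NEAREST,
    "bilinear": Image.BILINEAR,
    "bicubic": Image.BICUBIC,
    "lanczos": Image.LANCZOS,
}

# Resize the same source with each filter.
resized = {name: src.resize((9, 9), resample=f) for name, f in filters.items()}

for name, im in resized.items():
    first_row = list(im.getdata())[:9]
    print(name, im.size, first_row)
```

With nearest, the output only ever contains values copied from the source; the other filters produce in-between values.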
Here, we’re resizing the original image using different filters. With the current image, you may not be able to see much difference, as shown.
However, this particular image shows a clear difference between the results of the different filters. I would suggest resizing it with each of the filters we have used in this blog and observing the difference.
We’ll be using the filters to rotate an image by any degree we wish to. However, there is a problem: when we rotate by, let’s say, 30 or 45 degrees, the corners get clipped off. If we don’t want that, we set expand=True, which enlarges the output image so the whole rotated image fits!
In resizing, we used numbers to depict what kind of filter we want to use over the image while resizing. Here, we’ll explicitly type their names out and again use the same sunset image.
from PIL import Image
img = Image.open('images/sunset.png')
nearest_rotate = img.rotate(45, resample=Image.NEAREST, expand=True)
bilinear_rotate = img.rotate(45, resample=Image.BILINEAR, expand=True)
bicubic_rotate = img.rotate(45, resample=Image.BICUBIC, expand=True)
nearest_rotate.save('images/nearest_rotate_sunset.png')
bilinear_rotate.save('images/bilinear_rotate_sunset.png')
bicubic_rotate.save('images/bicubic_rotate_sunset.png')
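To see what expand actually changes, here is a quick sketch that rotates an in-memory 100 x 50 image (an arbitrary size chosen for illustration) with and without it and compares the output sizes.

```python
from PIL import Image

# A plain red 100 x 50 image built in memory.
img = Image.new("RGB", (100, 50), (255, 0, 0))

# Without expand, the canvas keeps its original size and the corners are cut off.
clipped = img.rotate(45, resample=Image.BICUBIC)

# With expand=True, the canvas grows to hold the whole rotated image.
expanded = img.rotate(45, resample=Image.BICUBIC, expand=True)

print("clipped: ", clipped.size)
print("expanded:", expanded.size)
```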
Here, the Lanczos filter can’t be used for rotation, as PIL does not implement it for this operation. It is also quite a heavy process, so it doesn’t hold any advantage over bicubic for now.
PIL is not limited to just resizing, rotating, and splitting images; it can do so much more. Everything you do in photo editing tools, such as cropping, copy-pasting, and merging 2 images, can be achieved through PIL in just a few lines of code. In this blog, I have only used the Image module, but PIL has many other modules.
For additional resources on PIL, just look through their documentation, and if you get stuck, Stack Overflow usually has an answer. PIL is an effective library, and I have covered its 2 most commonly used operations and the principles that lie behind the transformations.
The source for each program is available on Github as well, so you can refer to it and code side by side too!
If any questions arise regarding PIL or anything mentioned in the article, feel free to mention it in the comments! Till then, goodbye.