Python

Getting familiarised with image processing

Digital image processing is the use of a digital computer to process digital images through an algorithm. ~ Definition right off the page from Wikipedia

Let’s begin by building a simple understanding of image processing, then move on to the algorithms behind it, and finally implement them using the PIL library (quite a popular Python package!).

GIF Credits: Google

Image processing is simply a way of taking a digital image, be it greyscale, colored, or in any other mode, and performing operations on it such as scaling/resizing, rotation, cropping, contrast adjustment, and many other familiar operations.

πŸƒβ€β™€οΈ Table of Contents πŸƒβ€β™‚οΈ

Providing the entire content table so you’re free to choose which topic you want to read! Enjoy reading. 😁

  1. Prerequisites

  2. About the PIL

  3. Image Representation & its modes

    1. PIL Implementation
  4. Image Processing Principles

    1. Nearest Neighbor
    2. Bilinear Interpolation
    3. Bicubic Interpolation
    4. Convolution Method
  5. Transformation with PIL

    1. Resizing Image
    2. Rotating Image

Before starting anything, do this!

In this tutorial, we’ll use the PIL Python library to perform different image processing operations on various images. PIL doesn’t come pre-installed on your machine, so we need to install Python, pip, and finally, most importantly, PIL.

If you have Anaconda installed on your machine, you can skip the steps mentioned below.

When you install Python, pip is installed along with it.

For installing PIL, we will install Pillow (Pillow is a maintained fork of PIL):

$ pip install Pillow

If you want to code side by side along with this blog, I would suggest doing the following in your terminal:

$ mkdir pil-image-processing && cd pil-image-processing
$ mkdir images

We will be writing the programs within the pil-image-processing directory. You can use any code editor; I will use vim.

Also, if you want to use the same images as I have in this blog, you can download the YouTube icon and sunset picture and place them in your images folder.

PIL Library

I will give a brief on what the PIL library is and what we’ll cover with it.

PIL (the Python Imaging Library) is quite a popular third-party Python package that handles all the image processing operations I mentioned earlier. The original PIL is no longer maintained; Pillow is its actively maintained fork, which is why we installed Pillow but still import from PIL.

Pillow is the “friendly” PIL fork by Alex Clark and Contributors. ~ As mentioned in the documentation

Image Representation & its modes

A pixel is the smallest and most important component of an image. A collection of pixels makes up an image. In the common 8-bit images we use here, a pixel value ranges from 0-255.

Picture Credits: Google

An image always has a width (w) and a height (h), and can be thought of as a matrix with h rows and w columns.

We’ll move on a step further to understand the different modes of images. Earlier I mentioned an image can be represented through a matrix. A matrix is 2-dimensional and this single matrix forms a single channel of the image. That’s a new term alert!

When we fill the matrix in the single-channel with values ranging from 0-255, it will form a greyscale image. Great so now we reached a level where we created a greyscale image. Greyscale image size is simply represented with width x height format.

Picture Credits: Google

Let’s move on to form the colored digital images observed these days.

For understanding the colored images, we’ll go to legos. In legos, we build a tower by placing one lego over another. In this manner, each lego is like a single-channel. In RGB/colored images, we’ve 3 channels that are placed on top of each other to create a single image.

In RGB, we have the Red, Green, and Blue channels. Each of these channels is filled with values ranging from 0-255. When these channels are placed together, we get a colored image. It’s as simple as that theoretically. In practice, we know that an image has a width (w), the number of columns, and a height (h), the number of rows, but in an RGB image we also need to mention the channels (c).

The RGB image size is represented as width x height x channels.
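To make the width x height x channels idea concrete, here is a minimal sketch that builds a tiny greyscale image and a tiny RGB image in memory and inspects them. The sizes and pixel values are made up for illustration; only `Image.new`, `putdata`, `getbands`, and `getpixel` from Pillow are used.

```python
from PIL import Image

# A tiny greyscale image: mode "L" has a single channel.
width, height = 4, 2
grey = Image.new("L", (width, height))
# putdata() fills the image row by row, so we pass width * height values.
grey.putdata([0, 64, 128, 255, 255, 128, 64, 0])

# An RGB image of the same size: three channels stacked together.
rgb = Image.new("RGB", (width, height), color=(202, 112, 104))

print(grey.size, grey.getbands())  # size is always (width, height)
print(rgb.size, rgb.getbands())
print(grey.getpixel((3, 0)))       # getpixel takes (x, y), i.e. (column, row)
```

Note that PIL reports the size as (width, height) and addresses pixels as (x, y), which is the reverse of the usual (row, column) matrix order.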

Picture Credits: Google

There are various other modes apart from RGB & greyscale such as CMYK and RGBA (and so many others).

RGBA => RGB + Alpha channel (handles opacity of the image)

PIL Implementation for different modes

The PIL documentation lists a short code for each mode, and we will use these codes to perform some simple operations.

1. Convert any image format to greyscale format

from PIL import Image
img = Image.open('images/image.png').convert('LA')
img.save('images/greyscale.png')

Now, this just converted our colored image to greyscale, damn! I will explain what this code does:

  1. Import the Image package from the PIL library to use the in-built functions.
  2. We open the image, apply the LA mode to convert it to greyscale, and assign the result to the variable img.
  3. We save this new image by specifying the path.

LA is not Los Angeles guys but a mode specified by PIL in their documentation.

L (8-bit pixels, black and white)
LA (L with alpha)
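The difference between the two modes is easy to check in code. This is a small sketch on a made-up 2x2 red image (a stand-in for images/image.png, so it runs without any file on disk):

```python
from PIL import Image

# Hypothetical 2x2 RGB image standing in for images/image.png.
img = Image.new("RGB", (2, 2), color=(255, 0, 0))

grey = img.convert("L")         # one channel: brightness only
grey_alpha = img.convert("LA")  # two channels: brightness + alpha

print(img.mode, img.getbands())
print(grey.mode, grey.getbands())
print(grey_alpha.mode, grey_alpha.getbands())
print(grey.getpixel((0, 0)))        # a single number
print(grey_alpha.getpixel((0, 0)))  # a (brightness, alpha) pair
```

If you don’t need transparency, plain L is enough and gives a smaller file.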

Picture taken from Google, updated by myself

2. Splitting a colored image into its respective channels

Here, a colored image has 3 layers as mentioned before: Red, Green & Blue. What we’ll do is extract each channel separately from the image and save those individual channels. Simple!!

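from PIL import Image
# convert('RGB') guarantees exactly three values per pixel, even if the file has an alpha channel
img = Image.open('images/sunset.png').convert('RGB')
# getdata() returns the pixels as a flat sequence of (r, g, b) tuples
data = img.getdata()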

Picture Credits: Google

Here, we open the image and assign it to the variable img. After that, we extract the pixel data of the colored image. We mentioned before that a channel of an image is formed by a matrix that holds pixel values ranging from 0-255.

For one channel, we have one matrix filled with pixel values. But in a colored image, we have 3 channels (red, green & blue), so we have three matrices filled with pixel values.

Again, a matrix is represented as a 2-D array with rows and columns. To hold the values of 3 matrices, we use a 3-D array, and it comes in this format.

[
  [[r(0,0), g(0,0), b(0,0)], [r(0,1), g(0,1), b(0,1)]],
  [[r(1,0), g(1,0), b(1,0)], [r(1,1), g(1,1), b(1,1)]]
]

This format is better understood diagrammatically as shown:

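Thus, if we view the image data as a 3-D array (for example, by converting the image to a NumPy array), it looks like the output below. Note that getdata() itself returns the same pixels flattened into a single sequence of (r, g, b) tuples, row after row: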

[[[202 112 104]
 [202 112 104]
 [202 112 104]
 ...
 [200 111 105]
 [200 111 105]
 [199 110 104]]

 [[202 112 104]
 [202 112 104]
 [202 112 104]
 ...
 [200 111 103]
 [200 111 103]
 [199 110 102]]]

Let’s use this data and split this data so we can get the r pixel values for the red channel, g pixel values for the green channel, and b pixel values for the blue channel.

red = [(pixel[0], 0, 0) for pixel in data]
green = [(0, pixel[1], 0) for pixel in data]
blue = [(0, 0, pixel[2]) for pixel in data]

Let’s focus on extracting only the red matrix for now. Every pixel in the data has the format (r, g, b), so we take the first value of each pixel to get the red pixel values and set the other two values to 0. We do the same to create the green matrix and the blue matrix.

Let’s use these 3 matrixes and save them. We can then see the separate red, green, and blue channels of the original image.

img.putdata(red)
img.save('images/red_sunset.png')
img.putdata(blue)
img.save('images/blue_sunset.png')
img.putdata(green)
img.save('images/green_sunset.png')

Here, we use the built-in putdata() function to copy the matrix content into the original image and then save it.
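As an alternative to the list comprehensions, Pillow also has `split()` and `Image.merge()`. The sketch below is not the approach used above, just another way to reach the same result; the 3x2 image is a hypothetical stand-in for images/sunset.png so the example runs on its own.

```python
from PIL import Image

# Hypothetical stand-in for images/sunset.png.
img = Image.new("RGB", (3, 2), color=(202, 112, 104))

# split() returns one single-channel ("L") image per band.
r, g, b = img.split()

# To show a channel in its own colour, merge it with two all-zero bands.
empty = Image.new("L", img.size, 0)
red_only = Image.merge("RGB", (r, empty, empty))
green_only = Image.merge("RGB", (empty, g, empty))
blue_only = Image.merge("RGB", (empty, empty, b))

print(red_only.getpixel((0, 0)))
print(green_only.getpixel((0, 0)))
print(blue_only.getpixel((0, 0)))
```

One practical difference: putdata() overwrites img each time, while split() and merge() leave the original image untouched.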

Image Processing - Behind The Scene Principles

Image processing techniques such as rescaling/resizing and rotation are built on principles such as nearest neighbor, bilinear interpolation, and a few others that I will explain in detail.

Let’s start with the simplest principle and climb up with more complex ones. Just note that all these principles I will be explaining theoretically in the simplest form and not mathematically.

1. Nearest Neighbor Principle

In this principle, we have an original image(A) and we apply an image processing technique to convert it to a modified image(B).

I will explain this by resizing an image from A (3x3) to B (9x9). The first step is to take a coordinate of image A and multiply it by 3, which gives us a coordinate in image B. This new coordinate of image B will have the same pixel value as the original coordinate had in image A.

A(0, 0) => 110
A(0, 1) => 120  
>>> x3 <<<
B(0, 0) => 110
B(0, 3) => 120 

Now, we need to find the pixel value of B(0, 1), and to do that we do as follows:

B(0, 1) => ? 
>>> /3 <<< 
A(0, 1/3) => no such coordinate exists, right?

You’re right, no such coordinate exists, but the nearest neighbor principle asks: is A(0, 1/3) closer to A(0, 0) or to A(0, 1)? Here, it is closer to A(0, 0), thus the pixel value of B(0, 1) will be the same as A(0, 0). Similarly, other coordinates are calculated as well! This is fast, but certainly not so accurate!
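The idea above can be sketched in a few lines of plain Python. This is only an illustration of the principle as described here (divide the coordinate by the scale factor and snap to the closest source pixel), not how PIL implements it internally; the function name and the pixel values other than 110 and 120 are made up.

```python
def nearest_neighbor_resize(src, factor):
    """Upscale a 2-D list of pixel values by an integer factor."""
    src_h, src_w = len(src), len(src[0])
    dst = []
    for row in range(src_h * factor):
        # Map the output row back into the source and snap to the nearest row.
        src_row = min(int(row / factor + 0.5), src_h - 1)
        out_row = []
        for col in range(src_w * factor):
            src_col = min(int(col / factor + 0.5), src_w - 1)
            out_row.append(src[src_row][src_col])
        dst.append(out_row)
    return dst


A = [
    [110, 120, 130],
    [140, 150, 160],
    [170, 180, 190],
]
B = nearest_neighbor_resize(A, 3)
print(B[0])  # [110, 110, 120, 120, 120, 130, 130, 130, 130]
```

B(0, 0) and B(0, 1) both copy A(0, 0), and B(0, 3) copies A(0, 1), exactly as in the walkthrough. The repeated values are what makes the result look pixelated.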

2. Bilinear Interpolation

As you would have understood, if we use the nearest neighbor principle to resize or rotate an image, it won’t produce an accurate output and the result will be pixelated. Thus, we use another principle: bilinear interpolation.

Let’s take the same example of resizing an image A(3x3) to image B(9x9).

Before we understand bilinear interpolation, we need to understand linear interpolation. We’ll take the same example as we did in the nearest neighbor. We need to find B(0, 1), and to do that we need the value at A(0, 1/3). The way we find it is:

We have the pixel values at A(0, 0) and A(0, 1), so first we find 1/3 * value @ A(0, 1) and 2/3 * value @ A(0, 0). Add those up and round the result, and you get the pixel value for A(0, 1/3), and that’s your value for B(0, 1).

We use this method to find the value at any position between 2 given pixels. In bilinear interpolation, we make use of 4 neighboring pixels in the original image A. With these pixels, we calculate the value at an ideal (fractional) position in image A and assign it to the corresponding pixel in image B.

  1. We have the pixel values of A(0, 0), A(0, 1), A(1, 0), and A(1, 1). First, we find the values of the square blocks, one between each horizontal pair of pixels of image A, using the linear interpolation method.
  2. With the values of the 2 square blocks, we can interpolate again and find the value of the triangle block (the chosen one!).

Once we find the value of the triangle block, we just map it to image B.

So that’s the basic idea of how bilinear works. In this manner, we can find the value of any pixel in image B that falls between pixels of image A. With this principle, we get much more accurate pixel values and a smoother modified image.
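Here is a minimal sketch of those two steps in plain Python. The helper names `lerp` and `bilinear_sample` are made up for this example, and it only illustrates the principle; PIL’s own implementation is more involved.

```python
def lerp(a, b, t):
    """Linear interpolation: t=0 gives a, t=1 gives b."""
    return a * (1 - t) + b * t


def bilinear_sample(src, row, col):
    """Estimate the pixel value at a fractional (row, col) position."""
    r0, c0 = int(row), int(col)
    r1 = min(r0 + 1, len(src) - 1)
    c1 = min(c0 + 1, len(src[0]) - 1)
    t_row, t_col = row - r0, col - c0
    # Step 1: the two "square blocks", one on each row.
    top = lerp(src[r0][c0], src[r0][c1], t_col)
    bottom = lerp(src[r1][c0], src[r1][c1], t_col)
    # Step 2: the "triangle block", interpolated between the two squares.
    return round(lerp(top, bottom, t_row))


A = [
    [110, 120, 130],
    [140, 150, 160],
    [170, 180, 190],
]
print(bilinear_sample(A, 0, 1 / 3))      # value for B(0, 1): 113
print(bilinear_sample(A, 1 / 3, 1 / 3))  # value for B(1, 1): 123
```

For B(0, 1) this is just 2/3 * 110 + 1/3 * 120 = 113.33, rounded to 113, which matches the linear interpolation described earlier.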

3. Bicubic Interpolation

In bicubic interpolation, we make use of a cubic equation. However, as I have mentioned before we’ll focus on the theoretical rather than the mathematical understanding of the principle.

Let’s look back at linear/bilinear interpolation to transform from image A to image B.

The diagram below, in the 2-D plane, shows that between any 2 given coordinates there is just a straight line, and you can find any coordinate between those 2 using the linear interpolation method:

However, this linear interpolation doesn’t always provide a highly accurate or smooth result. Thus, we make use of bicubic.

From this diagram, the bicubic method implies that, given 2 coordinates, we don’t just join them with a straight line. Instead, we use the tangent/derivative at those 2 coordinates to draw a curve through them, and find any coordinate between the 2 on that curve.

How are we calculating the derivatives at the coordinates?

For calculating the tangent at those 2 given coordinates, we need the neighboring coordinates of those 2 as well! So easy, right?

We use the linear interpolation method to calculate the ideal coordinate in bilinear interpolation using 4 coordinates.

If we want to find the ideal coordinate between those 4 using bicubic, we need to extend this to the surrounding 4x4 block of 16 coordinates, as shown in the diagram.

For bicubic, we’ll do as follows:

  1. For each row, we find the value of the square block between the 2 given coordinates, using the tangents at those 2 coordinates, which we get from their neighbors.
  2. Once the square blocks are calculated, we apply the same method as in step 1 across them and get the value of the triangle block.

The value of the triangle block is mapped to the coordinate in image B. This gives a much smoother transformation as compared to bilinear interpolation.
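To make the tangent idea less abstract, here is a rough sketch using a cubic Hermite curve where each tangent is estimated from the two neighbors on either side (this particular choice is known as a Catmull-Rom spline). That choice is my assumption for illustration; it is one common way to do cubic interpolation and not necessarily the exact formula PIL uses. The function names are made up.

```python
def cubic_interpolate(p0, p1, p2, p3, t):
    """Interpolate between p1 and p2 (0 <= t <= 1) with a cubic curve."""
    m1 = (p2 - p0) / 2  # tangent at p1, estimated from its neighbors
    m2 = (p3 - p1) / 2  # tangent at p2, estimated from its neighbors
    t2, t3 = t * t, t * t * t
    return ((2 * t3 - 3 * t2 + 1) * p1
            + (t3 - 2 * t2 + t) * m1
            + (-2 * t3 + 3 * t2) * p2
            + (t3 - t2) * m2)


def bicubic_sample(grid, t_row, t_col):
    """grid is a 4x4 neighborhood; sample inside its central 2x2 square."""
    # Step 1: one cubic interpolation per row (the "square blocks").
    squares = [cubic_interpolate(*row, t_col) for row in grid]
    # Step 2: one more cubic interpolation across the four results.
    return cubic_interpolate(*squares, t_row)


grid = [[10 * row + col for col in range(4)] for row in range(4)]
print(cubic_interpolate(0, 10, 20, 30, 0.5))  # 15.0
print(bicubic_sample(grid, 0.5, 0.5))         # 16.5
```

On data that already lies on a straight line, the cubic curve gives the same answer as linear interpolation; the two only differ when the neighbors bend the curve.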

Let’s cover our last principle before we jump to implementation using PIL.

4. Convolution method

This is an important principle! It’s important ’cause today principles like bicubic and bilinear are themselves based upon the convolution principle.

Let’s begin by understanding convolution. In convolution let’s say we have an image I(7x7) and what we’ll do here is slide a kernel window K(3x3) over the image I such that we’ll get a new image J(5x5) = I(7x7) * K(3x3)

Picture Credits: Google

In the diagram, we see that the kernel K slides over image I. When the kernel K is over a section of image I, we multiply each pixel value in that section by the matching value of K, add all the products up, and get one convolved pixel value. I know this sounds confusing, so we’ll understand this with an example.

In that image, we have a section of image I and our kernel K. From the diagram, it’s understandable how the 9 pixel values result in 1 convolved pixel value. In this manner, we continue to slide the kernel K over the entire image I and get a convolved image J (I * K).
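The sliding-window process can be sketched in plain Python as follows. The image values and the all-ones kernel are made up for illustration, and the function name is hypothetical. Like the description above, it multiplies and adds without flipping the kernel, which is how image libraries commonly apply kernels.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding the borders."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            # Multiply the window under the kernel element by element, then sum.
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


I = [[row * 7 + col for col in range(7)] for row in range(7)]  # 7x7 image
K = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]                          # 3x3 kernel
J = convolve2d(I, K)
print(len(J), len(J[0]))  # 5 5
print(J[0][0])            # 72, the sum of the top-left 3x3 window
```

The four nested loops are why this is a heavy process: every output pixel costs 9 multiplications for a 3x3 kernel, and more for larger ones.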

This is a heavy process, don’t let anyone tell you different!

Before moving further, I would like to clarify that these methods that I specified are used as filters to perform the transformation in images.

Implementing transformations with PIL

According to the documentation of PIL:

Image resizing methods resize() and thumbnail() take a resample argument, which tells which filter should be used for resampling. Possible values are PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC and PIL.Image.ANTIALIAS.

~

The upscaling performance of the LANCZOS filter has remained the same. For the BILINEAR filter, it has improved by 1.5 times and for BICUBIC by four times.

From those 2 excerpts from the documentation, we know what Nearest, Bilinear, and Bicubic are, so that’s cool! Now, what are Antialias and Lanczos? Antialias is a high-quality filter based on convolutions (we learned this!). The Antialias filter is just an alias of Lanczos, so both names refer to the same convolution-based filter. Note that the ANTIALIAS name was removed in Pillow 10, so on newer versions use LANCZOS instead.

Let’s implement resizing using these filters with PIL.

According to the documentation, when resizing or rotating we can refer to the filters either by name or by the number assigned to them.

NEAREST = NONE = 0
BILINEAR = LINEAR = 2
BICUBIC = CUBIC = 3
LANCZOS = ANTIALIAS = 1

1. Resizing an image using different filters

We’re going to re-use the sunset image that we used while splitting the image into different channels. Let’s get coding!

from PIL import Image
img = Image.open('images/sunset.png')
size = (350, 550)
nearest_resize = img.resize(size, 0)
lanczos_resize = img.resize(size, 1)
bilinear_resize = img.resize(size, 2)
bicubic_resize = img.resize(size, 3)

nearest_resize.save('images/nearest_sunset.png')
lanczos_resize.save('images/lanczos_sunset.png')
bilinear_resize.save('images/bilinear_sunset.png')
bicubic_resize.save('images/bicubic_sunset.png') 

Here, we’re resizing the original image using different filters. With the current image, you may not be able to see much difference, as shown.

Picture Credits: Google

However, this particular image shows a clear difference between the filters. I would suggest looking closely at the results of the filters we used in this blog and observing the difference.
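If the difference is hard to spot by eye, you can also compare the pixel values directly. The sketch below uses a made-up 2x2 checkerboard instead of the sunset image, so that the effect of each filter is obvious in the numbers.

```python
from PIL import Image

# Hypothetical 2x2 checkerboard, small enough to reason about by hand.
img = Image.new("L", (2, 2))
img.putdata([0, 255, 255, 0])

size = (8, 8)
nearest = img.resize(size, Image.NEAREST)
bilinear = img.resize(size, Image.BILINEAR)

# For mode "L" every byte is one pixel, so tobytes() gives the raw values.
nearest_values = sorted(set(nearest.tobytes()))
bilinear_values = sorted(set(bilinear.tobytes()))

print(nearest_values)        # only the original values survive
print(len(bilinear_values))  # bilinear introduces in-between greys
```

Nearest neighbor only copies existing pixels, so the enlarged image still contains just 0 and 255, while bilinear blends neighbors and produces new intermediate values.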

2. Rotating an image using different filters

We’ll be using the filters to rotate an image by any degree we wish. However, there is a problem: when we rotate by, let’s say, 30 or 45 degrees, the corners get clipped off. If we don’t want that, we set expand=True, which prevents it!

In resizing, we used numbers to depict what kind of filter we want to use over the image while resizing. Here, we’ll explicitly type their names out and again use the same sunset image.

from PIL import Image
img = Image.open('images/sunset.png')

nearest_rotate = img.rotate(45, resample=Image.NEAREST, expand=True)
bilinear_rotate = img.rotate(45, resample=Image.BILINEAR, expand=True)
bicubic_rotate = img.rotate(45, resample=Image.BICUBIC, expand=True)

nearest_rotate.save('images/nearest_rotate_sunset.png')
bilinear_rotate.save('images/bilinear_rotate_sunset.png')
bicubic_rotate.save('images/bicubic_rotate_sunset.png') 

Here, the Lanczos filter can’t be used, as PIL doesn’t implement it for rotation. It’s also quite a heavy process, so it doesn’t hold any advantage over bicubic for now.
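To see what expand=True actually changes, you can compare the output sizes. This is a small sketch on a made-up 100x50 image rather than the sunset picture:

```python
from PIL import Image

# Hypothetical 100x50 image standing in for images/sunset.png.
img = Image.new("RGB", (100, 50), color=(202, 112, 104))

clipped = img.rotate(45, resample=Image.BICUBIC)
expanded = img.rotate(45, resample=Image.BICUBIC, expand=True)

print(img.size)       # (100, 50)
print(clipped.size)   # unchanged, so the rotated corners are cut off
print(expanded.size)  # larger, so the whole rotated image fits
```

Without expand, the canvas keeps its original size; with expand=True, PIL grows the canvas to hold the full rotated image.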

PIL is not limited to just resizing, rotating, and splitting images; it can do so much more. Everything you do in photo editing tools, such as cropping, copy-pasting, and merging 2 images, can be achieved with PIL in just a few lines of code. In this article, I have only used the Image module, but PIL has many other modules.

For any additional resources on PIL, just look through the documentation, and if you get stuck, Stack Overflow usually has an answer. PIL is an effective library, and I have covered the 2 most commonly used operations and the principles behind the transformations.

The source code for each example is available on GitHub as well, so you can refer to it and code side by side too!

If any questions arise regarding PIL or anything mentioned in the article, feel free to mention them in the comments! Till then, goodbye. 👋