Wednesday, 9 December 2015

Asymptotic Analysis - Part 1

Well, in our previous discussions, we found that how it is useful to study about algorithms. Now, let's dig down deeper. First, we want to know what does it mean to analyse an algorithm and  how do we do that ?

Analysis of Algorithms:  In computing, it is useful to know the complexity of any program will lead to solutions of some interesting questions like,

  • How long does it take to run a program with the given input set ?
  • How much space does it require to complete the problem ?
  • Is the problem solvable ?
Well, what do we do that having all those information ?
A good question indeed. To answer this question, let me take a hypothetical scenario of cleaning your house. You are in charge of cleaning your house and you have got some money in your hand to spend.

Now, what are the considerations, you want to do ?
  • How much time it should take to clean the house ?
  • What are the tools you should be using ?
  • When cleaning a room, do you need to shift all the furnitures in other room ?
  • Do you have enough space to shift the furnitures to other location ?
  • Is ir even possible to do that ?
So, you see, you are in a pretty good situation to take a decision on how to do that.

OK, let  me put some constraint on it, you have just one day to do that.

So, you are most probably to think in the direction where you will be vacating each room and clean that and again deploy all the furnitures you had in the room earlier.

Now, let me change the scenario, you have plenty of time and you have no extra room to clean your house, in that scenario, you mostly think of rearranging furnitures in the room and clean a part of the room and continue the stuff until the whole room gets clean and then you move to other room.

Now, let's put some serious condition that you have only one day and no extra room to clean your house.

What about this one ???

Well, its pretty complex to do that. I have to find a better way to do that.
Well, think of it...



Cool, I got some idea to do that, I will move my bed to ...
Congratulations !!! You have successfully found out the most efficient algorithm to clean your house.

Hey, but what does that relate to algorithm and computer science ? We do have plenty of resources nowadays. Quite fast CPU, Gigabytes of RAM. Why do we even care for time and space ?
Well, even with so fast computers and lots of resources, we are still in need of efficient algorithms and perform analysis of algorithms,

Let's talk about why we do that,

I had run a sorting algorithm which runs in a quadratic time (order of n2). So, as the input size grows larger, the running time grows quadratically and I had run the same sorting with another algorithm of order nlog(n).

If you take a look on previous article, you will find a table with run time of two different sorting algorithms with the same input set. You can obviously understand that, with 2.9 GHZ Processor Speed and dedicated 3 GB of RAM (Yes, I used 3 GB as the JVM Parameter for that program to run), the second algorithm (order of n2) failed to sort the numbers even after 9 days of execution while the first algorithm(order of nlogn), did that  in 8 seconds with an input size of only 50000000 numbers.

And sometimes, getting a correct answer late is considered as a wrong answer.

Well, that was a sort of test run against running time we can also have another parameter to test, the space usage. Many people use smartphones for day to day activities, may be some of you are reading this article on a smartphone or tablet. Many of the smartphones have less main memory than a desktop or laptop but in some scenarios, we are yet to perform the same set of operations on these devices too. So, in times, we have to think of the space usage along with the running time of the algorithms.

So, these are the basic needs of performing an analysis of algorithms where we take decisions which way to go. The situation is something like the below, where we have three sorting algorithms.

Conventions: Shorter line length means shorter running time, thicker line means more space usage.


So, we can see, that the first algorithm sorts the array in a long time with least memory usage and the third algorithm is fastest but it takes a greater memory footprint. While the second algorithm is efficient with a reasonable memory footprint. So, you need to understand, how much data is there, what are the expectation from the system.

That's all behind the decision making.

Now, we are in a good shape to understand that Analysis of Algorithms are necessary. Now let's see, how we can do that.

When analysing algorithms, we consider running time and space usage of a particular algorithm. For the time being, we'll just consider about the running time.
Now, to do that, we analyse on input size instead of,

  1. Running algorithms and measuring time, because for different machine the same algorithm can perform differently. Also, the programming language plays a key role in this.
  2. Count the steps in the  implementation, because the programming styles are different for different programmers.
As an alternative, we use Asymptotic Analysis to determine the complexity of an algorithm. Let's see what does this mean,

Asymptotic Analysis: Asymptotic  Analysis is a mathematical analysis which describes limiting behaviour.

In simple English, we can define that, it is a method of determine the rate of growth of running time as the input size grows.

So, we can tell that, if an algorithm runs in quadratically, then as the input grows in a linear fashion, the running time increases quadratically.

Well, that's all for now, we'll look into more depth in our subsequent articles.

Hope you enjoyed this article, happy learning...

Saturday, 5 December 2015

Primality Test, another problem

In previous article we looked into the problem on sorting, now we are going to look into another problem, the prime numbers.

What is it ?
A Prime Numbers is a natural number greater than 1 which has no positive divisor except 1 and itself. For example, 2,3,5,7,11,13,17,19,23 etc are prime numbers.

Why are we considering it here ?
One simple answer is that, to start off with a simple problem which is quite easy to understand
Another way to look at it is that, Prime Numbers are really very much useful in today's life and is a great tool in computer security and hashing functions.

How can a special set of number be a tool ?
Well, to understand this, we have to know a little bit of security. In today's world, many of use GMail, Facebook, Twitter, Online Bank accounts and one of the most common parts in all of these is security, you have to provide your credentials in the web page and if it is correct, you gain access to the portal. Also, one more thing to note is that, your bank account details are only visible to yourself. These all are handled through a special kind of study in computer science, known as Cryptography. In the field of cryptography, Prime Numbers play a major role.
Another major application of prime numbers are in the field of hashing, which basically is devoted to generate a unique piece of information. In case of  creating a unique hash, prime numbers are quite useful.
So, we'll be looking at the prime numbers in this article.

So, what's special in it ?
Obviously you are thinking that, 'It is very easy to find a prime number, you just have to check if it divides by any other integer or not.'
Well, the process is very easy if you have to think of a prime number in a small range. But, it is quite hard to find a prime number in large ranges. Cause, as the number goes up, the density of primes goes less.
Let's take an example to see how it looks like on increasing range and the number of primes in that range.

For numbers upto 10, number of primes are 4

For numbers upto 100, number of primes are 25

For numbers upto 1000, number of primes are 168

For numbers upto 10000, number of primes are 1229

For numbers upto 100000, number of primes are 9592

For numbers upto 1000000, number of primes are 78498

For numbers upto 10000000, number of primes are 664579

The statistics show you how the density of prime numbers go less in higher range.

So, you can think, if you have to find a prime number in higher range you have to do a lot of jobs and most probably you are not going to do that manually. We are most probably going to employ a computer to do that.

So, we are in need to find an efficient algorithm to deal with a higher range of numbers.

OK, let's start with a very basic algorithm of finding prime numbers which checks all numbers from 2 to n-1 as a proper divisor of a number n.

Process - 1
        public static boolean isPrime1(int num) {
  for (int i = 2; i < num; i++) {
   if (num % i == 0) {
    return false;
   }
  }
  return true;
 }

For example, to check primality of 36, we check with,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35

But if we check the comparisons, we can observe that there are some redundant checks in the system. Let's take out the divisors of 36, i.e 2, 3, 4, 6, 9, 12, 18.

So, we can observe that, we can improve the process by eliminating the higher order numbers. Simply we can check for the first half of n. Here is an implementation,

Process - 2
        public static boolean isPrime1(int num) {
  for (int i = 2; i < (num / 2); i++) {
   if (num % i == 0) {
    return false;
   }
  }
  return true;
 }


Now let's check if we can further improve the process. OK, let's prepare a table now,
2*18 = 18*2 = 36
3*12 = 12*3 = 36
4*9 = 9*4 = 36
6*6 = 6*6 = 36

So, we can check only upto 6, 36. In fact, not only 36, this happens to be true for any number. The divisors repeat themselves after n. Let's see an implementation here,

Process - 3
        public static boolean isPrime1(int num) {
  for (int i = 2; i * i <= num; i++) {
   if (num % i == 0) {
    return false;
   }
  }
  return true;
 }

Now, as per the patterns of prime numbers, we can see that any prime number can be expressed as (6*k + i). So, we can further improve our process by eliminating some of the numbers from the set of 2 to n. Now, we can check only for a number by 2, 3 and any form of 6k ± 1 \scriptstyle{}\leq\sqrt n. This process takes almost one third of the time than the first method to check for all numbers upto n. Let's look at the following implementation,

Process - 4
       public static boolean isPrime(int num) {
  if (num <= 1) {
   return false;
  } else if (num <= 3) {
   return true;
  } else if (num % 2 == 0 || num % 3 == 0) {
   return false;
  }
  int i = 5;
  while (i * i <= num) {
   if (num % i == 0 || num % (i + 2) == 0) {
    return false;
   }
   i = i + 6;
  }
  return true;
 }

Let's take an example here,

Let's say we have to check for the primality of number 9973, now according to process - 4, the following happens,
 Process - 4 Demonstration for 9973
Initially, 9973 is checked against 2 & 3. Check fails

Then we use a counter starting from 5,
9973 is checked against 5, 7 
Counter incremented to 11

9973 is checked against 11, 13 
Counter incremented to 17

9973 is checked against 17, 19 
Counter incremented to 23

9973 is checked against 23, 25 
Counter incremented to 29

9973 is checked against 29, 31 
Counter incremented to 35

9973 is checked against 35, 37 
Counter incremented to 41

9973 is checked against 41, 43 
Counter incremented to 47

9973 is checked against 47, 49 
Counter incremented to 53

9973 is checked against 53, 55 
Counter incremented to 59

9973 is checked against 59, 61 
Counter incremented to 65

9973 is checked against 65, 67 
Counter incremented to 71

9973 is checked against 71, 73 
Counter incremented to 77

9973 is checked against 77, 79 
Counter incremented to 83

9973 is checked against 83, 85 
Counter incremented to 89

9973 is checked against 89, 91 
Counter incremented to 95

9973 is checked against 95, 97 
Counter incremented to 101
The numbers marked in red are composite numbers. So, we could have safely ignored those comparisons. Cause we already had completed those checks. Because, a composite is the result of two prime number multiplication. So, if a number is divided by any composite number, the same goes true for the divisors of that composite number.
So, we can observe that, there are 9 redundant comparisons. So, we could have ignored 9 comparisons out of total 34 comparisons.

 Now, let's take another example of another prime number 99991. Let's see what happens,
Process - 4 Demonstration for 99991
Initially, 99991 is checked against 2 & 3. Check fails

Then we use a counter starting from 5,
99991 is checked against 5, 7 
Counter incremented to 11
99991 is checked against 11, 13 
Counter incremented to 17
99991 is checked against 17, 19 
Counter incremented to 23
99991 is checked against 23, 25 
Counter incremented to 29
99991 is checked against 29, 31 
Counter incremented to 35
99991 is checked against 35, 37 
Counter incremented to 41
99991 is checked against 41, 43 
Counter incremented to 47
99991 is checked against 47, 49 
Counter incremented to 53
99991 is checked against 53, 55 
Counter incremented to 59
99991 is checked against 59, 61 
Counter incremented to 65
99991 is checked against 65, 67 
Counter incremented to 71
99991 is checked against 71, 73 
Counter incremented to 77
99991 is checked against 77, 79 
Counter incremented to 83
99991 is checked against 83, 85 
Counter incremented to 89
99991 is checked against 89, 91 
Counter incremented to 95
99991 is checked against 95, 97 
Counter incremented to 101
99991 is checked against 101, 103 
Counter incremented to 107
99991 is checked against 107, 109 
Counter incremented to 113
99991 is checked against 113, 115 
Counter incremented to 119
99991 is checked against 119, 121 
Counter incremented to 125
99991 is checked against 125, 127 
Counter incremented to 131
99991 is checked against 131, 133 
Counter incremented to 137
99991 is checked against 137, 139 
Counter incremented to 143
99991 is checked against 143, 145 
Counter incremented to 149
99991 is checked against 149, 151 
Counter incremented to 155
99991 is checked against 155, 157 
Counter incremented to 161
99991 is checked against 161, 163 
Counter incremented to 167
99991 is checked against 167, 169 
Counter incremented to 173
99991 is checked against 173, 175 
Counter incremented to 179
99991 is checked against 179, 181 
Counter incremented to 185
99991 is checked against 185, 187 
Counter incremented to 191
99991 is checked against 191, 193 
Counter incremented to 197
99991 is checked against 197, 199 
Counter incremented to 203
99991 is checked against 203, 205 
Counter incremented to 209
99991 is checked against 209, 211 
Counter incremented to 215
99991 is checked against 215, 217 
Counter incremented to 221
99991 is checked against 221, 223 
Counter incremented to 227
99991 is checked against 227, 229 
Counter incremented to 233
99991 is checked against 233, 235 
Counter incremented to 239
99991 is checked against 239, 241 
Counter incremented to 245
99991 is checked against 245, 247 
Counter incremented to 251
99991 is checked against 251, 253 
Counter incremented to 257
99991 is checked against 257, 259 
Counter incremented to 263
99991 is checked against 263, 265 
Counter incremented to 269
99991 is checked against 269, 271 
Counter incremented to 275
99991 is checked against 275, 277 
Counter incremented to 281
99991 is checked against 281, 283 
Counter incremented to 287
99991 is checked against 287, 289 
Counter incremented to 293
99991 is checked against 293, 295 
Counter incremented to 299
99991 is checked against 299, 301 
Counter incremented to 305
99991 is checked against 305, 307 
Counter incremented to 311
99991 is checked against 311, 313 
Counter incremented to 317
Now, here we can observe that, we could have ignored 39 comparisons out of total 106 comparisons.

In fact, as the range goes on increasing, the number of redundant checks also keep on increasing.

Although Process - 4 is quite efficient and takes less number of comparisons, we can even speed up the process. We can further improve the process by eliminating the composite numbers upto n. If we can have a set of prime numbers within the range of n, we can further eliminate some more comparisons. The only drawback here is that, we have to mark a limit of the check. Let's say, we are having a set of all prime numbers upto 500. So, we can check primality for a number upto 250000.

So, we can see that, using different approaches, we can find the solution of a problem and also we found that using some mathematics, we can improve the approach.

But a million dollar question arises just at this point, 'How can you define that Process - 4 is better than Process - 1 ? Is it true that Process - 4 is the best one ?'

Well, to find answer of this question, we've to use some mathematics again, we will discuss over the process in our subsequent discussion known as 'Asymptotic Analysis'.

Till then, you can check with some prime numbers, generate series of prime numbers.
To ease your task, I have created an online tool. You can definitely take a look and play with this tool, Prime Series Generator.

Hope, you enjoyed this article, thanks for reading !!!

Prev     Next
Palash Kanti Kundu