### Division on a Microcontroller

I was using an AVR microcontroller to learn assembly language and encountered a neat trick for dividing by a constant.

The processor I'm using is an ATmega328P, an 8-bit processor. While it does not have a divide instruction, it does have a multiply instruction that is quite fast (2 clock cycles).

In this example I will be writing a short assembly routine which will divide an 8-bit number by 10.

## Things to keep in mind

A useful identity which will help us here is the fact that **a / b = a * (1 / b)**. In other words, a division is equivalent to a multiplication by the reciprocal of the divisor (e.g. **5 / 10 = 5 * 0.1**).

Another issue with the processor is that it can only handle integers; there is no support for floating point operations. However, this can be remedied by using fixed point arithmetic. In fixed point arithmetic you place an implicit decimal point between two particular bits. As an example, imagine having to work with currency values. Instead of doing your calculations in euros, you do everything in cents. This way you only ever work with whole numbers. The important thing is that whatever the current value, you always know that if you just write a decimal point between the 2nd and 3rd digits from the right, you will have that value in euros again. Example: **€1.50 + €2.30 = 150c + 230c**. The result of the second expression is **380c**. If I imagine a decimal point between the 2nd and 3rd digits, I get the value in euros instead of cents (**€3.80**).
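The cents idea can be sketched in a few lines of Python (a minimal illustration; the variable names are mine, not part of any real API):

```python
# Keep currency as whole cents: fixed point with an implied
# decimal point two digits from the right.
a_cents = 150  # €1.50
b_cents = 230  # €2.30
total_cents = a_cents + b_cents  # plain integer addition

# Reinsert the implied decimal point to read the result in euros.
euros, cents = divmod(total_cents, 100)
print(total_cents)              # 380
print(f"€{euros}.{cents:02d}")  # €3.80
```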

Addition and subtraction are very easy to do in fixed point arithmetic. You just add or subtract the values together, and everything else falls into place. Multiplication is a bit trickier, but not by much. Let's say I have two numbers **X = 2.45** and **Y = 3.5** and I multiply them together. Their product, which I'll call **Z**, is **8.575**. An observation about this is where the decimal point ends up. It's actually pretty simple: **X** has 2 digits after the decimal point, **Y** has 1 digit after the decimal point, and **Z** has 3 digits after the decimal point. I told you it's not really that complicated! So when we do a multiplication we just have to keep in mind that the result will have as many digits in the fractional part as the original numbers had combined.
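The same rule can be checked with ordinary integers standing in for the fixed-point values (a sketch; the scale factors here are decimal rather than binary, but the bookkeeping is identical):

```python
# 2.45 stored with 2 fractional digits -> 245 (scaled by 10**2)
# 3.5  stored with 1 fractional digit  -> 35  (scaled by 10**1)
x = 245
y = 35
z = x * y  # the product carries 2 + 1 = 3 fractional digits
print(z)          # 8575
print(z / 10**3)  # 8.575, i.e. 2.45 * 3.5
```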

## Approximating 0.1

The first thing to do is to convert **0.1** to binary. I won't go into detail about how this is done here, but there's a pretty good explanation over here. We run into yet another problem pretty soon: **0.1** cannot be represented exactly in binary! It converts to the repeating pattern 0.00011001100110011001100...

But we don't need to represent **0.1** entirely accurately (which is impossible anyway). As stated earlier, the processor handles 8-bit values, so it can only work with numbers from 0 to 255. Therefore our representation of **0.1** only needs to be accurate enough to give correct results for this range of numbers. In effect we won't be calculating **x / 10** exactly; we will only be approximating it. But we will approximate it in a way that happens to give the correct result for all values from 0 to 255.
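For reference, the repeating expansion above can be generated with the standard double-and-take-the-integer-part method. Here's a sketch using Python's `fractions` module so no floating point sneaks in:

```python
from fractions import Fraction

def binary_fraction_bits(x, n):
    """Return the first n binary digits of x (0 <= x < 1) after the point."""
    bits = []
    for _ in range(n):
        x *= 2               # shift the binary point one place right
        bit = 1 if x >= 1 else 0
        bits.append(str(bit))
        x -= bit             # keep only the remaining fraction
    return "".join(bits)

print(binary_fraction_bits(Fraction(1, 10), 20))  # 00011001100110011001
```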

Now, the way integer division works is that the fractional part is truncated; there is no rounding. In integer arithmetic **99 / 100 = 0**, because the fractional part (0.99 in this case) is discarded. What this means is that my approximation of **0.1** must be greater than or equal to **0.1**. If I choose a value which is slightly smaller, I will get results which are smaller than they should be. Consider **10 / 10**: if we calculate this using an approximation that is slightly smaller than **0.1**, the result will definitely be smaller than 1 (since we're adding something less than **0.1** ten times). After discarding the fractional part we're left with the result **0**, which is wrong. It should be **1**.

With this revelation I can deduce another thing. Since **0.1** cannot be represented exactly in binary, each additional bit I calculate gets me slightly closer to **0.1** but never actually reaches it. Therefore I will definitely have to round my approximation up at some point. From this I can deduce that I only need to check positions where a 0 bit is immediately followed by a 1 bit: rounding up at these positions gives a value slightly larger than **0.1**, since *01* rounds up to *10*. (Example: the first four bits of the representation are 0.0001, which has a value of 0.0625, less than my target value. Rounding that up gives 0.0010, which has a value of 0.125, slightly larger.) Rounding up wherever I encounter a *11* gives a result equivalent to one I encountered earlier (there will always be an *01* before a *11*), while *00* and *10* have nothing to round since the second bit is 0.

So we're making progress now. We know that our calculated value for **0.1** must be slightly larger than **0.1**. But how will we know if it's precise enough? Another important question is how I will know that my value is not *too* precise (which would mean I'm using more bits than I need). To answer this I will measure the Error of my approximation. This is actually really simple: I just take the difference between my approximation and the target value. Taking the example from the paragraph above, we arrived at a possible approximation of **0.125**, so the Error in this case is **0.125 - 0.1 = 0.025**. Now how will I know if this value will work for all the possible inputs (0 to 255)?

In order to reason this out it helps to think of a multiplication as a repeated addition. To keep the example simple, let's assume I am creating an approximation of the number 3, which I will use to multiply by the number 4. So I want to calculate **3 * 4**. Because multiplication is really just repeated addition, I can write it as **3 + 3 + 3 + 3** (3 * 4 means 3, four times). Now let's imagine that while creating an approximation of the number **3** I arrived at the number **3.1**. As we've seen in the above paragraph, this has an Error of **0.1**. Let's do the multiplication with this approximation, but with the additions done one by one:

```
3.1                   = 3.1
3.1 + 3.1             = 6.2
3.1 + 3.1 + 3.1       = 9.3
3.1 + 3.1 + 3.1 + 3.1 = 12.4
```

Now keep in mind that we originally wanted to multiply by **3**, not **3.1**. As you can see the first line is off by 0.1, since the answer was supposed to be 3. The second answer is off by 0.2, the third by 0.3, and so on. What I wanted to demonstrate with this is that the Error accumulates the more "additions" you make (again, multiplication is the same as repeated addition). I chose these numbers because they make the effect easy to see. Now let's get back to our original problem.

So now we've learned that the error accumulates. Previously we arrived at an approximation of **0.125**, which has an Error of **0.025**. What is the largest number of additions we can do? The answer is **255**, since that is the largest number we can work with. So we can multiply the Error by **255** to find the largest error that can arise in any of our calculations. The answer we get is **6.375**. This is definitely no good for us: since the total error is larger than our target value of **0.1**, adding my approximation together 255 times would be like adding **0.1** an extra 63 times, when I only want to add it 255 times. So the maximum accumulated error must definitely be smaller than my target value of **0.1**.
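This bookkeeping can be sketched with exact fractions (a minimal check, not part of the routine itself):

```python
from fractions import Fraction

target = Fraction(1, 10)
approx = Fraction(1, 8)      # 0.001 in binary = 0.125
error = approx - target      # 0.025
worst_case = error * 255     # error accumulated over 255 additions
print(float(error))          # 0.025
print(float(worst_case))     # 6.375
print(worst_case < target)   # False: not precise enough
```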

So now we know for sure that we need to be more precise. Let's see what other positions we need to check:

```
0.000110011001100110011
     ^   ^   ^   ^   ^
```

We need to check the marked positions, as they are positions that will allow us to round the value up. We've already worked through the first one and discovered that it's not accurate enough. I'll just work out the other positions until I find a suitable one

| Binary | Value | Error | Accumulated error | Good enough? |
|--------|-------|-------|-------------------|--------------|
| 0.001 | 0.125 | 0.025 | 6.375 | no |
| 0.0001101 | 0.1015625 | 0.0015625 | 0.3984375 | no |
| 0.00011001101 | 0.10009765625 | 0.00009765625 | 0.02490234375 | yes |
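This search can also be done programmatically. Instead of scanning only the *01* positions, the sketch below simply rounds **0.1** up at every bit length (which covers the same candidates, as argued above) and stops at the first one whose accumulated error stays below the target; the function name is mine:

```python
import math
from fractions import Fraction

def smallest_good_approximation(target, max_additions=255):
    """Fewest fractional bits whose rounded-up value of `target`
    keeps the accumulated error below `target` itself."""
    for bits in range(1, 32):
        scale = 2 ** bits
        approx = Fraction(math.ceil(target * scale), scale)  # round up
        if (approx - target) * max_additions < target:
            return bits, approx

bits, approx = smallest_good_approximation(Fraction(1, 10))
print(bits)              # 11
print(approx.numerator)  # 205, i.e. 0b11001101
print(float(approx))     # 0.10009765625
```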

So now we've managed to find an approximation of **0.1**, namely **0.00011001101**, which is good enough to work for all values from 0 to 255, because if we add it 255 times (equivalent to multiplying it by 255), the total error is less than **0.1**. This means that if I add my "**0.1**" 255 times, I can be certain the whole-number part of the result comes out exactly right. Sure, the fractional part will be incorrect... but we're going to discard it anyway, so we don't care.

## Now what?

Now the fun stuff begins. I need to implement a division by 10 on my processor using the value I just calculated. Here I encounter yet another problem: the value I calculated has 11 bits (you don't count the 0 before the point), but my processor can only handle 8-bit values! Remember that no decimal point will actually be stored anywhere (since we are working in fixed point arithmetic). Let's temporarily represent the number as a 16-bit number instead: we get **00000000 11001101**. The upper byte is all 0s! And since we treat it as a whole number instead of a fraction, leading 0s mean absolutely nothing, so we don't need to store them. We can store only the lower 8 bits, which fit in our processor. However, it is very important to remember that there were 11 digits after the decimal point; the reason will become apparent shortly. So after all this, our approximation of **0.1** is finally something we can plug into the processor: the byte **11001101**.

Now let's work through an example, because things will be clearer that way. Suppose I have the number **42** and I want to divide it by 10. The number **42** in binary is **00101010**. Now I have my 2 bytes that I can process using my processor:

```
 42 = 00101010
0.1 = 11001101
```

If I multiply those two together, I get the following 16-bit binary number:

```
00100001 10100010
```

Now we just have to get our final answer out of that. But how? Earlier in this post I made an observation about multiplication in fixed point: the result has as many fractional digits as the original numbers had combined. So let's count the fractional digits of the values in this example. The number **42** (00101010) has 0 fractional digits; we just converted 42 to 42. Simple so far. Our approximation of **0.1** (11001101) has 11 fractional digits (even though we don't see the decimal point, I told you to remember this fact about it). That means our result will have **0 + 11 = 11** fractional digits. Let's see the result once again, this time with the decimal point where it should be:

```
00100.001 10100010
```

Now since this is integer division, we should discard the fractional part, which leaves us with:

```
00100
```

If we convert that back to decimal we get **4**, which is the correct answer for **42/10**.
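The whole worked example fits in a couple of lines of Python (a sketch, with the 11-bit shift standing in for the decimal-point bookkeeping):

```python
n = 42
approx = 0b11001101        # 205, our fixed-point "0.1" with 11 fractional bits
product = n * approx       # the 16-bit multiplication result
quotient = product >> 11   # discard the 11 fractional bits
print(format(product, '016b'))  # 0010000110100010
print(quotient)                 # 4
```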

## One more step...

There is one final issue left, which is very simple to solve. Basically we have to somehow tell the processor to discard the fractional part. We will accomplish this with bit shifts. A right bit shift is a very simple operation: it moves all bits one position to the right and discards the rightmost bit (which has nowhere to go). We need to remove the 11 rightmost digits of our value (since the fractional part has 11 digits), so we shift right 11 times. Let's see this in action:

```
    00100001 10100010
 1: 00010000 11010001
 2: 00001000 01101000
 3: 00000100 00110100
 4: 00000010 00011010
 5: 00000001 00001101
 6: 00000000 10000110
 7: 00000000 01000011
 8: 00000000 00100001
 9: 00000000 00010000
10: 00000000 00001000
11: 00000000 00000100
```

As you can see after 11 bit shifts to the right, we end up with the number 4, which is our final answer.

Now in our processor, 11 bit shifts take 11 instructions. But there is a simple way to reduce them. As I said earlier, the processor can only work with 8-bit values. So when it works out the multiplication, it does not give us a single 16-bit result; instead it gives us two bytes which together form the 16-bit result. This is convenient for us, since we can save 8 shifts by simply ignoring the lower byte. Here's how it works:

```
00100001 10100010 (complete result)
00100001          (we ignore the lower byte)
00010000          (shift right)
00001000          (shift right)
00000100          (shift right)
```

And as you can see we still ended up with the number 4 (in a convenient single byte) at the end.
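As a sanity check, the trick (including the ignore-the-low-byte shortcut) can be verified exhaustively for every 8-bit input, here sketched in Python:

```python
for n in range(256):
    product = n * 0b11001101   # the 8x8 -> 16-bit multiply
    high_byte = product >> 8   # the byte the processor leaves in the high register
    assert product >> 11 == n // 10   # full 11-bit shift
    assert high_byte >> 3 == n // 10  # shortcut: 3 shifts of the high byte
print("correct for all 256 values")
```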

## The code

After all this trouble, here's the assembly code I wrote to do all this:

```asm
LDI R16, 42         ; Load 42 into register R16
LDI R17, 0b11001101 ; Load our approximation of 0.1 into R17
MUL R16, R17        ; Multiply R16 and R17.
                    ; This puts the low byte and high byte
                    ; of the result in R0 and R1 respectively
LSR R1              ; Three right shifts of the high byte
LSR R1
LSR R1
                    ; The final answer is in R1
```

## Why all this trouble?

The reason we went to all this trouble is that division is *slowwwwww*. The manufacturer of this microcontroller provides some generic multiply and divide routines that can be used with it. They can be found here. The 8x8 unsigned division algorithm in there is stated to take 58 clock cycles to complete. In my case I have sacrificed flexibility (my routine can only divide by 10!) for performance: the assembly code above runs in 7 clock cycles, slightly over 8 times faster. There is another trick I could use to make it execute in 6 clock cycles instead of 7, but that is beyond the scope of this post.

I hope you found this post interesting, and I wish you luck if you ever decide to get into assembly programming. It's not easy, but it's actually quite fun once you start getting the hang of it.