Writing your own toy operating system: Enabling the A20 line


Here we are again, with yet another step in the process of building a second-stage bootloader for our own toy operating system. In the previous sections (here and here) we learned how to create and set up the global descriptor table (GDT) and the interrupt descriptor table (IDT), which the CPU uses to control access to memory and to dispatch interrupts. Both of these structures are necessary to switch the CPU to protected mode.

We are now close to reaching that holy grail. Only a few more preparatory steps are necessary to switch to protected mode, which will finish off our second-stage bootloader and allow us to start writing our kernel in C (which will be a relief after all the assembler code).

In this article, we will talk a little more about memory (we have already done that extensively in the past), because there’s a final hoop we need to jump through in order to have the full 4GB of memory at our kernel’s disposal.

This article is part of a short series “Writing your own toy operating system.” Here is an index to the entire series.

The A20 line

Having created a global descriptor table with code selectors and data selectors at 4 KB granularity, we should theoretically have access to 4GB of memory. However, we are still prevented from that because of a legacy problem (it seems that most of what we’ve been doing has been getting rid of legacy stuff: 16-bit code, real-mode segmented memory addressing, old-style interrupts…). That problem is this.

I lied to you when I said that the memory chip had 20 address lines. It actually has more. Well, I didn’t really lie, since the 8088 actually did have 20 lines and that was the processor we had been discussing.

Let’s do a bit of math to see what is going on.

  • The Intel 8088 processor had 20 address lines. These 20 bits could hold a number between 0 and 2^20 - 1 = 1,048,575, which gives 2^20 = 1,048,576 different addresses. We never address individual bit cells; rather, each address selects a block of 8 cells (one byte). Consequently, we could address one megabyte of memory.
  • The Intel 80286 processor, by contrast, had 24 address lines, so it could address 16 MB of memory. After all, 2^24 = 16,777,216 = 16 MB. Also, this was a 16-bit processor which could load an entire word (16 bits) from memory at the same time.
  • The Intel 80386 processor had 32 address lines, so it could address 2^32 = 4,294,967,296 = 4 GB of memory. Also, it was a 32-bit processor which could load entire double words (32 bits) from memory at the same time, which made it faster.
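
If you’d rather let the compiler do the arithmetic, here is a minimal sketch in C. The function name addressable_bytes is made up for this illustration; it simply computes how many byte addresses a given number of address lines can reach.

```c
#include <stdint.h>

/* Number of distinct byte addresses reachable with `lines` address lines.
 * Only meaningful for lines <= 63, so that the result fits in 64 bits. */
uint64_t addressable_bytes(unsigned lines)
{
    return (uint64_t)1 << lines;
}
```

Calling it with 20, 24 and 32 reproduces the three figures in the list above (1 MB, 16 MB and 4 GB).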

Since we’re aiming for 80386+ processors (we want to use protected mode with 4GB of memory), we’re actually looking at 32 address lines and not 20. That’s a big plus, because a 32-bit address fits in one double word, which is wide enough to address the entire 4GB space in one go without any of the segmentation business.

Legacy pain

But – the CPU won’t let us because of a tiny technicality. You see, when it first starts up the CPU will assume that there are only 20 address lines. We can’t use the rest, so we’re limited to accessing only 1 MB of memory. Here’s why.

In the olden days of the 8088, you could access 1 MB of memory using the segmented real-mode model. We already know that. But consider the following memory address: ffff:0010. It’s a valid real-mode address. But as it turns out, it points to a place beyond one megabyte. See:

  • The segment 0xffff, times 16 bytes, points to linear address 0xffff0. That’s byte number 1,048,560.
  • The offset 0x0010 adds 16 bytes to that. That gives us a final position of 1,048,576.
  • But! One megabyte is a range of 0 through 1,048,575. We’re actually pointing to a place one byte past the one megabyte mark!

What would happen in this case (on the Intel 8088 processor) is that the memory would “wrap around”. The address ffff:0010 would actually point to address 0000:0000. Similarly, address ffff:0011 would point to 0000:0001 and so on.
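
To make the wrap-around concrete, here is a small sketch in C of the real-mode address calculation. The function names linear_address and address_on_8088 are invented for this example. The first computes segment times 16 plus offset without truncation, the second keeps only the 20 bits that an 8088 can actually put on its address lines.

```c
#include <stdint.h>

/* Real-mode translation: segment * 16 + offset.
 * The result can be as large as 0x10FFEF, i.e. slightly above 1 MB. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* What ends up on the 20 address lines of an 8088:
 * only bits 0..19 survive, so anything past 1 MB wraps to the bottom. */
uint32_t address_on_8088(uint16_t segment, uint16_t offset)
{
    return linear_address(segment, offset) & 0xFFFFFu;
}
```

With these, linear_address(0xFFFF, 0x0010) yields 0x100000 (1,048,576), while address_on_8088(0xFFFF, 0x0010) yields 0, the same byte as 0000:0000.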

Now here’s the problem: people would actually write code that depended on this “feature” of the processor.

When the Intel 80286 processor introduced more address lines, code that depended on the memory wrapping at the one megabyte mark would cease to function. Apparently there was enough of this code around that PC designers (starting with the IBM PC/AT) decided to disable the 21st address line by default, and it became necessary to turn it on manually to make use of the higher memory. While it is off, bit 20 of every address is forced to zero, so addresses still wrap at one megabyte and every odd-numbered megabyte is unreachable. This address line is known as the A20 line, since you start counting the lines from zero.
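
One way to picture the gate on a 286 or later is as a mask on bit 20 of the address. The sketch below is a model in C, not code for the bootloader, and the names A20_BIT and physical_address are invented for it. It shows which address actually reaches memory depending on whether the A20 line is enabled.

```c
#include <stdint.h>
#include <stdbool.h>

#define A20_BIT (UINT32_C(1) << 20)   /* the 21st address line, counted from zero */

/* Address that reaches memory: with the gate disabled, bit 20 is
 * forced to zero and all other bits pass through unchanged. */
uint32_t physical_address(uint32_t linear, bool a20_enabled)
{
    return a20_enabled ? linear : (linear & ~A20_BIT);
}
```

With the gate disabled, 0x100000 lands on 0x000000 (the old wrap-around), and 0x300000 lands on 0x200000, so the odd megabytes merely mirror the even ones.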

Turning on the A20 line

Turning on the 21st address line is unfortunately a bit arcane. There are various ways to do it, and it’s necessary to implement them all since you never know which processor (well, at least 80386 or better) you’re running on. Here’s what you do, in order:

Test if the A20 line is already enabled

On some recent processors, the A20 may be enabled by default. That makes things easy, but we’ll still need to check whether this is the case before moving on.

Use a BIOS function

The BIOS actually offers a function to turn on the A20 line. At least, some BIOSes do. Not all, so we can try using this function and if it doesn’t work, we move on.

Use the keyboard controller

Here’s a funny thing: the Intel 8042 keyboard controller had a spare pin, and this pin was used to route the A20 line through. It’s not as if the keyboard controller had anything to do with the memory chip address lines, but I suppose it seemed handy at the time. We can communicate with the keyboard controller to try and turn on the A20 line. And if that doesn’t work…

The “Fast A20” method

Some processors (or rather, the chipsets that accompany them) actually have a special port that we can write some data to in order to enable the A20 line. This is called the Fast A20 method because it’s, well, fast. Nevertheless, not all chipsets support this trick and writing the bytes to the chipset might actually do something else entirely. Therefore, we can only try this method after we’ve exhausted all other methods. Consequently, it’s really not very fast.

We’ll look at (and write code for) all these methods below.

Testing the A20 line

Whenever we try a method that enables the A20 line, we’ll need to check beforehand whether it’s already enabled, and afterwards whether our trick was successful. This is done by testing whether the memory wraps at the one megabyte mark.

Here’s the recipe:

  • Write a byte (say 0x00) to address 0000:0500.
  • Write a byte (say 0xff) to address ffff:0510.
    If the memory wraps around, these positions will refer to the same byte in memory.
  • If the byte at 0000:0500 is now 0xff, then the memory wrapped around (and the A20 line is disabled).
  • If the byte at 0000:0500 is still 0x00, then the memory did not wrap around (and the A20 line is enabled).

In code:

.func CheckA20
  CheckA20:
    pushf                          # Save registers that
    push ds                        # we are going to
    push es                        # overwrite.
    push di
    push si
 
    cli                            # No interrupts, please
 
    xor ax, ax                     # Set es:di = 0000:0500
    mov es, ax
    mov di, 0x0500
 
    mov ax, 0xffff                 # Set ds:si = ffff:0510
    mov ds, ax
    mov si, 0x0510
 
    mov al, byte ptr es:[di]       # Save byte at es:di on stack.
    push ax                        # (we want to restore it later)
 
    mov al, byte ptr ds:[si]       # Save byte at ds:si on stack.
    push ax                        # (we want to restore it later)
 
    mov byte ptr es:[di], 0x00     # [es:di] = 0x00
    mov byte ptr ds:[si], 0xFF     # [ds:si] = 0xff
 
    cmp byte ptr es:[di], 0xFF     # Did memory wrap around?
 
    pop ax
    mov byte ptr ds:[si], al       # Restore byte at ds:si
 
    pop ax
    mov byte ptr es:[di], al       # Restore byte at es:di
 
    mov ax, 0
    je check_a20__exit             # If memory wrapped around, return 0.
 
    mov ax, 1                      # else return 1.
 
  check_a20__exit:
    pop si                         # Restore saved registers.
    pop di
    pop es
    pop ds
    popf
    ret
.endfunc

That’s quite a long piece of code, but it was written in such a way that it preserves the flags and every register it uses except ax (which carries the result: 1 if the A20 line is enabled, 0 if it isn’t), so it can be called from anywhere in the second-stage bootloader. What it effectively does is precisely what’s outlined in the recipe above.
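
For experimenting away from real hardware, the same test can be modelled in C against a simulated memory. Everything in this sketch (sim_mem, sim_a20_enabled, cell, check_a20) is a stand-in invented for illustration. It assumes the gate behaves as a mask on address bit 20 and mirrors the assembly routine step by step.

```c
#include <stdint.h>
#include <stdbool.h>

#define SIM_MEM_SIZE 0x110000u        /* 1 MB + 64 KB: all that real mode can reach */

uint8_t sim_mem[SIM_MEM_SIZE];        /* simulated RAM */
bool    sim_a20_enabled;              /* simulated state of the A20 gate */

/* Resolve segment:offset to a byte of simulated RAM, honouring the gate. */
static uint8_t *cell(uint16_t seg, uint16_t off)
{
    uint32_t linear = ((uint32_t)seg << 4) + off;
    if (!sim_a20_enabled)
        linear &= ~(UINT32_C(1) << 20);   /* bit 20 forced to zero */
    return &sim_mem[linear];
}

/* Returns 1 if the A20 line is enabled, 0 if memory wraps at 1 MB. */
int check_a20(void)
{
    uint8_t *low  = cell(0x0000, 0x0500);
    uint8_t *high = cell(0xFFFF, 0x0510);

    uint8_t saved_low  = *low;            /* remember both bytes */
    uint8_t saved_high = *high;

    *low  = 0x00;
    *high = 0xFF;
    int wrapped = (*low == 0xFF);         /* did the second write hit the first byte? */

    *high = saved_high;                   /* restore in the same order as the assembly */
    *low  = saved_low;

    return wrapped ? 0 : 1;
}
```

Flipping sim_a20_enabled between calls switches the result between 0 and 1, and the bytes at both test addresses are left as they were found.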

Enabling the A20 line through the BIOS

As outlined above, there is a BIOS function that enables the A20 line, but it’s only available on some BIOSes. The specification is here. It’s interrupt 0x15, subfunction 0x2401. Apparently it’s supported on PS/2 models, and also the AMI PCI BIOS and Qualitas 386MAX v6.01+. Very specific, yes.

This is how you call it:

.macro mSetA20BIOS
  mov ax, 0x2401
  int 0x15
.endm

This may or may not enable the A20 line. For BIOSes that do not implement this interrupt function, nothing happens. We’ll need to test the status of the A20 line after the call.

Enabling the A20 line through the 8042 keyboard controller

The keyboard controller can sometimes be used to enable the A20 line (on any chipset where the A20 line was routed through the keyboard controller). Here’s the recipe to do it:

  • Disable the keyboard (send command 0xad to the keyboard controller)
  • Tell the controller that we want to read its output port (send command 0xd0)
  • Read the resulting byte from the data port
  • Tell the controller that we want to write its output port (send command 0xd1)
  • In the byte just read, set bit 1 (the second-lowest bit, value 0x02) and write it to the controller
  • Enable the keyboard (send command 0xae)

When we want to send a command to the keyboard controller, we need to wait until it is ready to receive a command. This is done by hanging in a loop while polling the controller:

.func Wait_8042_command
Wait_8042_command:
  in      al,0x64
  test    al,2
  jnz     Wait_8042_command
  ret
.endfunc

Similarly, when we want to read a byte of data, we must wait until the controller has one ready for us. We need to hang and poll once again:

.func Wait_8042_data
Wait_8042_data:
  in      al,0x64
  test    al,1
  jz      Wait_8042_data
  ret
.endfunc

Now to implement the recipe:

.macro mSetA20Keyboard
    cli                        # Disable interrupts
 
    call    Wait_8042_command  # When controller ready for command
    mov     al,0xAD            # Send command 0xad (disable keyboard).
    out     0x64,al
 
    call    Wait_8042_command  # When controller ready for command
    mov     al,0xD0            # Send command 0xd0 (read output port)
    out     0x64,al
 
    call    Wait_8042_data     # When controller has data ready
    in      al,0x60            # Read input from keyboard
    push    eax                # ... and save it
 
    call    Wait_8042_command  # When controller is ready for command
    mov     al,0xD1            # Set command 0xd1 (write to output)
    out     0x64,al            
 
    call    Wait_8042_command  # When controller is ready for command
    pop     eax                # Write the byte back, with bit 1 (0x02) set
    or      al,2
    out     0x60,al
 
    call    Wait_8042_command  # When controller is ready for command
    mov     al,0xAE            # Write command 0xae (enable keyboard)
    out     0x64,al
 
    call    Wait_8042_command  # Wait until controller is ready for command
 
    sti                        # Enable interrupts
.endm

Quite a bit of code again, but we’re doing very little. We’re sending a couple of commands to the keyboard controller in the hopes that the A20 line will become enabled. Afterwards, we’ll have to test once again to see whether it was, in fact, enabled.
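
Once we’re in C, the same sequence could look roughly like the sketch below. This is an illustration under stated assumptions, not part of the bootloader. The port_io structure and its inb/outb hooks are hypothetical (on real hardware they would wrap the in and out instructions), and the fake_* items are a toy controller so the sequence can be tried on an ordinary machine. Interrupt masking (cli/sti) is left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical port-I/O hooks, so the logic does not depend on real hardware. */
typedef struct {
    uint8_t (*inb)(uint16_t port);
    void    (*outb)(uint16_t port, uint8_t value);
} port_io;

enum { KBC_DATA = 0x60, KBC_CMD = 0x64 };                    /* 8042 ports */
enum { KBC_OUT_FULL = 0x01, KBC_IN_FULL = 0x02, KBC_A20 = 0x02 };

/* Same polling loops as Wait_8042_command and Wait_8042_data. */
static void wait_command(const port_io *io) { while (io->inb(KBC_CMD) & KBC_IN_FULL) {} }
static void wait_data(const port_io *io)    { while (!(io->inb(KBC_CMD) & KBC_OUT_FULL)) {} }

static void send_command(const port_io *io, uint8_t cmd)
{
    wait_command(io);
    io->outb(KBC_CMD, cmd);
}

void enable_a20_via_8042(const port_io *io)
{
    send_command(io, 0xAD);                 /* disable keyboard */
    send_command(io, 0xD0);                 /* ask for the output-port byte */
    wait_data(io);
    uint8_t value = io->inb(KBC_DATA);      /* read it */
    send_command(io, 0xD1);                 /* next data byte goes to the output port */
    wait_command(io);
    io->outb(KBC_DATA, value | KBC_A20);    /* write it back with the A20 bit set */
    send_command(io, 0xAE);                 /* enable keyboard */
    wait_command(io);
}

/* ---- Toy controller, invented for trying the sequence off-target ---- */
uint8_t fake_output_port = 0xDD;            /* arbitrary start value, A20 bit clear */
static uint8_t fake_last_cmd;
static bool    fake_data_ready;

static uint8_t fake_inb(uint16_t port)
{
    if (port == KBC_CMD)
        return fake_data_ready ? KBC_OUT_FULL : 0;   /* never busy */
    fake_data_ready = false;
    return fake_output_port;
}

static void fake_outb(uint16_t port, uint8_t value)
{
    if (port == KBC_CMD) {
        fake_last_cmd = value;
        if (value == 0xD0)
            fake_data_ready = true;
    } else if (fake_last_cmd == 0xD1) {
        fake_output_port = value;
    }
}

const port_io fake_io = { fake_inb, fake_outb };
```

Calling enable_a20_via_8042(&fake_io) leaves fake_output_port at 0xDF, which is the starting value with only the A20 bit added.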

Enabling the A20 line through the “Fast A20” method

This is the method that, while it may or may not work, may crash the computer when it doesn’t. It must therefore be tried last. As said above, some chipsets have a special input/output port (0x92) that we can write to in order to enable the A20 line. This is what you do:

  • Read a byte from port 0x92
  • Set bit 1 of that byte (the second-lowest bit, value 0x02)
  • Write the byte back.

So here goes:

.macro mSetA20FastGate
  in al, 0x92
  or al, 2
  out 0x92, al
.endm

Not only is it fast (which is not too relevant for us as we’ll have to try the other methods first), it’s also mercifully short. Once again, having executed this code, the A20 line may have become enabled. If it still hasn’t, then we must give up and conclude we’re running on a system that has no A20 line (like an Intel 8086).
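
The bit manipulation itself is easy to get wrong, so here is a tiny sketch in C of the value we would write back. The names are invented for this example. Note one deliberate difference from the macro: the sketch also clears bit 0, on the assumption (true of the PS/2-style system control port, but not guaranteed for every chipset) that writing a 1 there requests a reset.

```c
#include <stdint.h>

enum {
    FAST_A20_RESET = 0x01,   /* assumed: bit 0 requests a fast reset */
    FAST_A20_GATE  = 0x02    /* bit 1 enables the A20 line */
};

/* Given the byte read from port 0x92, return the byte to write back:
 * A20 bit set, reset bit cleared, everything else untouched. */
uint8_t fast_a20_value(uint8_t current)
{
    return (uint8_t)((current | FAST_A20_GATE) & ~FAST_A20_RESET);
}
```

For instance, a byte read as 0x00 or 0x01 would be written back as 0x02.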

Putting it all together

Okay, so we’ve written all these little functions and macros that we can use to try and turn on the A20 line. What remains is joining it all in one macro that calls them one by one, tests whether the A20 line has become enabled, and eventually quits if nothing worked. We’ll do the following:

  • Check whether the A20 is already enabled. If so, leave.
  • Attempt to enable the A20 through the BIOS.
  • Check whether the A20 is now enabled. If so, leave.
  • Attempt to enable the A20 through the keyboard controller.
  • Check whether the A20 is now enabled. If so, leave.
  • Attempt to enable the A20 through the Fast A20 method.
  • Check whether the A20 is now enabled. If so, leave.
  • Print an error message and reboot.

In code:

.macro mEnableA20
    call CheckA20
    cmp  ax, 0
    jne  enable_A20__done
    mSetA20BIOS
    call CheckA20
    cmp  ax, 0
    jne  enable_A20__done
    mSetA20Keyboard
    call CheckA20
    cmp  ax, 0
    jne  enable_A20__done
    mSetA20FastGate
    call CheckA20
    xchg bx, bx                    # Bochs magic breakpoint (debugging aid, acts as a no-op otherwise)
    cmp  ax, 0
    jne  enable_A20__done
  enable_A20__fail:
    mWriteString a20error
    mReboot
  enable_A20__done:
.endm

We’re actually making use of some macros that we had written before for the first-stage bootloader, which we’ll simply include in our second-stage bootloader as well: mWriteString and mReboot. We’ll also define a string somewhere, a20error, which reads “A20 unavailable.” We’ll print it if we can’t manage to turn on the A20 line.
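
The control flow of the macro is a plain “try, test, fall through” chain. As a sketch of that idea in C (all names here are hypothetical, and the demo_* functions are stand-ins rather than real enabling code), it could be written as a loop over function pointers:

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*a20_method)(void);

/* Try each method in order, testing after every attempt.
 * Returns true as soon as the A20 line is found enabled, false if
 * every method has been tried without success. */
bool enable_a20(bool (*is_enabled)(void), const a20_method *methods, size_t count)
{
    if (is_enabled())
        return true;                 /* already on, nothing to do */
    for (size_t i = 0; i < count; i++) {
        methods[i]();
        if (is_enabled())
            return true;
    }
    return false;                    /* caller prints the error and reboots */
}

/* ---- Stand-ins for trying the logic off-target ---- */
bool demo_a20_state = false;
int  demo_calls = 0;

bool demo_check(void)    { return demo_a20_state; }
void demo_bios(void)     { demo_calls++; }                        /* pretends to be unsupported */
void demo_keyboard(void) { demo_calls++; demo_a20_state = true; } /* pretends to succeed */
void demo_fast(void)     { demo_calls++; }

const a20_method demo_methods[] = { demo_bios, demo_keyboard, demo_fast };
```

With these stand-ins, enable_a20(demo_check, demo_methods, 3) returns true after two attempts and never reaches the Fast A20 stand-in, which is exactly the ordering we want.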

Summary

In this section, we examined how to turn on the 21st address line from the CPU to the memory chip, which is in most cases disabled by default because of a legacy compatibility problem. We’ll need it to be able to access 4GB of memory. Without it, we would be limited to only 1 MB of memory, beyond which the memory wraps back to position zero.

The A20 line can be turned on using a variety of methods (through the BIOS, through the keyboard controller, and through a special port supported by some chipsets). We must try them all in the hopes that one works. After each attempt, we check whether the A20 line is now enabled by seeing whether the memory wraps.

Having implemented this, we are one step closer to jumping to protected mode. We’re almost there!

This series continues here with the final preparatory steps for entering protected mode.


4 Comments

  1. Isaac says:

    Hi. I’m reporting an error. The line:
    “The segment 0xffff, times 16 bytes, points to linear address 0xfff0. That’s byte number 1,048,560.”
    is supposed to be:
    “The segment 0xffff, times 16 bytes, points to linear address 0xffff0. That’s byte number 1,048,560.”
    You might think that this error doesn’t make such a difference. But it does. I was reading the article and I couldn’t figure out what’s flying! I got scared! I thought I might have not understood “memory segmentation” correctly. Until I calculated the address and found that it was nothing, but a mistake.
    Please correct it… Thank you!

  2. Isaac says:

    Is this line also a mistake?
    “If the byte at 0000:0510 is now 0xff, then the memory wrapped around.”
    should be:
    “If the byte at 0000:0500 is now 0xff, then the memory wrapped around.”

