Compiling C and assembling ASM into machine code

Question

I have three questions:

Which compiler can I use, and how do I use it to compile C source code into machine code?
Which assembler can I use, and how do I use it to assemble ASM into machine code?
(Optional) How do you recommend placing the machine code at the correct addresses (e.g. the bootloader's machine code must be placed in the boot sector)?

My goal: I am trying to build a basic operating system. It will use a bootloader and kernel that I write myself. I will also try to take bits and pieces from the Linux kernel (namely drivers) and integrate them into my kernel. I hope to create a 32-bit DOS-like operating system for tinkering with memory on most modern computers. I don't think I will create an executable format for my operating system, as it will not be dynamic enough to require one.

My situation: I am working on an x86-64 Windows 8 laptop with an Intel Celeron processor; I believe it is using Secure Boot. I would test my operating system on an x86-64 desktop with an Intel Core i3 processor. I have an average understanding of operating systems and how they work. I know the C, ASM and computer theory required for this project. I think it's also worth noting that I am sixteen years old with no formal computer science education.

My research: After searching Google for what C is usually compiled into, I found answers ranging from machine code, binary, plain binary, raw binary, and assembly to relocatable object code. Assembly, as I understand it, is usually assembled into an executable file in PE format. I have heard of the Cygwin, GCC and MinGW C compilers. In terms of assemblers, I have heard of FASM, MASM and NASM. I've searched sites like OSDev and OSDever.

What I tried: I tried to set up GCC (nightmare) and build a cross compiler (another nightmare).

Conclusion: As you can tell, I am drowning in confusion about compilers, assemblers and executable formats. Please dispel my ignorance and answer my questions. This is probably the only thing stopping me from having an OS on my resume. Sorry, I would have included more links, but Stack Overflow wouldn't let me post more than two. Thanks a ton!

c assembly compilation machine-code

user2035846 02 Feb 13 at 20:57

4 answers

Carl Norum · Answer 1 · 2013-02-02T21:06:00+0000

First, some quick answers to your three questions.

Almost any compiler will translate C code into assembly code. That's what compilers do. GCC and clang are popular and free. For example:

clang -S -o example.s example.c

Whichever compiler you choose will probably also assemble for you through the same compiler driver. The -c flag tells the driver to stop after producing the object file instead of going on to link:

clang -c -o example.o example.s

Your linker's documentation will tell you how to place specific code at specific addresses and so on. If you are using GCC or clang as described above, you will probably be using ld(1). In that case, read up on "linker scripts".

Next, some notes:

You don't need a cross compiler, and you don't need to set up GCC yourself. You are working on an Intel machine, generating code for an Intel machine. Any clang or GCC binary distribution that comes with your Linux distribution should work fine.
C compilers typically compile code into assembly and then pass the resulting assembly to the system assembler to produce machine code. Machine code, binary, plain binary and raw binary are all mostly synonyms.
The generated machine code is usually packaged in some kind of executable file format that tells the host operating system how to load and run the code. On Windows it's PE, on Linux it's ELF, and on Mac OS X it's Mach-O.
You don't need to create an executable format for your OS, but you will probably want to use one. ELF is a fairly simple (and well-documented) option.
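To give a feel for how approachable the start of an ELF file is, here is a minimal sketch in C that checks the four-byte ELF magic number and reads the class byte that follows it. The helper names `is_elf` and `elf_class_bits` are made up for this illustration; a real loader would go on to parse the rest of the header and the program headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if buf starts with the ELF magic
 * number (0x7F 'E' 'L' 'F'), 0 otherwise. */
int is_elf(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < 4)
        return 0;
    return buf[0] == 0x7F && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F';
}

/* Hypothetical helper: byte 4 of an ELF file (EI_CLASS) is 1 for
 * 32-bit objects and 2 for 64-bit objects.
 * Returns 32 or 64, or 0 if the buffer is not ELF or the class is unknown. */
int elf_class_bits(const uint8_t *buf, size_t len)
{
    if (len < 5 || !is_elf(buf, len))
        return 0;
    if (buf[4] == 1)
        return 32;
    if (buf[4] == 2)
        return 64;
    return 0;
}
```

A bootloader that loads an ELF kernel would typically run a check like this on the first bytes it reads from disk before trusting the rest of the header.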

And a small personal note that I hope will not discourage you too much: if you are not very familiar with how compilers, assemblers, linkers and all these tools work, your project is going to be very complex and confusing. You might want to start with some smaller projects to get your "sea legs", so to speak.

johnfound · Answer 2 · 2013-02-02T22:02:23+0000

First, "machine code" and "binary" are synonymous. "Object code" is an intermediate form that the linker turns into the final binary. Some C/C++ compilers do not generate a binary directly; they emit assembly source code, which they pass to the assembler, which generates object code, which the linker then turns into the final binary. In most cases these steps are transparent to the user: you feed the compiler C/C++/Pascal/whatever source code and get a binary as output.

The FASM assembler, also known as flat assembler, is the best assembler for OS development. Several operating systems have already been written in FASM.

This is because FASM is self-compiling and very portable. You can port it to your OS within 2-3 days, and your OS then becomes self-hosting, i.e. you will be able to assemble programs from within your own operating system.

Another good thing about FASM is that it doesn't need a linker: it can generate binaries directly in multiple formats.

A large, active community is also very important. There is a lot of source code available for FASM, including OS development code.

The message board is very active and is a good place to learn a lot.

user257111 · Answer 3 · 2013-02-02T22:45:49+0000

I think the first part of your question has been answered, so I'll take the other two:

Which assembler can be used and how can I use it to assemble ASM into machine code?

One of nasm, yasm (mostly very similar to nasm), fasm, or "masm", i.e. ml64.exe and ml.exe, which are freely available as part of the Microsoft tools.

Of these, I would probably recommend either nasm or yasm. This recommendation is based entirely on personal preference, mostly because of the wide range of platforms they support and their use of Intel syntax by default. I would try a few and see which you like.

(Optional) How do you recommend placing the machine code at the correct addresses (e.g. the bootloader's machine code must be placed in the boot sector)?

Well, there is only one way to place the bootloader at the correct address for the MBR: open the disk at LBA 0 and write exactly 512 bytes there, ending with the signature bytes 0x55, 0xAA. Flush, then close. The MBR usually also has a partition table embedded in it, so it is both code and data. This is the von Neumann architecture at work, which can be summarized briefly as "programs and data are stored in the same place". When booting from disk, the BIOS reads the first 512 bytes into memory, verifies the signature, and, if it matches, executes that memory (starting at byte 0).
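The layout described above can be sketched in C as follows. This is only an illustration under stated assumptions: `build_boot_sector` and `has_boot_signature` are hypothetical helper names, the sector is built in a memory buffer rather than written to a real disk, and no partition table is filled in (in a classic MBR the partition table occupies bytes 446 to 509, so code that shares the sector with one has to stay below offset 446).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical helper: copy boot code into a zero-filled 512-byte
 * sector and append the boot signature in the last two bytes.
 * Returns 0 on success, -1 if the code does not leave room for
 * the signature. */
int build_boot_sector(uint8_t sector[SECTOR_SIZE],
                      const uint8_t *code, size_t code_len)
{
    if (code_len > SECTOR_SIZE - 2)
        return -1;

    memset(sector, 0, SECTOR_SIZE);
    if (code != NULL && code_len > 0)
        memcpy(sector, code, code_len);

    /* The BIOS checks for 0x55 at offset 510 and 0xAA at offset 511. */
    sector[510] = 0x55;
    sector[511] = 0xAA;
    return 0;
}

/* Hypothetical helper: the same check the BIOS performs before
 * jumping to the loaded sector. */
int has_boot_signature(const uint8_t sector[SECTOR_SIZE])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

For a first experiment, the code bytes 0xEB, 0xFE (a jump to itself) give a boot sector that simply hangs. Writing the resulting 512 bytes to the start of a disk image file and booting that image in an emulator is much safer than writing to a physical disk.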

Good, that's those questions out of the way. Now for some more notes:

512 bytes is not much room for a bootloader. For that reason, some file systems contain boot sectors, and the MBR loader simply loads the code/data found in them. This allows much larger amounts of code to be loaded, enough to get a kernel going. For example, GRUB legacy consists of stage1, stage1_5 and stage2 components.
While most operating systems require an executable format as a container, you do not need one. On disk and in memory, executable code is made up of one-, two- or three-byte values called opcodes, together with their operands. You can read an opcode reference or the Intel/AMD manuals to see which hex value translates to what. In any case, you can do a direct assembly-to-binary conversion with nasm like this:
```
nasm -f bin input.asm -o output.bin
```
This will work quite happily for 16-, 32- or 64-bit assembly, although the result will most likely not run as is. The one case where it will is if you explicitly use the directive [bits 16] in your code along with org 100h; then you have an MS-DOS .com program. This is about the simplest binary format in existence: you only have code and data in one big blob, and it must not exceed the size of one segment.

I feel this also lets me address this point:

I found answers ranging from machine code, binary, plain binary, raw binary, and assembly to relocatable object code.

Assembly code assembles into opcodes and memory addresses, depending on the assembler. These are represented as bytes, which are themselves just data. You can read them raw with a hex editor, although there are few cases where that is strictly necessary. I mention memory addresses because some opcodes control how memory addresses are interpreted; for example, relocatable object code requires that addresses not be hardcoded (instead they are interpreted as offsets from the current location).

The assembly, as I understand it, is usually assembled into an executable file in PE format.

It is fair to say that the assembly your C/C++ is translated into gets assembled into opcodes, which, together with anything else that must be included in the program (data, resources), are stored in an executable format, e.g. PE. Which format usually depends on your OS.
If you read the OSDev Wiki thoroughly, you will realize that segmented addressing is a complete pain. The standard and only use of segments in modern operating systems is to define four segments spanning the entire address space: two data segments for rings 0 and 3, and two code segments for rings 0 and 3.
If you haven't read the OSDev Wiki, you should. I also recommend James' Kernel Tutorials for practical advice on building a kernel in C.
If you just want to do low-level mischief under DOS, you can actually still do that without having to write a complete kernel yourself. You should also be able to switch the CPU to protected mode from DOS. You need FreeDOS and an assembler of your choice. There is a great tutorial on terminate-and-stay-resident programs, which basically means hooking an interrupt routine and then editing yourself out of the list of active processes, in The Rootkit Arsenal. There may be tutorials on the internet for this as well.

I might be tempted to recommend doing this first, just to get used to this kind of low-level material.
If you just want to poke around inside an OS, you can set up kernel debugging in Windows. WinDbg is a little ... arcane, but once you get used to it, it makes sense.
You mentioned that your laptop is using Secure Boot. If so, your laptop is using UEFI. If you want to read about it, the UEFI spec is 100% guaranteed to be more boring than your math homework, but I recommend skimming it just to understand the purpose and the basic environment. More important is the EFI SDK, which allows you to build EFI-compliant applications (which are in PE format and live on a FAT32 partition on your disk, so installing an EFI bootloader is very easy, even if writing one is not). If I had to make an honest recommendation, I would stick with MBR for now, since emulating an OS booted via MBR is much easier than via EFI at the moment, and you really do want to be doing this in some form of VM for now. Also, I would use an existing bootloader such as GRUB, since bootloaders are not really all that fun...
Others have said this and I will say it too: you absolutely want to do something like this in an emulator or virtual machine. You will make mistakes, guaranteed, and you will run into things you don't understand. Emulators and VM software are free these days, and some, like Bochs, will tell you what is causing a given error, trap, etc. That is very useful!

Keith Nicholas · Answer 4 · 2013-02-02T21:09:11+0000

First, use something like VirtualBox for testing.

I think you might want to take a few smaller steps: get some C code working, then see how boot sectors work on disks (this is well documented on the internet), and also look at the code of other open-source bootloaders.

Then see how to do task switching. It's not that hard to write. You can even write most of it while running under your regular OS, before trying to build it into your own OS.

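With C compilers you can usually mix in inline assembly, typically with something like asm { /* assembly code */ }. The exact syntax depends on the compiler; GCC and clang use asm("...") instead.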
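As a minimal sketch of what inline assembly looks like in practice, assuming GCC or clang and their extended asm syntax (not the brace form shown above), the function below adds two integers with a single x86 instruction. The name `add_u32` is made up for this example, and on other compilers or non-x86 targets it falls back to plain C so the code still builds.

```c
#include <stdint.h>

/* Hypothetical example: add two 32-bit integers using GCC/clang
 * extended inline assembly on x86, with a plain C fallback elsewhere. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax: "addl %1, %0" computes a = a + b.
     * "+r"(a): a is read and written, kept in a register.
     * "r"(b):  b is an input, kept in a register.
     * "cc":    the instruction modifies the CPU flags. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

In a kernel, the same mechanism is what lets C code issue instructions that have no C equivalent, such as enabling or disabling interrupts or loading descriptor tables.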
