Basic OS X installation and Mach-O format
I am interested in programming in assembly x86-64
on the Mac OS X platform. I came across this page on creating a 248B Mach-O program which led me to the Mach-O Format Reference for Apple . After that, I thought I would make the same simple C program in Xcode and run the generated build.
This was the code:
int main(int argc, const char * argv[])
{
return 42;
}
But the generated assembly was 334 rows containing (based on model 248B) a lot of redundant content.
First, why is there so much DWARF debug information included in the Release build of the C executable? Second, I notice that the Mach-O header data is included 4 times (in different DWARF-linked sections
). Why is this necessary? Finally, the Xcode build includes:
.private_extern _main
.globl _main
_main:
.cfi_startproc
But in program 248B all this is nowhere to be seen - instead, the program starts with _start
. How is this possible if all programs by definition start at main
?
Full Xcode build:
# Assembly output for main.c
# Generated at 4:04:08 PM on Sunday, January 20, 2013
# Using Release configuration, x86_64 architecture for Tiny target of Tiny project
.section __TEXT,__text,regular,pure_instructions
.file 1 "/Users/####/Desktop/Tiny/Tiny/main.c"
.section __DWARF,__debug_info,regular,debug
Lsection_info:
.section __DWARF,__debug_abbrev,regular,debug
Lsection_abbrev:
.section __DWARF,__debug_aranges,regular,debug
.section __DWARF,__debug_macinfo,regular,debug
.section __DWARF,__debug_line,regular,debug
Lsection_line:
.section __DWARF,__debug_loc,regular,debug
.section __DWARF,__debug_pubtypes,regular,debug
.section __DWARF,__debug_str,regular,debug
Lsection_str:
.section __DWARF,__debug_ranges,regular,debug
Ldebug_range:
.section __DWARF,__debug_loc,regular,debug
Lsection_debug_loc:
.section __TEXT,__text,regular,pure_instructions
Ltext_begin:
.section __DATA,__data
.section __TEXT,__text,regular,pure_instructions
.private_extern _main
.globl _main
_main: ## @main
.cfi_startproc
Lfunc_begin0:
.loc 1 12 0 ## /Users/####/Desktop/Tiny/Tiny/main.c:12:0
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
##DEBUG_VALUE: main:argc <- EDI+0
##DEBUG_VALUE: main:argv <- RSI+0
movl $42, %eax
.loc 1 15 5 prologue_end ## /Users/####/Desktop/Tiny/Tiny/main.c:15:5
Ltmp5:
popq %rbp
ret
Ltmp6:
Lfunc_end0:
.cfi_endproc
Ltext_end:
.section __DATA,__data
Ldata_end:
.section __TEXT,__text,regular,pure_instructions
Lsection_end1:
.section __DWARF,__debug_info,regular,debug
Linfo_begin1:
.long 127 ## Length of Compilation Unit Info
.short 2 ## DWARF version number
Lset0 = Labbrev_begin-Lsection_abbrev ## Offset Into Abbrev. Section
.long Lset0
.byte 8 ## Address Size (in bytes)
.byte 1 ## Abbrev [1] 0xb:0x78 DW_TAG_compile_unit
Lset1 = Lstring0-Lsection_str ## DW_AT_producer
.long Lset1
.short 12 ## DW_AT_language
Lset2 = Lstring1-Lsection_str ## DW_AT_name
.long Lset2
.quad 0 ## DW_AT_entry_pc
.long 0 ## DW_AT_stmt_list
Lset3 = Lstring2-Lsection_str ## DW_AT_comp_dir
.long Lset3
.byte 1 ## DW_AT_APPLE_optimized
.byte 2 ## Abbrev [2] 0x27:0x3e DW_TAG_subprogram
Lset4 = Lstring3-Lsection_str ## DW_AT_name
.long Lset4
.byte 1 ## DW_AT_decl_file
.byte 11 ## DW_AT_decl_line
.byte 1 ## DW_AT_prototyped
.long 101 ## DW_AT_type
.byte 1 ## DW_AT_external
.quad Lfunc_begin0 ## DW_AT_low_pc
.quad Lfunc_end0 ## DW_AT_high_pc
.byte 1 ## DW_AT_frame_base
.byte 86
.byte 3 ## Abbrev [3] 0x46:0xf DW_TAG_formal_parameter
Lset5 = Lstring5-Lsection_str ## DW_AT_name
.long Lset5
.byte 1 ## DW_AT_decl_file
.byte 11 ## DW_AT_decl_line
.long 101 ## DW_AT_type
Lset6 = Ldebug_loc0-Lsection_debug_loc ## DW_AT_location
.long Lset6
.byte 3 ## Abbrev [3] 0x55:0xf DW_TAG_formal_parameter
Lset7 = Lstring6-Lsection_str ## DW_AT_name
.long Lset7
.byte 1 ## DW_AT_decl_file
.byte 11 ## DW_AT_decl_line
.long 125 ## DW_AT_type
Lset8 = Ldebug_loc2-Lsection_debug_loc ## DW_AT_location
.long Lset8
.byte 0 ## End Of Children Mark
.byte 4 ## Abbrev [4] 0x65:0x7 DW_TAG_base_type
Lset9 = Lstring4-Lsection_str ## DW_AT_name
.long Lset9
.byte 5 ## DW_AT_encoding
.byte 4 ## DW_AT_byte_size
.byte 4 ## Abbrev [4] 0x6c:0x7 DW_TAG_base_type
Lset10 = Lstring7-Lsection_str ## DW_AT_name
.long Lset10
.byte 6 ## DW_AT_encoding
.byte 1 ## DW_AT_byte_size
.byte 5 ## Abbrev [5] 0x73:0x5 DW_TAG_const_type
.long 108 ## DW_AT_type
.byte 6 ## Abbrev [6] 0x78:0x5 DW_TAG_pointer_type
.long 115 ## DW_AT_type
.byte 6 ## Abbrev [6] 0x7d:0x5 DW_TAG_pointer_type
.long 120 ## DW_AT_type
.byte 0 ## End Of Children Mark
Linfo_end1:
.section __DWARF,__debug_abbrev,regular,debug
Labbrev_begin:
.byte 1 ## Abbreviation Code
.byte 17 ## DW_TAG_compile_unit
.byte 1 ## DW_CHILDREN_yes
.byte 37 ## DW_AT_producer
.byte 14 ## DW_FORM_strp
.byte 19 ## DW_AT_language
.byte 5 ## DW_FORM_data2
.byte 3 ## DW_AT_name
.byte 14 ## DW_FORM_strp
.byte 82 ## DW_AT_entry_pc
.byte 1 ## DW_FORM_addr
.byte 16 ## DW_AT_stmt_list
.byte 6 ## DW_FORM_data4
.byte 27 ## DW_AT_comp_dir
.byte 14 ## DW_FORM_strp
.ascii "\341\177" ## DW_AT_APPLE_optimized
.byte 12 ## DW_FORM_flag
.byte 0 ## EOM(1)
.byte 0 ## EOM(2)
.byte 2 ## Abbreviation Code
.byte 46 ## DW_TAG_subprogram
.byte 1 ## DW_CHILDREN_yes
.byte 3 ## DW_AT_name
.byte 14 ## DW_FORM_strp
.byte 58 ## DW_AT_decl_file
.byte 11 ## DW_FORM_data1
.byte 59 ## DW_AT_decl_line
.byte 11 ## DW_FORM_data1
.byte 39 ## DW_AT_prototyped
.byte 12 ## DW_FORM_flag
.byte 73 ## DW_AT_type
.byte 19 ## DW_FORM_ref4
.byte 63 ## DW_AT_external
.byte 12 ## DW_FORM_flag
.byte 17 ## DW_AT_low_pc
.byte 1 ## DW_FORM_addr
.byte 18 ## DW_AT_high_pc
.byte 1 ## DW_FORM_addr
.byte 64 ## DW_AT_frame_base
.byte 10 ## DW_FORM_block1
.byte 0 ## EOM(1)
.byte 0 ## EOM(2)
.byte 3 ## Abbreviation Code
.byte 5 ## DW_TAG_formal_parameter
.byte 0 ## DW_CHILDREN_no
.byte 3 ## DW_AT_name
.byte 14 ## DW_FORM_strp
.byte 58 ## DW_AT_decl_file
.byte 11 ## DW_FORM_data1
.byte 59 ## DW_AT_decl_line
.byte 11 ## DW_FORM_data1
.byte 73 ## DW_AT_type
.byte 19 ## DW_FORM_ref4
.byte 2 ## DW_AT_location
.byte 6 ## DW_FORM_data4
.byte 0 ## EOM(1)
.byte 0 ## EOM(2)
.byte 4 ## Abbreviation Code
.byte 36 ## DW_TAG_base_type
.byte 0 ## DW_CHILDREN_no
.byte 3 ## DW_AT_name
.byte 14 ## DW_FORM_strp
.byte 62 ## DW_AT_encoding
.byte 11 ## DW_FORM_data1
.byte 11 ## DW_AT_byte_size
.byte 11 ## DW_FORM_data1
.byte 0 ## EOM(1)
.byte 0 ## EOM(2)
.byte 5 ## Abbreviation Code
.byte 38 ## DW_TAG_const_type
.byte 0 ## DW_CHILDREN_no
.byte 73 ## DW_AT_type
.byte 19 ## DW_FORM_ref4
.byte 0 ## EOM(1)
.byte 0 ## EOM(2)
.byte 6 ## Abbreviation Code
.byte 15 ## DW_TAG_pointer_type
.byte 0 ## DW_CHILDREN_no
.byte 73 ## DW_AT_type
.byte 19 ## DW_FORM_ref4
.byte 0 ## EOM(1)
.byte 0 ## EOM(2)
.byte 0 ## EOM(3)
Labbrev_end:
.section __DWARF,__apple_names,regular,debug
Lnames_begin:
.long 1212240712 ## Header Magic
.short 1 ## Header Version
.short 0 ## Header Hash Function
.long 1 ## Header Bucket Count
.long 1 ## Header Hash Count
.long 12 ## Header Data Length
.long 0 ## HeaderData Die Offset Base
.long 1 ## HeaderData Atom Count
.short 1 ## eAtomTypeDIEOffset
.short 6 ## DW_FORM_data4
.long 0 ## Bucket 0
.long 2090499946 ## Hash in Bucket 0
.long LNames0-Lnames_begin ## Offset in Bucket 0
LNames0:
Lset11 = Lstring3-Lsection_str ## main
.long Lset11
.long 1 ## Num DIEs
.long 39
.long 0
.section __DWARF,__apple_objc,regular,debug
Lobjc_begin:
.long 1212240712 ## Header Magic
.short 1 ## Header Version
.short 0 ## Header Hash Function
.long 1 ## Header Bucket Count
.long 0 ## Header Hash Count
.long 12 ## Header Data Length
.long 0 ## HeaderData Die Offset Base
.long 1 ## HeaderData Atom Count
.short 1 ## eAtomTypeDIEOffset
.short 6 ## DW_FORM_data4
.long -1 ## Bucket 0
.section __DWARF,__apple_namespac,regular,debug
Lnamespac_begin:
.long 1212240712 ## Header Magic
.short 1 ## Header Version
.short 0 ## Header Hash Function
.long 1 ## Header Bucket Count
.long 0 ## Header Hash Count
.long 12 ## Header Data Length
.long 0 ## HeaderData Die Offset Base
.long 1 ## HeaderData Atom Count
.short 1 ## eAtomTypeDIEOffset
.short 6 ## DW_FORM_data4
.long -1 ## Bucket 0
.section __DWARF,__apple_types,regular,debug
Ltypes_begin:
.long 1212240712 ## Header Magic
.short 1 ## Header Version
.short 0 ## Header Hash Function
.long 2 ## Header Bucket Count
.long 2 ## Header Hash Count
.long 20 ## Header Data Length
.long 0 ## HeaderData Die Offset Base
.long 3 ## HeaderData Atom Count
.short 1 ## eAtomTypeDIEOffset
.short 6 ## DW_FORM_data4
.short 3 ## eAtomTypeTag
.short 5 ## DW_FORM_data2
.short 5 ## eAtomTypeTypeFlags
.short 11 ## DW_FORM_data1
.long 0 ## Bucket 0
.long 1 ## Bucket 1
.long 193495088 ## Hash in Bucket 0
.long 2090147939 ## Hash in Bucket 1
.long Ltypes0-Ltypes_begin ## Offset in Bucket 0
.long Ltypes1-Ltypes_begin ## Offset in Bucket 1
Ltypes0:
Lset12 = Lstring4-Lsection_str ## int
.long Lset12
.long 1 ## Num DIEs
.long 101
.short 36
.byte 0
.long 0
Ltypes1:
Lset13 = Lstring7-Lsection_str ## char
.long Lset13
.long 1 ## Num DIEs
.long 108
.short 36
.byte 0
.long 0
.section __DWARF,__debug_pubtypes,regular,debug
Lset14 = Lpubtypes_end1-Lpubtypes_begin1 ## Length of Public Types Info
.long Lset14
Lpubtypes_begin1:
.short 2 ## DWARF Version
Lset15 = Linfo_begin1-Lsection_info ## Offset of Compilation Unit Info
.long Lset15
Lset16 = Linfo_end1-Linfo_begin1 ## Compilation Unit Length
.long Lset16
.long 0 ## End Mark
Lpubtypes_end1:
.section __DWARF,__debug_loc,regular,debug
Ldebug_loc0:
.quad Lfunc_begin0
.quad Ltmp6
Lset17 = Ltmp8-Ltmp7 ## Loc expr size
.short Lset17
Ltmp7:
.byte 85 ## DW_OP_reg5
Ltmp8:
.quad 0
.quad 0
Ldebug_loc2:
.quad Lfunc_begin0
.quad Ltmp6
Lset18 = Ltmp10-Ltmp9 ## Loc expr size
.short Lset18
Ltmp9:
.byte 84 ## DW_OP_reg4
Ltmp10:
.quad 0
.quad 0
Ldebug_loc4:
.section __DWARF,__debug_aranges,regular,debug
.section __DWARF,__debug_ranges,regular,debug
.section __DWARF,__debug_macinfo,regular,debug
.section __DWARF,__debug_inlined,regular,debug
Lset19 = Ldebug_inlined_end1-Ldebug_inlined_begin1 ## Length of Debug Inlined Information Entry
.long Lset19
Ldebug_inlined_begin1:
.short 2 ## Dwarf Version
.byte 8 ## Address Size (in bytes)
Ldebug_inlined_end1:
.section __DWARF,__debug_str,regular,debug
Lstring0:
.asciz "Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)"
Lstring1:
.asciz "/Users/####/Desktop/Tiny/Tiny/main.c"
Lstring2:
.asciz "/Users/####/Desktop/Tiny"
Lstring3:
.asciz "main"
Lstring4:
.asciz "int"
Lstring5:
.asciz "argc"
Lstring6:
.asciz "argv"
Lstring7:
.asciz "char"
.subsections_via_symbols
First, why is there so much DWARF debugging information included in the release build of the C executable?
The ability to debug optimized code is incredibly useful. It is not uncommon for errors to appear only in optimized assemblies. If you are going to write an assembly, it is unlikely that you will care about the DWARF information, so I would suggest creating a comparison code with no argument -g
.
Second, I notice that Mach-O header data is included 4 times (in different DWARF related sections). Why is this necessary?
These are not the Mach-O headers you see. They are table headers for DWARF Accelerator , an LLVM extension for DWARF that optimizes checking whether a symbol is installed within a given compilation block.
But in program 248B all this is nowhere to be seen — instead, the program starts with _start. How is this possible if all programs, by definition, start mostly?
Historically, all programs in OS X start with start
. However, this symbol usually comes from the system library and is not defined by the program itself. The system implementation start
will do some initialization and then jump to your "real" program entry point.
The entry points into Mach-O binaries are determined by the load commands LC_UNIXTHREAD
or LC_MAIN
. When the LC_UNIXTHREAD
convention for versions prior to OS X 10.8 is used with a regular C or C ++ program, the linker is used start
as the entry point. This character usually comes from /usr/lib/crt1.o
, and its address is written in the instruction pointer field of the boot command LC_UNIXTHREAD
. The 248B binary you referenced includes the LC_UNIXTHREAD
c command eip
set to 0x000010e8. This is the address of the symbol _start
. Since this little program is a static executable and the binary is generated directly, it can write whatever address it wants in the command pointer field of the load command.
If you create your OS X 10.8+ executable targeting, the linker will generate the download command LC_MAIN
instead LC_UNIXTHREAD
. The kernel knows that binaries using the command LC_MAIN
should be executed by loading the dynamic linker and navigating to its entry point. The dynamic linker, dyld, initializes itself and then navigates to the address specified in the command LC_MAIN
. In this bold new world, the symbol start
is not used at all.