
AT&T-X86: intro

From Wikiversity

What's AT&T assembly

AT&T syntax assembly is one of the two major dialects of x86 assembly, alongside Intel syntax. As the name suggests, AT&T syntax was developed by AT&T.

Why AT&T and not Intel syntax

The main difference between the two is assembler support: not every assembler accepts both syntaxes. An assembler is a program that turns assembly code into the binary machine code your computer can run. The most common assemblers for the x86 architecture are[1]:

For all upcoming examples I am going to use the AT&T syntax of the GNU Assembler (GAS), though the concepts should carry over to most other x86 assemblers and syntaxes.

What AT&T looks like

When learning most programming languages, you start with the classic "Hello, World!" program. In assembly, that program looks something like this:

.section	.data
    foo:   .string	"Hello, World!\n"

.section    .text
	.globl	_start
_start:
	leaq	foo(%rip), %rax
	movq	%rax, %rdi
	call	puts@PLT
	movl	$0, %eax
	ret

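(Note: this program calls puts from the C library, so it has to be linked against the C library, for example by building it with gcc. In that case, replace '_start' with 'main'.)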
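The program above only becomes runnable once it is linked against the C library. Below is a minimal sketch of the same instructions that you can try from C, assuming x86-64 Linux with gcc or clang: the assembly sits in a file-scope `__asm__` statement and is wrapped in a function. The names `asm_hello` and `hello_msg` are hypothetical, and the `subq`/`addq` pair is an addition that keeps the stack 16-byte aligned at the `call`, as the x86-64 System V calling convention expects before calling a C function such as puts.

```c
/* Sketch for x86-64 Linux with gcc or clang (GNU assembler, AT&T syntax).
 * asm_hello and hello_msg are hypothetical names chosen for this example. */
__asm__(
    ".pushsection .data\n"
    "hello_msg: .string \"Hello, World!\"\n"  /* puts appends the newline itself */
    ".popsection\n"
    ".pushsection .text\n"
    ".globl asm_hello\n"
    ".type asm_hello, @function\n"
    "asm_hello:\n"
    "    subq  $8, %rsp\n"                 /* realign the stack to 16 bytes for the call */
    "    leaq  hello_msg(%rip), %rax\n"    /* rax = address of the string */
    "    movq  %rax, %rdi\n"               /* first argument goes in rdi */
    "    call  puts@PLT\n"                 /* puts(hello_msg) */
    "    movl  $0, %eax\n"                 /* return value 0 */
    "    addq  $8, %rsp\n"                 /* undo the stack adjustment */
    "    ret\n"
    ".popsection\n");

/* Tell the C compiler about the assembly function so C code can call it. */
int asm_hello(void);
```

A C file containing this block plus `int main(void) { return asm_hello(); }` should print the greeting and exit with status 0. The `.pushsection`/`.popsection` pairs restore whatever section the compiler was using, so the surrounding C code is unaffected.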

Breakdown:

.section .data

This first line tells the assembler to put the code that follows in the data section. The data section is covered in more detail in the .data lesson.

    foo:    .string "Hello, World!\n"

This line declares a string with the value "Hello, World!\n" under the label 'foo'. The .string directive also stores a terminating zero byte after the characters, which is how C functions such as puts find the end of the string. We can later use this label to tell the computer where the string is stored. As with the .data section itself, storing variables in the data section is explained further on the .data page.
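To see the terminating zero byte that .string adds, here is a small sketch, assuming x86-64 Linux with gcc or clang (`demo_str` and `demo_str_end` are hypothetical names). A second label placed right after the string marks where the emitted bytes end, so the difference between the two addresses is the number of bytes the directive produced.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch (x86-64 Linux, GNU as): a string defined with .string, followed by a
 * label that marks the address just past its last byte. */
__asm__(
    ".pushsection .data\n"
    ".globl demo_str\n"
    "demo_str: .string \"Hi\"\n"    /* emits 'H', 'i', then a 0 byte */
    ".globl demo_str_end\n"
    "demo_str_end:\n"               /* address right after the emitted bytes */
    ".popsection\n");

extern char demo_str[];      /* the string as C sees it */
extern char demo_str_end[];  /* one past the last byte .string emitted */

/* Number of bytes .string emitted for demo_str (3 for "Hi"). */
size_t demo_str_size(void) {
    return (size_t)((uintptr_t)demo_str_end - (uintptr_t)demo_str);
}
```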

    
.section    .text

Line 3 is empty. Blank lines and extra spaces or tabs are ignored by the assembler, much like in C or C++, although in assembly a line break does end a statement. Line 4 again says that the following code should be put in a certain section, but this time it is the .text section. Generally, the .data section is used for declaring and defining data (such as variables) and the .text section is used for code; this is explained further in the .text and Sections lessons.
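A sketch of the split between the two sections, assuming x86-64 Linux with gcc or clang: a variable lives in .data, where it can be modified, and the instructions that modify it live in .text. `counter` and `bump_counter` are hypothetical names.

```c
/* Sketch (x86-64 Linux, GNU as): data in .data, code in .text. */
__asm__(
    ".pushsection .data\n"
    "counter: .long 0\n"                 /* 4-byte variable, starts at 0 */
    ".popsection\n"
    ".pushsection .text\n"
    ".globl bump_counter\n"
    ".type bump_counter, @function\n"
    "bump_counter:\n"
    "    incl  counter(%rip)\n"          /* add 1 to the value stored in .data */
    "    movl  counter(%rip), %eax\n"    /* return the new value in eax */
    "    ret\n"
    ".popsection\n");

int bump_counter(void);
```

Each call changes the value stored in .data, so successive calls return 1, 2, 3 and so on.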

    .globl _start

This line declares '_start' as a global symbol, which means the _start label is visible outside of your file, to the linker. That is needed here so the linker can record _start as the point where the operating system should begin executing the program. Note that if you use gcc to assemble and/or link your code, you have to use main instead of _start: the C runtime startup code that gcc links in already defines _start, which sets things up and then calls main. Apart from that, main can usually be used the same way as _start.
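.globl decides whether the linker can see a label from other object files. In this sketch (x86-64 Linux, gcc or clang; all names hypothetical), `asm_answer` is exported with .globl while `answer_helper` is not, so code in another file could call asm_answer but would get an undefined-reference error from the linker for answer_helper.

```c
/* Sketch (x86-64 Linux, GNU as): one exported label, one local label. */
__asm__(
    ".pushsection .text\n"
    "answer_helper:\n"              /* no .globl: not visible to other object files */
    "    movl  $21, %eax\n"
    "    ret\n"
    ".globl asm_answer\n"           /* exported: the linker can resolve it from anywhere */
    ".type asm_answer, @function\n"
    "asm_answer:\n"
    "    call  answer_helper\n"     /* eax = 21 (the helper calls nothing, so no realignment) */
    "    addl  %eax, %eax\n"        /* eax = 42 */
    "    ret\n"
    ".popsection\n");

int asm_answer(void);
```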

_start:

This line defines the _start (or main) label itself: it marks the memory address of the instruction that follows. The code that comes after it is considered part of the label (_start, in this case). The Labels & Basic Keywords/Operations lesson explains the rest of the functionality of labels.
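Because a label is only a name for an address, two labels can mark the same spot. Here is a small sketch, assuming x86-64 Linux with gcc or clang (`label_a` and `label_b` are hypothetical names), in which both labels name the same movl instruction, so calling either one runs exactly the same code.

```c
/* Sketch (x86-64 Linux, GNU as): two labels for one address. */
__asm__(
    ".pushsection .text\n"
    ".globl label_a\n"
    ".globl label_b\n"
    "label_a:\n"              /* both labels mark the address of the movl below */
    "label_b:\n"
    "    movl  $7, %eax\n"    /* return value 7 */
    "    ret\n"
    ".popsection\n");

int label_a(void);
int label_b(void);
```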

	leaq	foo(%rip), %rax
	movq	%rax, %rdi
	call	puts@PLT

This is the part where we actually print the string. Line 7 uses leaq ('load effective address') to compute the address of the string relative to rip and store that address in the rax register; the string itself stays in memory. Registers will be explained in the Registers & exit codes lesson, but in short, a register is a small storage location (usually 32 or 64 bits in size, depending on your CPU) built directly into the CPU, so it can be accessed very quickly and efficiently. Some registers also have a special role: rip (or eip on 32-bit systems), for example, always holds the memory address of the next instruction to be executed, which is why foo(%rip) can locate foo relative to the code. For now, all you need to know is that registers are an efficient, temporary way to store data.

In line 8 the contents of rax are copied to rdi, so both rax and rdi now hold the address of our string. Registers are also used to pass arguments (parameters) to functions, and rdi holds the first one.

The last line, line 9, calls the function that prints our string. puts is a C library function that is simpler than printf: it writes a string followed by a newline, so the \n stored in foo results in an extra empty line. @PLT refers to the 'Procedure Linkage Table', which lets the program call functions from shared libraries, such as the C library, whose addresses are only known at run time. In assembly you cannot write something like 'foo(Arg1, Arg2)' to call a function, because the call instruction takes only one operand: the address to jump to. You can of course still pass arguments, in one of two ways:

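  1. If you have six or fewer integer or pointer arguments, you can pass them in rdi, rsi, rdx, rcx, r8 and r9 for arguments 1, 2, 3, 4, 5 and 6 respectively. (These are the 64-bit registers of the System V calling convention used on Linux; r10 takes the place of rcx only for system calls. On a 32-bit system, function arguments are normally passed on the stack, while Linux system calls use ebx, ecx, edx, esi, edi and ebp.)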
  2. If you have more than six arguments, the ones after the sixth are passed on the stack (for the stack, see the lesson Using the Stack).
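The register behaviour described above can be checked from C with a sketch like the following, assuming x86-64 Linux and the System V calling convention (all names are hypothetical). `greeting_address` uses leaq, which computes the address of the string, while `greeting_bytes` uses movq with the same operand, which instead loads the first 8 bytes stored at that address. `asm_digits` combines its six arguments into one decimal number, which makes the argument order rdi, rsi, rdx, rcx, r8, r9 visible (r10 takes the place of rcx only for Linux system calls).

```c
#include <stdint.h>

/* Sketch for x86-64 Linux (System V calling convention). All names are
 * hypothetical. */
__asm__(
    ".pushsection .data\n"
    ".globl greeting\n"
    "greeting: .string \"Hello, World!\"\n"
    ".popsection\n"
    ".pushsection .text\n"
    /* leaq computes the address of greeting; the string stays in memory. */
    ".globl greeting_address\n"
    "greeting_address:\n"
    "    leaq  greeting(%rip), %rax\n"
    "    ret\n"
    /* movq with the same operand loads the first 8 bytes stored there. */
    ".globl greeting_bytes\n"
    "greeting_bytes:\n"
    "    movq  greeting(%rip), %rax\n"
    "    ret\n"
    /* Build a decimal number from the arguments, last argument first. */
    ".globl asm_digits\n"
    "asm_digits:\n"
    "    movq  %r9, %rax\n"     /* argument 6 */
    "    imulq $10, %rax\n"
    "    addq  %r8, %rax\n"     /* argument 5 */
    "    imulq $10, %rax\n"
    "    addq  %rcx, %rax\n"    /* argument 4 */
    "    imulq $10, %rax\n"
    "    addq  %rdx, %rax\n"    /* argument 3 */
    "    imulq $10, %rax\n"
    "    addq  %rsi, %rax\n"    /* argument 2 */
    "    imulq $10, %rax\n"
    "    addq  %rdi, %rax\n"    /* argument 1 */
    "    ret\n"
    ".popsection\n");

extern char greeting[];
char *greeting_address(void);
uint64_t greeting_bytes(void);
long asm_digits(long a1, long a2, long a3, long a4, long a5, long a6);
```

For example, `asm_digits(1, 2, 3, 4, 5, 6)` returns 654321, and because x86 is little-endian, the lowest byte of `greeting_bytes()` is the character 'H'.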
    movl    $0, %eax
    ret

Lastly, we need the assembly equivalent of the 'return 0;' line in C. In line 10 the value 0 is moved into the eax register (note that $ denotes an immediate value, so '$300' means the number 300, while a plain '300' usually means memory address 300). On line 11 the ret instruction then returns from main to the C runtime code that called it, which hands control back to the operating system. A bare _start has no caller to return to, which is why this example relies on gcc and main. The value in eax becomes the program's exit code, so any non-zero value is seen as an error, just like returning non-zero from main in C (only the lowest 8 bits are kept, so -1 shows up as 255).
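The difference between an immediate and a memory operand, and the use of eax for the return value, can be seen in this sketch, assuming x86-64 Linux with gcc or clang (names are hypothetical). The memory operand is written as `stored_value(%rip)`, the position-independent form that also works in position-independent executables (the default on many current Linux distributions), rather than as a bare absolute address.

```c
/* Sketch (x86-64 Linux, GNU as): "$" means immediate, no "$" means memory. */
__asm__(
    ".pushsection .data\n"
    "stored_value: .long 7\n"               /* a 4-byte integer in memory */
    ".popsection\n"
    ".pushsection .text\n"
    ".globl return_immediate\n"
    ".type return_immediate, @function\n"
    "return_immediate:\n"
    "    movl  $300, %eax\n"                /* eax = the number 300 itself */
    "    ret\n"
    ".globl return_from_memory\n"
    ".type return_from_memory, @function\n"
    "return_from_memory:\n"
    "    movl  stored_value(%rip), %eax\n"  /* eax = the 4 bytes stored at stored_value */
    "    ret\n"
    ".popsection\n");

int return_immediate(void);
int return_from_memory(void);
```

Used as a program's main, return_immediate would produce an exit status of 44, because only the lowest 8 bits of 300 are kept.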

Next: C's Relation to Assembly

Citations:

  1. "x86 Assembly/x86 Assemblers - Wikibooks, open books for an open world". en.wikibooks.org. Retrieved 2024-12-02.