Assembly “Hello world” execution file less than 20 bytes


The code below takes 20 bytes. Yet, there’s a way to make it even smaller through interrupts. How?

A
MOV AH,9
MOV DX,108
INT 21
RET
DB 'HELLO WORLD$'

R CX
14
N MYHELLO.COM
W 

There are 3 best solutions below

3
On

FYI, here is my attempt at writing a short hello-world DOS .exe program.

I'm not trying to overlap the code and the message (as @Peter Cordes does in their DOS .com solution); as we will see later, that wouldn't help reduce the size of the program anyway.

Surely a DOS hello-world .exe program will be longer than an equivalent DOS .com program, because the DOS .exe header is 28 bytes (and the DOS .com header is 0 bytes).

I was able to squeeze it to 40 bytes, even though the DOS .exe header alone is 28 bytes and the message is 16 bytes (44 bytes in total!), by making both the code and the message overlap the DOS .exe header. Here is my solution:

; nasm -O0 -f bin -o prog.exe prog.nasm
exe:  ; DOS .exe header: http://justsolve.archiveteam.org/wiki/MS-DOS_EXE
.signature      db 'MZ'
.lastsize       dw end-exe  ; Number of bytes in the last 0x200-byte block in the .exe file. For us, total file size.
.nblocks        dw 1  ; Number of 0x200-byte blocks in .exe file (rounded up).
.nreloc         dw 0  ; No relocations.
.hdrsize        dw 0  ; Load .exe file to memory from the beginning.
..@code:
%if 1  ; Produces identical .exe output even if we change it to 0.
.minalloc:      mov ax, 0x903  ; AH := 9, AL := junk. The number 3 matters for minalloc, see below.
.ss_minus_1:    mov dx, message+(0x100-exe)  ; (0x100-exe) to make it work with any `org'.
.sp:            int 0x21  ; Print the message to stdout. https://stanislavs.org/helppc/int_21-9.html
.checksum:      int 0x20  ; Exit. Requires CS == PSP. https://stanislavs.org/helppc/int_20.html
%else
.minalloc       dw 0x03b8  ; To have enough room for the stack, we need minalloc >= ss+((sp+0xf)>>4)-0x20 == 0x315. kvikdos verifies it.
.maxalloc       dw 0xba09  ; Actual value doesn't matter.
.ss             dw 0x0118  ; Actual value doesn't matter as long as it matches minalloc. dw message
.sp             dw 0x21cd  ; Actual value doesn't matter. int 0x21
.checksum       dw 0x20cd  ; Actual value doesn't matter. int 0x20
%endif
.ip             dw ..@code+(0x100-exe)  ; Entry point offset. (0x100-exe) to make it work with any `org'.
.cs             dw 0xfff0  ; CS := PSP upon entry.
%if 0
.relocpos       dw ?  ; Doesn't matter, overlaps with 2 bytes of message: 'He'.
.noverlay       dw ?  ; Doesn't matter. overlaps with 2 bytes of message: 'll'.
; End of 0x1c-byte .exe header.
%endif
message         db 'Hello, World!', 13, 10, '$'
end:

See more comments at https://github.com/pts/mininasm/blob/master/demo/hello/helloe.nasm

Here are the tricks involved:

  • Most .exe programs have mov ax, @data followed by mov ds, ax to initialize ds. (We need a correct ds to print a message with ah == 9 and int 0x21.) This is rather long, and it also uses a 4-byte relocation slot for @data. One way to avoid it is to make ss in the .exe header be 0, and then push ss followed by pop ds also works, and it doesn't use any relocation. Even better: don't initialize ds, make use of the fact that DOS initializes it to the PSP segment (which is 0x100 bytes long, and it's right in front of the program image in memory), so just add 0x100 to dx instead when printing.

  • Set the cs in the .exe header to 0xfff0, and this will initialize cs to the PSP segment. If the value of cs is the PSP segment, then int 0x20 can be used to exit to DOS with exit code 0, rather than the longer mov ax, 0x4c00 followed by int 0x21. This saves 3 bytes.

  • Unfortunately the xchg bp, ax trick to set ah to 9 doesn't work before MS-DOS 4.00, so my solution doesn't use it. The usual mov ah, 9 is 1 byte longer. My solution uses mov ax, 0x903 instead, which is longer by 1 more byte and seemingly wasteful (because the 3 in al is not used), but there is room in the overlapped DOS .exe header for that extra byte, and the small value of 3 reduces the memory usage of the program, because the mov ax, 0x903 instruction overlaps with minalloc in the .exe header.

  • The relocpos and noverlay .exe header fields at the end of the .exe header are ignored by DOS, so my solution overlaps them with the message. relocpos is only used if the .exe program contains relocations (mine doesn't), and noverlay is completely ignored by DOS when loading (and executing) the .exe file.

  • The one important remaining trick is embedding the code completely in the DOS .exe header. The code fits nicely in the 10 bytes of the consecutive .exe header fields minalloc, maxalloc, ss, sp and checksum, and the actual code bytes used don't cause problems when DOS interprets those header fields while loading the program. See the comments in the source code for more details. Some more info: we don't want a large minalloc, because that would increase the memory requirements of the program. maxalloc doesn't matter (but make it at least as large as minalloc for good manners). We need ss and sp large enough that the stack doesn't overlap the code and the message (e.g. the initial value should point to at least 32 bytes above the end), and minalloc must also be large enough for the stack to fit. The checksum doesn't matter: old and new DOS versions (tested with PC DOS 2.00 as the oldest) all ignore it.

  • It's not possible to fit the message in its entirety in the DOS .exe header, because the message itself is too long. It's also not possible to overlap it more, because then it would have to overlap the ip and cs header fields, and we need very specific values there. Thus we can't save more bytes if we embed the code entirely in the message rather than entirely in the .exe header. I've decided to do the latter for simplicity, and also to keep the message flexible.

  • To make the .exe file shorter, make the message shorter (down to 4 bytes, including the $). Less than 4 bytes may not work, because some DOS versions insist that an .exe program file is at least 28 bytes long, because that's the minimum size of the DOS .exe header.
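The overlap can be checked by hand-assembling the 40 bytes and reading the first 0x1c of them back as header words. This is only a sketch: it assumes the default origin of 0 (so the code sits at file offset 0x0a and the message at 0x18), the PSP segment value 0x1234 is made up for illustration, and the stack formula is the one quoted in the source comment above.

```python
import struct

# Hand-assembled bytes of the 40-byte file, assuming origin 0.
message = b"Hello, World!\r\n$"
code = bytes([
    0xB8, 0x03, 0x09,  # mov ax, 0x0903
    0xBA, 0x18, 0x01,  # mov dx, 0x0118 (message offset 0x18 + 0x100 for the PSP)
    0xCD, 0x21,        # int 0x21
    0xCD, 0x20,        # int 0x20
])
# signature, then lastsize, nblocks, nreloc, hdrsize
header_start = b"MZ" + struct.pack("<4H", 40, 1, 0, 0)
ip_cs = struct.pack("<2H", 0x010A, 0xFFF0)
exe = header_start + code + ip_cs + message

# Reinterpret the first 0x1c bytes as the 14 little-endian header words.
names = ["signature", "lastsize", "nblocks", "nreloc", "hdrsize",
         "minalloc", "maxalloc", "ss", "sp", "checksum",
         "ip", "cs", "relocpos", "noverlay"]
fields = dict(zip(names, struct.unpack_from("<14H", exe)))

# Stack-room requirement from the source comment, in 16-byte paragraphs.
needed = fields["ss"] + ((fields["sp"] + 0xF) >> 4) - 0x20

# With hdrsize == 0 the image is loaded right after the 0x100-byte PSP,
# so adding cs == 0xfff0 to the load segment wraps around to the PSP segment.
psp = 0x1234  # hypothetical PSP segment
initial_cs = (psp + 0x10 + fields["cs"]) & 0xFFFF

print(len(exe), hex(fields["minalloc"]), hex(needed), initial_cs == psp)
```

The code bytes read back as minalloc 0x03b8, maxalloc 0xba09, ss 0x0118, sp 0x21cd and checksum 0x20cd, which are the values in the %else branch of the listing.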

9
On

Print a shorter message, like db 'hi$' :P
Or as Vitsoft suggests, take the string as an arg like the Unix echo command, so it doesn't take up space in your program.

Or depend on values that some DOS versions leave in registers when your program starts, if you don't care about portability or about relying only on documented guarantees. (e.g. save one byte by using xchg ax, bp instead of mov ah, 9.)

I have no idea what you mean by "There’s a way to make it even smaller through interrupts." I don't think that's true, but you're stating it as a fact, not asking. If you don't know how, what's your source for it being possible to make this smaller via interrupts, while still printing the same output string?


Almost certainly int 21h / ah=9 is the most compact way to print multiple bytes of text. You need to get AH=9 and DX=pointer somehow. Without relying on existing bytes in convenient places in registers or memory that some DOS version might happen to leave lying around, that takes a 2-byte mov ah,9 and a 3-byte mov dx, imm16.

You can set DX=0 with xor dx,dx, but even the very start of your file is at offset 100h in a .com program. (And that would mean letting the ASCII text execute as machine code without a jmp over it!)

call label / db "text" / label: pop dx would be 4 bytes total to get the pointer into DX.
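To make the byte counting concrete, here is a small Python sketch that spells out the machine code for both ways of getting the pointer into DX. The opcode bytes are the standard 8086 encodings; the variable names are just for illustration.

```python
message = b"HELLO WORLD$"

# Variant 1: the question's code, with the pointer as an immediate.
mov_imm = bytes([
    0xB4, 0x09,        # mov ah, 9
    0xBA, 0x08, 0x01,  # mov dx, 0x0108 (0x100 load offset + 8 bytes of code)
    0xCD, 0x21,        # int 0x21
    0xC3,              # ret
]) + message

# Variant 2: call over the text, then pop the pushed address into DX.
call_pop = (
    bytes([0xE8, len(message), 0x00])  # call rel16, skipping the text
    + message
    + bytes([
        0x5A,        # pop dx (DX = address of the text, 0x0103)
        0xB4, 0x09,  # mov ah, 9
        0xCD, 0x21,  # int 0x21
        0xC3,        # ret
    ])
)

print(len(mov_imm), len(call_pop))  # prints: 20 21
```

So the call / pop dx pair costs one byte more than the plain 3-byte mov dx, imm16.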


Use uninitialized register values left by some known DOS version.

http://www.fysnet.net/yourhelp.htm linked from Tips for golfing in x86/x64 machine code on codegolf.SE found the startup register values across an array of DOS versions. This is not standardized, AFAIK, so it's just a happens-to-work. Later versions of FreeDOS became more and more similar to MS-DOS, because presumably some existing software was written to rely on it, on purpose, by accident, or because some people didn't know that "works on my machine" isn't the same thing as "guaranteed future proof and portable", but various other DOS versions differ. This is not something you should rely on for production use, only silly computer tricks like code golf or "demo scene" programs.

Most DOS versions happen to leave SI=0100h at program startup. So if we can put our string there without messing up the machine (or SI), we can mov dx, si (2 bytes) instead of mov dx, 108h or 107h (3 bytes). But lea dx, [si+8] is 3 bytes (opcode + modrm + disp8), so no saving unless we let the string execute.

Or even better, if there's something on the stack you could use for pop dx? Or popa, if you're extremely lucky also setting AX=09xx. But I don't know if any DOS versions happen to leave any known stuff on the stack, other than the "return" address which points at an int 20h instruction or something. Popping that would mean exiting manually with int 20h instead of ret, costing 1 more byte.

Actually, xchg ax, reg is only 1 byte, so if any register starts with 09xx, we can use that. MS-DOS 4.0 and later, FreeDOS 1.0, and IBM PC-DOS 4.0 and later, all start with BP=09xxh. So we can save a byte in AH init by using xchg ax, bp, separate from anything with DX. (Fun fact: This is where the 90h NOP encoding comes from: it's just a special case of xchg ax, ax, until x86-64 had to document it as an actual NOP because in 64-bit mode it doesn't zero-extend EAX into RAX like xchg eax, eax would).
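As a quick illustration of why this is one byte: the short form of xchg ax, r16 is encoded as 90h plus the register number. A sketch in Python (the helper name is made up):

```python
# 16-bit registers in x86 encoding order (register numbers 0..7).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def xchg_ax(reg):
    """Encode the 1-byte short form `xchg ax, reg` (opcode 0x90 + register number)."""
    return bytes([0x90 + REG16.index(reg)])

print(xchg_ax("bp").hex(), xchg_ax("ax").hex())  # prints: 95 90
```

xchg ax, bp comes out as 95h, versus B4 09 for mov ah, 9, and xchg ax, ax is the familiar 90h.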


Letting the text execute as code

To save any more bytes, the only hope I see for making this shorter is using text that happens to decode as instructions that let execution come out the other side, without messing up SI, so you can put it in the path of execution. But at best you're saving 1 byte, unless the text also contains instructions that do anything useful.

But your message doesn't work for that. I checked how 3 capitalizations of it would disassemble, using nasm to make a flat binary and ndisasm -b16 to disassemble the result. (I used align 16 so I could find the boundary, and to give a sort of nop slide so if the last byte of the string was not the last byte of an instruction, it would consume some of that padding instead of changing decoding of the next string.) I don't have DOS or a debug.exe, so I'm using trailing-h syntax on hex numbers. In DOS Debug, all numbers are implicitly hex, that's why int 21 is the right number. I also haven't tested these, I'm not that interested in obsolete 16-bit stuff, but the x86 machine code shenanigans are fun. Although true challenge questions are off-topic on Stack Overflow, this kind of single-language optimization question is a better fit here than on https://codegolf.stackexchange.com/

; just to look at disassembly, to see if there's any hope of letting them execute
DB 'HELLO WORLD$'
align 16
db 'Hello World$'
align 16
db 'hello world$'
;; DB 'HELLO WORLD$'
00000000  48                dec ax        ; early-alphabet upper-case 
00000001  45                inc bp        ; is all single-byte inc/dec
00000002  4C                dec sp        ; the same opcodes x86-64 repurposed as REX
00000003  4C                dec sp         ;; Modified SP breaks RET
00000004  4F                dec di
00000005  20574F            and [bx+0x4f],dl   ;; step on part of the PSP
00000008  52                push dx       ; 'R'  also modifies SP
00000009  4C                dec sp        ; 'L'
0000000A  44                inc sp        ; 'D'   cancel each other's effect on SP
0000000B  2490              and al,0x90
0000000D  90                nop
0000000E  90                nop
0000000F  90                nop

Nifty. So it doesn't actually do anything fatal to the machine (depending on where BX is pointing). With BX=0 from the DOS versions that give the BP and SI values we want, that would mask away some bits in [ds: 4f], which is in the reserved part of the PSP (program segment prefix). This may be fine if nothing else ever looks there before we exit, or during the DOS exit call.

But note the and al, 0x90 ending: the string itself ended with 24h, aka '$', as the start of an instruction. That's the opcode for and al, imm8, so it consumes 1 byte of whatever's next as part of that instruction.

So you'd need a byte of padding after it before you could put the start of a useful instruction. That would kill the 1-byte size saving.

And it messes up SP, so we can't ret anymore. We'd need int 20h to exit, unless you can bail out with CC int3 or something. Not sure what DOS does on that exception.

;; 20 bytes, cancelling out saving from  xchg ax,bp
DB 'HELLO WORLD$'  ; executes as machine code without doing anything too bad
nop                ; but this is needed.  It's actually consumed as an immediate for 24h
mov  dx, si
xchg ax, bp        ; AH=09  on some DOS versions, in 1 byte instead of 2.
int  21h
int  20h           ; larger than ret, making this a net loss.

Other capitalizations are a problem, involving 'l' as 6C insb IO instructions (https://www.felixcloutier.com/x86/ins:insb:insw:insd), and similarly 'o' as 6F outsw.

;; db 'Hello World$'
00000010  48                dec ax
00000011  656C              gs insb       ; big problem, IO could crash the machine
00000013  6C                insb          ; reads port=DX, stores the byte to [ES:DI]
00000014  6F                outsw         ; writes port=DX, data from [DS:SI]
00000015  20576F            and [bx+0x6f],dl
00000018  726C              jc 0x86       ; conditional branch, but AND always clears CF so this will be not-taken
0000001A  642490            fs and al,0x90    ; FS prefix was new with 386
0000001D  90                nop               ; the align 16 padding, including previous immediate
0000001E  90                nop
0000001F  90                nop
;; db 'hello world$'
00000020  68656C            push word 0x6c65
00000023  6C                insb
00000024  6F                outsw
00000025  20776F            and [bx+0x6f],dh
00000028  726C              jc 0x96
0000002A  64                fs
0000002B  24                db 0x24

Again the 24h byte is left dangling, as the start of an instruction. If there'd been a nop after it, ndisasm would have decoded it as fs and al, 0x90 like in the previous block.

Looks like ello is a problem, with IO instructions.


We need the 2nd-last byte of the string to be something else, like the start of a 2-byte instruction, ideally something like 3C ib cmp al, imm8. That's ASCII <.

And we need it not to mess up SP. If it decrements it, we need to increment or pop into dummy registers, so it's once again pointing at the return address.

18 byte version, printing a modified string of same length

;; 18 bytes
DB 'HELLO_WOIY<$'  ; executes as machine code, returning SP to original position without overwriting return address

mov  dx, si    ; mov dx,0100h MS-DOS (all versions), FreeDOS 1.0, many other DOSes
xchg ax, bp    ; mov ah,9     MS-DOS 4.0 and later, and FreeDOS 1.0
int  21h
ret

Disassembles as

00000000  48        dec ax     ; 'H'
00000001  45        inc bp     ; 'E' affects BP, which we want to use later
00000002  4C        dec sp     ; 'L'
00000003  4C        dec sp     ; 'L'  ; SP offset by -2
00000004  4F        dec di     ; 'O'
00000005  5F        pop di     ; '_'  ; restore SP
00000006  57        push di    ; 'W'  ; SP offset by -2
00000007  4F        dec di     ; 'O'
00000008  49        dec cx     ; 'I'
00000009  59        pop cx     ; 'Y'  ; restore SP
0000000A  3C24      cmp al,0x24   ; '<' consumes the '$' as an imm8

0000000C  89F2      mov dx,si      ; instructions from the source, as written.
0000000E  95        xchg ax,bp
0000000F  CD21      int 0x21
00000011  C3        ret

This does inc bp, modifying one of the registers we're relying on for an initial value. But unless the low byte was FF to start with, it won't wrap and change the 09 in the high half. On FreeDOS 1.0 specifically, the initial BP value is 091Eh. On MS-DOS versions from Win9x, it's 0912h. On DOS from Win-NT derived versions, it's 09xxh, which doesn't rule out 09FFh.
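A tiny Python sketch of that wrap-around argument, using the two concrete BP values mentioned above plus 09FFh as a hypothetical worst case:

```python
def ah_after_inc_bp(bp):
    """High byte of BP after a 16-bit `inc bp` (what `xchg ax, bp` would put in AH)."""
    return ((bp + 1) & 0xFFFF) >> 8

for bp in (0x091E, 0x0912, 0x09FF):
    print(f"{bp:04X}h -> AH={ah_after_inc_bp(bp):02X}h")
```

Only the 09FFh case ends up with AH=0Ah instead of 09h.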

I had to mangle the string pretty seriously to balance the stack, with an even number of dec sp instructions and pops to balance that. The 1-byte 58+ rw pop reg includes some of the late upper-case alphabet letters.

Also had to avoid add [si], sp or things like that, since the initial SI points at our string. (The initial BX typically doesn't.)


HELLO_WOLD< has one too many pushes, but the 'LD' part cancels out, dec sp / inc sp. In that order so it doesn't temporarily leave part of your return address below SP, where an interrupt or debugger could clobber it.

If you really wanted to get serious about coming up with a string that visually looked more like HELLO WORLD, you'd want to make a table of ASCII characters and their corresponding instructions. Many upper-case ASCII characters are opcodes for single-byte instructions, either inc/dec or push/pop.

You could use a good assembler like NASM with a %rep / %assign i i+1 / db i / %endrep block, and run it through a disassembler. Or write a program to output a binary file and disassemble that.

Or look at http://ref.x86asm.net/coder32.html and match it up with https://asciitable.com/
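As a rough sketch of such a table, here is some Python covering only the one-byte inc/dec/push/pop opcodes 40h..5Fh, plus the two AL, imm8 opcodes used above ('$' and '<'). It tracks how far SP has moved after each instruction. The function names are made up, and anything outside that small subset (including pop sp) is rejected rather than decoded.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
GROUPS = {0x40: "inc", 0x48: "dec", 0x50: "push", 0x58: "pop"}

def one_byte_insn(byte):
    """Mnemonic for the 16-bit one-byte opcodes 0x40..0x5f, else None."""
    if 0x40 <= byte <= 0x5F:
        return f"{GROUPS[byte & 0xF8]} {REG16[byte & 7]}"
    return None

def sp_trace(text):
    """SP offset (relative to entry) after each instruction the text decodes to."""
    data = text.encode("ascii")
    offsets, sp, i = [], 0, 0
    while i < len(data):
        byte = data[i]
        insn = one_byte_insn(byte)
        if insn is None:
            if byte in (0x24, 0x3C):  # and al, imm8 / cmp al, imm8
                i += 2                # the next byte is swallowed as the immediate
                offsets.append(sp)
                continue
            raise ValueError(f"byte {byte:#04x} is outside this sketch's subset")
        if insn == "pop sp":
            raise ValueError("pop sp loads SP from memory; not modelled here")
        if insn == "dec sp":
            sp -= 1
        elif insn == "inc sp":
            sp += 1
        elif insn.startswith("push"):
            sp -= 2
        elif insn.startswith("pop"):
            sp += 2
        offsets.append(sp)
        i += 1
    return offsets

print(sp_trace("HELLO_WOIY<$")[-1])  # prints: 0
print(sp_trace("HELLO_WOLD<$")[-1])  # prints: -2
```

A candidate string is only usable with a final ret if the trace ends at 0; HELLO_WOLD< ends at -2, matching the "one too many pushes" remark above.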


Can we use the text as machine code to at least exit the program? Unlikely; ret is c3, int 20h is CD 20, so neither of those opcodes will appear in ASCII text.

AFAIK, you can't tail-call a DOS routine to have it print and then exit without needing a ret or equivalent in your own code. Or if you can, it would be a 3-byte jmp rel16, or more likely a far jmp, which would take more bytes than 2+2 for mov ah, 9 / int 21h if we're talking about the jmp ptr16:16 form.

0
On

You can shorten it to 8 bytes if you don't mind providing the text as its argument:

R:\>debug
-A
0DB2:0100 MOV AH,9
0DB2:0102 MOV DX,82
0DB2:0105 INT 21
0DB2:0107 RET
0DB2:0108
-R CX
CX 0000  :8
-N HELLO8.COM
-W
Writing 0008 bytes
-Q

R:\>HELLO8.COM HELLO WORLD$
HELLO WORLD
R:\>
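Why 82h works: DOS copies the command tail into the PSP, with a length byte at offset 80h and the text from 81h on. The Python model below is only a sketch of that layout. It assumes the usual COMMAND.COM behaviour of keeping the separating space at 81h and ending the tail with a carriage return, and the helper names are made up.

```python
def psp_command_tail(args):
    """Bytes stored from PSP offset 0x80: length byte, tail, 0x0d terminator."""
    tail = (" " + args).encode("ascii")  # assumption: the separating space is kept
    return bytes([len(tail)]) + tail + b"\r"

def dos_print_string(memory, offset):
    """Mimic int 21h / ah=9: the bytes from offset up to, not including, '$'."""
    end = memory.index(b"$", offset)
    return bytes(memory[offset:end])

psp = bytearray(0x100)
tail = psp_command_tail("HELLO WORLD$")
psp[0x80:0x80 + len(tail)] = tail

print(dos_print_string(psp, 0x82))  # prints: b'HELLO WORLD'
```

If the argument contains no '$', function 9 simply keeps going until it finds one somewhere later in memory, so the caller has to type it.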