Turbo Pascal Deconstructed (simonwillison.net)
7 points by alberto-m 3 days ago | 6 comments
- ThrowawayR2 3 days ago
That raises questions though: How does he know that the chart is correct? Furthermore, whether it's correct or not, what was the process by which the LLM reached its conclusions, i.e. did it disassemble and examine the source itself or did it use pre-existing reverse engineering done by others on the internet like https://www.pcengines.ch/tp3.htm?
- simonw 3 days ago
You can see hints of what it did in https://claude.ai/share/260d2eed-8d4a-4b9f-8a75-727c3ec4274e - annoyingly though it looks like Claude sharing doesn't show the actual code it ran.
Here's the zip file it gave me of the files it generated along the way: https://static.simonwillison.net/static/2026/turbo-pascal-an...
I had Codex GPT-5.4 xhigh run a check of those files to see if the artifact at the end appeared to use the right data, which isn't 100% foolproof but gave me enough confidence to publish since this is a pretty low stakes project!
- fm77 7 hours ago
Thank you very much for that super analysis done with the help of AI - I really enjoyed reading that. May I ask, are you paying for that service? And if so, how much?
Anyhow, I downloaded your ZIP file and looked into the disassembly. It seems that the disassembler simply disassembled byte by byte, not taking into account that TURBO.COM is both code and data. Since the x86 instruction set is very dense, pretty much every byte sequence turns into legal instructions. Even the ASCII strings were disassembled. Look at address hex 4864 in the file for example - it should be the string "Write block to file" but got disassembled. I wonder how the AI managed to make sense of that obscure file.
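One way to see which regions of a binary are data rather than code is to scan it for runs of printable ASCII before disassembling. The following is only a minimal sketch: `find_ascii_strings` and the sample bytes are hypothetical and are not taken from the zip file. It reports raw file offsets; whether "address 4864" in the listing is a file offset or already includes the 100h load offset of a .COM file depends on how the disassembler was invoked.

```python
import re


def find_ascii_strings(data: bytes, min_len: int = 6):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Synthetic sample (an assumption for illustration): two non-text bytes,
# a string, then a single RET opcode byte.
sample = b"\xb4\x09" + b"Write block to file" + b"\xc3"
found = list(find_ascii_strings(sample))
print(found)
```

Running the same function over the real file contents would give a list of ranges that a linear disassembly should skip or mark as data.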
- simonw 7 hours ago
I ran the analysis using regular Claude. I'm paying $200/month but the $20/month subscription should work fine too, and it might even work with the free plan.
- rep_lodsb 13 hours ago
For the code generator, it produced this annotated disassembly:

  2100 push ax              ;--- EmitByte: write one byte to code output ---
  2101 mov di, [code_ptr]   ;DI → current position in output buffer
  2104 stosb                ;Write AL to output, advance DI
  2105 mov [code_ptr], di   ;Update code pointer
  2108 pop ax               ;Restore AX
  2109 ret                  ;Every compiled instruction flows through this 6-instruction emitter
  2110 mov al, 0E8h         ;--- EmitCall: generate CALL instruction ---
  2112 call EmitByte        ;Emit opcode byte E8h (near CALL)
  2115 sub bx, [code_ptr]   ;Calculate relative offset
  2118 sub bx, 2            ;Adjust for instruction length
  211A xchg ax, bx          ;AX = relative offset
  211B call EmitWord        ;Emit 16-bit relative displacement
  211E ret                  ;Generated: E8 lo hi - a complete CALL instruction

Obviously, there has to be a lot more to even a simple-minded x86 code generator than just a generic "emit opcode byte" and "emit call" routine. In general, what A"I" produced here is not a full disassembly but a collection of short snippets, potentially not even including the really interesting ones. But is it even correct?
EmitByte here is unnecessarily pushing/popping AX, which isn't modified by the few instructions in between at all. No competent assembly language programmer would do this. So maybe against all expectations, Turbo Pascal is just really badly coded? No, it's of course a hallucination: those instructions don't appear in the binary at all!
That the hex addresses are wrong can already be seen in the instruction "mov di,[code_ptr]" here being apparently only three bytes long. In reality it would take four! And it's easy to confirm that this code isn't present at the addresses shown.
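The length argument can be checked by encoding the instruction by hand. The sketch below covers only the 16-bit form `mov reg16, [disp16]`; the function name and the address 0x1234 standing in for `code_ptr` are hypothetical. Only AX has a three-byte short form (opcode A1); every other register needs opcode 8B, a ModRM byte, and a two-byte address.

```python
# Register numbers as used in the ModRM "reg" field for 16-bit operands.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_mov_reg_from_mem(reg: str, addr: int) -> bytes:
    """Encode `mov reg, [addr]` for a 16-bit direct address."""
    disp = addr.to_bytes(2, "little")
    if reg == "ax":
        # Accumulator short form: A1 lo hi (3 bytes).
        return bytes([0xA1]) + disp
    # General form: 8B /r with mod=00, r/m=110 (direct address), then disp16.
    modrm = (0b00 << 6) | (REG16[reg] << 3) | 0b110
    return bytes([0x8B, modrm]) + disp


code_ptr_addr = 0x1234  # hypothetical address, for illustration only
encoded = encode_mov_reg_from_mem("di", code_ptr_addr)
print(encoded.hex(" "), len(encoded))

# Gap between consecutive addresses in the quoted listing (2101 -> 2104).
listing_gap = 0x2104 - 0x2101
print(listing_gap)
```

The encoded instruction is one byte longer than the gap the listing leaves for it, which is the inconsistency described above.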
So maybe it's somewhere else? x86 disassembly can be complicated because the opcodes are variable length, and particularly in old programs like this the code and data are often not cleanly separated. Claude apparently ran it through NDISASM, which doesn't even attempt to handle that task.
But searching for e.g. the hex opcode B0 E8 ('mov al,0xe8') is enough to confirm that this code snippet isn't to be found anywhere.
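A search like this takes only a few lines. This is a minimal sketch with a hypothetical helper `find_all`, run on synthetic bytes rather than on the real binary:

```python
def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which pattern occurs in data (overlaps included)."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


# Synthetic sample: NOP, mov al,0xe8, RET, mov al,0xe8.
sample = bytes.fromhex("90 b0 e8 c3 b0 e8")
hits = find_all(sample, bytes.fromhex("b0 e8"))
print([hex(h) for h in hits])
```

To repeat the check on the actual file, read it with `open(path, "rb").read()` and pass the result as `data`; an empty list means the byte sequence does not occur anywhere, whether in code or in data.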
There is a lot more suspicious code, including some that couldn't possibly work (like the "ret 1" in the system call dispatcher, which would misalign the stack).
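The stack arithmetic behind the "ret 1" objection can be sketched as follows. In 16-bit code, a near `ret imm16` pops the two-byte return address and then adds `imm16` to SP, while every push and pop moves SP by two bytes. The function name and the starting SP value are hypothetical.

```python
def sp_after_near_ret(sp: int, imm16: int = 0) -> int:
    """Return SP after a 16-bit near `ret imm16`: pop IP (2 bytes), then add imm16."""
    return (sp + 2 + imm16) & 0xFFFF


sp_at_ret = 0xFFF0  # hypothetical SP pointing at the return address
results = {imm: sp_after_near_ret(sp_at_ret, imm) for imm in (0, 1, 2)}
for imm, sp in results.items():
    parity = "even" if sp % 2 == 0 else "odd"
    print(f"ret {imm}: SP = {sp:#06x} ({parity})")
```

Since arguments are pushed as whole words, an odd immediate cannot match any amount of pushed data, and SP is left on an odd address afterwards.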
Conclusion: it's slop
- simonw 10 hours ago
Thanks for this, I've added that to my write-up of the project here: https://simonwillison.net/2026/Mar/20/turbo-pascal/#hallucin...