NES 6502 / RP2A03 / RP2A07 fixed-cycle delay code vending machine

Choose the constraints. The generated delay code is permitted to…

Registers

a: Permits the generation of delay code that leaves the A register in an unknown state, by using instructions such as LDA or ASL.

x: Permits the generation of delay code that leaves the X register in an unknown state, by using instructions such as LDX or DEX.

y: Permits the generation of delay code that leaves the Y register in an unknown state, by using instructions such as LDY or DEY.

stack: Permits the generation of delay code that leaves the S register in an unknown state (mismatched pairs of push and pop operations). Use only if you are absolutely certain that you don’t need the stack for anything, such as for an RTS or RTI instruction, or if you have backed up the stack pointer in memory and plan to restore it later. Note that using this option will not cause values previously pushed onto the stack to be modified.

c: Permits the generation of delay code that leaves the C flag in an unknown state, for example by issuing CMP or ASL instructions.

zn: Permits the generation of delay code that leaves the Z and N flags in an unknown state. As nearly all instructions modify these flags, protecting them involves wrapping the delay code in PHP-PLP, which enlarges the code. If nostack is also chosen, the only remaining option may be a very long sequence of NOPs.

d: Permits the generation of delay code that leaves the D flag in an unknown state, by issuing CLD or SED instructions. In the NES and the Famicom, the decimal mode flag has no effect in any calculation and changing it is completely harmless. On second-source 6502-compatible CPUs, on the Commodore 64, and on bad emulators, the situation may be different.

v: Permits the generation of delay code that leaves the V flag in an unknown state, for example by issuing SBC or BIT instructions.


Memory

ram: @zptemp defines a zero-page memory address that can be read and overwritten with random data. If you check this option, you must have defined this label. Example: @zptemp = $AA

ptr: @ptrtemp is a two-byte zero-page variable that you have defined, and which points to a memory location that can be safely read and overwritten with random data. If you check this option, you must have defined the variable, for example with @ptrtemp = $12. The variable must also contain a pointer such that, for example, sta (@ptrtemp),y is harmless with y=0 and any value of a.
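
As a minimal sketch of that setup, the following initializes @ptrtemp so that it points at a scratch byte. The address $0700 is an assumption made for illustration; substitute any RAM location in your program that may be freely overwritten.

```asm
@ptrtemp = $12        ; zero-page pointer: low byte at $12, high byte at $13
  ; Assumption: $0700 is a RAM byte that nothing else in the program uses.
  lda #$00
  sta @ptrtemp        ; low byte of $0700
  lda #$07
  sta @ptrtemp+1      ; high byte of $0700
  ; From here on, sta (@ptrtemp),y with y=0 only overwrites $0700.
```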



Utility functions

rts12: In order to utilize the delay_n macro, you must have defined a label "@nearby_rts", which points to anywhere in memory that you know contains byte $60 (RTS).

If the nostack option is enabled, @nearby_rts will not be used.

Example: @nearby_rts = $FFF4

rts14: In order to utilize the delay_n macro, you must have defined a label "@nearby_rts_14cyc", which points to anywhere in memory that you know contains byte $EA (NOP) followed by $60 (RTS). Instead of NOP, it may also contain some other 2-cycle opcode that you know is harmless for your constraints.

If the nostack option is enabled, @nearby_rts_14cyc will not be used.

Example: @nearby_rts_14cyc = $A6E0

rts15: In order to utilize the delay_n macro, you must have defined a label "@nearby_rts_15cyc", which points to anywhere in memory that contains a JMP instruction to an RTS instruction.

If the nostack option is enabled, @nearby_rts_15cyc will not be used.

Example: @nearby_rts_15cyc = $E67F
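
If no suitable bytes already exist in your ROM, the three helper targets can be assembled together. This is only a sketch: the label names below are hypothetical, and the cycle counts in the comments include the 6 cycles of the JSR that calls each target.

```asm
rts_12cyc:            ; hypothetical name; @nearby_rts may point here
  rts                 ; jsr (6) + rts (6) = 12 cycles

rts_14cyc:            ; hypothetical name; @nearby_rts_14cyc may point here
  nop                 ; any harmless 2-cycle opcode works
  rts                 ; jsr (6) + nop (2) + rts (6) = 14 cycles

rts_15cyc:            ; hypothetical name; @nearby_rts_15cyc may point here
  jmp rts_12cyc       ; jsr (6) + jmp (3) + rts (6) = 15 cycles
```

Written this way, the three targets occupy 6 bytes in total.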

rti: To use BRK in delay code, you must have a dummy interrupt handler (IRQ vector) that does absolutely nothing but immediately execute RTI.
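
A dummy handler of that kind might look as follows. The segment name VECTORS and the labels nmi_handler and reset_handler are assumptions about the surrounding program, not something the generator provides.

```asm
irq_dummy:            ; hypothetical label name
  rti                 ; return immediately, touching nothing

.segment "VECTORS"    ; assumed to be placed at $FFFA by the linker config
  .addr nmi_handler   ; $FFFA: NMI vector (assumed to exist elsewhere)
  .addr reset_handler ; $FFFC: reset vector (assumed to exist elsewhere)
  .addr irq_dummy     ; $FFFE: IRQ/BRK vector
```

With such a handler, BRK (7 cycles) followed by RTI (6 cycles) takes 13 cycles. RTI returns to the address two bytes past the BRK opcode, so the byte directly after BRK is skipped.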


a25: See http://wiki.nesdev.com/w/index.php/Delay_code for this function.

a27: See http://wiki.nesdev.com/w/index.php/Delay_code for this function.

xa30: See http://wiki.nesdev.com/w/index.php/Delay_code for this function.

ax33: See http://wiki.nesdev.com/w/index.php/Delay_code for this function.

Ba16: See http://wiki.nesdev.com/w/index.php/Delay_code for this function.


Code generation

unofficial: Permits the generation of delay code that utilizes unofficial/illegal instructions. These instructions do work on all official NES consoles and on practically all NES clones too, but not on all emulators, and using them is considered bad style. Nevertheless, using them may help produce shorter delay code in some cases. Only “safe” instructions with known and consistent behavior are used.

nostack: Prohibits the generation of code that writes to the stack. This means that harmless instruction pairs like PHP-PLP, PHA-PLA, and JSR-RTS will not be used.

Warning: If you also prohibit the clobbering of Z&N flags, the only remaining option for the code generator is to generate very long chains of NOP instructions. Not recommended.

unreloc: Permits the generation of JMP instructions to the same code. As these instructions hard-code the target address, relocating is an issue. ROM hackers in particular often prefer working with code that doesn’t require special effort for relocation.

Disabling this option is currently not permitted, because it is the only safe fall-back for odd cyclecounts when nothing can be clobbered.

unsafe: Permits the generation of delay code that may corrupt data if an interrupt happens in the middle of the delay.

An example of such code could be: PHP-TSX-PLA-TXS-PLP. If an interrupt happens between the PLA and the TXS, the backup of the flags register will be corrupted.
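
The hazard in that sequence can be traced instruction by instruction. The cycle counts are the standard 6502 timings; in the comments, S0 is a name introduced here for the value of the stack pointer on entry.

```asm
  php                 ; 3 cycles: flags stored at $0100+S0, S = S0-1
  tsx                 ; 2 cycles: X = S0-1
  pla                 ; 4 cycles: S = S0, A = the saved flags
  ; An interrupt arriving here pushes PC and flags starting at
  ; $0100+S0, overwriting the byte that still holds the saved flags.
  txs                 ; 2 cycles: S = S0-1 again
  plp                 ; 4 cycles: flags reloaded from $0100+S0, possibly corrupted
                      ; total: 15 cycles in 5 bytes
```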

Even though it may sometimes result in code that is a byte or two smaller, this option is currently not offered.

nofc: Prohibits the generation of code that relies on the assumptions outlined in the Memory access chapter below. Check this option if you are generating code that should run on, for example, a Commodore 64.

nobranch: Prohibits the generation of branch instructions, such as BCC and BPL. You might choose this option, if it is infeasible for you to ensure that branches will never page-wrap (for example, if your delay code is wrapped inside a .repeat-.endrepeat pair).


Note: Some option combinations are forced in this generator, in order to keep the number of pre-generated combinations in check. If you require a specific combination that is explicitly disabled in this generator, explain your needs in an e-mail to me (bisqwit@iki.fi), and I may generate the file for you.

Information

This generates a CA65-compatible source code file that defines a single macro:
.macro delay_n n
  ≺ ··· contents ··· ≻
.endmacro
This macro produces exactly N cycles of delay, for use in timing-sensitive NES code, such as scanline effects, screen splits, or PCM sound generation. The delay code is algorithmically computed, optimized for size, extremely tightly packed, and uses techniques such as partial opcode execution.

Examples of these generated delay routines:

7 cycles, 2 bytes
Writes to stack
  php
  plp

10 cycles, 4 bytes
Clobbers Z+N
  rol $26
  ror $26

20 cycles, 5 bytes
Clobbers A, Z+N, and C
  lda #$2A ;hides 'rol a'
  clc
  bpl *-2

557 cycles, 7 bytes
Clobbers X, Z+N, and C
  cmp #$CD ;hides 'cmp $BDA2'
  ldx #189
  dex
  bmi *-4

1032 cycles, 10 bytes
Writes to stack
  php
   pha
    txa
    ldx #202 ;hides 'dex'
    bne *-1
    tax
   pla
  plp
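
To illustrate what partial opcode execution means, here is the 20-cycle example traced pass by pass. The trace is a reading of the listed code under standard 6502 timings, and it assumes the branch does not cross a page boundary.

```asm
; Assembled bytes: A9 2A 18 10 FC
  lda #$2A    ; 2 cycles, A=$2A, N=0
  clc         ; 2 cycles, C=0 so that 'rol a' shifts in a known bit
  bpl *-2     ; 3 cycles (taken): lands on the $2A operand byte
; Second pass, starting inside the LDA operand, where $2A decodes as 'rol a':
;   rol a     ; 2 cycles, A=$54, N=0
;   clc       ; 2 cycles
;   bpl *-2   ; 3 cycles (taken)
; Third pass:
;   rol a     ; 2 cycles, A=$A8, N=1
;   clc       ; 2 cycles
;   bpl *-2   ; 2 cycles (not taken, execution falls through)
; Total: 7 + 7 + 6 = 20 cycles
```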

The macros define the guaranteed*-to-be-smallest code for all delays from 2 to 20000 cycles. Delays larger than that are generated with subdivision.

A real-life example of code that uses this macro:

.include "6502-inline_delay-keepy-ram.inc"
  ···
  @nearby_rts       = $C452
  @nearby_rts_14cyc = $C43B  ; clc + rts
  @nearby_rts_15cyc = $C649  ; jmp + rts
  @zptemp = $07
  ···
  .if (MAPPER = 24) .or (MAPPER = 26) ;VRC6
  cost_d1 = 92
  MAP_CHANGE_COST = 12 + 2*4 + (4+2+6)*2 + 2*4 + 8*4
  MAP_CHANGE_OFFSET = 15
  .endif
  ···
  delay_n (cost_d-MAP_CHANGE_COST+MAP_CHANGE_OFFSET)

*) Barring discoveries of even more efficient code.

Terms of use

#include <MIT license>

Copyright © 2016 Joel Yliluoma (http://iki.fi/bisqwit/)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Watch out for…

Operator precedence and macros

The delay macro is not operator-precedence-safe. If your delay count is an expression and not a simple number, you must use parentheses in the delay parameter. Example:
  i=5
  j=10
  delay_n i+j   ;wrong
  delay_n (i+j) ;correct

DPCM sound

If you are playing DPCM sounds while the delay code is running, the DPCM DMA will steal clock cycles at predictable intervals, causing the delay code to run longer than requested.

If you know exactly what you are playing, it is possible to compensate, and still get very nearly cycle-accurate delays with a small margin of error.

An example of a delay macro, that does such cycle-compensation, can be found here: http://iki.fi/bisqwit/src/6502-inline_dmc_delay_compensation.inc.

This DMC-compensation module works only with NES Simon’s Quest, as it relies on knowing exactly which DPCM sample is currently playing; but the code can be changed to work with any other game, as long as it is possible to know which sample is currently playing and at which speed.

I have used this macro as follows:

.macro MYDELAY  n_scanlines, ntscadjust, paladjust, extra
  .if PAL=0
    compensate_dmc_delay extra, (n_scanlines)*341   /3 + ntscadjust
  .else
    compensate_dmc_delay extra, (n_scanlines)*341*5/16 + paladjust
  .endif
.endmacro
  ···
  ; Delay exactly 48 scanlines, plus cost_c1 or cost_c2 cycles,
  ; depending whether we're PAL or NTSC;
  ; with the assumption that MAP_CHANGE_COST cycles
  ; of DMC-unaware delay were already performed by a
  ; previously issued mapper-change routine.
  MYDELAY 48, cost_c1, cost_c2, MAP_CHANGE_COST
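
The scanline arithmetic in MYDELAY follows from the PPU producing 341 dots per scanline, with 3 dots per CPU cycle on NTSC and 16/5 dots per CPU cycle on PAL. For the 48-scanline call above, both divisions happen to come out exact, as this sketch with hypothetical constant names shows:

```asm
NTSC_48_LINES = 48*341/3      ; 16368/3  = 5456 CPU cycles
PAL_48_LINES  = 48*341*5/16   ; 81840/16 = 5115 CPU cycles
```

For other scanline counts the assembler's integer division truncates, so the computed delay can fall short of the true duration by a fraction of a cycle.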

Memory access

In addition to the memory constraints you select, the delay code assumes that the following memory addresses can be safely read without side effects:

And that garbage can be safely written into the following addresses:

Reading from write-only ports and writing into read-only ports is considered safe, as the recipient circuitry ignores those accesses. Writing into ROM is not considered safe, because many cartridges contain mapper circuitry that intercepts those writes, causing side-effects. Technically I could have made that a selectable option, but I had to draw the line somewhere: generating all these options already consumed weeks of CPU time.

If @zptemp is enabled, it is also assumed that memory address $xxE6, where xx=@zptemp, can be safely read. Of particular concern are @zptemp values of $40–$5F. Technically there could be a device on the cartridge that responds with side effects to reads from such addresses, for example $51E6. If this is the case, do not use @zptemp, or choose another value for @zptemp.

If the not-Famicom option (nofc) is chosen, then only the memory region $0000–$07FF is assumed readable, and nothing is considered writable, unless explicitly enabled.

Branch macros

The generated code requires that these branch macros are defined. Note that these macro names differ from the better known “long branch” macros, in that the first “J” letter is uppercase.

The purpose of these macros is to verify at link-time that the branch does not cross page boundaries. The reason is that branch instructions consume an extra cycle when a page-crossing happens, and the delay code is not written to account for that.

.macro branch_check opc, dest
  opc dest
  .assert >* = >(dest), warning, "branch_check: failed, crosses page"
.endmacro

.macro Jcc dest
  branch_check bcc, dest
.endmacro
.macro Jcs dest
  branch_check bcs, dest
.endmacro
.macro Jeq dest
  branch_check beq, dest
.endmacro
.macro Jne dest
  branch_check bne, dest
.endmacro
.macro Jmi dest
  branch_check bmi, dest
.endmacro
.macro Jpl dest
  branch_check bpl, dest
.endmacro
.macro Jvc dest
  branch_check bvc, dest
.endmacro
.macro Jvs dest
  branch_check bvs, dest
.endmacro

Decimal mode

Every macro produced by this vending machine assumes that arithmetic instructions like ADC and SBC are done in binary mode, as if the D flag were clear, or as if the D flag were ignored, as it is on the NES.

Even if you disable the clobbering of the D flag, the code might include CLD / SED instructions and ADC / SBC operations that assume binary mode; just wrapped in PHP⋯PLP.

Received file may contain additional constraints

If the vending machine gives you a different file than what you requested (less permissive constraints), it is because the more lenient settings in your case did not help produce smaller code, and the code was merged into another file. This is not an error in the vending machine.

An example situation where this may happen: You permit clobbering A, X, and Y, but you prohibit stack writes (nostack), and you also prohibit clobbering Z&N. In this case, there is no possible code that can utilize A, X or Y while honoring the other constraints, so the request is merged into a more strict file that preserves all registers.

If the vending machine gives you code that clobbers something you did not give permission to clobber, then it is an error that should be reported.


P.S. Please do not DoS my server.
Last edited at: 2016-05-04T20:51:14+00:00 by Joel Yliluoma <bisqwit@iki.fi>