
Added __*shift_nz routines (for shifting by non-zero or a constant amount) #756

Closed
ZERICO2005 wants to merge 2 commits into master from shift_non_zero


@ZERICO2005
Contributor

@ZERICO2005 ZERICO2005 commented Mar 16, 2026

Similar in spirit to #755.
The __*shift_nz entry points may allow a small speed optimization by skipping the test for a shift-by-zero.

Clang/LLVM has some functionality for detecting that a shift amount is non-zero, so the compiler should be able to emit __*shift_nz when applicable. __*shift_nz would be emitted either when the shift amount is a constant, or when the shift amount is a variable that is proven to be non-zero.

Calling __*shift_nz with a shift amount of zero is undefined behavior.

Additionally, it is always safe to convert __*shift_nz back to __*shift.
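
As a rough illustration of when the compiler might be able to pick the _nz entry point, the C functions below use a shift amount that is either constant or provably non-zero. The function names are hypothetical, and whether the backend would actually lower these to a __lshl_nz call (rather than inlining the shift) is an assumption, not something this PR implements.

```c
#include <assert.h>
#include <stdint.h>

/* Constant, non-zero shift amount: a candidate for __lshl_nz. */
uint32_t shift_by_constant(uint32_t x) {
    return x << 5;
}

/* (n & 7) + 1 is always in [1, 8], so the amount is provably non-zero. */
uint32_t shift_by_masked_plus_one(uint32_t x, uint8_t n) {
    return x << ((n & 7) + 1);
}

/* Nothing rules out n == 0 here, so the generic __lshl with its
 * zero test would still be needed. */
uint32_t shift_by_unknown(uint32_t x, uint8_t n) {
    assert(n < 32); /* shifting a 32-bit value by >= 32 is undefined in C */
    return x << n;
}
```

Only the first two functions give the compiler enough information to skip the zero test; the third has to stay on the generic routine.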

Pros:

  • Small speed optimization
  • If you see call __*shift in the compiler output, you can be almost certain that the shift amount is a variable rather than a constant
  • If only __lshl_nz is used, it might be possible to avoid linking __lshl, which would save 3 bytes (although implementing this would need FASMG require)

Cons:

  • Potential for human error in hand written assembly code.

Here, 3F + 1 is saved by skipping the check for a shift-by-zero in __lshl.

__lshl:
	inc	l
	dec	l
	ret	z	; shift by zero
__lshl_nz:
	push	bc
	ld	b, l
	ex	(sp), hl
.L.loop:
	add	hl, hl
	rla
	djnz	.L.loop
	ex	(sp), hl
	pop	bc
	ret
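
The listing above can be modeled in C to show why the two entry points agree for every non-zero count and diverge at zero. This is only a sketch: the function names are hypothetical, the 32-bit value stands in for the register pair being shifted, and the zero-count result is just what this model produces, since the real routine leaves that case undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the __lshl_nz body: DJNZ decrements B first and loops while
 * B != 0, so the loop body always runs at least once. */
uint32_t lshl_nz_model(uint32_t value, uint8_t count) {
    uint8_t b = count;        /* ld b, l */
    do {
        value <<= 1;          /* add hl, hl / rla */
    } while (--b != 0);       /* djnz .L.loop */
    return value;
}

/* Model of __lshl: the extra zero test, then fall through to the loop. */
uint32_t lshl_model(uint32_t value, uint8_t count) {
    if (count == 0) {         /* inc l / dec l / ret z */
        return value;
    }
    return lshl_nz_model(value, count);
}
```

In this model a count of 0 wraps to 255 on the first decrement, so the loop runs 256 times and shifts every bit out. That is the kind of failure the zero test in __lshl guards against.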

Note that __*shift_nz aliases __*shift if no optimizations are possible.

__llshru_nz:
__llshru:
; Suboptimal for large shift amounts
	push	af
	push	iy
	ld	iy, 0
	add	iy, sp
	ld	a, (iy + 9)
	or	a, a
	jr	z, .L.finish	; shift by zero
; we cannot place __llshru_nz: here
	push	de
	push	hl

Here is a list of routines where __*shift_nz is faster:

  • __bshl
  • __bshru
  • __bshrs
  • __lshl
  • __lshru
  • __lshrs
  • __i48shru
  • __i48shrs

Comment thread src/crt/dtof.src
  or a, $10

- call __lshru
+ call __lshru_nz
Contributor Author


Proof: The shift amount should be in [5, 29], and the same shift amount was used as a DJNZ count (which would break if the shift amount could be zero).

Comment thread src/crt/dtoll.src
  ex (sp), hl
  ; shift is non-zero and [1, 11] in the non-UB case
- call c, __llshl
+ call c, __llshl_nz
Contributor Author


Proof: A is [1, 204] here, and call c, __llshl_nz will only call the function if the shift amount is less than 31.

Comment thread src/crt/ftod.src
  ld c, a ; A is [1, 23]
  ; shift until the MSB of the mantissa is the LSB of the exponent
- call __ishl
+ call __ishl_nz
Contributor Author

@ZERICO2005 ZERICO2005 Mar 16, 2026


Proof: rcf \ adc hl, hl was done prior to .L.subnormal, which means that __ictlz will return a value that is at least 1 since the LSB will be cleared.

Comment thread src/crt/ltod.src

  ex (sp), hl ; (SP) = shift
- call __llshru
+ call __llshru_nz
Contributor Author

@ZERICO2005 ZERICO2005 Mar 16, 2026


Proof: Shift amount is [1, 11], and the exact same shift amount was used for DJNZ, which would break if the shift amount were zero.

Comment thread src/libc/frexpf.src
  call __ictlz
  ld c, a
- call __ishl
+ call __ishl_nz
Contributor Author


Proof: add hl, hl is done, meaning that the LSB is 0, so __ictlz will return a value greater than 1

Comment thread src/libc/truncf.src
  ld d, c ; store C
  ld c, a
- call __ishl
+ call __ishl_nz
Contributor Author


Proof: Since sub a, 23 set the carry flag, A must now be in [-23, -1]; neg then makes A [1, 23].
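
The range argument can be checked exhaustively with a small C model of the two instructions. This is a sketch under the assumption that the call is only reached when the subtraction sets carry; the helper names are hypothetical and the model ignores every flag except carry.

```c
#include <assert.h>
#include <stdint.h>

/* Models `sub a, 23` followed by `neg` on an 8-bit accumulator.
 * Returns 1 and stores the negated result if the subtraction borrowed
 * (carry set); returns 0 otherwise and leaves *shift untouched. */
int sub23_then_neg(uint8_t a, uint8_t *shift) {
    int carry = a < 23;                 /* carry flag after sub a, 23 */
    uint8_t diff = (uint8_t)(a - 23);   /* wraps modulo 256 */
    if (!carry) {
        return 0;
    }
    *shift = (uint8_t)(0 - diff);       /* neg */
    return 1;
}

/* Exhaustively confirms that the carry path always yields [1, 23]. */
void check_all_inputs(void) {
    for (unsigned a = 0; a < 256; a++) {
        uint8_t shift = 0;
        if (sub23_then_neg((uint8_t)a, &shift)) {
            assert(shift >= 1 && shift <= 23);
        }
    }
}
```

Carry is set only for inputs 0 through 22, and negating the wrapped difference gives 23 minus the input, so zero never reaches the shift routine on that path.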

@ZERICO2005
Contributor Author

There are more important compiler issues to work on for the moment.

@ZERICO2005 ZERICO2005 closed this Apr 5, 2026