At startup, we load the outpoints to watch, *then* roll back 15
blocks. If there were things in those blocks we wanted to watch, we
no longer do!
1. We load the utxoset into memory: everything in the utxoset table
which has spendheight null.
2. We roll back 15 blocks to re-read. Deleting a block from the
database causes the utxo spendheights referring to it to be set
to null.
3. We roll forward, but we didn't update the in-memory utxoset,
so we're not watching those utxos which are spent.
The main symptom of this is that we spam peers with obsolete gossip
(if we get sent a channel announcement for a closed channel, we can
think it isn't spent yet). But it could *also* mean we don't notice
onchain txs, if we restart at the wrong time!
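
The three steps above can be reproduced with a toy model. This is a minimal
sketch, not lightningd's code: `Utxo`, `load_watched` and `rollback` are
hypothetical names, and loading the watch set after the rollback is only one
way to close the gap (the actual fix may differ).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utxo:
    outpoint: str
    spendheight: Optional[int]  # None models NULL, i.e. "not spent yet"


def load_watched(table):
    """Step 1: watch every utxo whose spendheight is NULL."""
    return {u.outpoint for u in table if u.spendheight is None}


def rollback(table, tip, depth):
    """Step 2: deleting blocks resets the spendheights that referred to them."""
    new_tip = tip - depth
    for u in table:
        if u.spendheight is not None and u.spendheight > new_tip:
            u.spendheight = None
    return new_tip


# Hypothetical table: bb:1 was spent inside the 15-block rollback window.
table = [Utxo("aa:0", None), Utxo("bb:1", 995), Utxo("cc:0", 900)]

buggy = load_watched(table)          # load first ...
rollback(table, tip=1000, depth=15)  # ... then roll back
fixed = load_watched(table)          # loading after the rollback sees bb:1

missed = fixed - buggy               # outpoints the buggy order never watches
print(sorted(missed))
```

In this model `bb:1` is unspent again after the rollback, but the watch set
built beforehand does not contain it, so its spend goes unnoticed when the
blocks are replayed.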
Changelog-Fixed: lightningd: we could miss tx spends which happened in the past blocks when we restarted.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If they had a channel bias, and ran xpay, it will update the bias
to a v2 bias (with a timestamp). We must downgrade that, or the
older version won't load!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: tools: `lightningd-downgrade` can downgrade your database from v25.12 to v25.09 if something goes wrong.
When installed, the name is `lightning-hsmtool`. We actually copy
`tools/hsmtool` to `tools/lightning-hsmtool` but that's a silly step
which we should get rid of.
So:
1. Make sure our documentation always refers to it as lightning-hsmtool.
2. Make sure our tests invoke it as `lightning-hsmtool`.
3. Rename the C file.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't expect an internal command to take 5 seconds to service
without explicitly pausing: if it does, log at a higher level.
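
A minimal sketch of the idea, in Python rather than the daemon's C. The
5-second threshold comes from the text above; the function names, the
injectable `clock`, and the choice of DEBUG versus WARNING are assumptions
for illustration only, and the "explicitly pausing" exemption is not
modelled.

```python
import logging
import time

SLOW_THRESHOLD_SECONDS = 5.0  # threshold named in the commit message


def level_for_duration(seconds, threshold=SLOW_THRESHOLD_SECONDS):
    """Quiet level for normal commands, louder level for slow ones."""
    return logging.WARNING if seconds >= threshold else logging.DEBUG


def timed_call(name, handler, clock=time.monotonic,
               log=logging.getLogger("cmd")):
    """Run handler(), then log how long it took at a duration-based level."""
    start = clock()
    result = handler()
    elapsed = clock() - start
    log.log(level_for_duration(elapsed), "%s took %.3f seconds", name, elapsed)
    return result, elapsed
```

Passing the clock in as a parameter keeps the sketch testable without
actually sleeping for five seconds.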
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we enter the wrong passphrase, hsmd crashes like this with an unknown message type:
lightning_hsmd: Failed to load hsm_secret: Wrong passphrase (version v25.12rc1-7-g7713a42-modded)
0x102ba44bf ???
send_backtrace+0x4f:0
0x102b0900f status_failed
common/status.c:207
0x102af1a37 hsmd_send_init_reply_failure
hsmd/hsmd.c:301
0x102af1497 load_hsm
hsmd/hsmd.c:446
0x102af1497 init_hsm
hsmd/hsmd.c:548
0x102b29e63 next_plan
ccan/ccan/io/io.c:60
0x102b29e63 do_plan
ccan/ccan/io/io.c:422
0x102b29d8b io_ready
ccan/ccan/io/io.c:439
0x102b2b4bf io_loop
ccan/ccan/io/poll.c:470
0x102af0a83 main
hsmd/hsmd.c:886
lightningd: HSM sent unknown message type
This change swaps write_all() for wire_sync_write(), because write_all()
omits the wire protocol length prefix. We also no longer send a stack
trace when the user enters the wrong passphrase: we exit cleanly instead.
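
To illustrate why the missing prefix produces an "unknown message type", here
is a toy framing model in Python. It is a sketch under assumptions: the
4-byte big-endian length header and the 2-byte type in `reply` are
illustrative choices, and `frame`/`read_frame` are hypothetical helpers, not
the daemon's functions.

```python
import struct


def frame(msg: bytes) -> bytes:
    """Prefix the message with its length (assumed: 4-byte big-endian)."""
    return struct.pack(">I", len(msg)) + msg


def read_frame(stream: bytes):
    """Read one framed message; return (message, remaining bytes)."""
    (length,) = struct.unpack(">I", stream[:4])
    return stream[4:4 + length], stream[4 + length:]


reply = b"\x00\x6fWrong passphrase"  # hypothetical 2-byte type + payload

# With the prefix, the reader recovers exactly what was sent.
msg, rest = read_frame(frame(reply))

# Without it, the first four payload bytes are misread as a length, and the
# reader ends up with a message that starts in the middle of the payload.
bad, _ = read_frame(reply)
```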
The original method name was lsps-lsps2-invoice, but I somehow messed it
up and renamed it during a rebase.
Changelog-Changed: lsps-jitchannel is now lsps-lsps2-invoice
Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
This is slow, but will make sure we find out if we add latency spikes in future.
tests/test_coinmoves.py::test_generate_coinmoves (5,000,000, sqlite3):
Time (from start to end of l2 node): 223 seconds
Latency min/median/max: 0.0023 / 0.0033 / 0.113 seconds
tests/test_coinmoves.py::test_generate_coinmoves (5,000,000, Postgres):
Time (from start to end of l2 node): 470 seconds
Latency min/median/max: 0.0024 / 0.0098 / 0.124 seconds
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: lightningd: multiple significant speedups for large nodes, especially preventing "freezes" under exceptionally high load.
Changelog-Added: Plugins: "filters" can be specified on the `custommsg` hook to limit what message types the hook will be called for.
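
A rough sketch of what such filtering amounts to on the dispatch side. How a
plugin declares its filters is not shown here, so the `(name, filters)`
pairs, the function names and the example message types below are all
hypothetical.

```python
def should_call_hook(msg_type, filters=None):
    """A plugin with no filter list sees every message type."""
    return filters is None or msg_type in filters


def dispatch(msg_type, plugins):
    """Return the names of the plugins whose hook would be called."""
    return [name for name, filters in plugins
            if should_call_hook(msg_type, filters)]


# Hypothetical subscribers: one unfiltered, one limited to a single type.
plugins = [("logger", None), ("lsps", {37913})]

print(dispatch(37913, plugins))
print(dispatch(4242, plugins))
```

The benefit is that a plugin interested in one message type no longer pays a
round trip for every other custom message.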
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
Time (from start to end of l2 node): 135 seconds **WAS 227**
Worst latency: 12.1 seconds **WAS 62.4**
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we've rid ourselves of the worst offenders, we can make this a real
stress test. We remove plugin io saving and low-level logging, to avoid
benchmarking testing artifacts.
Here are the results:
tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
Time (from start to end of l2 node): 518 seconds
Worst latency: 353 seconds
tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, Postgres):
Time (from start to end of l2 node): 417 seconds
Worst latency: 96.6 seconds
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we add a new hook, not at the end, while hooks are getting called,
then iteration could be messed up (e.g. calling a plugin twice, or
skipping one).
The simplest thing is to defer updates until nobody is calling the
hook. In theory this could livelock; in practice it won't.
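
The deferral can be sketched as follows. This is an illustrative Python
model, not the plugin_hook code: the `Hook` class, its field names and the
example plugins are hypothetical.

```python
class Hook:
    def __init__(self):
        self.plugins = []           # ordered subscribers
        self.calls_in_progress = 0  # how many call() invocations are active
        self.pending = []           # (plugin, index) updates waiting to apply

    def _insert(self, plugin, index):
        if index is None:
            self.plugins.append(plugin)
        else:
            self.plugins.insert(index, plugin)

    def add(self, plugin, index=None):
        """Register a plugin, deferring if the hook is being iterated."""
        if self.calls_in_progress:
            self.pending.append((plugin, index))
        else:
            self._insert(plugin, index)

    def call(self, payload):
        self.calls_in_progress += 1
        try:
            results = [plugin(self, payload) for plugin in self.plugins]
        finally:
            self.calls_in_progress -= 1
            if self.calls_in_progress == 0:
                # Nobody is iterating any more: safe to apply the updates.
                for plugin, index in self.pending:
                    self._insert(plugin, index)
                self.pending.clear()
        return results


def late(hook, payload):
    return "late"


def first(hook, payload):
    if late not in hook.plugins:
        hook.add(late, index=0)  # registers at the front, mid-iteration
    return "first"


def second(hook, payload):
    return "second"


hook = Hook()
hook.add(first)
hook.add(second)
first_run = hook.call("x")   # "late" is deferred, iteration is undisturbed
second_run = hook.call("x")  # "late" now runs first
```

Without the deferral, inserting at index 0 during iteration would shift
`first` under the iterator and call it again. The theoretical livelock is
visible here too: if calls always overlapped, `calls_in_progress` would never
reach zero and the pending updates would never be applied.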
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We start with 100,000 entries. We will scale this to 2M as we fix the
O(N^2) bottlenecks.
I measure the node time after we modify the db, like so:
while guilt push && rm -rf /tmp/ltests* && uv run make -s RUST=0; do
    RUST=0 VALGRIND=0 TIMEOUT=100 TEST_DEBUG=1 eatmydata uv run pytest -vvv -p no:logging \
        tests/test_coinmoves.py::test_generate_coinmoves > /tmp/`guilt top`-sql 2>&1
done
Then analyzed the results with:
FILE=/tmp/synthetic-data.patch-sql
START=$(grep 'lightningd-2 .* Server started with public key' $FILE | tail -n1 | cut -d' ' -f2 | cut -d. -f1)
END=$(grep 'lightningd-2 .* JSON-RPC shutdown' $FILE | tail -n1 | cut -d' ' -f2 | cut -d. -f1)
echo $(( $(date +%s -d $END) - $(date +%s -d $START) ))
grep 'E assert' $FILE
tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
Time (from start to end of l2 node): 85 seconds
Worst latency: 75 seconds
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
bc4bb2b0ef "libplugin: use jsonrpc_io logic for sync requests too."
changed this message, and test was not updated.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Delay can cause bogus complaints:
```
2025-11-13T23:50:03.6643632Z lightningd-3 2025-11-13T23:37:29.947Z **BROKEN** 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: wake delay for WIRE_CHANNEL_REESTABLISH: 5708msec
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `listpeerchannels` `funding` object `withheld` flag, and `listclosedchannels` `funding_withheld` flags, indicating fundchannel_complete was called with the `withheld` parameter true.
This covers the other corner case, where we crash before actually
signing and sending the PSBT. We can spot this because the channel is
in AWAITING_LOCKIN and we have a PSBT, but it's not signed yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `psbt` field in `funding` in listpeerchannels, and `funding_psbt` in listclosedchannels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: we now re-transmit unseen funding transactions on startup, for more robustness.
This attempts to solve a problem we have with Phoenix clients:
This payment has been split in too many parts by the sender: 31 parts vs max 6 parts allowed for on-the-fly funding.
The problem is that we don't have any way in bolt11 or bolt12 to
specify the maximum number of HTLCs.
As a workaround, we start by restricting askrene to 6 parts if the
node is not openly reachable, and if it struggles, we remove the
restriction. This would work much better if askrene handled maxparts
more completely!
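
The workaround boils down to "try the restricted query first, then relax".
A minimal sketch, assuming a solver that returns `None` when it cannot meet
the part limit: `find_routes`, `toy_solver` and `dest_is_public` are
hypothetical names, and "struggles" is modelled simply as failure.

```python
PHOENIX_MAX_PARTS = 6  # the limit quoted in the error message above


def find_routes(solver, dest_is_public, amount_msat):
    """Restrict to 6 parts for unannounced destinations, relax on failure."""
    if not dest_is_public:
        routes = solver(amount_msat, maxparts=PHOENIX_MAX_PARTS)
        if routes is not None:
            return routes
    return solver(amount_msat, maxparts=None)


def toy_solver(amount_msat, maxparts, part_cap_msat=1000):
    """Stand-in solver: every part carries at most part_cap_msat."""
    parts = -(-amount_msat // part_cap_msat)  # ceiling division
    if maxparts is not None and parts > maxparts:
        return None
    last = amount_msat - part_cap_msat * (parts - 1)
    return [part_cap_msat] * (parts - 1) + [last]


small = find_routes(toy_solver, dest_is_public=False, amount_msat=5000)
large = find_routes(toy_solver, dest_is_public=False, amount_msat=31000)
```

Here `small` fits within the limit, while `large` only succeeds once the
restriction is dropped, which matches the "unless it has no choice" wording
of the changelog entry.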
See-Also: https://github.com/ElementsProject/lightning/issues/8331
Changelog-Fixed: `xpay` will not try to send too many HTLCs through unknown channels (6, as that is Phoenix's limit) unless it has no choice
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have the issue of aliases: xpay uses scids like 0x0x0 for
routehints and blinded paths, and then can apply reservations to them. But
generally, reservations are *global*, so we need to differentiate.
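
One way to picture the distinction is to key reservations by layer as well as
by scid. This is a sketch only: the `Reservations` class and the layer names
are hypothetical, and the real askrene bookkeeping may look quite different.

```python
from collections import defaultdict


class Reservations:
    def __init__(self):
        # Keyed by (layer, scid); layer None stands for the global view.
        self._reserved = defaultdict(int)

    def reserve(self, scid, amount_msat, layer=None):
        self._reserved[(layer, scid)] += amount_msat

    def unreserve(self, scid, amount_msat, layer=None):
        key = (layer, scid)
        if self._reserved[key] < amount_msat:
            raise ValueError("unreserving more than was reserved")
        self._reserved[key] -= amount_msat

    def reserved(self, scid, layer=None):
        return self._reserved.get((layer, scid), 0)


r = Reservations()
# Two payments both use the alias 0x0x0, but in different layers.
r.reserve("0x0x0", 1000, layer="xpay-1")
r.reserve("0x0x0", 2500, layer="xpay-2")
r.unreserve("0x0x0", 500, layer="xpay-1")
```

With a purely global key, the two `0x0x0` aliases would share one counter
even though they refer to unrelated channels.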
Changelog-Added: Plugins: `askrene-reserve` and `askrene-unreserve` can take an optional `layer` inside `path` elements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we simply call it at the end. We need to check it hasn't violated fee maxima, but
otherwise it's simple.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Plugins: `askrene` now handles limits on number of htlcs much more gracefully.
We don't need to convert to strings, we can compare directly. This removes the final
use of the index arrays.
This of course changes the order of returned routes, which alters test_real_biases, since
that biases against the final channel in the *first* route.
Took me far too long to diagnose that!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I added amount_msat_accumulate for the "a+=b" case, but I was struggling
with a name for the subtractive equivalent. After some prompting, ChatGPT
suggested deduct.
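
The pair can be sketched like this, in Python for brevity. The name
`amount_msat_deduct` is inferred from the text above, and the 64-bit bound
and the return-False-on-failure convention are assumptions, not quotes from
the C helpers.

```python
U64_MAX = 2**64 - 1


class AmountMsat:
    def __init__(self, millisatoshis):
        self.millisatoshis = millisatoshis


def amount_msat_accumulate(a, b):
    """a += b; returns False and leaves a unchanged on overflow."""
    total = a.millisatoshis + b.millisatoshis
    if total > U64_MAX:
        return False
    a.millisatoshis = total
    return True


def amount_msat_deduct(a, b):
    """a -= b; returns False and leaves a unchanged on underflow."""
    if b.millisatoshis > a.millisatoshis:
        return False
    a.millisatoshis -= b.millisatoshis
    return True


total = AmountMsat(1000)
ok_add = amount_msat_accumulate(total, AmountMsat(500))
ok_sub = amount_msat_deduct(total, AmountMsat(200))
too_much = amount_msat_deduct(total, AmountMsat(5000))
```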
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Improvements in the fuzz-testing scheme of `fuzz-initial_channel`
led to the discovery of test inputs that result in greater code
coverage. Add these inputs to the test's seed corpus.
Currently, `fuzz-initial_channel` doesn't verify the function
`channel_update_funding()` in its target file,
`common/initial_channel.h`.
Add a test for it.
Changelog-None: `towire_wireaddr()` and `fromwire_wireaddr()` in
`common/wireaddr.h` are responsible for marshalling/unmarshalling
BOLT #7 address descriptors.
Since these aren't tested by the existing wire fuzz tests, add a
roundtrip test for them. This has the added benefit of testing
`parse_wireaddr()` as well.
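
The roundtrip property can be sketched outside the fuzzer. This Python model
covers only the IPv4 and IPv6 descriptor types and uses hypothetical `towire`
and `fromwire` helpers. It illustrates the shape of the check, not the C
functions themselves.

```python
import ipaddress
import struct

ADDR_TYPE_IPV4 = 1
ADDR_TYPE_IPV6 = 2
ADDR_LEN = {ADDR_TYPE_IPV4: 4, ADDR_TYPE_IPV6: 16}


def towire(addr_type, host, port):
    """Encode as: 1-byte type, raw address bytes, 2-byte big-endian port."""
    packed = ipaddress.ip_address(host).packed
    if len(packed) != ADDR_LEN[addr_type]:
        raise ValueError("address does not match descriptor type")
    return bytes([addr_type]) + packed + struct.pack(">H", port)


def fromwire(data):
    """Decode one descriptor; return None for unknown or truncated input."""
    if not data or data[0] not in ADDR_LEN:
        return None
    addr_type = data[0]
    need = 1 + ADDR_LEN[addr_type] + 2
    if len(data) < need:
        return None
    host = str(ipaddress.ip_address(data[1:need - 2]))
    (port,) = struct.unpack(">H", data[need - 2:need])
    return addr_type, host, port


def roundtrip_ok(data):
    """Fuzz-style property: whatever decodes must re-encode identically."""
    decoded = fromwire(data)
    if decoded is None:
        return True  # rejected input is fine, it just must not crash
    encoded = towire(*decoded)
    return encoded == data[:len(encoded)]


wire = towire(ADDR_TYPE_IPV4, "127.0.0.1", 9735)
```

A fuzzer would feed arbitrary bytes to `roundtrip_ok` and flag any input for
which it returns False or raises.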
Hacky parser, not a real one, but this is for devs, so they can clean
it up with ccan/opt themselves if they want to be fancy! 🎩
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>