Commit Graph

17599 Commits

Author SHA1 Message Date
Rusty Russell
ffb8d860cc pytest: fix flake in test_important_plugin node failure.
xpay can get upset if askrene goes away first:

lightningd-1 2026-02-18T02:47:44.908Z **BROKEN** plugin-cln-xpay: askrene-create-layer failed with {"code":-32601,"message":"Unknown command 'askrene-create-layer'"}

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-20 10:47:10 +10:30
Rusty Russell
a84ad10820 common: fix bad cupdates count in gossmap.c
I noticed this in the logs:

```
lightningd-1 2026-01-28T00:27:37.504Z DEBUG   gossipd: gossip_store: Read 59428/118856/0/0 cannounce/cupdate/nannounce/delete from store in 45521871 bytes, now 45521849 bytes (populated=true)
lightningd-1 2026-01-28T00:27:37.504Z DEBUG   gossipd: Got 118856 bad cupdates, ignoring them (expected on mainnet)
```

That's weird, and it turns out it was counting good updates, not bad ones!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
ab3e7c31e8 pytest: rework test_real_data and test_real_biases to be parallel.
This speeds them up, and exercises the askrene parallel code.

Before: test_real_data: 348s  test_real_biases: 105s

After:  test_real_data: 133s  test_real_biases: 106s

The gain is limited because much of the time is spent uncompressing the
gossmap and in startup.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
d5c2c486f1 askrene: close files in child to isolate against bugs.
This makes sure it cannot interfere with the parent askrene's
connection to lightningd, for example.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
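A minimal Python sketch of the isolation idea in d5c2c486f1: after fork, the child closes every inherited descriptor it does not need, so a bug in the child cannot write to the parent's connections. The helper name `close_all_except` and the `max_fd` bound are hypothetical, and the real change is in C inside askrene.

```python
import os


def close_all_except(keep, max_fd=1024):
    """Close descriptors 3..max_fd-1, skipping those listed in keep.

    Intended to be called in a freshly forked child. stdin, stdout and
    stderr (0, 1, 2) are left alone.
    """
    for fd in range(3, max_fd):
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # descriptor was not open
```

The child would typically pass only the write end of its reply pipe in `keep`.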
Rusty Russell
6903cf2efb askrene: make "child.c" the explicit child entry point.
The fork logic itself is pretty simple, so do that directly in
askrene.c, and then call into "run_child()" almost as soon as
we do the fork.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
3c6504a99e askrene: limit how many children we have.
Queue them before we query local channels, so they don't use stale
information.

Changelog-Added: Config: `askrene-max-threads` to control how many CPUs we use for routing (default 4).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
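A minimal Python sketch of the limit described in 3c6504a99e, assuming a simple FIFO queue. The class `ChildLimiter` and its methods are hypothetical. The only details taken from the commit are the default of 4 and the ordering: a queued request runs its preparation (the `start` callback here, standing in for querying local channels) only when a slot frees up, so it does not work from stale information.

```python
from collections import deque


class ChildLimiter:
    """Run at most max_children jobs at once; queue the rest in FIFO order."""

    def __init__(self, max_children=4):
        self.max_children = max_children
        self.running = set()
        self.waiting = deque()

    def submit(self, name, start):
        """Launch now if a slot is free, otherwise queue without calling start."""
        if len(self.running) < self.max_children:
            self._launch(name, start)
        else:
            self.waiting.append((name, start))

    def _launch(self, name, start):
        self.running.add(name)
        start()  # preparation happens here, at launch time, not at submit time

    def finished(self, name):
        """Free a slot and launch the oldest queued job, if there is one."""
        self.running.discard(name)
        if self.waiting:
            self._launch(*self.waiting.popleft())
```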
Rusty Russell
fb4232dff8 askrene: actually run children in parallel.
Changelog-Changed: Plugins: `askrene` now runs routing in parallel.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
68b30b1e5d askrene: have child make struct route_query internally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
b002824217 askrene: move route_query definition and functions into child/.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
85c9179f77 askrene: expose additional_costs htable so child can access it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
0f575ac85a askrene: remove non child-friendly fields from struct route_query.
Notably no access to the struct command and struct plugin.

Note: we actually *do* mess with askrene->reserves, but the previous code
used cmd to get to it.  Now we need to include a non-const pointer in
struct route_query.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
ac9aa975ad askrene: make children use child_log() instead of rq_log.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
33e2f0a47b askrene: move fork() entry point into its own file.
Now there's only one file clearly shared by both parent and child.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
395261fc30 askrene: move fmt_flow_full from askrene.c into flow.c.
Weird that it was in askrene.c

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
8775b62871 askrene: move routines only accessed by the child process into child/.
We want to make it clear to future generations editing the code which
routines are called in the child (i.e. all the routing), and which in
the parent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
da2f77767c askrene: add child_log function so child can do logging.
We just shim rq_log for now, but we'll be weaning the child process off
that soon.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
0ede29b81f askrene: fork before calling the route solver.
This is fairly simple.  We do all the prep work, fire off the child,
and it continues all the way to producing JSON output (or an error).
The parent then forwards it.

Limitations (fixed in successive patches):

1. Child logging currently gets lost.
2. We wait for the child, so this code is not a speedup.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
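A minimal Python sketch of the pattern described in 0ede29b81f: do the preparation, fork, let the child run the solver through to JSON output (or an error), and have the parent forward that text. The function `solve_in_child` and the reply shape are hypothetical. Like this first commit, the sketch waits for the child, so it gives isolation but no speedup.

```python
import json
import os


def solve_in_child(solver, request):
    """Fork, run solver(request) in the child, and return the child's JSON text."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: build the reply, write it to the pipe, then exit immediately
        # so none of the parent's code or cleanup handlers run here.
        status = 1
        try:
            os.close(read_fd)
            try:
                reply = {"result": solver(request)}
            except Exception as exc:
                reply = {"error": str(exc)}
            with os.fdopen(write_fd, "w") as out:
                json.dump(reply, out)
            status = 0
        finally:
            os._exit(status)
    # Parent: read until the child closes its end, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as inp:
        raw = inp.read()
    os.waitpid(pid, 0)
    return raw
```

The parent never parses the reply in this sketch; it only passes the raw string on.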
Rusty Russell
e397b12282 askrene: make minflow() static, and remove unused linear_flow_cost.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
d9774e73dc bitcoin: hash_scid and hash_scidd public functions.
We reimplemented this redundantly: hash_scid was called
short_channel_id_hash, so I obviously missed it.

Rename, and implement hash_scidd helper too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
Rusty Russell
9bcac63414 libplugin: add command_finish_rawstr() for when we're simply repeating an entire response.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-19 17:04:35 +10:30
dovgopoly
edbad6cdca pytest: add tests for bcli getblockfrompeer retry path
Add `test_bcli_concurrent` to verify bcli handles concurrent requests while the `getblockfrompeer` retry path is active, simulating a pruned node scenario where `getblock` initially fails.

Add `test_bcli_retry_timeout` to verify lightningd crashes with a clear error message when we run out of `getblock` retries.
2026-02-18 14:16:29 +10:00
dovgopoly
2b39fc0cb4 bcli: replace magic numbers with constants
2026-02-18 14:16:29 +10:00
dovgopoly
4126f3b1fe bcli: refactor wait_and_check_bitcoind and run_bitcoin_cli to use shared execution
Extract `execute_bitcoin_cli` as shared function used by both `run_bitcoin_cli` and `wait_and_check_bitcoind`.
2026-02-18 14:16:29 +10:00
dovgopoly
d727946b14 bcli: return "not found" on any getblockhash exit status
Return "not found" on any `getblockhash` exit status. Previously, only exit code 8 (block height doesn't exist) returned "not found", while other exit codes returned an error. Now any non-zero exit status returns "not found" since any failure means the block is unavailable.
2026-02-18 14:16:29 +10:00
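A minimal Python sketch of the behaviour change in d727946b14, with the old rule shown for contrast. The function names, and the use of `None` to stand for "not found", are hypothetical. Only the exit-code rule comes from the commit message.

```python
def getblockhash_old(exit_status, stdout):
    """Old rule: only exit code 8 means "not found"; other failures are errors."""
    if exit_status == 0:
        return stdout.strip()
    if exit_status == 8:
        return None  # block height does not exist
    raise RuntimeError(f"getblockhash exited with status {exit_status}")


def getblockhash_new(exit_status, stdout):
    """New rule: any non-zero exit status means the block is unavailable."""
    if exit_status != 0:
        return None
    return stdout.strip()
```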
dovgopoly
57d60c025b bcli: remove unused async code after sync refactor
Remove the asynchronous execution infrastructure no longer needed after converting all bcli commands to synchronous execution. This includes removing the async callbacks, the pending request queue, etc.

Fix missing `close(from)` file descriptor leak in `run_bitcoin_cliv`.

Changelog-Changed: bcli plugin now uses synchronous execution, simplifying bitcoin backend communication and improving error handling reliability.
2026-02-18 14:16:29 +10:00
dovgopoly
3e979d1b20 pytest: fix bcli tests after sync refactor
Rewrite `test_bitcoin_failure` to reflect synchronous bcli behavior: the node now crashes on invalid bitcoind responses rather than retrying. Add `may_fail` and `broken_log` to handle expected crash.

Update `test_bitcoind_fail_first` stderr check to match the new error message format from `get_bitcoin_result`.

Update test mocks to use proper error format for "block not found".

Co-authored-by: ShahanaFarooqui <shahana.farooqui@gmail.com>
2026-02-18 14:16:29 +10:00
dovgopoly
7b1793f40d lightningd: add get_bitcoin_result for bcli response handling
Add `get_bitcoin_result` function that checks bcli plugin responses for errors and returns the result token. Previously, callbacks only detected errors when result parsing failed, ignoring the explicit error field from the plugin. Now we extract the actual error message from bcli, providing clearer reasoning when the plugin returns an error response.
2026-02-18 14:16:29 +10:00
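A minimal Python sketch of the check described in 7b1793f40d: look at the explicit error field first, and only then return the result. The names `extract_result` and `BcliError` are hypothetical, as is the exact JSON shape. The real `get_bitcoin_result` is C code in lightningd working on JSON tokens.

```python
import json


class BcliError(Exception):
    """Raised when the plugin response carries an error or no result."""


def extract_result(response_text):
    """Return the "result" member, or raise with the plugin's own error message."""
    response = json.loads(response_text)
    error = response.get("error")
    if error is not None:
        # Surface the plugin's message instead of a generic parse failure.
        if isinstance(error, dict):
            raise BcliError(error.get("message", "unknown error"))
        raise BcliError(str(error))
    if "result" not in response:
        raise BcliError("response has neither 'result' nor 'error'")
    return response["result"]
```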
dovgopoly
b5c300a82b bcli: convert getrawblockbyheight to synchronous execution
Also rename command_err_badjson to a generic command_err helper, since error messages aren't always about bad JSON (e.g., "command failed" for non-zero exit).
2026-02-18 14:16:29 +10:00
dovgopoly
d06024cef7 bcli: convert estimatefees to synchronous execution
Add `command_err_badjson` helper for sync error handling, mirroring the async `command_err_bcli_badjson`. Store args string in `bcli_result` for consistent error messages.
2026-02-18 14:16:29 +10:00
dovgopoly
0de1350706 bcli: convert sendrawtransaction to synchronous execution
2026-02-18 14:16:29 +10:00
dovgopoly
a3e07f4f3a bcli: convert getutxout to synchronous execution
2026-02-18 14:16:29 +10:00
dovgopoly
f8c7a20403 bcli: convert getchaininfo to synchronous execution
2026-02-18 14:16:29 +10:00
dovgopoly
fad05200eb bcli: add synchronous run_bitcoin_cli for future refactor
2026-02-18 14:16:29 +10:00
Rusty Russell
963b353a30 connectd: use membuf for more efficient output queue.
This is exactly what membuf is for: it handles expansion much more
neatly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
afdc92fedf connectd: only do lazy transmission for *definitely* non-urgent messages.
Since we delay the others quite a lot (up to 1 second), it's better to consider
most messages "urgent" and worth immediately transmitting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
2436ee6f6f connectd: don't flush messages unless we have something important.
This replaces our previous nagle-based toggling.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
8b90d40a75 connectd: pad messages with dummy pings if needed to make size uniform.
Message sizes are now constant.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we now pad all peer messages to make them the same length.
2026-02-18 14:13:25 +10:30
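A minimal Python sketch of padding with a dummy ping, as named in 8b90d40a75. The ping layout (type 18, then `num_pong_bytes`, `byteslen` and the ignored bytes) follows BOLT #1. Everything else is an assumption: the helper names, the target length, the choice of `num_pong_bytes = 65532` so the peer sends no pong, and the fact that lengths here are plaintext only and ignore the encrypted transport's per-message overhead.

```python
import struct

PING_TYPE = 18      # BOLT #1 message type for ping
PING_OVERHEAD = 6   # type (2) + num_pong_bytes (2) + byteslen (2)
NO_PONG = 65532     # num_pong_bytes >= 65532 means the peer must not reply


def padding_ping(total_len):
    """Build a ping exactly total_len bytes long that requests no pong."""
    if total_len < PING_OVERHEAD:
        raise ValueError("too small to hold a ping")
    ignored = total_len - PING_OVERHEAD
    return struct.pack(">HHH", PING_TYPE, NO_PONG, ignored) + bytes(ignored)


def pad_to_uniform(msg, uniform_len):
    """Return [msg] or [msg, ping] so the plaintext lengths sum to uniform_len."""
    gap = uniform_len - len(msg)
    if gap == 0:
        return [msg]
    if gap < PING_OVERHEAD:
        raise ValueError("gap cannot be filled by a single ping")
    return [msg, padding_ping(gap)]
```

A 22-byte message padded to a hypothetical 100-byte target would be followed by a 78-byte ping carrying 72 ignored bytes.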
Rusty Russell
ca2d389920 devtools/gossipwith: don't count "padding" pings towards max-messages count.
We are about to use them to make our packet size constant, and this
will upset the tests.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
d45bc2d56e connectd: don't toggle nagle on and off, leave it always off.
We're doing our own buffering now.

We leave the is_urgent() function for two commits in the future though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
c23b7a492d connect: switch to using io_write_partial instead of io_write.
This gives us finer control over write sizes: for now we just cap
the write size.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
df1ae1d680 connectd: refactor to break up "encrypt_and_send".
Do all the special treatment of the message type first.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
7577e59f6c connectd: refactor outgoing loop.
Give us a single "next message" function to call.  This will be useful
when we want to write more than one at a time.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
42bdb2d638 CI: run tests in the wireshark group so we can test packet sizes
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
369338347d pytest: add fixture for checking packet sizes.
This requires access to dumpcap.  On Ubuntu, at least, this means you
need to be in the "wireshark" group.

We may also need:
	sudo ethtool -K lo gro off gso off tso off

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
cd7afb506a pytest: remove now-invalid test.
Commit 888745be16 ("dev_disconnect: remove @ marker.", in v0.11, April
2022) removed the '@' marker from our dev_disconnect code, but one test
still uses it.

Refactoring this code made it crash on invalid input.  The test
triggered a db issue which has been long fixed, so I'm simply removing
it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
4d030d83ce pytest: fix flake in test_fetchinvoice_autoconnect.
l3 doesn't just need to know about l2 (which it can get from the
channel_announcement), but needs to see the node_announcement.

Otherwise:

```
        l1, l2 = node_factory.line_graph(2, wait_for_announce=True,
                                         # No onion_message support in l1
                                         opts=[{'dev-force-features': -39},
                                               {'dev-allow-localhost': None}])
    
        l3 = node_factory.get_node()
        l3.rpc.connect(l1.info['id'], 'localhost', l1.port)
        wait_for(lambda: l3.rpc.listnodes(l2.info['id'])['nodes'] != [])
    
        offer = l2.rpc.call('offer', {'amount': '2msat',
                                      'description': 'simple test'})
>       l3.rpc.call('fetchinvoice', {'offer': offer['bolt12']})

tests/test_pay.py:4804: 
...	
>           raise RpcError(method, payload, resp['error'])
E           pyln.client.lightning.RpcError: RPC call failed: method: fetchinvoice, payload: {'offer': 'lno1qgsqvgnwgcg35z6ee2h3yczraddm72xrfua9uve2rlrm9deu7xyfzrcgqypq5zmnd9khqmr9yp6x2um5zcssxwz9sqkjtd8qwnx06lxckvu6g8w8t0ue0zsrfqqygj636s4sw7v6'}, error: {'code': 1003, 'message': 'Failed: could not route or connect directly to 033845802d25b4e074ccfd7cd8b339a41dc75bf9978a034800444b51d42b07799a: {"code":400,"message":"Unable to connect, no address known for peer"}'}
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-18 14:13:25 +10:30
Rusty Russell
29e0a1ddfe bkpr: limp along if we lost our db.
We can't really do decent bookkeeping any more, but don't crash!

```
bookkeeper: plugins/bkpr/recorder.c:178: find_txo_chain: Assertion `acct->open_event_db_id' failed.
bookkeeper: FATAL SIGNAL 6 (version v25.12)
0xaaaab7d51a7f send_backtrace
	common/daemon.c:38
0xaaaab7d51b2b crashdump
	common/daemon.c:83
0xffff8c0b07cf ???
	???:0
0xffff8bdf7608 __pthread_kill_implementation
	./nptl/pthread_kill.c:44
0xffff8bdacb3b __GI_raise
	../sysdeps/posix/raise.c:26
0xffff8bd97dff __GI_abort
	./stdlib/abort.c:79
0xffff8bda5cbf __assert_fail_base
	./assert/assert.c:96
0xffff8bda5d2f __assert_fail
	./assert/assert.c:105
0xaaaab7d41fd7 find_txo_chain
	plugins/bkpr/recorder.c:178
0xaaaab7d421fb account_onchain_closeheight
	plugins/bkpr/recorder.c:291
0xaaaab7d37687 do_account_close_checks
	plugins/bkpr/bookkeeper.c:884
0xaaaab7d38203 parse_and_log_chain_move
	plugins/bkpr/bookkeeper.c:1261
0xaaaab7d3871f listchainmoves_done
	plugins/bkpr/bookkeeper.c:171
0xaaaab7d4811f handle_rpc_reply
	plugins/libplugin.c:1073
0xaaaab7d4827b rpc_conn_read_response
	plugins/libplugin.c:1377
0xaaaab7d889a7 next_plan
	ccan/ccan/io/io.c:60
0xaaaab7d88f7b do_plan
	ccan/ccan/io/io.c:422
0xaaaab7d89053 io_ready
	ccan/ccan/io/io.c:439
```

Fixes: https://github.com/ElementsProject/lightning/issues/8854
Changelog-Fixed: Plugins: `bkpr_listbalances` no longer crashes if we lost our db, then ran emergencyrecover and closed a channel.
Reported-by: https://github.com/enaples
2026-02-17 12:10:26 +10:30
Rusty Russell
2e8261ef9e pytest: test for bkpr_listbalances after emergencyrecover.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-17 12:10:26 +10:30
Rusty Russell
b150309854 pytest: test for crash when we have dying channels and compact the gossip_store.
Before I fixed the handling of dying channels:

```
lightning_gossipd: gossip_store: can't read hdr offset 2362/2110: Success (version v25.12-279-gb38abe6-modded)
0x6537c19ecf3a send_backtrace
        common/daemon.c:38
0x6537c19f1a1d status_failed
        common/status.c:207
0x6537c19e557a gossip_store_get_with_hdr
        gossipd/gossip_store.c:527
0x6537c19e5613 check_msg_type
        gossipd/gossip_store.c:559
0x6537c19e5a36 gossip_store_set_flag
        gossipd/gossip_store.c:577
0x6537c19e5c82 gossip_store_del
        gossipd/gossip_store.c:629
0x6537c19e8ddd gossmap_manage_new_block
        gossipd/gossmap_manage.c:1362
0x6537c19e390e new_blockheight
        gossipd/gossipd.c:430
0x6537c19e3c37 recv_req
        gossipd/gossipd.c:532
0x6537c19ed22a handle_read
        common/daemon_conn.c:35
0x6537c19fbe71 next_plan
        ccan/ccan/io/io.c:60
0x6537c19fc174 do_plan
        ccan/ccan/io/io.c:422
0x6537c19fc231 io_ready
        ccan/ccan/io/io.c:439
0x6537c19fd647 io_loop
        ccan/ccan/io/poll.c:470
0x6537c19e463d main
        gossipd/gossipd.c:609
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
acb8a8cc15 gossipd: dev-compact-gossip-store to manually invoke compaction.
And tests!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30