Unfortunately the effect of leaving Nagle enabled is subtle. Here it
is in v25.12:
Normal:
tests/test_connection.py::test_no_delay PASSED
====================================================================== 1 passed in 13.87s
Nagle enabled:
tests/test_connection.py::test_no_delay PASSED
====================================================================== 1 passed in 21.70s
So it's hard to both catch this issue and not have false positives. Improve the
test by deliberately running with Nagle enabled, so we can do a direct comparison.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This test restarts l2 twice. Each time, l1 is reconnecting, and backs
off. If the test is slow enough, the backoff gets extreme:
```
2026-02-19T02:13:03.7669982Z lightningd-1 2026-02-19T01:50:56.541Z DEBUG 033845802d25b4e074ccfd7cd8b339a41dc75bf9978a034800444b51d42b07799a-lightningd: peer_disconnected
2026-02-19T02:13:03.7670444Z lightningd-1 2026-02-19T01:50:56.547Z DEBUG 033845802d25b4e074ccfd7cd8b339a41dc75bf9978a034800444b51d42b07799a-connectd: Will try reconnect in 256 seconds
```
This isn't a bug! The backoff caps at 300 seconds, and only gets
reset if we remain connected for that long.
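A minimal sketch of the behaviour described above, not connectd's actual code. The 300-second cap and the reset rule come from the text. The doubling step, the 1-second starting delay, and the names `next_backoff` and `backoff_after_disconnect` are assumptions for illustration (the 256-second delay in the log is at least consistent with doubling).

```python
BACKOFF_CAP = 300  # seconds; the cap stated above


def next_backoff(current):
    """Assumed growth rule: double the delay, up to the cap."""
    return min(max(current, 1) * 2, BACKOFF_CAP)


def backoff_after_disconnect(current, connected_for):
    """Delay to use for the next automatic reconnect attempt.

    The delay is only reset if the connection lasted at least as long
    as the cap; otherwise it keeps growing.
    """
    if connected_for >= BACKOFF_CAP:
        return 1  # assumed starting delay
    return next_backoff(current)


# A slow test where every connection is short-lived never resets the delay.
delay = 1
history = []
for _ in range(10):
    delay = backoff_after_disconnect(delay, connected_for=5)
    history.append(delay)
print(history)
```

Under these assumptions a handful of short-lived connections is enough to reach a multi-minute wait, which is what a manual reconnect avoids.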
A manual reconnect here not only fixes the flake, but makes the test
much faster, by not *doubling* the time for slow tests as shown on my
laptop (the final test using `taskset -c 1`):
            Normal   Valgrind   Valgrind, 1 CPU
    Before: 22sec    124sec     230sec
    After:  18sec    102sec     191sec
These are from a single run: it could be much more in the worst case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With the extra padding pings, we can get more!
```
# Make sure the noise is within reasonable bounds
assert tally['query_short_channel_ids'] <= 1
assert tally['query_channel_range'] <= 1
> assert tally['ping'] <= 3
E assert 4 <= 3
tests/test_gossip.py:2396: AssertionError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
xpay can get upset if askrene goes away first:
lightningd-1 2026-02-18T02:47:44.908Z **BROKEN** plugin-cln-xpay: askrene-create-layer failed with {"code":-32601,"message":"Unknown command 'askrene-create-layer'"}
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This speeds them up, and exercises the askrene parallel code.
    Before: test_real_data: 348s   test_real_biases: 105s
    After:  test_real_data: 133s   test_real_biases: 106s
And this is because much of the time is spent uncompressing the gossmap
and startup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add `test_bcli_concurrent` to verify bcli handles concurrent requests while the `getblockfrompeer` retry path is active, simulating a pruned node scenario where `getblock` initially fails.
Add `test_bcli_retry_timeout` to verify lightningd crashes with a clear error message when we run out of `getblock` retries.
Rewrite `test_bitcoin_failure` to reflect synchronous bcli behavior: the node now crashes on invalid bitcoind responses rather than retrying. Add `may_fail` and `broken_log` to handle expected crash.
Update `test_bitcoind_fail_first` stderr check to match the new error message format from `get_bitcoin_result`.
Update test mocks to use proper error format for "block not found".
Co-authored-by: ShahanaFarooqui <shahana.farooqui@gmail.com>
Messages are now constant.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we now pad all peer messages to make them the same length.
This requires access to dumpcap. On Ubuntu, at least, this means you
need to be in the "wireshark" group.
We may also need:
sudo ethtool -K lo gro off gso off tso off
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit 888745be16 ("dev_disconnect: remove @ marker.", in v0.11,
April 2022) removed the '@' marker from our dev_disconnect code, but
one test still uses it.
Refactoring this code made it crash on invalid input. The test
triggered a db issue which has been long fixed, so I'm simply removing
it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
l3 doesn't just need to know about l2 (which it can get from the
channel_announcement), but needs to see the node_announcement.
Otherwise:
```
l1, l2 = node_factory.line_graph(2, wait_for_announce=True,
# No onion_message support in l1
opts=[{'dev-force-features': -39},
{'dev-allow-localhost': None}])
l3 = node_factory.get_node()
l3.rpc.connect(l1.info['id'], 'localhost', l1.port)
wait_for(lambda: l3.rpc.listnodes(l2.info['id'])['nodes'] != [])
offer = l2.rpc.call('offer', {'amount': '2msat',
'description': 'simple test'})
> l3.rpc.call('fetchinvoice', {'offer': offer['bolt12']})
tests/test_pay.py:4804:
...
> raise RpcError(method, payload, resp['error'])
E pyln.client.lightning.RpcError: RPC call failed: method: fetchinvoice, payload: {'offer': 'lno1qgsqvgnwgcg35z6ee2h3yczraddm72xrfua9uve2rlrm9deu7xyfzrcgqypq5zmnd9khqmr9yp6x2um5zcssxwz9sqkjtd8qwnx06lxckvu6g8w8t0ue0zsrfqqygj636s4sw7v6'}, error: {'code': 1003, 'message': 'Failed: could not route or connect directly to 033845802d25b4e074ccfd7cd8b339a41dc75bf9978a034800444b51d42b07799a: {"code":400,"message":"Unable to connect, no address known for peer"}'}
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now only need to walk it if we're doing an upgrade.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `gossipd` no longer compacts gossip_store on startup (improving start times significantly).
We also put this in the store_ended message, too: so you can
tell if the equivalent_offset there really refers to this new
entry (or if two or more rewrites have happened).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our poor scid generation clashes badly with simplified hashing (the
next patch) leading to l1's startup time when using a generated map
moving from 4 seconds to 14 seconds. Under CI it actually timed out
several tests.
Fixing our fake scids to be more "random" reduces it to 1.5 seconds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It didn't do anything, since the dev_compact_gossip_store command was
removed. When we make it do something, it crashes since old_len is 0:
```
gossipd: gossip_store_compact: bad version
gossipd: FATAL SIGNAL 6 (version v25.12rc3-1-g9e6c715-modded)
...
gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7119bd8288fe
gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7119bd82881a
gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7119bd83b516
gossipd: backtrace: gossipd/gossip_store.c:52 (append_msg) 0x56294de240eb
gossipd: backtrace: gossipd/gossip_store.c:358 (gossip_store_compact) 0x56294
gossipd: backtrace: gossipd/gossip_store.c:395 (gossip_store_new) 0x56294de24
gossipd: backtrace: gossipd/gossmap_manage.c:455 (setup_gossmap) 0x56294de255
gossipd: backtrace: gossipd/gossmap_manage.c:488 (gossmap_manage_new) 0x56294
gossipd: backtrace: gossipd/gossipd.c:400 (gossip_init) 0x56294de22de9
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's always true for the first hook invocation, but if there is more
than one plugin, it could vanish between the two! In the default configuration, this can't happen.
This bug has been around since v23.02.
Note: we always tell all the plugins about the peer, even if it's
already gone.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: lightningd: possible crash when peers disconnected if there was more than one plugin servicing the `peer_connected` hook.
Reported-by: https://github.com/santyr
Fixes: https://github.com/ElementsProject/lightning/issues/8858
Avoid guessing what the timeout should be by using a file trigger. This
is more efficient, and should reduce a flake in test_sql under valgrind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It uses the hold_invoice plugin to ensure that an HTLC is in flight, but
it tells it to hold the HTLC for "TIMEOUT * 2" which is a big number under CI.
Reduce it to sqrt(TIMEOUT + 1) * 2, which works for local testing (I run
with TIMEOUT=10) and still should be enough for CI (TIMEOUT=180).
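To make the difference concrete, here is a small calculation of the two formulas for the TIMEOUT values mentioned in this message. The function names are illustrative only, and the values are assumed to be in seconds.

```python
import math


def old_hold_time(timeout):
    # Previous behaviour described above: TIMEOUT * 2
    return timeout * 2


def new_hold_time(timeout):
    # New behaviour described above: sqrt(TIMEOUT + 1) * 2
    return math.sqrt(timeout + 1) * 2


for timeout in (10, 90, 180):
    print(f"TIMEOUT={timeout}: "
          f"{old_hold_time(timeout)}s -> {new_hold_time(timeout):.1f}s")
```

For TIMEOUT=180 the hold drops from 360 to roughly 27, and for TIMEOUT=10 from 20 to roughly 6.6.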
Christian reported that the test took 763.00s (!!) under CI.
On my build machine (TIMEOUT=90):
Before:
383.00s
After:
64.38s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. If max was 0, we crashed with SIGFPE due to % 0.
2. If min was non-zero, logic was incorrect (but all callers had min == 0).
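The two failure modes can be illustrated with a hypothetical helper that maps a random integer into a range. This is a sketch of the general pattern, not the function this commit touches: the names, the inclusive `[min_val, max_val]` range, and the exact shape of the buggy expression are all assumptions. In Python the `% 0` shows up as `ZeroDivisionError` rather than SIGFPE.

```python
def buggy_pick(min_val, max_val, rand):
    # Hypothetical buggy pattern: the modulus ignores min_val, and
    # max_val == 0 divides by zero.
    return min_val + rand % max_val


def fixed_pick(min_val, max_val, rand):
    """Map rand into the inclusive range [min_val, max_val]."""
    span = max_val - min_val + 1  # always >= 1 when min_val <= max_val
    return min_val + rand % span


print(fixed_pick(0, 0, 12345))  # max == 0 is now safe
print(buggy_pick(5, 9, 8))      # escapes the range when min is non-zero
print(fixed_pick(5, 9, 8))      # stays within 5..9
```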
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
2026-01-30T05:55:13.6654636Z # Note that l3 has the whole lease delay (minus blocks already mined)
2026-01-30T05:55:13.6655396Z _, _, l3blocks = l3.wait_for_onchaind_tx('OUR_DELAYED_RETURN_TO_WALLET',
2026-01-30T05:55:13.6656086Z 'OUR_UNILATERAL/DELAYED_OUTPUT_TO_US')
2026-01-30T05:55:13.6656618Z > assert l3blocks == 4032 - 6 - 2 - 1
2026-01-30T05:55:13.6657033Z E assert 4025 == (((4032 - 6) - 2) - 1)
```
Turns out that 4342043382 (tests: de-flake test that was failing on
cltv expiry) added a line to mine two more blocks, but the hardcoded
110 was not changed to 112, so we weren't actually waiting correctly.
Remove hardcoded numbers in favor of calculation, and do the same in
test_channel_lease_post_expiry (which was correct, for now).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If l2 hasn't seen l1's node_announcement yet:
```
# Correctly handles missing object.
> assert l2.rpc.sql("SELECT option_will_fund_lease_fee_base_msat,"
" option_will_fund_lease_fee_basis,"
" option_will_fund_funding_weight,"
" option_will_fund_channel_fee_max_base_msat,"
" option_will_fund_channel_fee_max_proportional_thousandths,"
" option_will_fund_compact_lease"
" FROM nodes WHERE HEX(nodeid) = '{}';".format(l1.info['id'].upper())) == {'rows': [[None] * 6]}
E AssertionError: assert {'rows': []} == {'rows': [[None, None, None, None, None, None]]}
E
E Differing items:
E {'rows': []} != {'rows': [[None, None, None, None, None, None]]}
E
E Full diff:
E {
E - 'rows': [
E + 'rows': [],
E ? ++
E - [
E - None,
E - None,
E - None,
E - None,
E - None,
E - None,
E - ],
E - ],
E }
tests/test_plugin.py:4131: AssertionError
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was added in 24.05, but LND since 0.18.3 no longer ever creates
such onions, and even that version (September 2024) is now a long way
behind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Protocol: we no longer support legacy onions (never sent by LND >= 0.18.3, which was the last)
If we can't decode something, and it decodes as a rune (and all bech32
strings do!), then we would usually just complain it was a malformed
rune. Be a bit more useful when the parameter looks like something else.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: JSON-RPC: `decode` is now more informative with malformed strings (won't claim everything is a malformed rune!).
The Reckless search command only returned a result if you searched
for an exact match, which is not very helpful. This updates the
command so that partial matches also return a result.
Before:
reckless search bolt
Search exhausted all sources
reckless search bol
Search exhausted all sources
reckless search bolt12-pris
Search exhausted all sources
After:
reckless search bolt
Plugins matching 'bolt':
bolt12-prism (https://github.com/lightningd/plugins)
reckless search bol
Plugins matching 'bol':
bolt12-prism (https://github.com/lightningd/plugins)
reckless search bolt12-pris
Plugins matching 'bolt12-pris':
bolt12-prism (https://github.com/lightningd/plugins)
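The change in behaviour can be sketched as a substring match over the known plugin names. This is an illustration only, not the reckless implementation: `search_plugins`, `format_results`, the case-insensitive comparison, and the `example-plugin` entry are all assumptions; the output strings are copied from the transcripts above.

```python
def search_plugins(query, available):
    """Return (name, source) pairs whose name contains the query."""
    needle = query.lower()  # case-insensitive matching is an assumption
    return [(name, source) for name, source in available
            if needle in name.lower()]


def format_results(query, matches):
    """Render matches in the style of the output shown above."""
    if not matches:
        return "Search exhausted all sources"
    lines = [f"Plugins matching '{query}':"]
    lines += [f"{name} ({source})" for name, source in matches]
    return "\n".join(lines)


# Hypothetical index: one real entry from the output above, one made up.
available = [
    ("bolt12-prism", "https://github.com/lightningd/plugins"),
    ("example-plugin", "https://example.com/plugins"),
]

for query in ("bolt", "bol", "bolt12-pris", "nothing-like-this"):
    print(format_results(query, search_plugins(query, available)))
```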
Changelog-Fixed: reckless search now returns partial matches instead of requiring exact plugin names.
Changelog-Fixed: askrene: fixed a class of corner cases that cause askrene main loop to timeout instead of quickly failing, thus wasting runtime.
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Improvements in the fuzz-testing scheme of
`fuzz-bolt12-offer-decode` led to the discovery of test inputs
that result in greater code coverage.
Add these inputs to the test's seed corpus.