We're doing our own buffering now.
We leave the is_urgent() function until two commits from now, though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Give us a single "next message" function to call. This will be useful
when we want to write more than one at a time.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This requires access to dumpcap. On Ubuntu, at least, this means you
need to be in the "wireshark" group.
We may also need:
sudo ethtool -K lo gro off gso off tso off
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit 888745be16 ("dev_disconnect: remove @ marker.", in v0.11, April 2022) removed the '@' marker from
our dev_disconnect code, but one test still uses it.
Refactoring this code made it crash on invalid input. The test
triggered a db issue which has been long fixed, so I'm simply removing
it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
l3 doesn't just need to know about l2 (which it can get from the
channel_announcement), but needs to see the node_announcement.
Otherwise:
```
l1, l2 = node_factory.line_graph(2, wait_for_announce=True,
                                 # No onion_message support in l1
                                 opts=[{'dev-force-features': -39},
                                       {'dev-allow-localhost': None}])
l3 = node_factory.get_node()
l3.rpc.connect(l1.info['id'], 'localhost', l1.port)
wait_for(lambda: l3.rpc.listnodes(l2.info['id'])['nodes'] != [])
offer = l2.rpc.call('offer', {'amount': '2msat',
'description': 'simple test'})
> l3.rpc.call('fetchinvoice', {'offer': offer['bolt12']})
tests/test_pay.py:4804:
...
> raise RpcError(method, payload, resp['error'])
E pyln.client.lightning.RpcError: RPC call failed: method: fetchinvoice, payload: {'offer': 'lno1qgsqvgnwgcg35z6ee2h3yczraddm72xrfua9uve2rlrm9deu7xyfzrcgqypq5zmnd9khqmr9yp6x2um5zcssxwz9sqkjtd8qwnx06lxckvu6g8w8t0ue0zsrfqqygj636s4sw7v6'}, error: {'code': 1003, 'message': 'Failed: could not route or connect directly to 033845802d25b4e074ccfd7cd8b339a41dc75bf9978a034800444b51d42b07799a: {"code":400,"message":"Unable to connect, no address known for peer"}'}
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: `gossipd` now uses a `lightning_gossip_compactd` helper to compact the gossip_store on demand, keeping it under about 210MB.
A new subprocess run by gossipd to create a compacted gossip store.
It's pretty simple: a linear compaction of the file. Once the helper has
compacted the amount it was told to, gossipd waits for it to finish the last bit.
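The idea can be sketched like this (hypothetical record layout and flag bit, not the real gossip_store format):

```python
import io
import struct

DELETED = 0x8000  # hypothetical "deleted" flag bit in the record header

def compact(old, new):
    """Linearly copy live records from `old` to `new`, dropping deleted ones.

    Assumed record layout: u16 flags, u16 length, then `length` body bytes.
    """
    while True:
        hdr = old.read(4)
        if len(hdr) < 4:
            break
        flags, length = struct.unpack('>HH', hdr)
        body = old.read(length)
        if not flags & DELETED:
            new.write(hdr + body)

# Example: three records, the middle one marked deleted.
src = io.BytesIO(struct.pack('>HH', 0, 3) + b'abc'
                 + struct.pack('>HH', DELETED, 2) + b'xy'
                 + struct.pack('>HH', 0, 1) + b'z')
dst = io.BytesIO()
compact(src, dst)
```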
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossip_store.c uses this to avoid two reads, and we want to use it
elsewhere too.
Also fix old comment on gossip_store_readhdr().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the file responsible for all the writing, so it should be
responsible for the rewriting if necessary (rather than
gossmap_manage).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This saves gossipd from converting it:
```
lightningd-1 2026-02-02T00:50:49.505Z DEBUG gossipd: Time to convert version 14 store: 890 msec
```
This reduces node startup time from 1.4 seconds to 0.5 seconds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now only need to walk it if we're doing an upgrade.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `gossipd` no longer compacts gossip_store on startup (improving start times significantly).
gossmap doesn't care, so gossipd currently has to iterate through the
store to find them at startup. Create a callback for gossipd to use
instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's used by common/gossip_store.c, which is used by many things other than
gossipd. This file belongs in common.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also put this in the store_ended message: that way you can
tell whether the equivalent_offset there really refers to this new
entry (or whether two or more rewrites have happened).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's actually quite quick to load a cache-hot 308,874,377 byte
gossip_store (normal -Og build), but perf does show time spent
in siphash(), which is a bit overkill here, so drop that:
Before:
Time to load: 66718983-78037766(7.00553e+07+/-2.8e+06)nsec
After:
Time to load: 54510433-57991725(5.61457e+07+/-1e+06)nsec
We could save maybe 10% more by disabling checksums, but having
that assurance is nice.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our poor scid generation clashes badly with simplified hashing (the
next patch) leading to l1's startup time when using a generated map
moving from 4 seconds to 14 seconds. Under CI it actually timed out
several tests.
Fixing our fake scids to be more "random" reduces it to 1.5 seconds.
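A toy illustration of the clash (the "simplified" hash here is assumed, not the actual one): sequential fake scids share their low bits, so a hash with no mixing drops them all into one bucket, while fully random scids spread out.

```python
import random

NBUCKETS = 1024

def simple_hash(scid):
    # Hypothetical simplified hash: low bits only, no mixing.
    return scid % NBUCKETS

# Test-style fake scids: block<<40 | txindex<<16 | outnum, with
# fixed txindex/outnum, so only the high bits vary.
sequential = [((100 + i) << 40) | (1 << 16) | 0 for i in range(1000)]

random.seed(42)
randomized = [random.getrandbits(64) for _ in range(1000)]

def occupied(scids):
    """Number of distinct hash buckets used."""
    return len({simple_hash(s) for s in scids})
```

With this toy hash, `occupied(sequential)` collapses to a single bucket while `occupied(randomized)` spreads across hundreds, which is the kind of clustering that slows hash-table lookups to linear scans.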
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It didn't do anything, since the dev_compact_gossip_store command was
removed. When we make it do something, it crashes since old_len is 0:
```
gossipd: gossip_store_compact: bad version
gossipd: FATAL SIGNAL 6 (version v25.12rc3-1-g9e6c715-modded)
...
gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7119bd8288fe
gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7119bd82881a
gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7119bd83b516
gossipd: backtrace: gossipd/gossip_store.c:52 (append_msg) 0x56294de240eb
gossipd: backtrace: gossipd/gossip_store.c:358 (gossip_store_compact) 0x56294
gossipd: backtrace: gossipd/gossip_store.c:395 (gossip_store_new) 0x56294de24
gossipd: backtrace: gossipd/gossmap_manage.c:455 (setup_gossmap) 0x56294de255
gossipd: backtrace: gossipd/gossmap_manage.c:488 (gossmap_manage_new) 0x56294
gossipd: backtrace: gossipd/gossipd.c:400 (gossip_init) 0x56294de22de9
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Not a complete decode, just the highlights (what channel was announced
or updated, what node was announced).
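For instance, the highlight for a raw gossip message can be derived from its 2-byte type prefix (the type numbers below are the standard BOLT 7 ones; the function name is made up):

```python
import struct

# BOLT 7 gossip message type numbers.
GOSSIP_TYPES = {
    256: 'channel_announcement',
    257: 'node_announcement',
    258: 'channel_update',
}

def gossip_highlight(msg: bytes) -> str:
    """Peek at the big-endian u16 type field; no full decode."""
    (msg_type,) = struct.unpack('>H', msg[:2])
    return GOSSIP_TYPES.get(msg_type, 'unknown({})'.format(msg_type))
```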
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's always true for the first hook invocation, but if there is more
than one plugin, the peer could vanish between the two invocations!
In the default configuration, this can't happen.
This bug has been around since v23.02.
Note: we always tell all the plugins about the peer, even if it's
already gone.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: lightningd: possible crash when peers disconnected if there was more than one plugin servicing the `peer_connected` hook.
Reported-by: https://github.com/santyr
Fixes: https://github.com/ElementsProject/lightning/issues/8858
In November 2022 we seemed to increase parallelism from 2 and 3 to 10!
That is a huge load for these CI boxes, and does explain some of our
flakes.
We only run in parallel because some tests sleep, but it's diminishing
returns (GH runners have 4 VCPUs, 16GB RAM).
This reduces it so:
- Normal runs are -n 4
- Valgrind runs are -n 2
- Sanitizer runs are -n 3
If I use my beefy build box (64GB RAM) but reduce it to 4 CPUs:
Time for pytest -n 5:
Time for pytest -n 4:
Time for pytest -n 3:
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Avoids guessing what the timeout should be; use a file trigger
instead. This is more efficient, and should reduce a flake in
test_sql under valgrind.
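The pattern is roughly this (hypothetical helper; the actual pyln testing utilities may differ):

```python
import os
import tempfile
import threading
import time

def wait_for_file(path, interval=0.05, timeout=30):
    """Poll until `path` exists, rather than sleeping a guessed duration."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(path)
        time.sleep(interval)

# Example: another thread creates the trigger file shortly after we start waiting.
trigger = os.path.join(tempfile.mkdtemp(), 'done')
threading.Timer(0.2, lambda: open(trigger, 'w').close()).start()
wait_for_file(trigger)
```

The waiter returns as soon as the trigger appears, so the test costs only as long as the slow side actually takes.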
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It uses the hold_invoice plugin to ensure that an HTLC is in flight, but
it tells it to hold the HTLC for "TIMEOUT * 2" which is a big number under CI.
Reduce it to sqrt(TIMEOUT + 1) * 2, which works for local testing (I run
with TIMEOUT=10) and still should be enough for CI (TIMEOUT=180).
Christian reported that the test took 763.00s (!!) under CI.
On my build machine (TIMEOUT=90):
Before:
383.00s
After:
64.38s
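A quick sanity check of the new formula (just arithmetic):

```python
import math

for timeout in (10, 90, 180):
    old_hold = timeout * 2
    new_hold = math.sqrt(timeout + 1) * 2
    print('TIMEOUT=%d: hold %ds -> %.1fs' % (timeout, old_hold, new_hold))
```

So at TIMEOUT=180 the hold drops from 360s to under 27s, while at TIMEOUT=10 it stays above 6s.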
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. If max was 0, we crashed with SIGFPE due to % 0.
2. If min was non-zero, logic was incorrect (but all callers had min == 0).
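A sketch of the fixed logic (names are hypothetical; the real function is C):

```python
def pseudorand_range(rnd, minimum, maximum):
    """Map a raw random value into [minimum, maximum).

    Fixes both bugs: guard the empty range (the old `rnd % 0` is a
    SIGFPE in C), and offset by `minimum` (the old code ignored it).
    """
    if maximum <= minimum:
        return minimum
    return minimum + rnd % (maximum - minimum)

# e.g. 12345 % (20 - 10) = 5, offset by 10 -> 15
print(pseudorand_range(12345, 10, 20))  # 15
```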
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
2026-01-30T05:55:13.6654636Z # Note that l3 has the whole lease delay (minus blocks already mined)
2026-01-30T05:55:13.6655396Z _, _, l3blocks = l3.wait_for_onchaind_tx('OUR_DELAYED_RETURN_TO_WALLET',
2026-01-30T05:55:13.6656086Z 'OUR_UNILATERAL/DELAYED_OUTPUT_TO_US')
2026-01-30T05:55:13.6656618Z > assert l3blocks == 4032 - 6 - 2 - 1
2026-01-30T05:55:13.6657033Z E assert 4025 == (((4032 - 6) - 2) - 1)
```
Turns out that 4342043382 (tests: de-flake test that was failing on
cltv expiry) added a line to mine two more blocks, but the hardcoded
110 was not changed to 112, so we weren't actually waiting correctly.
Remove hardcoded numbers in favor of calculation, and do the same in
test_channel_lease_post_expiry (which was correct, for now).
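The calculated form looks roughly like this (variable names are made up; the real test uses the nodes' actual block heights):

```python
def blocks_until_lease_expiry(lease_delay, start_height, current_height):
    """Compute the remaining wait instead of hardcoding it, so mining
    extra setup blocks can't silently break the assertion."""
    return lease_delay - (current_height - start_height)

# e.g. a 4032-block lease with 9 blocks already mined since funding:
print(blocks_until_lease_expiry(4032, 103, 112))  # 4023
```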
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Normally, connectd forwards messages and then the subds do logging,
but it logs manually for msgs which are handled internally.
Consolidate this logic in one place for all callers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>