Basically, `devtools/reduce-includes.sh */*.c`.
Build time from make clean (RUST=0) (includes building external libs):
Before:
real 0m38.944000-40.416000(40.1131+/-0.4)s
user 3m6.790000-17.159000(15.0571+/-2.8)s
sys 0m35.304000-37.336000(36.8942+/-0.57)s
After:
real 0m37.872000-39.974000(39.5466+/-0.59)s
user 3m1.211000-14.968000(12.4556+/-3.9)s
sys 0m35.008000-36.830000(36.4143+/-0.5)s
Build time after touch config.vars (RUST=0):
Before:
real 0m19.831000-21.862000(21.5528+/-0.58)s
user 2m15.361000-30.731000(28.4798+/-4.4)s
sys 0m21.056000-22.339000(22.0346+/-0.35)s
After:
real 0m18.384000-21.307000(20.8605+/-0.92)s
user 2m5.585000-26.843000(23.6017+/-6.7)s
sys 0m19.650000-22.003000(21.4943+/-0.69)s
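This is a rough worked check on the size of the win. It assumes the parenthesised figure on each `real` line is the mean wall-clock time in seconds, so the relative improvement is (before - after) / before. A minimal C sketch follows; the helper name is ours, not part of the tree:

```c
/* Percentage reduction in wall-clock time from `before` to `after`.
 * Assumes both arguments are mean "real" times in seconds, read off the
 * parenthesised figures above. */
static double pct_faster(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the means above, this gives roughly 1.4% for the clean build (40.1131s to 39.5466s) and roughly 3.2% after `touch config.vars` (21.5528s to 20.8605s). In both cases the before and after ranges overlap, so treat these as indicative.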
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each header should only include the other headers it needs to compile;
`devtools/reduce-includes.sh */*.h` does this. C files that then fail to
compile get the additional includes they need.
Also remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h.
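The text doesn't say how `devtools/reduce-includes.sh` works internally. One common approach is greedy: try dropping each include in turn, and keep the removal only if the file still compiles. The toy model below sketches that idea in C. A hypothetical `compiles()` predicate stands in for a real compiler run, and a bitmask stands in for the list of `#include` lines:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: bit i of a mask means "header i is #included" (at most 32
 * headers).  compiles() is a hypothetical stand-in for running the
 * compiler: it succeeds iff every header the file needs is present. */
static bool compiles(unsigned int included, unsigned int needed)
{
	return (included & needed) == needed;
}

/* Greedy reduction: try dropping each include in turn, and keep the
 * removal only if the file still compiles without it. */
static unsigned int reduce_includes(unsigned int included,
				    unsigned int needed, size_t num_headers)
{
	for (size_t i = 0; i < num_headers; i++) {
		unsigned int without = included & ~(1u << i);

		if (without != included && compiles(without, needed))
			included = without;
	}
	return included;
}
```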
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can happen with other subdaemons too, on ZFS on Linux:
```
2025-09-24T13:51:22.703Z **BROKEN** connectd: Bad checksum on gossmap record @9850670/9851114 should be 3379961343 (01009411e26cd56d68aabc285ee1c8ee43d59be6f939b0ce353d80213918680a7438356b9c5ea6bb001a6bb37a4dea93776f4abc8cd371525b4d1605a74b89d7cb1bfc8865ddf22288c7ea08b9d98b34155b4aed159eb81732957e6bf79b996752bf2a9995aaead1d65e7889e826ea0ba42f7746c176fe12f2fe6c04af1a74b4f0a262d20efd57133eb32693c789eb3f09caf4f4c6ecd2f734b3b36e751ffcc2748c58feabce4173c4ce6098a2c5397aabf1be5442cb67b5030be11ebd8b9841838dae127fe30000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
```
Reported-by: @grubles
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We add `start_batch` to match t-bast’s splicing spec, and add a new internal wire type `WIRE_PROTOCOL_BATCH_ELEMENT` using type number 0.
Changelog-Added: support for `start_batch`
We handed NULL as the logcb, resulting in a very uninformative crash:
```
2025-03-14T03:46:36.447Z INFO lightningd: Server started with public key 03d67f36c4f81789e2fe425028bacc96b199813eae426c517f589a45f1136c1fe5, alias Jubilee (color #dc42f4) and lightningd v25.02
topology: FATAL SIGNAL 11 (version v25.02)
0x560037f64aad send_backtrace
common/daemon.c:33
0x560037f64b49 crashdump
common/daemon.c:78
0x7f6c41ff351f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x0 ???
???:0
```
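The `0x0 ???` frame is what a call through a NULL function pointer looks like. One defensive pattern is to substitute a default logger when the caller passes none. It is sketched here with a hypothetical `log_cb_t` type and helper names, which are not the gossmap API:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logging-callback type. */
typedef void (*log_cb_t)(void *arg, const char *fmt, ...);

/* Default logger: print the formatted message to stderr. */
static void stderr_log(void *arg, const char *fmt, ...)
{
	va_list ap;

	(void)arg;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
}

/* Calling a NULL function pointer jumps to address 0 (the "0x0 ???"
 * frame above).  Fall back to the default logger instead. */
static log_cb_t logcb_or_default(log_cb_t cb)
{
	return cb ? cb : stderr_log;
}
```

The commit text implies the fix is simply to stop passing NULL. A fallback like this only turns a future mistake into a readable message rather than a jump to address 0.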
Changelog-Fixed: `topology` crash on invoice creation if a peer had a really high feerate.
Fixes: https://github.com/ElementsProject/lightning/issues/8156
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Unfortunately a spec typo means the data fields are missing (PR pending),
so we still patch those in.
The message "your_peer_storage" got renamed to "peer_storage_retrieval",
and the option "want_peer_backup_storage" was removed.
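Some context for the bit numbers in the changelog entry (41 before, 43 now). Per BOLT #1/#9, feature bits form a big-endian bitfield in which bit 0 is the least significant bit of the last byte, and odd bits are optional. Here is a small C sketch with hypothetical helper names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bitfields are big-endian: bit 0 is the least significant bit of
 * the last byte, so bit N lives in byte (len - 1 - N / 8). */
static void set_feature_bit(uint8_t *features, size_t len, unsigned int bit)
{
	if (bit / 8 >= len)
		return;		/* field too short: ignored in this sketch */
	features[len - 1 - bit / 8] |= (uint8_t)(1u << (bit % 8));
}

static bool has_feature_bit(const uint8_t *features, size_t len,
			    unsigned int bit)
{
	if (bit / 8 >= len)
		return false;
	return features[len - 1 - bit / 8] & (1u << (bit % 8));
}
```

Advertising only feature 43 means setting bit 43 and leaving bit 41 clear. In a 6-byte field, both live in the first byte (0x08 and 0x02 respectively).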
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `experimental-peer-storage` now only advertises feature 43, not 41.
After analyzing various weird cases where we ended up with duplicate
gossip_store entries, it could be explained by us not fully processing
the gossip store.
It's not clear that my assumption, that we would always see our own writes,
is true: technically this may require an fsync(). So we now add the
check, and if it fails, do an fsync and try again.
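This is a minimal sketch of the "check, fsync, try again" idea, assuming POSIX `pwrite()`, `pread()` and `fsync()`. The real check sits on gossipd's reading path. Here a single hypothetical helper, `write_and_verify`, writes, reads back and compares, so the logic is self-contained:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes at `off`, then read them back.  If we don't see our
 * own write, fsync() and check one more time. */
static bool write_and_verify(int fd, const void *buf, size_t len, off_t off)
{
	char check[256];

	if (len > sizeof(check))
		return false;
	if (pwrite(fd, buf, len, off) != (ssize_t)len)
		return false;

	for (int attempt = 0; attempt < 2; attempt++) {
		if (pread(fd, check, len, off) == (ssize_t)len
		    && memcmp(check, buf, len) == 0)
			return true;
		if (attempt == 0) {
			fprintf(stderr, "read-back mismatch at %lld, fsync and retry\n",
				(long long)off);
			if (fsync(fd) != 0)
				return false;
		}
	}
	return false;
}
```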
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
Instead of making a copy.
To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.
No checksum check: 194.52s
Copying for checksum check: 202.81s
Zero-copy checksum check: 194.40s
But these numbers proved noisy. Still, it doesn't hurt.
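To illustrate "zero-copy": a checksum routine that reads through a `const` pointer can run directly over the bytes of a mapped file, with no intermediate buffer. The sketch below is a plain bitwise CRC-32C. Treating the store's record checksum as CRC-32C is our assumption here, and this is not the project's implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli; reflected polynomial 0x82F63B78).  `crc`
 * is the value returned by a previous call (0 to start), so a record can
 * be checksummed in pieces.  `data` is only read, so it can point straight
 * into a read-only mapping of the file. */
static uint32_t crc32c(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

The whole trick is passing `map + offset` directly, instead of first copying the record into a scratch buffer.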
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume if it's incorrect, we simply need to wait. If this proves incorrect,
we will see a stream of BROKEN log messages.
To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.
Before: 194.52s
After: 202.81s
So it's marginal.
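This sketches "if it's incorrect, we simply need to wait". A short record, or one whose checksum doesn't match, is treated as a write still in progress. The reader stops and retries later instead of failing. The record layout and the toy byte-sum checksum are hypothetical placeholders:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, for illustration only:
 *   [len: 2 bytes, big-endian][sum: 1 byte][payload: len bytes]
 * `sum` is a toy checksum (byte sum mod 256) standing in for the real one. */
enum rec_status { REC_OK, REC_TRY_LATER };

static uint8_t toy_sum(const uint8_t *p, size_t len)
{
	uint8_t s = 0;

	while (len--)
		s += *p++;
	return s;
}

/* A short or checksum-mismatched record is assumed to be a write that is
 * still in progress: report "try later" rather than treating it as fatal. */
static enum rec_status check_record(const uint8_t *buf, size_t avail,
				    size_t *reclen)
{
	size_t len;

	if (avail < 3)
		return REC_TRY_LATER;
	len = ((size_t)buf[0] << 8) | buf[1];
	if (avail < 3 + len)
		return REC_TRY_LATER;
	if (toy_sum(buf + 3, len) != buf[2])
		return REC_TRY_LATER;
	*reclen = 3 + len;
	return REC_OK;
}
```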
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
While this shouldn't happen, it does (pending other fixes), and we stop reading the
gossip store until next time. The result is partial gossip, demonstrated beautifully
by NicolasDorier's report:
```
lightning_gossipd: gossmap: redundant channel_announce for 864063x1306x1, offsets 1272259 and 1784859!"
```
Gossipd stalls there and makes no further progress, so gossipd itself
doesn't see the entire gossip_store.
Then things get really batshit:
```
2025-02-04T05:53:28.582Z DEBUG gossipd: Store compact time: 1429910 msec
```
This took 1429 seconds to process. Why?
Because it hasn't been processing the gossip store fully, gossipd kept adding "new" records to the end:
```
2025-02-04T05:53:28.583Z DEBUG gossipd: gossip_store: Read 62716143/1739952/5158256/0 cannounce/cupdate/nannounce/delete from store in 31634458462 bytes, now 31634458440 bytes (populated=true)
```
It has 31GB of gossip in there! No wonder it took so long...
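One way to avoid the stall is to log a redundant `channel_announce` and keep reading, rather than stopping at it. This toy loop over announced SCIDs sketches that pattern. It is illustrative only (linear duplicate search, hypothetical names), not gossmap's code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy catch-up over a list of channel_announcement SCIDs.  Instead of
 * stopping at the first redundant announcement (leaving the rest of the
 * store unread), log it and keep going.  Returns the number of distinct
 * SCIDs stored in `seen`, which must have room for `n` entries. */
static size_t catchup(const uint64_t *scids, size_t n, uint64_t *seen)
{
	size_t nseen = 0;

	for (size_t i = 0; i < n; i++) {
		bool dup = false;

		for (size_t j = 0; j < nseen; j++) {
			if (seen[j] == scids[i]) {
				dup = true;
				break;
			}
		}
		if (dup) {
			fprintf(stderr, "redundant channel_announce at record %zu, skipping\n", i);
			continue;
		}
		seen[nseen++] = scids[i];
	}
	return nseen;
}
```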
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/8035
Changelog-Fixed: gossipd: corruption in the gossip_store could cause ever-longer startup times and no gossip updates.
By default, messages at LOG_UNUSUAL and higher go to stderr.
We have to whitelist more cases in map_catchup so we don't spam the logs
with perfectly-expected (but ignored) messages though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our ZFS problem
anyway.
In fact, removing it fixes a race if we don't have HAVE_PWRITEV.
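Here is why sharing an fd is racy without `pwritev()`. A fallback that positions with `lseek()` and then calls `write()` moves the file offset that every user of that fd shares. A POSIX sketch follows; `emulated_pwrite` is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback for a platform without pwritev(): position with
 * lseek(), then write().  This moves the file offset that every user of
 * `fd` shares, so a reader doing plain read() on the same fd can race
 * with it and read from the wrong place. */
static ssize_t emulated_pwrite(int fd, const void *buf, size_t len, off_t off)
{
	if (lseek(fd, off, SEEK_SET) < 0) {
		perror("lseek");
		return -1;
	}
	return write(fd, buf, len);
}
```

`pread()` and `pwrite()` take an explicit offset and leave the file offset alone. Giving the reader its own fd removes the shared state altogether.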
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a report of this happening under ZFS. We cannot do much if
this really is a problem where we can't read back what we write, but
this avoids the immediate crash.
Fixes: https://github.com/ElementsProject/lightning/issues/7971
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossmap: occasional crash (at least on ZFS) reading gossip_store.
The updated API requires typed htables to explicitly state whether they
allow duplicates: for most cases we don't, but we've had issues in the
past.
This is a big patch, but mainly mechanical.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, this lets you find the exact htlc_maximum_msat/htlc_minimum_msat
values.
This means we actually create real channel_updates for local mods, which
requires a second "local" scratch region.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we don't compact the gossmap on the fly (FIXME!) we can
easily surpass 4GB in the gossmap, and 32 bit offsets are not
sufficient.
I'm a bit surprised we don't crash immediately, but we've definitely
seen issues.
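The arithmetic behind the 4GB limit is simple: a 32-bit offset can only hold values below 2^32 bytes, so larger offsets wrap silently. A tiny C illustration, using a hypothetical helper:

```c
#include <stdint.h>

/* Storing a file offset in 32 bits keeps only the low 32 bits: anything at
 * or beyond 4GiB (2^32 bytes) silently wraps. */
static uint32_t offset_as_u32(uint64_t off)
{
	return (uint32_t)off;
}
```

For example, an offset of 5GiB comes back as 1GiB.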
Changelog-Fixed: gossipd: crash errors with large gossip_store (>4GB) growth on longer-running nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It was weird not to have a capacity associated with localmods channels, and
fixing it has some very nice side effects.
Now the gossmap_chan_get_capacity() call never fails (we prevented reading
of channels from gossmap in the partially-written case already), so we
make it return the capacity. We do this in msat, because that's what
all the callers want.
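This sketches a capacity getter that cannot fail and returns msat directly. The struct names are hypothetical stand-ins, not gossmap's types:

```c
#include <stdint.h>

/* Hypothetical stand-ins, not gossmap's types. */
struct amount_msat { uint64_t millisatoshis; };
struct toy_chan { uint64_t capacity_sat; };

/* Return the capacity directly, in msat (1 sat = 1000 msat), instead of
 * via an out-parameter plus a success flag.  All bitcoin that can exist is
 * about 2.1e15 sat, i.e. 2.1e18 msat, which is below UINT64_MAX
 * (about 1.8e19), so this cannot overflow for a real channel. */
static struct amount_msat toy_chan_capacity(const struct toy_chan *c)
{
	return (struct amount_msat){ c->capacity_sat * 1000 };
}
```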
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is actually what we want in several places: to only override one or
two fields in a channel_update.
We add a gossmap_local_setchan() with a similar API to the old
gossmap_local_updatechan(), for the case where we want to set every
field.
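The two styles differ as follows, sketched with a hypothetical subset of channel_update fields. The "update" form takes a pointer per field, where NULL means "leave unchanged". The "set" form takes every field. Names are illustrative, not the gossmap API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the fields a channel_update carries. */
struct chan_params {
	uint64_t htlc_min_msat, htlc_max_msat;
	uint32_t base_fee_msat, fee_ppm;
	uint16_t cltv_delta;
};

/* "Update" style: each argument may be NULL, meaning "leave unchanged",
 * so a caller can override just one or two fields. */
static void update_params(struct chan_params *p,
			  const uint64_t *htlc_min, const uint64_t *htlc_max,
			  const uint32_t *base_fee, const uint32_t *fee_ppm,
			  const uint16_t *cltv_delta)
{
	if (htlc_min)
		p->htlc_min_msat = *htlc_min;
	if (htlc_max)
		p->htlc_max_msat = *htlc_max;
	if (base_fee)
		p->base_fee_msat = *base_fee;
	if (fee_ppm)
		p->fee_ppm = *fee_ppm;
	if (cltv_delta)
		p->cltv_delta = *cltv_delta;
}

/* "Set" style: the caller supplies every field. */
static void set_params(struct chan_params *p, struct chan_params newp)
{
	*p = newp;
}
```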
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We allow adding them, but crash when we remove the localmods. Yet
this could theoretically happen if a channel we modified was removed
from the gossmap, anyway.
Reported-by: Lagrang3 <lagrang3@protonmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies the callers significantly: all channel_announcements now
have an amount, so gossmap_chan_get_capacity() only fails on a local
modification.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only write these in two places: one where we get a message from lightningd about
our own channel, and one where we get a reply from lightningd about a txout check.
In the former case we explicitly check that we don't already have it in gossmap, so
add checks to the latter case too, and give verbose detail if it's found.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a u64, so we should pass it by copy. This is a big sweeping change,
but mainly mechanical (change one, compile, fix breakage, repeat).
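Passing a u64-sized value by copy is natural, as this example shows. It uses a BOLT #7-style short_channel_id (3 bytes block height, 3 bytes transaction index, 2 bytes output index), wrapped in a hypothetical one-field struct and passed by value everywhere. The text above doesn't name the type; this is just an example of the pattern:

```c
#include <stdint.h>

/* Hypothetical one-field wrapper: the struct gives type safety, and at
 * 8 bytes it is typically as cheap to pass by value as a pointer. */
struct scid { uint64_t u64; };

/* BOLT #7 layout: block height in the top 3 bytes, transaction index in
 * the next 3, output index in the low 2. */
static struct scid scid_make(uint32_t block, uint32_t txindex, uint16_t outnum)
{
	return (struct scid){ ((uint64_t)(block & 0xFFFFFF) << 40)
			      | ((uint64_t)(txindex & 0xFFFFFF) << 16)
			      | outnum };
}

/* Accessors take the struct by value: nothing to dereference or NULL-check. */
static uint32_t scid_block(struct scid s)
{
	return (uint32_t)(s.u64 >> 40);
}

static uint32_t scid_txindex(struct scid s)
{
	return (uint32_t)((s.u64 >> 16) & 0xFFFFFF);
}

static uint16_t scid_outnum(struct scid s)
{
	return (uint16_t)(s.u64 & 0xFFFF);
}
```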
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Wrote a test program which passed num_channel_updates_rejected as NULL
(which we don't usually do), and valgrind complained:
```
==1048302== Conditional jump or move depends on uninitialised value(s)
==1048302== at 0x118B90: update_channel (gossmap.c:550)
==1048302== by 0x119EEE: map_catchup (gossmap.c:663)
==1048302== by 0x11A299: load_gossip_store (gossmap.c:726)
==1048302== by 0x11A352: gossmap_load (gossmap.c:1052)
==1048302== by 0x125362: main (run-route-infloop.c:90)
```
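An optional out-parameter usually takes this shape, sketched here with hypothetical names. Count into a local that is always initialised, and write through the pointer only if the caller supplied one:

```c
#include <stddef.h>

/* Toy loader: negative entries count as rejected updates.  `rejected` is
 * always initialised, and is copied out only when the caller passed a
 * non-NULL pointer, so nothing ever reads an unset value. */
static size_t load_updates(const int *updates, size_t n, size_t *num_rejected)
{
	size_t accepted = 0, rejected = 0;

	for (size_t i = 0; i < n; i++) {
		if (updates[i] < 0)
			rejected++;
		else
			accepted++;
	}
	if (num_rejected)
		*num_rejected = rejected;
	return accepted;
}
```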
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Thanks to amazing debugging assistance from grubles, we figured out
that indeed, my memory was correct: write and mmap are not consistent
on all platforms. The easiest fix is to disable mmap on OpenBSD for now:
the better fix is to do in-place updates using the mmap, and only rely
on write() for append (which always causes a remap anyway before it's accessed).
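This sketches the "easiest fix", assuming a compile-time platform check: read through `mmap()` normally, but fall back to `pread()` on OpenBSD. The flag and function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* write() and mmap() were found not to be consistent on OpenBSD, so don't
 * read through a mapping there. */
#if defined(__OpenBSD__)
static const bool use_mmap = false;
#else
static const bool use_mmap = true;
#endif

/* Copy `len` bytes at offset `off` out of the file, through a fresh
 * read-only mapping or, where mmap is disabled, with pread().  The caller
 * must ensure [off, off + len) lies within the file. */
static bool read_bytes(int fd, void *out, size_t len, off_t off)
{
	if (use_mmap) {
		size_t maplen = (size_t)off + len;
		char *map = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, 0);

		if (map == MAP_FAILED) {
			perror("mmap");
			return false;
		}
		memcpy(out, map + off, len);
		munmap(map, maplen);
		return true;
	}
	return pread(fd, out, len, off) == (ssize_t)len;
}
```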
Fixes: https://github.com/ElementsProject/lightning/issues/7109
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We never enabled it, because we seemed to be eliminating valid
channels. We discard zombie-marked records on loading.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, allow callers to see unknown records we ignore (and let
them fail as a result), and get called if we can't pack a
channel_update into our internal format.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The only way you'll see private channel_updates is if you put them
there yourself with localmods.
I also renamed the confusing gossmap_chan_capacity to gossmap_chan_has_capacity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't happen on x86, but struct gossmap_chan defines:
```
u32 private: 1;
u32 plus_scid_off: 31;
```
And valgrind complains when we initialize plus_scid_off and access it later:
```
VALGRIND=1 valgrind -q --error-exitcode=7 --track-origins=yes --leak-check=full --show-reachable=yes --errors-for-leak-kinds=all plugins/renepay/test/run-mcf > /dev/null
==186886== Conditional jump or move depends on uninitialised value(s)
==186886== at 0x10076388: chan_iter (gossmap.c:1098)
==186886== by 0x100797F3: gossmap_next_chan (gossmap.c:1112)
==186886== by 0x1008C5AF: main (run-mcf.c:309)
==186886== Uninitialised value was created by a heap allocation
==186886== at 0x40F0A44: malloc (vg_replace_malloc.c:431)
==186886== by 0x10072BAF: allocate (tal.c:256)
==186886== by 0x100737A7: tal_alloc_ (tal.c:463)
==186886== by 0x100738DF: tal_alloc_arr_ (tal.c:506)
==186886== by 0x10079507: load_gossip_store (gossmap.c:690)
==186886== by 0x10079667: gossmap_load (gossmap.c:978)
==186886== by 0x1008C4AF: main (run-mcf.c:295)
```
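Here is a sketch of the problem and one way to avoid it, reusing the quoted layout. Assigning one bitfield updates only its own bits of the shared `u32`, so in `malloc()`ed memory the other bit stays uninitialised. Depending on the code generated to read the neighbouring field, valgrind may then flag it. Initialising both fields (or zeroing the allocation) sidesteps that. The allocator function is hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the struct quoted above: two bitfields sharing a u32. */
struct chan_bits {
	uint32_t private: 1;
	uint32_t plus_scid_off: 31;
};

/* Assigning plus_scid_off alone leaves the `private` bit of the freshly
 * malloc()ed word uninitialised.  Setting both fields means every bit of
 * the shared word is defined. */
static struct chan_bits *new_chan_bits(uint32_t plus_scid_off)
{
	struct chan_bits *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->private = 0;
	c->plus_scid_off = plus_scid_off;
	return c;
}
```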
Reported-by: @grubles
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #6557
This will fix a crash I hit on armv7.
By looking inside the coredump with gdb
(after adding an assert that n must be
non-NULL) I got the following stack trace:
```
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x0043a038 in send_backtrace (why=0xbe9e3600 "FATAL SIGNAL 11") at common/daemon.c:36
#2 0x0043a0ec in crashdump (sig=11) at common/daemon.c:46
#3 <signal handler called>
#4 0x00406d04 in node_announcement (map=0x938ecc, nann_off=495146) at common/gossmap.c:586
#5 0x00406fec in map_catchup (map=0x938ecc, num_rejected=0xbe9e3a40) at common/gossmap.c:643
#6 0x004073a4 in load_gossip_store (map=0x938ecc, num_rejected=0xbe9e3a40) at common/gossmap.c:697
#7 0x00408244 in gossmap_load (ctx=0x0, filename=0x4e16b8 "gossip_store", num_channel_updates_rejected=0xbe9e3a40) at common/gossmap.c:976
#8 0x0041a548 in init (p=0x93831c, buf=0x9399d4 "\n\n{\"jsonrpc\":\"2.0\",\"id\":\"cln:init#25\",\"method\":\"init\",\"params\":{\"options\":{},\"configuration\":{\"lightning-dir\":\"/home/vincent/.lightning/testnet\",\"rpc-file\":\"lightning-rpc\",\"startup\":true,\"network\":\"te"..., config=0x939cdc) at plugins/topology.c:622
#9 0x0041e5d0 in handle_init (cmd=0x938934, buf=0x9399d4 "\n\n{\"jsonrpc\":\"2.0\",\"id\":\"cln:init#25\",\"method\":\"init\",\"params\":{\"options\":{},\"configuration\":{\"lightning-dir\":\"/home/vincent/.lightning/testnet\",\"rpc-file\":\"lightning-rpc\",\"startup\":true,\"network\":\"te"..., params=0x939c8c)
at plugins/libplugin.c:1208
#10 0x0041fc04 in ld_command_handle (plugin=0x93831c, toks=0x939bec) at plugins/libplugin.c:1572
#11 0x00420050 in ld_read_json_one (plugin=0x93831c) at plugins/libplugin.c:1667
#12 0x004201bc in ld_read_json (conn=0x9391c4, plugin=0x93831c) at plugins/libplugin.c:1687
#13 0x004cb82c in next_plan (conn=0x9391c4, plan=0x9391d8) at ccan/ccan/io/io.c:59
#14 0x004cc67c in do_plan (conn=0x9391c4, plan=0x9391d8, idle_on_epipe=false) at ccan/ccan/io/io.c:407
#15 0x004cc6dc in io_ready (conn=0x9391c4, pollflags=1) at ccan/ccan/io/io.c:417
#16 0x004cf8cc in io_loop (timers=0x9383c4, expired=0xbe9e3ce4) at ccan/ccan/io/poll.c:453
#17 0x00420af4 in plugin_main (argv=0xbe9e3eb4, init=0x41a46c <init>, restartability=PLUGIN_STATIC, init_rpc=true, features=0x0, commands=0x6167e8 <commands>, num_commands=4, notif_subs=0x0, num_notif_subs=0, hook_subs=0x0, num_hook_subs=0, notif_topics=0x0, num_notif_topics=0) at plugins/libplugin.c:1891
#18 0x0041a6f8 in main (argc=1, argv=0xbe9e3eb4) at plugins/topology.c:679
```
I do not know if this is a solution, because I do not know
when we can parse a node announcement for a node that
is no longer in the gossip map.
So, I hope this is at least useful for @rustyrussell
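Here is a minimal sketch of the kind of NULL check discussed above. Look the node up, and if it isn't in the map, skip the announcement instead of dereferencing NULL. The table and function names are hypothetical, not gossmap's:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node table keyed by 33-byte node id. */
struct toy_node {
	uint8_t id[33];
	size_t nann_off;
};

static struct toy_node *find_node(struct toy_node *nodes, size_t n,
				  const uint8_t id[33])
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(nodes[i].id, id, 33) == 0)
			return &nodes[i];
	}
	return NULL;
}

/* If the announced node isn't in the map, skip the announcement rather
 * than dereferencing a NULL node. */
static void toy_node_announcement(struct toy_node *nodes, size_t n,
				  const uint8_t id[33], size_t nann_off)
{
	struct toy_node *node = find_node(nodes, n, id);

	if (!node)
		return;
	node->nann_off = nann_off;
}
```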
Changelog-Fixed: fixes `FATAL SIGNAL 11` on gossmap node announcement parsing.
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
It's actually two separate u16 fields, so treat it as
such!
Cleans up zombie handling code a bit too.
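The text doesn't say which field this is. Purely as an illustration, here is a 4-byte big-endian header decoded as two separate u16 fields (flags, then length), rather than as one u32 with bits masked out. The layout and names are assumptions:

```c
#include <stdint.h>

/* Illustrative header: two big-endian u16 fields, flags then length. */
struct hdr16 {
	uint16_t flags, len;
};

static uint16_t be16_at(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode each field on its own, rather than reading one u32 and masking
 * bits out of it. */
static struct hdr16 parse_hdr(const uint8_t b[4])
{
	return (struct hdr16){ be16_at(b), be16_at(b + 2) };
}
```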
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>