Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: `gossipd` now uses a `lightning_gossip_compactd` helper to compact the gossip_store on demand, keeping it under about 210MB.
A new subprocess run by gossipd to create a compacted gossip store.
It's pretty simple: a linear compaction of the file. Once it has done the amount it
was told to, gossipd waits until it completes the last bit.
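A minimal sketch of what a linear compaction pass looks like, assuming a simplified record layout (a `flags`/`len` header followed by the payload). The `rec_hdr` struct, `REC_DELETED` bit, and function names are hypothetical and are not the real gossip_store format or the helper's actual code; the handover between gossipd and the helper is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified record header: NOT the real gossip_store layout. */
#define REC_DELETED 0x8000

struct rec_hdr {
	uint16_t flags;
	uint16_t len;	/* payload bytes following the header */
};

/* Copy every live (non-deleted) record from src[0..src_len) to dst, in order.
 * Returns the number of bytes written; dst must hold at least src_len bytes. */
static size_t compact_linear(const uint8_t *src, size_t src_len, uint8_t *dst)
{
	size_t in = 0, out = 0;

	while (in + sizeof(struct rec_hdr) <= src_len) {
		struct rec_hdr hdr;
		size_t total;

		memcpy(&hdr, src + in, sizeof(hdr));
		total = sizeof(hdr) + hdr.len;
		if (in + total > src_len)
			break;	/* truncated final record: stop here */
		if (!(hdr.flags & REC_DELETED)) {
			memcpy(dst + out, src + in, total);
			out += total;
		}
		in += total;
	}
	return out;
}

/* Illustration helper: append one record to buf, return the new offset. */
static size_t append_rec(uint8_t *buf, size_t off, uint16_t flags,
			 const void *payload, uint16_t len)
{
	struct rec_hdr hdr = { flags, len };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), payload, len);
	return off + sizeof(hdr) + len;
}
```

Because the pass is a single forward walk, the relative order of surviving records is preserved and the output is never larger than the input.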
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the file responsible for all the writing, so it should be
responsible for the rewriting if necessary (rather than
gossmap_manage).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now only need to walk it if we're doing an upgrade.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `gossipd` no longer compacts gossip_store on startup (improving start times significantly).
gossmap doesn't care, so gossipd currently has to iterate through the
store to find them at startup. Create a callback for gossipd to use
instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's used by common/gossip_store.c, which is used by many things other than
gossipd. This file belongs in common.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also put this in the store_ended message, too: so you can
tell if the equivalent_offset there really refers to this new
entry (or if two or more rewrites have happened).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It didn't do anything, since the dev_compact_gossip_store command was
removed. When we make it do something, it crashes since old_len is 0:
```
gossipd: gossip_store_compact: bad version
gossipd: FATAL SIGNAL 6 (version v25.12rc3-1-g9e6c715-modded)
...
gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7119bd8288fe
gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7119bd82881a
gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7119bd83b516
gossipd: backtrace: gossipd/gossip_store.c:52 (append_msg) 0x56294de240eb
gossipd: backtrace: gossipd/gossip_store.c:358 (gossip_store_compact) 0x56294
gossipd: backtrace: gossipd/gossip_store.c:395 (gossip_store_new) 0x56294de24
gossipd: backtrace: gossipd/gossmap_manage.c:455 (setup_gossmap) 0x56294de255
gossipd: backtrace: gossipd/gossmap_manage.c:488 (gossmap_manage_new) 0x56294
gossipd: backtrace: gossipd/gossipd.c:400 (gossip_init) 0x56294de22de9
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means that we won't complain to peers which gossip about our
channels, but it does mean that our channel graph (like other nodes on
the network) will show two channels, not one, for the duration.
For this reason, we need askrene to omit local dying channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd no longer makes gossip messages, and hasn't since v24.02, so it
doesn't actually need to talk to the hsm daemon.
Also, various comments were out of date, so fix those too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had the test backwards, so we moved it in every case where we didn't need to (which bloats
our gossip store), and did not move it in the case where we do need to.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: we would occasionally not show a node announcement in listnodes().
This is immune to things like clock changes, and has the convenient side-effect that
it will *not* be overridden when we override time for developer purposes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Only in developer mode, of course.
Notes:
1. We have to move the initialization before the lightningd main trace_start,
since that uses pseudorand().
2. To make the results stable, we need to use per-caller values to randbytes().
Otherwise external timing changes the call order.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
You can now simply add per-tal-object helpers for memleak, but our older pattern required
calling memleak functions explicitly during memleak handling. Hash tables in particular need
to be dynamically allocated (we override the allocators using htable_set_allocator and assume
this), so it makes sense to have a helper macro that does all three.
This eliminates a huge amount of code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically, `devtools/reduce-includes.sh */*.c`.
Build time from make clean (RUST=0) (includes building external libs):
Before:
real 0m38.944000-40.416000(40.1131+/-0.4)s
user 3m6.790000-17.159000(15.0571+/-2.8)s
sys 0m35.304000-37.336000(36.8942+/-0.57)s
After:
real 0m37.872000-39.974000(39.5466+/-0.59)s
user 3m1.211000-14.968000(12.4556+/-3.9)s
sys 0m35.008000-36.830000(36.4143+/-0.5)s
Build time after touch config.vars (RUST=0):
Before:
real 0m19.831000-21.862000(21.5528+/-0.58)s
user 2m15.361000-30.731000(28.4798+/-4.4)s
sys 0m21.056000-22.339000(22.0346+/-0.35)s
After:
real 0m18.384000-21.307000(20.8605+/-0.92)s
user 2m5.585000-26.843000(23.6017+/-6.7)s
sys 0m19.650000-22.003000(21.4943+/-0.69)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each header should only include the other headers it needs to compile;
`devtools/reduce-includes.sh */*.h` does this. The C files then need
additional includes if they don't compile.
And remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we don't have to manually choose what to link against,
which is much of the complexity of our Makefiles: the compiler will
automatically use any object files it needs to link.
We already do this for ccan as libccan.a, now we have libcommon.a.
We don't link against it for *everything*, as some tests require their own
versions.
Notes:
1. I get rid of the weird plugins/test/Makefile2 (accidental commit?)
2. Many tests change due to update-mocks.
3. In some places I added the missing dependency on the Makefile itself, though most are in the next
patch.
Before:
Total program size: 221366528
Total tests size: 364243856
After:
Total program size: 190733656
Total tests size: 337880888
Build time from make clean (RUST=0) (includes building external libs):
Before:
real 0m38.227000-44.245000(41.8222+/-1.6)s
user 3m2.105000-33.696000(23.1442+/-8.4)s
sys 0m35.054000-42.269000(39.7231+/-2)s
After:
real 0m38.944000-40.416000(40.1131+/-0.4)s
user 3m6.790000-17.159000(15.0571+/-2.8)s
sys 0m35.304000-37.336000(36.8942+/-0.57)s
Build time after touch config.vars (RUST=0):
Before:
real 0m18.928000-22.776000(21.5084+/-1.1)s
user 2m8.613000-36.567000(27.7281+/-7.7)s
sys 0m20.458000-23.436000(22.3963+/-0.77)s
After:
real 0m19.831000-21.862000(21.5528+/-0.58)s
user 2m15.361000-30.731000(28.4798+/-4.4)s
sys 0m21.056000-22.339000(22.0346+/-0.35)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a last resort, but what else are we supposed to do when we wrote
something and it didn't appear?
In particular, ZFS doesn't just "fix itself":
```
remaining_fd=200001b0c9761dff0000000001009411e26cd56d68aabc285ee1c8ee43d59be6f939b0ce353d80213918680a7438356b9c5ea6bb001a6
bb37a4dea93776f4abc8cd371525b4d1605a74b89d7cb1bfc8865ddf22288c7ea08b9d98b34155b4aed159eb81732957e6bf79b996752bf2a9995aae
ad1d65e7889e826ea0ba42f7746c176fe12f2fe6c04af1a74b4f0a262d20efd57133eb32693c789eb3f09caf4f4c6ecd2f734b3b36e751ffcc2748c5
8feabce4173c4ce6098a2c5397aabf1be5442cb67b5030be11ebd8b9841838dae127fe30000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000002000000a218b9d93000000001005000000000000c060
```
Note the record appended on the end *after all the zeroes*.
Changelog-Changed: gossipd: add gossip_store recovery for filesystems which do not synchronize read and write (e.g. ZFS on Linux), by disabling mmap reads and rewriting the last records.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This should detect partial writes more robustly, since we make a
separate pwrite() call to update this flag after the record is written.
Previously we were playing a bit loose with synchronization assumptions,
which seemed to work on Linux ext4, but not so well elsewhere.
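As a sketch of the idea described above, assuming a hypothetical header layout and a hypothetical `REC_COMPLETE` flag (neither is taken from the real gossip_store code): the record is written with the flag clear, and only a second, separate pwrite() sets it, so a reader that finds the flag clear knows the record may be partial.

```c
#define _XOPEN_SOURCE 700	/* for pread()/pwrite() under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag and header layout, for illustration only. */
#define REC_COMPLETE 0x0001

struct rec_hdr {
	uint16_t flags;
	uint16_t len;
};

/* Write one record at off; the completion flag is set by a separate pwrite(). */
static bool write_record(int fd, off_t off, const void *payload, uint16_t len)
{
	struct rec_hdr hdr = { 0, len };

	/* Step 1: header (flag clear) and payload. */
	if (pwrite(fd, &hdr, sizeof(hdr), off) != (ssize_t)sizeof(hdr)
	    || pwrite(fd, payload, len, off + sizeof(hdr)) != (ssize_t)len) {
		perror("pwrite record");
		return false;
	}

	/* Step 2: only now mark the record complete (flags is the first field). */
	hdr.flags |= REC_COMPLETE;
	if (pwrite(fd, &hdr.flags, sizeof(hdr.flags), off)
	    != (ssize_t)sizeof(hdr.flags)) {
		perror("pwrite flag");
		return false;
	}
	return true;
}

/* A reader treats any record without the flag as a possible partial write. */
static bool record_is_complete(int fd, off_t off)
{
	struct rec_hdr hdr;

	if (pread(fd, &hdr, sizeof(hdr), off) != (ssize_t)sizeof(hdr))
		return false;
	return (hdr.flags & REC_COMPLETE) != 0;
}
```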
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We add `start_batch` to match t-bast’s splicing spec, and we add a new internal wire type `WIRE_PROTOCOL_BATCH_ELEMENT` using type number 0.
Changelog-Added: support for `start_batch`
Checking a signature is a CPU-intensive operation that should be performed only
if gossmap doesn't already have the channel announcement in question and we're
not already checking for the announcement's UTXO.
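The ordering can be sketched as follows: do the two cheap lookups first and only then pay for signature verification. Every name here (`chan_state`, `check_signatures`, the result enum) is hypothetical; the real code consults gossmap and its pending-UTXO list rather than plain arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state: short channel ids we already know, and those
 * whose UTXO lookup is already in flight. */
struct chan_state {
	const uint64_t *known;
	size_t num_known;
	const uint64_t *pending;
	size_t num_pending;
	size_t sig_checks;	/* counts expensive verifications performed */
};

enum ann_result {
	ANN_ALREADY_KNOWN,
	ANN_ALREADY_PENDING,
	ANN_BAD_SIG,
	ANN_NEEDS_UTXO_CHECK,
};

static bool contains(const uint64_t *arr, size_t n, uint64_t scid)
{
	for (size_t i = 0; i < n; i++)
		if (arr[i] == scid)
			return true;
	return false;
}

/* Stand-in for the CPU-intensive signature verification. */
static bool check_signatures(struct chan_state *st, uint64_t scid)
{
	(void)scid;
	st->sig_checks++;
	return true;
}

static enum ann_result handle_channel_announcement(struct chan_state *st,
						   uint64_t scid)
{
	/* Cheap checks first: no signature work for anything we have seen. */
	if (contains(st->known, st->num_known, scid))
		return ANN_ALREADY_KNOWN;
	if (contains(st->pending, st->num_pending, scid))
		return ANN_ALREADY_PENDING;

	/* Only genuinely new announcements reach the expensive step. */
	if (!check_signatures(st, scid))
		return ANN_BAD_SIG;
	return ANN_NEEDS_UTXO_CHECK;
}
```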
Changelog-Fixed: `gossipd` doesn't waste CPU cycles checking signatures on channel announcements that are already known
Issue: https://github.com/ElementsProject/lightning/issues/7972
Unfortunately a spec typo means the data fields are missing (PR pending),
so we still patch those in.
The message "your_peer_storage" got renamed to "peer_storage_retrieval",
and the option "want_peer_backup_storage" was removed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `experimental-peer-storage` now only advertises feature 43, not 41.
The vast majority of incoming channel updates seem to be cut due
to age, which results in noisy logs. Similarly, the chanbackup
logging verbosity might better match the equivalent actions in
channeld, which are at the debug level.
Fixes: #8058
Changelog-None: introduced in 25.02
After analyzing various weird cases where we ended up with duplicate
gossip_store entries, it could be explained by us not fully processing
the gossip store.
It's not clear that my assumption that we would always see our own writes
is true: technically this may require an fsync(). So we now add the
check and, if it fails, do an fsync and try again.
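A minimal sketch of such a check, assuming a plain file descriptor and a small fixed-size comparison buffer. The function name, buffer size, and single-retry policy are illustrative assumptions, not the actual gossipd code.

```c
#define _XOPEN_SOURCE 700	/* for pread()/pwrite() under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write buf at off, then read it back to confirm we can see our own write.
 * If the first read-back does not match, fsync() and try once more. */
static bool write_and_verify(int fd, off_t off, const void *buf, size_t len)
{
	uint8_t check[256];	/* illustrative limit on record size */

	if (len > sizeof(check))
		return false;
	if (pwrite(fd, buf, len, off) != (ssize_t)len)
		return false;

	for (int attempt = 0; attempt < 2; attempt++) {
		if (pread(fd, check, len, off) == (ssize_t)len
		    && memcmp(check, buf, len) == 0)
			return true;

		/* First miss only: flush to disk before the second read. */
		if (attempt == 0) {
			fprintf(stderr, "read-back mismatch, trying fsync\n");
			if (fsync(fd) != 0)
				return false;
		}
	}
	return false;
}
```

A caller that gets `false` back knows the store cannot be trusted at that offset and can fall back to whatever recovery path it has.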
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
We had at least one report of overwriting the gossip_store file at
offset 1. Make sure this doesn't happen.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's actually the only one that uses it. We also tweak the way
gossip_store handles failure: gossmap_manage now tells it when to
reset the corrupted store.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Default goes to stderr for LOG_UNUSUAL and higher.
We have to whitelist more cases in map_catchup so we don't spam the logs
with perfectly-expected (but ignored) messages though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our ZFS problem
anyway.
In fact, it fixes a race if we don't have HAVE_PWRITEV.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>