1226 Commits

Author SHA1 Message Date
Rusty Russell
acb8a8cc15 gossipd: dev-compact-gossip-store to manually invoke compaction.
And tests!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
88f3f97b7c gossipd: reset dying_channels array after compact.
Reported-by: @daywalker90
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
912b40aeff gossipd: compact when gossip store is 80% deleted records.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: `gossipd` now uses a `lightning_gossip_compactd` helper to compact the gossip_store on demand, keeping it under about 210MB.
2026-02-16 17:23:33 +10:30
Rusty Russell
15696d97bd gossipd: code to invoke compactd and reopen store.
This isn't called anywhere yet.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
f56f8adcdf gossipd: lightningd/lightning_gossip_compactd
A new subprocess run by gossipd to create a compacted gossip store.

It's pretty simple: a linear compaction of the file.  Once it's done the amount it
was told to, then gossipd waits until it completes the last bit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
1fb4da075f gossipd: put the last_writes array inside struct gossip_store.
This is the file responsible for all the writing, so it should be
responsible for the rewriting if necessary (rather than
gossmap_manage).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
445bcd040a gossipd: don't compact on startup.
We now only need to walk it if we're doing an upgrade.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `gossipd` no longer compacts gossip_store on startup (improving start times significantly).
2026-02-16 17:23:33 +10:30
Rusty Russell
dfc4ce21de gossipd: don't gather dying channels during compaction.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
900fd08455 gossipd: use gossmap to load the dying entries.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
1ad8ca9603 gossmap: add callback for gossipd to see dying messages.
gossmap doesn't care, so gossipd currently has to iterate through the
store to find them at startup.  Create a callback for gossipd to use
instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
e8fd235d4e common: move gossip_store_wire.csv into common/ from gossipd/
It's used by common/gossip_store.c, which is used by many things other than
gossipd.  This file belongs in common.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
5dcf39867c gossipd: write uuid record on startup.
This is the first record, and ignored by everything else.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
b1055aa0ac gossip_store: add UUID entry at front of the store.
We also put this in the store_ended message, too: so you can
tell if the equivalent_offset there really refers to this new
entry (or if two or more rewrites have happened).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
ae957161e6 pytest: fix bogus test_gossip_store_compact_noappend test.
It didn't do anything, since the dev_compact_gossip_store command was
removed.  When we make it do something, it crashes since old_len is 0:

```
gossipd: gossip_store_compact: bad version
gossipd: FATAL SIGNAL 6 (version v25.12rc3-1-g9e6c715-modded)
...
gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7119bd8288fe
gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7119bd82881a
gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7119bd83b516
gossipd: backtrace: gossipd/gossip_store.c:52 (append_msg) 0x56294de240eb
gossipd: backtrace: gossipd/gossip_store.c:358 (gossip_store_compact) 0x56294
gossipd: backtrace: gossipd/gossip_store.c:395 (gossip_store_new) 0x56294de24
gossipd: backtrace: gossipd/gossmap_manage.c:455 (setup_gossmap) 0x56294de255
gossipd: backtrace: gossipd/gossmap_manage.c:488 (gossmap_manage_new) 0x56294
gossipd: backtrace: gossipd/gossipd.c:400 (gossip_init) 0x56294de22de9
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
e383e14cb3 connectd: don't be That Node when someone is gossipping crap.
As seen in my logs, we complain about nodes a *lot* (Hi old CLN!).

```
===>1589311 DEBUG   02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-connectd: peer_out WIRE_WARNING
     139993 DEBUG   lightningd: fixup_scan: block 786151 with 1203 txs
      55388 DEBUG   plugin-bcli: Log pruned 1001 entries (mem 10508118 -> 10298662)
      33000 DEBUG   gossipd: Unreasonable timestamp in 0102000a38ec41f9137a5a560dac6effbde059c12cb727344821cbdd4ef46964a4791a0f67cd997499a6062fc8b4284bf1b47a91541fd0e65129505f02e4d08542b16fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000d9d56000ba40001690fe262010100900000000000000001000003e8000001f30000000000989680
      23515 DEBUG   hsmd: Client: Received message 14 from client
      22269 DEBUG   024b9a1fa8e006f1e3937f65f66c408e6da8e1ca728ea43222a7381df1cc449605-hsmd: Got WIRE_HSMD_ECDH_REQ
      14409 DEBUG   gossipd: Enqueueing update for announce 0102002f7e4b4deb19947c67292e70cb22f7fac837fa9ee6269393f3c513d0431d52672e7387625856c19299cfd584e1a3f39e0f98df13c99090df9f4d5cca8446776fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000e216b0008050001692e1c390101009000000000000003e800000000000013880000004526945a00
      12534 DEBUG   gossipd: Previously-rejected announce for 514127x248x1
      10761 DEBUG   02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: Got it!
      10761 DEBUG   02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: ... , awaiting 1120
      10761 DEBUG   02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: Sending master 1020
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-01-27 09:31:02 +10:30
Rusty Russell
241324aa09 gossipd: don't shortcut dying phase for local channels.
This means that we won't complain to peers which gossip about our
channels, but it does mean that our channel graph (like other nodes on
the network) will show two channels, not one, for the duration.

For this reason, we need askrene to omit local dying channels.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-01-08 22:33:19 +10:30
Rusty Russell
d6f6d46c3b gossipd: move timestamp_reasonable into gossmap_manage.c.
It's only used in there anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-12-19 12:37:36 +01:00
Rusty Russell
4c8d656f77 gossipd: don't need hsm fd any more.
gossipd no longer makes gossip messages, and hasn't since v24.02, so it
doesn't actually need to talk to the hsm daemon.

Also, various comments were out of date, so fix those too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-12-19 12:37:36 +01:00
Rusty Russell
f0e9547661 gossipd: make sure we correctly move node announcement when *no* channel preceeds it in the gossip store.
We had the test backwards, so we moved it *all the time*.  This bloats our gossip store, as well as
not moving it in the case where we need to.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: we would occasionally not show a node announcement in listnodes().
2025-12-17 11:56:42 +10:30
Rusty Russell
8b9020d7b9 global: use clock_time in place of time_now().
Except for tracing, that sticks with time_now().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
522457a12b connectd, gossipd, pay, bcli: use timemono when solely measuring duration for timeouts.
This is immune to things like clock changes, and has the convenient side-effect that
it will *not* be overridden when we override time for developer purposes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
806dc89cad gossipd: remove --dev-gossip-time setting, we'll use CLN_DEV_SET_TIME.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
2086699b70 common: add randbytes() wrapper to override cryptographic entropy: $CLN_DEV_ENTROPY_SEED
Only in developer mode, ofc.

Notes:
1. We have to move the initialization before the lightningd main trace_start,
   since that uses pseudorand().
2. To make the results stable, we need to use per-caller values to randbytes().
   Otherwise external timing changes the call order.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
75616f6b77 common: add new_htable() macro to allocate, initialize and setup memleak coverage for any typed hash table.
You can now simply add per-tal-object helpers for memleak, but our older pattern required
calling memleak functions explicitly during memleak handling.  Hash tables in particular need
to be dynamically allocated (we override the allocators using htable_set_allocator and assume
this), so it makes sense to have a helper macro that does all three.

This eliminates a huge amount of code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-24 11:30:17 +10:30
Rusty Russell
6e5cb299dd global: remove unnecessary includes from C files.
Basically, `devtools/reduce-includes.sh */*.c`.

Build time from make clean (RUST=0) (includes building external libs):

Before:
	real    0m38.944000-40.416000(40.1131+/-0.4)s
	user    3m6.790000-17.159000(15.0571+/-2.8)s
	sys     0m35.304000-37.336000(36.8942+/-0.57)s
After:
	real    0m37.872000-39.974000(39.5466+/-0.59)s
	user    3m1.211000-14.968000(12.4556+/-3.9)s
	sys     0m35.008000-36.830000(36.4143+/-0.5)s

Build time after touch config.vars (RUST=0):

Before:
	real    0m19.831000-21.862000(21.5528+/-0.58)s
	user    2m15.361000-30.731000(28.4798+/-4.4)s
	sys     0m21.056000-22.339000(22.0346+/-0.35)s

After:
	real    0m18.384000-21.307000(20.8605+/-0.92)s
	user    2m5.585000-26.843000(23.6017+/-6.7)s
	sys     0m19.650000-22.003000(21.4943+/-0.69)s

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-23 06:44:04 +10:30
Rusty Russell
f6a4e79420 global: remove unnecessary includes from headers.
Each header should only include the other headers it needs to compile;
`devtools/reduce-includes.sh */*.h` does this.  The C files then need
additional includes if they don't compile.

And remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-23 06:44:04 +10:30
Rusty Russell
e120f87083 Makefile: create a library containing common, wire and bitcoin objects.
This means we don't have to manually choose what to link against,
which is much of the complexity of our Makefiles: the compiler will
automatically use any object files it needs to link.

We already do this for ccan as libccan.a, now we have libcommon.a.

We don't link against it for *everything*, as some tests require their own
versions.

Notes:
1. I get rid of the weird plugins/test/Makefile2 (accidental commit?)
2. Many tests change due to update-mocks.
3. In some places I added the missing dependency on the Makefile itself, though most are in the next
   patch.

Before:
	Total program size:     221366528
	Total tests size:       364243856

After:
	Total program size:     190733656
	Total tests size:       337880888

Build time from make clean (RUST=0) (includes building external libs):

Before:
	real    0m38.227000-44.245000(41.8222+/-1.6)s
	user    3m2.105000-33.696000(23.1442+/-8.4)s
	sys     0m35.054000-42.269000(39.7231+/-2)s
After:
	real    0m38.944000-40.416000(40.1131+/-0.4)s
	user    3m6.790000-17.159000(15.0571+/-2.8)s
	sys     0m35.304000-37.336000(36.8942+/-0.57)s

Build time after touch config.vars (RUST=0):

Before:
	real    0m18.928000-22.776000(21.5084+/-1.1)s
	user    2m8.613000-36.567000(27.7281+/-7.7)s
	sys     0m20.458000-23.436000(22.3963+/-0.77)s

After:
	real    0m19.831000-21.862000(21.5528+/-0.58)s
	user    2m15.361000-30.731000(28.4798+/-4.4)s
	sys     0m21.056000-22.339000(22.0346+/-0.35)s

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rusty@rusty-Framework:~/devel/cvs/lightni
2025-10-23 06:44:04 +10:30
Rusty Russell
75164d2c81 gossipd: save gossip store writes, try them again (and fsync) if we get a read issue.
This is a last resort, but what else are we supposed to do when we wrote
something and it didn't appear?

In particular, ZFS doesn't just "fix itself":

```
remaining_fd=200001b0c9761dff0000000001009411e26cd56d68aabc285ee1c8ee43d59be6f939b0ce353d80213918680a7438356b9c5ea6bb001a6
bb37a4dea93776f4abc8cd371525b4d1605a74b89d7cb1bfc8865ddf22288c7ea08b9d98b34155b4aed159eb81732957e6bf79b996752bf2a9995aae
ad1d65e7889e826ea0ba42f7746c176fe12f2fe6c04af1a74b4f0a262d20efd57133eb32693c789eb3f09caf4f4c6ecd2f734b3b36e751ffcc2748c5
8feabce4173c4ce6098a2c5397aabf1be5442cb67b5030be11ebd8b9841838dae127fe30000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000002000000a218b9d93000000001005000000000000c060
```

Note the record appended on the end *after all the zeroes*.

Changelog-Changed: gossipd: add gossip_store recovery for filesystems which do not synchronize read and write (e.g. ZFS on Linux), by disabling mmap reads and rewriting the last records.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Rusty Russell
5b3f3270f4 gossmap: use gossmap_disable_mmap() on corruption.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Rusty Russell
9fb8870f92 gossip: add COMPLETED bit to mark records which are complete.
This should detect partial writes more robustly, since we make a
separate pwrite() call to update this flag after the record is written.

Previously we were playing a bit loose with synchronization assumptions,
which seemed to work on Linux ext4, but not so well elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Rusty Russell
f882b69dfe gossipd: remove gossmap_fetch_tail.
It only gets called for diagnostics when something goes wrong (and we
were going to exit anyway), and it's only useful with mmap (which we now disable
on error) but it shouldn't crash:

```
**BROKEN** gossipd: Truncated gossmap record @7991501/7991523 (len 0): waiting
**BROKEN** gossipd: FATAL SIGNAL 6 (version v25.09)                                            
**BROKEN** gossipd: backtrace: common/daemon.c:41 (send_backtrace) 0x6506817cc529
**BROKEN** gossipd: backtrace: common/daemon.c:78 (crashdump) 0x6506817cc578
**BROKEN** gossipd: backtrace: ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 ((null)) 0x75e8267a032f
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:44 (__pthread_kill_implementation) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:78 (__pthread_kill_internal) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:89 (__GI___pthread_kill) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ../sysdeps/posix/raise.c:26 (__GI_raise) 0x75e8267a027d
**BROKEN** gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x75e8267838fe
**BROKEN** gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x75e82678381a
**BROKEN** gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x75e826796516
**BROKEN** gossipd: backtrace: common/gossmap.c:111 (map_copy) 0x6506817cea77
**BROKEN** gossipd: backtrace: common/gossmap.c:1870 (gossmap_fetch_tail) 0x6506817d1f93
**BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:1442 (gossmap_manage_get_gossmap) 0x6506817c45fb
**BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:753 (gossmap_manage_handle_get_txout_reply) 0x6506817c5850
**BROKEN** gossipd: backtrace: gossipd/gossipd.c:574 (recv_req) 0x6506817c172b
```

Reported-by: @grubles
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Dusty Daemon
07f4bc39b1 splice: Add start_batch and an internal wire type
We add `start_batch` to match t-bast’s splicing spec and we add a new internal wire type `WIRE_PROTOCOL_BATCH_ELEMENT` using the type number 0

Changelog-Added: support for `start_batch`
2025-08-14 16:40:04 +09:30
Matt Whitlock
3f6cd59dc9 gossipd: check for existing channel announcement before sigcheck
Checking a signature is a CPU-intensive operation that should be performed only
if gossmap doesn't already have the channel announcement in question and we're
not already checking for the announcement's UTxO.

Changelog-Fixed: `gossipd` doesn't waste CPU cycles checking signatures on channel announcements that are already known
Issue: https://github.com/ElementsProject/lightning/issues/7972
2025-06-10 16:40:33 -05:00
Rusty Russell
6b77c95b95 gossipd: don't spam the log on duplicate channel_update.
This message was too verbose (even for trace!)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-05-15 16:40:33 +09:30
Rusty Russell
9a967d6770 gossipd: don't try to connect to ourselves if we need more peers.
Reported-by: JssDWt
Closes: https://github.com/ElementsProject/lightning/issues/8200
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-None: trivial
2025-05-08 15:30:14 -05:00
Rusty Russell
fff054f8ee gossipd: fix false memleak positive in gossmap_manage
If there are pending channel announcements, they'll look like a leak unless we scan into
the maps.

```
lightningd-2 2025-05-01T07:27:03.922Z **BROKEN** gossipd: MEMLEAK: 0x60d000000478
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:   label=gossipd/gossmap_manage.c:595:struct pending_cannounce
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:   alloc:
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:488 (tal_alloc_)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:595 (gossmap_manage_channel_announcement)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/gossipd/gossipd.c:205 (handle_recv_gossip)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/gossipd/gossipd.c:300 (connectd_req)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/common/daemon_conn.c:35 (handle_read)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:60 (next_plan)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:422 (do_plan)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:439 (io_ready)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:455 (io_loop)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/gossipd/gossipd.c:660 (main)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main)
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd:     ../csu/libc-start.c:392 (__libc_start_main_impl)
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd:   parents:
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd:     gossipd/gossmap_manage.c:475:struct gossmap_manage
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-05-08 14:01:38 +09:30
Rusty Russell
733efcf7dd BOLTs: import spec additions for option_simple_close.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-03-18 14:30:58 +10:30
Rusty Russell
67c91a7e5c BOLTs: Update to version with peer storage merged.
Unfortunately a spec typo means the data fields are missing (PR pending),
so we still patch those in.

The message "your_peer_storage" got renamed to "peer_storage_retrieval",
and the option "want_peer_backup_storage" was removed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `experimental-peer-storage` now only advertizes feature 43, not 41.
2025-03-18 14:30:58 +10:30
Alex Myers
1e5a577fb2 gossipd: fix typo
Changelog-None
2025-03-03 12:25:26 -06:00
Alex Myers
4f7df828b4 gossipd, chanbackup: reduce logging levels
The vast majority of incoming channel updates seem to be cut due
to age, which results in noisy logs.  Similarly, the chanbackup
logging verbosity might better match the equivalent actions in
channeld, which are at the debug level.

Fixes: #8058

Changelog-None: introduced in 25.02
2025-02-26 14:15:13 +10:30
Rusty Russell
d554206d7e gossipd: fix bogus message when dying channel is pruned.
```
2025-01-23T12:31:52.528Z DEBUG   gossipd: Pruning channel 839050x1246x0 from network view (ages 1736283379 and 1737600120)
2025-01-27T00:32:01.631Z DEBUG   gossipd: Pruning channel 839050x1246x0 from network view (ages 0 and 1737686520)
2025-01-27T00:50:05.998Z **BROKEN** gossipd: Dying channel 839050x1246x0 already deleted?
```

Easiest not to prune in this case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
2b4b1479ed gossipd: check that gossmap code sees updates from gossip_store writes.
After analyzing various weird cases where we ended up with duplicate
gossip_store entries, it could be explained by us not fully processing
the gossip store.

It's not clear that my assumptions that we would always see our own writes
are true: technically this may require an fsync().  So we now add the
check, and do an fsync and try again.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
2025-02-11 15:11:47 -06:00
Rusty Russell
8156c83e11 gossipd: check that we are always appending.
We had at least one report of overwriting the gossip_store file at
offset 1.  Make sure this doesn't happen.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
769ccaa4c3 gossipd: correctly process dying channels.
Found by inspection.  Minor bug, since we'll catch it on the next block,
but annoying.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
1df1300cc9 gossip_store: don't need to check for truncated amounts.
That's actually caught by the gossmap load now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
9d98740e18 gossmap: stricter checks when gossipd itself loads the gossip_store.
This means we will correctly reset the store if it has redundant
records, for example.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
4f2a7039c6 gossipd: put gossip_store pointer inside gossmap_manage.
It's actually the only one that uses it.  We also tweak the way
gossip_store handles failure: gossmap_manage now tells it when to
reset the corrupted store.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
fdfc7ce62f gossmap: add (and use) logging hook.
Default goes to stderr for LOG_UNUSUAL and higher.

We have to whitelist more cases in map_catchup so we don't spam the logs
with perfectly-expected (but ignored) messages though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
607b14fe12 common/gossmap: remove open-by-fd.
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our zfs problem
anyway.

In fact, it fixes a race if we don't have HAVE_PWRITEV.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
01650ebcd7 gossipd: make sure we never write bad entries.
We have reports of crashes on reading gossip_store, including from gossipd itself!

```
lightning_gossipd: common/gossmap.c:121: map_copy: Assertion `offset + len <= map->map_size' failed.
...
lightning_gossipd: FATAL SIGNAL (version v24.11)
0x6260c41d682a send_backtrace
  common/daemon.c:33
0x6260c41e098b status_failed
  common/status.c:221
0x6260c41e0b41 status_backtrace_exit
  common/subdaemon.c:18
0x6260c41d68b8 crashdump
  common/daemon.c:78
0x70508ea6913f ???
  ???:0
0x70508e8a0d51 ???
  ???:0
0x70508e88a536 ???
  ???:0
0x70508e88a40e ???
  ???:0
0x70508e8996d1 ???
  ???:0
0x6260c41d8b69 map_copy
  common/gossmap.c:121
0x6260c41d8bab map_be16
  common/gossmap.c:142
0x6260c41daa45 map_catchup
  common/gossmap.c:705
0x6260c41dab95 gossmap_refresh_mayfail
  common/gossmap.c:1192
0x6260c41daca6 gossmap_refresh
  common/gossmap.c:1213
0x6260c41cee32 gossmap_manage_get_gossmap
  gossipd/gossmap_manage.c:1314
0x6260c41d0686 gossmap_manage_new_block
  gossipd/gossmap_manage.c:1221
0x6260c41cbfdd new_blockheight
  gossipd/gossipd.c:473
0x6260c41cc363 recv_req
  gossipd/gossipd.c:584
0x6260c41d6b1d handle_read
  common/daemon_conn.c:35
0x6260c43175b5 next_plan
  ccan/ccan/io/io.c:60
0x6260c4317a40 do_plan
  ccan/ccan/io/io.c:422
0x6260c4317af9 io_ready
  ccan/ccan/io/io.c:439
0x6260c4319446 io_loop
  ccan/ccan/io/poll.c:455
0x6260c41cccf4 main
  gossipd/gossipd.c:665
```

This implies that we have a message shorter than 2 bytes, which should never happen.

An audit didn't shed any light, but let's make sure we don't ever write such a thing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00