185 Commits

Author SHA1 Message Date
Rusty Russell
15696d97bd gossipd: code to invoke compactd and reopen store.
This isn't called anywhere yet.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
1fb4da075f gossipd: put the last_writes array inside struct gossip_store.
This is the file responsible for all the writing, so it should be
responsible for the rewriting if necessary (rather than
gossmap_manage).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
445bcd040a gossipd: don't compact on startup.
We now only need to walk it if we're doing an upgrade.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `gossipd` no longer compacts gossip_store on startup (improving start times significantly).
2026-02-16 17:23:33 +10:30
Rusty Russell
dfc4ce21de gossipd: don't gather dying channels during compaction.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
e8fd235d4e common: move gossip_store_wire.csv into common/ from gossipd/
It's used by common/gossip_store.c, which is used by many things other than
gossipd.  This file belongs in common.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
5dcf39867c gossipd: write uuid record on startup.
This is the first record, and ignored by everything else.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
b1055aa0ac gossip_store: add UUID entry at front of the store.
We also put this in the store_ended message, too: so you can
tell if the equivalent_offset there really refers to this new
entry (or if two or more rewrites have happened).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
ae957161e6 pytest: fix bogus test_gossip_store_compact_noappend test.
It didn't do anything, since the dev_compact_gossip_store command was
removed.  When we make it do something, it crashes since old_len is 0:

```
gossipd: gossip_store_compact: bad version
gossipd: FATAL SIGNAL 6 (version v25.12rc3-1-g9e6c715-modded)
...
gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7119bd8288fe
gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7119bd82881a
gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7119bd83b516
gossipd: backtrace: gossipd/gossip_store.c:52 (append_msg) 0x56294de240eb
gossipd: backtrace: gossipd/gossip_store.c:358 (gossip_store_compact) 0x56294
gossipd: backtrace: gossipd/gossip_store.c:395 (gossip_store_new) 0x56294de24
gossipd: backtrace: gossipd/gossmap_manage.c:455 (setup_gossmap) 0x56294de255
gossipd: backtrace: gossipd/gossmap_manage.c:488 (gossmap_manage_new) 0x56294
gossipd: backtrace: gossipd/gossipd.c:400 (gossip_init) 0x56294de22de9
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-02-16 17:23:33 +10:30
Rusty Russell
8b9020d7b9 global: use clock_time in place of time_now().
Except for tracing, that sticks with time_now().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
522457a12b connectd, gossipd, pay, bcli: use timemono when solely measuring duration for timeouts.
This is immune to things like clock changes, and has the convenient side-effect that
it will *not* be overridden when we override time for developer purposes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
6e5cb299dd global: remove unnecessary includes from C files.
Basically, `devtools/reduce-includes.sh */*.c`.

Build time from make clean (RUST=0) (includes building external libs):

Before:
	real    0m38.944000-40.416000(40.1131+/-0.4)s
	user    3m6.790000-17.159000(15.0571+/-2.8)s
	sys     0m35.304000-37.336000(36.8942+/-0.57)s
After:
	real    0m37.872000-39.974000(39.5466+/-0.59)s
	user    3m1.211000-14.968000(12.4556+/-3.9)s
	sys     0m35.008000-36.830000(36.4143+/-0.5)s

Build time after touch config.vars (RUST=0):

Before:
	real    0m19.831000-21.862000(21.5528+/-0.58)s
	user    2m15.361000-30.731000(28.4798+/-4.4)s
	sys     0m21.056000-22.339000(22.0346+/-0.35)s

After:
	real    0m18.384000-21.307000(20.8605+/-0.92)s
	user    2m5.585000-26.843000(23.6017+/-6.7)s
	sys     0m19.650000-22.003000(21.4943+/-0.69)s

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-23 06:44:04 +10:30
Rusty Russell
f6a4e79420 global: remove unnecessary includes from headers.
Each header should only include the other headers it needs to compile;
`devtools/reduce-includes.sh */*.h` does this.  The C files then need
additional includes if they don't compile.

And remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-23 06:44:04 +10:30
Rusty Russell
75164d2c81 gossipd: save gossip store writes, try them again (and fsync) if we get a read issue.
This is a last resort, but what else are we supposed to do when we wrote
something and it didn't appear?

In particular, ZFS doesn't just "fix itself":

```
remaining_fd=200001b0c9761dff0000000001009411e26cd56d68aabc285ee1c8ee43d59be6f939b0ce353d80213918680a7438356b9c5ea6bb001a6
bb37a4dea93776f4abc8cd371525b4d1605a74b89d7cb1bfc8865ddf22288c7ea08b9d98b34155b4aed159eb81732957e6bf79b996752bf2a9995aae
ad1d65e7889e826ea0ba42f7746c176fe12f2fe6c04af1a74b4f0a262d20efd57133eb32693c789eb3f09caf4f4c6ecd2f734b3b36e751ffcc2748c5
8feabce4173c4ce6098a2c5397aabf1be5442cb67b5030be11ebd8b9841838dae127fe30000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000002000000a218b9d93000000001005000000000000c060
```

Note the record appended on the end *after all the zeroes*.

Changelog-Changed: gossipd: add gossip_store recovery for filesystems which do not synchronize read and write (e.g. ZFS on Linux), by disabling mmap reads and rewriting the last records.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Rusty Russell
9fb8870f92 gossip: add COMPLETED bit to mark records which are complete.
This should detect partial writes more robustly, since we make a
separate pwrite() call to update this flag after the record is written.

Previously we were playing a bit loose with synchronization assumptions,
which seemed to work on Linux ext4, but not so well elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Rusty Russell
2b4b1479ed gossipd: check that gossmap code sees updates from gossip_store writes.
After analyzing various weird cases where we ended up with duplicate
gossip_store entries, it could be explained by us not fully processing
the gossip store.

It's not clear that my assumptions that we would always see our own writes
are true: technically this may require an fsync().  So we now add the
check, and do an fsync and try again.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
2025-02-11 15:11:47 -06:00
Rusty Russell
8156c83e11 gossipd: check that we are always appending.
We had at least one report of overwriting the gossip_store file at
offset 1.  Make sure this doesn't happen.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
1df1300cc9 gossip_store: don't need to check for truncated amounts.
That's actually caught by the gossmap load now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
4f2a7039c6 gossipd: put gossip_store pointer inside gossmap_manage.
It's actually the only one that uses it.  We also tweak the way
gossip_store handles failure: gossmap_manage now tells it when to
reset the corrupted store.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
607b14fe12 common/gossmap: remove open-by-fd.
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our zfs problem
anyway.

In fact, it fixes a race if we don't have HAVE_PWRITEV.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
01650ebcd7 gossipd: make sure we never write bad entries.
We have reports of crashes on reading gossip_store, including from gossipd itself!

```
lightning_gossipd: common/gossmap.c:121: map_copy: Assertion `offset + len <= map->map_size' failed.
...
lightning_gossipd: FATAL SIGNAL (version v24.11)
0x6260c41d682a send_backtrace
  common/daemon.c:33
0x6260c41e098b status_failed
  common/status.c:221
0x6260c41e0b41 status_backtrace_exit
  common/subdaemon.c:18
0x6260c41d68b8 crashdump
  common/daemon.c:78
0x70508ea6913f ???
  ???:0
0x70508e8a0d51 ???
  ???:0
0x70508e88a536 ???
  ???:0
0x70508e88a40e ???
  ???:0
0x70508e8996d1 ???
  ???:0
0x6260c41d8b69 map_copy
  common/gossmap.c:121
0x6260c41d8bab map_be16
  common/gossmap.c:142
0x6260c41daa45 map_catchup
  common/gossmap.c:705
0x6260c41dab95 gossmap_refresh_mayfail
  common/gossmap.c:1192
0x6260c41daca6 gossmap_refresh
  common/gossmap.c:1213
0x6260c41cee32 gossmap_manage_get_gossmap
  gossipd/gossmap_manage.c:1314
0x6260c41d0686 gossmap_manage_new_block
  gossipd/gossmap_manage.c:1221
0x6260c41cbfdd new_blockheight
  gossipd/gossipd.c:473
0x6260c41cc363 recv_req
  gossipd/gossipd.c:584
0x6260c41d6b1d handle_read
  common/daemon_conn.c:35
0x6260c43175b5 next_plan
  ccan/ccan/io/io.c:60
0x6260c4317a40 do_plan
  ccan/ccan/io/io.c:422
0x6260c4317af9 io_ready
  ccan/ccan/io/io.c:439
0x6260c4319446 io_loop
  ccan/ccan/io/poll.c:455
0x6260c41cccf4 main
  gossipd/gossipd.c:665
```

This implies that we have a message shorter than 2 bytes, which should never happen.

An audit didn't shed any light, but let's make sure we don't ever write such a thing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
ebf784ef9c gossipd: use u64 for the one offset we don't.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-10-08 09:50:17 +02:00
Rusty Russell
744116e501 gossipd: make extra-sure we don't put in redundant channel_announcement messages.
We only write these in two places: one where we get a message from lightningd about
our own channel, and one where we get a reply from lightningd about a txout check.

The former case we explicitly check that we don't already have it in gossmap, so
add checks to the latter case, and give verbose detail if it's found.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-05-23 20:23:36 +02:00
Rusty Russell
1350194698 gossipd: don't assert on redundant flag write, just log message.
This happens to Vincenzo, and I think it's due to previous gossip_store issues.

Fixes: https://github.com/ElementsProject/lightning/issues/7051
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-12 18:05:06 +01:00
Rusty Russell
ca5b7b00b6 gossipd: clean up flags accessors.
We want to be able to clear them, and fetch them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-12 11:43:33 +01:00
Rusty Russell
6a02cfccd7 gossipd: simplify gossip store API.
Instead of "new" and "load", we don't really need to "load" anything,
so do everything in gossip_store_new.

Have it do the compaction/rewrite, and collect the dying records
2024-02-04 09:24:44 +10:30
Rusty Russell
c49fb2edd5 gossipd: clean up gossip_store offsets.
gossmap offsets are to the beginning of the message, whereas
the gossip_store uses the header offset.  Convert the internals
of gossip_store to use gossmap-style uniformly, even where it's
a little less convenient.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
7efa0a46a9 gossipd: clean up gossip_store routines.
We don't use the dying flag, and we can manually append the
addendum rather than having gossip_store_add present a
bizarre interface.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
f7b7cf3719 gossipd: remove routing.c and other unused functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
c286241ab3 gossipd: switch over to using gossmap_manage, not routing.c.
The gossip_store_load is now basically a noop, since gossmap
does that.

gossipd removes a pile of routines dealing with messages,
in favor of just handing them to gossmap_manage.

The stub gossmap_manage constructor is removed entirely.

We simplified behaviour around channel_announcements with
no channel update: we now add them to the store, and go
back to fix the timestamp later.  This changes a test,
which explicitly tests for the old behaviour.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
1384d56db3 gossmap_manage: new file for managing the gossip store.
This is a fair amount of code, but much is taken from
the old routing.c, with the difference that this uses
common/gossmap instead of our own structures.

The interfaces are fairly clear:

1. gossmap_manage_new - allocator
2. gossmap_manage_channel_announcement
   - handle new channel announcement msg
   - if too early, keeps it in early map
   - queues it, asks lightingd about UTXO.
3. gossmap_manage_handle_get_txout_reply
   - handle response from lightningd for above.
4. gossmap_manage_channel_update
   - handle channel_update message
   - may have to wait on pending channel_announcement
5. gossmap_manage_node_announcement
   - handle node_announcement msg
   - may have to wait on pending channel_announcement
6. gossmap_manage_new_block
   - see if early announces can now be processed.
7. gossmap_manage_channel_spent
   - lightningd tells us UTXO is spent
   - may prepare channel for closing in 12 blocks.
8. gossmap_manage_channel_dying
   - gossip_store load tells us channel was spent earlier.
   - like gossmap_manage_channel_spent, but maybe < 12.
9. gossmap_manage_get_gossmap
   - gossmap accessor: seeker and queries will need this.
10. gossmap_manage_new_peer
   - a new peer has connected, give them all our gossip.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
07cd4a809b gossipd: remove spam handling.
We weakened this progressively over time, and gossip v1.5 makes spam
impossible by protocol, so we can wait until then.

Removing this code simplifies things a great deal!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Protocol: we no longer ratelimit gossip messages by channel, making our code far simpler.
2024-02-04 09:24:44 +10:30
Rusty Russell
e7ceffd565 gossipd: remove zombie handling.
We never enabled it, because we seemed to be eliminating valid
channels.  We discard zombie-marked records on loading.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
8cfbc1e7ad gossipd: make gossip_store hold daemon ptr, not rstate.
Makes it easier to wean off routing.c.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
7f5fe52320 gossipd: remove online gossip_store compaction.
It was an obscure dev command, as it never worked reliably.
It would be much easier to re-implement once this is done.

This turned out to reveal a tiny leak on
tests/test_gossip.py::test_gossip_store_load_amount_truncated where we
didn't immedately free chan_ann if it was dangling.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
561859da0c gossipd: move tell_lightningd_peer_update from routing.c into gossipd.c
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
870c996628 gossipd: new routines to support gossmap compatibility.
gossip_store_del - takes a gossmap-style offset-of-msg not offset-of-hdr.
gossip_store_flag: set an arbitrary flag on a gossip_store hdr.
gossip_store_get_timestamp/gossip_store_set_timestamp: access gossip_store hdr.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-02-04 09:24:44 +10:30
Rusty Russell
2d15745f9e gossipd: don't put private channel info into store at all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-01-31 14:47:33 +10:30
Rusty Russell
8db7adc76f gossipd: no longer take private channel updates from lightningd
Lightningd now handles private channels, so we're dismantling the
gossipd infrastructure.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-01-31 14:47:33 +10:30
Rusty Russell
f2f43eeffa gossipd: strip private updates from gossip_store on startup.
We rename them to _obs, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-01-31 14:47:33 +10:30
Rusty Russell
1fc603ea6e gossipd: remove #if DEVELOPER in favor of runtime flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-09-21 20:08:24 +09:30
Rusty Russell
846cec4f2a gossipd: ignore redundant node_announcement in gossip_store.
Don't know how this is happening, but it is not harmful to ignore it for now.

Fixes: #6531
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-08-11 12:38:07 +09:30
Alex Myers
4a4da00d28 gossipd: handle upgrade from version 11 gossip_store 2023-07-27 06:41:44 +09:30
Rusty Russell
af7e641445 gossipd: don't "unmark" dying channels' updates if we receive them.
This looked like a test flake, but was real:

```
        l1.daemon.wait_for_log("closing soon due to the funding outpoint being spent")
    
        # We won't gossip the dead channel any more (but we still propagate node_announcement).  But connectd is not explicitly synced, so wait for "a bit".
        time.sleep(1)
>       assert len(get_gossip(l1)) == 2
E       assert 4 == 2
```

We can see that two channel_updates come in *after* we mark it dying:

```
gossipd: channel 103x1x0 closing soon due to the funding outpoint being spent
gossipd: REPLY WIRE_GOSSIPD_NEW_BLOCKHEIGHT_REPLY with 0 fds
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Received channel_update for channel 103x1x0/0 now DISABLED
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Received channel_update for channel 103x1x0/1 now DISABLED
```

We should keep marking channel_updates the same way.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-07-25 10:11:05 +09:30
Rusty Russell
d16797ce58 gossipd: add dying marker to channel_announcement/channel_update.
We don't actually delete them for 12 blocks, but we can't avoid
propagating them.  We don't mark node_announcements, which is a bit
weird, but avoids us tracking logic to un-dying them if a channel is
opened.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-07-20 11:47:32 +09:30
Rusty Russell
01670d5e5e gossipd: use htable, not linked list for peers.
This speeds up nodeid lookups, which is useful for the next simplification.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-07-09 16:49:48 +09:30
Alex Myers
40a50c1ada gossipd: don't fail on gossip deletion
Reported in #6270, there was an attempt to delete gossip overrunning
the end of the gossip_store. This logs the gossip type that was attempted to be deleted and avoids an immediate crash (tombstones would be fine to
skip over at least.)

Changelog-None
2023-06-02 12:02:39 +09:30
Alex Myers
54bd024910 gossip_store: remove now-redundant push bit
The push bit was convenient for connectd to send our own gossip
to peers upon connecting by naively traversing the gossip_store
and sending anything flagged `push`.  This function is now
performed by gossipd leaving no use for the push bit.

Changelog-Changed: `gossipd`: gossip_store PUSH bit is no longer set.
2023-04-13 08:48:50 -07:00
Rusty Russell
aaa14846c6 gossipd: ignore zombie flag when loading gossip_store.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-03-06 16:15:22 -06:00
Alex Myers
d1402e06f9 gossipd: load and store node_announcements correctly
Loading the gossip_store would not create a pending node announcement
when the node already had a zombie channel.  This would cause the node
announcement to attempt to be loaded, but fail because it had no
broadcastable channels.  Accepting a pending node announcement as when
normally loading from the channel corrects this.
`node_has_public_channels` taking into account zombie channels enables
this behavior.

Separately, node_announcements were still being flagged as zombies
in the gossip store despite that feature being removed.

Changelog-None
2023-03-01 15:36:13 -06:00
Alex Myers
d5246e43bb gossipd: flag zombie channels when loading from gossip_store
Without inheriting zombie status, gossipd would allow regular channel updates
into the store until the pruning cycle hits (and the channel is properly
flagged) which is 3.5 days. Applying zombie status when reading channel
updates from the store prevents this.

Changelog-None
2023-03-01 15:36:13 -06:00