Commit Graph

1212 Commits

Author SHA1 Message Date
Rusty Russell
e383e14cb3 connectd: don't be That Node when someone is gossipping crap.
As seen in my logs, we complain about nodes a *lot* (Hi old CLN!).

```
===>1589311 DEBUG   02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-connectd: peer_out WIRE_WARNING
     139993 DEBUG   lightningd: fixup_scan: block 786151 with 1203 txs
      55388 DEBUG   plugin-bcli: Log pruned 1001 entries (mem 10508118 -> 10298662)
      33000 DEBUG   gossipd: Unreasonable timestamp in 0102000a38ec41f9137a5a560dac6effbde059c12cb727344821cbdd4ef46964a4791a0f67cd997499a6062fc8b4284bf1b47a91541fd0e65129505f02e4d08542b16fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000d9d56000ba40001690fe262010100900000000000000001000003e8000001f30000000000989680
      23515 DEBUG   hsmd: Client: Received message 14 from client
      22269 DEBUG   024b9a1fa8e006f1e3937f65f66c408e6da8e1ca728ea43222a7381df1cc449605-hsmd: Got WIRE_HSMD_ECDH_REQ
      14409 DEBUG   gossipd: Enqueueing update for announce 0102002f7e4b4deb19947c67292e70cb22f7fac837fa9ee6269393f3c513d0431d52672e7387625856c19299cfd584e1a3f39e0f98df13c99090df9f4d5cca8446776fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000e216b0008050001692e1c390101009000000000000003e800000000000013880000004526945a00
      12534 DEBUG   gossipd: Previously-rejected announce for 514127x248x1
      10761 DEBUG   02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: Got it!
      10761 DEBUG   02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: ... , awaiting 1120
      10761 DEBUG   02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: Sending master 1020
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-01-27 09:31:02 +10:30
Rusty Russell
241324aa09 gossipd: don't shortcut dying phase for local channels.
This means that we won't complain to peers which gossip about our
channels, but it does mean that our channel graph (like other nodes on
the network) will show two channels, not one, for the duration.

For this reason, we need askrene to omit local dying channels.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2026-01-08 22:33:19 +10:30
Rusty Russell
d6f6d46c3b gossipd: move timestamp_reasonable into gossmap_manage.c.
It's only used in there anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-12-19 12:37:36 +01:00
Rusty Russell
4c8d656f77 gossipd: don't need hsm fd any more.
gossipd no longer makes gossip messages, and hasn't since v24.02, so it
doesn't actually need to talk to the hsm daemon.

Also, various comments were out of date, so fix those too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-12-19 12:37:36 +01:00
Rusty Russell
f0e9547661 gossipd: make sure we correctly move node announcement when *no* channel preceeds it in the gossip store.
We had the test backwards, so we moved it *all the time*.  This bloats our gossip store, as well as
not moving it in the case where we need to.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: we would occasionally not show a node announcement in listnodes().
2025-12-17 11:56:42 +10:30
Rusty Russell
8b9020d7b9 global: use clock_time in place of time_now().
Except for tracing, that sticks with time_now().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
522457a12b connectd, gossipd, pay, bcli: use timemono when solely measuring duration for timeouts.
This is immune to things like clock changes, and has the convenient side-effect that
it will *not* be overridden when we override time for developer purposes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
806dc89cad gossipd: remove --dev-gossip-time setting, we'll use CLN_DEV_SET_TIME.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
2086699b70 common: add randbytes() wrapper to override cryptographic entropy: $CLN_DEV_ENTROPY_SEED
Only in developer mode, ofc.

Notes:
1. We have to move the initialization before the lightningd main trace_start,
   since that uses pseudorand().
2. To make the results stable, we need to use per-caller values to randbytes().
   Otherwise external timing changes the call order.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-13 21:21:29 +10:30
Rusty Russell
75616f6b77 common: add new_htable() macro to allocate, initialize and setup memleak coverage for any typed hash table.
You can now simply add per-tal-object helpers for memleak, but our older pattern required
calling memleak functions explicitly during memleak handling.  Hash tables in particular need
to be dynamically allocated (we override the allocators using htable_set_allocator and assume
this), so it makes sense to have a helper macro that does all three.

This eliminates a huge amount of code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-24 11:30:17 +10:30
Rusty Russell
6e5cb299dd global: remove unnecessary includes from C files.
Basically, `devtools/reduce-includes.sh */*.c`.

Build time from make clean (RUST=0) (includes building external libs):

Before:
	real    0m38.944000-40.416000(40.1131+/-0.4)s
	user    3m6.790000-17.159000(15.0571+/-2.8)s
	sys     0m35.304000-37.336000(36.8942+/-0.57)s
After:
	real    0m37.872000-39.974000(39.5466+/-0.59)s
	user    3m1.211000-14.968000(12.4556+/-3.9)s
	sys     0m35.008000-36.830000(36.4143+/-0.5)s

Build time after touch config.vars (RUST=0):

Before:
	real    0m19.831000-21.862000(21.5528+/-0.58)s
	user    2m15.361000-30.731000(28.4798+/-4.4)s
	sys     0m21.056000-22.339000(22.0346+/-0.35)s

After:
	real    0m18.384000-21.307000(20.8605+/-0.92)s
	user    2m5.585000-26.843000(23.6017+/-6.7)s
	sys     0m19.650000-22.003000(21.4943+/-0.69)s

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-23 06:44:04 +10:30
Rusty Russell
f6a4e79420 global: remove unnecessary includes from headers.
Each header should only include the other headers it needs to compile;
`devtools/reduce-includes.sh */*.h` does this.  The C files then need
additional includes if they don't compile.

And remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-23 06:44:04 +10:30
Rusty Russell
e120f87083 Makefile: create a library containing common, wire and bitcoin objects.
This means we don't have to manually choose what to link against,
which is much of the complexity of our Makefiles: the compiler will
automatically use any object files it needs to link.

We already do this for ccan as libccan.a, now we have libcommon.a.

We don't link against it for *everything*, as some tests require their own
versions.

Notes:
1. I get rid of the weird plugins/test/Makefile2 (accidental commit?)
2. Many tests change due to update-mocks.
3. In some places I added the missing dependency on the Makefile itself, though most are in the next
   patch.

Before:
	Total program size:     221366528
	Total tests size:       364243856

After:
	Total program size:     190733656
	Total tests size:       337880888

Build time from make clean (RUST=0) (includes building external libs):

Before:
	real    0m38.227000-44.245000(41.8222+/-1.6)s
	user    3m2.105000-33.696000(23.1442+/-8.4)s
	sys     0m35.054000-42.269000(39.7231+/-2)s
After:
	real    0m38.944000-40.416000(40.1131+/-0.4)s
	user    3m6.790000-17.159000(15.0571+/-2.8)s
	sys     0m35.304000-37.336000(36.8942+/-0.57)s

Build time after touch config.vars (RUST=0):

Before:
	real    0m18.928000-22.776000(21.5084+/-1.1)s
	user    2m8.613000-36.567000(27.7281+/-7.7)s
	sys     0m20.458000-23.436000(22.3963+/-0.77)s

After:
	real    0m19.831000-21.862000(21.5528+/-0.58)s
	user    2m15.361000-30.731000(28.4798+/-4.4)s
	sys     0m21.056000-22.339000(22.0346+/-0.35)s

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rusty@rusty-Framework:~/devel/cvs/lightni
2025-10-23 06:44:04 +10:30
Rusty Russell
75164d2c81 gossipd: save gossip store writes, try them again (and fsync) if we get a read issue.
This is a last resort, but what else are we supposed to do when we wrote
something and it didn't appear?

In particular, ZFS doesn't just "fix itself":

```
remaining_fd=200001b0c9761dff0000000001009411e26cd56d68aabc285ee1c8ee43d59be6f939b0ce353d80213918680a7438356b9c5ea6bb001a6
bb37a4dea93776f4abc8cd371525b4d1605a74b89d7cb1bfc8865ddf22288c7ea08b9d98b34155b4aed159eb81732957e6bf79b996752bf2a9995aae
ad1d65e7889e826ea0ba42f7746c176fe12f2fe6c04af1a74b4f0a262d20efd57133eb32693c789eb3f09caf4f4c6ecd2f734b3b36e751ffcc2748c5
8feabce4173c4ce6098a2c5397aabf1be5442cb67b5030be11ebd8b9841838dae127fe30000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000002000000a218b9d93000000001005000000000000c060
```

Note the record appended on the end *after all the zeroes*.

Changelog-Changed: gossipd: add gossip_store recovery for filesystems which do not synchronize read and write (e.g. ZFS on Linux), by disabling mmap reads and rewriting the last records.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Rusty Russell
5b3f3270f4 gossmap: use gossmap_disable_mmap() on corruption.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Rusty Russell
9fb8870f92 gossip: add COMPLETED bit to mark records which are complete.
This should detect partial writes more robustly, since we make a
separate pwrite() call to update this flag after the record is written.

Previously we were playing a bit loose with synchronization assumptions,
which seemed to work on Linux ext4, but not so well elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Rusty Russell
f882b69dfe gossipd: remove gossmap_fetch_tail.
It only gets called for diagnostics when something goes wrong (and we
were going to exit anyway), and it's only useful with mmap (which we now disable
on error) but it shouldn't crash:

```
**BROKEN** gossipd: Truncated gossmap record @7991501/7991523 (len 0): waiting
**BROKEN** gossipd: FATAL SIGNAL 6 (version v25.09)                                            
**BROKEN** gossipd: backtrace: common/daemon.c:41 (send_backtrace) 0x6506817cc529
**BROKEN** gossipd: backtrace: common/daemon.c:78 (crashdump) 0x6506817cc578
**BROKEN** gossipd: backtrace: ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 ((null)) 0x75e8267a032f
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:44 (__pthread_kill_implementation) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:78 (__pthread_kill_internal) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ./nptl/pthread_kill.c:89 (__GI___pthread_kill) 0x75e8267f9b2c
**BROKEN** gossipd: backtrace: ../sysdeps/posix/raise.c:26 (__GI_raise) 0x75e8267a027d
**BROKEN** gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x75e8267838fe
**BROKEN** gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x75e82678381a
**BROKEN** gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x75e826796516
**BROKEN** gossipd: backtrace: common/gossmap.c:111 (map_copy) 0x6506817cea77
**BROKEN** gossipd: backtrace: common/gossmap.c:1870 (gossmap_fetch_tail) 0x6506817d1f93
**BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:1442 (gossmap_manage_get_gossmap) 0x6506817c45fb
**BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:753 (gossmap_manage_handle_get_txout_reply) 0x6506817c5850
**BROKEN** gossipd: backtrace: gossipd/gossipd.c:574 (recv_req) 0x6506817c172b
```

Reported-by: @grubles
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-10-01 13:29:33 +09:30
Dusty Daemon
07f4bc39b1 splice: Add start_batch and an internal wire type
We add `start_batch` to match t-bast’s splicing spec and we add a new internal wire type `WIRE_PROTOCOL_BATCH_ELEMENT` using the type number 0

Changelog-Added: support for `start_batch`
2025-08-14 16:40:04 +09:30
Matt Whitlock
3f6cd59dc9 gossipd: check for existing channel announcement before sigcheck
Checking a signature is a CPU-intensive operation that should be performed only
if gossmap doesn't already have the channel announcement in question and we're
not already checking for the announcement's UTxO.

Changelog-Fixed: `gossipd` doesn't waste CPU cycles checking signatures on channel announcements that are already known
Issue: https://github.com/ElementsProject/lightning/issues/7972
2025-06-10 16:40:33 -05:00
Rusty Russell
6b77c95b95 gossipd: don't spam the log on duplicate channel_update.
This message was too verbose (even for trace!)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-05-15 16:40:33 +09:30
Rusty Russell
9a967d6770 gossipd: don't try to connect to ourselves if we need more peers.
Reported-by: JssDWt
Closes: https://github.com/ElementsProject/lightning/issues/8200
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-None: trivial
2025-05-08 15:30:14 -05:00
Rusty Russell
fff054f8ee gossipd: fix false memleak positive in gossmap_manage
If there are pending channel announcements, they'll look like a leak unless we scan into
the maps.

```
lightningd-2 2025-05-01T07:27:03.922Z **BROKEN** gossipd: MEMLEAK: 0x60d000000478
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:   label=gossipd/gossmap_manage.c:595:struct pending_cannounce
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:   alloc:
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:488 (tal_alloc_)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:595 (gossmap_manage_channel_announcement)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/gossipd/gossipd.c:205 (handle_recv_gossip)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/gossipd/gossipd.c:300 (connectd_req)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/common/daemon_conn.c:35 (handle_read)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:60 (next_plan)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:422 (do_plan)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:439 (io_ready)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:455 (io_loop)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     /home/runner/work/lightning/lightning/gossipd/gossipd.c:660 (main)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd:     ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main)
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd:     ../csu/libc-start.c:392 (__libc_start_main_impl)
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd:   parents:
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd:     gossipd/gossmap_manage.c:475:struct gossmap_manage
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-05-08 14:01:38 +09:30
Rusty Russell
733efcf7dd BOLTs: import spec additions for option_simple_close.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-03-18 14:30:58 +10:30
Rusty Russell
67c91a7e5c BOLTs: Update to version with peer storage merged.
Unfortunately a spec typo means the data fields are missing (PR pending),
so we still patch those in.

The message "your_peer_storage" got renamed to "peer_storage_retrieval",
and the option "want_peer_backup_storage" was removed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `experimental-peer-storage` now only advertizes feature 43, not 41.
2025-03-18 14:30:58 +10:30
Alex Myers
1e5a577fb2 gossipd: fix typo
Changelog-None
2025-03-03 12:25:26 -06:00
Alex Myers
4f7df828b4 gossipd, chanbackup: reduce logging levels
The vast majority of incoming channel updates seem to be cut due
to age, which results in noisy logs.  Similarly, the chanbackup
logging verbosity might better match the equivalent actions in
channeld, which are at the debug level.

Fixes: #8058

Changelog-None: introduced in 25.02
2025-02-26 14:15:13 +10:30
Rusty Russell
d554206d7e gossipd: fix bogus message when dying channel is pruned.
```
2025-01-23T12:31:52.528Z DEBUG   gossipd: Pruning channel 839050x1246x0 from network view (ages 1736283379 and 1737600120)
2025-01-27T00:32:01.631Z DEBUG   gossipd: Pruning channel 839050x1246x0 from network view (ages 0 and 1737686520)
2025-01-27T00:50:05.998Z **BROKEN** gossipd: Dying channel 839050x1246x0 already deleted?
```

Easiest not to prune in this case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
2b4b1479ed gossipd: check that gossmap code sees updates from gossip_store writes.
After analyzing various weird cases where we ended up with duplicate
gossip_store entries, it could be explained by us not fully processing
the gossip store.

It's not clear that my assumptions that we would always see our own writes
are true: technically this may require an fsync().  So we now add the
check, and do an fsync and try again.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
2025-02-11 15:11:47 -06:00
Rusty Russell
8156c83e11 gossipd: check that we are always appending.
We had at least one report of overwriting the gossip_store file at
offset 1.  Make sure this doesn't happen.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
769ccaa4c3 gossipd: correctly process dying channels.
Found by inspection.  Minor bug, since we'll catch it on the next block,
but annoying.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
1df1300cc9 gossip_store: don't need to check for truncated amounts.
That's actually caught by the gossmap load now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
9d98740e18 gossmap: stricter checks when gossipd itself loads the gossip_store.
This means we will correctly reset the store if it has redundant
records, for example.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
4f2a7039c6 gossipd: put gossip_store pointer inside gossmap_manage.
It's actually the only one that uses it.  We also tweak the way
gossip_store handles failure: gossmap_manage now tells it when to
reset the corrupted store.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
fdfc7ce62f gossmap: add (and use) logging hook.
Default goes to stderr for LOG_UNUSUAL and higher.

We have to whitelist more cases in map_catchup so we don't spam the logs
with perfectly-expected (but ignored) messages though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
607b14fe12 common/gossmap: remove open-by-fd.
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our zfs problem
anyway.

In fact, it fixes a race if we don't have HAVE_PWRITEV.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
01650ebcd7 gossipd: make sure we never write bad entries.
We have reports of crashes on reading gossip_store, including from gossipd itself!

```
lightning_gossipd: common/gossmap.c:121: map_copy: Assertion `offset + len <= map->map_size' failed.
...
lightning_gossipd: FATAL SIGNAL (version v24.11)
0x6260c41d682a send_backtrace
  common/daemon.c:33
0x6260c41e098b status_failed
  common/status.c:221
0x6260c41e0b41 status_backtrace_exit
  common/subdaemon.c:18
0x6260c41d68b8 crashdump
  common/daemon.c:78
0x70508ea6913f ???
  ???:0
0x70508e8a0d51 ???
  ???:0
0x70508e88a536 ???
  ???:0
0x70508e88a40e ???
  ???:0
0x70508e8996d1 ???
  ???:0
0x6260c41d8b69 map_copy
  common/gossmap.c:121
0x6260c41d8bab map_be16
  common/gossmap.c:142
0x6260c41daa45 map_catchup
  common/gossmap.c:705
0x6260c41dab95 gossmap_refresh_mayfail
  common/gossmap.c:1192
0x6260c41daca6 gossmap_refresh
  common/gossmap.c:1213
0x6260c41cee32 gossmap_manage_get_gossmap
  gossipd/gossmap_manage.c:1314
0x6260c41d0686 gossmap_manage_new_block
  gossipd/gossmap_manage.c:1221
0x6260c41cbfdd new_blockheight
  gossipd/gossipd.c:473
0x6260c41cc363 recv_req
  gossipd/gossipd.c:584
0x6260c41d6b1d handle_read
  common/daemon_conn.c:35
0x6260c43175b5 next_plan
  ccan/ccan/io/io.c:60
0x6260c4317a40 do_plan
  ccan/ccan/io/io.c:422
0x6260c4317af9 io_ready
  ccan/ccan/io/io.c:439
0x6260c4319446 io_loop
  ccan/ccan/io/poll.c:455
0x6260c41cccf4 main
  gossipd/gossipd.c:665
```

This implies that we have a message shorter than 2 bytes, which should never happen.

An audit didn't shed any light, but let's make sure we don't ever write such a thing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
fc9f062124 gossipd: extra debugging when inject fails.
test_closing_different_fees fails:

```
2024-10-14T08:43:30.2733614Z
2024-10-14T08:43:30.2734133Z         # Now wait for them all to hit normal state, do payments
2024-10-14T08:43:30.2735205Z >       l1.daemon.wait_for_logs(['update for channel .* now ACTIVE'] * num_peers
2024-10-14T08:43:30.2736233Z                                 + ['to CHANNELD_NORMAL'] * num_peers)
2024-10-14T08:43:30.2736725Z
2024-10-14T08:43:30.2736903Z tests/test_closing.py:230:
...
2024-10-14T08:43:30.2761325Z E                   TimeoutError: Unable to find "[re.compile('update for channel .* now ACTIVE')]" in logs.
```

For some reason one of the channel_update injections does *not* evoke this message
from gossipd...

Changelog-None: debug!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-04 13:18:00 -06:00
Rusty Russell
cc2be867dc gossipd: don't prune if we haven't seen on side's update at all.
This caused a "flake" in testing, because it's wrong:

```
_____________________________ test_gossip_pruning ______________________________
[gw2] linux -- Python 3.10.16 /home/runner/.cache/pypoetry/virtualenvs/cln-meta-project-AqJ9wMix-py3.10/bin/python

node_factory = <pyln.testing.utils.NodeFactory object at 0x7f0267530490>
bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f0267532b30>

    def test_gossip_pruning(node_factory, bitcoind):
        """ Create channel and see it being updated in time before pruning
        """
        l1, l2, l3 = node_factory.get_nodes(3, opts={'dev-fast-gossip-prune': None,
                                                     'allow_bad_gossip': True,
                                                     'autoconnect-seeker-peers': 0})
    
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        l2.rpc.connect(l3.info['id'], 'localhost', l3.port)
    
        scid1, _ = l1.fundchannel(l2, 10**6)
        scid2, _ = l2.fundchannel(l3, 10**6)
    
        mine_funding_to_announce(bitcoind, [l1, l2, l3])
        wait_for(lambda: l1.rpc.listchannels(source=l1.info['id'])['channels'] != [])
        l1_initial_cupdate_timestamp = only_one(l1.rpc.listchannels(source=l1.info['id'])['channels'])['last_update']
    
        # Get timestamps of initial updates, so we can ensure they change.
        # Channels should be activated locally
>       wait_for(lambda: [c['active'] for c in l1.rpc.listchannels()['channels']] == [True] * 4)
```

Here you can see it has pruned:

```
lightningd-1 2025-01-24T07:39:40.873Z DEBUG   gossipd: Pruning channel 105x1x0 from network view (ages 1737704380 and 0)
...
lightningd-1 2025-01-24T07:39:50.941Z UNUSUAL lightningd: Bad gossip order: could not find channel 105x1x0 for peer's channel update
```

Changelog-Fixed: Protocol: we were overzealous in pruning channels if we hadn't seen one side's gossip update yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-01-27 11:07:04 +10:30
Rusty Russell
b6c1ffa359 ccan/htable: update to explicit DUPS/NODUPS types.
The updated API requires typed htables to explicitly state whether they
allow duplicates: for most cases we don't, but we've had issues in the
past.

This is a big patch, but mainly mechanical.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-01-21 09:18:25 +10:30
Rusty Russell
b520543867 gossipd: log at trace, not debug for regular messages.
See: https://github.com/ElementsProject/lightning/issues/7899

A node with 23 connections gets far too many debug messages.

Changelog-Fixed: `gossipd` now does logging at trace, not debug level.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-12-05 11:43:50 +10:30
Alex Myers
bb30049eca gossipd: seeker: try to autoconnect with no peers
Testing autoconnect in --offline mode, the autoconnect never functions
if the seeker has not gotten out of the startup state.  This commit
overloads the  counter to start the autoconnect
attempt on the second round through the seeker_check.

Changelog-None
2024-11-28 19:02:35 +10:30
Rusty Russell
9499d5c682 gossipd: fix crash in seeker rotation code.
Reported-by: hMsats
Fixes: https://github.com/ElementsProject/lightning/issues/7875
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-None: introduced in rc1
2024-11-28 15:56:05 +10:30
Rusty Russell
c6fce50951 gossipd: don't tell connectd what address to connect to.
In fact, only 951 of 17419 (5%) of node announcements are missing an address
(and gossipd doesn't know if we can connect to Tor addresses anyway) so
just check it *has* a node_announcement.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-25 15:39:13 +10:30
Rusty Russell
23dc10cf81 connectd: get our own addresses to contact node from node_announcements.
Let lightningd feed us hints to try first, but we can extract the
addresses from node_announcement messages ourselves.

(Lightningd used to ask gossipd on our behalf: this is far simpler!)

One side effect of this is that we don't hand back address hints given to us
by lightningd: it would use these again for reconnecting.  This is breaks
test_sendpay_grouping, so we disable it temporarily.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-25 15:39:13 +10:30
Rusty Russell
5b92383b02 connectd: send self-advertizing gossip rather than having gossipd do it.
It's now trivial for us to do this ourselves, since we have gossmap.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-25 15:39:13 +10:30
Rusty Russell
d5c0d21db8 gossipd: hand gossmap to gossmap_manage_get_node_addresses, not gossmap_manage.
We don't want to to refresh the gossmap internally: this could invalidate the
gossmap held by the current callers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 15:21:45 +10:30
Rusty Russell
69c252e06f gossmap: implement gossmap_random_node(), use it in gossipd.
It's easy for gossmap, since it has access to the htable.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 15:21:45 +10:30
Alex Myers
363b721cd3 gossipd: use autoconnect-seeker-peers setting 2024-11-22 15:21:45 +10:30
Alex Myers
dff5c893e7 gossipd: seeker: select random peer and tell lightningd
This does not validate a node announcement and address, but it
does select a node at random from the gossmap and asks lightningd
to attempt a connection to it.
2024-11-22 15:21:45 +10:30
Alex Myers
7fc214a67f gossipd: add request to connect to new gossip peer
Gossipd uses this to ask lightningd -> connectd to initiate
a connection to a new gossip peer.  This can be used when
there are insufficient peers already connected to gossip with.

Changelog-Changed: Gossipd can now request connections to additional nodes for improved gossip sync
2024-11-22 15:21:45 +10:30