This means that we won't complain to peers which gossip about our
channels, but it does mean that our channel graph (like other nodes on
the network) will show two channels, not one, for the duration.
For this reason, we need askrene to omit local dying channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd no longer makes gossip messages, and hasn't since v24.02, so it
doesn't actually need to talk to the hsm daemon.
Also, various comments were out of date, so fix those too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had the test backwards, so we moved it *all the time*. This bloats our gossip store, as well as
not moving it in the case where we need to.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: we would occasionally not show a node announcement in listnodes().
This is immune to things like clock changes, and has the convenient side-effect that
it will *not* be overridden when we override time for developer purposes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Only in developer mode, ofc.
Notes:
1. We have to move the initialization before the lightningd main trace_start,
since that uses pseudorand().
2. To make the results stable, we need to use per-caller values to randbytes().
Otherwise external timing changes the call order.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
You can now simply add per-tal-object helpers for memleak, but our older pattern required
calling memleak functions explicitly during memleak handling. Hash tables in particular need
to be dynamically allocated (we override the allocators using htable_set_allocator and assume
this), so it makes sense to have a helper macro that does all three.
This eliminates a huge amount of code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically, `devtools/reduce-includes.sh */*.c`.
Build time from make clean (RUST=0) (includes building external libs):
Before:
real 0m38.944000-40.416000(40.1131+/-0.4)s
user 3m6.790000-17.159000(15.0571+/-2.8)s
sys 0m35.304000-37.336000(36.8942+/-0.57)s
After:
real 0m37.872000-39.974000(39.5466+/-0.59)s
user 3m1.211000-14.968000(12.4556+/-3.9)s
sys 0m35.008000-36.830000(36.4143+/-0.5)s
Build time after touch config.vars (RUST=0):
Before:
real 0m19.831000-21.862000(21.5528+/-0.58)s
user 2m15.361000-30.731000(28.4798+/-4.4)s
sys 0m21.056000-22.339000(22.0346+/-0.35)s
After:
real 0m18.384000-21.307000(20.8605+/-0.92)s
user 2m5.585000-26.843000(23.6017+/-6.7)s
sys 0m19.650000-22.003000(21.4943+/-0.69)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each header should only include the other headers it needs to compile;
`devtools/reduce-includes.sh */*.h` does this. The C files then need
additional includes if they don't compile.
And remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we don't have to manually choose what to link against,
which is much of the complexity of our Makefiles: the compiler will
automatically use any object files it needs to link.
We already do this for ccan as libccan.a, now we have libcommon.a.
We don't link against it for *everything*, as some tests require their own
versions.
Notes:
1. I get rid of the weird plugins/test/Makefile2 (accidental commit?)
2. Many tests change due to update-mocks.
3. In some places I added the missing dependency on the Makefile itself, though most are in the next
patch.
Before:
Total program size: 221366528
Total tests size: 364243856
After:
Total program size: 190733656
Total tests size: 337880888
Build time from make clean (RUST=0) (includes building external libs):
Before:
real 0m38.227000-44.245000(41.8222+/-1.6)s
user 3m2.105000-33.696000(23.1442+/-8.4)s
sys 0m35.054000-42.269000(39.7231+/-2)s
After:
real 0m38.944000-40.416000(40.1131+/-0.4)s
user 3m6.790000-17.159000(15.0571+/-2.8)s
sys 0m35.304000-37.336000(36.8942+/-0.57)s
Build time after touch config.vars (RUST=0):
Before:
real 0m18.928000-22.776000(21.5084+/-1.1)s
user 2m8.613000-36.567000(27.7281+/-7.7)s
sys 0m20.458000-23.436000(22.3963+/-0.77)s
After:
real 0m19.831000-21.862000(21.5528+/-0.58)s
user 2m15.361000-30.731000(28.4798+/-4.4)s
sys 0m21.056000-22.339000(22.0346+/-0.35)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
rusty@rusty-Framework:~/devel/cvs/lightni
This is a last resort, but what else are we supposed to do when we wrote
something and it didn't appear?
In particular, ZFS doesn't just "fix itself":
```
remaining_fd=200001b0c9761dff0000000001009411e26cd56d68aabc285ee1c8ee43d59be6f939b0ce353d80213918680a7438356b9c5ea6bb001a6
bb37a4dea93776f4abc8cd371525b4d1605a74b89d7cb1bfc8865ddf22288c7ea08b9d98b34155b4aed159eb81732957e6bf79b996752bf2a9995aae
ad1d65e7889e826ea0ba42f7746c176fe12f2fe6c04af1a74b4f0a262d20efd57133eb32693c789eb3f09caf4f4c6ecd2f734b3b36e751ffcc2748c5
8feabce4173c4ce6098a2c5397aabf1be5442cb67b5030be11ebd8b9841838dae127fe30000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000002000000a218b9d93000000001005000000000000c060
```
Note the record appended on the end *after all the zeroes*.
Changelog-Changed: gossipd: add gossip_store recovery for filesystems which do not synchronize read and write (e.g. ZFS on Linux), by disabling mmap reads and rewriting the last records.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This should detect partial writes more robustly, since we make a
separate pwrite() call to update this flag after the record is written.
Previously we were playing a bit loose with synchronization assumptions,
which seemed to work on Linux ext4, but not so well elsewhere.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We add `start_batch` to match t-bast’s splicing spec and we add a new internal wire type `WIRE_PROTOCOL_BATCH_ELEMENT` using the type number 0
Changelog-Added: support for `start_batch`
Checking a signature is a CPU-intensive operation that should be performed only
if gossmap doesn't already have the channel announcement in question and we're
not already checking for the announcement's UTxO.
Changelog-Fixed: `gossipd` doesn't waste CPU cycles checking signatures on channel announcements that are already known
Issue: https://github.com/ElementsProject/lightning/issues/7972
Unfortunately a spec typo means the data fields are missing (PR pending),
so we still patch those in.
The message "your_peer_storage" got renamed to "peer_storage_retrieval",
and the option "want_peer_backup_storage" was removed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `experimental-peer-storage` now only advertizes feature 43, not 41.
The vast majority of incoming channel updates seem to be cut due
to age, which results in noisy logs. Similarly, the chanbackup
logging verbosity might better match the equivalent actions in
channeld, which are at the debug level.
Fixes: #8058
Changelog-None: introduced in 25.02
After analyzing various weird cases where we ended up with duplicate
gossip_store entries, it could be explained by us not fully processing
the gossip store.
It's not clear that my assumptions that we would always see our own writes
are true: technically this may require an fsync(). So we now add the
check, and do an fsync and try again.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
We had at least one report of overwriting the gossip_store file at
offset 1. Make sure this doesn't happen.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's actually the only one that uses it. We also tweak the way
gossip_store handles failure: gossmap_manage now tells it when to
reset the corrupted store.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Default goes to stderr for LOG_UNUSUAL and higher.
We have to whitelist more cases in map_catchup so we don't spam the logs
with perfectly-expected (but ignored) messages though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our zfs problem
anyway.
In fact, it fixes a race if we don't have HAVE_PWRITEV.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
test_closing_different_fees fails:
```
2024-10-14T08:43:30.2733614Z
2024-10-14T08:43:30.2734133Z # Now wait for them all to hit normal state, do payments
2024-10-14T08:43:30.2735205Z > l1.daemon.wait_for_logs(['update for channel .* now ACTIVE'] * num_peers
2024-10-14T08:43:30.2736233Z + ['to CHANNELD_NORMAL'] * num_peers)
2024-10-14T08:43:30.2736725Z
2024-10-14T08:43:30.2736903Z tests/test_closing.py:230:
...
2024-10-14T08:43:30.2761325Z E TimeoutError: Unable to find "[re.compile('update for channel .* now ACTIVE')]" in logs.
```
For some reason one of the channel_update injections does *not* evoke this message
from gossipd...
Changelog-None: debug!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This caused a "flake" in testing, because it's wrong:
```
_____________________________ test_gossip_pruning ______________________________
[gw2] linux -- Python 3.10.16 /home/runner/.cache/pypoetry/virtualenvs/cln-meta-project-AqJ9wMix-py3.10/bin/python
node_factory = <pyln.testing.utils.NodeFactory object at 0x7f0267530490>
bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f0267532b30>
def test_gossip_pruning(node_factory, bitcoind):
""" Create channel and see it being updated in time before pruning
"""
l1, l2, l3 = node_factory.get_nodes(3, opts={'dev-fast-gossip-prune': None,
'allow_bad_gossip': True,
'autoconnect-seeker-peers': 0})
l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
l2.rpc.connect(l3.info['id'], 'localhost', l3.port)
scid1, _ = l1.fundchannel(l2, 10**6)
scid2, _ = l2.fundchannel(l3, 10**6)
mine_funding_to_announce(bitcoind, [l1, l2, l3])
wait_for(lambda: l1.rpc.listchannels(source=l1.info['id'])['channels'] != [])
l1_initial_cupdate_timestamp = only_one(l1.rpc.listchannels(source=l1.info['id'])['channels'])['last_update']
# Get timestamps of initial updates, so we can ensure they change.
# Channels should be activated locally
> wait_for(lambda: [c['active'] for c in l1.rpc.listchannels()['channels']] == [True] * 4)
```
Here you can see it has pruned:
```
lightningd-1 2025-01-24T07:39:40.873Z DEBUG gossipd: Pruning channel 105x1x0 from network view (ages 1737704380 and 0)
...
lightningd-1 2025-01-24T07:39:50.941Z UNUSUAL lightningd: Bad gossip order: could not find channel 105x1x0 for peer's channel update
```
Changelog-Fixed: Protocol: we were overzealous in pruning channels if we hadn't seen one side's gossip update yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The updated API requires typed htables to explicitly state whether they
allow duplicates: for most cases we don't, but we've had issues in the
past.
This is a big patch, but mainly mechanical.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Testing autoconnect in --offline mode, the autoconnect never functions
if the seeker has not gotten out of the startup state. This commit
overloads the counter to start the autoconnect
attempt on the second round through the seeker_check.
Changelog-None
In fact, only 951 of 17419 (5%) of node announcements are missing an address
(and gossipd doesn't know if we can connect to Tor addresses anyway) so
just check it *has* a node_announcement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Let lightningd feed us hints to try first, but we can extract the
addresses from node_announcement messages ourselves.
(Lightningd used to ask gossipd on our behalf: this is far simpler!)
One side effect of this is that we don't hand back address hints given to us
by lightningd: it would use these again for reconnecting. This is breaks
test_sendpay_grouping, so we disable it temporarily.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't want to to refresh the gossmap internally: this could invalidate the
gossmap held by the current callers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This does not validate a node announcement and address, but it
does select a node at random from the gossmap and asks lightningd
to attempt a connection to it.
Gossipd uses this to ask lightningd -> connectd to initiate
a connection to a new gossip peer. This can be used when
there are insufficient peers already connected to gossip with.
Changelog-Changed: Gossipd can now request connections to additional nodes for improved gossip sync