Unfortunately the effect of leaving Nagle enabled is subtle. Here it
is in v25.12:
Normal:
tests/test_connection.py::test_no_delay PASSED
====================================================================== 1 passed in 13.87s
Nagle enabled:
tests/test_connection.py::test_no_delay PASSED
====================================================================== 1 passed in 21.70s
So it's hard to both catch this issue and not have false positives. Improve the
test by deliberately running with Nagle enabled, so we can do a direct comparison.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep a history of logs internally, so we can drop them to disk on a
crash. This "black box recorder" was some of the first code I wrote
for CLN, but I can't remember the last time we use a crash log to
diagnose a problem.
We attempt to prune it to keep it under 10MB, but the complexity
and cost is rarely worth it: simplify it to use a ringbuffer.
Changelog-Changed: lightningd: logging is now more efficient internally (no more pruning, simple ringbuffer).
```
139993 DEBUG lightningd: fixup_scan: block 786151 with 1203 txs
===> 55388 DEBUG plugin-bcli: Log pruned 1001 entries (mem 10508118 -> 10298662)
33000 DEBUG gossipd: Unreasonable timestamp in 0102000a38ec41f9137a5a560dac6effbde059c12cb727344821cbdd4ef46964a4791a0f67cd997499a6062fc8b4284bf1b47a91541fd0e65129505f02e4d08542b16fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000d9d56000ba40001690fe262010100900000000000000001000003e8000001f30000000000989680
23515 DEBUG hsmd: Client: Received message 14 from client
22269 DEBUG 024b9a1fa8e006f1e3937f65f66c408e6da8e1ca728ea43222a7381df1cc449605-hsmd: Got WIRE_HSMD_ECDH_REQ
14409 DEBUG gossipd: Enqueueing update for announce 0102002f7e4b4deb19947c67292e70cb22f7fac837fa9ee6269393f3c513d0431d52672e7387625856c19299cfd584e1a3f39e0f98df13c99090df9f4d5cca8446776fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000e216b0008050001692e1c390101009000000000000003e800000000000013880000004526945a00
12534 DEBUG gossipd: Previously-rejected announce for 514127x248x1
10761 DEBUG 02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: Got it!
10761 DEBUG 02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: ... , awaiting 1120
10761 DEBUG 02e01367e1d7818a7e9a0e8a52badd5c32615e07568dbe0497b6a47f9bef89d6af-channeld-chan#70770: Sending master 1020
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: we now re-transmit unseen funding transactions on startup, for more robustness.
Only in developer mode, ofc.
Notes:
1. We have to move the initialization before the lightningd main trace_start,
since that uses pseudorand().
2. To make the results stable, we need to use per-caller values to randbytes().
Otherwise external timing changes the call order.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This commit fixes an issue where BIP86 addresses were not being
discovered during wallet recovery/rescan operations.
The root cause was that init_txfilter() only populated the transaction
filter with BIP32-derived keys, preventing lightningd from recognizing
BIP86 UTXOs during blockchain scans. Now both BIP32 and BIP86 derived
scripts are included in the filter when BIP86 derivation is enabled.
This ensures that wallets restored from BIP39 mnemonics can properly
discover and display previously funded BIP86 addresses without requiring
manual address generation first.
[ We also move the slightly-lost comment about libbacktrace so it is
where we actually include <backtrace.h> --RR ]
Changelog-Changed: hsmd: New nodes will now be created with a BIP-39 12-word phrase as their root secret.
Changelog-Deprecated: config: `encrypted-hsm` to require a passphrase (use `hsm-passphrase`).
Changelog-Added: config: `hsm-passphrase` indicates we should use a manual passphrase with the hsm secret.
You can now simply add per-tal-object helpers for memleak, but our older pattern required
calling memleak functions explicitly during memleak handling. Hash tables in particular need
to be dynamically allocated (we override the allocators using htable_set_allocator and assume
this), so it makes sense to have a helper macro that does all three.
This eliminates a huge amount of code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically, `devtools/reduce-includes.sh */*.c`.
Build time from make clean (RUST=0) (includes building external libs):
Before:
real 0m38.944000-40.416000(40.1131+/-0.4)s
user 3m6.790000-17.159000(15.0571+/-2.8)s
sys 0m35.304000-37.336000(36.8942+/-0.57)s
After:
real 0m37.872000-39.974000(39.5466+/-0.59)s
user 3m1.211000-14.968000(12.4556+/-3.9)s
sys 0m35.008000-36.830000(36.4143+/-0.5)s
Build time after touch config.vars (RUST=0):
Before:
real 0m19.831000-21.862000(21.5528+/-0.58)s
user 2m15.361000-30.731000(28.4798+/-4.4)s
sys 0m21.056000-22.339000(22.0346+/-0.35)s
After:
real 0m18.384000-21.307000(20.8605+/-0.92)s
user 2m5.585000-26.843000(23.6017+/-6.7)s
sys 0m19.650000-22.003000(21.4943+/-0.69)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each header should only include the other headers it needs to compile;
`devtools/reduce-includes.sh */*.h` does this. The C files then need
additional includes if they don't compile.
And remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For old channels, this can take a while, and it stops everything. But
we are only doing this to save space; it's not a *functional* necessity.
A quick and dirty test with 50,000 htlcs shows the htlc deletion took
450msec. I tried adding an index, and changing it to set hstate to
HTLC_STATE_INVALID instead of deleting entries, but it still took about 350ms.
Whereas the "COUNT(*)" only took 1.7msec, so it's worth keeping.
Reported-by: @michael1011
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: lightningd: we defer deletion of old htlcs on channel close, to avoid pausing for a long time (we clean them on startup)
Fixes: https://github.com/ElementsProject/lightning/issues/7962
We take over the --bookkeeper-dir and --bookkeeper-db options, and
then if we can find the bookkeeper db we extract the records to
initialize our chain_moves and channel_moves tables.
Of course, bookkeeper now needs to not register those options.
When bookkeeper gets invoked the first time, it will reconstruct
everything from listchannelmoves and listcoinmoves. It cannot
preserve manually-added descriptions, so we put those in the datastore
for it ready to go.
Note that the order of onchain_fee changes slightly from the original.
But this is fine.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Iterating through every peer and channel every time can be very slow
for large nodes, when calling wallet_coinmoves_extract for listcoinmoves.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As per BOLT recommendation https://github.com/lightning/bolts/pull/1232, this means
we will insist on this being available.
For CLN, we added this in 0.12.0 (2022-08-23), though there were fixes as late as 24.02. Either way that's well outside our support window.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Closes: https://github.com/ElementsProject/lightning/issues/8152
Changelog-Changed: Protocol: We now insist that peers support `option_channel_type` (in CLN since 0.12.0 in late 2022, similar for other implementations).
It was already disabled by Dusty due to a number conflict with splicing, and
the proposal probably needs updating to use quiescence now that is merged.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: The non-functional `experimental-upgrade-protocol` config option.
Now we've make it only on existing channels, and not have to call
listdatastore every time, that means we can safely turn it on by
default.
Changelog-Added: Protocol: we now offer peer storage to any peers who create a channel.
Changelog-Deprecated: Config: `--experimental-peer-storage` (it's now the default).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They can all call get_block_height(); the extra argument confused me and
I thought they were called before the block height was actually updated.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
One in 256 times, we will grind a signature to 70 bytes (or shorter). This breaks
our feerate tests. Unfortunately the grinding is deterministic, so there doesn't
seem to be a way to avoid it. So we add a log message, and then we skip the
feerate test if it happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I added this debugging because the next test revealed a mismatch, so
I wanted to see where it was happening.
The comment in lightningd suggests it's possible, but I can't see any
code which suspends in the lightningd io_loop, so I cannot see how
this is triggered.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can happen with 24.11 and later. We scan back to exposed channel
opens, or that release.
The BROKEN log messages cause some tests to fail, so we fix those.
Fixes: https://github.com/ElementsProject/lightning/issues/8169
Changelog-Fixed: wallet: rescan for missing close outputs (can happen if peer doesn't support option_shutdown_anysegwit)
Rather than have lightningd call us repeatedly to try to connect, have
it tell us what peers are transient and aren't, and connectd will
automatically try to maintain that connection.
There's a new "downgrade_peer" message to tell it a peer is now
transient: to make it non-transient we simply tell connectd to
connect as a non-transient.
The first time, I missed that dual_open_control does its own state
transitions :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `connectd` now handles maintaining/reconnecting to important peers, and we remember the last successful address we connected to.
This was a bit harder to identify: during an `io_loop` run we suspend
the current span before handing over to `io_loop`, and later when a callback
is called we resume the span again. Depending on how we return from
the `io_loop` instance that is used to drive the startup, we either
have resumed the last span, or we don't. Since we start a span before
`io_loop` and want it to be emitted afterwards, we need to take care
of the case where we returned from a callback that did not resume, and
therefore the current context is empty.
Making `trace_span_resume` idempotent means we can just resume it
manually.
Ideally we'd push the suspend / resume logic down into `io_loop`
itself, and then we'd have just one place. Maybe suspend and resume
callbacks that can be configured in `io_loop`?
This means we should support it by default.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: `option_quiesce` enabled by default.
Changelog-Deprecated: Config: --experimental-quiesce: it's now the default.
Commit a1fdeee76b "Makefile: clean up install path handling."
broke the ability to configure with one path and then run in a
different path. Turns out people actually do this! So, we have
to use relative paths, compared to our existing binary.
And we can't use path_rel, because that requires that the path
exist (thanks @Lagrang3!).
Fixes: https://github.com/ElementsProject/lightning/issues/7595
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Mainly, it just causes complaints if we're running a lightningd which is installed
(in which case, my_path is not stolen, unlike running locally installed).
Reported-by: daywalker90
Fixes: https://github.com/ElementsProject/lightning/issues/7569
Changelog-None: regression this release
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had some weird code to try to do relative paths. Instead, use absolute
ones and keep life simple. Also rename "daemon_dir" to the clearer
"subdaemon_dir" as that's what it's used for.
We now need special magic to do "installcheck", which is a bit awkward,
but at least it's contained.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: onion messages are now supported by default.
Changelog-Deprecated: Config: the --experimental-onion-messages option is ignored (on by default).
We currently stream gossip as fast as we can, even if they start at
timestamp 0. Instead, use a simple token bucket filter and only let
them have 1MB per second (500 bytes per second for testing).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: connectd: we now throttle outgoing gossip at 1MB/second per peer.
Handling half in main() and half here was a mess. And the name
"max_blockheight" was poor: it was the max in the db, or UINT32_MAX,
but then we changed it depending on what height we wanted to start at.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They're cheap. The 2x channels heuristic is nice, but does assume
they restart every so often. If someone hits 64k connections I would
like to know anyway!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1024 is a common limit, and people are starting to hit that many channels, so we should increase it: twice the number of channels seems reasonable, though we only do this at restart time.
Changelog-Changed: lightningd: we now try to increase the number of file descriptors, if it's less than twice the number of channels at startup (and log if we cannot!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We wire through --dev options into the hsmd, and test preapprove accept and deby
with both old and new protocols.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LDK doesn't set this feature if they don't have any useful gossip (mobile nodes)
and it was agreed at the spec meeting that we should repurpose this feature
to mean "I don't have any useful gossip".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>