Unfortunately LND does not force close the channels on receiving an error,
they blame us for this behaviour (abb1e3463f/htlcswitch/link.go (L2119))
To fix this we will send them a Bogus Channel Reestablish with 0 commitment_number and invalid last_per_commit_secret.
Key Changes:
- In connect_activate_subd, if we detect a stub channel, we serialize and send a bogus channel_reestablish message.
Changelog-Fixed: Fixing LND's non responsive behaviour on receiving an error.
We can fail the fundchannel because of a reconnect race: funder tells
us to reconnect so fast that we are still cleaning up from last time.
We deliberately defer clean up, to give the subds a chance to process any
final messages. However, on reconnection we force them to exit immediately:
but this causes new `connect` commands to see the exit and fail.
The workaround is to do this cleanup when a `connect` command is
issued (as well as the current case, which covers an automatic
reconnection or an incoming reconnection)
```
2025-05-09T00:40:37.1769508Z def test_reconnect_signed(node_factory):
2025-05-09T00:40:37.1770273Z # This will fail *after* both sides consider channel opening.
2025-05-09T00:40:37.1770850Z disconnects = ['<WIRE_FUNDING_SIGNED']
2025-05-09T00:40:37.1771298Z if EXPERIMENTAL_DUAL_FUND:
2025-05-09T00:40:37.1771735Z disconnects = ['<WIRE_COMMITMENT_SIGNED']
2025-05-09T00:40:37.1772155Z
2025-05-09T00:40:37.1772598Z l1 = node_factory.get_node(may_reconnect=True, disconnect=disconnects)
2025-05-09T00:40:37.1773210Z l2 = node_factory.get_node(may_reconnect=True)
2025-05-09T00:40:37.1773632Z
2025-05-09T00:40:37.1773917Z l1.fundwallet(2000000)
2025-05-09T00:40:37.1774268Z
2025-05-09T00:40:37.1774611Z l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
2025-05-09T00:40:37.1775151Z > l1.rpc.fundchannel(l2.info['id'], CHANNEL_SIZE)
2025-05-09T00:40:37.1775388Z
2025-05-09T00:40:37.1775485Z tests/test_connection.py:667:
...
2025-05-09T00:40:37.1799527Z > raise RpcError(method, payload, resp['error'])
2025-05-09T00:40:37.1800993Z E pyln.client.lightning.RpcError: RPC call failed: method: fundchannel, payload: {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'amount': 50000, 'announce': True}, error: {'code': -1, 'message': 'Disconnected', 'data': {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'method': 'openchannel_update'}}
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This speeds logging slightly, but most of the time is actually
spent in fflush() (which we need).
Result (10 runs, eatmydata):
FAILED tests/test_connection.py::test_bench - assert 24.086638-25.347475(24.6901+/-0.4) == 0
This is an 8% performance improvement (over the entire series), which is not bad.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we don't have to iterate through all plugins, making
our "do we even have to construct this notification" optimization
much more efficient.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If nobody is subscribed, have notify_start return NULL and the caller
can skip serialization. This is particularly useful for the "log"
notification which can get called a lot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
DNS seeds have been down/offline for a while, and this code (which
blocks!) has been a source of trouble. We should probably use a
canned set of "known nodes" if we want to bootstrap.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: we no longer use DNS seeds for peer lookup fallbacks.
Fixes: https://github.com/ElementsProject/lightning/issues/7913
Reported-by: daywalker90
Changelog-Deprecated: JSON-RPC: channel_state_changed notification field `old_state` value "unknown" (it will be omitted, instead)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
One in 256 times, we will grind a signature to 70 bytes (or shorter). This breaks
our feerate tests. Unfortunately the grinding is deterministic, so there doesn't
seem to be a way to avoid it. So we add a log message, and then we skip the
feerate test if it happens.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we've fixed various underlying weight mis-estimation, we can do a few final tweaks and
test that anchors work as intended.
1. We assumed a p2wpkh output, but modern change is p2tr.
2. We handed `2` instead of `1` to bitcoin_tx_core_weight (which doesn't matter, but was weird).
3. Make change calculation clearer. I'm not sure the previous one was wrong, but it was harder
to understand.
4. Fix the test and make it clearly test that we are aiming for (and achieving) the right feerate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: anchors' fees are now much closer to the feerate targets.
We use this for anchors, in which case we have a minimum value for
change. If we don't take this into account, we end up with a lower
feerate once we actually create the tx.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Due to a bug elsewhere I actually triggered this path, and it
broadcast the tx anyway, *then* closed the channel. We should abandon
the channel if we can, instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I'm about to update our utxo type, but Christian spotted that this is
part of the ABI for the hsm. So make that a private "hsm_utxo" type,
to insulate it from changes.
In particular, the HSM versions only contain the fields that the
hsm cares about, and the wire format is consistent (even though that
*did* include some of those fields, they are now dummies).
In the long term, this should be removed from the ABI: once we
no longer have "close_info" utxos, this information should already be
in the PSBT.
I tested this hadn't accidentally changed the wire format by disabling
version checks and using an old hsmd with the altered daemons and
running the test suite.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Not just the key index.
Also, remove FIXME: wallet_can_spend is no longer slow with lots of inputs!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to fix the feerate calculations in various places, and one
side effect is that we end up trying to add an empty anchor if none is
necessary (and failing, but we log a nasty message about it).
So don't do that, and fix the test which expected it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A log event LOG_TRACE submitted by a plugin was being logged as
**BROKEN** by lightningd before this commit.
Changelog-Fixed: plugins can now log events under the LOG_TRACE flag.
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
This may be useful for their recovery, though they should see the spend onchain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: We now reply to `channel_reestablish` even on long-closed channels.
This was always false. peer_start_channeld was called in various places
with the argument "NULL" instead of "false", which unfortunately compilers
didn't complain about :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael of Boltz told me this long ago, that when fees spiked some of their clients' opens got stuck. After two weeks their node forgot them, and when fees came back down the opens still failed. Unfortunately, I can't see an issue for this!
We can give some grace: we don't want to waste too many resources, but 100 channels is nothing.
The test needs to be adjusted to have *two* slow open channels, now.
Changelog-Changed: Protocol: we won't forget still-opening channels after 2016 blocks, unless there are more than 100.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to start loading them into memory for nicer responses if
people try to reestablish closed channels, but we don't care about ones
which were never actually opened. We could add a new state, but easier
to simply remove them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Note that documentation says invoice expiries can batch, but that's no
longer true, so delete it. Usually, we miss a number because the
change is too fast.
This adds the wait interface, but it doesn't actually fire until the next
commit, which wires it into the db code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `wait` now supports the `htlcs` (`listhtlcs`) subsystem.
We're not going to increment one at a time for bulk deletion of htlcs
when a channel closes. There could be millions of HTLCs!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It makes the schema simpler, and indeed, expressable by GRPC.
Changelog-Added: JSON-RPC: `wait` now has separate `invoices`, `forwards` and `sendpays` objects for each subsystem.
Changelog-Deprecated: JSON-RPC: `wait` reply `details` object: use subsytem specific object instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. trace_span_start() is always called with a string literal, so
no copy needed (and we can use a macro to enforce this).
2. trace_span_tag() name and value are always longer-lived than
the span, so no need to copy these either.
Before:
real 0m18.524000-19.100000(18.7674+/-0.21)s
user 0m16.171000-16.833000(16.424+/-0.26)s
sys 0m2.259000-2.400000(2.337+/-0.059)s
After:
real 0m16.421000-18.407000(17.8128+/-0.72)s
user 0m14.242000-16.041000(15.5382+/-0.67)s
sys 0m2.179000-2.363000(2.273+/-0.061)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I added this debugging because the next test revealed a mismatch, so
I wanted to see where it was happening.
The comment in lightningd suggests it's possible, but I can't see any
code which suspends in the lightningd io_loop, so I cannot see how
this is triggered.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This can happen with 24.11 and later. We scan back to exposed channel
opens, or that release.
The BROKEN log messages cause some tests to fail, so we fix those.
Fixes: https://github.com/ElementsProject/lightning/issues/8169
Changelog-Fixed: wallet: rescan for missing close outputs (can happen if peer doesn't support option_shutdown_anysegwit)
When we make mocks (which the next patch will do), these names cause a warning:
```
wallet/test/run-db.c:32:64: error: declaration of ‘bitcoind’ shadows a parameter [-Werror=shadow=compatible-local]
32 | void (*cb)(struct bitcoind *bitcoind UNNEEDED,
wallet/test/run-db.c:30:53: note: shadowed declaration is here
30 | struct bitcoind *bitcoind UNNEEDED,
wallet/test/run-db.c:33:51: error: declaration of ‘height’ shadows a parameter [-Werror=shadow=compatible-local]
33 | u32 height UNNEEDED,
wallet/test/run-db.c:31:40: note: shadowed declaration is here
31 | u32 height UNNEEDED,
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We select the close key index at opening time, but the non-DF code didn't correctly register the
address as possibly used for P2WPKH for older nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: wallet: we could miss our own returned outputs on mutual closes if peer doesn't support option_shutdown_anysegwit (you will still need to rescan after update, if this happened to you!)
Reported-by: Grubles
We pre-close incoming under some circumstances, so this does happen (it
didn't when this code was written). Don't walk all the HTLCs complaining
about them in this case, and don't freak out.
Changelog-Fixed: lightningd: incorrect spamming of log and potential crash on testnet case of duplicate HTLCs and slow closing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/8176
It's not the *outgoing* HTLC which sets the deadline, it's the incoming.
Reported-by: @whitslack
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: Egregious anchor fee paid for unilateral close txs due to HTLC timeouts; it's not as urgent as our code made out!
Since we included the spec for it, this is a good time to implement
it.
I also asked chatgpt to write some unit tests. I had to mangle them a
bit, but it probably saved me a few minutes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Unfortunately a spec typo means the data fields are missing (PR pending),
so we still patch those in.
The message "your_peer_storage" got renamed to "peer_storage_retrieval",
and the option "want_peer_backup_storage" was removed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `experimental-peer-storage` now only advertizes feature 43, not 41.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: plugins which didn't accept string JSON RPC fields (deprecated v23.08, disabled by default in v24.11).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: allowing 0/1 instead of false/true for plugin options (deprecated v23.08, disabled by default in v24.11).