1. It was flaky, probably because it didn't wait for the remote update_channel.
2. Rusty applied a fix in 5f664dac77, not clear if it worked.
3. Christian disabled it altogether in 23ce9a947d.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Plugins: "filters" can be specified on the `custommsg` hook to limit what message types the hook will be called for.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
65dccea5bd "pytest: fix flake in test_reconnect_signed" accidentally
introduced a bug, where the connect command may not return.
If we call "connect" while a connection is still being processed
through the peer_connected hooks, we would call peer_channels_cleanup(),
which (if the peer has no channels) would free the peer.
Then when the peer_connected hook returned, it would lookup the peer,
see it was gone, and silently return. The connect_succeeded() function
was never called, and the connect command never woken.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-None: bug introduced this release.
This is important: if it's tor-only and we don't have a proxy, we will fail
to connect, but it's no indication that the node is unreachable. Same with
IPv6.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically, `devtools/reduce-includes.sh */*.c`.
Build time from make clean (RUST=0) (includes building external libs):
Before:
real 0m38.944000-40.416000(40.1131+/-0.4)s
user 3m6.790000-17.159000(15.0571+/-2.8)s
sys 0m35.304000-37.336000(36.8942+/-0.57)s
After:
real 0m37.872000-39.974000(39.5466+/-0.59)s
user 3m1.211000-14.968000(12.4556+/-3.9)s
sys 0m35.008000-36.830000(36.4143+/-0.5)s
Build time after touch config.vars (RUST=0):
Before:
real 0m19.831000-21.862000(21.5528+/-0.58)s
user 2m15.361000-30.731000(28.4798+/-4.4)s
sys 0m21.056000-22.339000(22.0346+/-0.35)s
After:
real 0m18.384000-21.307000(20.8605+/-0.92)s
user 2m5.585000-26.843000(23.6017+/-6.7)s
sys 0m19.650000-22.003000(21.4943+/-0.69)s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each header should only include the other headers it needs to compile;
`devtools/reduce-includes.sh */*.h` does this. The C files then need
additional includes if they don't compile.
And remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
One issue we have in CI is reconnection races: if an incoming
connection arrives while an outgoing one is negotiated, we close the
outgoing one and issue a disconnect, which fails any connect attempts.
By sending a "reconnected" message instead of disconnect/connect we
can avoid disturbing in-progress connection attempts which happens in CI
quite a bit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We cannot use subd_req() here: replies will come out of order, and the
we should not simply assign the reponses in FIFO order.
Changelog-Fixed: lightningd: don't get confused with parallel ping commands.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can fail the fundchannel because of a reconnect race: funder tells
us to reconnect so fast that we are still cleaning up from last time.
We deliberately defer clean up, to give the subds a chance to process any
final messages. However, on reconnection we force them to exit immediately:
but this causes new `connect` commands to see the exit and fail.
The workaround is to do this cleanup when a `connect` command is
issued (as well as the current case, which covers an automatic
reconnection or an incoming reconnection)
```
2025-05-09T00:40:37.1769508Z def test_reconnect_signed(node_factory):
2025-05-09T00:40:37.1770273Z # This will fail *after* both sides consider channel opening.
2025-05-09T00:40:37.1770850Z disconnects = ['<WIRE_FUNDING_SIGNED']
2025-05-09T00:40:37.1771298Z if EXPERIMENTAL_DUAL_FUND:
2025-05-09T00:40:37.1771735Z disconnects = ['<WIRE_COMMITMENT_SIGNED']
2025-05-09T00:40:37.1772155Z
2025-05-09T00:40:37.1772598Z l1 = node_factory.get_node(may_reconnect=True, disconnect=disconnects)
2025-05-09T00:40:37.1773210Z l2 = node_factory.get_node(may_reconnect=True)
2025-05-09T00:40:37.1773632Z
2025-05-09T00:40:37.1773917Z l1.fundwallet(2000000)
2025-05-09T00:40:37.1774268Z
2025-05-09T00:40:37.1774611Z l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
2025-05-09T00:40:37.1775151Z > l1.rpc.fundchannel(l2.info['id'], CHANNEL_SIZE)
2025-05-09T00:40:37.1775388Z
2025-05-09T00:40:37.1775485Z tests/test_connection.py:667:
...
2025-05-09T00:40:37.1799527Z > raise RpcError(method, payload, resp['error'])
2025-05-09T00:40:37.1800993Z E pyln.client.lightning.RpcError: RPC call failed: method: fundchannel, payload: {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'amount': 50000, 'announce': True}, error: {'code': -1, 'message': 'Disconnected', 'data': {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'method': 'openchannel_update'}}
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
DNS seeds have been down/offline for a while, and this code (which
blocks!) has been a source of trouble. We should probably use a
canned set of "known nodes" if we want to bootstrap.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: we no longer use DNS seeds for peer lookup fallbacks.
Fixes: https://github.com/ElementsProject/lightning/issues/7913
We thought it was a good idea to terminate channel subds, but we included onchaind as well!
```
2025-01-29T21:31:46.053Z UNUSUAL 03…00-channeld-chan#255683: Adding HTLC 1737 too slow: killing connection
2025-01-29T21:31:46.053Z INFO 03…00-chan#255683: Peer transient failure in CHANNELD_NORMAL: Adding HTLC timed out: killed connection
2025-01-29T21:31:46.053Z DEBUG 03…00-channeld-chan#255683: Status closed, but not exited. Killing
2025-01-29T21:31:46.054Z DEBUG 03…00-chan#255683: Failing HTLC 1738 due to peer death
2025-01-29T21:31:46.058Z DEBUG 03…00-chan#255683: Failing HTLC 1737 due to peer death
2025-01-29T21:31:46.058Z DEBUG 03…00-chan#255673: Forcing disconnect due to One channel had an error
2025-01-29T21:31:46.059Z DEBUG 03…00-onchaind-chan#255673: Status closed, but not exited. Killing
2025-01-29T21:31:46.093Z DEBUG 03…00-connectd: disconnect_peer
2025-01-29T21:31:46.093Z DEBUG 03…00-lightningd: peer_disconnect_done
```
Reported-by: @whitslack
Fixes: https://github.com/ElementsProject/lightning/issues/8055
Changelog-Fixed: onchaind: don't die if we fail an unrelated channel with the same peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than have lightningd call us repeatedly to try to connect, have
it tell us what peers are transient and aren't, and connectd will
automatically try to maintain that connection.
There's a new "downgrade_peer" message to tell it a peer is now
transient: to make it non-transient we simply tell connectd to
connect as a non-transient.
The first time, I missed that dual_open_control does its own state
transitions :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `connectd` now handles maintaining/reconnecting to important peers, and we remember the last successful address we connected to.
We're going to use this to ask if there are any channels which make it
important to reconnect to the peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Let lightningd feed us hints to try first, but we can extract the
addresses from node_announcement messages ourselves.
(Lightningd used to ask gossipd on our behalf: this is far simpler!)
One side effect of this is that we don't hand back address hints given to us
by lightningd: it would use these again for reconnecting. This is breaks
test_sendpay_grouping, so we disable it temporarily.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we set them (i.e. at lockin), when we fire up channeld (for
aliases, which we create at channel init, but aren't really useful
until we have finished channel opening), and at startup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Unlike "sendonionmessage" which instructs us to send to a peer, this
process it locally (presumably, it contains the next hop). This is
useful because it allows us to process an onion message which starts
with us (a legal case for a blinded path supplied by someone else!).
It also opens the door to bolt12 self-pay.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This basically means moving the code from gossipd to connectd to handle
these queries.
This will get connectd have finer control over ratelimiting them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently stream gossip as fast as we can, even if they start at
timestamp 0. Instead, use a simple token bucket filter and only let
them have 1MB per second (500 bytes per second for testing).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: connectd: we now throttle outgoing gossip at 1MB/second per peer.
Currently, anything which doesn't have a live channel is considered transient.
We free this first under stress, and also if they're still connecting.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This has the benefit of being shorter, as well as more reliable (you
will get a link error if we can't print it, not a runtime one!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It then waits 10 more seconds (for plugins to call setlease, especially)
before it will update a node_announcement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I did some CHANGELOG and git digging to see when these were deprecated, and
some were very old (v0.8.2!). But since they didn't warn users loudly, I
chose to do so this release only.
I renamed ld's `deprecated_apis` to `deprecated_ok` to make sure I
caught them all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Create a notification that is triggered when a `costummsg` is received.
Changelog-Added: Plugins: notification custommsg for receiving an unknown protocol message
This makes `check` much more thorough, and useful.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `check` now does much more checking on every command (not just basic parameter types).
We should use capability tests for states (can you add htlcs?) rather than vague
descriptions (are you closing?).
And as much as possible, use switch () statements to force us to think
about all the cases, especially when we add new states!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>