palladum-lightning

Author	SHA1	Message	Date
Rusty Russell	963b353a30	connectd: use membuf for more efficient output queue. This is exactly what membuf is for: it handles expansion much more neatly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2026-02-18 14:13:25 +10:30
Rusty Russell	2436ee6f6f	connectd: don't flush messages unless we have something important. This replaces our previous nagle-based toggling. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2026-02-18 14:13:25 +10:30
Rusty Russell	8b90d40a75	connectd: pad messages with dummy pings if needed to make size uniform. Messages are now constant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: Protocol: we now pad all peer messages to make them the same length.	2026-02-18 14:13:25 +10:30
Rusty Russell	c23b7a492d	connect: switch to using io_write_partial instead of io_write. This gives us finer control over write sizes: for now we just cap the write size. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2026-02-18 14:13:25 +10:30
Rusty Russell	bcce29eeb0	connectd: remove unused flag to connect_init. We haven't announced websocket addresses for some time! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2026-01-20 19:32:42 +10:30
Rusty Russell	79e609468a	connectd: don't complain if lightningd is unresponsive while doing dev-memleak. We had a flake of form: ``` 2025-11-18T04:42:23.489Z BROKEN 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: wake delay for WIRE_CHANNEL_REESTABLISH: 6789msec ``` Which happened as we're shutting down. Some investigation revealed the cause: `dev-memleak` can be extremely slow. Fair enough. So we change `dev-memleak` to call connectd first, and connectd uses that as a trigger to stop complaining about delays. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-11-19 14:29:08 +10:30
Rusty Russell	522457a12b	connectd, gossipd, pay, bcli: use timemono when solely measuring duration for timeouts. This is immune to things like clock changes, and has the convenient side-effect that it will not be overridden when we override time for developer purposes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-11-13 21:21:29 +10:30
Rusty Russell	21ad33151c	connectd: tell lightningd if we didn't find an address we could even try to connect to. This is important: if it's tor-only and we don't have a proxy, we will fail to connect, but it's no indication that the node is unreachable. Same with IPv6. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-11-12 13:58:43 +10:30
Rusty Russell	565f7deec0	connectd: at disconnected, tell lightningd how long we were connected. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-11-12 13:58:43 +10:30
Rusty Russell	88b9b0bc28	connectd: report ping latencies (from ping probes) to lightningd. (Uninitialize ping_start on manual ping fixed by Alex Myers) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-11-12 13:58:43 +10:30
Rusty Russell	0f07578c3f	connectd: return reason, connect time to lightningd on connection results. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-11-12 13:58:43 +10:30
Rusty Russell	f6a4e79420	global: remove unnecessary includes from headers. Each header should only include the other headers it needs to compile; `devtools/reduce-includes.sh /.h` does this. The C files then need additional includes if they don't compile. And remove the entirely useless wire/onion_wire.h, which only serves to include wire/onion_wiregen.h. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-10-23 06:44:04 +10:30
Rusty Russell	694626f050	connectd: fix race where last msg can still get lost. openingd sends an ERROR, and exits. lightningd tells us to disconnect. We read from lightningd first, and don't read from openingd. We need to drain subds when we're told to disconnect.	2025-10-01 12:12:56 +09:30
Rusty Russell	0d97631075	connectd: simplify logic, and add a "reconnected" message. One issue we have in CI is reconnection races: if an incoming connection arrives while an outgoing one is negotiated, we close the outgoing one and issue a disconnect, which fails any connect attempts. By sending a "reconnected" message instead of disconnect/connect we can avoid disturbing in-progress connection attempts which happens in CI quite a bit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-10-01 12:12:56 +09:30
Rusty Russell	5f5440383d	lightningd: fix race with crossover pings. We cannot use subd_req() here: replies will come out of order, and we should not simply assign the responses in FIFO order. Changelog-Fixed: lightningd: don't get confused with parallel ping commands. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-08-14 17:35:39 +09:30
Rusty Russell	a0fd72eb5e	connectd: warn if we ignore peer incoming for longer than 5 seconds. One reason why ping processing could be slow is that, once we receive a message from the peer to send to a subdaemon, we don't listen for others until we've drained that subdaemon queue entirely. This can happen for reestablish: slow machines can take a while to set that subdaemon up. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-08-14 17:35:39 +09:30
Rusty Russell	0a94f3b570	connectd: remove DNS seed lookups. DNS seeds have been down/offline for a while, and this code (which blocks!) has been a source of trouble. We should probably use a canned set of "known nodes" if we want to bootstrap. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: Protocol: we no longer use DNS seeds for peer lookup fallbacks. Fixes: https://github.com/ElementsProject/lightning/issues/7913	2025-05-08 12:54:09 +09:30
Rusty Russell	c779abdcd2	connectd: don't run more than one reconnect timer at once. From grubles' logs: ``` 2025-01-06T15:30:31.449Z DEBUG lightningd: attempting connection to 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923 for additional gossip 2025-01-06T15:30:31.449Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Adding 0 addresses to important peer 2025-01-06T15:30:31.449Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:31.449Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:32.037Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:32.037Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:32.428Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:32.428Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:32.680Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:32.681Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:33.468Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:33.469Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 
2025-01-06T15:30:33.471Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:33.471Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:33.935Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:33.935Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:34.125Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:34.125Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:35.496Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:35.497Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:35.623Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:35.623Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:35.751Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:35.751Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds 2025-01-06T15:30:35.892Z DEBUG 
035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Failed connected out: Unable to connect, no address known for peer 2025-01-06T15:30:35.892Z DEBUG 035ca2fe4793a5e789ce846062eb4834f573c060d9200ce77544a29b48a0aa5923-connectd: Will try reconnect in 300 seconds ``` We promised to wait 300 seconds! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-05-08 12:54:09 +09:30
Rusty Russell	b6c1ffa359	ccan/htable: update to explicit DUPS/NODUPS types. The updated API requires typed htables to explicitly state whether they allow duplicates: for most cases we don't, but we've had issues in the past. This is a big patch, but mainly mechanical. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2025-01-21 09:18:25 +10:30
Rusty Russell	faf7ae6ad4	pytest: add test for connection ratelimiting. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	3d294f813d	connectd: limit to 10 connections at once. We wait until a connection fails, or a subd is connected to the peer, before letting another one through. This should prevent us from overwhelming lightningd on large nodes, but unlike the previous back-off, it's based on how fast lightningd is, not an arbitrary time. We also let one through each second, in case we're connecting to many, but not doing anything but gossip (e.g. 100 explicit connect commands). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: Reconnecting to peers at startup should be significantly faster (dependent on machine speed).	2024-11-25 15:39:13 +10:30
Rusty Russell	3587afeaa2	connectd: remove transient flag. The important flag replaces it, and now we can be more intelligent about eviction in overload. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	15950bb7d4	connectd: reconnect for non-transient connections. Rather than have lightningd call us repeatedly to try to connect, have it tell us what peers are transient and aren't, and connectd will automatically try to maintain that connection. There's a new "downgrade_peer" message to tell it a peer is now transient: to make it non-transient we simply tell connectd to connect as a non-transient. The first time, I missed that dual_open_control does its own state transitions :( Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: `connectd` now handles maintaining/reconnecting to important peers, and we remember the last successful address we connected to.	2024-11-25 15:39:13 +10:30
Rusty Russell	4ee59e7a49	connectd: expose --dev-no-reconnect and --dev-fast-reconnect options. Once connectd is controlling reconnections, it'll need these. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	23dc10cf81	connectd: get our own addresses to contact node from node_announcements. Let lightningd feed us hints to try first, but we can extract the addresses from node_announcement messages ourselves. (Lightningd used to ask gossipd on our behalf: this is far simpler!) One side effect of this is that we don't hand back address hints given to us by lightningd: it would use these again for reconnecting. This breaks test_sendpay_grouping, so we disable it temporarily. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-11-25 15:39:13 +10:30
Rusty Russell	5d42600076	connectd: ratelimit onion messages However fast we can handle them, it's antisocial to allow others to make us spam the rest of the network. Changelog-Protocol: onion messages: we limit incoming to 4 per second, allowing a little burst. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-07-10 13:34:00 +02:00
Rusty Russell	f122c0beb4	connectd: include map of scid->peer node id. This will let us fwd onion messages via scid, even if they're aliases. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-07-10 13:34:00 +02:00
Rusty Russell	4a78d17748	connectd: do response to gossip queries, don't hand them to gossipd. This basically means moving the code from gossipd to connectd to handle these queries. This will let connectd have finer control over ratelimiting them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-07-10 12:21:19 +09:30
Rusty Russell	d60977f37f	connectd: use gossmap streaming interface. This is more efficient in a few ways: 1. It's trivial to get to the end of the gossip_store, we don't have to iterate. 2. It tends to be mmaped so we don't have to call pread(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-07-10 12:21:19 +09:30
Rusty Russell	401533667d	connectd: throttle streaming gossip for peers. We currently stream gossip as fast as we can, even if they start at timestamp 0. Instead, use a simple token bucket filter and only let them have 1MB per second (500 bytes per second for testing). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Protocol: connectd: we now throttle outgoing gossip at 1MB/second per peer.	2024-07-10 12:21:19 +09:30
Rusty Russell	155311b053	connectd: --dev-handshake-no-reply so we can test pending connections. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-05-14 18:16:26 -05:00
Rusty Russell	a9b7402910	pytest: test dropping transient connections. Requires a hack to exhaust connectd fds and make us close a transient. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-05-14 18:16:26 -05:00
Rusty Russell	8268df9a4b	connectd: implement "transient" connections. Currently, anything which doesn't have a live channel is considered transient. We free this first under stress, and also if they're still connecting. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-05-14 18:16:26 -05:00
Rusty Russell	d3dbcf03fa	channeld: close an unimportant connection when fds get low. We use a crude heuristic: if we were trying to contact them, it's a "deliberate" connection, and should be preserved. Changelog-Changed: connectd: prioritize peers with channels (and log!) if we run low on file descriptors. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-05-09 01:23:46 -05:00
Rusty Russell	6a648fd2bc	connectd: use hash table, not linked list, for connecting structs. I thought I was going to want to have a convenient way of counting these, but it turns out unnecessary. Still, this is slightly more efficient and simple, so I am including it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2024-05-09 01:23:46 -05:00
Rusty Russell	ad7dcf381e	lightningd: tell connectd about the custom messages. We re-send whenever a plugin which allows them starts/finishes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-10-24 11:50:57 +10:30
Rusty Russell	0ff91e65dc	connectd: remove #if DEVELOPER We still refuse to run dev commands if lightningd sends it to us despite us not being in developer mode, but that's mainly paranoia. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-09-21 20:08:24 +09:30
Rusty Russell	a9f26b7d07	common/daemon.c: remove #ifdef DEVELOPER in favor of runtime flag. Also requires us to expose memleak when !DEVELOPER, however we only ever used the memleak tracking when the LIGHTNINGD_DEV_MEMLEAK environment variable was set, so keep that. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-09-21 20:08:24 +09:30
Rusty Russell	ed58c24bc7	connectd: log broken if TCP_CORK fails. But not if we're a developer using dev_disconnect, which substitutes the fd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-04-10 09:41:56 +09:30
Rusty Russell	295557ac50	connectd: don't try to set TCP_CORK on websocket pipe. Most of this is piping the flag through so we know it's a websocket! Reported-by: @ShahanaFarooqui Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-04-10 09:41:56 +09:30
Rusty Russell	2209d0149f	connectd: add new start_shutdown message. We stop listening, and also refuse to send "connectd_peer_spoke" to create new subdaemons. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-02-05 20:40:47 +01:00
Rusty Russell	05ac74fc44	connectd: keep array of our listening sockets. This allows us to free them if we want to stop listening. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-02-05 20:40:47 +01:00
Rusty Russell	6a95d3a25e	common: expose node_id_hash functions. They're used in several places, and we're about to add more. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-01-21 08:05:31 -06:00
Rusty Russell	81e57dce52	connectd: ensure htables are always tal objects. We want to change the htable allocator to use tal, which will need this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2023-01-12 11:44:10 +10:30
Rusty Russell	d31420211a	connectd: add counters to each peer connection. This allows us to detect when lightningd hasn't seen our latest disconnect/reconnect; in particular, we would hit the following pattern: 1. lightningd says to connect a subd. 2. connectd disconnects and reconnects. 3. connectd reads message, connects subd. 4. lightningd reads disconnect and reconnect, sends msg to connect to subd again. 5. connectd asserts because subd is already connected. This way connectd can tell if lightningd is talking about the previous connection, and ignore it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	9b6c97437e	connectd: remove reconnection logic. We don't have to put aside a peer which is reconnecting and wait for lightningd to remove the old peer, we can now simply free the old and add the new. Fixes: #5240 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	8678c5efb3	connectd: release peer as soon as lightningd tells us. Now we have separate peer draining logic, we can simply use it when connectd tells us to release the peer, without waiting. (We could simply free the peer, but that's a bit rude, as messages can get lost). This removes various complex flags and logic we had before. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.	2022-07-18 20:50:04 -05:00
Rusty Russell	9dc3880360	connectd: put peer into "draining" mode when we want to close it. This removes it from the hashtable, and forces it to do nothing but send out any remaining packets, then close. It is, in effect, reduced to a stub, with no further interactions with the rest of the system (all subds are freed already). Also removes the need for an explicit "final_msg" too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	6fd8fa4d95	connectd: optimize requests for "recent" gossip. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-15 21:18:29 +09:30
Rusty Russell	7dd8e27862	connectd: don't insist on ping replies when other traffic is flowing. Got complaints about us hanging up on some nodes because they don't respond to pings in a timely manner (e.g. ACINQ?), but that turned out to be something else. Nonetheless, we've had reports in the past of LND badly prioritizing gossip traffic, and thus important messages can get queued behind gossip dumps! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: connectd: give busy peers more time to respond to pings.	2022-07-09 12:27:05 +09:30
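
Commit 401533667d above describes throttling outgoing gossip with "a simple token bucket filter" at 1MB per second per peer. The following is a minimal sketch of that general technique in Python, not the connectd code (which is written in C). The class name `TokenBucket`, its methods, and the choice of a burst capacity equal to one second of traffic are all assumptions made for illustration. Time is passed in explicitly, so a monotonic clock can be used by the caller.

```python
class TokenBucket:
    """Illustrative token bucket; names and burst size are hypothetical."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec      # tokens (bytes) added per second
        self.capacity = capacity      # maximum burst size in bytes (assumed)
        self.tokens = capacity        # start with a full bucket
        self.last = 0.0               # timestamp of the last refill

    def refill(self, now):
        # Add tokens for the time elapsed since the last refill, capped
        # at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, nbytes, now):
        # Return True and spend tokens if nbytes may be sent now;
        # otherwise leave the bucket untouched and return False.
        self.refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Example: a 1MB/second limit with a 1MB burst allowance.
bucket = TokenBucket(rate_per_sec=1_000_000, capacity=1_000_000)
first = bucket.try_consume(600_000, now=0.0)   # allowed: bucket starts full
second = bucket.try_consume(600_000, now=0.1)  # refused: too few tokens yet
third = bucket.try_consume(600_000, now=0.5)   # allowed again after refill
```

A caller would typically pass `time.monotonic()` as `now`, and when `try_consume` returns False, wait and retry rather than drop the message.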


81 Commits