Commit Graph

1850 Commits

Author SHA1 Message Date
Erick Cestari
d7319795b4 common/bolt11: enforce minimum witness program length for fallback addresses
BIP-141 specifies that a witness program must be between 2 and 40 bytes in
length. In our fallback address parsing, we were already checking the upper
bound, but missing the lower bound check. This commit adds validation to
ensure fallback address witness programs are at least 2 bytes long, bringing
our implementation in line with the spec and other implementations like
rust-lightning.

Changelog-Fixed: Enforced minimum witness program length of 2 bytes for
fallback addresses to comply with BIP-141 and prevent invalid decodings.
2025-04-15 10:41:33 +09:30
Rusty Russell
2e6ad3ffc8 trace: handle key being freed while suspended.
This happens with autoclean, which does a datastore request then frees
the parent command without waiting for a response (see clean_finished).

This leaks a trace, and causes a crash if the pointer is later reused.

My solution is to create a trace variant which declares the trace key
to be a tal ptr and then we can clean up in the destructor if this happens.
This fixes the issue for me.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: autoclean: fixed occasional crash when tracepoints compiled in.
2025-04-03 08:27:27 -05:00
Rusty Russell
ee2ec8befa trace: minimal fix to avoid crash when > 128 traces active.
chanbackup with many peers can do more than 128 concurrent rpc commands.
autoclean is the other plugin which can do many requests at once, so I
expect a similar issue there.

I tested this by rebuilding with `MAX_ACTIVE_SPANS` 1, which autoclean
tests triggered immediately.

The real fix is probably to use a hash table with a large initial size.

```
Mar 24 06:30:45 mlbb2 sh[28000]: chanbackup: common/trace.c:190: trace_span_slot: Assertion `s' failed.
Mar 24 06:30:45 mlbb2 sh[28000]: chanbackup: FATAL SIGNAL 6 (version v25.02)
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232bac4f send_backtrace
Mar 24 06:30:45 mlbb2 sh[28000]:         common/daemon.c:33
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232baceb crashdump
Mar 24 06:30:45 mlbb2 sh[28000]:         common/daemon.c:78
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958cd851f ???
Mar 24 06:30:45 mlbb2 sh[28000]:         ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958d2c9fc __pthread_kill_implementation
Mar 24 06:30:45 mlbb2 sh[28000]:         ./nptl/pthread_kill.c:44
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958d2c9fc __pthread_kill_internal
Mar 24 06:30:45 mlbb2 sh[28000]:         ./nptl/pthread_kill.c:78
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958d2c9fc __GI___pthread_kill
Mar 24 06:30:45 mlbb2 sh[28000]:         ./nptl/pthread_kill.c:89
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958cd8475 __GI_raise
Mar 24 06:30:45 mlbb2 sh[28000]:         ../sysdeps/posix/raise.c:26
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958cbe7f2 __GI_abort
Mar 24 06:30:45 mlbb2 sh[28000]:         ./stdlib/abort.c:79
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958cbe71a __assert_fail_base
Mar 24 06:30:45 mlbb2 sh[28000]:         ./assert/assert.c:94
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958ccfe95 __GI___assert_fail
Mar 24 06:30:45 mlbb2 sh[28000]:         ./assert/assert.c:103
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232ab7fa trace_span_slot
Mar 24 06:30:45 mlbb2 sh[28000]:         common/trace.c:190
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232abc9f trace_span_start
Mar 24 06:30:45 mlbb2 sh[28000]:         common/trace.c:267
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232a7c34 send_outreq
Mar 24 06:30:45 mlbb2 sh[28000]:         plugins/libplugin.c:1112
```

Changelog-Fixed: autoclean/chanbackup: fixed tracepoint crash on large number of requests.
Fixes: https://github.com/ElementsProject/lightning/issues/8177
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-04-01 09:20:07 -05:00
Rusty Russell
8dae8be5fb common: fix crash when we have a localmod with unrepresentable fee values.
We handed NULL as the logcb, resulting in a very uninformative crash:

```
2025-03-14T03:46:36.447Z INFO    lightningd: Server started with public key 03d67f36c4f81789e2fe425028bacc96b199813eae426c517f589a45f1136c1fe5, alias Jubilee (color #dc42f4) and lightningd v25.02
topology: FATAL SIGNAL 11 (version v25.02)
0x560037f64aad send_backtrace
        common/daemon.c:33
0x560037f64b49 crashdump
        common/daemon.c:78
0x7f6c41ff351f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x0 ???
        ???:0
```

Changelog-Fixed: `topology` crash on invoice creation if a peer had a really high feerate.
Fixes: https://github.com/ElementsProject/lightning/issues/8156
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-04-01 16:33:23 +10:30
Rusty Russell
71eb04064c common: implement op_return test.
Since we included the spec for it, this is a good time to implement
it.

I also asked chatgpt to write some unit tests.  I had to mangle them a
bit, but it probably saved me a few minutes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-03-18 14:30:58 +10:30
Rusty Russell
733efcf7dd BOLTs: import spec additions for option_simple_close.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-03-18 14:30:58 +10:30
Rusty Russell
58e14284e2 BOLTs: Add typo fixes and clarifications from "More clarifications around channel_announcement handling (#1220)"
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-03-18 14:30:58 +10:30
Rusty Russell
67c91a7e5c BOLTs: Update to version with peer storage merged.
Unfortunately a spec typo means the data fields are missing (PR pending),
so we still patch those in.

The message "your_peer_storage" got renamed to "peer_storage_retrieval",
and the option "want_peer_backup_storage" was removed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `experimental-peer-storage` now only advertizes feature 43, not 41.
2025-03-18 14:30:58 +10:30
Rusty Russell
2e81a40d77 BOLTs: update for BOLT, which removes requirement to wait 6 blocks before sending announcement_signatures.
We have a replacement quote which is suitable here, but it comes in a later BOLT commit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-03-18 14:30:58 +10:30
Rusty Russell
f4872f96db common: return location of a ".setconfig" file when we load config.
This is where we will put all the dynamic settings.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-24 19:38:37 +10:30
Rusty Russell
da2fb198c4 setconfig: make source for setconfig "setconfig transient".
Previously it would be the value of the previous config setting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-24 19:38:37 +10:30
Aditya Sharma
fe02d2f1c5 scb_wire: Define new subtype 'modern_scb_chan' with 'scb_tlvs'
We define a new subtype 'modern_scb_chan' and a new tlvtype 'scb_tlvs' which includes
all the relevant information to create a penalty transaction when the peer tries to cheat.

Key Changes:
 - Rename the old format to 'legacy_scb_chan' and define a new type 'modern_scb_chan'
 - Include TLVs to 'modern_scb_chan'
 - Create a new msgtype 'static_chan_backup_with_tlvs'
 - Modify 'struct channel' to include 'struct modern_scb_chan'
 - Add these two types to 'varsize_types' in generate.py
2025-02-22 11:51:54 -06:00
Lagrang3
35f899a588 fetch max total htlc value from listpeerchannel
The parameter max_htlc_value_in_flight_msat stablished by peers on
channel opening (BOLT02) can now be retrived from the
gossmods_from_listpeerchannels API.

Adapted the corresponding callback functions in renepay and askrene to
take into account that value as a constraint to the value we can send
through a channel.

Changelog-Add: fetch max_htlc_value_in_flight_msat from gossmods_listpeerchannels API

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
2025-02-21 17:03:36 -06:00
Rusty Russell
e2be010cbb offers: correctly advertize MPP in invoice features.
Turns out we weren't wiring them through!  And libplugin wasn't reading them anyway.

Changelog-Fixed: lightningd: tell plugins our bolt12 features (so our bolt12 invoices explicitly allow MPP).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-14 22:17:21 +10:30
Rusty Russell
7577af0d59 common: if something isn't deprecated yet, it's always ok.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-14 22:17:21 +10:30
Rusty Russell
c3362b057c BOLT12: remove -offers from bolt12 quotes, update them.
Typo fixes and wording changes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 20:19:01 -06:00
Rusty Russell
ba3e85bb43 decode: handle new bip353 fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 20:19:01 -06:00
Rusty Russell
fa188c80ca common: update bolts to include hash value in bolt11 test vectors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 20:19:01 -06:00
Rusty Russell
b879cb475f common: update bolt to neaten pubkey descriptions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 20:19:01 -06:00
Rusty Russell
1820423cbc common: fix memcpy error in Fischer-Yates shuffle.
Reported by Grubles on ARM64:

```
VALGRIND=1 valgrind -q --error-exitcode=7 --track-origins=yes --leak-check=full --show-reachable=yes --errors-for-leak-kinds=all common/test/run-tal_arr_randomize > /dev/null
==151138== Source and destination overlap in memcpy(0x4d69f08, 0x4d69f08, 8)
==151138==    at 0x48CB68C: __GI_memcpy (vg_replace_strmem.c:1147)
==151138==    by 0x41B50B: tal_arr_randomize_ (pseudorand.c:84)
==151138==    by 0x41BB07: main (run-tal_arr_randomize.c:166)
==151138==
make: *** [Makefile:750: unittest/common/test/run-tal_arr_randomize] Error 7
```

It is correct: you can't overlap src and dst in memcpy.  It *probably* works in
this case, but it's undefined!

Fixes: https://github.com/ElementsProject/lightning/issues/7030
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 16:54:08 -06:00
Rusty Russell
2b4b1479ed gossipd: check that gossmap code sees updates from gossip_store writes.
After analyzing various weird cases where we ended up with duplicate
gossip_store entries, it could be explained by us not fully processing
the gossip store.

It's not clear that my assumptions that we would always see our own writes
are true: technically this may require an fsync().  So we now add the
check, and do an fsync and try again.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
2025-02-11 15:11:47 -06:00
Rusty Russell
1df1300cc9 gossip_store: don't need to check for truncated amounts.
That's actually caught by the gossmap load now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
9d98740e18 gossmap: stricter checks when gossipd itself loads the gossip_store.
This means we will correctly reset the store if it has redundant
records, for example.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
05bc4ca5f3 gossmap: use mmap directly to check checksums.
Instead of making a copy.

To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.

	No checksum check: 194.52s
	Copying for checksum check: 202.81s
	Zero-copy checksum check: 194.40s

But these numbers proved noisy.  Still, doesn't hurt.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
4b5e5b27ae gossmap: check checksums.
We assume if it's incorrect, we simply need to wait.  If this proves incorrect,
we will see a stream of BROKEN log messages.

To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.

	Before: 194.52s
	After: 202.81s

So it's marginal.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
307dbe3e62 tests: put proper checksums into test gossip_store files.
We're about to test them in gossmap.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
5e2f6c5028 gossmap: don't stop reading if we hit a redundant channel_announce.
While this shouldn't happen, it does (pending other fixes), and we stop reading the
gossip store until next time.  The result is partial gossip, demonstrated beautifully
by NicolasDorier's report:

```
lightning_gossipd: gossmap: redundant channel_announce for 864063x1306x1, offsets 1272259 and 1784859!"
```

Gossipd stalld there and don't make more progress.  So gossipd itself
doesn't see the entire gossip_store.

Then things get really batshit:

```
2025-02-04T05:53:28.582Z DEBUG   gossipd: Store compact time: 1429910 msec
```

This took 1429 seconds to process.  Why?

Because it hasn't been processing the gossip store fully, gossipd kept adding "new" records to the end:

```
2025-02-04T05:53:28.583Z DEBUG   gossipd: gossip_store: Read 62716143/1739952/5158256/0 cannounce/cupdate/nannounce/delete from store in 31634458462 bytes, now 31634458440 bytes (populated=true)
```

It has 31GB of gossip in there!  No wonder it took so long...

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/8035
Changelog-Fixed: gossipd: corruption in the gossip_store could cause ever-longer startup times and no gossip updates.
2025-02-11 15:11:47 -06:00
Rusty Russell
fdfc7ce62f gossmap: add (and use) logging hook.
Default goes to stderr for LOG_UNUSUAL and higher.

We have to whitelist more cases in map_catchup so we don't spam the logs
with perfectly-expected (but ignored) messages though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
607b14fe12 common/gossmap: remove open-by-fd.
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our zfs problem
anyway.

In fact, it fixes a race if we don't have HAVE_PWRITEV.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-02-11 15:11:47 -06:00
Rusty Russell
927d062b04 gossmap: don't crash if we hit a zero-length record.
We have a report of this happening under ZFS.  We cannot do much if
this really is a problem where we can't read back what we write, but
this avoids the immediate crash.

Fixes: https://github.com/ElementsProject/lightning/issues/7971
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossmap: occasional crash (at least on ZFS) reading gossip_store.
2025-02-11 15:11:47 -06:00
Dusty Daemon
8151c037bb Splice: Stricter interop errors
Error when our peer tries to use the shared output txid as well as the prevtx bytes.

Changelog-Changed: Stricter tests for interop with Eclair
2025-02-05 12:50:24 -06:00
Dusty Daemon
a22fe31904 Splice: Fix comment typo 2025-02-05 12:50:24 -06:00
Rusty Russell
89ca5aef9f common: don't crash on send_backtrace() if bt_print is NULL.
This (and bt_exit) are NULL for libplugin.  We don't always want to crash!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-01-28 10:53:22 +10:30
Rusty Russell
7556164e58 dev_disconnect: add '<' to close after an incoming msg.
I'd been resisting this, but some tests really do want "hang up after
*receiving* this", and it's more reliable than getting the peer to
"hang up afer *sending* this" which seems not always work.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-01-27 11:07:04 +10:30
Rusty Russell
44c6a22e5f dev_disconnect: rename to dev_disconnect_out, in preparation for incoming filters.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-01-27 11:07:04 +10:30
Rusty Russell
b6c1ffa359 ccan/htable: update to explicit DUPS/NODUPS types.
The updated API requires typed htables to explicitly state whether they
allow duplicates: for most cases we don't, but we've had issues in the
past.

This is a big patch, but mainly mechanical.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-01-21 09:18:25 +10:30
Rusty Russell
5cecdd7dba common: add test for htable churn.
I wanted to make sure we didn't have a bug in our htable routine,
so I wrote this test.  We don't, but leave it in.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-01-21 09:18:25 +10:30
2seaq
1da0c1fbe2 common: Chainparams for testnet4 support
The BIP70 name given as testnet4 to correspond with chaininfo value.
2025-01-13 15:45:19 -08:00
Rusty Russell
73415d35c9 common: don't send trace messages by default, don't ratelimit at all.
We ratelimited DEBUG messages, but that can be annoying and cause us to miss things.
We demoted the worst offenders in the last release, to TRACE level.

Now, only log trace if it's wanted, and never suppress DEBUG.

Changelog-Changed: Logging: we no longer suppress DEBUG messages from subdaemons.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/7917
2024-12-16 09:48:51 +10:30
Lakshya Singh
7703cf7fec splice: alpine build warning causing failure
header sys/errno.h gets re-directed to errno.h leading to warning and
then failure so instead directly referencing the header

Changelog-None: change header

Signed-off-by: Lakshya Singh <lakshay.singh1108@gmail.com>
2024-12-16 09:36:17 +10:30
Rusty Russell
b520543867 gossipd: log at trace, not debug for regular messages.
See: https://github.com/ElementsProject/lightning/issues/7899

A node with 23 connections gets far too many debug messages.

Changelog-Fixed: `gossipd` now does logging at trace, not debug level.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-12-05 11:43:50 +10:30
Rusty Russell
b8e5b122d2 decode: don't fail to decode just because a bolt12 invoice has expired.
In fact, there are several places where we try to decode old invoices,
and they should all work.  The only place we should enforce expiration is
when we're going to pay.

This also revealed that xpay wasn't checking bolt11 expiries!

Reported-by: hMsats
Fixes: https://github.com/ElementsProject/lightning/issues/7869
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: JSON-RPC: `decode` refused to decode expired bolt12 invoices.
2024-11-30 13:17:55 +01:00
Rusty Russell
22a481fbaa common: routine to make wireaddr_internal from wireaddr.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-25 15:39:13 +10:30
Christian Decker
c596550de1 common: Make trace debugging output configurate at compile time
Just added a couple of compile-time guards and sprinkled the invariant
checking in a couple of places (disabled if compile time guard is
unset).
2024-11-24 10:24:31 +10:30
Christian Decker
557142627f common: Fix a potential cycle in the trace structure
It turns out that under some circumstances we end up clearing the
pointee of `current` but not the pointer. Thus when we select the next
slot we can end up reusing the same slot, making it its own parent.

We forcefull break these cycles by enforcing that `current` should
never be returned and be set as its own parent.

Changelog-None
2024-11-24 10:24:31 +10:30
Christian Decker
a6c81d4174 common: Add a tree checker for trace spans
Trace spans form a tree, but we don't actually check that the
structure doesn't break. Breakage can for example come if we use the
same key accidentally, making a new span its own ancestor.
2024-11-24 10:24:31 +10:30
Christian Decker
2e59ab8f15 common: Remove the recursive parent resolution in traces
We have the space in memory set aside anyway, so let's just copy the
`trace_id` into the span itself, rather than resolving the `root` at
time of emission.
2024-11-24 10:24:31 +10:30
Christian Decker
4f3ea8c048 common: Add some debuggig capabilities to the trace subsystem
After adding the DB query instrumentation we ran into a couple of
issues, with spans not being resumed correctly, and it was rather hard
to identify the problem. This adds debug statements so we can trace
the tracing (traception if you will).

Changelog-None
2024-11-24 10:24:31 +10:30
Rusty Russell
69c252e06f gossmap: implement gossmap_random_node(), use it in gossipd.
It's easy for gossmap, since it has access to the htable.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 15:21:45 +10:30
Rusty Russell
9295b4f77e common/test: fix -O3 compile error with gcc-12 (Ubuntu 12.3.0-17ubuntu1) 12.3.0
```
common/test/run-splice_script.c: In function ‘main’:
common/test/run-splice_script.c:349:17: error: ‘%.*s’ directive argument is null [-Werror=format-overflow=]
  349 |         printf("%.*s\n", (int)len, str);
      |                 ^~~~
cc1: all warnings being treated as errors
make: *** [Makefile:297: common/test/run-splice_script.o] Error 1
make: *** Waiting for unfinished jobs....
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2024-11-22 15:21:45 +10:30