We handed NULL as the logcb, resulting in a very uninformative crash:
```
2025-03-14T03:46:36.447Z INFO lightningd: Server started with public key 03d67f36c4f81789e2fe425028bacc96b199813eae426c517f589a45f1136c1fe5, alias Jubilee (color #dc42f4) and lightningd v25.02
topology: FATAL SIGNAL 11 (version v25.02)
0x560037f64aad send_backtrace
common/daemon.c:33
0x560037f64b49 crashdump
common/daemon.c:78
0x7f6c41ff351f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x0 ???
???:0
```
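For context, a minimal defensive sketch (names hypothetical, not necessarily the actual topology fix): never call through a NULL log callback, fall back to a no-op instead.
```c
#include <stdarg.h>

/* Hypothetical sketch: fall back to a no-op callback rather than
 * dereferencing NULL.  The straightforward fix is simply to pass a
 * real logcb. */
typedef void (*logcb_t)(void *arg, const char *fmt, ...);

static void logcb_ignore(void *arg, const char *fmt, ...)
{
	/* Deliberately drop the message. */
}

static logcb_t logcb_or_noop(logcb_t cb)
{
	return cb ? cb : logcb_ignore;
}
```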
Changelog-Fixed: `topology` crash on invoice creation if a peer had a really high feerate.
Fixes: https://github.com/ElementsProject/lightning/issues/8156
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we included the spec for it, this is a good time to implement
it.
I also asked ChatGPT to write some unit tests. I had to mangle them a

bit, but it probably saved me a few minutes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Unfortunately a spec typo means the data fields are missing (PR pending),
so we still patch those in.
The message "your_peer_storage" got renamed to "peer_storage_retrieval",
and the option "want_peer_backup_storage" was removed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: `experimental-peer-storage` now only advertises feature 43, not 41.
We define a new subtype 'modern_scb_chan' and a new tlvtype 'scb_tlvs' which includes
all the relevant information to create a penalty transaction when the peer tries to cheat.
Key Changes:
- Rename the old format to 'legacy_scb_chan' and define a new type 'modern_scb_chan'
- Include TLVs in 'modern_scb_chan'
- Create a new msgtype 'static_chan_backup_with_tlvs'
- Modify 'struct channel' to include 'struct modern_scb_chan'
- Add these two types to 'varsize_types' in generate.py
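To illustrate the shape of the change, a rough sketch of the new subtype; field names and sizes here are hypothetical, the real layout is whatever the wire spec defines:
```c
#include <stdint.h>

/* Hypothetical sketch only: the point is that 'modern_scb_chan' carries
 * extra TLV data (revocation material etc.) needed to build a penalty
 * transaction, which the legacy format lacked. */
struct scb_tlvs;                        /* generated from the 'scb_tlvs' tlvtype */

struct modern_scb_chan {
	uint64_t id;                    /* channel database id */
	uint8_t channel_id[32];
	uint8_t peer_node_id[33];
	struct scb_tlvs *tlvs;          /* NULL if absent */
};
```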
The parameter max_htlc_value_in_flight_msat established by peers on
channel opening (BOLT02) can now be retrieved from the
gossmods_from_listpeerchannels API.
Adapted the corresponding callback functions in renepay and askrene to
take that value into account as a constraint on the amount we can send
through a channel.
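Roughly, the constraint looks like this (names illustrative, not the actual askrene/renepay callbacks):
```c
#include <stdint.h>

/* Sketch: the peer's max_htlc_value_in_flight_msat is one more upper
 * bound on how much we can push through a local channel at once. */
static uint64_t usable_msat(uint64_t htlc_maximum_msat,
			    uint64_t spendable_msat,
			    uint64_t max_htlc_value_in_flight_msat)
{
	uint64_t limit = htlc_maximum_msat;
	if (spendable_msat < limit)
		limit = spendable_msat;
	if (max_htlc_value_in_flight_msat < limit)
		limit = max_htlc_value_in_flight_msat;
	return limit;
}
```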
Changelog-Add: fetch max_htlc_value_in_flight_msat from gossmods_listpeerchannels API
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Turns out we weren't wiring them through! And libplugin wasn't reading them anyway.
Changelog-Fixed: lightningd: tell plugins our bolt12 features (so our bolt12 invoices explicitly allow MPP).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported by Grubles on ARM64:
```
VALGRIND=1 valgrind -q --error-exitcode=7 --track-origins=yes --leak-check=full --show-reachable=yes --errors-for-leak-kinds=all common/test/run-tal_arr_randomize > /dev/null
==151138== Source and destination overlap in memcpy(0x4d69f08, 0x4d69f08, 8)
==151138== at 0x48CB68C: __GI_memcpy (vg_replace_strmem.c:1147)
==151138== by 0x41B50B: tal_arr_randomize_ (pseudorand.c:84)
==151138== by 0x41BB07: main (run-tal_arr_randomize.c:166)
==151138==
make: *** [Makefile:750: unittest/common/test/run-tal_arr_randomize] Error 7
```
It is correct: you can't overlap src and dst in memcpy. It *probably* works in
this case, but it's undefined!
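For illustration only (this is not the pseudorand.c code): a shuffle written with a temporary element either has to skip the self-swap case or use memmove(), which is defined for overlapping regions.
```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a Fisher-Yates shuffle over raw elements.  Skipping i == j
 * avoids the overlapping memcpy() that valgrind flagged. */
static void shuffle(void *base, size_t nmemb, size_t size,
		    size_t (*rand_below)(size_t bound))
{
	char *arr = base;
	char tmp[size];

	if (nmemb < 2)
		return;
	for (size_t i = nmemb - 1; i > 0; i--) {
		size_t j = rand_below(i + 1);
		if (i == j)
			continue;
		memcpy(tmp, arr + i * size, size);
		memcpy(arr + i * size, arr + j * size, size);
		memcpy(arr + j * size, tmp, size);
	}
}
```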
Fixes: https://github.com/ElementsProject/lightning/issues/7030
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
After analyzing various weird cases where we ended up with duplicate
gossip_store entries, the best explanation was that we were not fully
processing the gossip store.
It's not clear that my assumption that we would always see our own writes
is true: technically this may require an fsync(). So we now add the
check, and do an fsync() and try again if it fails.
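Roughly, the check amounts to this (simplified sketch with hypothetical names, not the actual gossipd code):
```c
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Sketch: after appending a record at offset 'off', read it back.  If we
 * don't see what we just wrote, fsync() and try once more. */
static bool append_and_verify(int fd, off_t off, const void *rec, size_t len)
{
	char buf[len];

	if (pwrite(fd, rec, len, off) != (ssize_t)len)
		return false;

	for (int attempt = 0; attempt < 2; attempt++) {
		if (pread(fd, buf, len, off) == (ssize_t)len
		    && memcmp(buf, rec, len) == 0)
			return true;
		fsync(fd);
	}
	return false;
}
```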
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossipd: more sanity checks that we are correctly updating the gossip_store file.
Instead of making a copy.
To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.
No checksum check: 194.52s
Copying for checksum check: 202.81s
Zero-copy checksum check: 194.40s
But these numbers proved noisy. Still, doesn't hurt.
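The idea, sketched with an assumed crc32c() helper: point the checksum routine straight at the bytes we already hold, rather than memcpy()ing them into a temporary buffer first.
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed helper with a ccan-style signature. */
uint32_t crc32c(uint32_t start, const void *buf, size_t size);

/* Sketch: verify a record's checksum in place, no copy. */
static bool record_csum_ok(const uint8_t *rec, size_t len, uint32_t expected)
{
	return crc32c(0, rec, len) == expected;
}
```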
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume if it's incorrect, we simply need to wait. If this proves incorrect,
we will see a stream of BROKEN log messages.
To measure the performance impact, I timed
tests/test_askrene.py::test_real_biases on my laptop.
Before: 194.52s
After: 202.81s
So it's marginal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
While this shouldn't happen, it does (pending other fixes), and we stop reading the
gossip store until next time. The result is partial gossip, demonstrated beautifully
by NicolasDorier's report:
```
lightning_gossipd: gossmap: redundant channel_announce for 864063x1306x1, offsets 1272259 and 1784859!"
```
Gossipd stalled there and didn't make any more progress. So gossipd itself
doesn't see the entire gossip_store.
Then things get really batshit:
```
2025-02-04T05:53:28.582Z DEBUG gossipd: Store compact time: 1429910 msec
```
This took 1429 seconds to process. Why?
Because it hadn't been processing the gossip store fully, gossipd kept adding "new" records to the end:
```
2025-02-04T05:53:28.583Z DEBUG gossipd: gossip_store: Read 62716143/1739952/5158256/0 cannounce/cupdate/nannounce/delete from store in 31634458462 bytes, now 31634458440 bytes (populated=true)
```
It has 31GB of gossip in there! No wonder it took so long...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/8035
Changelog-Fixed: gossipd: corruption in the gossip_store could cause ever-longer startup times and no gossip updates.
The default goes to stderr for LOG_UNUSUAL and higher.
We have to whitelist more cases in map_catchup, though, so we don't spam
the logs with perfectly-expected (but ignored) messages.
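A minimal sketch of such a default (names and levels as illustrative stand-ins):
```c
#include <stdarg.h>
#include <stdio.h>

/* Sketch: default log callback that only surfaces LOG_UNUSUAL and worse,
 * writing them to stderr; everything below that is dropped. */
enum log_level { LOG_TRACE, LOG_DBG, LOG_INFORM, LOG_UNUSUAL, LOG_BROKEN };

static void default_logcb(enum log_level level, const char *fmt, ...)
{
	va_list ap;

	if (level < LOG_UNUSUAL)
		return;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
}
```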
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We only use it in one place, and that was simply to share an fd between
gossipd writing and gossipd reading, which may be causing our ZFS problem
anyway.
In fact, it fixes a race if we don't have HAVE_PWRITEV.
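The race in question, sketched (not the actual gossip_store code): without pwrite()/pwritev(), an append is an lseek() followed by a write(), and anyone sharing the fd can move the offset in between. pwrite() stands in for pwritev() here.
```c
#include <unistd.h>

/* Sketch: pwrite() takes the offset explicitly, so it doesn't care who
 * else is using the fd; lseek()+write() is two steps and can race. */
static ssize_t append_record(int fd, off_t end, const void *buf, size_t len)
{
#if HAVE_PWRITEV
	return pwrite(fd, buf, len, end);
#else
	if (lseek(fd, end, SEEK_SET) != end)
		return -1;
	return write(fd, buf, len);	/* racy if the fd is shared */
#endif
}
```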
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a report of this happening under ZFS. We cannot do much if
this really is a problem where we can't read back what we write, but
this avoids the immediate crash.
Fixes: https://github.com/ElementsProject/lightning/issues/7971
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: gossmap: occasional crash (at least on ZFS) reading gossip_store.
I'd been resisting this, but some tests really do want "hang up after
*receiving* this", and it's more reliable than getting the peer to
"hang up after *sending* this", which doesn't always seem to work.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The updated API requires typed htables to explicitly state whether they
allow duplicates: for most cases we don't, but we've had issues in the
past.
This is a big patch, but mainly mechanical.
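For illustration, assuming the updated API exposes something like HTABLE_DEFINE_NODUPS_TYPE/HTABLE_DEFINE_DUPS_TYPE, a declaration now looks roughly like this:
```c
#include <stdbool.h>
#include <stddef.h>
#include <ccan/htable/htable_type.h>

/* Sketch with assumed macro names: each typed htable declares up front
 * whether duplicate keys are allowed. */
struct node_id;
struct peer;

const struct node_id *peer_keyof(const struct peer *peer);
size_t node_id_hash(const struct node_id *id);
bool peer_eq_node_id(const struct peer *peer, const struct node_id *id);

/* Keys are unique here, so: no duplicates. */
HTABLE_DEFINE_NODUPS_TYPE(struct peer, peer_keyof, node_id_hash,
			  peer_eq_node_id, peer_htable);
```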
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I wanted to make sure we didn't have a bug in our htable routine,
so I wrote this test. We don't, but leave it in.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We ratelimited DEBUG messages, but that can be annoying and cause us to miss things.
We demoted the worst offenders to TRACE level in the last release.
Now we only log TRACE if it's wanted, and never suppress DEBUG.
Changelog-Changed: Logging: we no longer suppress DEBUG messages from subdaemons.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: https://github.com/ElementsProject/lightning/issues/7917
The header sys/errno.h gets redirected to errno.h, leading to a warning and
then a build failure, so reference errno.h directly instead.
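In other words, something like:
```c
/* Before: some libcs redirect this with a warning (or error): */
#include <sys/errno.h>

/* After: reference the standard header directly: */
#include <errno.h>
```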
Changelog-None: change header
Signed-off-by: Lakshya Singh <lakshay.singh1108@gmail.com>
In fact, there are several places where we try to decode old invoices,
and they should all work. The only place we should enforce expiration is
when we're going to pay.
This also revealed that xpay wasn't checking bolt11 expiries!
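A sketch of the rule (field names assumed): decode merely reports expiry; only the paying code path rejects on it.
```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Sketch: expiry is relative to creation time. */
struct invoice_details {
	uint64_t created_at;		/* seconds since epoch */
	uint64_t relative_expiry;	/* seconds */
};

static bool invoice_expired(const struct invoice_details *inv, time_t now)
{
	return (uint64_t)now > inv->created_at + inv->relative_expiry;
}

/* decode: include the expiry in the output, never fail because of it.
 * pay/xpay: reject with "invoice expired" before attempting payment. */
```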
Reported-by: hMsats
Fixes: https://github.com/ElementsProject/lightning/issues/7869
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: JSON-RPC: `decode` refused to decode expired bolt12 invoices.
It turns out that under some circumstances we end up clearing the
pointee of `current` but not the pointer. Thus when we select the next
slot we can end up reusing the same slot, making it its own parent.
We forcefully break these cycles by enforcing that `current` is never
returned, so a slot can never end up as its own parent.
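A simplified sketch of the guard (names illustrative, not the actual tracing code):
```c
#include <stddef.h>

/* Sketch: when choosing a slot for a new span, never return the slot
 * 'current' points at, so a span cannot become its own parent. */
struct span {
	struct span *parent;
	int in_use;
};

static struct span *pick_slot(struct span *slots, size_t n,
			      struct span *current)
{
	for (size_t i = 0; i < n; i++) {
		if (slots[i].in_use)
			continue;
		if (&slots[i] == current)
			continue;	/* would become its own parent */
		return &slots[i];
	}
	return NULL;
}
```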
Changelog-None
Trace spans form a tree, but we don't actually check that the
structure doesn't break. Breakage can occur, for example, if we
accidentally reuse the same key, making a new span its own ancestor.
We have the space in memory set aside anyway, so let's just copy the
`trace_id` into the span itself, rather than resolving the `root` at
time of emission.
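Sketched (sizes and names assumed):
```c
#include <stdint.h>

/* Sketch: the span carries its trace_id directly, copied from the root
 * when the span starts, so emission never has to walk the parent chain. */
struct span {
	uint64_t key;
	uint8_t trace_id[16];	/* copied at span creation */
	struct span *parent;	/* still used for span/parent ids */
};
```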
After adding the DB query instrumentation we ran into a couple of
issues, with spans not being resumed correctly, and it was rather hard
to identify the problem. This adds debug statements so we can trace
the tracing (traception if you will).
Changelog-None
It is possible for prevtx to be larger than the maximum packet size, so for shared outputs (currently only the funding tx) we add support for sending only the `txid` across the wire and filling in the prevtx locally.
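A sketch of the handling (names illustrative, not the actual interactive-tx code): if the peer omits the prevtx for the shared funding output, we fill it in from our own copy.
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: an added input either carries the full previous tx, or (for
 * the shared funding output only) just the txid, which we resolve from
 * the funding tx we already hold. */
struct added_input {
	uint8_t prevtxid[32];
	const uint8_t *prevtx;	/* NULL if only the txid was sent */
	size_t prevtx_len;
};

static bool resolve_prevtx(struct added_input *in,
			   const uint8_t funding_txid[32],
			   const uint8_t *funding_tx, size_t funding_len)
{
	if (in->prevtx)
		return true;	/* peer sent the full tx */
	if (memcmp(in->prevtxid, funding_txid, 32) != 0)
		return false;	/* unknown input: must reject */
	in->prevtx = funding_tx;
	in->prevtx_len = funding_len;
	return true;
}
```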
Changelog-None