Commit Graph

17236 Commits

Author SHA1 Message Date
Sangbida Chaudhuri
4c7e2d449d hsmd: fix HSM sent an unknown message type error
When we enter the wrong passphrase hsmd crashes like this with an unknown message type:

lightning_hsmd: Failed to load hsm_secret: Wrong passphrase (version v25.12rc1-7-g7713a42-modded)
0x102ba44bf ???
        send_backtrace+0x4f:0
0x102b0900f status_failed
        common/status.c:207
0x102af1a37 hsmd_send_init_reply_failure
        hsmd/hsmd.c:301
0x102af1497 load_hsm
        hsmd/hsmd.c:446
0x102af1497 init_hsm
        hsmd/hsmd.c:548
0x102b29e63 next_plan
        ccan/ccan/io/io.c:60
0x102b29e63 do_plan
        ccan/ccan/io/io.c:422
0x102b29d8b io_ready
        ccan/ccan/io/io.c:439
0x102b2b4bf io_loop
        ccan/ccan/io/poll.c:470
0x102af0a83 main
        hsmd/hsmd.c:886
lightningd: HSM sent unknown message type

This change swaps write_all() for wire_sync_write(), because write_all() omits the wire protocol length prefix. We also no longer send a stack trace when the user enters the wrong passphrase; hsmd exits cleanly instead.
2025-11-27 14:06:17 +10:30
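Note on 4c7e2d449d: a minimal sketch of why a missing length prefix can surface as an unreadable or "unknown" message on the receiving side. It assumes a framing of a 4-byte big-endian length followed by a message whose first two bytes are its type; the helper names and the type number are made up for illustration, and this is Python rather than the C code the commit touches.

```python
import io
import struct

def write_raw(stream, msg: bytes) -> None:
    """Write the message bytes with no framing (the problematic pattern)."""
    stream.write(msg)

def write_framed(stream, msg: bytes) -> None:
    """Prefix the message with its length (assumed: 4 bytes, big-endian)."""
    stream.write(struct.pack(">I", len(msg)) + msg)

def read_framed(stream) -> bytes:
    """Read one length-prefixed message, as the receiver would."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("short length header")
    (length,) = struct.unpack(">I", header)
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("short message body")
    return body

def message_type(msg: bytes) -> int:
    """Treat the first two bytes as a big-endian message type."""
    return struct.unpack(">H", msg[:2])[0]

HYPOTHETICAL_FAILURE_TYPE = 0x1234  # made-up type number
reply = struct.pack(">H", HYPOTHETICAL_FAILURE_TYPE) + b"Wrong passphrase"

# Framed write: the receiver recovers the message and its type.
good = io.BytesIO()
write_framed(good, reply)
good.seek(0)
framed_type = message_type(read_framed(good))

# Raw write: the receiver misreads the type and text bytes as a length.
bad = io.BytesIO()
write_raw(bad, reply)
bad.seek(0)
try:
    unframed_result = message_type(read_framed(bad))
except EOFError:
    unframed_result = None
```

With the raw write, the first four bytes of the reply are interpreted as a length, so the receiver never sees the intended type.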
ShahanaFarooqui
192fc6ae60 doc: Update documentation for reproducible Fedora binaries
Changelog-None: Already added details in PR #8692.
2025-11-27 14:04:48 +10:30
ShahanaFarooqui
e04153f1df tools: Sort Fedora tar by name
And do not replace Fedora shasums from verification because Fedora binaries are deterministic now.
2025-11-27 14:04:48 +10:30
ShahanaFarooqui
4a67100c34 ci: Add skip_validation option to test the release on non-tagged commit
Changelog-None: Improved release action CI testing
2025-11-27 14:04:48 +10:30
Madeline Paech
7713a427b6 change log for release candidate 1 to include 8690
2025-11-25 10:24:46 +10:30
Rusty Russell
9627bf9ba1 CI: don't run configure on *host* for release.
It breaks, but more importantly we don't need to install lowdown any more,
since the check in build-release.sh has been removed.

```
Run sudo apt-get install -y lowdown
Reading package lists...
Building dependency tree...
Reading state information...
The following NEW packages will be installed:
  lowdown
0 upgraded, 1 newly installed, 0 to remove and 21 not upgraded.
Need to get 129 kB of archives.
After this operation, 314 kB of additional disk space will be used.
Get:1 file:/etc/apt/apt-mirrors.txt Mirrorlist [144 B]
Get:2 http://azure.archive.ubuntu.com/ubuntu noble/universe amd64 lowdown amd64 1.1.0-1 [129 kB]
Fetched 129 kB in 0s (2971 kB/s)
Selecting previously unselected package lowdown.
(Reading database ...
(Reading database ... 5%
(Reading database ... 10%
(Reading database ... 15%
(Reading database ... 20%
(Reading database ... 25%
(Reading database ... 30%
(Reading database ... 35%
(Reading database ... 40%
(Reading database ... 45%
(Reading database ... 50%
(Reading database ... 55%
(Reading database ... 60%
(Reading database ... 65%
(Reading database ... 70%
(Reading database ... 75%
(Reading database ... 80%
(Reading database ... 85%
(Reading database ... 90%
(Reading database ... 95%
(Reading database ... 100%
(Reading database ... 216225 files and directories currently installed.)
Preparing to unpack .../lowdown_1.1.0-1_amd64.deb ...
Unpacking lowdown (1.1.0-1) ...
Setting up lowdown (1.1.0-1) ...
Processing triggers for man-db (2.12.0-4build2) ...
Not building database; man-db/auto-update is not 'true'.

Running kernel seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.
checking for getpagesize() in <unistd.h>... yes
checking for isblank() in <ctype.h>... yes
checking for little endian... yes
checking for memmem in <string.h>... yes
checking for memrchr in <string.h>... yes
checking for mmap() declaration... yes
checking for /proc/self/maps exists... yes
checking for qsort_r cmp takes trailing arg... yes
checking for __attribute__((section)) and __start/__stop... yes
checking for stack grows upwards... no
checking for statement expression support... yes
checking for <sys/filio.h>... no
checking for <sys/termios.h>... yes
checking for <sys/unistd.h>... yes
checking for __typeof__ support... yes
checking for unaligned access to int... yes
checking for utime() declaration... yes
checking for __attribute__((warn_unused_result))... yes
checking for #pragma omp and -fopenmp support... yes
checking for <valgrind/memcheck.h>... no
checking for working <ucontext.h... yes
checking for passing pointers via makecontext()... yes
checking for __builtin_cpu_supports()... yes
checking for closefrom() offered by system... yes
checking for F_CLOSEM defined for fctnl.... no
checking for close_range syscall available as __NR_close_range.... yes
checking for F_MAXFD defined for fcntl.... no
checking for zlib support... yes
checking for libsodium with IETF chacha20 variants... no
checking for sqlite3... yes
checking for postgres... yes
checking for User Statically-Defined Tracing (USDT)... no
checking for compiler is GCC... yes
checking for GCC version is 7 or above... yes
Writing variables to config.vars.2200... yes
Writing header to ccan/config.h.2200... yes
checking for python3-mako... not found
checking for lowdown... found
checking for sha256sum... found
checking for jq... found
Setting PREFIX... /usr/local
Setting CC... cc
Setting CONFIGURATOR_CC... cc
Setting CWARNFLAGS... -Wall -Wundef -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes -Wold-style-definition -Werror -Wno-maybe-uninitialized -Wshadow=local
Setting CDEBUGFLAGS... -std=gnu11 -g -fstack-protector-strong
Setting COPTFLAGS... -Og
CSANFLAGS not found
FUZZFLAGS not found
FUZZER_LIB not found
LLVM_LDFLAGS not found
SQLITE3_CFLAGS not found
Setting SQLITE3_LDLIBS... -lsqlite3
Setting POSTGRES_INCLUDE... -I/usr/include/postgresql
Setting POSTGRES_LDLIBS... -L/usr/lib/x86_64-linux-gnu -lpq
SODIUM_CFLAGS not found
SODIUM_LDLIBS not found
Setting VALGRIND... 0
Setting DEBUGBUILD... 0
Setting COMPAT... 1
Setting PYTEST... python3 -m pytest
Setting STATIC... 0
Setting CLANG_COVERAGE... 0
Setting ASAN... 0
Setting UBSAN... 0
Setting TEST_NETWORK... regtest
Setting HAVE_PYTHON3_MAKO... 0
Setting SHA256SUM... sha256sum
Setting FUZZING... 0
Setting RUST... 1
Setting PYTHON... python3
Setting SED... sed
*** We need a libsodium >= 1.0.4 (released 2015-06-11).
Error: Process completed with exit code 1.
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-24 16:55:46 +10:30
wqxoxo
b906507968 Fix BOLT11 annotation loss after sendonion failure
Fixes #6978 where bolt11 annotations were lost when sendonion failed early and payment was retried.

When the sendonion RPC failed before the payment was saved to the database, the invstring_used flag remained true, so retry attempts omitted the bolt11 parameter. A successful retry was then saved to the DB without the bolt11 annotation.

Move invstring_used flag setting from payment_createonion_success to payment_sendonion_success. This ensures the flag is only set after sendonion actually succeeds. The bolt11 will be sent with every sendonion attempt until the first successful one, accepting the minor redundancy for cleaner state management.
Changelog-Fixed: Plugins: `listpays` can be missing the bolt11 information in some cases where `pay` is used.
2025-11-24 14:32:24 +10:30
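Note on b906507968: a toy model of the flag move described above, not the plugin's actual code. It assumes the bolt11 argument is chosen from the flag just before each sendonion call; `run_attempts`, `flag_set_at` and the placeholder invoice string are hypothetical.

```python
BOLT11 = "lnbc1example"  # placeholder, not a real invoice

def run_attempts(outcomes, flag_set_at):
    """Return the bolt11 argument passed on each sendonion attempt.

    outcomes: list of booleans, True where sendonion succeeds.
    flag_set_at: "createonion" (old behaviour) or "sendonion" (new behaviour).
    """
    invstring_used = False
    sent = []
    for sendonion_ok in outcomes:
        # The bolt11 is only attached while the flag is still clear.
        sent.append(None if invstring_used else BOLT11)
        if flag_set_at == "createonion":
            # Old: flag set before we know whether sendonion worked.
            invstring_used = True
        elif sendonion_ok:
            # New: flag set only once sendonion has actually succeeded.
            invstring_used = True
    return sent

old = run_attempts([False, True], "createonion")
new = run_attempts([False, True], "sendonion")
```

In the old ordering the successful retry carries no bolt11, which is the lost annotation; in the new ordering every attempt up to and including the first success carries it.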
Rusty Russell
ea0b8040c2 doc: include delnetworkevent in generated documentation, and grpc.
Also added missing "added" annotation.  This meant that I had to manually
change contrib/msggen/msggen/patch.py to insert that added notation where it
was missing from .msggen.json.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-None: introduced this release.
2025-11-24 14:31:02 +10:30
daywalker90
b4ef5d9a8e msggen: fix primitive serialization for special names
Changelog-None
2025-11-24 14:30:29 +10:30
Madeline Paech
5166fd55bb release candidate PR for 25.12 with Shahana's Makefile update
Changelog-None
2025-11-24 02:50:41 +00:00
ShahanaFarooqui
147ffecc18 make: Remove printing the version from Makefile
Changelog-None: Fixes error from `tools/check-release.sh`
2025-11-24 13:15:52 +10:30
Rusty Russell
10b10eb981 CHANGELOG.md: fix header format for rc1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-21 15:00:30 +10:30
Madeline Paech
f16b198cdc change log for 25.12rc1
2025-11-21 14:32:47 +10:30
daywalker90
d125b3c720 msggen: add missing methods from v25.12
Changelog-None
2025-11-21 13:51:28 +10:30
daywalker90
ab73388902 msggen: add missing methods from v25.09
2025-11-21 13:51:28 +10:30
Peter Neuroth
719fb2ce52 plugin: change method name of lsps-jitchannel
The original method name was lsps-lsps2-invoice but I somehow messed it
up and renamed during a rebase.

Changelog-Changed: lsps-jitchannel is now lsps-lsps2-invoice

Signed-off-by: Peter Neuroth <pet.v.ne@gmail.com>
2025-11-21 13:48:29 +10:30
Rusty Russell
87324103de lightningd: print last method we called if we abort processing loop.
We are seeing this in the CI logs, eg tests/test_connection.py::test_reconnect_sender_add1:

   lightningd-1 2025-11-17T05:48:00.665Z DEBUG   jsonrpc#84: Pausing parsing after 1 requests

followed by:

   lightningd-1 2025-11-17T05:48:02.068Z **BROKEN** 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: wake delay for WIRE_CHANNEL_REESTABLISH: 8512msec

So, what is consuming lightningd for 8 or so seconds?

This message helped diagnose that the issue was dev-memleak: fixed in a different branch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
47ab467bc1 db: don't actually create an explicit transaction for read commands.
Since we are the only writer, we don't need one.

Name (time in s)                               Min     Max    Mean  StdDev  Median
sqlite: test_spam_listcommands  (before)    2.1193  2.4524  2.2343  0.1341  2.2229
sqlite: test_spam_listcommands  (after)     2.0140  2.2349  2.1001  0.0893  2.0644
Postgres: test_spam_listcommands (before)   6.5572  6.8440  6.7067  0.1032  6.6967
Postgres: test_spam_listcommands (after)    4.4237  5.0024  4.6495  0.2278  4.6717

A nice 31% speedup!

Changelog-Changed: Postgres: significant speedup on read-only operations (e.g. 30% on empty SELECTs)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
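Note on 47ab467bc1: the quoted speedup can be checked from the Mean column of the table in that message. The helper name is ours; the numbers are copied from the commit.

```python
def speedup_percent(before: float, after: float) -> float:
    """Relative reduction in mean runtime, as a percentage."""
    return (before - after) / before * 100.0

# Mean times (seconds) for test_spam_listcommands, from the commit message.
postgres = speedup_percent(6.7067, 4.6495)
sqlite = speedup_percent(2.2343, 2.1001)

print(f"Postgres: {postgres:.1f}%  sqlite: {sqlite:.1f}%")
```

The Postgres figure is the one behind the "31%" remark; the sqlite change is much smaller.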
Rusty Russell
b4f17508b1 libplugin: add spamlistcommand
This hammers lightningd with `listinvoices` commands.

	$ VALGRIND=0 TEST_DB_PROVIDER=postgres eatmydata uv run pytest -v tests/benchmark.py::test_spam_listcommands

sqlite3:

   test_spam_listcommands     2.1193  2.4524  2.2343  0.1341  2.2229  0.1709       1;0  0.4476       5           1

PostgreSQL:

   test_spam_listcommands     6.5572  6.8440  6.7067  0.1032  6.6967  0.1063       2;0  0.1491       5           1

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
ce425b465c pytest: move test_coinmoves.py::test_generate_coinmoves and test_plugin.py::test_spam_commands to benchmark.py
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
5fd95b3d22 tests/benchmark.py: tune nodes a little.
Drop the log level, don't do extra memory checks, don't dump I/O.  These are not
realistic for testing non-development nodes.

Here's the comparison, using:
	VALGRIND=0 eatmydata uv run pytest -v --benchmark-compare=0001_baseline tests/benchmark.py

Name (time in us)                                       Min                       Max                      Mean                  StdDev                    Median                     IQR            Outliers         OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_invoice (NOW)                                 414.9430 (1.0)         39,115.6150 (12.35)          834.7296 (1.0)        2,274.1198 (6.59)           611.7745 (1.0)          162.0230 (1.0)          1;33  1,197.9927 (1.0)         290           1
test_invoice (0001_baselin)                        951.9740 (2.29)         3,166.4061 (1.0)          1,366.7944 (1.64)         345.1460 (1.0)          1,328.6110 (2.17)         339.3517 (2.09)        48;15    731.6389 (0.61)        221           1

test_pay (NOW)                                  36,339.2329 (87.58)       69,477.8530 (21.94)       51,719.9459 (61.96)      8,033.4262 (23.28)       52,639.5601 (86.04)      9,590.1425 (59.19)         6;0     19.3349 (0.02)         17           1
test_pay (0001_baselin)                         61,741.5591 (148.80)     108,801.6961 (34.36)       88,284.6752 (105.76)    15,875.4417 (46.00)       96,006.0760 (156.93)    27,500.9771 (169.74)        6;0     11.3270 (0.01)         13           1

test_single_payment (NOW)                       46,721.4010 (112.60)      66,027.6250 (20.85)       56,699.4597 (67.93)      5,829.7234 (16.89)       54,659.9385 (89.35)      9,810.9820 (60.55)         6;0     17.6369 (0.01)         14           1
test_single_payment (0001_baselin)              52,215.3670 (125.84)     109,608.0400 (34.62)       74,521.8032 (89.28)     16,175.6833 (46.87)       72,881.5976 (119.13)    17,668.8581 (109.05)        4;1     13.4189 (0.01)         12           1

test_forward_payment (NOW)                     108,338.2401 (261.09)     115,570.7800 (36.50)      111,353.7021 (133.40)     2,483.2338 (7.19)       111,981.6790 (183.04)     3,360.6182 (20.74)         3;0      8.9804 (0.01)         11           1
test_forward_payment (0001_baselin)            108,917.7490 (262.49)     168,348.2911 (53.17)      140,321.5990 (168.10)    22,375.2216 (64.83)      143,746.4900 (234.97)    36,363.4459 (224.43)        3;0      7.1265 (0.01)          7           1

test_start (NOW)                               299,278.4000 (721.25)     330,340.2610 (104.33)     314,121.8292 (376.32)    11,385.4700 (32.99)      314,603.4899 (514.25)    13,876.4871 (85.65)         2;0      3.1835 (0.00)          5           1
test_start (0001_baselin)                      305,928.9111 (737.28)     575,270.0820 (181.68)     419,496.8460 (502.55)   138,248.1937 (400.55)     334,207.0500 (546.29)   254,339.0035 (>1000.0)       2;0      2.3838 (0.00)          5           1

test_long_forward_payment (NOW)              1,088,077.8680 (>1000.0)  1,131,035.0260 (357.20)   1,108,896.7970 (>1000.0)   20,494.1195 (59.38)    1,098,544.8329 (>1000.0)   36,904.4899 (227.77)        3;0      0.9018 (0.00)          5           1
test_long_forward_payment (0001_baselin)     1,282,326.5721 (>1000.0)  1,450,350.8301 (458.04)   1,369,618.5776 (>1000.0)   73,432.8716 (212.76)   1,380,547.3910 (>1000.0)  132,647.3573 (818.69)        2;0      0.7301 (0.00)          5           1

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
19f0b04a3e pyln-testing: don't assume we're doing debug logging for fundwallet and line_graph helpers.
We want to use log-level info for benchmarking, for example.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
f49818c08c Python: update pyproject.toml so we can run tests/benchmark.py.
I had forgotten this file existed, but it needs tqdm and pytest-benchmark, so add those dev
requirements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
bfbee055ea db: don't start transactions unless we really need to.
We always start a transaction before processing, but there are cases where
we don't need to.  Switch to doing it on-demand.

This doesn't make a big difference for sqlite3, but it can for Postgres because
of the latency: 12% or so.  Every bit helps!

30 runs, min-max(mean+/-stddev):

	Postgres before:  8.842773-9.769030(9.19531+/-0.21)
	Postgres after: 8.007967-8.321856(8.14172+/-0.066)

	sqlite3 before: 7.486042-8.371831(8.15544+/-0.19)
	sqlite3 after: 7.973411-8.576135(8.3025+/-0.12)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
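Note on bfbee055ea: a minimal sketch of on-demand transactions, using Python's sqlite3 module rather than lightningd's db layer. The `LazyTransaction` class and the table are hypothetical; the point is only that BEGIN is deferred until the first write.

```python
import sqlite3

class LazyTransaction:
    """Begin a transaction only when the first write arrives."""

    def __init__(self, conn):
        self.conn = conn
        self.active = False
        self.begin_count = 0

    def read(self, sql, params=()):
        # Reads run without forcing a transaction open.
        return self.conn.execute(sql, params).fetchall()

    def write(self, sql, params=()):
        if not self.active:
            self.conn.execute("BEGIN")
            self.active = True
            self.begin_count += 1
        self.conn.execute(sql, params)

    def finish(self):
        # Commit only if something actually opened a transaction.
        if self.active:
            self.conn.execute("COMMIT")
            self.active = False

# isolation_level=None leaves BEGIN/COMMIT entirely under our control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE invoices (label TEXT)")
tx = LazyTransaction(conn)

tx.read("SELECT label FROM invoices")
tx.finish()
begins_after_read_only = tx.begin_count

tx.write("INSERT INTO invoices VALUES (?)", ("first",))
tx.write("INSERT INTO invoices VALUES (?)", ("second",))
tx.finish()
rows = tx.read("SELECT label FROM invoices ORDER BY rowid")
```

A read-only command never issues BEGIN or COMMIT, which is where a networked database such as Postgres would save round trips.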
Rusty Russell
b03562b980 pytest: test for 1M JSONRPC calls which don't need transactions.
To measure the improvement (if any) if we don't actually create empty transactions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
6865fe380d pytest: increase test_generate_coinmoves to 5M entries.
This is slow, but will make sure we find out if we add latency spikes in future.

tests/test_coinmoves.py::test_generate_coinmoves (5,000,000, sqlite3):
	Time (from start to end of l2 node):	 223 seconds
	Latency min/median/max:			 0.0023 / 0.0033 / 0.113 seconds

tests/test_coinmoves.py::test_generate_coinmoves (5,000,000, Postgres):
	Time (from start to end of l2 node):	 470 seconds
	Latency min/median/max:			 0.0024 / 0.0098 / 0.124 seconds

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: lightningd: multiple significant speedups for large nodes, especially preventing "freezes" under exceptionally high load.
2025-11-20 16:30:50 +10:30
Rusty Russell
a877e285ef sql: limit how many chainmoves/channelmoves entries we ask for at once.
This avoids latency spikes when we ask lightningd to give us 2M entries.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 88 seconds (was 95)
	Worst latency:				 0.028 seconds **WAS 4.5**

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
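Note on a877e285ef: a sketch of fetching a large listing in bounded batches instead of one giant request. The `fetch(start, limit)` callable, the 1-based index and the batch size are assumptions for illustration, not the sql plugin's actual interface.

```python
def fetch_in_batches(fetch, limit):
    """Yield successive batches until the source is exhausted.

    fetch(start, limit) is a hypothetical callable returning up to
    `limit` entries whose index is >= start.
    """
    start = 1  # assumed 1-based index
    while True:
        batch = fetch(start, limit)
        if not batch:
            break
        yield batch
        start += len(batch)

# Stand-in data source with 25 entries, indexed 1..25.
ENTRIES = list(range(1, 26))

def fake_list_moves(start, limit):
    return [e for e in ENTRIES if start <= e < start + limit]

batch_sizes = [len(b) for b in fetch_in_batches(fake_list_moves, 10)]
```

Each request stays small, so no single call can hold up the server for long.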
Rusty Russell
873d4102c8 bookkeeper: restore limit on asking for all channelmoves at once.
Now we've found all the issues, the latency spike (4 seconds on my laptop)
for querying 2M elements remains.

Restore the limited sampling which we reverted, but make it 10,000 now.

This doesn't help our worst-case latency, because sql still asks for all 2M entries on
first access.  We address that next.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
98bd95cb2e lightningd: optimize find_cmd.
We have a reasonable number of commands now, and we *already* keep a
strmap for the usage strings.  So simply keep the usage and the command
in the map, and skip the array.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 95 seconds (was 102)
	Worst latency:				 4.5 seconds

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, Postgres):
	Time (from start to end of l2 node):	 231 seconds
	Worst latency:				 4.8 seconds

Note the values compare against 25.09.2 (Postgres):

	sqlite3:
	Time (from start to end of l2 node):	 403 seconds

	Postgres:
	Time (from start to end of l2 node):	 671 seconds

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
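Note on 98bd95cb2e: a sketch of the idea of keeping the command and its usage string together in one map, so lookup no longer scans an array. A Python dict stands in for the strmap; `Command`, `CommandTable` and the sample entry are hypothetical.

```python
from typing import Callable, NamedTuple, Optional

class Command(NamedTuple):
    name: str
    handler: Callable[[], str]

def find_cmd_linear(commands, name) -> Optional[Command]:
    """Array scan: cost grows with the number of registered commands."""
    for cmd in commands:
        if cmd.name == name:
            return cmd
    return None

class CommandTable:
    """Single map from name to (command, usage)."""

    def __init__(self):
        self._by_name = {}

    def add(self, cmd: Command, usage: str) -> bool:
        if cmd.name in self._by_name:
            return False  # duplicate name
        self._by_name[cmd.name] = (cmd, usage)
        return True

    def find_cmd(self, name) -> Optional[Command]:
        entry = self._by_name.get(name)
        return entry[0] if entry else None

    def usage(self, name) -> Optional[str]:
        entry = self._by_name.get(name)
        return entry[1] if entry else None

table = CommandTable()
listinvoices = Command("listinvoices", lambda: "ok")
table.add(listinvoices, "[label]")  # usage text is illustrative only
```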
Rusty Russell
7910ee44ba sql: use wait RPC so we don't have to check listchannelmoves/listchainmoves each time.
tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 102 seconds **WAS 126**
	Worst latency:				 4.5 seconds **WAS 5.1**

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
bb7db3926e sql: if we use dev-sqlfilename, don't bother syncing it to disk.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
b63034bd37 plugins/sql: use modern data style, not globals.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
94d582f24e lightningd: don't process more than 100 commands from a plugin at once.
Now that ccan/io rotates through callbacks, we can call io_always() to
yield.

We're now fast enough that this doesn't have any effect on this test,
but it's still good to have.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
b51e2827cd lightningd: don't process more than 100 commands from a JSONRPC at once.
Now that ccan/io rotates through callbacks, we can call io_always() to yield.

Though it doesn't fire on our benchmark, it's a good thing to do.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
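Note on 94d582f24e and b51e2827cd: a sketch of capping the work done per wakeup and yielding when more remains. The limit of 100 is from the commit subjects; `process_batch` and its return convention are hypothetical, and the outer loop stands in for the event loop calling us again.

```python
from collections import deque

MAX_BATCH = 100  # per-wakeup limit named in the commit subjects

def process_batch(pending: deque, handle) -> bool:
    """Handle up to MAX_BATCH commands.

    Returns True if commands remain, meaning the caller should
    reschedule itself and let other connections run first.
    """
    for _ in range(MAX_BATCH):
        if not pending:
            return False
        handle(pending.popleft())
    return bool(pending)

pending = deque(range(250))
handled = []
rounds = []
while True:
    before = len(handled)
    more = process_batch(pending, handled.append)
    rounds.append(len(handled) - before)
    if not more:
        break
```

Between rounds, a real loop would service other file descriptors before coming back.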
Rusty Russell
5fc9e5a4e3 ccan: update to get io_loop fairness.
This rotates through fds explicitly, to avoid unfairness.
This doesn't really make a difference until we start using it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
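Note on 5fc9e5a4e3: a toy model of why rotating the starting fd matters. It assumes every fd is always ready and only one can be served per loop iteration; this is an illustration of the fairness idea, not how ccan/io is implemented.

```python
from collections import Counter

def run_loop(num_fds: int, iterations: int, rotate: bool) -> Counter:
    """Count how often each fd is served under the toy model."""
    served = Counter()
    next_start = 0
    for _ in range(iterations):
        # Serve whichever fd we look at first this iteration.
        fd = next_start % num_fds
        served[fd] += 1
        if rotate:
            next_start += 1  # begin at the following fd next time
    return served

fixed = run_loop(3, 9, rotate=False)
rotating = run_loop(3, 9, rotate=True)
```

Without rotation the first fd takes every slot; with rotation the slots are shared evenly.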
Rusty Russell
54d4bf117f common: optimize json parsing.
We would keep parsing if we were out of tokens, even if we had actually
finished one object!

These are comparisons against the "xpay: use filtering on rpc_command
so we only get called on "pay"." commit, not the disastrous previous one!

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 126 seconds (was 135)
	Worst latency:				 5.1 seconds **WAS 12.1**

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
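Note on 54d4bf117f: a sketch of the "stop as soon as one object is complete" idea, using Python's standard json module instead of the C parser. `take_one` is a hypothetical helper; it returns the first complete JSON value in a buffer and leaves any trailing partial request untouched.

```python
import json

_decoder = json.JSONDecoder()

def take_one(buffer: str):
    """Return (obj, rest) if buffer starts with a complete JSON value,
    otherwise (None, buffer) so the caller can wait for more input."""
    stripped = buffer.lstrip()
    if not stripped:
        return None, buffer
    try:
        obj, end = _decoder.raw_decode(stripped)
    except json.JSONDecodeError:
        return None, buffer
    return obj, stripped[end:]

# One complete request followed by the start of another.
buf = '{"id":1,"method":"a"}{"id":2,"me'
first, rest = take_one(buf)
second, still_pending = take_one(rest)
```

The first request can be dispatched immediately even though the buffer holds an unfinished second one.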
Rusty Russell
d2a6091149 common: increase jsonrpc_io buffer size temporarily to aggravate perf issues.
A client can do this by sending a large request, so this allows us to see what
happens if they do that, even though 1MB (2MB buffer) is more than we need.

This drives our performance through the floor: see next patch which gets
us back on track.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 271 seconds **WAS 135**
	Worst latency:				 105 seconds **WAS 12.1**

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
aff1d6b97f commando, chanbackup: use custommsg hooks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
213cbba5bf lightningd: allow filtering on custommsg hook too.
Changelog-Added: Plugins: "filters" can be specified on the `custommsg` hook to limit what message types the hook will be called for.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
81f0d0540b pyln-client: support hook filters.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: pyln-client: optional filters can be given when hooks are registered (for supported hooks)
2025-11-20 16:30:50 +10:30
Rusty Russell
9961f6bf0e xpay: use filtering on rpc_command so we only get called on "pay".
tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 135 seconds **WAS 227**
	Worst latency:				 12.1 seconds **WAS 62.4**

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
1d4a1cdd8a libplugin: don't wait for clean_tmpctx() to free requests as we process them.
xpay is relying on the destructor to send another request.  This means
that it doesn't actually submit the request until *next time* we wake.

This has been in xpay from the start, but it is not noticeable until
xpay stops subscribing to every command on the rpc_command hook.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
ebe5f2e68f libplugin: allow plugins to register optional filters for each hook they want.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
d9d82ac5bd lightningd: add support for filters on "rpc_command" hook.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Plugins: the `rpc_command` hook can now specify a "filter" on what commands it is interested in.
2025-11-20 16:30:50 +10:30
Rusty Russell
d76a9050ad lightningd: support "filters" in plugins manifest to restrict when hooks are called.
We're going to use this on the "rpc_command" hook, to allow xpay to specify that it
only wants to be called on "pay" commands.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
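Note on d76a9050ad: a sketch of hook filtering as described, where a subscriber is only called for the keys it listed. `HookRegistry` and its methods are hypothetical and do not reflect the actual manifest format or the pyln-client API.

```python
class HookRegistry:
    """Dispatch a hook to subscribers, honouring optional filters."""

    def __init__(self):
        self._subs = []  # list of (callback, filters or None)

    def register(self, callback, filters=None):
        # filters=None means "call me for everything".
        frozen = None if filters is None else frozenset(filters)
        self._subs.append((callback, frozen))

    def dispatch(self, key, payload) -> int:
        """Call matching subscribers; return how many were called."""
        called = 0
        for callback, filters in self._subs:
            if filters is not None and key not in filters:
                continue  # subscriber is not interested in this key
            callback(payload)
            called += 1
        return called

seen_by_xpay = []
seen_by_logger = []

registry = HookRegistry()
registry.register(seen_by_xpay.append, filters=["pay"])
registry.register(seen_by_logger.append)  # unfiltered subscriber

calls_for_list = registry.dispatch("listinvoices", {"method": "listinvoices"})
calls_for_pay = registry.dispatch("pay", {"method": "pay"})
```

The filtered subscriber is skipped for every command except the one it asked for, which removes a round trip per unrelated command.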
Rusty Russell
9e04d044a2 JSONRPC: use a bigger default buffer.
This potentially saves us some reads (not measurably though), at cost
of less fairness.  It's important to measure though, because a single
large request will increase buffer size for successive requests, so we
can see this pattern in real usage.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	227 seconds (was 239)
	Worst latency:				62.4 seconds (was 56.9)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
8707b7312a lightningd: handle large numbers of command outputs gracefully.
Profiling shows us spending all our time in tal_arr_remove when dealing
with a giant number of output streams.  This applies both for RPC output
and plugin output.

Use linked list instead.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	239 seconds **WAS 518**
	Worst latency:				56.9 seconds **WAS 353**

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
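Note on 8707b7312a: a sketch of why removing from the front of an array is costly with many pending outputs. Python's `list.pop(0)` shifts every remaining element, much like removing from a contiguous array; `collections.deque` stands in here for the linked list, giving constant-time removal at the head. Function names are ours.

```python
from collections import deque

def drain_array(outputs):
    """Remove from the front of an array: each removal shifts the rest."""
    arr = list(outputs)
    done = []
    while arr:
        done.append(arr.pop(0))  # O(n) per removal, O(n^2) overall
    return done

def drain_linked(outputs):
    """Remove from the head of a queue: constant time per removal."""
    queue = deque(outputs)
    done = []
    while queue:
        done.append(queue.popleft())  # O(1) per removal
    return done

outputs = [f"reply-{i}" for i in range(1000)]
same_order = drain_array(outputs) == drain_linked(outputs)
```

Both produce the same order; only the cost per removal differs, which is what dominates at millions of entries.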
Rusty Russell
6006467824 pytest: increase test_generate_coinmoves to 2M entries.
Now we've rid ourselves of the worst offenders, we can make this a real
stress test.  We remove plugin io saving and low-level logging, to avoid
benchmarking testing artifacts.

Here are the results:

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	518 seconds
	Worst latency:				353 seconds

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, Postgres):
	Time (from start to end of l2 node):	 417 seconds
	Worst latency:				 96.6 seconds

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
cbd52d49af common: avoid allocations for small numbers of traces.
If we only have 8 or fewer spans at once (as is the normal case), don't
do allocation, which might interfere with tracing.

This doesn't change our test_generate_coinmoves() benchmark.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
Rusty Russell
7f55a8ea1a common: remove tracing exponential behaviour from large numbers of requests.
If we have USDT compiled in, scanning the array of spans becomes
prohibitive if we have really large numbers of requests.  In the
bookkeeper code, when catching up with 1.6M channel events, this
became clear in profiling.

Use a hash table instead.

Before:
tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
	Time (from start to end of l2 node):	269 seconds (vs 14 with HAVE_USDT=0)
	Worst latency:				4.0 seconds

After:
tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
	Time (from start to end of l2 node):	14 seconds
	Worst latency:				4.3 seconds

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2025-11-20 16:30:50 +10:30
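Note on 7f55a8ea1a: a toy count of the work involved in finding spans by scanning an array versus a keyed lookup. It assumes spans are opened in bulk and then closed in opening order, with freed slots still being scanned; the functions are hypothetical and only illustrate how the scan cost grows much faster than the number of spans.

```python
def linear_scan_comparisons(n: int) -> int:
    """Open n spans, then close each by scanning the array for its key."""
    spans = list(range(n))
    comparisons = 0
    for key in range(n):
        for i, slot in enumerate(spans):
            comparisons += 1
            if slot == key:
                spans[i] = None  # slot freed, but later scans still pass it
                break
    return comparisons

def hashed_lookups(n: int) -> int:
    """Same workload with a dict: one lookup per close."""
    spans = {key: f"span-{key}" for key in range(n)}
    lookups = 0
    for key in range(n):
        spans.pop(key)
        lookups += 1
    return lookups

scan_cost = linear_scan_comparisons(1000)
hash_cost = hashed_lookups(1000)
```

Under these assumptions the scan performs n(n+1)/2 comparisons while the keyed version performs n lookups.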