Commits · bb-10.5-monty · nexedi / MariaDB

01 Oct, 2024 3 commits

MDEV-34533 asan error about stack overflow when writing record in Aria · 5f3eb8f0

Monty authored Oct 01, 2024

The problem was that when using clang + ASan, we did not get a correct value
for the thread stack, as some local variables are not allocated on the
normal stack.

It looks like, for example, clang 18.1.3, when compiling with
-O2 -fsanitize=address, puts local variables and memory allocated by
alloca() in areas other than the stack.

The following gdb session shows the issue:

Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection
    (connect=0x5080000027b8,
    put_in_cache=<optimized out>) at sql/sql_connect.cc:1399

THD *thd;
1399      thd->thread_stack= (char*) &thd;
(gdb) p &thd
(THD **) 0x7fffedee7060
(gdb) p $sp
(void *) 0x7fffef4e7bc0

The address of thd is about 22 MiB (0x1600b60 bytes) away from the stack pointer.

(gdb) info reg
...
rsp            0x7fffef4e7bc0      0x7fffef4e7bc0
...
r13            0x7fffedee7060      140737185214560

r13 points to the address of thd, probably some kind of
"local stack" used by the sanitizer.

I have verified this with gdb on a recursive call that calls alloca()
in a loop. In this case all objects were stored in a local heap,
not on the stack.

To solve this issue in a portable way, I have added two functions:

my_get_stack_pointer() returns the address of the current stack pointer.
The code uses asm instructions for Intel 32/64 bit, PowerPC,
ARM 32/64 bit and SPARC 32/64 bit.
Supported compilers are gcc, clang and MSVC.
For MSVC 64 bit we use _AddressOfReturnAddress().

As a fallback for other compilers/architectures we use the address of a local
variable.
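A minimal sketch of the idea behind my_get_stack_pointer(), assuming GCC/clang extended inline asm on x86-64, i386 and AArch64, and the MSVC intrinsic _AddressOfReturnAddress() on x64. The name `sketch_get_stack_pointer` is hypothetical, it returns an integer rather than a pointer, and the real function also covers PowerPC, 32-bit ARM and SPARC.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>  // _AddressOfReturnAddress()
#endif

// Hypothetical name: a sketch of the approach, not MariaDB's my_get_stack_pointer().
static inline std::uintptr_t sketch_get_stack_pointer()
{
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
  std::uintptr_t sp;
  __asm__ volatile("movq %%rsp, %0" : "=r"(sp));  // read the real stack pointer
  return sp;
#elif (defined(__GNUC__) || defined(__clang__)) && defined(__i386__)
  std::uintptr_t sp;
  __asm__ volatile("movl %%esp, %0" : "=r"(sp));
  return sp;
#elif (defined(__GNUC__) || defined(__clang__)) && defined(__aarch64__)
  std::uintptr_t sp;
  __asm__ volatile("mov %0, sp" : "=r"(sp));
  return sp;
#elif defined(_MSC_VER) && defined(_M_X64)
  // MSVC x64 has no inline asm; the return-address slot lives on the real stack.
  return reinterpret_cast<std::uintptr_t>(_AddressOfReturnAddress());
#else
  // Fallback: address of a local. Under ASan this may point into a
  // sanitizer-managed area instead of the real stack (the problem above).
  volatile char local = 0;
  return reinterpret_cast<std::uintptr_t>(&local);
#endif
}
```

Reading the register directly is what makes the value independent of where the compiler, or a sanitizer, decides to place local variables.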

my_get_stack_bounds() returns the stack base address and the stack size,
using pthread_attr_getstack() or NtCurrentTeb(), with a fallback to the
address of a local variable and a user-provided stack size.
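A sketch of the POSIX path of my_get_stack_bounds(), assuming glibc (or musl), where the non-standard pthread_getattr_np() fills a pthread_attr_t for a running thread and pthread_attr_getstack() reads the lowest address and the size from it. The helper name is hypothetical, and the Windows path through NtCurrentTeb() is not shown.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1  // pthread_getattr_np() is a GNU extension
#endif
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: report the calling thread's stack as
// [*stack_low, *stack_low + *stack_size). The stack grows downward on the
// usual platforms, so the "base" of the stack is the high end of the range.
static bool sketch_get_stack_bounds(void **stack_low, std::size_t *stack_size)
{
  pthread_attr_t attr;
  if (pthread_getattr_np(pthread_self(), &attr) != 0)
    return false;  // caller would fall back to an estimate
  int rc = pthread_attr_getstack(&attr, stack_low, stack_size);
  pthread_attr_destroy(&attr);
  return rc == 0;
}
```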

Server changes are:

- Moving the setting of thread_stack to THD::store_globals(), using
  my_get_stack_bounds().
- Removing the setting of thd->thread_stack, except in functions that
  allocate a lot on the stack before calling store_globals().  When
  using estimates for the stack start, we reduce stack_size by
  MY_STACK_SAFE_MARGIN (8192) to take into account the stack used
  before calling store_globals().
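When no platform query is available, the stack start has to be estimated from the current frame. A sketch of that fallback with the safety margin described above; the 8192-byte constant mirrors MY_STACK_SAFE_MARGIN, while `StackEstimate`, `estimate_stack` and `stack_ok` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the MY_STACK_SAFE_MARGIN value quoted above (name is hypothetical).
constexpr std::size_t kStackSafeMargin = 8192;

struct StackEstimate {
  std::uintptr_t start;     // estimated top of stack (stack grows downward)
  std::size_t usable_size;  // bytes allowed below 'start'
};

// 'here' is the address of a local in the current frame; 'configured_size'
// is the user-provided thread stack size. Frames that ran before this point
// already used part of the stack, so the size is reduced by the margin.
StackEstimate estimate_stack(std::uintptr_t here, std::size_t configured_size)
{
  std::size_t usable = configured_size > kStackSafeMargin
                           ? configured_size - kStackSafeMargin
                           : 0;
  return {here, usable};
}

// True if stack pointer 'sp' is still inside the estimated usable region.
bool stack_ok(const StackEstimate &e, std::uintptr_t sp)
{
  return sp <= e.start && e.start - sp < e.usable_size;
}
```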

I also added a unittest, stack_allocation-t, to verify the new code.

5f3eb8f0

MDEV-29537 Creation of view with UNION and SELECT ... FOR UPDATE in definition is failed with error · 8d810e94
Oleksandr Byelkin authored Sep 20, 2024
```
lock_type is written in the last SELECT of the unit even if it is parsed last,
so it should be printed last, from the last SELECT of the unit.
```
8d810e94

MDEV-34392 Inplace algorithm violates the foreign key constraint · cc810e64

Thirunarayanan Balathandayuthapani authored Sep 30, 2024

Don't allow changing the referencing key column from NULL to NOT NULL
when

 1) Foreign key constraint type is ON UPDATE SET NULL
 2) Foreign key constraint type is ON DELETE SET NULL
 3) Foreign key constraint type is ON UPDATE CASCADE and the referenced
 column is declared as NULL

Don't allow changing the referenced key column from NOT NULL to NULL
when the foreign key constraint type is ON UPDATE CASCADE
and the referencing key columns don't allow NULL values
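The nullability rules above can be written as two predicates. This is a hypothetical model for illustration (the enum, struct and function names are not InnoDB's); it only encodes the conditions listed in the commit message.

```cpp
#include <cassert>

enum class FkAction { Restrict, Cascade, SetNull, NoAction };

struct ForeignKey {
  FkAction on_update;
  FkAction on_delete;
  bool referencing_nullable;  // child (referencing) column allows NULL
  bool referenced_nullable;   // parent (referenced) column allows NULL
};

// May the referencing column be changed from NULL to NOT NULL?
bool may_make_referencing_not_null(const ForeignKey &fk)
{
  if (fk.on_update == FkAction::SetNull || fk.on_delete == FkAction::SetNull)
    return false;  // rules 1) and 2): SET NULL needs a nullable child column
  if (fk.on_update == FkAction::Cascade && fk.referenced_nullable)
    return false;  // rule 3): a cascaded NULL could not be stored
  return true;
}

// May the referenced column be changed from NOT NULL to NULL?
bool may_make_referenced_nullable(const ForeignKey &fk)
{
  return !(fk.on_update == FkAction::Cascade && !fk.referencing_nullable);
}
```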

get_foreign_key_info(): InnoDB sends the information about
nullability of the foreign key fields and referenced key fields.

fk_check_column_changes(): Enforce the above rules for the COPY
algorithm

innobase_check_foreign_drop_col(): Checks whether the dropped
column exists in an existing foreign key relation

innobase_check_foreign_low(): Enforce the above rules for the
INPLACE algorithm

dict_foreign_t::check_fk_constraint_valid(): Used by the
CREATE TABLE statement to check nullability for the foreign
key relation.

cc810e64

30 Sep, 2024 7 commits

sql/handler: referenced_by_foreign_key() returns bool · 45298b73

Max Kellermann authored Sep 20, 2024

The method was declared to return an unsigned integer, but it is
really a boolean (and used as such by all callers).

A secondary change is the addition of "const" and "noexcept" to this
method.

In ha_mroonga.cpp, I also added "inline" to the two helper methods of
referenced_by_foreign_key().  This allows the compiler to flatten the
method.
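A sketch of what the new signature expresses, using a hypothetical stripped-down class rather than the real handler declaration: a bool result, a const method that does not modify the handler, and noexcept.

```cpp
#include <cassert>

// Hypothetical stand-in for a storage engine handler.
class sketch_handler {
public:
  explicit sketch_handler(unsigned referencing_fk_count)
    : referencing_fk_count_(referencing_fk_count) {}
  virtual ~sketch_handler() = default;

  // Yes/no answer, callable on a const object, and cannot throw.
  virtual bool referenced_by_foreign_key() const noexcept
  {
    return referencing_fk_count_ != 0;
  }

private:
  unsigned referencing_fk_count_;
};
```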

45298b73

MDEV-33373 part 2: Unexpected ER_FILE_NOT_FOUND upon reading from logging... · b88f1267

Sergei Golubchik authored Sep 19, 2024

MDEV-33373 part 2: Unexpected ER_FILE_NOT_FOUND upon reading from logging table after crash recovery

The CSV engine should set my_errno if it uses it.

b88f1267

MDEV-33373 part 1: Unexpected ER_FILE_NOT_FOUND upon reading from logging... · 20f57a85

Oleksandr Byelkin authored Sep 19, 2024

MDEV-33373 part 1: Unexpected ER_FILE_NOT_FOUND upon reading from logging table after crash recovery

We have found that my_errno can be "passed" to the next command in some cases.

It is practically impossible to check/fix all cases of my_errno in the server,
plugins and engines so we will reset it as we reset other errors.

The test case will be fixed by the CSV engine fix, so it will be added with it
(see part 2).
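A minimal sketch of the "reset it like other errors" idea, assuming a thread-local error variable and a per-command dispatcher; `sketch_my_errno`, `Diagnostics` and `dispatch_command` are hypothetical names, not the server's.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the per-thread my_errno.
thread_local int sketch_my_errno = 0;

// Hypothetical per-connection error state.
struct Diagnostics {
  std::string last_error;
  void reset() { last_error.clear(); }
};

// Like other error state, the per-thread errno is cleared before each
// command, so a stale value left by an earlier command cannot be misread
// as an error of this one.
int dispatch_command(Diagnostics &da, bool (*run)(Diagnostics &))
{
  da.reset();
  sketch_my_errno = 0;
  return run(da) ? 0 : sketch_my_errno;
}
```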

20f57a85

MDEV-34589 Do not execute before queries in spider_db_mbase::rollback() · 282b92f0
Yuchen Pei authored Jul 18, 2024
```
Rollback is not supposed to fail. This prevents false failures in
spider rollback.
```
282b92f0
MDEV-34636 Spider: reset wide_handler->trx in two occasions · 42735c55
Yuchen Pei authored Aug 02, 2024
```
ha_spider::update_create_info()
ha_spider::append_lock_tables_list()
```
42735c55
MDEV-34636 Remove implementation of ha-spider::extra() with MERGE flags · f43ea935
Yuchen Pei authored Jul 25, 2024

f43ea935
MDEV-34828 Remove some obsolete cmake code related to the removed spider handlersocket support · 69874ee9
Yuchen Pei authored Sep 30, 2024
```
A fixup of MDEV-26858
```
69874ee9

29 Sep, 2024 1 commit
- MDEV-30307 addendum: support for compilation in release mode · 95d285fb
  Julius Goryavsky authored Sep 30, 2024
  
  95d285fb
27 Sep, 2024 3 commits

MDEV-30307 KILL command inside a transaction causes problem for galera replication · cf0c3ec2

sjaakola authored Dec 28, 2022

Added a new test scenario in the galera.galera_bf_kill
test to make the issue surface. The test scenario has
a multi-statement transaction containing a KILL command.
When the KILL is submitted, another transaction is
replicated, which causes a BF abort for the KILL command
processing. Handling the BF abort rollback while executing
the KILL command causes the node to hang in this scenario.

The sql_kill() and sql_kill_user() functions now perform
an implicit commit before starting the KILL command
execution. Because of the implicit commit, the KILL execution
no longer happens inside a transaction context.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

cf0c3ec2

MDEV-34808 Update HeidiSQL to v12.8 · 78e640ea
Vladislav Vaintroub authored Sep 27, 2024

78e640ea
Windows : support Wix toolset 3.14 · 2c0b7ff2
Vladislav Vaintroub authored Sep 27, 2024
```
Chocolatey package manager installs this one.
```
2c0b7ff2

26 Sep, 2024 1 commit

ssl_cipher parameter cannot configure TLSv1.3 and TLSv1.2 ciphers at the same time · be164fc4

Tony Chen authored Sep 04, 2024

SSL_CTX_set_ciphersuites() sets the TLSv1.3 cipher suites.

SSL_CTX_set_cipher_list() sets the ciphers for TLSv1.2 and below.

The current TLS configuration logic will not perform SSL_CTX_set_cipher_list()
to configure TLSv1.2 ciphers if the call to SSL_CTX_set_ciphersuites() was
successful. The call to SSL_CTX_set_ciphersuites() is successful if any TLSv1.3
cipher suite is passed into `--ssl-cipher`.

This is a potential security vulnerability because users trying to restrict
the database to specific secure TLSv1.3 and TLSv1.2 ciphers would unknowingly
still have it support insecure TLSv1.2 ciphers.

For example:
If setting `--ssl_cipher=TLS_AES_128_GCM_SHA256:ECDHE-RSA-AES128-GCM-SHA256`,
the database would still support all possible TLSv1.2 ciphers rather than only
ECDHE-RSA-AES128-GCM-SHA256.

The solution is to execute both SSL_CTX_set_ciphersuites() and
SSL_CTX_set_cipher_list() even if the first call succeeds.

This allows the configuration of exactly which TLSv1.3 and TLSv1.2 ciphers to
support.

Note that there is one behavior change with this. When specifying only TLSv1.3
ciphers to `--ssl-cipher`, the database will not support any TLSv1.2 cipher.
However, this does not pose a security risk, and considering TLSv1.3 is the
modern protocol, this behavior should be fine.

All TLSv1.3 ciphers are still supported if only TLSv1.2 ciphers are specified
through `--ssl-cipher`.
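A toy model of the control-flow change, NOT OpenSSL: it assumes that OpenSSL-style TLSv1.3 suite names start with `TLS_` and that each stand-in call behaves as described above (the TLSv1.3 call succeeds only if the list names a TLSv1.3 suite; the TLSv1.2 call restricts TLSv1.2 to whatever non-TLSv1.3 names are listed). All names are hypothetical; the actual fix calls SSL_CTX_set_ciphersuites() and SSL_CTX_set_cipher_list().

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct TlsConfig {
  bool tls13_restricted = false;  // false: all default TLSv1.3 suites allowed
  std::vector<std::string> tls13;
  bool tls12_restricted = false;  // false: all default TLSv1.2 ciphers allowed
  std::vector<std::string> tls12;
};

// Pick the TLSv1.3 ("TLS_" prefix) or the other names from a ':' list.
static std::vector<std::string> pick(const std::string &list, bool want13)
{
  std::vector<std::string> out;
  std::stringstream ss(list);
  std::string name;
  while (std::getline(ss, name, ':'))
    if (!name.empty() && (name.rfind("TLS_", 0) == 0) == want13)
      out.push_back(name);
  return out;
}

// Stand-in for SSL_CTX_set_ciphersuites().
static bool set_ciphersuites(TlsConfig &c, const std::string &list)
{
  std::vector<std::string> v = pick(list, true);
  if (v.empty()) return false;
  c.tls13_restricted = true;
  c.tls13 = v;
  return true;
}

// Stand-in for SSL_CTX_set_cipher_list().
static bool set_cipher_list(TlsConfig &c, const std::string &list)
{
  c.tls12_restricted = true;
  c.tls12 = pick(list, false);
  return !c.tls12.empty();
}

// Old flow: TLSv1.2 is configured only if the TLSv1.3 call failed.
TlsConfig configure_old(const std::string &list)
{
  TlsConfig c;
  if (!set_ciphersuites(c, list))
    set_cipher_list(c, list);
  return c;
}

// New flow: always run both calls (error reporting omitted).
TlsConfig configure_new(const std::string &list)
{
  TlsConfig c;
  set_ciphersuites(c, list);
  set_cipher_list(c, list);
  return c;
}
```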

All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.

be164fc4

25 Sep, 2024 8 commits

MDEV-34822 pre-fix: Make wsrep_ready flag read lock-free · 9f61aa4f

Denis Protivensky authored Aug 16, 2024

It's read for every command execution, and during slave replication
for every applied event.

It is also planned to be used during write set applying, which means
almost every server thread would compete for the mutex covering
this variable, even though it rarely changes.
Converting wsrep_ready to an atomic removes this contention.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
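A sketch of the lock-free flag, assuming C++11 std::atomic; `sketch_wsrep_ready`, `set_ready` and `is_ready` are hypothetical names. Release/acquire ordering is a conservative choice here; relaxed loads may suffice if readers do not depend on other data published together with the flag.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for wsrep_ready: read on hot paths, written rarely.
static std::atomic<bool> sketch_wsrep_ready{false};

// Writer side: the rare state change.
void set_ready(bool ready)
{
  sketch_wsrep_ready.store(ready, std::memory_order_release);
}

// Reader side: no mutex, so concurrent readers never block each other.
bool is_ready()
{
  return sketch_wsrep_ready.load(std::memory_order_acquire);
}
```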

9f61aa4f

MDEV-32996 : galera.galera_var_ignore_apply_errors -> [ERROR] WSREP: Inconsistency detected · 024e9512

Jan Lindström authored Sep 03, 2024

Add wait_until_ready waits after wsrep_on is set to ON again, to
make sure that the node is ready for the next step before continuing.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

024e9512

MDEV-33035 : Galera test case MDEV-16509 unstable · 0ce5603b

Jan Lindström authored Sep 03, 2024

Stabilize the test by resetting DEBUG_SYNC and adding a wait_condition
for the expected table contents.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

0ce5603b

MDEV-34976 Server crash report broken if Galera is not loaded · b2429e20

Teemu Ollakka authored Sep 20, 2024

The crash report terminates prematurely when the Galera library is
not loaded.

As a fix, check whether the provider is loaded before shutting down
Galera connections.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

b2429e20

MDEV-31636 Memory leak in Sys_var_gtid_binlog_state::do_check() · f7c5182b

Dave Gosselin authored Sep 18, 2024

Move memory allocations performed during Sys_var_gtid_binlog_state::do_check
to Sys_var_gtid_binlog_state::global_update where they will be freed before
the latter method returns.
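A sketch of the allocation pattern, using a hypothetical variable handler rather than the real Sys_var_gtid_binlog_state: check() only validates and keeps nothing, and update() owns its temporary allocation, which is released before it returns. The validation rule is made up for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct SketchGtidStateVar {
  std::vector<std::string> current;  // the "global" value

  // Validation only: nothing is allocated and kept, so nothing can leak
  // if the statement is aborted before the update runs.
  static bool check(const std::string &value)
  {
    return !value.empty() && value.find(",,") == std::string::npos;
  }

  // The update does the allocation; the temporary is freed on return.
  bool update(const std::string &value)
  {
    std::vector<std::string> parsed;
    std::string::size_type start = 0;
    while (start <= value.size()) {
      std::string::size_type end = value.find(',', start);
      if (end == std::string::npos) end = value.size();
      parsed.push_back(value.substr(start, end - start));
      start = end + 1;
    }
    current.swap(parsed);  // old value is destroyed with 'parsed'
    return true;
  }
};
```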

f7c5182b

Debugging: add dbug_print_join_prefix() to use in best_access_path · d67c8894

Sergei Petrunia authored Sep 22, 2024

A call to

  dbug_print_join_prefix(join_positions, idx, s)

returns a const char* pointer to a string with the current join prefix,
including the table being added to it.

d67c8894

MDEV-34996 Buildbot MSAN options should be in server · 42eb64e6

Daniel Black authored Sep 24, 2024

All the options that were in buildbot should
be in the server, making them accessible to everyone
without any special invocation.

If WITH_MSAN=ON, we make sure that the
compiler options are supported; an unsupported
option results in an error.

WITH_MSAN=ON also appends -stdlib=libc++
to CXX_FLAGS if supported.

With SECURITY_HARDENING options the bootstrap
currently crashes, so for now, we disable SECURITY_HARDENING
if MSAN is enabled.

The WITH_DBUG_TRACE option has no effect in MSAN builds.

42eb64e6

MDEV-27944: View-protocol fails if database was changed · ad5b9c20

Lena Startseva authored Sep 23, 2024

This is a limitation of the view protocol.
Tests were fixed with a workaround (disabling/enabling the service connection).

ad5b9c20

24 Sep, 2024 3 commits

MDEV-34994: sql/mysqld: stop accept() loop after the first EAGAIN · 53f5ee79

Max Kellermann authored Sep 24, 2024

Each time a listener socket becomes ready, MariaDB calls accept() ten
times (MAX_ACCEPT_RETRY), even if all but the first one return EAGAIN
because there are no more connections.  This causes unnecessary CPU
usage: on our server, that thread, which does nothing but accept(),
keeps one CPU core ~45% busy.  The loop should stop after the first
EAGAIN.

Perf report:

    11.01%  mariadbd  libc.so.6          [.] accept4
     6.42%  mariadbd  [kernel.kallsyms]  [k] finish_task_switch.isra.0
     5.50%  mariadbd  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     5.50%  mariadbd  [kernel.kallsyms]  [k] syscall_enter_from_user_mode
     4.59%  mariadbd  [kernel.kallsyms]  [k] __fget_light
     3.67%  mariadbd  [kernel.kallsyms]  [k] kmem_cache_alloc
     2.75%  mariadbd  [kernel.kallsyms]  [k] fput
     2.75%  mariadbd  [kernel.kallsyms]  [k] mod_objcg_state
     1.83%  mariadbd  [kernel.kallsyms]  [k] __inode_wait_for_writeback
     1.83%  mariadbd  [kernel.kallsyms]  [k] __sys_accept4
     1.83%  mariadbd  [kernel.kallsyms]  [k] _raw_spin_unlock_irq
     1.83%  mariadbd  [kernel.kallsyms]  [k] alloc_inode
     1.83%  mariadbd  [kernel.kallsyms]  [k] call_rcu
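A sketch of the intended loop, assuming a non-blocking listener socket. `drain_listener` is a hypothetical name; the accept call is passed in so the loop can be exercised without real sockets, and in a server it would wrap `accept(listen_fd, nullptr, nullptr)`. Retrying errors other than EAGAIN/EWOULDBLOCK up to the limit is an assumption of this sketch.

```cpp
#include <cassert>
#include <cerrno>

// The MAX_ACCEPT_RETRY value quoted above.
constexpr int kMaxAcceptRetry = 10;

// 'try_accept' behaves like accept(): returns a descriptor, or -1 with errno
// set. 'on_connection' handles each accepted descriptor.
// Returns the number of accepted connections.
template <class AcceptFn, class Handler>
int drain_listener(AcceptFn try_accept, Handler on_connection)
{
  int accepted = 0;
  for (int i = 0; i < kMaxAcceptRetry; i++) {
    int fd = try_accept();
    if (fd >= 0) {
      on_connection(fd);
      accepted++;
      continue;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK)
      break;  // queue is empty: stop instead of calling accept() again
    // Other errors (e.g. EINTR): retry, still bounded by kMaxAcceptRetry.
  }
  return accepted;
}
```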

53f5ee79

reformat galera sst error messages · 8fd1b060

Sergei Golubchik authored Sep 22, 2024

Put the command line at the end, so that when a very long command line
is truncated, it doesn't take the actual error message with it.
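Why the order matters: with a fixed-size buffer, snprintf() keeps the beginning of the formatted string and drops the end. A minimal sketch with a hypothetical formatter and a made-up error text:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical formatter: error text first, command line last, so that
// truncation to 'buf_size' bytes only cuts off the (long) command line.
std::string format_sst_error(const char *error, const char *cmdline,
                             std::size_t buf_size)
{
  assert(buf_size > 0);
  std::string buf(buf_size, '\0');
  std::snprintf(&buf[0], buf_size, "%s (command: %s)", error, cmdline);
  buf.resize(std::strlen(buf.c_str()));  // drop unused tail
  return buf;
}
```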

8fd1b060

galera_3nodes.MDEV-29171 fails · dd1cad7e

Sergei Golubchik authored Sep 22, 2024

Set transferfmt in the .cnf file like other galera tests do.
Otherwise it defaults to socat even when mtr has detected that only nc is available.

dd1cad7e

23 Sep, 2024 3 commits
- MDEV-33990: SHOW STATUS counts ER_CON_COUNT_ERROR as Connection_errors_internal · c9f54e20
  Oleksandr Byelkin authored Sep 19, 2024
```
Pass information about the cause of closing the connection to the place where
we increment the statistics, so that they are counted correctly.
```
  c9f54e20
- clarify --thread-pool-mode usage · bbc62b1b
  Sergei Golubchik authored Sep 06, 2024
  
  bbc62b1b
- restore --clent-rr after 7d86751d · 99837b6d
  Sergei Golubchik authored Sep 18, 2024
  
  99837b6d
20 Sep, 2024 4 commits

MDEV-32891 Assertion `value <= ((ulonglong) 0xFFFFFFFFL) * 10000ULL' failed in... · 681609d8
Alexander Barkov authored Sep 20, 2024
```
MDEV-32891 Assertion `value <= ((ulonglong) 0xFFFFFFFFL) * 10000ULL' failed in str_to_DDhhmmssff_internal

Fixing the wrong assert.
```
681609d8

MDEV-31302 Assertion `mon > 0 && mon < 13' failed in my_time_t... · 607fc153

Alexander Barkov authored Sep 20, 2024

MDEV-31302 Assertion `mon > 0 && mon < 13' failed in my_time_t sec_since_epoch(int, int, int, int, int, int)

The code erroneously called sec_since_epoch() for dates with zeros,
e.g. '2024-00-01'.
Fix: add a check that the date does not have zeros before
calling TIME_to_native().
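A sketch of the guard, using a hypothetical date struct instead of MYSQL_TIME: zero month or day parts are rejected before any conversion runs. The day count uses a standard civil-to-days algorithm swapped in for illustration; it is not MariaDB's sec_since_epoch().

```cpp
#include <cassert>

// Hypothetical minimal date (not MariaDB's MYSQL_TIME).
struct SketchDate { unsigned year, month, day; };

// Civil date -> days since 1970-01-01 (proleptic Gregorian calendar).
// Valid only for month 1..12 and day >= 1.
static long long days_from_civil(long long y, unsigned m, unsigned d)
{
  y -= m <= 2;
  const long long era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);          // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;         // [0, 146096]
  return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Zero parts (e.g. '2024-00-01') are rejected before the conversion,
// instead of reaching code that asserts mon > 0.
bool to_epoch_seconds(const SketchDate &d, long long *out)
{
  if (d.month == 0 || d.day == 0)
    return false;
  *out = days_from_civil(d.year, d.month, d.day) * 86400LL;
  return true;
}
```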

607fc153

MDEV-31221 UBSAN runtime error: negation of -9223372036854775808 cannot be... · 9ac8172a

Alexander Barkov authored Sep 20, 2024

MDEV-31221 UBSAN runtime error: negation of -9223372036854775808 cannot be represented in type 'long long int' in my_strtoll10_utf32

The code in my_strtoll10_mb2 and my_strtoll10_utf32
could hit undefined behavior by negating LONGLONG_MIN.
Fixing to avoid this.

Also, fixing my_strtoll10() in the same style.
The previous reduction produced a redundant warning on
CAST(_latin1'-9223372036854775808' AS SIGNED)

9ac8172a

MDEV-28386 UBSAN: runtime error: negation of -X cannot be represented in type... · 841dc07e

Alexander Barkov authored Sep 20, 2024

MDEV-28386 UBSAN: runtime error: negation of -X cannot be represented in type 'long long int'; cast to an unsigned type to negate this value to itself in my_strntoull_8bit on SELECT ... OCT

The code in my_strntoull_8bit() and my_strntoull_mb2_or_mb4()
could hit undefined behavior by negating LONGLONG_MIN.
Fixing the code to avoid this.
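For the unsigned parsers, the result for a leading '-' wraps modulo 2^64, as the UBSAN message suggests; doing the negation in unsigned arithmetic gives that result without signed overflow. A minimal sketch with a hypothetical helper:

```cpp
#include <cassert>
#include <climits>

// Negating via a signed cast, -(long long) magnitude, overflows when
// magnitude is 2^63. Unsigned negation is well-defined and wraps modulo 2^64.
unsigned long long apply_sign_unsigned(unsigned long long magnitude, bool negative)
{
  return negative ? 0ULL - magnitude : magnitude;
}
```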

841dc07e

18 Sep, 2024 3 commits

MDEV-31005: Make working cursor-protocol · 0a5e4a01

Lena Startseva authored May 23, 2024

Updated tests: cases with bugs or which cannot be run
with the cursor-protocol were excluded with
"--disable_cursor_protocol"/"--enable_cursor_protocol"

Fix for v.10.5

0a5e4a01

MDEV-31005: Make working cursor-protocol · ab569524

Lena Startseva authored May 21, 2024

Added the ability to disable/enable (--disable_cursor_protocol/
--enable_cursor_protocol) cursor-protocol in tests. If
"--disable_cursor_protocol" is used, then ps-protocol is also
disabled. With cursor-protocol, a prepared statement is executed
only once. For "--cursor-protocol", added a query filter:
it is applied only to "SELECT" queries.
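A sketch of such a query filter, assuming it only needs to recognize statements whose first keyword is SELECT; `is_select_query` is a hypothetical helper, and mysqltest's actual check may differ (for example for queries starting with comments or parentheses).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// True if the query's first keyword is SELECT (case-insensitive,
// leading whitespace skipped).
bool is_select_query(const char *q)
{
  while (std::isspace(static_cast<unsigned char>(*q)))
    q++;
  static const char kw[] = "select";
  for (std::size_t i = 0; i < sizeof(kw) - 1; i++, q++)
    if (std::tolower(static_cast<unsigned char>(*q)) != kw[i])
      return false;  // also stops safely at the terminating '\0'
  // The keyword must end here, so "selection" does not count.
  return !std::isalnum(static_cast<unsigned char>(*q)) && *q != '_';
}
```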

ab569524

MDEV-34952 main.log_slow test failure on opensuse builder · 450040e0

Daniel Black authored Sep 18, 2024

The loose regex for the MDEV-34539 test ended up
matching "opensuse" in the buildbot path.

Adjust to a more complete regex including space,
backtick and \n, a combination that is much less
likely to appear in a path name.

450040e0

17 Sep, 2024 2 commits

MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail · 68938d2b

Brandon Nesterenko authored Sep 09, 2024

The failing test case validates Seconds_Behind_Master for a delayed
slave, while STOP SLAVE is executed during a delay. The test fixes
initially added to the test (commit b04c8575) added a table lock
to ensure a transaction could not finish before validating the
Seconds_Behind_Master field after START SLAVE, but did not address the
possibility that the transaction could finish before running the
STOP SLAVE command, which invalidates the validations for the rest
of the test case. Specifically, this would result in 1) a timeout in
“Waiting for table metadata lock” on the replica, which expects the
transaction to retry after slave restart and hit a lock conflict on
the locked tables (added in b04c8575), and 2) a failed check that
Seconds_Behind_Master has increased.

The failure can be reproduced by synchronizing the slave to the master
before the MDEV-32265 echo statement (i.e. before the STOP SLAVE).

This patch fixes the test by adding a mechanism to use DEBUG_SYNC to
synchronize a MASTER_DELAY, rather than continually increasing the
duration of the delay each time the test fails on buildbot. This is
to ensure that on slow machines, a delay does not pass before the
test gets a chance to validate results. Additionally, it decreases
overall test time because the test can continue immediately after
validation, thereby bypassing the remainder of a full delay for each
transaction.

68938d2b

MDEV-25900 Assertion `octets < 1024' failed in... · a1adabdd

Alexander Barkov authored Sep 17, 2024

MDEV-25900 Assertion `octets < 1024' failed in Binlog_type_info_fixed_string::Binlog_type_info_fixed_string OR Assertion `field_length < 1024' failed in Field_string::save_field_metadata

A CHAR column cannot be longer than 1024 octets, because
Binlog_type_info_fixed_string::Binlog_type_info_fixed_string
relies on this fact: it cannot store binlog metadata for longer columns.

For the filename character set, mbmaxlen is equal to 5,
so only 1024/5=204 characters can fit into the 1024 limit.
- In strict mode:
  Disallowing creation of a CHAR column with octet length greater than 1024.
- In non-strict mode:
  Automatically convert CHAR with octet length>1024 into VARCHAR.
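The arithmetic above as a hypothetical decision function; it treats any octet length that would fail the `octets < 1024` assertion as too long, and the enum and function names are illustrative only.

```cpp
#include <cassert>

enum class ColType { Char, Varchar, Rejected };

constexpr unsigned long kCharOctetLimit = 1024;

// Octet length = character length * mbmaxlen of the character set.
ColType resolve_char_column(unsigned char_length, unsigned mbmaxlen, bool strict)
{
  unsigned long octets = static_cast<unsigned long>(char_length) * mbmaxlen;
  if (octets < kCharOctetLimit)
    return ColType::Char;
  return strict ? ColType::Rejected  // strict mode: refuse to create it
                : ColType::Varchar;  // non-strict mode: convert to VARCHAR
}
```

For the filename character set (mbmaxlen 5), 204 characters take 1020 octets and stay CHAR, while 205 characters take 1025 octets and cross the limit.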

a1adabdd

16 Sep, 2024 1 commit
- galera SST scripts: fixing glitchy sockstat issues for FreeBSD · 222744c5
  Julius Goryavsky authored Sep 16, 2024
  
  222744c5
15 Sep, 2024 1 commit
- galera SST scripts: added missing 'datadir' parameter for mysqldump method · 45be538c
  Julius Goryavsky authored Sep 15, 2024
  
  45be538c