Commit a3dc40e2 authored by unknown

Applied InnoDB snapshot innodb-5.0-ss2095

Fixes the following bugs:

- Bug #29560: InnoDB >= 5.0.30 hangs on adaptive hash rw-lock 'waiting for an X-lock'

  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.

- Bug #32125: Database crash due to ha_innodb.cc:3896: ulint convert_search_mode_to_innobase

  When an unknown find_flag is encountered in
  convert_search_mode_to_innobase(), do not call assert(0); instead, queue
  a MySQL error using my_error() and return the error code PAGE_CUR_UNSUPP.
  Change the functions that call convert_search_mode_to_innobase() to
  handle that error code by "canceling" execution and returning an
  appropriate error code further upstream.
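
The Bug #29560 fix turns the event's signal_count into a snapshot. os_event_reset() returns the count, and os_event_wait_low() keeps blocking only while the event is unset and the count still equals that snapshot. Below is a minimal POSIX-threads sketch of that idea, not the InnoDB source. event_t, event_init(), event_set(), event_reset() and event_wait_low() are hypothetical stand-ins for os_event_t and its functions. The wait-loop condition is an assumption modelled on the patch comments, and event_set() bumps the counter on every call, as LEMMA 2 in sync0sync.c describes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a non-Windows os_event_t. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
    int64_t         signal_count; /* starts at 1: 0 means "no snapshot" */
} event_t;

static void event_init(event_t *ev)
{
    int rc = pthread_mutex_init(&ev->mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&ev->cond, NULL);
    assert(rc == 0);
    (void) rc;
    ev->is_set = false;
    ev->signal_count = 1;
}

/* Signal the event: bump signal_count and wake every waiter. */
static void event_set(event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->is_set = true;
    ev->signal_count++;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->mutex);
}

/* Clear the flag and return the count observed at reset time. */
static int64_t event_reset(event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->is_set = false;
    int64_t snapshot = ev->signal_count;
    pthread_mutex_unlock(&ev->mutex);
    return snapshot;
}

/* Block until the event is set, or until signal_count has moved past the
   snapshot taken by event_reset(). Passing 0 uses the current count,
   which is what the os_event_wait() macro does. */
static void event_wait_low(event_t *ev, int64_t reset_sig_count)
{
    pthread_mutex_lock(&ev->mutex);
    int64_t old = reset_sig_count ? reset_sig_count : ev->signal_count;
    while (!ev->is_set && ev->signal_count == old) {
        pthread_cond_wait(&ev->cond, &ev->mutex);
    }
    pthread_mutex_unlock(&ev->mutex);
}
```

The example below replays the interleaving from the patch comments on a single thread:

1. Thread A takes a = event_reset(&ev), which is 1.
2. Thread B calls event_set(&ev).
3. Thread C calls event_reset(&ev), which leaves is_set false again.

event_wait_low(&ev, a) still returns at once, because signal_count is now 2 while a is 1. By contrast, event_wait_low(&ev, 0) at that point would block, and that is the hang the fix avoids.
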


innobase/include/db0err.h:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2091:
  branches/5.0:
   
  Merge r2088 from trunk:
   
  log for r2088:
  
  Fix Bug#32125 (http://bugs.mysql.com/32125)
  "Database crash due to ha_innodb.cc:3896: ulint convert_search_mode_to_innobase":
  
  When an unknown find_flag is encountered in
  convert_search_mode_to_innobase(), do not call assert(0); instead, queue
  a MySQL error using my_error() and return the error code PAGE_CUR_UNSUPP.
  Change the functions that call convert_search_mode_to_innobase() to
  handle that error code by "canceling" execution and returning an
  appropriate error code further upstream.
  
  Approved by:	Heikki
innobase/include/os0sync.h:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
innobase/include/page0cur.h:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2091:
  branches/5.0:
   
  Merge r2088 from trunk:
   
  log for r2088:
  
  Fix Bug#32125 (http://bugs.mysql.com/32125)
  "Database crash due to ha_innodb.cc:3896: ulint convert_search_mode_to_innobase":
  
  When an unknown find_flag is encountered in
  convert_search_mode_to_innobase(), do not call assert(0); instead, queue
  a MySQL error using my_error() and return the error code PAGE_CUR_UNSUPP.
  Change the functions that call convert_search_mode_to_innobase() to
  handle that error code by "canceling" execution and returning an
  appropriate error code further upstream.
  
  Approved by:	Heikki
innobase/include/sync0rw.h:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
innobase/include/sync0rw.ic:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
innobase/include/sync0sync.ic:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
innobase/os/os0sync.c:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
innobase/srv/srv0srv.c:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
innobase/sync/sync0arr.c:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
innobase/sync/sync0rw.c:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
innobase/sync/sync0sync.c:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2082:
  branches/5.0:  bug#29560
  
  Fixed a race condition in the rw_lock where an os_event_reset()
  can overwrite an earlier os_event_set(), triggering an indefinite
  wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced by the scalability fix to
  sync0arr, which was applied to 5.0 only. Therefore, this fix need not
  be applied to the 5.1 tree. If we decide to port the scalability fix
  to 5.1, then this fix should be ported as well.
  
  Reviewed by: Heikki
sql/ha_innodb.cc:
  Applied InnoDB snapshot innodb-5.0-ss2095
  
  Revision r2091:
  branches/5.0:
   
  Merge r2088 from trunk:
   
  log for r2088:
  
  Fix Bug#32125 (http://bugs.mysql.com/32125)
  "Database crash due to ha_innodb.cc:3896: ulint convert_search_mode_to_innobase":
  
  When an unknown find_flag is encountered in
  convert_search_mode_to_innobase(), do not call assert(0); instead, queue
  a MySQL error using my_error() and return the error code PAGE_CUR_UNSUPP.
  Change the functions that call convert_search_mode_to_innobase() to
  handle that error code by "canceling" execution and returning an
  appropriate error code further upstream.
  
  Approved by:	Heikki
  
  
  Revision r2095:
  branches/5.0: Merge r2093 from trunk:
  
  convert_search_mode_to_innobase(): Add the missing case label
  HA_READ_MBR_EQUAL that was forgotten in r2088.
parent 49934f49
--- innobase/include/db0err.h
+++ innobase/include/db0err.h
@@ -57,6 +57,18 @@ Created 5/24/1996 Heikki Tuuri
 buffer pool (for big transactions,
 InnoDB stores the lock structs in the
 buffer pool) */
+#define DB_FOREIGN_DUPLICATE_KEY 46 /* foreign key constraints
+activated by the operation would
+lead to a duplicate key in some
+table */
+#define DB_TOO_MANY_CONCURRENT_TRXS 47 /* when InnoDB runs out of the
+preconfigured undo slots, this can
+only happen when there are too many
+concurrent transactions */
+#define DB_UNSUPPORTED 48 /* when InnoDB sees any artefact or
+a feature that it can't recognize or
+work with e.g., FT indexes created by
+a later version of the engine. */
 /* The following are partial failure codes */
 #define DB_FAIL 1000
--- innobase/include/os0sync.h
+++ innobase/include/os0sync.h
@@ -112,9 +112,13 @@ os_event_set(
 os_event_t event); /* in: event to set */
 /**************************************************************
 Resets an event semaphore to the nonsignaled state. Waiting threads will
-stop to wait for the event. */
-void
+stop to wait for the event.
+The return value should be passed to os_event_wait_low() if it is desired
+that this thread should not wait in case of an intervening call to
+os_event_set() between this os_event_reset() and the
+os_event_wait_low() call. See comments for os_event_wait_low(). */
+ib_longlong
 os_event_reset(
 /*===========*/
 os_event_t event); /* in: event to reset */
@@ -125,16 +129,38 @@ void
 os_event_free(
 /*==========*/
 os_event_t event); /* in: event to free */
 /**************************************************************
 Waits for an event object until it is in the signaled state. If
 srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS this also exits the
 waiting thread when the event becomes signaled (or immediately if the
-event is already in the signaled state). */
+event is already in the signaled state).
+Typically, if the event has been signalled after the os_event_reset()
+we'll return immediately because event->is_set == TRUE.
+There are, however, situations (e.g.: sync_array code) where we may
+lose this information. For example:
+thread A calls os_event_reset()
+thread B calls os_event_set() [event->is_set == TRUE]
+thread C calls os_event_reset() [event->is_set == FALSE]
+thread A calls os_event_wait() [infinite wait!]
+thread C calls os_event_wait() [infinite wait!]
+Where such a scenario is possible, to avoid infinite wait, the
+value returned by os_event_reset() should be passed in as
+reset_sig_count. */
+#define os_event_wait(event) os_event_wait_low((event), 0)
 void
-os_event_wait(
-/*==========*/
-os_event_t event); /* in: event to wait */
+os_event_wait_low(
+/*==============*/
+os_event_t event, /* in: event to wait */
+ib_longlong reset_sig_count);/* in: zero or the value
+returned by previous call of
+os_event_reset(). */
 /**************************************************************
 Waits for an event object until it is in the signaled state or
 a timeout is exceeded. In Unix the timeout is always infinite. */
--- innobase/include/page0cur.h
+++ innobase/include/page0cur.h
@@ -22,6 +22,7 @@ Created 10/4/1994 Heikki Tuuri
 /* Page cursor search modes; the values must be in this order! */
+#define PAGE_CUR_UNSUPP 0
 #define PAGE_CUR_G 1
 #define PAGE_CUR_GE 2
 #define PAGE_CUR_L 3
--- innobase/include/sync0rw.h
+++ innobase/include/sync0rw.h
@@ -418,6 +418,17 @@ field. Then no new readers are allowed in. */
 struct rw_lock_struct {
 os_event_t event; /* Used by sync0arr.c for thread queueing */
+#ifdef __WIN__
+os_event_t wait_ex_event; /* This Windows-specific event is
+used by the thread which has set the
+lock state to RW_LOCK_WAIT_EX. The
+rw_lock design guarantees that this
+thread will be the next one to proceed
+once the current event gets
+signalled. See LEMMA 2 in sync0sync.c */
+#endif
 ulint reader_count; /* Number of readers who have locked this
 lock in the shared mode */
 ulint writer; /* This field is set to RW_LOCK_EX if there
--- innobase/include/sync0rw.ic
+++ innobase/include/sync0rw.ic
@@ -382,6 +382,9 @@ rw_lock_s_unlock_func(
 mutex_exit(mutex);
 if (UNIV_UNLIKELY(sg)) {
+#ifdef __WIN__
+os_event_set(lock->wait_ex_event);
+#endif
 os_event_set(lock->event);
 sync_array_object_signalled(sync_primary_wait_array);
 }
@@ -463,6 +466,9 @@ rw_lock_x_unlock_func(
 mutex_exit(&(lock->mutex));
 if (UNIV_UNLIKELY(sg)) {
+#ifdef __WIN__
+os_event_set(lock->wait_ex_event);
+#endif
 os_event_set(lock->event);
 sync_array_object_signalled(sync_primary_wait_array);
 }
--- innobase/include/sync0sync.ic
+++ innobase/include/sync0sync.ic
@@ -207,7 +207,7 @@ mutex_exit(
 perform the read first, which could leave a waiting
 thread hanging indefinitely.
-Our current solution call every 10 seconds
+Our current solution call every second
 sync_arr_wake_threads_if_sema_free()
 to wake up possible hanging threads if
 they are missed in mutex_signal_object. */
--- innobase/os/os0sync.c
+++ innobase/os/os0sync.c
@@ -151,7 +151,14 @@ os_event_create(
 ut_a(0 == pthread_cond_init(&(event->cond_var), NULL));
 #endif
 event->is_set = FALSE;
-event->signal_count = 0;
+/* We return this value in os_event_reset(), which can then be
+passed to os_event_wait_low(). The value of zero
+is reserved in os_event_wait_low() for the case when the
+caller does not want to pass any signal_count value. To
+distinguish between the two cases we initialize signal_count
+to 1 here. */
+event->signal_count = 1;
 #endif /* __WIN__ */
 /* The os_sync_mutex can be NULL because during startup an event
@@ -244,13 +251,20 @@ os_event_set(
 /**************************************************************
 Resets an event semaphore to the nonsignaled state. Waiting threads will
-stop to wait for the event. */
-void
+stop to wait for the event.
+The return value should be passed to os_event_wait_low() if it is desired
+that this thread should not wait in case of an intervening call to
+os_event_set() between this os_event_reset() and the
+os_event_wait_low() call. See comments for os_event_wait_low(). */
+ib_longlong
 os_event_reset(
 /*===========*/
+/* out: current signal_count. */
 os_event_t event) /* in: event to reset */
 {
+ib_longlong ret = 0;
 #ifdef __WIN__
 ut_a(event);
@@ -265,9 +279,11 @@ os_event_reset(
 } else {
 event->is_set = FALSE;
 }
+ret = event->signal_count;
 os_fast_mutex_unlock(&(event->os_mutex));
 #endif
+return(ret);
 }
 /**************************************************************
@@ -335,18 +351,38 @@ os_event_free(
 Waits for an event object until it is in the signaled state. If
 srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS this also exits the
 waiting thread when the event becomes signaled (or immediately if the
-event is already in the signaled state). */
+event is already in the signaled state).
+Typically, if the event has been signalled after the os_event_reset()
+we'll return immediately because event->is_set == TRUE.
+There are, however, situations (e.g.: sync_array code) where we may
+lose this information. For example:
+thread A calls os_event_reset()
+thread B calls os_event_set() [event->is_set == TRUE]
+thread C calls os_event_reset() [event->is_set == FALSE]
+thread A calls os_event_wait() [infinite wait!]
+thread C calls os_event_wait() [infinite wait!]
+Where such a scenario is possible, to avoid infinite wait, the
+value returned by os_event_reset() should be passed in as
+reset_sig_count. */
 void
-os_event_wait(
-/*==========*/
-os_event_t event) /* in: event to wait */
+os_event_wait_low(
+/*==============*/
+os_event_t event, /* in: event to wait */
+ib_longlong reset_sig_count)/* in: zero or the value
+returned by previous call of
+os_event_reset(). */
 {
 #ifdef __WIN__
 DWORD err;
 ut_a(event);
+UT_NOT_USED(reset_sig_count);
 /* Specify an infinite time limit for waiting */
 err = WaitForSingleObject(event->handle, INFINITE);
@@ -360,7 +396,11 @@ os_event_wait(
 os_fast_mutex_lock(&(event->os_mutex));
-old_signal_count = event->signal_count;
+if (reset_sig_count) {
+old_signal_count = reset_sig_count;
+} else {
+old_signal_count = event->signal_count;
+}
 for (;;) {
 if (event->is_set == TRUE
--- innobase/srv/srv0srv.c
+++ innobase/srv/srv0srv.c
@@ -1881,12 +1881,6 @@ srv_lock_timeout_and_monitor_thread(
 os_thread_sleep(1000000);
-/* In case mutex_exit is not a memory barrier, it is
-theoretically possible some threads are left waiting though
-the semaphore is already released. Wake up those threads: */
-sync_arr_wake_threads_if_sema_free();
 current_time = time(NULL);
 time_elapsed = difftime(current_time, last_monitor_time);
@@ -2083,9 +2077,15 @@ srv_error_monitor_thread(
 srv_refresh_innodb_monitor_stats();
 }
+/* In case mutex_exit is not a memory barrier, it is
+theoretically possible some threads are left waiting though
+the semaphore is already released. Wake up those threads: */
+sync_arr_wake_threads_if_sema_free();
 if (sync_array_print_long_waits()) {
 fatal_cnt++;
-if (fatal_cnt > 5) {
+if (fatal_cnt > 10) {
 fprintf(stderr,
 "InnoDB: Error: semaphore wait has lasted > %lu seconds\n"
@@ -2103,7 +2103,7 @@ srv_error_monitor_thread(
 fflush(stderr);
-os_thread_sleep(2000000);
+os_thread_sleep(1000000);
 if (srv_shutdown_state < SRV_SHUTDOWN_CLEANUP) {
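
After this patch, the error monitor thread runs the wake-up sweep on every iteration. It sleeps one second between iterations instead of two, and it aborts only once fatal_cnt exceeds 10 rather than 5. The sketch below models one iteration of such a loop. monitor_t and monitor_tick() are hypothetical, and the sweep is only counted, not performed. Resetting fatal_cnt to 0 after a clean iteration is an assumption, because that branch is not shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Limit from the patched srv_error_monitor_thread(): abort once
   fatal_cnt exceeds 10 (the limit was 5 before this patch). */
#define FATAL_CNT_LIMIT 10

typedef struct {
    int fatal_cnt;   /* iterations in a row that reported a long wait */
    int wake_sweeps; /* times the wake-up sweep would have run */
} monitor_t;

/* One iteration of a hypothetical monitor loop; the real thread then
   sleeps for one second. Returns true when the server should abort. */
static bool monitor_tick(monitor_t *m, bool long_wait_seen)
{
    /* Stand-in for sync_arr_wake_threads_if_sema_free(). */
    m->wake_sweeps++;

    if (long_wait_seen) {
        m->fatal_cnt++;
        return m->fatal_cnt > FATAL_CNT_LIMIT;
    }

    /* Assumption: a clean iteration resets the counter. */
    m->fatal_cnt = 0;
    return false;
}
```

Under these assumptions, eleven consecutive iterations that report a long wait are needed before monitor_tick() asks for an abort. Each of those iterations also gives missed waiters another wake-up.
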
--- innobase/sync/sync0arr.c
+++ innobase/sync/sync0arr.c
@@ -40,7 +40,15 @@ because we can do with a very small number of OS events,
 say 200. In NT 3.51, allocating events seems to be a quadratic
 algorithm, because 10 000 events are created fast, but
 100 000 events takes a couple of minutes to create.
-*/
+As of 5.0.30 the above-mentioned design has changed. Since the
+OS can now handle millions of wait events efficiently, we no longer
+have this concept of each cell of wait array having one event.
+Instead, now the event that a thread wants to wait on is embedded
+in the wait object (mutex or rw_lock). We still keep the global
+wait array for the sake of diagnostics and also to avoid infinite
+wait. The error_monitor thread scans the global wait array to signal
+any waiting threads who have missed the signal. */
 /* A cell where an individual thread may wait suspended
 until a resource is released. The suspending is implemented
@@ -62,6 +70,14 @@ struct sync_cell_struct {
 ibool waiting; /* TRUE if the thread has already
 called sync_array_event_wait
 on this cell */
+ib_longlong signal_count; /* We capture the signal_count
+of the wait_object when we
+reset the event. This value is
+then passed on to os_event_wait_low
+and we wait only if the event
+has not been signalled in the
+period between the reset and
+wait call. */
 time_t reservation_time;/* time when the thread reserved
 the wait cell */
 };
@@ -216,6 +232,7 @@ sync_array_create(
 cell = sync_array_get_nth_cell(arr, i);
 cell->wait_object = NULL;
 cell->waiting = FALSE;
+cell->signal_count = 0;
 }
 return(arr);
@@ -282,16 +299,23 @@ sync_array_validate(
 /***********************************************************************
 Puts the cell event in reset state. */
 static
-void
+ib_longlong
 sync_cell_event_reset(
 /*==================*/
+/* out: value of signal_count
+at the time of reset. */
 ulint type, /* in: lock type mutex/rw_lock */
 void* object) /* in: the rw_lock/mutex object */
 {
 if (type == SYNC_MUTEX) {
-os_event_reset(((mutex_t *) object)->event);
+return(os_event_reset(((mutex_t *) object)->event));
+#ifdef __WIN__
+} else if (type == RW_LOCK_WAIT_EX) {
+return(os_event_reset(
+((rw_lock_t *) object)->wait_ex_event));
+#endif
 } else {
-os_event_reset(((rw_lock_t *) object)->event);
+return(os_event_reset(((rw_lock_t *) object)->event));
 }
 }
@@ -345,8 +369,11 @@ sync_array_reserve_cell(
 sync_array_exit(arr);
-/* Make sure the event is reset */
-sync_cell_event_reset(type, object);
+/* Make sure the event is reset and also store
+the value of signal_count at which the event
+was reset. */
+cell->signal_count = sync_cell_event_reset(type,
+object);
 cell->reservation_time = time(NULL);
@@ -388,7 +415,14 @@ sync_array_wait_event(
 if (cell->request_type == SYNC_MUTEX) {
 event = ((mutex_t*) cell->wait_object)->event;
-} else {
+#ifdef __WIN__
+/* On Windows, if the thread about to wait is the one which
+has set the state of the rw_lock to RW_LOCK_WAIT_EX, then
+it waits on a special event, i.e. wait_ex_event. */
+} else if (cell->request_type == RW_LOCK_WAIT_EX) {
+event = ((rw_lock_t*) cell->wait_object)->wait_ex_event;
+#endif
+} else {
 event = ((rw_lock_t*) cell->wait_object)->event;
 }
@@ -413,7 +447,7 @@ sync_array_wait_event(
 #endif
 sync_array_exit(arr);
-os_event_wait(event);
+os_event_wait_low(event, cell->signal_count);
 sync_array_free_cell(arr, index);
 }
@@ -457,7 +491,11 @@ sync_array_cell_print(
 #endif /* UNIV_SYNC_DEBUG */
 (ulong) mutex->waiters);
-} else if (type == RW_LOCK_EX || type == RW_LOCK_SHARED) {
+} else if (type == RW_LOCK_EX
+#ifdef __WIN__
+|| type == RW_LOCK_WAIT_EX
+#endif
+|| type == RW_LOCK_SHARED) {
 fputs(type == RW_LOCK_EX ? "X-lock on" : "S-lock on", file);
@@ -638,7 +676,8 @@ sync_array_detect_deadlock(
 return(FALSE); /* No deadlock */
-} else if (cell->request_type == RW_LOCK_EX) {
+} else if (cell->request_type == RW_LOCK_EX
+|| cell->request_type == RW_LOCK_WAIT_EX) {
 lock = cell->wait_object;
@@ -734,7 +773,8 @@ sync_arr_cell_can_wake_up(
 return(TRUE);
 }
-} else if (cell->request_type == RW_LOCK_EX) {
+} else if (cell->request_type == RW_LOCK_EX
+|| cell->request_type == RW_LOCK_WAIT_EX) {
 lock = cell->wait_object;
@@ -783,6 +823,7 @@ sync_array_free_cell(
 cell->waiting = FALSE;
 cell->wait_object = NULL;
+cell->signal_count = 0;
 ut_a(arr->n_reserved > 0);
 arr->n_reserved--;
@@ -839,6 +880,14 @@ sync_arr_wake_threads_if_sema_free(void)
 mutex = cell->wait_object;
 os_event_set(mutex->event);
+#ifdef __WIN__
+} else if (cell->request_type
+== RW_LOCK_WAIT_EX) {
+rw_lock_t* lock;
+lock = cell->wait_object;
+os_event_set(lock->wait_ex_event);
+#endif
 } else {
 rw_lock_t* lock;
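
sync_arr_wake_threads_if_sema_free() is the safety net that the error monitor thread now calls on each iteration. It walks the wait array and re-signals any waiter whose semaphore is already free. With this patch it also signals wait_ex_event for RW_LOCK_WAIT_EX cells on Windows. The sketch below is a simplified, platform-neutral model. Every type and function in it is hypothetical, and cell_can_wake_up() collapses the per-request-type checks of sync_arr_cell_can_wake_up() into a single held flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts how often an event was signalled; enough for this model. */
typedef struct { int set_calls; } event_t;

typedef enum { REQ_MUTEX, REQ_RW_SHARED, REQ_RW_EX, REQ_RW_WAIT_EX } req_t;

/* Hypothetical wait object: a mutex or rw-lock reduced to one flag. */
typedef struct {
    bool    held;
    event_t event;         /* what ordinary waiters block on */
    event_t wait_ex_event; /* what the RW_LOCK_WAIT_EX writer blocks on */
} wait_object_t;

typedef struct {
    wait_object_t *wait_object; /* NULL when the cell is unused */
    req_t          request_type;
} cell_t;

static void event_set(event_t *ev) { ev->set_calls++; }

/* Simplified stand-in for sync_arr_cell_can_wake_up(). */
static bool cell_can_wake_up(const cell_t *cell)
{
    return !cell->wait_object->held;
}

/* Re-signal every waiter whose object is free; returns how many. */
static int wake_threads_if_sema_free(cell_t *cells, size_t n_cells)
{
    int woken = 0;

    for (size_t i = 0; i < n_cells; i++) {
        cell_t *cell = &cells[i];

        if (cell->wait_object == NULL || !cell_can_wake_up(cell)) {
            continue;
        }
        /* Signal the same event the waiter picked in sync_array_wait_event(). */
        if (cell->request_type == REQ_RW_WAIT_EX) {
            event_set(&cell->wait_object->wait_ex_event);
        } else {
            event_set(&cell->wait_object->event);
        }
        woken++;
    }
    return woken;
}
```

The design point the model tries to capture is that the sweep must signal the event the waiter is actually blocked on. Setting only lock->event would not wake a writer parked on wait_ex_event.
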
--- innobase/sync/sync0rw.c
+++ innobase/sync/sync0rw.c
@@ -132,6 +132,10 @@ rw_lock_create_func(
 lock->last_x_line = 0;
 lock->event = os_event_create(NULL);
+#ifdef __WIN__
+lock->wait_ex_event = os_event_create(NULL);
+#endif
 mutex_enter(&rw_lock_list_mutex);
 if (UT_LIST_GET_LEN(rw_lock_list) > 0) {
@@ -168,6 +172,10 @@ rw_lock_free(
 mutex_enter(&rw_lock_list_mutex);
 os_event_free(lock->event);
+#ifdef __WIN__
+os_event_free(lock->wait_ex_event);
+#endif
 if (UT_LIST_GET_PREV(list, lock)) {
 ut_a(UT_LIST_GET_PREV(list, lock)->magic_n == RW_LOCK_MAGIC_N);
 }
@@ -521,7 +529,15 @@ rw_lock_x_lock_func(
 rw_x_system_call_count++;
 sync_array_reserve_cell(sync_primary_wait_array,
-lock, RW_LOCK_EX,
+lock,
+#ifdef __WIN__
+/* On Windows RW_LOCK_WAIT_EX signifies
+that this thread should wait on the
+special wait_ex_event. */
+(state == RW_LOCK_WAIT_EX)
+? RW_LOCK_WAIT_EX :
+#endif
+RW_LOCK_EX,
 file_name, line,
 &index);
--- innobase/sync/sync0sync.c
+++ innobase/sync/sync0sync.c
@@ -95,17 +95,47 @@ have happened that the thread which was holding the mutex has just released
 it and did not see the waiters byte set to 1, a case which would lead the
 other thread to an infinite wait.
-LEMMA 1: After a thread resets the event of the cell it reserves for waiting
-========
-for a mutex, some thread will eventually call sync_array_signal_object with
-the mutex as an argument. Thus no infinite wait is possible.
+LEMMA 1: After a thread resets the event of a mutex (or rw_lock), some
+=======
+thread will eventually call os_event_set() on that particular event.
+Thus no infinite wait is possible in this case.
 Proof: After making the reservation the thread sets the waiters field in the
 mutex to 1. Then it checks that the mutex is still reserved by some thread,
 or it reserves the mutex for itself. In any case, some thread (which may be
 also some earlier thread, not necessarily the one currently holding the mutex)
 will set the waiters field to 0 in mutex_exit, and then call
-sync_array_signal_object with the mutex as an argument.
+os_event_set() with the mutex as an argument.
+Q.E.D.
+LEMMA 2: If an os_event_set() call is made after some thread has called
+=======
+the os_event_reset() and before it starts waiting on that event, the call
+will not be lost to the second thread. This is true even if there is an
+intervening call to os_event_reset() by another thread.
+Thus no infinite wait is possible in this case.
+Proof (non-Windows platforms): os_event_reset() returns a monotonically
+increasing value of signal_count. This value is increased at every
+call of os_event_set(). If thread A has called os_event_reset() followed
+by thread B calling os_event_set() and then some other thread C calling
+os_event_reset(), the is_set flag of the event will be set to FALSE;
+but now if thread A calls os_event_wait_low() with the signal_count
+value returned from the earlier call of os_event_reset(), it will
+return immediately without waiting.
+Q.E.D.
+Proof (Windows): If there is a writer thread which is forced to wait for
+the lock, it may be able to set the state of rw_lock to RW_LOCK_WAIT_EX.
+The design of rw_lock ensures that there is one and only one thread
+that is able to change the state to RW_LOCK_WAIT_EX and this thread is
+guaranteed to acquire the lock after it is released by the current
+holders and before any other waiter gets the lock.
+On Windows this thread waits on a separate event, i.e. wait_ex_event.
+Since only one thread can wait on this event there is no chance
+of this event getting reset before the writer starts waiting on it.
+Therefore, this thread is guaranteed to catch the os_event_set()
+signalled unconditionally at the release of the lock.
 Q.E.D. */
 ulint sync_dummy = 0;
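
The Windows half of LEMMA 2 relies on ownership rather than a counter. Only the thread that moved the lock to RW_LOCK_WAIT_EX ever waits on wait_ex_event, and therefore only that thread ever resets it. Another waiter's reset of lock->event cannot erase the wake-up meant for the writer. The sequential model below illustrates this with plain manual-reset flags, which are an assumption standing in for Windows event handles. All names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Manual-reset event without a signal counter. */
typedef struct { bool is_set; } flag_event_t;

typedef struct {
    flag_event_t event;         /* shared by all ordinary waiters */
    flag_event_t wait_ex_event; /* used only by the RW_LOCK_WAIT_EX writer */
} rw_lock_model_t;

static void ev_set(flag_event_t *e)   { e->is_set = true; }
static void ev_reset(flag_event_t *e) { e->is_set = false; }

/* Choose the event to reset and wait on, mirroring the patched
   sync_cell_event_reset() and sync_array_wait_event(). */
static flag_event_t *wait_event_for(rw_lock_model_t *lock,
                                    bool is_wait_ex_writer)
{
    return is_wait_ex_writer ? &lock->wait_ex_event : &lock->event;
}

/* Release path: like the Windows branch of rw_lock_x_unlock_func(),
   signal the dedicated event first, then the shared one. */
static void release_and_signal(rw_lock_model_t *lock)
{
    ev_set(&lock->wait_ex_event);
    ev_set(&lock->event);
}
```

The model replays the following sequence:

1. The writer resets its event.
2. The lock is released.
3. An ordinary waiter then resets the shared event.

At the end, wait_ex_event is still set while lock->event has been cleared, so the writer's subsequent wait would return immediately.
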
--- sql/ha_innodb.cc
+++ sql/ha_innodb.cc
@@ -522,6 +522,9 @@ convert_error_code_to_mysql(
 mark_transaction_to_rollback(thd, TRUE);
 return(HA_ERR_LOCK_TABLE_FULL);
+} else if (error == DB_UNSUPPORTED) {
+return(HA_ERR_UNSUPPORTED);
 } else {
 return(-1); // Unknown error
 }
@@ -3713,11 +3716,22 @@ convert_search_mode_to_innobase(
 and comparison of non-latin1 char type fields in
 innobase_mysql_cmp() to get PAGE_CUR_LE_OR_EXTENDS to
 work correctly. */
-default: assert(0);
+case HA_READ_MBR_CONTAIN:
+case HA_READ_MBR_INTERSECT:
+case HA_READ_MBR_WITHIN:
+case HA_READ_MBR_DISJOINT:
+case HA_READ_MBR_EQUAL:
+my_error(ER_TABLE_CANT_HANDLE_SPKEYS, MYF(0));
+return(PAGE_CUR_UNSUPP);
+/* do not use "default:" in order to produce a gcc warning:
+enumeration value '...' not handled in switch
+(if -Wswitch or -Wall is used)
+*/
 }
-return(0);
+my_error(ER_CHECK_NOT_IMPLEMENTED, MYF(0), "this functionality");
+return(PAGE_CUR_UNSUPP);
 }
 /*
@@ -3855,11 +3869,18 @@ ha_innobase::index_read(
 last_match_mode = (uint) match_mode;
-innodb_srv_conc_enter_innodb(prebuilt->trx);
-ret = row_search_for_mysql((byte*) buf, mode, prebuilt, match_mode, 0);
-innodb_srv_conc_exit_innodb(prebuilt->trx);
+if (mode != PAGE_CUR_UNSUPP) {
+innodb_srv_conc_enter_innodb(prebuilt->trx);
+ret = row_search_for_mysql((byte*) buf, mode, prebuilt,
+match_mode, 0);
+innodb_srv_conc_exit_innodb(prebuilt->trx);
+} else {
+ret = DB_UNSUPPORTED;
+}
 if (ret == DB_SUCCESS) {
 error = 0;
@@ -5174,8 +5195,16 @@ ha_innobase::records_in_range(
 mode2 = convert_search_mode_to_innobase(max_key ? max_key->flag :
 HA_READ_KEY_EXACT);
-n_rows = btr_estimate_n_rows_in_range(index, range_start,
-mode1, range_end, mode2);
+if (mode1 != PAGE_CUR_UNSUPP && mode2 != PAGE_CUR_UNSUPP) {
+n_rows = btr_estimate_n_rows_in_range(index, range_start,
+mode1, range_end,
+mode2);
+} else {
+n_rows = 0;
+}
 dtuple_free_for_mysql(heap1);
 dtuple_free_for_mysql(heap2);
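
The ha_innodb.cc change replaces assert(0) with a sentinel. convert_search_mode_to_innobase() queues an error and returns PAGE_CUR_UNSUPP (0), and each caller checks for that value before touching the index. The sketch below shows the pattern in isolation. The read_flag enumeration, the CUR_* constants, last_error and index_read_sketch() are hypothetical simplifications, not the MySQL or InnoDB definitions. The mapping chosen for the supported flags is illustrative only.

```c
#include <assert.h>

/* Hypothetical subset of the handler's read flags. */
enum read_flag {
    READ_KEY_EXACT,
    READ_KEY_OR_NEXT,
    READ_AFTER_KEY,
    READ_MBR_CONTAIN,
    READ_MBR_EQUAL
};

/* Same idea as page0cur.h: 0 is reserved for "unsupported". */
#define CUR_UNSUPP 0
#define CUR_G      1
#define CUR_GE     2

enum { ERR_NONE, ERR_SPATIAL_KEYS, ERR_NOT_IMPLEMENTED };
enum { RES_SUCCESS, RES_UNSUPPORTED };

static int last_error = ERR_NONE; /* stand-in for my_error() */

static unsigned convert_mode(enum read_flag flag)
{
    switch (flag) {
    case READ_KEY_EXACT:
    case READ_KEY_OR_NEXT:
        return CUR_GE;
    case READ_AFTER_KEY:
        return CUR_G;
    case READ_MBR_CONTAIN:
    case READ_MBR_EQUAL:
        last_error = ERR_SPATIAL_KEYS;
        return CUR_UNSUPP;
    /* No default: label, so -Wswitch reports any enumerator added later. */
    }
    /* Reached only for values outside the enumeration. */
    last_error = ERR_NOT_IMPLEMENTED;
    return CUR_UNSUPP;
}

/* Caller pattern from index_read(): skip the search on the sentinel. */
static int index_read_sketch(enum read_flag flag)
{
    unsigned mode = convert_mode(flag);

    if (mode == CUR_UNSUPP) {
        return RES_UNSUPPORTED;
    }
    /* A real handler would run the row search with `mode` here. */
    return RES_SUCCESS;
}
```

Passing READ_MBR_EQUAL returns RES_UNSUPPORTED and records the spatial-key error. This corresponds to the label that r2093 added to the real switch. Without that label, HA_READ_MBR_EQUAL would have fallen through to the generic "not implemented" path after the switch, and -Wswitch would have flagged the omission at compile time.
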