Commit 92878e97 authored by Sergei Golubchik

subdist optimization

1. randomize all vectors via multiplication by a random orthogonal
   matrix
   * to generate the matrix fill the square matrix with normally
     distributed random values and create an orthogonal matrix with
     the QR decomposition
   * the random number generator is seeded with the number of dimensions,
     so the matrix is always the same for a given table
   * multiplication by an orthogonal matrix is a "rotation", so
     does not change distances or angles
2. when calculating the distance, first calculate a "subdistance",
   the distance between projections onto the first subdist_part
   coordinates (=192, best by test: a larger value is less efficient,
   a smaller one makes the error rate too high)
3. calculate the full distance only if "subdistance" isn't confidently
   higher (above subdist_margin) than the distance we're comparing with
   * it might look like it would make sense to do a second projection
     at, say, subdist_part*2, and so on - but in practice one check
     is enough, the projected distance converges quickly and if it
     isn't confidently higher at subdist_part, it won't be later either
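
Step 1 can be sketched in a few lines of standard-library Python. This is an illustration only, not the code from this commit: it orthonormalizes the rows of a Gaussian random matrix with modified Gram-Schmidt, which yields the same kind of orthogonal factor a QR decomposition would, and the names random_orthogonal, rotate and distance are made up for the example.

```python
import math
import random


def random_orthogonal(dim):
    """Build a dim x dim orthogonal matrix, deterministic for a given dim."""
    rng = random.Random(dim)  # seeded with the number of dimensions
    rows = []
    while len(rows) < dim:
        # Fill a row with normally distributed random values.
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Modified Gram-Schmidt: remove the components along earlier rows.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a (practically impossible) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vec):
    """Multiply vec by the orthogonal matrix (one dot product per row)."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def distance(a, b):
    """Plain Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Calling random_orthogonal(n) twice returns the same matrix, and distance(rotate(q, a), rotate(q, b)) matches distance(a, b) up to floating-point rounding, which is the "rotation" property the list relies on.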

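Steps 2 and 3 can be sketched as follows. Several details are assumptions made for the example and are not taken from the commit: distances are squared Euclidean, the partial sum is scaled by dim/part to estimate the full distance, the margin is applied as a multiplicative factor, and the value 1.05 is hypothetical. Only subdist_part = 192 comes from the commit message.

```python
SUBDIST_PART = 192     # from the commit message
SUBDIST_MARGIN = 1.05  # hypothetical value; the commit message does not state it


def squared_distance(a, b, n):
    """Squared Euclidean distance over the first n coordinates."""
    return sum((x - y) ** 2 for x, y in zip(a[:n], b[:n]))


def distance_greater_than(a, b, threshold,
                          part=SUBDIST_PART, margin=SUBDIST_MARGIN):
    """Return (distance, exact) for two already-rotated vectors.

    If the estimate from the first `part` coordinates is confidently above
    `threshold`, return that estimate with exact=False and skip the full
    pass. Otherwise return the full squared distance with exact=True.
    """
    dim = len(a)
    if dim > part:
        # Scale the partial sum up to an estimate of the full squared
        # distance (assumed scaling; reasonable after a random rotation).
        estimate = squared_distance(a, b, part) * dim / part
        if estimate > threshold * margin:
            return estimate, False
    return squared_distance(a, b, dim), True
```

A caller comparing a candidate against the current worst result would pass that result's distance as threshold and discard the candidate whenever exact is False.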
This optimization introduces a constant overhead per insert/search
operation: the input/query vector has to be multiplied by the matrix.
In return it saves on every distance calculation. Thus it is only
beneficial when the number of distance calculations (which grows with M
and with the table size) is high enough to outweigh the constant
overhead. Use the MIN_ROWS table option to estimate the number of rows
in the table. use_subdist_heuristic() is optimal for mnist and
fashion-mnist (784 dimensions, 60k rows) and variations of gist (960
dimensions, 200k, 400k, 600k, 800k, 1000k rows).
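
To get a feel for the trade-off, here is a rough operation count. It is a back-of-the-envelope model, not the logic of use_subdist_heuristic(): it assumes the rotation costs dim*dim multiply-adds, that each skipped full distance saves dim - part multiply-adds, and that both kinds of operation cost the same, which need not hold in the real implementation. The function name break_even_skips is made up for the example.

```python
def break_even_skips(dim, part=192):
    """Skipped full distances needed to repay one matrix-vector multiply."""
    rotation_cost = dim * dim     # multiply-adds to rotate one vector
    saving_per_skip = dim - part  # multiply-adds avoided per skipped full pass
    return rotation_cost / saving_per_skip
```

Under these assumptions a 960-dimensional vector needs 1200 skipped full distances per operation to break even, and a 784-dimensional one needs about 1038.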
parent e5d56bc2
@@ -14,7 +14,7 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA
-CMAKE_MINIMUM_REQUIRED(VERSION 2.8.12)
+CMAKE_MINIMUM_REQUIRED(VERSION 3.0)
 IF(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
 # Setting build type to RelWithDebInfo as none was specified.
......
@@ -225,13 +225,14 @@ RECOMPILE_FOR_EMBEDDED)
 MYSQL_ADD_PLUGIN(online_alter_log online_alter.cc STORAGE_ENGINE MANDATORY
 STATIC_ONLY NOT_EMBEDDED)
+FIND_PACKAGE(Eigen3 3.3 REQUIRED NO_MODULE)
 ADD_LIBRARY(sql STATIC ${SQL_SOURCE})
 MAYBE_DISABLE_IPO(sql)
 DTRACE_INSTRUMENT(sql)
 TARGET_LINK_LIBRARIES(sql
 mysys mysys_ssl dbug strings vio pcre2-8
-tpool
+tpool Eigen3::Eigen
 online_alter_log
 ${LIBWRAP} ${LIBCRYPT} ${CMAKE_DL_LIBS} ${CMAKE_THREAD_LIBS_INIT}
 ${SSL_LIBRARIES}
......
This diff is collapsed.