nexedi / gitlab-ce
Commit 75445e75, authored Mar 13, 2020 by Matthias Kaeppler
Document `bulk_upsert!` and `skip_duplicates`
Also add a note about `jsonb` columns.
parent 26dfc31b

Showing 1 changed file with 45 additions and 5 deletions

doc/development/insert_into_tables_in_batches.md (+45 -5)
@@ -32,12 +32,12 @@ The `BulkInsertSafe` concern has two functions:

 - It performs checks against your model class to ensure that it does not use ActiveRecord
   APIs that are not safe to use with respect to bulk insertions (more on that below).
-- It adds a new class method `bulk_insert!`, which you can use to insert many records at once.
+- It adds new class methods `bulk_insert!` and `bulk_upsert!`, which you can use to insert many records at once.

-## Insert records via `bulk_insert!`
+## Insert records with `bulk_insert!` and `bulk_upsert!`

-If the target class passes the checks performed by `BulkInsertSafe`, you can proceed to use
-the `bulk_insert!` class method as follows:
+If the target class passes the checks performed by `BulkInsertSafe`, you can insert an array of
+ActiveRecord model objects as follows:

 ```ruby
 records = [MyModel.new, ...]
@@ -45,6 +45,28 @@ records = [MyModel.new, ...]
...
@@ -45,6 +45,28 @@ records = [MyModel.new, ...]
MyModel
.
bulk_insert!
(
records
)
MyModel
.
bulk_insert!
(
records
)
```
```
Note that calls to
`bulk_insert!`
will always attempt to insert _new records_. If instead
you would like to replace existing records with new values, while still inserting those
that do not already exist, then you can use
`bulk_upsert!`
:
```
ruby
records
=
[
MyModel
.
new
,
existing_model
,
...
]
MyModel
.
bulk_upsert!
(
records
,
unique_by:
[
:name
])
```
In this example,
`unique_by`
specifies the columns by which records are considered to be
unique and as such will be updated if they existed prior to insertion. For example, if
`existing_model`
has a
`name`
attribute, and if a record with the same
`name`
value already
exists, its fields will be updated with those of
`existing_model`
.
The
`unique_by`
parameter can also be passed as a
`Symbol`
, in which case it specifies
a database index by which a column is considered unique:
```
ruby
MyModel
.
bulk_insert!
(
records
,
unique_by: :index_on_name
)
```
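The `unique_by` semantics added in this hunk can be illustrated with a minimal in-memory sketch in plain Ruby. This is not how `bulk_upsert!` is implemented; `toy_bulk_upsert` and the array-of-hashes "table" are hypothetical stand-ins that only mimic the documented outcome, namely that rows matching on the `unique_by` columns are overwritten and all other records are inserted.

```ruby
# Toy in-memory stand-in for a database table; NOT GitLab's implementation.
# `toy_bulk_upsert` is a hypothetical helper that mimics the documented
# effect of `unique_by` using plain hashes.
def toy_bulk_upsert(table, records, unique_by:)
  records.each do |record|
    key = unique_by.map { |column| record.fetch(column) }
    existing = table.find { |row| unique_by.map { |column| row[column] } == key }

    if existing
      existing.merge!(record) # conflict: overwrite fields with the new values
    else
      table << record.dup     # no conflict: insert as a new row
    end
  end
  table
end

table = [{ name: 'alice', role: 'guest' }]
records = [
  { name: 'alice', role: 'maintainer' }, # same name: updates the existing row
  { name: 'bob', role: 'developer' }     # new name: inserted
]

toy_bulk_upsert(table, records, unique_by: [:name])
# table now holds two rows, and the 'alice' row has role 'maintainer'
```

In a real database the conflict check would be delegated to a unique index rather than a linear scan, which is presumably why the documentation ties `unique_by` to columns or an index name.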
 ### Record validation

 The `bulk_insert!` method guarantees that `records` will be inserted transactionally, and
@@ -74,6 +96,23 @@ Since this will also affect the number of `INSERT`s that occur, make sure you me
 performance impact this might have on your code. There is a trade-off between the number of
 `INSERT` statements the database has to process and the size and cost of each `INSERT`.

+### Handling duplicate records
+
+NOTE: **Note:**
+This parameter applies only to `bulk_insert!`. If you intend to update existing
+records, use `bulk_upsert!` instead.
+
+It may happen that some records you are trying to insert already exist, which would result in
+primary key conflicts. There are two ways to address this problem: failing fast by raising an
+error or skipping duplicate records. The default behavior of `bulk_insert!` is to fail fast
+and raise an `ActiveRecord::RecordNotUnique` error.
+
+If this is undesirable, you can instead skip duplicate records with the `skip_duplicates` flag:
+
+```ruby
+MyModel.bulk_insert!(records, skip_duplicates: true)
+```
+
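The two strategies described in this hunk, failing fast versus skipping duplicates, can be contrasted with a small plain-Ruby sketch. Everything here is an assumption made for illustration: `toy_bulk_insert` is a hypothetical helper, `ToyRecordNotUnique` stands in for `ActiveRecord::RecordNotUnique`, and a hash keyed by `id` plays the role of a table with a primary key. It models the documented behavior only, not the actual implementation.

```ruby
# Hypothetical error class standing in for ActiveRecord::RecordNotUnique.
class ToyRecordNotUnique < StandardError; end

# Toy model of the two documented strategies; NOT GitLab's implementation.
# `table` is a Hash keyed by primary key.
def toy_bulk_insert(table, records, skip_duplicates: false)
  staged = table.dup # work on a copy so a failure leaves `table` untouched
  records.each do |record|
    if staged.key?(record[:id])
      next if skip_duplicates # silently drop the conflicting record

      raise ToyRecordNotUnique, "duplicate primary key #{record[:id]}"
    end
    staged[record[:id]] = record
  end
  table.replace(staged)
end

table = { 1 => { id: 1, name: 'alice' } }
records = [{ id: 1, name: 'alice v2' }, { id: 2, name: 'bob' }]

begin
  toy_bulk_insert(table, records) # default: fail fast on the conflict
rescue ToyRecordNotUnique => e
  puts "failed fast: #{e.message}"
end

toy_bulk_insert(table, records, skip_duplicates: true)
# table now has ids 1 and 2; row 1 keeps its original name
```

Note that skipping leaves the existing row unchanged, which matches the note above that updating existing records calls for `bulk_upsert!` instead.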
 ### Requirements for safe bulk insertions

 Large parts of ActiveRecord's persistence API are built around the notion of callbacks. Many
@@ -145,11 +184,12 @@ simply be treated as if you had invoked `save` from outside the block.

 There are a few restrictions to how these APIs can be used:

-- Bulk inserts only work for new records; `UPDATE`s or "upserts" are not supported yet.
 - `ON CONFLICT` behavior cannot currently be configured; an error will be raised on primary key conflicts.
 - `BulkInsertableAssociations` furthermore has the following restrictions:
   - only compatible with `has_many` relations.
   - does not support `has_many through: ...` relations.
+- Writing [`jsonb`](https://www.postgresql.org/docs/current/datatype-json.html) content is [not currently supported](https://gitlab.com/gitlab-org/gitlab/-/issues/210560).

 Moreover, input data should either be limited to around 1000 records at most,
 or already batched prior to calling bulk insert. The `INSERT` statement will run in a single
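The guidance in the last context lines, to keep input at around 1000 records or to batch it beforehand, could look roughly like the following on the caller's side. This is a sketch under stated assumptions: `insert_batch` is a hypothetical stand-in for a call such as `MyModel.bulk_insert!(batch)`, and the batch size of 1000 simply follows the figure quoted above.

```ruby
batch_size = 1000 # follows the "around 1000 records" guidance quoted above

# Hypothetical stand-in for a call such as `MyModel.bulk_insert!(batch)`;
# it only records how many rows each call would receive.
def insert_batch(batch, log)
  log << batch.size
end

rows = Array.new(2500) { |i| { name: "record-#{i}" } }
batch_sizes = []

# Enumerable#each_slice yields consecutive chunks of at most `batch_size` rows.
rows.each_slice(batch_size) { |batch| insert_batch(batch, batch_sizes) }
# batch_sizes is now [1000, 1000, 500]
```

Whether separate calls share a transaction depends on the caller; this sketch does not model that.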