When working with large databases in production applications, writing safe and efficient database migration code is crucial to avoid downtime and performance issues. Here are some best practices to follow:
Adding Indexes to Large Tables
In Rails, migrations are wrapped in a transaction by default. While this is generally a good practice, it can lead to long-running transactions that lock tables and degrade performance when dealing with large datasets. To mitigate this, you can either disable the automatic transaction wrapping for a migration or manually commit transactions at strategic points.
execute("commit;") manually commits the transaction that Rails opened around the migration. This is particularly useful when you want to break up a long-running migration into smaller, more manageable parts: by committing early, changes are saved incrementally, reducing the risk of long locks on the database.
```ruby
class AddOrders < ActiveRecord::Migration[8.0]
  def up
    execute("commit;")
    add_column :orders, :shopify_order_id, :string, limit: 255
  end
end
```
disable_ddl_transaction! is called at the top of a migration class to disable the automatic transaction wrapping. This is useful for operations that cannot run inside a transaction, such as PostgreSQL's CREATE INDEX CONCURRENTLY, or when working with large tables where you want to avoid long locks.
Example:
```ruby
class AddIndexToAVeryLargeTable < ActiveRecord::Migration[8.0]
  disable_ddl_transaction!

  def up
    add_index :activities, [:action, :user_id], algorithm: :concurrently
  end

  def down
    remove_index :activities, [:action, :user_id], algorithm: :concurrently
  end
end
```
People also use disable_ddl_transaction! in migrations that perform large data backfills or schema changes that
would otherwise lock the table for an extended period.
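The key idea behind such backfills is to run many small, bounded statements instead of one giant one. A plain-Ruby sketch of the id-range batching (the table, column, and helper name here are hypothetical, for illustration only):

```ruby
# Sketch: generate bounded UPDATE statements for a backfill, one per id range.
# The table and column names are hypothetical; in a migration with
# disable_ddl_transaction!, each statement would be run via execute(...)
# and commit on its own, so locks are held only briefly.
def backfill_statements(min_id, max_id, batch_size)
  statements = []
  lower = min_id
  while lower <= max_id
    upper = [lower + batch_size - 1, max_id].min
    statements << "UPDATE orders SET status = 'archived' " \
                  "WHERE id BETWEEN #{lower} AND #{upper};"
    lower = upper + 1
  end
  statements
end
```

Because each statement touches at most batch_size rows, no single lock is held for the duration of the whole backfill.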
Using savepoints to manage long migrations
In DBMSs like PostgreSQL and MySQL, savepoints are markers within a transaction that let you roll back to a specific point without discarding the entire transaction. This is especially useful when dealing with long migrations on large databases.
Example of using savepoints in a migration:
```ruby
class LongMigration < ActiveRecord::Migration[6.0]
  # The up method is automatically wrapped in a transaction by Rails
  def up
    # Perform initial operations here
    # ...
    execute("SAVEPOINT before_long_operation;")
    begin
      # Perform the long-running operation here
      execute("UPDATE large_table SET column = 'new_value' WHERE condition;")
      execute("RELEASE SAVEPOINT before_long_operation;")
    rescue => e
      execute("ROLLBACK TO SAVEPOINT before_long_operation;")
      raise e
    end
  end
end
```
ActiveRecord::IrreversibleMigration In Migrations
When writing migrations, there may be cases where you want to indicate that a migration cannot be reversed. This is particularly important for large databases where rolling back a migration could lead to significant downtime or data loss.
Example:
```ruby
class AddLargeDataColumn < ActiveRecord::Migration[6.0]
  def up
    add_column :large_table, :new_column, :text
    # Perform data backfill or other operations here
    # ...
  end

  def down
    raise ActiveRecord::IrreversibleMigration, "This migration cannot be reversed."
  end
end
```
Adding a New Column With Default Value
When adding a new column to a large table, it’s best to avoid setting a default value directly in the migration. On older databases (PostgreSQL before version 11, or MySQL without instant DDL), adding a column with a default rewrites the whole table, locking it for an extended period and causing downtime.
Instead:
- add the column as nullable in one migration,
- then backfill the data in batches in another migration, and
- finally set the default value in a separate step.
Example:
```ruby
# Step 1: Add the column without a default
add_column :users, :role, :string

# Step 2: Update existing rows in batches
User.in_batches.update_all(role: 'member')

# Step 3: Add the default safely
change_column_default :users, :role, from: nil, to: 'member'
```
Always test migrations on a staging environment with a copy of the production database to ensure they run smoothly without causing performance issues.
Removing Columns Safely
When removing columns from large tables, ensure that the application code no longer references those columns. Dropping a column that is still in use can lead to application errors.
Dropping the column itself is a one-line migration:
```ruby
remove_column :users, :old_column
```
Active Record caches a table’s columns at runtime, so dropping a column can cause exceptions until your app reboots. Make sure the code is fully deployed and no longer references the column before dropping it.
Best Practices for Dropping Columns:
Deprecate the Column: First, remove references to the column in your application code. Deploy this change and ensure that the application is stable.
```ruby
class User < ApplicationRecord
  self.ignored_columns += ["some_column"]
end
```
Monitor Usage: Use logging or monitoring tools to ensure that the column is no longer being accessed.
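One low-tech way to verify this is to scan captured SQL logs for the column name before dropping it. A minimal sketch (the helper name and log format are assumptions; in production you would point this at your real query logs or an APM tool):

```ruby
# Sketch: check captured SQL log lines for references to a column that is
# about to be dropped. Returns true if any line still mentions the column.
def column_still_referenced?(log_lines, column_name)
  pattern = /\b#{Regexp.escape(column_name)}\b/
  log_lines.any? { |line| line.match?(pattern) }
end
```

If this keeps returning true after the ignored_columns deploy, some code path is still touching the column and it is not yet safe to drop.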
Drop the Column: Once you are confident that the column is no longer in use, you can safely drop it in a separate migration.
```ruby
class RemoveSomeColumnFromUsers < ActiveRecord::Migration[8.1]
  def change
    remove_column :users, :some_column
  end
end
```
Test Thoroughly: Always test the migration in a staging environment that mirrors production to ensure there are no issues.
Data Backfill in Batches
When backfilling data in a large table, it’s important to do so in small batches to avoid long-running transactions
that can lock the table and degrade performance. You can use the in_batches method provided by ActiveRecord
to process records in manageable chunks.
Example:
```ruby
User.where(role: nil).in_batches(of: 1000) do |batch|
  batch.update_all(role: 'member')
end
```
Using Shopify’s Maintenance Tasks
For very large datasets, consider using tools like Shopify’s Maintenance Tasks to handle data backfills and other long-running operations in the background without impacting the application’s performance.
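Maintenance Tasks follow a simple collection/process interface. Here is a sketch of a backfill task; the base-class stub and the plain-Ruby record stand-in below exist only so the snippet is self-contained, while in a Rails app you would inherit from the gem's real MaintenanceTasks::Task and return an ActiveRecord relation from collection:

```ruby
# Stand-in for the gem's base class so this sketch runs without Rails;
# the real "maintenance_tasks" gem provides MaintenanceTasks::Task.
module MaintenanceTasks
  class Task; end
end

# Plain-Ruby stand-in for an ActiveRecord model.
UserRecord = Struct.new(:id, :role)

class BackfillRoleTask < MaintenanceTasks::Task
  # In a real task this would return a relation, e.g. User.where(role: nil);
  # the gem iterates it in batches in the background.
  def collection
    [UserRecord.new(1, nil), UserRecord.new(2, nil)]
  end

  # Called once per record; the gem checkpoints progress so the task
  # can be paused, resumed, or throttled without redeploying.
  def process(user)
    user.role = "member"
  end
end
```

The gem runs process for each element of collection and persists progress, which is exactly what you want for backfills that take hours or days.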
With these best practices, you can write database migration code that is safe and efficient for production applications with large databases, minimizing downtime and ensuring smooth operations.
Safety Check For Migrations
You can install the strong_migrations gem to help identify potentially dangerous migrations before they are run in production. This gem provides warnings and suggestions for safer migration practices.
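A typical setup exempts migrations that already ran in production and caps how long statements may hold locks. A sketch of an initializer (the timestamp is a placeholder for your own cutoff):

```ruby
# config/initializers/strong_migrations.rb
# Skip safety checks for migrations that already ran in production.
StrongMigrations.start_after = 20240101000000

# Fail fast instead of holding locks for a long time (PostgreSQL).
StrongMigrations.lock_timeout = 10.seconds
StrongMigrations.statement_timeout = 1.hour
```

When the gem flags an operation you know is safe in your situation, you acknowledge it explicitly by wrapping it in safety_assured { ... } inside the migration.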
Important Links
- https://github.com/ankane/strong_migrations
- https://github.com/github/gh-ost/blob/master/doc/cheatsheet.md
- https://github.com/Shopify/maintenance_tasks
