Eager Loading
It is a feature provided by ActiveRecord which helps you to reduce number of queries you make to the database. It loads all the records that are to be used below before you write actual logic to prevent multiple unnecessary database hits.
Advantages:-
- faster response time
- lower DB server usage, better throughput
- reduced server cost to the company
preload method
This method executes one extra query per association.
Its like includes and
- supports
array,hashand,nested-hashas parameters - executes 1 query per association
Unline includes method
- it does not support
whereclause - it does not switch to
eager_loadand does not make join tables
Example - preload
1
Customer.preload(orders: { books: [:supplier, :author] }).find(1)
Output:-
1
2
3
4
5
6
Customer Load (0.0 ms) SELECT "customers".* FROM "customers" WHERE "customers"."id" = ? LIMIT ? [["id", 1], ["LIMIT", 1]]
Order Load (0.1 ms) SELECT "orders".* FROM "orders" WHERE "orders"."customer_id" = ? [["customer_id", 1]]
HABTM_Books Load (0.0 ms) SELECT "books_orders".* FROM "books_orders" WHERE "books_orders"."order_id" IN (?, ?) [["order_id", 1], ["order_id", 2]]
Book Load (0.0 ms) SELECT "books".* FROM "books" WHERE "books"."id" IN (?, ?, ?) [["id", 1], ["id", 2], ["id", 3]]
Supplier Load (0.0 ms) SELECT "suppliers".* FROM "suppliers" WHERE "suppliers"."id" IN (?, ?) [["id", 1], ["id", 2]]
Author Load (0.0 ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" IN (?, ?) [["id", 1], ["id", 2]]
Pros - preload
Advantages of using preload are
- its best for long chain of associations
- because it does not join tables
- it best when tables are huge and have large number of columns
- as it does not use
DISTINCTandJOINclause, its response time would be smaller - avoids memory bloating
Cons - preload
Disadvantages
- executes multiple queries β 1 per association ( 1 per HABTM table)
- does not support conditions using
whereso avoid using it when you need to filter rows on associated tables.
eager_load method
It will prepare a single query if no condition is attached. It is slower and less efficient because it joins all the
tables including all the columns. It will always prepare a LEFT OUTER JOIN based query.
Cons of this method are
- it always prepares 2 join table and 2 queries when supplied with a condition
- slower than
preloadmethod
Example 1 - without condition
1
Customer.eager_load(:orders).each { |c| puts c.orders.size }
Output:-
1
Customer Eager Load (0.0 ms) SELECT "customers"."id" AS t0_r0, "customers". "name" AS t0_r1, "customers". "created_at" AS t0_r2, "customers". "updated_at" AS t0_r3, "orders". "id" AS t1_r0, "orders". "customer_id" AS t1_r1, "orders". "number" AS t1_r2, "orders". "created_at" AS t1_r3, "orders". "updated_at" AS t1_r4 FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id"
As, you can see it executes only one query.
Example 2 - with condition
1
Customer.eager_load(:orders).limit(10).each { |c| puts c.orders.size }
Output:-
1
2
3
SQL (0.2 ms) SELECT DISTINCT "customers"."id" FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id" LIMIT ? [["LIMIT", 10]]
Customer Eager Load (0.1 ms) SELECT "customers"."id" AS t0_r0, "customers". "name" AS t0_r1, "customers". "created_at" AS t0_r2, "customers". "updated_at" AS t0_r3, "orders". "id" AS t1_r0, "orders". "customer_id" AS t1_r1, "orders". "number" AS t1_r2, "orders". "created_at" AS t1_r3, "orders". "updated_at" AS t1_r4 FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id" WHERE "customers"."id" IN (?, ?, ?) [["id", 1], ["id", 2], ["id", 3]]
Keep in mind that even limit(10) is a condition and triggered 2 queries.
Example 3 - Nested Params & Nested where
1
2
cs = Customer.eager_load(orders: { books: [:supplier, :author] }).where(orders: { number: 'ORD-001' }).limit(10)
cs.each { |c| puts c.orders.size }
This result to 2 queries with join tables
Pros - eager_load
Advantages:-
- loads eveything in one query when no condition provided
- give you advantage of filtering records before-hand in maximum of 2 queries
Cons - eager_load
Disadvantages:-
- always creates large join tables β which is expensive
- results in duplicate rows so uses
DISTINCTβ which is an expensive operation - can become very slow in deeply nested associations
includes method
To eager load associated records earlier before making nested attribute calls, you can use includes method in model,
model-collection or association object.
includes tries to reduce number of queries as possible.
Example 1 - Simple include
1
2
customers = Customer.includes(:orders).limit(10)
customers.each { |c| puts c.orders.number }
this will just make 2 queries.
1
2
Customer Load (0.2 ms) SELECT "customers".* FROM "customers" LIMIT ? [["LIMIT", 10]]
Order Load (0.1 ms) SELECT "orders".* FROM "orders" WHERE "orders"."customer_id" IN (?, ?, ?) [["customer_id", 1], ["customer_id", 2], ["customer_id", 3]]
Example 2 - nested include with condition
1
Customer.includes(orders: { books: [:supplier, :author] }).find(1)
This will make 6 separate queries. It makes one query per association. Keep in mind that find(1) is also a condition.
Wait, but why 6 queries?
1 extra query for the join table books_orders which was defined by has_and_belongs_to_many relation.
1
2
3
4
5
6
Customer Load (0.0ms) SELECT "customers".* FROM "customers" WHERE "customers"."id" = ? LIMIT ? [["id", 1], ["LIMIT", 1]]
Order Load (0.1ms) SELECT "orders".* FROM "orders" WHERE "orders"."customer_id" = ? [["customer_id", 1]]
HABTM_Books Load (0.2ms) SELECT "books_orders".* FROM "books_orders" WHERE "books_orders"."order_id" IN (?, ?) [["order_id", 1], ["order_id", 2]]
Book Load (0.1ms) SELECT "books".* FROM "books" WHERE "books"."id" IN (?, ?, ?) [["id", 1], ["id", 2], ["id", 3]]
Supplier Load (0.0ms) SELECT "suppliers".* FROM "suppliers" WHERE "suppliers"."id" IN (?, ?) [["id", 1], ["id", 2]]
Author Load (0.0ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" IN (?, ?) [["id", 1], ["id", 2]]
Example 3 - using where clause
Chaining where method, you can make conditional eager loading. This is useful when you are eventually going to make
where queries below to find specific records. Eager loading conditionally will resolve the complexity and boost
performance.
Using where clause, you will force it to use eager_load method internally and it will make 2 queries. One to get the
valid owner records(in this case Customer with condition) and, another query to ger records from the big join table it
will make.
You can make where query either using
- Structured query:
hash,arrayornested hashparams - string
SQLfragments
Structured where Query
It is simply supported
1
2
3
4
5
6
7
8
cs = Customer.includes(orders: { books: [:supplier, :author] }).where(orders: { number: 'ORD-001' }).limit(10)
# rails does not execute the query till there comes a enforcing method call in the chain
# like `first`, `all`, `last`, ....
cs.each { |c| puts c.orders.size }
# now it executes the query
Because of the conditions, instead of making separate where queries like in examples above, it will make
left outer join query. So number of queries will be just 2. First query will satisfy the condition and second query
will fetch data based on results of the first query.
1
2
3
SQL (0.3ms) SELECT DISTINCT "customers"."id" FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id" LEFT OUTER JOIN "books_orders" ON "books_orders"."order_id" = "orders"."id" LEFT OUTER JOIN "books" ON "books"."id" = "books_orders"."book_id" LEFT OUTER JOIN "suppliers" ON "suppliers"."id" = "books"."supplier_id" LEFT OUTER JOIN "authors" ON "authors"."id" = "books"."author_id" WHERE "orders"."number" = ? LIMIT ? [["number", "ORD-001"], ["LIMIT", 10]]
Customer Eager Load (0.1ms) SELECT "customers"."id" AS t0_r0, "customers"."name" AS t0_r1, "customers"."created_at" AS t0_r2, "customers"."updated_at" AS t0_r3, "orders"."id" AS t1_r0, "orders"."customer_id" AS t1_r1, "orders"."number" AS t1_r2, "orders"."created_at" AS t1_r3, "orders"."updated_at" AS t1_r4, "books"."id" AS t2_r0, "books"."title" AS t2_r1, "books"."supplier_id" AS t2_r2, "books"."author_id" AS t2_r3, "books"."created_at" AS t2_r4, "books"."updated_at" AS t2_r5, "suppliers"."id" AS t3_r0, "suppliers"."name" AS t3_r1, "suppliers"."created_at" AS t3_r2, "suppliers"."updated_at" AS t3_r3, "authors"."id" AS t4_r0, "authors"."name" AS t4_r1, "authors"."created_at" AS t4_r2, "authors"."updated_at" AS t4_r3 FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id" LEFT OUTER JOIN "books_orders" ON "books_orders"."order_id" = "orders"."id" LEFT OUTER JOIN "books" ON "books"."id" = "books_orders"."book_id" LEFT OUTER JOIN "suppliers" ON "suppliers"."id" = "books"."supplier_id" LEFT OUTER JOIN "authors" ON "authors"."id" = "books"."author_id" WHERE "orders"."number" = ? AND "customers"."id" = ? [["number", "ORD-001"], ["id", 1]]
Unstructured where query - String SQL Fragments
1
2
cs = Customer.includes(orders: { books: [:supplier, :author] }).where("orders.name = 'ORD-001'").limit(10)
cs.each { |c| puts c.orders.size }
this will result in error
1
2
no such column: orders.name: (SQLite3::SQLException)
SELECT "customers".* FROM "customers" WHERE (orders.number = 'ORD-001') LIMIT ?
What happens here is, Rails does not know whether you are referring to the orders table’s name column or
orders.name is a column name.
So, you have to explicitly specify that you are referencing a table orders using references method.
1
2
cs = Customer.includes(orders: { books: [:supplier, :author] }).where("orders.name = 'ORD-001'").references(:orders).limit(10)
cs.each { |c| puts c.orders.size }
Pros - includes
Advantages of includes are
- It auto switches between
preloadandeager_loadmethod internally whenever it seeswhereclause attached to the chain.whereβeager_load- no
whereβpreload
Which one is better & when to use?
Choosing between includes, preload and eager_load depends on what you are doing.
- if you need associated records but do not need filtering
- use
preload
- use
- if you need associated records and also you need filtering or sorting of associated records based on certain conditions
- use
eager_load
- use
Conclusion
Using includes seems to be the best option as it switches strategy based on your needs.
