Eager Loading

It is a feature provided by ActiveRecord which helps you to reduce number of queries you make to the database. It loads all the records that are to be used below before you write actual logic to prevent multiple unnecessary database hits.

Advantages:-

  • faster response time
  • lower DB server usage, better throughput
  • reduced server cost to the company

preload method

This method executes one extra query per association.

Its like includes and

  • supports array, hash and, nested-hash as parameters
  • executes 1 query per association

Unline includes method

  • it does not support where clause
  • it does not switch to eager_load and does not make join tables

Example - preload

1
Customer.preload(orders: { books: [:supplier, :author] }).find(1)

Output:-

1
2
3
4
5
6
Customer Load (0.0 ms) SELECT "customers".* FROM "customers" WHERE "customers"."id" = ? LIMIT ? [["id", 1], ["LIMIT", 1]]
Order Load (0.1 ms) SELECT "orders".* FROM "orders" WHERE "orders"."customer_id" = ? [["customer_id", 1]]
HABTM_Books Load (0.0 ms) SELECT "books_orders".* FROM "books_orders" WHERE "books_orders"."order_id" IN (?, ?) [["order_id", 1], ["order_id", 2]]
Book Load (0.0 ms) SELECT "books".* FROM "books" WHERE "books"."id" IN (?, ?, ?) [["id", 1], ["id", 2], ["id", 3]]
Supplier Load (0.0 ms) SELECT "suppliers".* FROM "suppliers" WHERE "suppliers"."id" IN (?, ?) [["id", 1], ["id", 2]]
Author Load (0.0 ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" IN (?, ?) [["id", 1], ["id", 2]]

Pros - preload

Advantages of using preload are

  • its best for long chain of associations
    • because it does not join tables
  • it best when tables are huge and have large number of columns
  • as it does not use DISTINCT and JOIN clause, its response time would be smaller
  • avoids memory bloating

Cons - preload

Disadvantages

  • executes multiple queries – 1 per association ( 1 per HABTM table)
  • does not support conditions using where so avoid using it when you need to filter rows on associated tables.

eager_load method

It will prepare a single query if no condition is attached. It is slower and less efficient because it joins all the tables including all the columns. It will always prepare a LEFT OUTER JOIN based query.

Cons of this method are

  • it always prepares 2 join table and 2 queries when supplied with a condition
  • slower than preload method

Example 1 - without condition

1
Customer.eager_load(:orders).each { |c| puts c.orders.size }

Output:-

1
Customer Eager Load (0.0 ms) SELECT "customers"."id" AS t0_r0, "customers". "name" AS t0_r1, "customers". "created_at" AS t0_r2, "customers". "updated_at" AS t0_r3, "orders". "id" AS t1_r0, "orders". "customer_id" AS t1_r1, "orders". "number" AS t1_r2, "orders". "created_at" AS t1_r3, "orders". "updated_at" AS t1_r4 FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id"

As, you can see it executes only one query.

Example 2 - with condition

1
Customer.eager_load(:orders).limit(10).each { |c| puts c.orders.size }

Output:-

1
2
3
SQL (0.2 ms) SELECT DISTINCT "customers"."id" FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id" LIMIT ? [["LIMIT", 10]]

Customer Eager Load (0.1 ms) SELECT "customers"."id" AS t0_r0, "customers". "name" AS t0_r1, "customers". "created_at" AS t0_r2, "customers". "updated_at" AS t0_r3, "orders". "id" AS t1_r0, "orders". "customer_id" AS t1_r1, "orders". "number" AS t1_r2, "orders". "created_at" AS t1_r3, "orders". "updated_at" AS t1_r4 FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id" WHERE "customers"."id" IN (?, ?, ?) [["id", 1], ["id", 2], ["id", 3]]

Keep in mind that even limit(10) is a condition and triggered 2 queries.

Example 3 - Nested Params & Nested where

1
2
cs = Customer.eager_load(orders: { books: [:supplier, :author] }).where(orders: { number: 'ORD-001' }).limit(10)
cs.each { |c| puts c.orders.size }

This result to 2 queries with join tables

Pros - eager_load

Advantages:-

  • loads eveything in one query when no condition provided
  • give you advantage of filtering records before-hand in maximum of 2 queries

Cons - eager_load

Disadvantages:-

  • always creates large join tables – which is expensive
  • results in duplicate rows so uses DISTINCT – which is an expensive operation
  • can become very slow in deeply nested associations

includes method

To eager load associated records earlier before making nested attribute calls, you can use includes method in model, model-collection or association object.

includes tries to reduce number of queries as possible.

Example 1 - Simple include

1
2
customers = Customer.includes(:orders).limit(10)
customers.each { |c| puts c.orders.number }

this will just make 2 queries.

1
2
Customer Load (0.2 ms) SELECT "customers".* FROM "customers" LIMIT ? [["LIMIT", 10]]
Order Load (0.1 ms) SELECT "orders".* FROM "orders" WHERE "orders"."customer_id" IN (?, ?, ?) [["customer_id", 1], ["customer_id", 2], ["customer_id", 3]]

Example 2 - nested include with condition

1
Customer.includes(orders: { books: [:supplier, :author] }).find(1)

This will make 6 separate queries. It makes one query per association. Keep in mind that find(1) is also a condition.

Wait, but why 6 queries? 1 extra query for the join table books_orders which was defined by has_and_belongs_to_many relation.

1
2
3
4
5
6
Customer Load (0.0ms)  SELECT "customers".* FROM "customers" WHERE "customers"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]
Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."customer_id" = ?  [["customer_id", 1]]
HABTM_Books Load (0.2ms)  SELECT "books_orders".* FROM "books_orders" WHERE "books_orders"."order_id" IN (?, ?)  [["order_id", 1], ["order_id", 2]]
Book Load (0.1ms)  SELECT "books".* FROM "books" WHERE "books"."id" IN (?, ?, ?)  [["id", 1], ["id", 2], ["id", 3]]
Supplier Load (0.0ms)  SELECT "suppliers".* FROM "suppliers" WHERE "suppliers"."id" IN (?, ?)  [["id", 1], ["id", 2]]
Author Load (0.0ms)  SELECT "authors".* FROM "authors" WHERE "authors"."id" IN (?, ?)  [["id", 1], ["id", 2]]

Example 3 - using where clause

Chaining where method, you can make conditional eager loading. This is useful when you are eventually going to make where queries below to find specific records. Eager loading conditionally will resolve the complexity and boost performance.

Using where clause, you will force it to use eager_load method internally and it will make 2 queries. One to get the valid owner records(in this case Customer with condition) and, another query to ger records from the big join table it will make.

You can make where query either using

  • Structured query: hash, array or nested hash params
  • string SQL fragments

Structured where Query

It is simply supported

1
2
3
4
5
6
7
8
cs = Customer.includes(orders: { books: [:supplier, :author] }).where(orders: { number: 'ORD-001' }).limit(10)

# rails does not execute the query till there comes a enforcing method call in the chain
#  like `first`, `all`, `last`, ....

cs.each { |c| puts c.orders.size }
# now it executes the query

Because of the conditions, instead of making separate where queries like in examples above, it will make left outer join query. So number of queries will be just 2. First query will satisfy the condition and second query will fetch data based on results of the first query.

1
2
3
SQL (0.3ms)  SELECT DISTINCT "customers"."id" FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id" LEFT OUTER JOIN "books_orders" ON "books_orders"."order_id" = "orders"."id" LEFT OUTER JOIN "books" ON "books"."id" = "books_orders"."book_id" LEFT OUTER JOIN "suppliers" ON "suppliers"."id" = "books"."supplier_id" LEFT OUTER JOIN "authors" ON "authors"."id" = "books"."author_id" WHERE "orders"."number" = ? LIMIT ?  [["number", "ORD-001"], ["LIMIT", 10]]

Customer Eager Load (0.1ms)  SELECT "customers"."id" AS t0_r0, "customers"."name" AS t0_r1, "customers"."created_at" AS t0_r2, "customers"."updated_at" AS t0_r3, "orders"."id" AS t1_r0, "orders"."customer_id" AS t1_r1, "orders"."number" AS t1_r2, "orders"."created_at" AS t1_r3, "orders"."updated_at" AS t1_r4, "books"."id" AS t2_r0, "books"."title" AS t2_r1, "books"."supplier_id" AS t2_r2, "books"."author_id" AS t2_r3, "books"."created_at" AS t2_r4, "books"."updated_at" AS t2_r5, "suppliers"."id" AS t3_r0, "suppliers"."name" AS t3_r1, "suppliers"."created_at" AS t3_r2, "suppliers"."updated_at" AS t3_r3, "authors"."id" AS t4_r0, "authors"."name" AS t4_r1, "authors"."created_at" AS t4_r2, "authors"."updated_at" AS t4_r3 FROM "customers" LEFT OUTER JOIN "orders" ON "orders"."customer_id" = "customers"."id" LEFT OUTER JOIN "books_orders" ON "books_orders"."order_id" = "orders"."id" LEFT OUTER JOIN "books" ON "books"."id" = "books_orders"."book_id" LEFT OUTER JOIN "suppliers" ON "suppliers"."id" = "books"."supplier_id" LEFT OUTER JOIN "authors" ON "authors"."id" = "books"."author_id" WHERE "orders"."number" = ? AND "customers"."id" = ?  [["number", "ORD-001"], ["id", 1]]

Unstructured where query - String SQL Fragments

1
2
cs = Customer.includes(orders: { books: [:supplier, :author] }).where("orders.name = 'ORD-001'").limit(10)
cs.each { |c| puts c.orders.size }

this will result in error

1
2
no such column: orders.name: (SQLite3::SQLException)
SELECT "customers".* FROM "customers" WHERE (orders.number = 'ORD-001') LIMIT ?

What happens here is, Rails does not know whether you are referring to the orders table’s name column or orders.name is a column name. So, you have to explicitly specify that you are referencing a table orders using references method.

1
2
cs = Customer.includes(orders: { books: [:supplier, :author] }).where("orders.name = 'ORD-001'").references(:orders).limit(10)
cs.each { |c| puts c.orders.size }

Pros - includes

Advantages of includes are

  • It auto switches between preload and eager_load method internally whenever it sees where clause attached to the chain.
    • where β†’ eager_load
    • no where β†’ preload

Which one is better & when to use?

Choosing between includes, preload and eager_load depends on what you are doing.

  • if you need associated records but do not need filtering
    • use preload
  • if you need associated records and also you need filtering or sorting of associated records based on certain conditions
    • use eager_load

Conclusion

Using includes seems to be the best option as it switches strategy based on your needs.