Handling Has Many Relationships with Slick
Relational data is a normal thing to run into in web applications. And handling this data is often simple when dealing with single records or lists displaying a lot of content. But sometimes it can require a bit of thought to get our data in the form we want it to be.
For example, consider the Has Many relationship between two objects. A simple parent-child relationship is expressed cleanly in SQL when parsed in two queries:
SELECT id, datafield, andstuff FROM table1 WHERE id = 1; -- Returns one row SELECT id, otherdata, fId FROM table2 WHERE fId = 1; -- Returns multiple rows
The above are the type of queries that would be issued if your system did something like the following in pseudo code:
function getTable1Obj(id) d = doDataBaseLookUp(id) return mapToApplicationObj(d) function enrichObjectWithTable2Info(obj) list = doDataBaseLookUp(obj.id) obj.table2Stuff = list return obj
If you're viewing a single object, then this is pretty normal and what one might expect to see in a simple application. After all, 2 queries isn't that many. But what about when we want a list of the information?
SELECT id, datafield, andstuff FROM table1; -- Returns a bunch of rows SELECT id, otherdata, fId from [table2 WHERE fId = 1 SELECT id, otherdata, fId from table2 WHERE fId = 2 SELECT id, otherdata, fId from table2 WHERE fId = 3 SELECT id, otherdata, fId from table2 WHERE fId = 4 ...
The above queries would be the case if our application code was doing something like this in pseudo code:
for each object in table1: object = enrichObjectWithTable2Info(object) doStuffWithObject
What we've just ran into is called the N+1 problem. And it's pretty easy to get around if you know your databases. Specifically, the JOIN statement. We can pull back all the information with a single query in both cases:
-- Case 1 SELECT t1.id, t1.datafield, t1.andstuff, t2.id, t2.otherdata FROM t1 JOIN t2 ON t1.id=t2.fId WHERE t1.id = 1 -- Returns a bunch of rows -- Case 2 SELECT t1.id, t1.datafield, t1.andstuff, t2.id, t2.otherdata,t2.fId FROM t1 JOIN t2 ON t1.id=t2.fId -- Returns a bunch of rows
This is great performance wise, but now we also need to deal with the fact that our application has to handle multiple rows and not just a single one anymore. For each row in t2, we'll have duplicate content from t1 in the records returned:
| 1, "foo", "bar", 1, "boz", 1 | | 1, "foo", "bar", 2, "baz", 1 | | 1, "foo", "bar", 3, "bar", 1 | | 2, "fiz", "bar", 4, "boz", 2 |
If we're hoping to get a single Table1 object, with a field that has a list of Table2 objects, we'll need to aggregate the records in our code and assign them appropriately. To do this, we can turn to scala's great mapping functions. Let's assume we have a couple of case classes that slick is mapping to:
case class T2(id: Int, otherdata: String, fId: Int) case class T1(id: Int, datafield: String, andstuff: String) { var t2s : Option[List[T2]] = None }
And that we're retrieving our data with a simple join like this:
/* How you might query Slick for this: */ for { r <- table1 e <- table2 if e.fId === r.id } yield (r,e)
Since we're using yield (r,e)
we'll end up with tuples of data
that will look like this:
val results = List( (T1(1, "foo", "bar"), T2(1,"boz",1)), (T1(1, "foo", "bar"), T2(2,"baz",1)), (T1(1, "foo", "bar"), T2(3,"bar",1)), (T1(2, "fiz", "bar"), T2(4,"boz",2)) )
To map these into two objects (with id's 1 and 2) that have their
table2 rows listed in the t2s
field, we can use a couple functions
from scala's collection framework. First off, we can use the convenient
groupBy to break them down into a form like so:
results.groupBy(_._1) /* Whose type is: scala.collection.immutable.Map[T1,List[(T1, T2)]] = Map( T1(2,fiz,bar) -> List( (T1(2,fiz,bar),T2(4,boz,2)) ), T1(1,foo,bar) -> List( (T1(1,foo,bar),T2(1,boz,1)), (T1(1,foo,bar),T2(2,baz,1)), (T1(1,foo,bar),T2(3,bar,1)) ) ) */
This is already pretty close, all we really need to do is map over the
collection and return the T1
classes with the appropriate list:
results.groupBy(_._1).map { case (t1, tuples) => { t1.t2s = Some(tuples.map(_._2)) t1 } }
Whose type is immutable.Iterable[T1]
, we can convert this into a list
by calling toList
at the end, and then we'll be able to use the single
object along with its children easily:
> val t1s = results.groupBy(_._1).map { case (t1, tuples) => { t1.t2s = Some(tuples.map(_._2)) t1 } }.toList > t1s(1).t2s Option[List[T2]] = Some(List(T2(1,boz,1), T2(2,baz,1), T2(3,bar,1)))
There's not much to be said about this, we've grouped by the class T1 at
first. With case classes, this means that the comparator is based on the
items in the constructor (so t2s
does not affect comparing two T1
s)
and we'll end up with a list of Tuples for the value and a T1
for each
key. Mapping over this we can get both key and value of the map by using
the case (t1, tuples)
statement. Traversing the tuples and collecting
only the T2
s is simple to do with another map inside the case
statement.
You might be wondering, why are we using Option[List[T2]]
? This is so
we can implement performant look ups via a pattern I've started using
in my own work:
case class T1(id: Int) { var someChild : Option[List[Child]] = None def getChild() = { if(!someChild.isDefined) { Child.getById(this.id) } someChild } }
The benefit to this pattern is that you can retrieve a single object
from the database, then only get the children if you need them. (This
is the two query approach), but you can also use a join statement and
assign the children to the object (as we've done above) and then still
call getChild
and it will not go to the database.
This is beneficial because your client code for your view only ever calls
a single method getChild
and you don't have to worry about if your
data came from a join statement, or hasn't shown up yet. If you're not
careful you will end up with an N+1 problem, but so long as you are mapping
appropriately you won't hit it and can aggregate the joined information
onto the parent side of the relationship in one shot.
As a final tip, it can quickly become unreadable as you join more and more tables. Therefore it is wise to abstract your aggregation functions into seperate helper methods like so:
private def _aggChildrenFromJoinToParent(t1: Recipe, tuples: List[(T1, T2)]) = { t1.t2s = Some(tuples.map(_._2)) t1 } private def aggChildrenFromJoinToParent = (_aggChildrenFromJoinToParent _).tupled
And then you can use it like so:
val t1s = results.groupBy(_._1).map(aggChildrenFromJoinToParent).toList
This is more readable if you have a large amount of mapping or child objects. What's going on is you are "tupling" the function. Treating the function as a partially applied function and then using it in place of the case statement.