Here’s a problem for anyone reading this.
I have a very slow derby query, and I am looking for alternatives. The query is of the form
select TR.rid from (
select s.rid, ROW_NUMBER() OVER () AS R from (
select distinct a.rid
from cn_css a , cn_css a0
where ( a0.cid = ? and a0.v = ? and a0.rid = a.rid ) AND 1 = 1 ) as s
) as TR
where TR.R > 0 and TR.R <= 25+0
The table cn_css contains key value pairs in cid and v, and references row ids rid, (ie columns rid, cid, v) so the inner
select should find a distinct set of row ids, potentially sorted (although not in this example). The nest outer query numbers the results and the most outer query selects the first 25 results.
I have a feeling the select distinct is a bad idea as it will sort on disk, but at the moment I cant find an alternative. Still looking, will post back here if I find one.
Hi Ian,
I used to get this complicated, and a lot more, with SQL, but that was years ago now. Still, this is an intriguing question.
The row_number() function you are using is a part of a whole class of “analytic” or “windowing” functions. A few years back Derby had only minimal support, but if that has been fleshed out now then those functions might be a way to remove the distinct and maybe even the self join.
Also, I believe derby supports the SELECT TOP … N syntax. Will that work to get the first 25 rows for you?
Best regards,
-james.
@James,
Top will help for most cases, we will still need to pull a few pages of results the more pages the user goes into the result set, but that may well be better than the cost of row_number().
There is a possibility that removing the a0 table in the distinct select might cause the query to execute the distinct not using a sort operation. (ie using a hash).
Thanks for the pointers.
@ian,
A quick search: 1) reminded me that derby uses the OLAP term for these functions; 2) suggests that the support is still mostly missing.
http://comments.gmane.org/gmane.comp.apache.db.derby.user/13508
I hope your other ideas give better results!
-james.
Hi Ian, correct me if I’m wrong… I didn’t test this. But I believe this query is equivalent and should run faster:
select TR.rid from (
select distinct a.rid, ROW_NUMBER() OVER () AS R from cn_css a where a.cid = ? and a.v = ?
) as TR where TR.R > 0 and TR.R <= 25