Very Slow Derby Queries

13 05 2011

Here’s a problem for anyone reading this.

I have a very slow derby query, and I am looking for alternatives. The query is of the form

select TR.rid from (
select s.rid, ROW_NUMBER() OVER () AS R from (
select distinct a.rid
from cn_css a , cn_css a0
where ( a0.cid = ? and a0.v = ? and a0.rid = a.rid ) AND 1 = 1 ) as s
)  as TR
where TR.R > 0 and TR.R <= 25+0


The table cn_css contains key value pairs in cid and v, and references row ids rid,  (ie columns rid, cid, v) so the inner
select should find a distinct set of row ids, potentially sorted (although not in this example). The nest outer query numbers the results and the most outer query selects the first 25 results.

I have a feeling the select distinct is a bad idea as it will sort on disk, but at the moment I cant find an alternative. Still looking, will post back here if I find one.

Advertisements

Actions

Information

4 responses

16 05 2011
James Stansell

Hi Ian,

I used to get this complicated, and a lot more, with SQL, but that was years ago now. Still, this is an intriguing question.

The row_number() function you are using is a part of a whole class of “analytic” or “windowing” functions. A few years back Derby had only minimal support, but if that has been fleshed out now then those functions might be a way to remove the distinct and maybe even the self join.

Also, I believe derby supports the SELECT TOP … N syntax. Will that work to get the first 25 rows for you?

Best regards,

-james.

17 05 2011
Ian

@James,
Top will help for most cases, we will still need to pull a few pages of results the more pages the user goes into the result set, but that may well be better than the cost of row_number().
There is a possibility that removing the a0 table in the distinct select might cause the query to execute the distinct not using a sort operation. (ie using a hash).
Thanks for the pointers.

17 05 2011
James Stansell

@ian,

A quick search: 1) reminded me that derby uses the OLAP term for these functions; 2) suggests that the support is still mostly missing.

http://comments.gmane.org/gmane.comp.apache.db.derby.user/13508

I hope your other ideas give better results!

-james.

22 05 2011
Andrew

Hi Ian, correct me if I’m wrong… I didn’t test this. But I believe this query is equivalent and should run faster:
select TR.rid from (
select distinct a.rid, ROW_NUMBER() OVER () AS R from cn_css a where a.cid = ? and a.v = ?
) as TR where TR.R > 0 and TR.R <= 25




%d bloggers like this: