Wednesday, July 23, 2008

Anti Join algorithm

This article was written in oracle9i release 9.2.0.6.0. The below test result may varry from Oracle one release to another release.

An “anti-join” between two tables returns rows from the first table where no matches are found in the second table. An anti-join is essentially the opposite of a semi-join: While a semi-join returns one copy of each row in the first table for which at least one match is found, an anti-join returns one copy of each row in the first table for which no match is found. Anti-joins are written using the NOT EXISTS or NOT IN constructs. These two constructs differ in how they handle nulls.

Suppose you have the DEPT and EMP tables in the SCOTT schema.

SQL> select count(*) from emp;

COUNT(*)
----------
458752
SQL> select count(*) from dept;

COUNT(*)
----------
5
SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'EMP
',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'DEP
T',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

Let us test the Anti Join

We want to list all the departments in department table where the department does not have any employees in employee table.

Here is one way to write the query.
SQL> select count(*) from(
2 SELECT D1.deptno
3 FROM dept D1
4 MINUS
5 SELECT D2.deptno
6 FROM dept D2, emp E2
7 WHERE D2.deptno = E2.deptno)
8 /

COUNT(*)
----------
2
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=2162 Card=1)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=2162 Card=5)
3 2 MINUS
4 3 SORT (UNIQUE NOSORT) (Cost=7 Card=5 Bytes=15)
5 4 INDEX (FULL SCAN) OF 'SYS_C0010206' (UNIQUE)
6 3 SORT (UNIQUE) (Cost=2155 Card=461910 Bytes=2771460)
7 6 NESTED LOOPS (Cost=161 Card=461910 Bytes=2771460)
8 7 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE)
(Cost=161 Card=461910 Bytes=1385730)
9 7 INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)

The above query will give the desired results, but it might be clearer to write the query using an anti-join:

SQL> SELECT COUNT(D.deptno)
2 FROM dept D
3 WHERE D.deptno NOT IN
4 (
5 SELECT E.deptno
6 FROM emp E
7 WHERE E.deptno IS NOT NULL
8 ) ;

COUNT(D.DEPTNO)
---------------
2
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=806 Card=1 Bytes=6)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (ANTI) (Cost=806 Card=2 Bytes=12)
3 2 INDEX (FULL SCAN) OF 'SYS_C0010206' (UNIQUE)
4 2 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost=161 Card=277146 Bytes=831438)

We can write the above query by using NOT EXISTS clause. The below query will also produce the same result as above.

SELECT d.deptno, d.dname
FROM dept D
WHERE NOT EXISTS (SELECT 1
FROM emp E WHERE E.deptno = D.deptno)
/

Here is another good candidate for an ANTI-JOIN access path

Suppose you want a list of customers who have not placed an order within the last ten days. You might start with a query that looks like:
SELECT C.short_name, C.customer_id
FROM customers C
WHERE NOT EXISTS(
SELECT 1
FROM orders O
WHERE O.customer_id = C.customer_id
AND O.order_date > SYSDATE - 10)
ORDER BY C.short_name;
/

SELECT C.short_name, C.customer_id
FROM customers C
WHERE C.customer_id NOT IN(
SELECT O.customer_id
FROM orders O
WHERE O.order_date > SYSDATE - 10
AND O.customer_id IS NOT NULL)
ORDER BY C.short_name
/

In Oracle 9i, an anti-join can be performed using the nested loops, merge, or hash join algorithms. As with a conventional join, Oracle will choose the join algorithm with the lowest cost. Oracle provides the NL_AJ, MERGE_AJ, and HASH_AJ hints in order for you to manipulate the anti-join process if you need to. As with the anti-join hints, the hint is applied to the subquery and not the main body of the query itself.

No comments: