Wednesday, July 23, 2008

NOT IN (Vs) NOT EXISTS

“NOT IN” and “NOT EXISTS” clauses are not the same functionally or performance wise and, therefore, should be used appropriately. NOT IN is different then the NOT EXISTS. But both the query return same amount of records, when the sub query we use in the NOT IN does not contain any null values. NOT IN does not have the ability to compare the NULL values.

The below example is tested in oracle9i release 9.2.0.6.0.

Here is the example for NOT IN versus NOT EXISTS.

We create a table employee with 10 employees. 9 employees out of 10, they are reporting to one manager called Jim. The employee Jim is head of the team and he is not reporting to any one.

Here is the table and content.

SQL> CREATE TABLE employee
2 (
3 EMPNO NUMBER(10) NOT NULL PRIMARY KEY,
4 EMPNAME VARCHAR2(20 CHAR),
5 MGR NUMBER(10) NULL
6 )
7 /

Table created.

SQL>
SQL> INSERT INTO EMPLOYEE VALUES (1, 'TAN', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (2, 'TOMMY', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (3, 'SCOTT', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (4, 'JOE', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (5, 'JIM', NULL);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (6, 'LEENA', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (7, 'TEENA', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (8, 'HACKEN', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (9, 'JEESE', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (10, 'TURNER', 5);

1 row created.

SQL> COMMIT
2 /

Commit complete.

SQL>

Now the goal is to list all the employees who are not managers...

Use NOT IN

SQL> select count(*) from employee where empno not in ( select mgr from employee );

COUNT(*)
----------
0

SQL>
So above query shows that, all the employees are managers.. It is not correct.

Use NOT EXISTS

SQL> select count(*) from employee T1
2 where not exists (select 1 from employee T2 where t2.mgr = t1.empno)
3 /

COUNT(*)
----------
9

SQL>

Now the above query returns 9 employees and it make sense.. So when we have NULL, we need to make sure, we use extra condition in the subquery where clause as below...

SQL> select count(*) from employee where empno not in
2 (select mgr from employee where mgr is not null)
3 /

COUNT(*)
----------
9
SQL>

Performance between NOT IN & NOT EXISTS

Based on my experience, I would recommend to try both (NOT IN & NOT EXISTS) and see which one gives better performance and take the final decision. In my experience, there are cases, performance is very good when i use NOT EXISTS.

Here is one real time example... In this example, import_yi_b table has 80580 records. sap_sohead table has 288382 records. sap_sohead.sonum is NOT NULL field and it is indexed. This two table has recent statistics.

When i use NOT EXISTS, it takes only one second. Here it correlates the inner query with the outer query. The optimizer no longer needs to scan the whole index because now it is only searching for subset of the data, as defined by the correlation.

For the below query, based on the explain plan, it use the index range scan on SAP_SOHEAD table.

12:59:00 SQL> select count(sonum) from import_yi_b x
12:59:00 2 where not exists(select 1 from sap_sohead h where h.sonum = x.sonum);

COUNT(SONUM)
------------
76080

12:59:01 SQL>

When i use NOT IN , query runs for ever. It took one hour to complete this query. It is not correlated, so the whole index is scanned to retrieve possible values. The optimizer doesn't know which values i am interested in.

For the below query, based on the explain plan, it use the index full scan on SAP_SOHEAD table. This is the difference i found on explain plan between NOT IN & NOT EXISTS clause.

13:55:12 SQL> select count(sonum) from import_yi_b x
13:55:12 2 where x.sonum not in
13:55:12 3 (select h.sonum from sap_sohead h);

COUNT(SONUM)
------------
76080

14:55:23 SQL>

So the same query, it took one minute for NOT EXISTS clause, took one hour for NOT IN clause. It is huge performance improvement when i replace the NOT EXISTS in the place of NOT IN.

Conclusion : NOT IN & NOT EXISTS are not substitute for each other. But NOT IN & NOT EXISTS produce the same result when subquery we use in the NOT IN does not contain NULLS. So when we use NOT IN, we need to ensure that, either the columns being selected must have NOT NULL constraints or there must be predicates in the WHERE clause of the subquery to ensure there are no nulls.

No comments: