Thursday, July 31, 2008

How do we reclaim the wasted space in a segment?

This article is tested in oracle 9.2.0.6.0. Before reclaiming the wasted space, First, we need to find out exactly how much space is wasted in that segment that can be reclaimed. To find out the wasted space in the segment, we can use SPACE_USAGE procedure in DBMS_SPACE pacakge. To use this procedure, the tablespace should be ASSM enabled.

I have used pipe symbol in the PLSQL procedure. But it is not visible in the blog. So i used ### symbol where ever, there is the pipe sign. So please replace the ### with pipe symbol in the code when you want to run this script in your test database.

DECLARE
v_unformatted_blocks number;
v_unformatted_bytes number;
v_fs1_blocks number;
v_fs1_bytes number;
v_fs2_blocks number;
v_fs2_bytes number;
v_fs3_blocks number;
v_fs3_bytes number;
v_fs4_blocks number;
v_fs4_bytes number;
v_full_blocks number;
v_full_bytes number;
BEGIN
dbms_space.space_usage ('INVENTADMIN',
'COLLECTION',
'TABLE',
v_unformatted_blocks,
v_unformatted_bytes,
v_fs1_blocks,
v_fs1_bytes,
v_fs2_blocks,
v_fs2_bytes,
v_fs3_blocks,
v_fs3_bytes,
v_fs4_blocks,
v_fs4_bytes,
v_full_blocks,
v_full_bytes);
dbms_output.put_line('Unformatted Blocks = '###v_unformatted_blocks);
dbms_output.put_line('FS1 Blocks = '###v_fs1_blocks);
dbms_output.put_line('FS2 Blocks = '###v_fs2_blocks);
dbms_output.put_line('FS3 Blocks = '###v_fs3_blocks);
dbms_output.put_line('FS4 Blocks = '###v_fs4_blocks);
dbms_output.put_line('Full Blocks = '###v_full_blocks);
end;
/

Here is the output for the above code.

Unformatted Blocks = 0
FS1 Blocks = 32
FS2 Blocks = 60
FS3 Blocks = 63
FS4 Blocks = 19671
Full Blocks = 5

PL/SQL procedure successfully completed.
SQL>
As per the above output
32 blocks has 0-25% free.
60 blocks has 25-50% free.
63 blocks has 50-75% free.
19671 blocks has 50-75% free.
5 blocks has 100% free

Now we have lot of blocks 50-75% free. so we can reclaim the space. We can reset the HWM to reorganize the space in the segment. Resetting the HWM is discussed in another topic in the same blog. Please click to read how to reset the HWM.

Saturday, July 26, 2008

Sending Emails through Oracle procedure

This procedure was tested in oracle9i. This procedure has ability to send the email TO/CC/BCC.

Before implementing this procedure, have the Email server host name handy. You can find the email server host name from your email software(lotus notes, outlook express etc). You can also check your admin or help desk to find out the email server host name.

There are only four step to implement this email procedure. Just one note, i have used pipe symbol in the plsql procedure. But it is not visible in the blog. So i used ### symbol where ever, there is the pipe sign. So please replace the ### with pipe symbol in the code.


Step 1

Create the table to store the email address.

CREATE TABLE EMAIL_ADDRESS(
EMAIL VARCHAR2(100),
EMAIL_GROUP VARCHAR(20))
/

We can store the emails on different group. In this example, we use DBA, ADMIN, BUSIENSS, MANAGERS. We can add these groups on TO, CC, BCC based on our requirements.

Here is the sample insert statement to store the email address.

insert into email_address values('scott@xyzcomp.com','DBA');
insert into email_address values('robert@xyzcomp.com','DBA');
insert into email_address values('mat@xyzcomp.com','ADMIN');
insert into email_address values('lissa@xyzcomp.com','ADMIN');
insert into email_address values('herman@xyzcomp.com','BUSINESS');
insert into email_address values('martin@xyzcomp.com','BUSINESS');
insert into email_address values('david@xyzcomp.com','MANAGER');
insert into email_address values('nany@xyzcomp.com','MANAGER');


Step2
create the array to store the email address

CREATE OR REPLACE type group_array as table of varchar2(255)
/

Step3

Here is the content of email package. Create this package..

-- The piple symbol is not displaying in the blog. So i placed ### symbol
-- whereever there is a pipe symbol. If we want to test the script in your test database,
-- please replace the ### symbol with pipe symbol.

create or replace PACKAGE EMAIL_PACKAGE IS

function address_email( p_string in varchar2,
p_recipients in group_array ) return varchar2;

procedure send( p_sender_email in varchar2,
p_from in varchar2,
p_to in group_array default group_array(),
p_cc in group_array default group_array(),
p_bcc in group_array default group_array(),
p_subject in varchar2,
p_body in long);

PROCEDURE LISTEMAILS(P_ARRAY_TO_GROUP IN group_array,
P_ARRAY_CC_GROUP IN group_array ,
P_ARRAY_BCC_GROUP IN group_array,
l_data_to OUT group_array,
l_data_cc OUT group_array,
l_data_bcc OUT group_array);

END EMAIL_PACKAGE;
/

create or replace package body EMAIL_PACKAGE as
g_crlf char(2) default chr(13)###chr(10);
g_mail_conn utl_smtp.connection;
g_mailhost varchar2(255) := 'EMAIL_HOST_NAME';

FUNCTION address_email( p_string in varchar2,
p_recipients in group_array ) return varchar2
is
l_recipients long;
begin
for i in 1 .. p_recipients.count
loop
utl_smtp.rcpt(g_mail_conn, p_recipients(i) );
if ( l_recipients is null )
then
l_recipients := p_string ### p_recipients(i) ;
else
l_recipients := l_recipients ### ', ' ### p_recipients(i);
end if;
end loop;
return l_recipients;
end;

procedure send( p_sender_email in varchar2,
p_from in varchar2,
p_to in group_array default group_array(),
p_cc in group_array default group_array(),
p_bcc in group_array default group_array(),
p_subject in varchar2,
p_body in long) IS
l_to_list long;
l_cc_list long;
l_bcc_list long;
l_date varchar2(255) default
to_char( SYSDATE, 'dd Mon yy hh24:mi:ss' );

procedure writeData( p_text in varchar2 )
as
begin
if ( p_text is not null )
then
utl_smtp.write_data( g_mail_conn, p_text ### g_crlf );
end if;
end;

BEGIN
g_mail_conn := utl_smtp.open_connection(g_mailhost, 25);
utl_smtp.helo(g_mail_conn, g_mailhost);
utl_smtp.mail(g_mail_conn, p_sender_email);
l_to_list := address_email( 'To: ', p_to );
l_cc_list := address_email( 'Cc: ', p_cc );
l_bcc_list := address_email( 'Bcc: ', p_bcc );
utl_smtp.open_data(g_mail_conn );

writeData( 'Date: ' ### l_date );
writeData( 'From: ' ### nvl( p_from, p_sender_email ) );
writeData( 'Subject: ' ### nvl( p_subject, '(no subject)' ) );

writeData( l_to_list );
writeData( l_cc_list );

utl_smtp.write_data( g_mail_conn, '' ### g_crlf );
utl_smtp.write_data(g_mail_conn, p_body );
utl_smtp.close_data(g_mail_conn );
utl_smtp.quit(g_mail_conn);
END send;

PROCEDURE LISTEMAILS(P_ARRAY_TO_GROUP IN group_array ,
P_ARRAY_CC_GROUP in group_array ,
P_ARRAY_BCC_GROUP in group_array ,
l_data_to out group_array ,
l_data_cc out group_array ,
l_data_bcc out group_array) as

CURSOR C1 IS SELECT * FROM EMAIL_ADDRESS;

BEGIN

l_data_to := group_array();
l_data_cc := group_array();
l_data_bcc := group_array();


FOR X IN 1..P_ARRAY_TO_GROUP.COUNT LOOP

FOR I IN C1 LOOP

IF I.EMAIL_GROUP = P_ARRAY_TO_GROUP(X) THEN
l_data_to.extend ;
l_data_to(l_data_to.count) := I.EMAIL;
END IF;
END LOOP;

END LOOP;


FOR X IN 1..P_ARRAY_CC_GROUP.COUNT LOOP

FOR I IN C1 LOOP

IF I.EMAIL_GROUP = P_ARRAY_CC_GROUP(X) THEN
l_data_cc.extend ;
l_data_cc(l_data_cc.count) := I.EMAIL;
END IF;
END LOOP;

END LOOP;

FOR X IN 1..P_ARRAY_BCC_GROUP.COUNT LOOP

FOR I IN C1 LOOP

IF I.EMAIL_GROUP = P_ARRAY_BCC_GROUP(X) THEN
l_data_bcc.extend ;
l_data_bcc(l_data_bcc.count) := I.EMAIL;
END IF;
END LOOP;

END LOOP;



END LISTEMAILS;
END EMAIL_PACKAGE;
/


Step4
Here is the sample unnamed plsql code to send the email.

-- The piple symbol is not displaying in the blog. So i placed ### symbol
-- whereever there is a pipe symbol. If we want to test the script in your test database,
-- please replace the ### symbol with pipe symbol.

declare
to_group group_array DEFAULT group_array();
cc_group group_array DEFAULT group_array();
bcc_group group_array DEFAULT group_array();
r_to_group group_array DEFAULT group_array();
r_cc_group group_array DEFAULT group_array();
r_bcc_group group_array DEFAULT group_array();
v_message VARCHAR2(22767) := 'test';
v_crlf VARCHAR2(2) := CHR(13) ### CHR(10);
v_time NUMBER;
BEGIN

to_group.extend;
to_group(1) := 'DBA';

to_group.extend;
to_group(2) := 'BUSINESS';

cc_group.extend;
cc_group(1) := 'MANAGER';

bcc_group.extend;
bcc_group(1) := 'DBA PAGER';

Email_Package.listemails(to_group,cc_group,bcc_group,r_to_group,r_cc_group,r_bcc_group);
Email_Package.send
( p_sender_email => 'john.doe@xyz.com' ,
p_from => 'IT' ,
p_to => r_to_group ,
p_cc => r_cc_group ,
p_bcc => r_bcc_group ,
p_subject => 'Testing messaage' ,
p_body => v_message);
END;
/

Wednesday, July 23, 2008

NOT IN (Vs) NOT EXISTS

“NOT IN” and “NOT EXISTS” clauses are not the same functionally or performance wise and, therefore, should be used appropriately. NOT IN is different then the NOT EXISTS. But both the query return same amount of records, when the sub query we use in the NOT IN does not contain any null values. NOT IN does not have the ability to compare the NULL values.

The below example is tested in oracle9i release 9.2.0.6.0.

Here is the example for NOT IN versus NOT EXISTS.

We create a table employee with 10 employees. 9 employees out of 10, they are reporting to one manager called Jim. The employee Jim is head of the team and he is not reporting to any one.

Here is the table and content.

SQL> CREATE TABLE employee
2 (
3 EMPNO NUMBER(10) NOT NULL PRIMARY KEY,
4 EMPNAME VARCHAR2(20 CHAR),
5 MGR NUMBER(10) NULL
6 )
7 /

Table created.

SQL>
SQL> INSERT INTO EMPLOYEE VALUES (1, 'TAN', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (2, 'TOMMY', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (3, 'SCOTT', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (4, 'JOE', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (5, 'JIM', NULL);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (6, 'LEENA', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (7, 'TEENA', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (8, 'HACKEN', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (9, 'JEESE', 5);

1 row created.

SQL> INSERT INTO EMPLOYEE VALUES (10, 'TURNER', 5);

1 row created.

SQL> COMMIT
2 /

Commit complete.

SQL>

Now the goal is to list all the employees who are not managers...

Use NOT IN

SQL> select count(*) from employee where empno not in ( select mgr from employee );

COUNT(*)
----------
0

SQL>
So above query shows that, all the employees are managers.. It is not correct.

Use NOT EXISTS

SQL> select count(*) from employee T1
2 where not exists (select 1 from employee T2 where t2.mgr = t1.empno)
3 /

COUNT(*)
----------
9

SQL>

Now the above query returns 9 employees and it make sense.. So when we have NULL, we need to make sure, we use extra condition in the subquery where clause as below...

SQL> select count(*) from employee where empno not in
2 (select mgr from employee where mgr is not null)
3 /

COUNT(*)
----------
9
SQL>

Performance between NOT IN & NOT EXISTS

Based on my experience, I would recommend to try both (NOT IN & NOT EXISTS) and see which one gives better performance and take the final decision. In my experience, there are cases, performance is very good when i use NOT EXISTS.

Here is one real time example... In this example, import_yi_b table has 80580 records. sap_sohead table has 288382 records. sap_sohead.sonum is NOT NULL field and it is indexed. This two table has recent statistics.

When i use NOT EXISTS, it takes only one second. Here it correlates the inner query with the outer query. The optimizer no longer needs to scan the whole index because now it is only searching for subset of the data, as defined by the correlation.

For the below query, based on the explain plan, it use the index range scan on SAP_SOHEAD table.

12:59:00 SQL> select count(sonum) from import_yi_b x
12:59:00 2 where not exists(select 1 from sap_sohead h where h.sonum = x.sonum);

COUNT(SONUM)
------------
76080

12:59:01 SQL>

When i use NOT IN , query runs for ever. It took one hour to complete this query. It is not correlated, so the whole index is scanned to retrieve possible values. The optimizer doesn't know which values i am interested in.

For the below query, based on the explain plan, it use the index full scan on SAP_SOHEAD table. This is the difference i found on explain plan between NOT IN & NOT EXISTS clause.

13:55:12 SQL> select count(sonum) from import_yi_b x
13:55:12 2 where x.sonum not in
13:55:12 3 (select h.sonum from sap_sohead h);

COUNT(SONUM)
------------
76080

14:55:23 SQL>

So the same query, it took one minute for NOT EXISTS clause, took one hour for NOT IN clause. It is huge performance improvement when i replace the NOT EXISTS in the place of NOT IN.

Conclusion : NOT IN & NOT EXISTS are not substitute for each other. But NOT IN & NOT EXISTS produce the same result when subquery we use in the NOT IN does not contain NULLS. So when we use NOT IN, we need to ensure that, either the columns being selected must have NOT NULL constraints or there must be predicates in the WHERE clause of the subquery to ensure there are no nulls.

Avoid defining maximum data type length in Oracle

One of my co-worker asked me this question... Why can't we define VARCHAR2(4000) for all char data type.. So that, we don't need to increase the length in future.. Since VARCHAR2 data type is variable length, it would not occupy the defined field width in database.

I got an answer when i googled this question and i would like to write the answer in this blog..

Here are the answers..

1. When users runs a query in query tool(say for example TOAD, SQL Navigator), it display the data based on the width of the column. It will be very hard to scroll to see the data from left to right and right to left.

2. When developer use the array fetch(Bulk binds, bulk collects) for better performance, oracle allocate the memory as we defined in the field length. Let us say, developers wanted to fetch 10 columns for 100 rows . So, 100 * 10 * 4000 => almost 4MB of RAM, oracle has to allocate to run this query. One connected session eats 4MB memory for this query. Imagine, if 5 people runs the same query at the same time, it will be 20MB RAM..

PLSQL Developers might use column%type to declare PLSQL variable. This might waste system memory badly if we define the max field length.

3. Developers build the data entry screen to load the data into the table. Let us say, Product_code is VARCHAR2(4000), Product_comment is VARCHAR2(4000). No one knows what sort of data can go into the database.

4. Oracle throw ORA-01450 maximum key length (string) exceeded error when combined length of all the columns specified in a CREATE INDEX statement exceeded the maximum index length. The maximum index length varies by operating system. If you are more interested in how the maximum index key length is computed, there is a document (Doc.Id 136158.1) on Metalink that explains this in details

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 – Production

SQL> create table test(text1 varchar2(4000),
2 text2 varchar2(4000),
3 text3 varchar2(4000),
4 text4 varchar2(4000));

Table created.

SQL> create index idx on test(text1,text2);
create index idx on test(text1,text2)
*
ERROR at line 1:
ORA-01450: maximum key length (6398) exceeded

SQL>

Conclusion :- We should not allocate maximum data length. We should use proper size for Primary key & foreign keys. We can always expand whenever we need it. Max size will hurt the application performance, because, it will mistakenly allocate more memory as we discussed in the seocnd point.

If any one wanted to read more on this, please read this link.

http://tkyte.blogspot.com/2006/11/see-you-just-cannot-make-this-stuff-up.html

When does oracle use Full Table Scan?

Full Table Scans reads all the blocks under the High Water Mark and filter out those that do not meet the selection criteria. Each row is examined to determine whether it satisfies the statement’s WHERE clause.

During full table scan, the blocks are read sequentially. Because the blocks are adjacent, I/O calls larger than a single block can be used to speed up the process. The size of the read calls range from one block to the number of blocks indicated by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT. Using multiblock reads means a full table scan can be performed very efficiently.

Full table scans are cheaper than index range scans when accessing a large fraction of the blocks in a table. This is because full table scans can use larger I/O calls, and making fewer large I/O calls is cheaper than making many smaller calls.

When the Optimizer Uses Full Table Scans

Lack of Index
If the query is unable to use any existing indexes, then it uses a full table scan. For example, if there is a function used on the indexed column in the query, the optimizer is unable to use the index and instead uses a full table scan as below

ScanSELECT last_name, first_name
FROM employees
WHERE UPPER(last_name) LIKE :b1

If you need to use the index for case-independent searches, then either do not permit mixed-case data in the search columns or create a function-based index, such as UPPER(last_name), on the search column.

Large Amount of Data
If the optimizer thinks that the query will access most of the blocks in the table, then it uses a full table scan, even though, the index is available. Oracle takes FTS if the data are spreaded in most of the blocks. Optimizer takes FTS if clustering factor is high.

Small Table
If a table contains less than DB_FILE_MULTIBLOCK_READ_COUNT blocks under the high water mark, which can be read in a single I/O call, then a full table scan might be cheaper than an index range scan, regardless of the fraction of tables being accessed or indexes present.

Full Table Scan Hints
Use the hint FULL(table alias) if you want to force the use of a full table scan. For more information on the FULL hint.

SELECT /*+ FULL(e) */ employee_id, last_name
FROM employees e
WHERE last_name LIKE :b1;

Anti Join algorithm

This article was written in oracle9i release 9.2.0.6.0. The below test result may varry from Oracle one release to another release.

An “anti-join” between two tables returns rows from the first table where no matches are found in the second table. An anti-join is essentially the opposite of a semi-join: While a semi-join returns one copy of each row in the first table for which at least one match is found, an anti-join returns one copy of each row in the first table for which no match is found. Anti-joins are written using the NOT EXISTS or NOT IN constructs. These two constructs differ in how they handle nulls.

Suppose you have the DEPT and EMP tables in the SCOTT schema.

SQL> select count(*) from emp;

COUNT(*)
----------
458752
SQL> select count(*) from dept;

COUNT(*)
----------
5
SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'EMP
',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'DEP
T',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

Let us test the Anti Join

We want to list all the departments in department table where the department does not have any employees in employee table.

Here is one way to write the query.
SQL> select count(*) from(
2 SELECT D1.deptno
3 FROM dept D1
4 MINUS
5 SELECT D2.deptno
6 FROM dept D2, emp E2
7 WHERE D2.deptno = E2.deptno)
8 /

COUNT(*)
----------
2
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=2162 Card=1)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=2162 Card=5)
3 2 MINUS
4 3 SORT (UNIQUE NOSORT) (Cost=7 Card=5 Bytes=15)
5 4 INDEX (FULL SCAN) OF 'SYS_C0010206' (UNIQUE)
6 3 SORT (UNIQUE) (Cost=2155 Card=461910 Bytes=2771460)
7 6 NESTED LOOPS (Cost=161 Card=461910 Bytes=2771460)
8 7 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE)
(Cost=161 Card=461910 Bytes=1385730)
9 7 INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)

The above query will give the desired results, but it might be clearer to write the query using an anti-join:

SQL> SELECT COUNT(D.deptno)
2 FROM dept D
3 WHERE D.deptno NOT IN
4 (
5 SELECT E.deptno
6 FROM emp E
7 WHERE E.deptno IS NOT NULL
8 ) ;

COUNT(D.DEPTNO)
---------------
2
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=806 Card=1 Bytes=6)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (ANTI) (Cost=806 Card=2 Bytes=12)
3 2 INDEX (FULL SCAN) OF 'SYS_C0010206' (UNIQUE)
4 2 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost=161 Card=277146 Bytes=831438)

We can write the above query by using NOT EXISTS clause. The below query will also produce the same result as above.

SELECT d.deptno, d.dname
FROM dept D
WHERE NOT EXISTS (SELECT 1
FROM emp E WHERE E.deptno = D.deptno)
/

Here is another good candidate for an ANTI-JOIN access path

Suppose you want a list of customers who have not placed an order within the last ten days. You might start with a query that looks like:
SELECT C.short_name, C.customer_id
FROM customers C
WHERE NOT EXISTS(
SELECT 1
FROM orders O
WHERE O.customer_id = C.customer_id
AND O.order_date > SYSDATE - 10)
ORDER BY C.short_name;
/

SELECT C.short_name, C.customer_id
FROM customers C
WHERE C.customer_id NOT IN(
SELECT O.customer_id
FROM orders O
WHERE O.order_date > SYSDATE - 10
AND O.customer_id IS NOT NULL)
ORDER BY C.short_name
/

In Oracle 9i, an anti-join can be performed using the nested loops, merge, or hash join algorithms. As with a conventional join, Oracle will choose the join algorithm with the lowest cost. Oracle provides the NL_AJ, MERGE_AJ, and HASH_AJ hints in order for you to manipulate the anti-join process if you need to. As with the anti-join hints, the hint is applied to the subquery and not the main body of the query itself.

Semi Join algorithm

This article was written in oracle9i release 9.2.0.6.0. The below test result may varry from Oracle one release to another release.

A “semi-join” between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned.

Suppose you have the DEPT and EMP tables in the SCOTT schema.

SQL> select count(*) from emp;

COUNT(*)
----------
458752
SQL> select count(*) from dept;

COUNT(*)
----------
4
SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'EMP
',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT',TABNAME => 'DEP
T',ESTIMATE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

Let us test the semi join.

We want a list all departments in dept table if department has atleast one employee in emp table.

Here is conventional way to write the query.

SELECT D.deptno, D.dname FROM dept D, emp E WHERE E.deptno = D.deptno ORDER BY D.deptno;

The problem here is, if the department has 100 employees, then department will appear 100 times. We can eliminate the duplicate records by using the DISTINCT clause. But again, oracle would do more work to get the output.

Here is total time it took to complete the query

SQL> declare
2 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
3 v_cnt number;
4 begin
5 SELECT count(distinct D.deptno) into v_cnt
6 FROM dept D, emp E
7 WHERE E.deptno = D.deptno
8 ORDER BY D.deptno;
9 dbms_output.put_line(to_char(ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60,3)));
10 end;
11 /
.009
Here is the execution plan for the query

SQL> SELECT count(distinct D.deptno)
2 FROM dept D, emp E
3 WHERE E.deptno = D.deptno
4 ORDER BY D.deptno;

COUNT(DISTINCTD.DEPTNO)
-----------------------
3
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=160 Card=1 Bytes
=6)

1 0 SORT (GROUP BY)
2 1 NESTED LOOPS (Cost=160 Card=455570 Bytes=2733420)
3 2 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost
=160 Card=455570 Bytes=1366710)

4 2 INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)

We can use semi join between the DEPT and EMP tables instead of a conventional join:

The below query will list the departments that have at least one employee. Whether a department has one employee or 100, the department will appear only once in the query output. Moreover, Oracle will move on to the next department as soon as it finds the first employee in a department, instead of finding all of the employees in each department.

Here is total time it took to complete the query

SQL> declare
2 l_start NUMBER DEFAULT DBMS_UTILITY.GET_TIME;
3 v_cnt number;
4 begin
5 SELECT count(D.deptno) into v_cnt
6 FROM dept D
7 WHERE EXISTS
8 (
9 SELECT 1
10 FROM emp E
11 WHERE E.deptno = D.deptno
12 )
13 ORDER BY D.deptno;
14 dbms_output.put_line(to_char(ROUND(ROUND((DBMS_UTILITY.GET_TIME - l_start)/100, 2)/60,3)));
15 end;
16 /
.001

Here is the execution plan

SQL> SELECT count(D.deptno)
2 FROM dept D
3 WHERE EXISTS
4 (
5 SELECT 1
6 FROM emp E
7 WHERE E.deptno = D.deptno
8 )
9 ORDER BY D.deptno;

COUNT(D.DEPTNO)
---------------
3

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=641 Card=1 Bytes =6)

1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (SEMI) (Cost=641 Card=3 Bytes=18)
3 2 INDEX (FULL SCAN) OF 'SYS_C0010206' (UNIQUE)
4 2 INDEX (FAST FULL SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost
=160 Card=341678 Bytes=1025034)

Here is the another good candidate for SEMI-JOIN algorithim.

Display list of gold-status customers who have placed an order within the last three days. You might start with a query that looks like:

SELECT C.short_name, C.customer_id
FROM customers C
WHERE C.customer_type = 'Gold'
AND EXISTS(
SELECT 1
FROM orders O
WHERE O.customer_id = C.customer_id
AND O.order_date > SYSDATE - 3)
ORDER BY C.short_name
/

Note : If the query contains a DISTINCT/UNION clause, then Oracle cannot transform EXISTS or IN clauses into something that could use a semi-join access path. Semi join can be used in Nested loop, Hash Join & Sort Merge. If semi join method is not possible, then optimizer use the conventional join with lowest cost. Oracle provides hints(NL_SJ, HASH_SJ, and MERGE_SJ) to enforce semi join algorithm. The hint is applied to the subquery of the EXISTS or IN clause, not the main body of the query itself.

Tuesday, July 22, 2008

IN (vs) EXISTS

At which situation, IN is better then EXISTS?

IN & EXISTS are processed in different way. But we can use IN in the place of EXISTS, also EXISTS can be used in the place of IN.



select * from emp where deptno in(select deptno from dept);

is typically processed as :

select * from emp, ( select distinct deptno from dept ) dept where emp.deptno = dept.deptno;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to the original table.

As opposed to

select * from emp where exists(select null from dept where deptno = emp.deptno)

That is processed more like:

for x in ( select * from emp ) loop

if ( exists ( select null from dept where deptno = emp.deptno )

then OUTPUT THE RECORD

end if

end loop

It always results in a full scan of EMP in EXISTS clause, whereas the first query can make use of an index on EMP.DEPTNO.


So, when is EXISTS appropriate and IN appropriate?

Lets say the result of the subquery is "huge" and takes a long time, outer query is small. Then IN is not good option. Because, it has to distinct the data. EXISTS can be quite efficient for this circumstances.

If the result of sub query is small, and then IN is good option. EXISTS is not good choice.

If both the subquery and the outer table are huge -- either might work as well as the other depends on the indexes and other factors.

When we tune the query, we can try with both IN & EXISTS and we can decide which one would be better...

If you need more info about this, please read the asktom article..

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:953229842074






How do we see what transaction is currently running on the database?

This queries are tested on oracle 9.2.0.8

Display all the statement which are available in memory?

Select * from v$sqlarea;

Note : If full SQL text is not shown in V$SQLAREA view, because it was larger than 1000 characters, this V$SQLTEXT view should be queried to extract the full SQL. So you need to use sql_text column in v$sql_text view instead of using sql_text column in v$sqlarea.

Display only currently running transactions in database?

The below query is used to find what is actively running(uncommitted transactions) in database. The column query_active_time is time between transaction starting time and commit time. For instance, the transaction started at 10AM and completed at 10.05AM. But user commits the transaction at 10.30PM. So the query_active_time will be 30 Seconds. V$transaction does not show any SELECT statement which are running in the database.

select
osuser,
username,
sid,
serial#,
program,
machine,
sql_text,
start_time,
cpu_time/1000000 cpu_time,
elapsed_time/1000000 run_time,
to_number(sysdate-to_date(c.start_time,'MM/DD/YY HH24:MI:SS'))*24*60 query_active_time
from
v$session a,
v$sqlarea b,
v$transaction c
where
((a.sql_address = b.address
and a.sql_hash_value = b.hash_value )
or ( a.prev_sql_addr = b.address
and a.prev_hash_value = b.hash_value))
and c.ses_addr = a.saddr
--and sysdate-to_date(v$transaction.start_time,'MM/DD/YY HH24:MI:SS') > 2/(60*24)


Display all the statement for only connected users?

This query will display all statement in memory only for connected sessions.

select
osuser,
username,
sid,
serial#,
program,
machine,
sql_text,
cpu_time/1000000 cpu_time,
elapsed_time/1000000 run_time
from
v$session a,
v$sqlarea b
where
((a.sql_address = b.address
and a.sql_hash_value = b.hash_value )
or ( a.prev_sql_addr = b.address
and a.prev_hash_value = b.hash_value))
and username is not null


Display all the statement with I/O statistics for connected session?

The below query provides all the transaction which are in memory with I/0 statistics. Sometime, user might be complaining that their query is Stuck. As a DBA, we can check v$sess_io view and see GETS, READS or CHANGES column continue to increase for a session, then you can tell them that the statement is not stuck.

select
a.sid,
a.username,
optimizer_mode,
hash_value,
address,
cpu_time/1000000,
elapsed_time/1000000,
sql_text,
a.sid,
block_gets,
consistent_gets,
physical_reads,
block_changes,
consistent_changes
from
v$session a,
v$sqlarea b,
v$sess_io c
where
a.sql_hash_value = b.hash_value
and a.sql_address = b.address
and a.sid = c.sid
and a.username is not null


Note : The I/0 statistics(consistent_gets, physical_reads, block_changes) continue to increase for each statement for the session. It would not show the statistics for individual statement.

Display all the long running statement in memory?

This query provides all the long transactions for connected/disconnected sessions.

Select * from v$session_longops;

Display all the long running completed transactions(connected/disconnected sessions)?

Select * from v$session_longops where time_remaining = 0

Display all the long running active transactions?

Select * from v$session_longops where time_remaining > 0

Display all the long running transactions for connected users with I/0 Statistics?

select
a.sid,
a.username,
optimizer_mode,
hash_value,
address,
cpu_time/1000000,
elapsed_time/1000000,
sql_text,
block_gets,
consistent_gets,
physical_reads,
block_changes,
consistent_changes,
d.time_remaining,
message
from
v$session a,
v$sqlarea b,
v$sess_io c,
v$session_longops d
where
a.sql_hash_value = b.hash_value
and a.sql_address = b.address
and a.sid = c.sid
and a.sql_hash_value = d.sql_hash_value
and a.sql_address = d.sql_address
and a.username is not null

How do we check which SQL is currently taking more CPU's in UNIX?
First we need to run the top command and top command will show the top CPU session.

Here is the top command partial output.

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
27486 oracle 19 0 603M 603M 601M S 16.9 5.2 35:12 1 oracle

27486 is process id which takes 16.9% CPU time. The process id 27486 is equivalent to v$process.spid in database. So we can join v$process, v$session and get the required information.

Here is the query to find out which query is currently taking high CPU.

select
a.sid,
a.username,
optimizer_mode,
hash_value,
address,
cpu_time/1000000,
elapsed_time/1000000,
sql_text,
a.sid,
block_gets,
consistent_gets,
physical_reads,
block_changes,
consistent_changes
from
v$session a,
v$sqlarea b,
v$sess_io c,
v$process d
wherea.sql_hash_value = b.hash_value
and a.sql_address = b.address
and a.sid = c.sid
and d.addr = a.paddr
and a.username is not null
and d.spid= 32479

Note : 32479 is taken from Unix TOP output.

Tuesday, July 15, 2008

CLOB update is very slow

This article is written in oracle9i release 9.2.0.6.0 I just want to share this issue with all...

we are updating CLOB column in one of the table. The update statement is running through Java code. It used to take 1 second to update one row. All of a sudden, it started taking 150 second. Immediatley, developers reported to me.

Here are the direction i took initially.

1. I verified the table statistics. The stat is generating periodically.
2. Enabled the trace and ran the TKPROF. In the execution part, it is using high resources.

update MWS_COLLECTION set WEB_BODY=:1, headline=:2, STATE=:3,
SOURCE_PUBLISHED_DT=:4, author=:5
where
STORY_ID=:6
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 55.99 145.55 1057 9495719 8238 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 55.99 145.56 1057 9495719 8238 1
Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 67
Rows Row Source Operation
------- ---------------------------------------------------
1 update
1 TABLE ACCESS BY INDEX ROWID MWS_COLLECTION
1 INDEX UNIQUE SCAN PK_MWS_COLLECTION (object id 139890)
********************************************************************************

3. I checked the tablespace and it is LMT and SSM is AUTO.

I am not able to find anything except the trace file output shows more resources in exection part.

The LOB segment and other columns use the same tablespace. I thought, i move the LOB segment to different tablespace. But i checked with the business and they agreed to purge 90% of the data in the table. so i backup only 10% of the data into another table and truncate the table and insert the cloned data into original table. The problem is gone and again it takes only one second to update the single row.

Wednesday, July 9, 2008

Truncate table in another schema without DROP ANY TABLE Privilege

Developers used to write a Procedure to truncate the table which is in another schema. So developers require truncate table privileges to run their code. Now DBA needs to grant DROP ANY TABLE privileges to statisfy the developers code. DROP ANY TABLE is one of the most powerful and dangerous privilege and DBA can not grant to other schema's in Oracle.

Here is one way to accomplish the task. This article was written in oracle9i release 9.2.0.6.0

Let us say, We have two schemas, INVENT, INVENTADMIN. All the tables were created in INVENT and stored procedures were created in INVENTADMIN. We need to grant truncate privileges to INVENTADMIN to truncate the tables in INVENT.

SQL> connect invent/invent@db1
Connected.
SQL> create table test as select * from user_objects;
Table created.
SQL> select count(*) from test;

COUNT(*)
----------
3

-- For some reson, the piple symbol is not appearing in the blog and
-- please use piple symbol in the third line of the below procedure.

SQL> create or replace procedure stp_truncate_table(p_table_name varchar2) is
2 begin
-- the below line, there is piple synobol before p_table_name.
3 execute immediate 'truncate table 'p_table_name;
4 end;
5 /

Procedure created.

SQL> grant execute on stp_truncate_table to inventadmin;

Grant succeeded.

SQL> connect inventadmin/inventadmin@db1
Connected.

SQL> create synonym stp_truncate_table for invent.stp_truncate_table;
Synonym created.

SQL> execute stp_truncate_table('TEST');
PL/SQL procedure successfully completed.
SQL>

Now login back to invent and check the table data.

SQL>
SQL> connect invent/invent@db1
Connected.
SQL> select * from test;

no rows selected

SQL>

Resize temporary tablespace

In some Database configuration, temporary tablespace is set to autoextend. When user runs query with lot of sort or any cartesian join, then temp tablespace will increase. If the temp file gets increasingly large, then DBA wanted to resize the tablespace to more reasonable size in order to reclaim the space.

There may be multiple methods exist to reclaim the used space depending on which release of Oracle you are running. The method that exists for all releases of Oracle is to simply drop and recreate the temporary tablespace back to its original resonable size.

This excerise is tested in oracle9i on Windows. It should apply to unix too.
Step 1.

Create the new temporary tablespace.

SQL> create temporary tablespace temp1 tempfile 2 'E:\ORADATA\RMS30MC\TEMP02.DBF' SIZE 500M 3 autoextend off
4 /
Tablespace created.

Step 2
Assign the newly created tablespace to default temporary tablespace

SQL> alter database default temporary tablespace temp1;
Database altered.

SQL> select property_value from database_properties where property_name='DEFAULT_TEMP_TABLESPACE';

PROPERTY_VALUE
--------------------------------------------------------------------------------TEMP1

Now all the newly connected users(users conncted to the databaseafter completing step2) will use the temp tablespace TEMP1.

Step 3
Wait until no one is using the temp tablespace.

SELECT COUNT(*) FROM v$sort_usage WHERE tablespace = 'TEMP'
and
SELECT COUNT(*) FROM v$tempseg_usage WHERE tablespace = 'TEMP'

return 0 rows before dropping the old TEMP tablespace.

Step 4
Once we are certain that, no one is using the old temp tablespace,we drop the old temp tablespace.

SQL> DROP TABLESPACE temp INCLUDING CONTENTS AND DATAFILES;

Tablespace dropped.

Index Skip Scanning

Index skip scan is one of the new feature in oracle9i. Prior to oracle9i, if we have composit index and if we are not using leading column in the WHERE clause, then optimizer will not use the index. But in oracle9i and further versions, optimizer may use the skip scan index depends on the cardinality of the leading column and optimizer cost. The regular B-tree index will have only one range scan from top to bottom. But skip scan will do many range scan depends on the cardinatlity of the leading index column.

The advantage of index skip scan is, we can reduce the number of indexes on the table. It save index space and reduce the index maintenance. Skip scan index is not same as efficient as B-tree index. But it is better then the full table scan.

Here is an example.

SQL> desc emp;
Name Null? Type
----------------------------------------- -------- ----------------------------
EMPNO NUMBER
SEX VARCHAR2(10)
NAME VARCHAR2(20)

SQL> select count(*),sex from emp group by sex;

COUNT(*) SEX
---------- ----------
10000 F
10000 M

SQL> create index idx_emp on emp(sex,empno);

Index created.
SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'MWSADMIN',TABNAME => 'EMP',ESTIMATE
_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.
SQL> set autotrace on explain;
SQL> select name from emp where empno=1934;

NAME
--------------------
Scott
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=4 Card=2 Bytes=20)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (Cost=4 Card=2 Bytes=20)
2 1 INDEX (SKIP SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost=3 Card=2)

Optimizer used skip scan index for the above query. During the skip scan, the composite index is accessed once for each distinct value of the leading column(s). For each distinct value, the index is searched to find the query's target values. When optimizer use the skip scan index, the query(select name from emp where empno=1934) is broken down into two small range scan as below.


select name from emp where sex = 'M' and empno=1934
union
select name from emp where sex = 'F' and empno=1934

Let us consider the below SQL query and it is using regular B-tree index. Since we use the leading index column in the where clause. This query does only one range scan, not like skip scan index.

SQL> select name from emp where sex='M' and empno=1934;

NAME
--------------------
Scott

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=2 Card=1 Bytes=11)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (Cost=2 Card=1 Bytes=11)
2 1 INDEX (RANGE SCAN) OF 'IDX_EMP' (NON-UNIQUE) (Cost=1 Card=1)

HWM - High Water Mark

What is High Water Mark(HWM)?

HWM is boundry between used block and unused block. Blocks below HWM is used blocks and above HWM is unused blocks. Used blocks might or might not have the data. In normal operations (Insert/update), the HWM is mostly bump up and never go down. Oracle tracks the HWM for the segment in segment header.

How do we identify the HWM?

Option 1


We can use DBMS_SPACE.UNUSED_SPACE procedure.

Please remember, i used pipe symbol in the below code and it is not visible in blog. So placed ### symbol in the place of pipe sign. Please replace thie ### sign with pipe symbol when we run this code in any of your test database.

DECLARE
alc_bks NUMBER;
alc_bts NUMBER;
unsd_bks NUMBER;
unsd_bts NUMBER;
luefi NUMBER;
luebi NUMBER;
lub NUMBER;
BEGIN
FOR c1_row IN (SELECT table_name
FROM user_tables where table_name='COLLECTION') LOOP
DBMS_SPACE.UNUSED_SPACE (
segment_owner => USER,
segment_name => c1_row.table_name ,
segment_type => 'TABLE' ,
total_blocks => alc_bks ,
total_bytes => alc_bts ,
unused_blocks => unsd_bks ,
unused_bytes => unsd_bts ,
last_used_extent_file_id => luefi ,
last_used_extent_block_id => luebi ,
last_used_block => lub);
DBMS_OUTPUT.PUT_LINE( 'Data for '### c1_row.table_name);
DBMS_OUTPUT.PUT_LINE( RPAD('*',LENGTH(c1_row.table_name) + 10,'*'));
DBMS_OUTPUT.PUT_LINE( 'Total Blocks.................'### alc_bks );
DBMS_OUTPUT.PUT_LINE( 'Total Bytes..................'###alc_bts );
DBMS_OUTPUT.PUT_LINE( 'Unused Blocks................'### unsd_bks );
DBMS_OUTPUT.PUT_LINE( 'Unused Bytes.................'###unsd_bts );
DBMS_OUTPUT.PUT_LINE( 'Last Used Ext File Id........'### luefi );
DBMS_OUTPUT.PUT_LINE( 'Last Used Ext Block Id.......'### luebi );
DBMS_OUTPUT.PUT_LINE( 'Last Used Block..............'###lub );
DBMS_OUTPUT.NEW_LINE;
END LOOP;
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE(SUBSTR(SQLERRM,1,250));
END;
/

Here is the output for the above code.

Data for COLLECTION
************************
Total Blocks.................38656
Total Bytes..................158334976
Unused Blocks................0
Unused Bytes.................0
Last Used Ext File Id........60
Last Used Ext Block Id.......1284368
Last Used Block..............256

PL/SQL procedure successfully completed.

SQL>
HWM = Total Blocks - Unused Blocks
HWM = 38656-0
HWM = 38656

Option 2

we can use user_tables.
select sum(blocks)-sum(empty_blocks) from user_tables

Please remember, the above query gives the result based on the current statistics. Best way to find the HWM is to use DBMS_SPACE.UNUSED_SPACE(Option 1) procedure.

How do we find which table is good candidate to reset the HWM?

I have answered for this as a different topic in the same blog. Please click to answer this question.

Why do we need to reset the HWM?

1. When ever optimizer takes full table scan, it scan all the blocks below HWM. This would degrade the performance and slowdown the Full table scan process. To avoid this, we need to shrink the table to reset the HWM.

2. Oracle does not release the free space under HWM for use by other objects for a simple reason: because that space is reserved for new inserts and to accommodate the growth of existing rows. The additional benefit of resetting HWM is a more efficient utilization of space because the space freed is available for other objects.

3. When rows are inserted with direct path—for example, through Direct Load Insert (insert with the APPEND hint) or through the SQL*Loader direct path—the data blocks are placed directly above the HWM. The space below it remains wasted.

How do we reset the HWM?

Option 1
Alter table emp move tablespace

This option requires rebuilding the indexes. since the index will be invalid/unusable after running the above command. The downside is, rebuilding the index is additional overhead when we use this option. Also users can not use the application or reports while rebuilding the index.

Option 2

1. Export the data
2. truncate the table
3. import the table
4. Analyze the table

Option 3
1. copy the table data
2. truncate the original table
3. insert back.

Option 4
Use DBMS_REDEFINITION package to copy the table.

Option5
In oracle10g, we have better option to reset the high water mark. We need to enable row movement before we shrink the space. We do not need to rebuild the index after reseting the HWM. This is the best option to reset the HWM. But this feature is not available in oracle9i.

SQL@DBA1> alter table bookings enable row movement;

Table altered.

SQL@DBA1>
SQL@DBA1> alter table bookings shrink space compact;

Table altered.

If you need to read more about how to reset the HWM in oracle10g, please click this link. I wrote an article as a different topics.