Translate

Tuesday, August 14, 2007

Above and Beyond Efficient Correlation

Correlated Subqueries that work and bear accurate results

It has been over twenty years (23 to be precise), since I completed the unofficial Spanish translation of an Introduction to Database Systems, which most of my class peers at Universidad del Norte used during the semesters involved. The translation was rather poor in spite of my good knowledge of the English grammar. While the class was mostly based on IBM’s DB2, the professor had made it generically except for his fabulous expertise with IMS, a hierarchical DBMS, that was key to industries such as airline operations, and –in particular, that of Avianca, then the airline of Colombia. I later expanded this knowledge while studying distributed database systems at Universidad de Los Andes in Santa Fé de Bogotá. Among the most interesting topics that I found great from the mathematical logic approach, I encountered the treatment of correlated subqueries, which was based on the suppliers’ data model in Date’s book. The fact is that I had taken several required applied math in logic and Boolean algebra and special mathematics involving practical relational calculus and algebra. My father had sent me Date’s book from New Jersey, and I had a competitive advantage over all others in the class who had to use the library book or other textbooks in Spanish until my translation was completed. I had a different experience when I got the graduate classes at NJIT and MSU, where I scored very high in SQL projects. At NJIT around 1992, I used both SQL (using relational algebra) and QUEL (using relational calculus) since Ingress was the educational database in place, although I usually completed my school projects at Allied-Signal CAE Center using Oracle6. At Montclair State, I breezed with excellence in my database course using Oracle8i/9i, while I challenged myself to resolve the Stamford University undergraduate exams, which are traditionally an overnight-due challenge for even any expert.

At NJIT, I learned that trying to manually parse correlated subqueries using relational algebra or relational calculus could convey manual un-nesting of subqueries or their virtual conversion into non-correlated ones. The issue is that deriving the results via relational algebra or calculus, did not relate any of the abstraction through which I had previously used as an undergraduate, but instead an Oracle Coursework that I had taken from Allied-Signal in 1991 had greatly increased my ability to derive optimal solutions and practical production solutions for their Oracle6 VAX VMS driven environment. When I taught an OCP class at Farleigh Dickinson University’s Computer Career Institute in Teaneck in 2001, I realized that some of my students encountered similar concerns to those I had, and I got the chance to use Oracle’s demos, in particular, the employee model provided in the Scott’s schema, the most traditional one, which is an excellent source to discuss hierarchical and correlated subqueries, and it also led me to enhance my own perspective in teaching others. As usual it is not the same to know well a topic than trying to teach that topic to others.

On recalling this topic, and how RDBMS’s have progressed to improve the quality of producing accurate results. Oracle, in particular, has the most complete SQL API syntax that utilizes the most recent ANSI SQL 1999 standard available in the market as well as more traditional ones.

Indeed, correlated subqueries allow empowering certain scenarios useful in data mining and business intelligence, which in many cases for RDBMS, other than Oracle, require custom code writing and expansion, and in many cases an additional host language, usually with many unhandled exceptions involving small queries that normally threaten the database integrity. Oracle also provides the greatest number of built-in functionality that can be embedded in any correlated subquery as needed, including also built-in extensions, unstructured data such as XML, and LOBs among others.

Below, is a summary of concepts and Oracle literature that I have utilized in previous case studies, and which I prepared for a recent presentation.


OBJECTIVES

  • Distinguish correlated subqueries from non-correlated ones.

  • Emphasize Oracle’s rules and limitations on correlated subqueries.

  • Summarize a set of best practices by highlighting the related strategies and appropriate syntax.


INTRODUCTION: CONCEPTS

  • A subquery answers multiparts questions by involving more than one SELECT statement.

  • A subquery in the FROM clause of a SELECT statement is also called an inline view.

  • A subquery in the WHERE clause of a SELECT statement is also called a nested subquery. This also applies to the HAVING subclause in the GROUP BY clause and does not require a WHERE clause in some scenarios.


OPERATORS INVOLVED

  • Logical Operators

  • (OR, AND, NOT)

  • EXISTS, NOT EXISTS
    (mapping to list operators such as IN, NOT IN)

  • Comparison Operators

  • ANY, SOME, ALL (Filters/Enhancers)

  • Other Operators

Mathematical Operators

Unstructured Data (XML)

MODEL clause

Object-Oriented/User-defined

LIMITATIONS
  • Unlimited subquery levels in the FROM clause

  • Up to 255 levels of subqueries in the WHERE clause


CORRELATED SUBQUERIES: CONCEPTS

  • A correlated subquery is evaluated once for each row processed by the parent statement.

  • A correlated subquery answers a multiple-part question whose answer depends on the value in each row processed by the parent statement.

  • Oracle resolves unqualified columns in the subquery by looking in the tables named in the subquery and then in the tables named in the parent statement.


Example 1: List the name and salaries of all employees in the lowest salary grade bucket


SELECT ename,sal

FROM emp e1

WHERE EXISTS

(SELECT 'x'

FROM salgrade s

WHERE e1.sal BETWEEN s.losal AND s.hisal

AND s.grade = (

SELECT MIN(s2.grade)

FROM salgrade s2

)

);

Example 2: List the records of employees whose salary is above the department’s average salary.

SELECT *

FROM emp

e1

WHERE exists

(SELECT 'x'

FROM emp e2

WHERE e1.sal > ALL

(SELECT AVG(e3.sal)

FROM emp e3

WHERE e2.deptno =e3.deptno

GROUP by e3.deptno

)

AND e1.empno =e2.empno

);

Example 3: List the name, job title, and salary of all employees working for the department that has the most employees in the company.

SELECT e.ename, e.job, e.sal

FROM emp e

WHERE EXISTS

(

SELECT E1.DEPTNO

FROM emp e1

WHERE e1.deptno = e.deptno

GROUP BY deptno

HAVING COUNT(E1.DEPTNO) =

(

SELECT MAX(count(e5.deptno)) edcount

FROM emp e5

GROUP BY e5.deptno

)

) ;

Example 4: List the name of the suppliers who do not ship part P2.


SELECT sname

FROM s

WHERE NOT EXISTS

( SELECT 'A'

FROM sp

WHERE s# = s.s#

AND p# = 'P2'

);

Example 5: Correlated update example: Set the shipping quantity to 0 for all suppliers in London.

UPDATE sp

SET qty = 0

WHERE (

SELECT city

FROM s

WHERE s.s# = sp.s#

) = 'London';


UNNESTING SUBQUERIES

  • Subqueries are nested when they appear in the WHERE clause of the parent statement.

  • When Oracle Database evaluates a statement with a nested subquery, it may overlook the most efficient access paths while evaluating the subquery portion multiple times.

  • Subquery unnesting unnests and merges the body of the subquery into the body of the statement that contains it, allowing the consistent query and access path optimizations.

  • Unnesting exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery. PL/SQL could be useful in some cases.

  • You can use UNNEST hint to unnest a correlated subquery and HASH_AJ and MERGE_AJ hints for non-correlated ones.



BEST PRACTICES

  • Use operators EXISTS and NOT EXISTS rather than IN and NOT in whenever appropriate.

  • Avoid the usage of NOT IN.

  • Use hints if necessary to unnest subqueries.



CONCLUDING REMARKS

  • Correlated Subqueries are different from uncorrelated subqueries in that the former are evaluated once for each row processed by the parent statement

  • The Oracle optimizer may successfully unnest subqueries, but it some scenarios it is possible to use hints such as UNNEST, HASH_AJ and MERGE_AJ to entice improved performance tuning.

No comments: