Correlated Subqueries that work and bear accurate results
It has been over twenty years (23 to be precise), since I completed the unofficial Spanish translation of an Introduction to Database Systems, which most of my class peers at Universidad del Norte used during the semesters involved. The translation was rather poor in spite of my good knowledge of the English grammar. While the class was mostly based on IBM’s DB2, the professor had made it generically except for his fabulous expertise with IMS, a hierarchical DBMS, that was key to industries such as airline operations, and –in particular, that of Avianca, then the airline of Colombia. I later expanded this knowledge while studying distributed database systems at Universidad de Los Andes in Santa Fé de Bogotá. Among the most interesting topics that I found great from the mathematical logic approach, I encountered the treatment of correlated subqueries, which was based on the suppliers’ data model in Date’s book. The fact is that I had taken several required applied math in logic and Boolean algebra and special mathematics involving practical relational calculus and algebra. My father had sent me Date’s book from New Jersey, and I had a competitive advantage over all others in the class who had to use the library book or other textbooks in Spanish until my translation was completed. I had a different experience when I got the graduate classes at NJIT and MSU, where I scored very high in SQL projects. At NJIT around 1992, I used both SQL (using relational algebra) and QUEL (using relational calculus) since Ingress was the educational database in place, although I usually completed my school projects at Allied-Signal CAE Center using Oracle6. At Montclair State, I breezed with excellence in my database course using Oracle8i/9i, while I challenged myself to resolve the Stamford University undergraduate exams, which are traditionally an overnight-due challenge for even any expert.
At NJIT, I learned that trying to manually parse correlated subqueries using relational algebra or relational calculus could convey manual un-nesting of subqueries or their virtual conversion into non-correlated ones. The issue is that deriving the results via relational algebra or calculus, did not relate any of the abstraction through which I had previously used as an undergraduate, but instead an Oracle Coursework that I had taken from Allied-Signal in 1991 had greatly increased my ability to derive optimal solutions and practical production solutions for their Oracle6 VAX VMS driven environment. When I taught an OCP class at Farleigh Dickinson University’s Computer Career Institute in Teaneck in 2001, I realized that some of my students encountered similar concerns to those I had, and I got the chance to use Oracle’s demos, in particular, the employee model provided in the Scott’s schema, the most traditional one, which is an excellent source to discuss hierarchical and correlated subqueries, and it also led me to enhance my own perspective in teaching others. As usual it is not the same to know well a topic than trying to teach that topic to others.
On recalling this topic, and how RDBMS’s have progressed to improve the quality of producing accurate results. Oracle, in particular, has the most complete SQL API syntax that utilizes the most recent ANSI SQL 1999 standard available in the market as well as more traditional ones.
Indeed, correlated subqueries allow empowering certain scenarios useful in data mining and business intelligence, which in many cases for RDBMS, other than Oracle, require custom code writing and expansion, and in many cases an additional host language, usually with many unhandled exceptions involving small queries that normally threaten the database integrity. Oracle also provides the greatest number of built-in functionality that can be embedded in any correlated subquery as needed, including also built-in extensions, unstructured data such as XML, and LOBs among others.
Below, is a summary of concepts and Oracle literature that I have utilized in previous case studies, and which I prepared for a recent presentation.
Distinguish correlated subqueries from non-correlated ones.
Emphasize Oracle’s rules and limitations on correlated subqueries.
Summarize a set of best practices by highlighting the related strategies and appropriate syntax.
A subquery answers multiparts questions by involving more than one SELECT statement.
A subquery in the FROM clause of a SELECT statement is also called an inline view.
A subquery in the WHERE clause of a SELECT statement is also called a nested subquery. This also applies to the HAVING subclause in the GROUP BY clause and does not require a WHERE clause in some scenarios.
(OR, AND, NOT)
EXISTS, NOT EXISTS
(mapping to list operators such as IN, NOT IN)
ANY, SOME, ALL (Filters/Enhancers)
Unstructured Data (XML)
Unlimited subquery levels in the FROM clause
Up to 255 levels of subqueries in the WHERE clause
A correlated subquery is evaluated once for each row processed by the parent statement.
A correlated subquery answers a multiple-part question whose answer depends on the value in each row processed by the parent statement.
Oracle resolves unqualified columns in the subquery by looking in the tables named in the subquery and then in the tables named in the parent statement.
Example 1: List the name and salaries of all employees in the lowest salary grade bucket
FROM emp e1
FROM salgrade s
WHERE e1.sal BETWEEN s.losal AND s.hisal
AND s.grade = (
FROM salgrade s2
Example 2: List the records of employees whose salary is above the department’s average salary.
FROM emp e2
WHERE e1.sal > ALL
FROM emp e3
WHERE e2.deptno =e3.deptno
GROUP by e3.deptno
AND e1.empno =e2.empno
Example 3: List the name, job title, and salary of all employees working for the department that has the most employees in the company.
SELECT e.ename, e.job, e.sal
FROM emp e
FROM emp e1
WHERE e1.deptno = e.deptno
GROUP BY deptno
HAVING COUNT(E1.DEPTNO) =
SELECT MAX(count(e5.deptno)) edcount
FROM emp e5
GROUP BY e5.deptno
Example 4: List the name of the suppliers who do not ship part P2.
WHERE NOT EXISTS
( SELECT 'A'
WHERE s# = s.s#
AND p# = 'P2'
Example 5: Correlated update example: Set the shipping quantity to 0 for all suppliers in London.
SET qty = 0
WHERE s.s# = sp.s#
) = 'London';
Subqueries are nested when they appear in the WHERE clause of the parent statement.
When Oracle Database evaluates a statement with a nested subquery, it may overlook the most efficient access paths while evaluating the subquery portion multiple times.
Subquery unnesting unnests and merges the body of the subquery into the body of the statement that contains it, allowing the consistent query and access path optimizations.
Unnesting exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery. PL/SQL could be useful in some cases.
You can use UNNEST hint to unnest a correlated subquery and HASH_AJ and MERGE_AJ hints for non-correlated ones.
Use operators EXISTS and NOT EXISTS rather than IN and NOT in whenever appropriate.
Avoid the usage of NOT IN.
Use hints if necessary to unnest subqueries.
Correlated Subqueries are different from uncorrelated subqueries in that the former are evaluated once for each row processed by the parent statement
The Oracle optimizer may successfully unnest subqueries, but it some scenarios it is possible to use hints such as UNNEST, HASH_AJ and MERGE_AJ to entice improved performance tuning.