Oracle Expert Blog: 2009

Sunday, July 19, 2009

Business Intelligence and Statistical Error

Surface Response Model: From a Bull/Cow Fertility Test
(in my Advanced Design of Experiments Class)

Statisticians often study errors of Type I to refer to the probability of rejecting the null hypothesis when it is in fact true and Type II when accepting that null hypothesis when it is false. Beyond that point, systematic error can occur in general when wrong or inaccurate information is entered into a database and subsequently propagated to various other areas of information systems replication and reporting. While the most common type of data error is inadmissibly the human error, there are other reasons why decision support systems, and business intelligence systems may be subject to erroneous decision or reporting, other common causes are inappropriate forecasting strategies, for instance inappropriate categorization and choosing the wrong sources of data verification and validation or relevant resources. The well-known belief that “garbage in, garbage out” will drive quality into inconsistency and in many other instances into wrong data derivation. In some data quality for a calibration procedure project I worked on, a procedural error was entered not by the code but by a random reading of data files, which caused for the procedure to introduce inappropriate measurements based on the timestamp associated with the data files. After I found why the algorithm, in spite of being accurate, would introduce an error, the lead researcher refused to accept an improved algorithm that would have eliminated the error, simply because this would have meant to redo the entire experiment. Therefore, the cell calibration test introduced a small percent of inconsistent data, which did not invalidate the regression, correlation, and factorial design model used to validate the test. The entire data went to an Oracle database, where it was analyzed, mined, and presented for calibration test of the CDMA wireless protocol. Indeed, it is obvious that time management was a key factor in making this decision. Having also worked directly with statisticians, biostatisticians and operations research analysts and managers, where my role was a DBA, data analyst, software engineer, software developer or simply consultant, I believe that such a decision might have been contested by others or at least carefully revised.

There are other scenarios, where a DBA role may be the underlying reason or perhaps not so directly. For instance, if a company adopts a DBA corporate approach, such that the CURSOR_SHARING parameter is set to SIMILAR, this may perform consistently in most databases, but could become an issue in many OLTP scenarios, in particular, in queries widely using bind variables. This is because the SQL bind peeking will not be so much concerned with the actual value of the bind variables. In those scenarios, setting CURSOR_SHARING to EXACT is more practical, since it avoids the typical scenario where the cached statement becomes the “SQL bind peeking bible” matching the query and reparsing of the statement never actually occurs. In these queries were used in BI environments, it would obvious that there is the potential and probability that at some point erroneous reporting occurs. Therefore, there is the possibility that a large amount of BI effort, such as applications, and tool-driven implementation can span into a significant level where some rows have been neglected for displaying. In some cases, I have tested, the rendering of result set from re-indexing of tables on additional columns can lead for this misbehavior of the SQL bind peeking and statement parsing to take place, which occurs mostly in OLTP databases.

While error and error propagation of various types is always possible, achieving data quality and integrity can only be measure in relation to the impact. For instance, in a research environment the error may have great or no significance, but it could be devastating in mission-critical environments such as medical and financial systems. Error tracking then becomes a pro-active objective to attain data integrity, consistency, and quality that is to be handled not only by manual human control and resilient data consistency but quality corporate auditing controls as well. In most instances, internal controls are more valuable than external controls such as outsourced data verification and validation, simply because internal resources usually know more about the applications, data domains, and data contents overall.

In some instances, database-driven data mining and BI tools, such as Discoverer, can be complemented not only their own visualization capabilities but also with the usage of other methodologies such as, for instance, surface response models and artificial intelligence mining. While visualization is unlikely to eliminate or resolve any existing database error, i.e., data quality or consistency, it can certainly congruently derive data trend and convey confirming or rejecting the desirable results to verify or validate. Therefore, visualization models are helpful to regression, correlation, and statistical tests, simply because they can verify and validate the data quality with one simple view. While BI models that include functions such as CUBE and ROLLUP, which can further aggregate and categorize data, visualization techniques are powerful tools for decision making in that they provide truthful real-life validation of data trends. Both database and statistical tools and models can expand their data mining capabilities with analysis of central tendency, — such as kurtosis and skewness—, and measures of variability such as mean deviation, variance and standard deviation, and further studies on analysis of variance, but it the end it goes for good to present to have chosen a visualization tool that provides the greatest expectations overall.

Tuesday, June 16, 2009

A Comprehensive Guide to Oracle Partitioning with Samples

The research presented focuses on partitioning strategy options for each currently available Oracle version, in particular Oracle11g. The study guideline is on implementing the new partitioning options available in 11g, while enhancing performance tuning for the physical model and applications involved. Therefore, basic strategies— such as range, hash and list—, composite partitioning strategies including all possible combinations of Basic strategies, and Partition Extensions such as Reference and Interval partitioning strategies are covered. The research covers both middle-to-large-size and VLDB databases with significant implications for consolidation, systems integration, high-availability, and virtualization support. Topics covered emphasize subsequent performance tuning and specific application usage such as IOT-partitioning and composite partitioning choices.

1. Introduction

Encountering that performance for a table whose segments have exceeded a good number of Gigabytes or even Terabytes is probably the main consideration to partition that table. Indeed, that table will require special segment access control, and thus, special index architecture and administration. Ultimately, the usage of multiple block size tablespaces and buffer caches become a database technology that greatly enhances a partitioning approach.
In principle, tables (including materialized views), indexes, and Index-Organized Tables (IOT) are the objects used as target in any valid partitioning strategy utilized. Partitioning strategies can involve primarily Basic, Composite and Partition Extensions.

2. Partitioning Strategies

Oracle Partitioning strategies include, namely:

2.1 Basic Partitioning (Single-Level)

This strategy involves one of the following options: Partition by Range (Establishes ranges within the domain used as partitioning key), List (Provides a list of values matching one partition in the partition key domain and a default partition for those not matched) or Hash (which transforms the partitioning key value and maps it to given partition).

2.2 Composite Partitioning

The following combinations of basic partitioning result into valid composite partitioning strategies, namely:

Range-Range
Range-List
Range-Hash
List-List
List-Range
List-Hash
Interval Partitioning

The Interval Partitioning strategy is fundamentally a special implementation of Range partitioning, which maps primarily a DATE, TIMESTAMP data type to a numeric interval, using the INTERVAL keyword, use as a partition range marker. The functions NUMTOYMINTERVAL and NUMTODSINTERVAL are commonly used. Interval partitioning can occur as a single-level strategy or composite option in combination with all other options, namely, Range, Hash and List.

2.3 Partition Extensions

This partitioning strategy involves:

2.3.1 Reference Partitioning

This strategy normally uses the referential integrity constraint between to table, and uses the key in the details table to attain partition on the referenced key, which points to a candidate primary key in another partitioned table, the master table. The referential integrity constraint must be enabled and enforced.

2.3.2 Virtual Column Partitioning

This option permits to partition of a column on a virtual column, which is usually the outcome of a mathematical operation on two or more actual columns on the same table. This option extends every basic partitioning strategy.

3. Object Partitioning

In addition to the overall partitioning strategies, it is important to look at partitioning from the database objects to be partitioned, which include:

3.1 Tables (including Materialized Views)

Tables strictly support each Basic and Composite strategy and all Partition Extension, constrained by any SQL DDL rule.

3.2 Indexes

Indexes can have local or global partitions. Locally partitioned indexes can be pre-fixed (if they partitioned on the left portion of the key) or non-prefixed, otherwise. A local index has a one-to-one correspondence with the underlying table, and can reside in the same or on a different tablespace, which could even be of a different block size, a strategy that often enhances database performance. Index Partitioning can support Basic Partitioning strategies in general.

3.3 Index-Organized Tables

Like tables, Index-Organized Tables support all partitioning Basic strategies, and –in general– the partitioning key must be a subset of the primary key.

4. Creating Partitions

It is exceptionally good to have the experience of creating partitions under any possible strategy, namely:

4.1 Creating Range-Partitioned Tables

This sample code creates a table with four partitions and enables row movement:

CREATE TABLE credential_evaluations
( eval_id VARCHAR2(16) primary key
, grad_id VARCHAR2(12)
, grad_date DATE
, degree_granted VARCHAR2(12)
, degree_major VARCHAR2(64)
, school_id VARCHAR2(32)
, final_gpa NUMBER(4,2))
PARTITION BY RANGE (grad_date)
( PARTITION grad_date_70s
VALUES LESS THAN (TO_DATE('01-JAN-1980','DD-MON-YYYY')) TABLESPACE T1
, PARTITION grad_date_80s
VALUES LESS THAN (TO_DATE('01-JAN-1990','DD-MON-YYYY')) TABLESPACE T2
, PARTITION grad_date_90s
VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')) TABLESPACE T3
, PARTITION grad_date_00s
VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')) TABLESPACE T4 )
ENABLE ROW MOVEMENT;

4.2 Creating Global Indexes

Creating a range-partitioned global index is similar to creating range-partitioned table. The following example creates a range-partitioned global index on final_gpa for CREDENTIAL_EVALUATIONS. Each index partition is named but is stored in the default tablespace for the index.

CREATE INDEX ndx_final_gpa ON credential_evaluations (final_gpa)
GLOBAL PARTITION BY RANGE(final_gpa)
( PARTITION c1 VALUES LESS THAN (2.5)
, PARTITION c2 VALUES LESS THAN (3.0)
, PARTITION b1 VALUES LESS THAN (3.4)
, PARTITION b2 VALUES LESS THAN (3.7)
, PARTITION a1 VALUES LESS THAN (3.9)
, PARTITION a2 VALUES LESS THAN (MAXVALUE));

4.3 Creating an Interval Range-Partitioned Table

An Interval Range-Partitioned table is a special case of a Range Partitioned Table. The following are important features and relevant issues:

The INTERVAL clause of the CREATE TABLE statement sets interval partitioning for the table. At least one range partition must be specified using the PARTITION clause.
The range partitioning key value determines the high value of the range partitions (transition point) and the database automatically creates interval partitions for data beyond that transition point.
For each interval partition, the lower boundary is the non-inclusive upper boundary of the previous range or interval partition.
The partitioning key can only be a single column name from the table and it must be of NUMBER or DATE type.
The optional STORE IN clause lets you specify one or more tablespaces.
The following sample code sets four partitions with varying widths. It also specifies that above the transition point of January 1, 2009, partitions are created with a width of one month.

CREATE TABLE rental_costs (
item_id NUMBER(6),
time_intv DATE,
unit_cost NUMBER(12,2),
unit_price NUMBER(12,2) )
PARTITION BY RANGE (time_intv)
INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION pca VALUES LESS THAN (TO_DATE('1-1-2006', 'DD-MM-YYYY')) tablespace t2,
PARTITION pcb VALUES LESS THAN (TO_DATE('1-1-2007', 'DD-MM-YYYY')) tablespace t4,
PARTITION pcc VALUES LESS THAN (TO_DATE('1-1-2008', 'DD-MM-YYYY')) tablespace t8,
PARTITION pcd VALUES LESS THAN (TO_DATE('1-1-2009', 'DD-MM-YYYY')) tablespace t12);

The high bound of partition pcd establishes the transition point. pcd and all partitions below it, namely, (pca, pcb, and pcc) are in the range section while all partitions above it fall into the interval section.

4.4 Creating Hash-Partitioned Tables

Hash-Partitioned tables map the insertion location for any row via a hashing algorithm that determines the appropriate tablespace for the partition key instance. The following example illustrates a typical case:

CREATE TABLE school_directory
(stid NUMBER PRIMARY KEY,
lname VARCHAR2 (50),
fname VARCHAR2 (50),
phone VARCHAR2(16),
email VARCHAR2(128),
class_year VARCHAR2(4))
PARTITION BY HASH (stid) PARTITIONS 4 STORE IN (t1, t2, t3, t4);

The PARTITION BY HASH clause of the CREATE TABLE statement identifies that the table is to be hash-partitioned. The PARTITIONS clause can then be used to specify the number of partitions to create, and optionally, the tablespaces to store them in. Otherwise, PARTITION clauses can be used to name the individual partitions and their tablespaces . The only attribute needed to specify for hash partitions is TABLESPACE. All of the hash partitions of a table must share the same segment attributes (except TABLESPACE), which are inherited from the table level.

4.5 Creating List-Partitioned Tables

List-partitioned tables follow the following rules among others:

A PARTITION BY LIST clause is used in the CREATE TABLE statement to create a table partitioned by list, by specifying lists of literal values, (the discrete values of the partitioning columns qualifying rows matching the partition’s single column partitioning key.) In fact, there is no sense of order among partitions.

The DEFAULT keyword is used to describe the value list for a partition that will accommodate rows that do not map into any of the other partitions.
Optional subclauses of a PARTITION clause can specify physical and other attributes specific to a partition segment. If not overridden at the partition level, partitions inherit the attributes of their parent table.

CREATE TABLE regional_rentals
(divno NUMBER,
divname VARCHAR2(40),
rentals_quarterly NUMBER(12, 2),
state VARCHAR2(2))
PARTITION BY LIST (state)
(PARTITION pnw VALUES ('OR', 'WA', 'WY') TABLESPACE T1,
PARTITION psw VALUES ('AZ', 'CA', 'UT') TABLESPACE T3,
PARTITION pne VALUES ('CT', 'NY', 'NJ') TABLESPACE T5,
PARTITION pse VALUES ('FL', 'GA', 'SC') TABLESPACE T7);

4.6 Creating Reference-Partitioned Tables

In many cases, it is recommended to use referential integrity constraint in a strategic way to partition a master-detail table scenario, accordingly. The following concerns are quite relevant:

The PARTITION BY REFERENCE clause is used with the CREATE TABLE statement, specifying the name of a referential constraint, which becomes the partitioning referential constraint used as the basis for reference partitioning in the table. The referential integrity constraint must be enabled and enforced.
It is possible to set object-level default attributes, and optionally specify partition descriptors that override the object-level defaults on a per-partition basis.

When providing partition descriptors, the number of partitions described should match the number of partitions or subpartitions in the referenced table, i.e., the table will have one partition for each subpartition of its parent when the parent table is composite; otherwise the table will have one partition for each partition of its parent.

Besides, no partition bounds can be set for the partitions of a reference-partitioned table.

The partitions of a reference-partitioned table can be named, inheriting their name from the respective partition in the parent table, unless this inherited name conflicts with one of the explicit names given. In this scenario, the partition will have a system-generated name.

Partitions of a reference-partitioned table will collocate with the corresponding partition of the parent table, if no explicit tablespace is set accordingly.

The following sample code illustrates the creation of a reference-partitioned table preceded by the creation a of master table and a details table. The referential integrity constraint is highlighted as a key partition marking strategy.

4.6.1 Master Table DDL

CREATE TABLE order_hist
( ord_id NUMBER(16),
ord_date TIMESTAMP WITH LOCAL TIME ZONE,
ord_mode VARCHAR2(8),
cust_id NUMBER(9),
ord_status VARCHAR2(4),
ord_total NUMBER(12,2),
act_mgr_id NUMBER(9),
promo_id NUMBER(8),
CONSTRAINT ord_pk PRIMARY KEY(ord_id) USING INDEX TABLESPACE INDX )
PARTITION BY RANGE(ord_date)
( PARTITION pq1 VALUES LESS THAN (TO_TIMESTAMP_TZ('01-APR-2008 07:00:00 -5:00' ,
'DD-MON-YYYY HH:MI:SS TZH:TZM')),
PARTITION pq2 VALUES LESS THAN (TO_TIMESTAMP_TZ('01-JUL-2008 07:00:00 -5:00' ,
'DD-MON-YYYY HH:MI:SS TZH:TZM')),
PARTITION pq3 VALUES LESS THAN (TO_TIMESTAMP_TZ('01-OCT-2008 07:00:00 -5:00' ,
'DD-MON-YYYY HH:MI:SS TZH:TZM')),
PARTITION pq4 VALUES LESS THAN (TO_TIMESTAMP_TZ('01-JAN-2009 07:00:00 -5:00' ,
'DD-MON-YYYY HH:MI:SS TZH:TZM')) );

4.6.2 Details Table DDL

CREATE TABLE order_details
( ord_id NUMBER(16) NOT NULL,
line_item_id NUMBER(3) NOT NULL,
prod_id NUMBER(8) NOT NULL,
unit_price NUMBER(12,2),
qty NUMBER(8),
CONSTRAINT ord_det_fk
FOREIGN KEY(ord_id) REFERENCES order_hist(ord_id)
)
PARTITION BY REFERENCE(ord_det_fk);

4.7 Creating Local Partitioned Indexes

When creating a local index for a table, the database constructs the index such that it is equipartitioned (with a one-to-one correspondence) against the underlying table.

Similarly, the database also ensures that the index is maintained automatically when maintenance operations are performed on the underlying table. This sample code creates a local index on the SCHOOL_DIRECTORY table:

CREATE INDEX ndx_gd ON school_directory (email) LOCAL
PARTITIONS 4 STORE IN (t1, t2, t3, t4);

Naturally, it is possible to optionally name the hash partitions and tablespaces into which the local index partitions are to be stored, otherwise, the database uses the name of the corresponding base partition as the index partition name, and stores the index partition in the same tablespace as the table partition.

4.8 Creating a Hash-Partitioned Global Index

The following sample code illustrates the creation of a global hash-partitioned index.

CREATE UNIQUE INDEX ndx_sch_dir ON school_directory (stid,phone,email) GLOBAL
PARTITION BY HASH (stid,phone)
(PARTITION psp1 TABLESPACE t1,
PARTITION psp2 TABLESPACE t2,
PARTITION psp3 TABLESPACE t4,
PARTITION psp4 TABLESPACE t8);

The syntax is similar to that used for a hash partitioned table.
In most instances, hash-partitioned global indexes can improve the performance of indexes where a small number of leaf blocks in the index have high contention in multiuser OLTP environments.

Hash-partitioned global indexes can also limit the impact of index skew on monotonously increasing column values. Queries involving the equality and IN predicates on the index partitioning key can efficiently use hash-partitioned global indexes.

4.9 Creating Range-Hash Partitioned Tables

Range-Hash partitioned tables are probably the most common type among the composite partitioning strategies. In general, to create a composite partitioned table, use the PARTITION BY [ RANGE LIST ] clause of a CREATE TABLE statement. Next, you specify a SUBPARTITION BY [ RANGE LIST HASH ] clause that follows similar syntax and rules as the PARTITION BY [ RANGE LIST HASH ] clause. The PARTITION and SUBPARTITION or SUBPARTITION. In fact, it is important to consider the following issues, namely:

The partitions of a range-hash partitioned table are logical structures only, as their data is stored in the segments of their subpartitions.
As with partitions, these subpartitions share the same logical attributes.
Unlike range partitions in a range-partitioned table, the subpartitions cannot have different physical attributes from the owning partition, but they can reside another tablespace.
Attributes specified for a range partition apply to all subpartitions of that partition.
Specify different attributes for each range partition.
Specify a STORE IN clause at the partition level if the list of tablespaces across which the subpartitions of that partition should be spread is different from those of other partitions.

CREATE TABLE credential_evaluations
( eval_id VARCHAR2(16) primary key
, grad_id VARCHAR2(12)
, grad_date DATE
, degree_granted VARCHAR2(12)
, degree_major VARCHAR2(64)
, school_id VARCHAR2(32)
, final_gpa NUMBER(4,2))
PARTITION BY RANGE (grad_date)
SUBPARTITION BY HASH (grad_id) SUBPARTITIONS 8 STORE IN (T1,T2,T3,T4)
( PARTITION grad_date_70s
VALUES LESS THAN (
TO_DATE('01-JAN-1980','DD-MON-YYYY'))
, PARTITION grad_date_80s
VALUES LESS THAN (
TO_DATE('01-JAN-1990','DD-MON-YYYY'))
, PARTITION grad_date_90s
VALUES LESS THAN (
TO_DATE('01-JAN-2000','DD-MON-YYYY')), PARTITION grad_date_00s
VALUES LESS THAN (
TO_DATE('01-JAN-2010','DD-MON-YYYY'))
);

4.10 Creating Range-List Partitioned Tables

Range-List partitioned tables are subject to range rules at the first partitioning level and list rules at second, list partitioning level, accordingly.

CREATE TABLE q_territory_sales
( divno VARCHAR2(12), depno NUMBER,
itemno VARCHAR2(16), accrual_date DATE,
sales_amount NUMBER, state VARCHAR2(2),
constraint pk_q_dvdno primary key(divno,depno)
) TABLESPACE t8 PARTITION BY RANGE (accrual_date) SUBPARTITION BY LIST (state)
(PARTITION q1_2000 VALUES LESS THAN (TO_DATE('1-APR-2000','DD-MON-YYYY'))
( SUBPARTITION q1_2000_nw VALUES ('OR', 'WY'),
SUBPARTITION q1_2000_sw VALUES ('CA', 'NM'),
SUBPARTITION q1_2000_ne VALUES ('NY', 'CT'),
SUBPARTITION q1_2000_se VALUES ('FL', 'GA'),
SUBPARTITION q1_2000_nc VALUES ('SD', 'WI'),
SUBPARTITION q1_2000_sc VALUES ('TX', 'LA‘) ),
PARTITION q2_2000 VALUES LESS THAN (TO_DATE('1-JUL-2000','DD-MON-YYYY'))
( SUBPARTITION q2_2000_nw VALUES ('OR', 'WY'),
SUBPARTITION q2_2000_sw VALUES ('CA', 'NM'),
SUBPARTITION q2_2000_ne VALUES ('NY', 'CT'),
SUBPARTITION q2_2000_se VALUES ('FL', 'GA'),
SUBPARTITION q2_2000_nc VALUES ('SD', 'WI'),
SUBPARTITION q2_2000_sc VALUES ('TX', 'LA‘)
), PARTITION q3_2000 VALUES LESS THAN (TO_DATE('1-OCT-2000','DD-MON-YYYY'))
( SUBPARTITION q3_2000_nw VALUES ('OR', 'WY'),
SUBPARTITION q3_2000_sw VALUES ('CA', 'NM'),
SUBPARTITION q3_2000_ne VALUES ('NY', 'CT'),
SUBPARTITION q3_2000_se VALUES ('FL', 'GA'),
SUBPARTITION q3_2000_nc VALUES ('SD', 'WI'),
SUBPARTITION q3_2000_sc VALUES ('TX', 'LA')
), PARTITION q4_2000 VALUES LESS THAN ( TO_DATE('1-JAN-2001','DD-MON-YYYY'))
( SUBPARTITION q4_2000_nw VALUES ('OR', 'WY'),
SUBPARTITION q4_2000_sw VALUES ('CA', 'NM'),
SUBPARTITION q4_2000_ne VALUES ('NY', 'CT'),
SUBPARTITION q4_2000_se VALUES ('FL', 'GA'),
SUBPARTITION q4_2000_nc VALUES ('SD', 'WI'),
SUBPARTITION q4_2000_sc VALUES ('TX', 'LA')
) );

4.11 Creating List-Hash Partitioned Tables

This example shows the CAR_RENTALS table that is list partitioned by territory and subpartitioned using hash by customer identifier.

CREATE TABLE car_rentals
( car_id VARCHAR2(16)
, account_number NUMBER
, customer_id NUMBER
, amount_paid NUMBER
, branch_id NUMBER
, territory VARCHAR(2)
, status VARCHAR2(1))
PARTITION BY LIST (territory)
SUBPARTITION BY HASH (customer_id) SUBPARTITIONS 8
( PARTITION p_nw VALUES ('OR', 'WY') TABLESPACE T1
, PARTITION p_sw VALUES ('AZ', 'CA') TABLESPACE T2
, PARTITION p_ne VALUES ('NY', 'CT') TABLESPACE T3
, PARTITION p_se VALUES ('FL', 'GA') TABLESPACE T4
, PARTITION p_nc VALUES ('SD', 'WI') TABLESPACE T5
, PARTITION p_sc VALUES ('OK', 'TX') TABLESPACE T6);

4.12 Creating List-Range Partitioned Tables

The following sample code shows a car_rentals table that is list by territory and subpartitioned by range using the rental paid amount. Note that row movement is enabled.

CREATE TABLE car_rentals
( car_id VARCHAR2(16)
, account_number NUMBER
, customer_id NUMBER
, amount_paid NUMBER
, branch_id NUMBER
, territory VARCHAR(2)
, status VARCHAR2(1) )
PARTITION BY LIST (territory)
SUBPARTITION BY RANGE (amount_paid)
( PARTITION p_nw VALUES ('WA', 'WY')
( SUBPARTITION snwlow VALUES LESS THAN (1000)
, SUBPARTITION snwmid VALUES LESS THAN (10000)
, SUBPARTITION snwhigh VALUES LESS THAN (MAXVALUE) )
, PARTITION p_ne VALUES ('NY', 'CT')
( SUBPARTITION snelow VALUES LESS THAN (1000)
, SUBPARTITION snemid VALUES LESS THAN (10000)
, SUBPARTITION snehigh VALUES LESS THAN (MAXVALUE)
)
, PARTITION p_sw VALUES ('CA', 'AZ')
( SUBPARTITION sswlow VALUES LESS THAN (1000)
, SUBPARTITION sswmid VALUES LESS THAN (10000)
, SUBPARTITION sswhigh VALUES LESS THAN (MAXVALUE)
)
, PARTITION p_se VALUES ('FL', 'GA')
( SUBPARTITION sselow VALUES LESS THAN (1000)
, SUBPARTITION ssemid VALUES LESS THAN (10000)
, SUBPARTITION ssehigh VALUES LESS THAN (MAXVALUE)
)
);

4.13 Creating List-List Partitioned Tables

CREATE TABLE car_rentals_acct
( car_id VARCHAR2(16)
, account_number NUMBER
, customer_id NUMBER
, amount_paid NUMBER
, branch_id NUMBER
, territory VARCHAR(2)
, status VARCHAR2(1)
, rental_date TIMESTAMP WITH LOCAL TIME ZONE
, constraint pk_car_rhist primary key(car_id,account_number,branch_id,rental_date)
)
PARTITION BY LIST (territory)
SUBPARTITION BY LIST (status)
( PARTITION p_nw VALUES ('WA', 'WY')
( SUBPARTITION snw_low VALUES ('C')
, SUBPARTITION snw_avg VALUES ('B')
, SUBPARTITION snw_high VALUES ('A')
)
, PARTITION p_ne VALUES ('NY', 'CT')
( SUBPARTITION sne_low VALUES ('C')
, SUBPARTITION sne_avg VALUES ('B')
, SUBPARTITION sne_high VALUES ('A')
)
, PARTITION p_sw VALUES ('CA', 'AZ')
( SUBPARTITION ssw_low VALUES ('C')
, SUBPARTITION ssw_avg VALUES ('B')
, SUBPARTITION ssw_high VALUES ('A')
)
, PARTITION p_se VALUES ('FL', 'GA')
( SUBPARTITION sse_low VALUES ('C')
, SUBPARTITION sse_avg VALUES ('B')
, SUBPARTITION sse_high VALUES ('A')
)
, PARTITION p_ne VALUES ('NY', 'CT')
( SUBPARTITION sne_low VALUES ('C')
, SUBPARTITION sne_avg VALUES ('B')
, SUBPARTITION sne_high VALUES ('A')
)
, PARTITION p_sw VALUES ('CA', 'AZ')
( SUBPARTITION ssw_low VALUES ('C')
, SUBPARTITION ssw_avg VALUES ('B')
, SUBPARTITION ssw_high VALUES ('A')
)
, PARTITION p_se VALUES ('FL', 'GA')
( SUBPARTITION sse_low VALUES ('C')
, SUBPARTITION sse_avg VALUES ('B')
, SUBPARTITION sse_high VALUES ('A')
);

4.14 Creating Range-Hash Partitioned Tables Using a Subpartition Template

The following sample code illustrates how to use a subpartition template to create a composite Range-Hash Partition Table.

CREATE TABLE credential_evaluations
( eval_id VARCHAR2(16) primary key
, grad_id VARCHAR2(12)
, grad_date DATE
, degree_granted VARCHAR2(12)
, degree_major VARCHAR2(64)
, school_id VARCHAR2(32)
, final_gpa NUMBER(4,2)
)
PARTITION BY RANGE (grad_date)
SUBPARTITION BY HASH (grad_id)
SUBPARTITION TEMPLATE
( SUBPARTITION S_a TABLESPACE t1,
SUBPARTITION S_b TABLESPACE t2,
SUBPARTITION S_c TABLESPACE t3,
SUBPARTITION S_d TABLESPACE t4
)
( PARTITION grad_date_70s
VALUES LESS THAN ( TO_DATE('01-JAN-1980','DD-MON-YYYY'))
, PARTITION grad_date_80s
VALUES LESS THAN ( TO_DATE('01-JAN-1990','DD-MON-YYYY'))
, PARTITION grad_date_90s
VALUES LESS THAN ( TO_DATE('01-JAN-2000','DD-MON-YYYY'))
, PARTITION grad_date_00s
VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY'))
);

4.15 Creating a Multicolumn Range-Partitioned Table

The following sample code illustrates the use of a multicolumn partitioned approach for table BI_AUTO_RENTALS_SUMMARY. This example shows a multicolumn range-partitioned table, storing the actual DATE information in three separate columns: year, month, and day with partition quarterly granularity. The purpose of this type of partition is to avoid dispersion of data within the range and balance the physical usage of each partition entries for performance tuning optimization purposes.

CREATE TABLE bi_auto_rentals_summary
( acctno NUMBER,
rental_date TIMESTAMP WITH LOCAL TIME ZONE,
year NUMBER,
month NUMBER,
day NUMBER,
total_amount NUMBER,
CONSTRAINT pk_actdate PRIMARY KEY (acctno, rental_date))
PARTITION BY RANGE (year,month)
(PARTITION prior2008 VALUES LESS THAN (2008,1),
PARTITION pq1_2008 VALUES LESS THAN (2008,4),
PARTITION pq2_2008 VALUES LESS THAN (2008,7),
PARTITION pq3_2008 VALUES LESS THAN (2008,10),
PARTITION pq4_2008 VALUES LESS THAN (2009,1),
PARTITION p_current VALUES LESS THAN (MAXVALUE,1));

4.16 Creating a Virtual Column Based Partitioned Table

In the context of partitioning, a virtual column can be used as any regular column.

All partition methods are supported when using virtual columns, including interval partitioning and all different combinations of composite partitioning.
There is no support for calls to a PL/SQL function on the virtual column used as the partitioning column.

The next sample code shows the DIRECT_MARKETING table partitioned by range-range using a virtual column for the subpartitioning key. The virtual column calculates the difference between the historic average sales and the forecasted potential sales. As a rule, at least one partition must be specified.

CREATE TABLE direct_marketing
( promo_id NUMBER(6) NOT NULL
, cust_id NUMBER NOT NULL
, campaign_date DATE NOT NULL
, channel_code CHAR(1) NOT NULL
, campaign_id NUMBER(6) NOT NULL
, hist_avg_sales NUMBER(12,2) NOT NULL
, sales_forecast NUMBER(12,2) NOT NULL
, discrepancy AS (sales_forecast - hist_avg_sales ) )
PARTITION BY RANGE (campaign_date) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
SUBPARTITION BY RANGE(discrepancy) SUBPARTITION TEMPLATE
( SUBPARTITION p_low VALUES LESS THAN (5000)
, SUBPARTITION p_avg VALUES LESS THAN (15000)
, SUBPARTITION p_high VALUES LESS THAN (100000)
, SUBPARTITION p_max VALUES LESS THAN (MAXVALUE ) )
(PARTITION p_campaign_prior_2009 VALUES LESS THAN
(TO_DATE('01-JAN-2009','dd-MON-yyyy')) )
ENABLE ROW MOVEMENT COMPRESS PARALLEL NOLOGGING;

The reader could note that the hist_avg_sales and sales_forecast are two real columns in the DIRECT_MARKETING table of NUMBER type, which are used in the description of the virtual column.

4.17 Using Compression and Partitioning

The following important issues apply to partitioning when using compression as well:

For heap-organized partitioned tables, compress some or all partitions using table compression.
The compression attribute can be declared for a tablespace, a table, or a partition of a table.
Whenever the compress attribute is not specified, it is inherited like any other storage attribute.

The following sample code creates a list-partitioned table with both compressed and uncompressed partitions. The compression attribute for the table and all other partitions is inherited from the tablespace level.

CREATE TABLE credential_evaluations
( eval_id VARCHAR2(16) primary key
, grad_id VARCHAR2(12)
, grad_date DATE
, degree_granted VARCHAR2(12)
, degree_major VARCHAR2(64)
, school_id VARCHAR2(32)
, final_gpa NUMBER(4,2))
PARTITION BY RANGE (grad_date)
SUBPARTITION BY HASH (grad_id) SUBPARTITIONS 8 STORE IN (T1,T2,T3,T4)
( PARTITION grad_e_70s
VALUES LESS THAN (TO_DATE('01-JAN-1980','DD-MON-YYYY')) TABLESPACE T1 COMPRESS
, PARTITION grad_date_80s
VALUES LESS THAN (TO_DATE('01-JAN-1990','DD-MON-YYYY')) TABLESPACE T2 COMPRESS
, PARTITION grad_date_90s
VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')) TABLESPACE T3 NOCOMPRESS
, PARTITION grad_date_00s
VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')) TABLESPACE T4 NOCOMPRESS)
ENABLE ROW MOVEMENT;

4.18 Using Partitioned Index Key Compression

The following recommendations apply to partitioned-index key compression, namely:

Compress some or all partitions of a B-tree index using key compression.
Key compression is applicable only to B-tree indexes.
Bitmap indexes are stored in a compressed manner by default.
An index using key compression eliminates repeated occurrences of key column prefix values, thus saving space and I/O.

This sample code creates a local partitioned index with all partitions except the most recent one compressed:

CREATE INDEX ndx_grad_date ON credential_evaluations (grad_date)
COMPRESS LOCAL
(
PARTITION grad_date_70s,
PARTITION grad_date_80s,
PARTITION grad_date_90s,
PARTITION grad_date_00s NOCOMPRESS
);

Likewise, it is not possible to specify COMPRESS (or NOCOMPRESS) explicitly for an index subpartition. The compression setting in a partition is inherited for a child subpartition attribute. Each index subpartition of a parent partition inherits its key compression setting.

4.19 Creating Range-Partitioned Index-Organized Tables

In general, it is possible to partition index-organized tables, and their secondary indexes, by the range method. The following sample code creates the range-partitioned index-organized table NEW_MKTG_CAMPAINGS. The INCLUDING clause specifies that all columns after period_code are to be stored in an overflow segment. There is one overflow segment for each partition, all stored in the same tablespace (T11). Optionally, OVERFLOW TABLESPACE is specified at the individual partition level, in which case some or all of the overflow segments could have separate TABLESPACE attributes.

CREATE TABLE new_mktg_campaigns
( campaign_id NUMBER(8)
, period_code INTEGER CONSTRAINT rck CHECK (period_code BETWEEN 1 AND 26)
, campaign_name VARCHAR2(20)
, projected_sales NUMBER(12,2)
, campaign_desc VARCHAR2(4000),
PRIMARY KEY (campaign_id, period_code)
) ORGANIZATION INDEX
INCLUDING period_code OVERFLOW TABLESPACE T11
PARTITION BY RANGE (period_code)
(PARTITION VALUES LESS THAN (10) TABLESPACE t1,
PARTITION VALUES LESS THAN (20) TABLESPACE t2 OVERFLOW TABLESPACE t9,
PARTITION VALUES LESS THAN (MAXVALUE) TABLESPACE t13);

4.20 Creating Hash-Partitioned Index-Organized Tables

Another option for partitioning index-organized tables is to use the hash method. In the following example, the FUTURE_MKTG_CAMPAINGS index-organized table is partitioned by the hash method.

CREATE TABLE future_mktg_campaigns
( campaign_id NUMBER(8)
, period_code INTEGER CONSTRAINT fnock CHECK (period_code BETWEEN 1 AND 26)
, campaign_name VARCHAR2(20)
, projected_sales NUMBER(12,2)
, campaign_desc VARCHAR2(2000),
PRIMARY KEY (campaign_id, period_code)
)
ORGANIZATION INDEX
INCLUDING period_code OVERFLOW TABLESPACE T11
PARTITION BY HASH (period_code)
PARTITIONS 8
STORE IN (T1,T2,T3,T4,T5,T6,T7,T8)
OVERFLOW STORE IN (T9,T10,T11);

4.21 Creating a List-Partitioned Index-Organized Table

The other option for partitioning index-organized tables is to use the list method.

CREATE TABLE current_mktg_campaigns
( campaign_id NUMBER(8)
, period_code INTEGER CONSTRAINT fpclst_ck
CHECK (period_code BETWEEN 1 AND 26)
, campaign_name VARCHAR2(20)
, projected_sales NUMBER(12,2)
, campaign_desc VARCHAR2(4000),
PRIMARY KEY (campaign_id, period_code))
ORGANIZATION INDEX
INCLUDING period_code OVERFLOW TABLESPACE T11
PARTITION BY LIST (period_code)
(PARTITION A VALUES (2, 4, 8, 10,12,14,16) TABLESPACE t12,
PARTITION B VALUES (1,3,5,7,9,11,13,15,17) TABLESPACE t14
OVERFLOW TABLESPACE t15,
PARTITION C VALUES (DEFAULT) TABLESPACE t10);

4.22 Creating Composite Interval-RangeHashList Partitioned Tables

In comparison to previous examples, the composite interval range-partitioned tables subpartitioned by any of range, hash or list provides a very powerful strategy in using a time domain, such as those associate with a DATE OR TIMESTAMP column. The following sample code shows the PRO_MARKETING_CAMPAIGNS table as interval-partitioned using monthly intervals on campaign_date, with hash subpartitions by period_code.

CREATE TABLE pro_marketing_campaigns
( campaign_id NUMBER(8)
, campaign_name VARCHAR2(20)
, campaign_date DATE
, period_code INTEGER CONSTRAINT fcopck CHECK (period_code BETWEEN 1 AND 26)
, projected_sales NUMBER(12,2)
, campaign_desc VARCHAR2(4000),
PRIMARY KEY (campaign_id, period_code))
PARTITION BY RANGE (campaign_date) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
SUBPARTITION BY HASH (period_code) SUBPARTITIONS 4
( PARTITION p_prior_2009 VALUES LESS THAN (TO_DATE('01-JAN-2009','dd-mon-yyyy')))
PARALLEL COMPRESS FOR ALL OPERATIONS;

4.23 Using Encryption in Partitioned Tables

While encryption is supported in partitioned tables, it is not possible to partitioned a table on an encrypted column under any partitioning strategy.

Exhibit 1. Partitioning Key and Encryption on the Same Column are Incompatible.[1]

[1] Removing the encryption option will allow for the table to be created.

4.24 Using Multiple Block Size Caches

When using multiple block size databases, i.e., those using more than one tablespace block size and associated buffer cache, there are many congruent features to take advantages of to a higher performance optimization.

Exhibit 2. Using multiple Block Size to Optimize Index Performance.

Creating indexes on a tablespace with a larger block size will increasing performance in DSS and in most OLTP scenarios.

The following sample code creates the CREDENTIAL_TABLES in the 8k block size T1,T2,T3,and T4 tablespaces, and local indexes on the 16k T18,T20,T22,T24 tablespaces, as cached respectively.

5. Partitioning- Related Data Dictionary Views

The following exhibit illustrates the list of data dictionary views that handle partitioning-related metadata.

Exhibit 3. Oracle Partitioning-related Data Dictionary Views

6. Maintenance Operations of Partitions and Subpartitions

Exhibit 4. Maintenance Operations of Subpartitions

Exhibit 5. Maintenance Operations of Index Partitions.

The following operations support the UPDATE INDEXES clause:

ADD PARTITION SUBPARTITION
COALESCE PARTITION SUBPARTITION
DROP PARTITION SUBPARTITION
EXCHANGE PARTITION SUBPARTITION
MERGE PARTITION SUBPARTITION
MOVE PARTITION SUBPARTITION
SPLIT PARTITION SUBPARTITION
TRUNCATE PARTITION SUBPARTITION

As of Oracle10g, SKIP_UNUSABLE_INDEXES is an initialization parameter with a default value of TRUE. This setting disables error reporting of indexes and index partitions marked UNUSABLE. To avoid choosing an alternative execution plan to evading the unusable elements, set this parameter to FALSE.
The following set of exhibits illustrates various scenarios of maintenance operations of partitions and subpartitions.

Exhibit 6. Adding a partition to a range partition table when MAXVALUE is not specified.

Exhibit 7. Coalescing Operation Reserved for the Hash Partitioning strategy.

Exhibit 8. The Coalescing Maintenance Operation is Reserved to the Hash Partitioning Strategy.

Exhibit 9. Truncating a Partitioning and Resetting the High Water Mark by Dropping the Storage for Exchange Preparedness.

Exhibit 10. Exchanging the Empty Partition with a Preloaded Table.

Exhibit 11. Exchanging the Empty Partition with a Preloaded Table and Updating the Indexes (Rebuild Operation).

Exhibit 12. Moving a Table Partition (When a Composite Partition is Being Used) is not Supported

Exhibit 13. Moving a Table Partition when Supported.

Exhibit 14. Moving two Table Partitions into a New System-Generated Partition.

Exhibit 15. System-Generated Merged Partition Uses Default Tablespace.

Exhibit 16. Partitions PSW and PSE are Merged into Partition PSS in the Same Tablespace.

The next exhibit shows the query and result set showing that the merged partition is set in the same tablespace as one of the merged partitions. In contrast, the previous example shows that the system-named partition was set in a default tablespace, since neither the target partition name nor the tablespace name was provided.

Exhibit 17. Merged Partitions List for Table REGIONAL_RENTALS.

Exhibit 18. Splitting a Range Partition with Date Datatype Partition Key.

Exhibit 19. Splitting a Range Partition with Date Partition Key.

Exhibit 20. Truncating a Table Partition with Various Options.

Exhibit 21. Rebuilding a Global Index Partitions.

7. Manageability

DBAs can use Oracle Enterprise Manager Database and Grid Control to create, maintain, and verify accuracy of SQL, using the Schema tag and then selecting the desired partitioning options on the relevant object, namely, tables, indexes, and index-organized tables. There is extensive support to use standard, unstructured, and user-defined datatypes.

Exhibit 22. Manageability Applied to Partitioned Index-Organized Tables.

Exhibit 23. Creating a Sample Partitioned Index-Organized Table using Database Control.

8. Partitioning to Attain the Best Availability, Manageability and Performance

The following are valid partitioning options to improve availability, manageability and optimize performance, namely:

8.1 Partition Pruning

The following concerns regarding partition pruning can be highlighted:

Partition pruning is a foundational performance feature to both DSS and OLTP, enabling the Oracle Database to perform operations only on those partitions that are relevant to the SQL.
The optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions.

Partition pruning greatly optimizes time and resources when retrieving data from disk, thus improving query performance.

When partitioning an index and a table on different columns (with a global partitioned index), then partition pruning also eliminates index partitions even when the partitions of the underlying table cannot be eliminated.

Either static or dynamic pruning could be used, depending on SQL statement.

Static pruning occurs at compile-time, with the information about the partitions accessed beforehand while dynamic pruning occurs at run-time.

Partition pruning affects the statistics of the objects involved and therefore also the execution plan of the statement.
Oracle Database prunes partitions when using range, LIKE, equality, and IN-list predicates on the range or list partitioning columns, and when using equality and IN-list predicates on the hash partitioning columns.

When using composite partitioned objects, Oracle can prune at both levels using the relevant predicates.

8.2 Partition-Wise Joins

The next paragraph asserts the most important issues regarding partition-wise joins, as follows:

Partition-wise joins minimize query response time by reducing the amount of data exchanged among parallel execution servers when joins execute in parallel, thus reducing response time and improving the use of both CPU and memory resources.
Oracle Database can perform partial partition-wise joins only in parallel.
Unlike full partition-wise joins, partial partition-wise joins require partitioning only one table on the join key.
The partitioned table is referred to as the reference table. The other table may or may not be partitioned. Partial partition-wise joins are more common than full partition-wise joins.
To execute a partial partition-wise join, the database dynamically repartitions the other table based on the partitioning of the reference table. Then, the execution becomes similar to a full partition-wise join.
In Oracle Real Application Clusters (RAC) environments, partition-wise joins also avoid or at least limit the data traffic over the interconnect, which is the key to achieving good scalability for massive join operations.

The performance advantage that partial partition-wise joins have over joins in non-partitioned tables is that the reference table is not moved during the join operation.

The parallel joins between non-partitioned tables require both input tables to be redistributed on the join key. This redistribution operation involves exchanging rows between parallel execution servers.

This is a CPU-intensive operation that can lead to excessive interconnect traffic in RAC environments.

8.3 Full Partition-Wise Joins

A full partition-wise join divides a large join into smaller joins between a pair of partitions from the two joined tables. To use this feature, you must equipartition both tables on their join keys, or use reference partitioning. For example, consider a large join between the DIRECT_MARKETING table and the CUSTOMERS table on cust_id.

SELECT c.cust_lname COUNT(*)
FROM direct_marketing dm, customers c
WHERE dm.cust_id = c.cust_id
AND dm.campaign_init_date = c.campaign_date
AND dm.campaign_date BETWEEN TO_DATE('01-JUL-2008', 'DD-MON-YYYY') AND
(TO_DATE('01-OCT-2008', 'DD-MON-YYYY'))
GROUP BY c.cust_lname HAVING COUNT(*) > 200;

Furthermore, the most commonly highlights are listed below:

Partition-wise joins reduce query response time and optimizing CPU and memory resources by minimizing the amount of data exchanged among parallel execution servers when joins execute in parallel.
In RAC environments, partition-wise joins also avoid or at least limit the data traffic over the interconnection, which is the key to achieving good scalability for massive joins.

To avoid remote I/O, both matching partitions should have affinity to the same node.
Partition pairs should be spread over all nodes to use all CPU resources available and avoid bottlenecks.
Nodes can host multiple pairs when there are more pairs than nodes, e.g., for an 8-node system and 16 partition pairs, each node receives two pairs.

8.3.1 Full Partition-Wise Joins: Composite - Single-Level

This method is a variation of the single-level - single-level method. In this scenario, one table (typically the larger table) is composite partitioned on two dimensions, using the join columns as the subpartition key.

8.3.2 Partial Partition-Wise Joins: Single-Level Partitioning

The number of partitions determines the maximum degree of parallelism, because the partition is the smallest granule of parallelism for partial partition-wise join operations.

8.3.3 Full Partition-Wise Joins: Composite - Composite

When necessary, it is possible to also partition a table by a composite method.
It is possible to get full partition-wise joins on all combinations of partition and subpartition partitions: partition - partition, partition-subpartition, subpartition-partition, and subpartition-subpartition.

8.4 Index Partitioning

The rules for partitioning indexes are similar to those for tables:

An index can be partitioned unless:

- The index is a cluster index.

- The index is defined on a clustered table.

It is possible to mix partitioned and nonpartitioned indexes with partitioned and nonpartitioned tables:
A partitioned table can have partitioned or nonpartitioned indexes.
A nonpartitioned table can have partitioned or nonpartitioned indexes.
Bitmap indexes on nonpartitioned tables cannot be partitioned.
A bitmap index on a partitioned table must be a local index.

Nonprefixed indexes are particularly useful in historical databases.

The three Oracle-supported Local Index partitioning types are:

8.4.1 Local Partitioned Indexes

In a local index, all keys in a particular index partition refer only to rows stored in a single underlying table partition. A local index is created by specifying the LOCAL attribute. Other important aspects of local partitioned indexes are as follows:

Oracle constructs the local index so that it is equipartitioned with the underlying table.

Oracle also maintains the index partitioning automatically when partitions in the underlying table are added, dropped, merged, or split, or when hash partitions or subpartitions are added or coalesced, ensuring that the index remains equipartitioned with the table.

A local index can be created UNIQUE if the partitioning columns form a subset of the index columns. This restriction guarantees that rows with identical index keys always map into the same partition, where uniqueness violations can be detected.
Only one index partition needs to be rebuilt when a maintenance operation other than SPLIT PARTITION or ADD PARTITION is performed on an underlying table partition.
The duration of a partition maintenance operation is proportional to partition size.
Local indexes support partition independence.
Local indexes support smooth roll-out of old data and roll-in of new data in historical tables.
Oracle can take advantage of the fact that a local index is equipartitioned with the underlying table to generate improved query access plans.
Local indexes simplify the task of tablespace incomplete recovery. In order to recover a partition or subpartition of a table to a point in time, the corresponding index entries must be recovered to the same point in time.
Oracle Database PL/SQL Packages and Types Reference for a description of the DBMS_PCLXUTIL package

8.4.1.1 Local Prefixed Indexes

A local index is prefixed if it is partitioned on a left prefix of the index columns.

8.4.1.2 Local Nonprefixed Indexes

A local index is nonprefixed if it is not partitioned on a left prefix of the index columns. Therefore, it is not possible to have a unique local nonprefixed index unless the partitioning key is a subset of the index key.

8.4.2 Global Partitioned Indexes

In a global partitioned index, the keys in a particular index partition may refer to rows stored in more than one underlying table partition or subpartition. It is also possible to highlight the following features:

A global index can be range or hash partitioned, though it can be defined on any type of partitioned table.

A global index is created by specifying the GLOBAL attribute.
Index partitions can be merged or split as necessary.
Normally, a global index is not equipartitioned with the underlying table and usually nothing could prevent this. An index that must be equi-partitioned with the underlying table should be created as LOCAL.
A global partitioned index contains a single B-tree with entries for all rows in all partitions. Each index partition may contain keys that refer to many different partitions or subpartitions in the table.
The highest partition of a global index must have a partition bound all of whose values are MAXVALUE.

The following are distinctive features differentiating global indexes on prefixing, namely:

A global partitioned index is prefixed if it is partitioned on a left prefix of the index columns.
Global prefixed partitioned indexes can be unique or nonunique.
Nonpartitioned indexes are treated as global prefixed nonpartitioned indexes.loba

The most practical aspects of global partitioned indexes are emphasized in the following statements:

Global partitioned indexes are harder to manage than local indexes.
When the data in an underlying table partition is moved or removed (SPLIT, MOVE, DROP, or TRUNCATE), all partitions of a global index are affected. Thus, global indexes do not support partition independence.
When an underlying table partition or subpartition is recovered to a point in time, all corresponding entries in a global index must be recovered to the same point in time. Because these entries may be scattered across all partitions or subpartitions of the index, mixed in with entries for other partitions or subpartitions that are not being recovered, there is no way to accomplish this except by re-creating the entire global index.
When deciding how to partition indexes on a table, consider the mix of applications that need to access the table.
There is normally a trade-off between performance and availability, and manageability.

8.5 Guidelines to Index Partitioning

The most important guidelines for OLTP and DSS are listed below respectively, as follows:

8.5.1 For OLTP applications

Global indexes and local prefixed indexes provide improved performance over local non-prefixed indexes because they minimize the number of index partition probes.
Local indexes support more availability when there are partition or subpartition maintenance operations on the table.
Local non-prefixed indexes are very useful for historical databases.

8.5.2 For DSS applications

Local non-prefixed indexes can improve performance because many index partitions can be scanned in parallel by range queries on the index key.
For historical tables, indexes should be local if possible. This limits the impact of regularly scheduled drop partition operations.
Unique indexes on columns other than the partitioning columns must be global because unique local non-prefixed indexes whose key does not contain the partitioning key are not supported.

8.6 Tuning and Mixing objects in Multiple Block Size Database Models

The option to set index in larger block size tablespaces and caches has proven to greatly improve performance in Decision Support Systems (DSS), as introduced in the author’s paper “OMBDB, Oracle Multiple Block Size Databases: An Innovative Paradigm for Datawarehousing Architectures”, as verified by several field researchers and leading consultants. This option has also performed quite well in many Online Transaction Processing (OLTP) Systems.

9. Partitioning and Table Compression

When using partitioning and compression options together, it is possible to use compress in most cases, and the clause COMPRESS FOR ALL OPERATIONS can be specified at table creation time. Partitions can be individually set as compressed or not compressed, as needed.

Likewise, when using table compression on partitioned tables with bitmap indexes, the DBA needs to do the following before introducing the compression attribute for the first time:

1. Mark bitmap indexes unusable.
2. Set the compression attribute.
3. Rebuild the indexes.

10. Partition Strategy Recommendations

10.1 When to Use Range or Interval Partitioning

Range partitioning is a convenient method for partitioning historical data. Besides, other reasons include:

The boundaries of range partitions define the ordering of the partitions in the tables or indexes.
Interval partitioning is an extension to range partitioning in which, beyond a point in time, partitions are defined by an interval. Interval partitions are automatically created when the data is inserted into the partition.
Range or interval partitioning is often used to organize data by time intervals on a column of type DATE.

For instance, keeping the past 48 months’ worth of data online, Range partitioning simplifies this process. To add data from a new month, the DBA will load it into a separate table, clean it, index it, and then add it to the range-partitioned table using the EXCHANGE PARTITION statement, all while the original table remains online. After adding the new partition, the DBA can drop the trailing month with the DROP PARTITION statement.

10.2 When to Use Hash Partitioning

There are scenarios when it is not trivial into which partition data should reside, although the partitioning key can be identified. With hash partitioning, a row is placed into a partition based on the result of passing the partitioning key into a hashing algorithm. The next guidelines should be followed carefully:

When using this approach, data is randomly distributed across the partitions rather than grouped together.
Hence, this is a great approach for some data, but may not be an effective way to manage historical data.
Partition pruning is limited to equality predicates.
Hash partitioning also supports partition-wise joins, parallel DML and parallel index access.
Hash-partitioning is beneficial when the DBA needs to enable partial or full parallel partition-wise joins with very likely equi-sized partitions or distribute data evenly among the nodes of an MPP platform using RAC, thus minimizing interconnect traffic when processing internode parallel statements.

10.3 When to Use List Partitioning

It is recommended to use list partitioning when you want to specifically map rows to partitions based on discrete values.

10.4 When to Use Composite Partitioning

Composite partitioning offers the benefits of partitioning on two dimensions. From a performance perspective, it benefits from partition pruning on one or two dimensions depending on the SQL statement, taking advantage of both full or partial partition-wise joins on either dimension, as needed.

It can benefit from parallel backup and recovery of a single table (manageability perspective).

The DBA can split up backups of your tables and you can decide to store data differently based on identification by a partitioning key.

The database stores every subpartition in a composite partitioned table as a separate segment.

Thus, the subpartitions may have properties that differ from the properties of the table or from the partition to which the subpartitions belong.

10.5 When to Use Composite Range-Hash Partitioning

Composite range-hash partitioning is particularly common for tables that store history, are very large as a result, and are frequently joined with other large tables. The following are relevant issues, namely:

Composite range-hash partitioning provides the benefit of partition pruning at the range level.
Opportunity to perform parallel full or partial partition-wise joins at the hash level. Specific cases can benefit from partition pruning on both dimensions for specific SQL statements.
Composite range-hash partitioning can also be utilized for tables that traditionally use hash partitioning, but also use a rolling window approach.

10.6 When to Use Composite Range-List Partitioning

Composite range-list partitioning is mostly used for large tables that store historical data and are usually accessed on more than one dimension.

10.7 When to Use Composite Range-Range Partitioning

Composite range-range partitioning is helpful for applications that store time-dependent data on more than one time dimension. Likewise, business cases for composite range-range partitioning could include ILM scenarios, and applications that store historical data and need to categorize its data by range on another dimension.

10.8 When to Use Composite List-Hash Partitioning

Composite list-hash partitioning is applicable to large tables that are usually accessed on one dimension, but because of their size need yet to take advantage of parallel full or partial partition-wise joins.

10.9 When to Use Composite List-List Partitioning

Composite list-list partitioning is helpful for large tables that are often accessed on different dimensions. The DBA can explicitly map rows to partitions on those dimensions on the basis of discrete values.

10.10 When to Use Composite List-Range Partitioning

Composite list-range partitioning is advantageous for large tables that are accessed on different dimensions. For the most commonly used dimension, the DBA can explicitly map rows to partitions on discrete values. In general, list-range partitioning is likely to be used for tables that use range values within a list partition; in contrast range-list partitioning is mostly used for discrete list values within a range partition. Besides, list-range partitioning is less likely to be used to store historical data, although equivalent scenarios all work. Range-list partitioning can be implemented using interval-list partitioning, while list-range partitioning does not support interval partitioning.

10.11 When to Use Interval Partitioning

Interval partitioning can be used for every table that is range partitioned and uses fixed intervals for new partitions. The database automatically creates interval partitions as data for that partition is loaded. Until this happens, the interval partition exists but no segment is created for the partition.

The benefit of interval partitioning is that there is no need to create your range partitions explicitly. Therefore, a DBA could consider using interval partitioning unless there is a need to create range partitions with different intervals, or a need to specific partition attributes when creating range partitions. When upgrading an application it is recommended to use range partitioning or composite range-rangehashlist partitioning, accordingly.

10.12 When to Use Reference Partitioning

Reference partitioning is effective in the following scenarios:
When denormalizing or planning to denormalize, a column from a master table into a child table in order to get partition pruning benefits on both tables.
If two large tables are joined often, then the tables are not partitioned on the join key, but you want to take advantage of partition-wise joins.
Indeed, reference partitioning implicitly enables full partition-wise joins.
If data in multiple tables has a related life cycle, then reference partitioning can provide significant manageability benefits.
Partition management operations against the master table are automatically cascaded to its descendents. For example, when adding a partition to the master table, that creation is automatically propagated to all its descendents.
In order to use reference partitioning, the DBA has to enable and enforce the foreign key relationship between the master table and the reference table in place.
It is also possible to cascade reference-partitioned tables based on the data model used.

10.13 When to Partition on Virtual Columns

Virtual column partitioning enables partitioning on an expression, which may use data from other columns, and perform calculations with these columns.

The most relevant highlights are:

There is no support for PL/SQL function calls on a virtual column definitions as a partitioning key.
Virtual column partitioning supports all partitioning methods as well as performance and manageability features.
Virtual columns could be used when tables are frequently accessed using a predicate that is not directly captured in a column, but can be derived, in order to get partition pruning benefits.
The virtual column does not require any storage.

11. Oracle Database Partitioning for ILM

The Oracle Database Partitioning option provides an uniquely ideal platform for implementing an ILM solution offering:

11.1 Application Transparency

There is no need to customize applications
Data can easily be moved and accessed at the different stages of its lifecycle.
Flexibility required to quickly adapt to any new regulatory compliance.

11.2 Fine-grained

View data at a very fine-grained level as well as group related data together, whereas storage devices only see bytes and blocks.

11.3 Low-Cost

Low cost storage is a key factor in implementing ILM.

11.4 Enforceable Compliance Policies

It is imperative to show to regulatory bodies that data is being retained and managed in accordance with the regulations defining security and audit policies, which enforce and log all access to data. In general, Enforceable Compliance Policies where Oracle Partitioning becomes mission-critical involve:

Data Retention

Immutability
Privacy
Auditing
Expiration

12. Oracle Partitioning for Datawarehousing

Datawarehouses often require techniques both for managing large tables and providing good query optimization.

Oracle Partitioning is beneficial in attaining the following Datawarehousing goals, namely:

12.1 Scalability

Partitioning is effective scaling a data warehouse by dividing database objects into smaller pieces, enabling access to smaller, more manageable objects. Providing direct access to smaller objects addresses the scalability requirements of data warehouses:

Bigger Database
Bigger Individual tables: More Rows in Tables
More Users Querying the System
More Complex Queries.

12.2 Performance

12.2.1 Partition Pruning

Partition pruning is an essential performance feature since the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list.

Besides, partition pruning greatly reduces the amount of data retrieved from disk and shortens processing time, thus improving query performance and optimizing resource utilization.

12.2.1.1 Basic Partition Pruning Techniques

The optimizer utilizes a wide variety of predicates for pruning. The three predicate types, equality, range, and IN-list, are the most commonly used cases of partition pruning.

12.2.1.2 Advanced Partition Pruning Techniques

Oracle also prunes in the presence of more complex predicates or SQL statements involving partitioned tables. For instance, when a partitioned table is joined to the subset of another table, constrained by a WHERE clause condition.

12.3 Manageability

Manageability is greatly improved of the usage of Oracle Enterprise Manager Database and Grid Control and the usage of supplied packages.

13. Partitioning Materialized Views

The underlying storage for a materialized view is a table structure, and therefore partitioning materialized views is quite similar. When the database rewrites a query to run against materialized views, the query can take advantage of the same performance features as those queries running against tables MV’s directly benefit from. Similarly, a rewritten query may eliminate materialized view partitions and it can take advantage of partition-wise joins, when joins back to tables or with other materialized views are necessary.
The next sample code illustrates how to effectively create a compressed materialized view partitioned by hash, which uses an aggregation on period_code.

Exhibit 24. SQL DDL Creating a Partition Materialized View

13.1 Partition Exchange Load (PEL)

Partitions can be added using Partition Exchange Load (PEL). When using PEL, a separate identical table to a single partition is created, including the same indexes and constraints, if any.

13.2 Partitioning and Materialized View Refresh Strategies

Full refresh
Fast (incremental) refresh based on materialized view logs against the base tables
Manually using DML, followed by ALTER MATERIALIZED VIEW CONSIDER FRESH

Besides, to enable query rewrites, set the QUERY_REWRITE_INTEGRITY initialization parameter.

Likewise, in order to manually keep materialized views up to date, the init.ora parameter QUERY_REWRITE_INTEGRITY must be set to either TRUSTED or STALE_TOLERATED. When using materialized views and base tables with comparable partitioning strategies, then PEL can be an extremely powerful way to keep materialized views up-to-date manually. Here is how PEL can work:

Create tables to enable PEL against the tables and materialized views
Load data into the tables, build the indexes, and implement any constraints
Update the base tables using PEL
Update the materialized views using PEL.

Besides, there is a need for a DBA to execute ALTER MATERIALIZED VIEW CONSIDER FRESH for every materialized view updated using this strategy.
Likewise, partitioning also effectively addresses OLTP features and characterization such as, namely:

Short response time
Small transactions
Data maintenance operations
Large user populations
High concurrency
Large data volumes
High availability
Lifecycle related data usage

13.3 Storage Management

The following approaches can be used, namely:

13.3.1 High Availability: Implementing storage redundancy.

Hardware-based mirroring
Using ASM for mirroring
Software-based mirroring not using ASM.

13.3.2 Performance: optimum throughput from storage devices, multiple disks must work in parallel.

Hardware-based striping
Software-based striping using ASM
Software-based striping not using ASM

13.3.3 ILM

In an Information Lifecycle Management environment, it is not possible to use striping across all devices, because all data would then be distributed across all storage pools, in contrast with different storage pools typically involving different performance characteristics.

13.3.4 Partition Placement

The following scenarios imply special considerations for partition placement, namely:

Using Bigfile Tablespaces (due to one actual big file is used.)
Customization (since other Oracle options are used in conjunction with it.)
Oracle Exadata (a new VLDB Oracle database infrastructure with superfast performance capabilities. )

14. Partitioning Support for LOB

Oracle Partitioning support the storage of LOB types, such as BLOBs and BFILE types, and user-defined object datatypes.

The next exhibits illustrate some sample code involving large objects (LOBs) and user-defined object datatypes, as shown:

Exhibit 25. One option for LOB support can be attained via Index Organized Tables, storing LOBs in a separate tablespace.

Exhibit 26. Partitioning Support for a CLOB Stored as Basicfile with Chunk and Retention set.

Exhibit 27. Partitioning Support for Object User-Defined Datatypes.

Exhibit 28. Partitioning Support for Object User-Defined Datatypes using a primary key (see next 2 exhibits).

Exhibit 29. Partitioning Support for Object User-Defined Datatypes using the object Identifier as a primary key.

Exhibit 30. Partitioning Support for Object User-Defined Datatypes using the object Identifier as a primary key.

Exhibit 31. Partitioning Support for Nested Tables using Object User-Defined Datatypes

Exhibit 32. Partitioning Support for Varrays.

Exhibit 33. Partitioning Support for XML.

15. Best Practices

In relation to best practices, it could be relevant to apply important considerations, tips and techniques, and specific constraints.

15.1 Special Considerations

Use Oracle partitioning strategic recommendations for each database system environment accordingly.
When in doubt refer to sample code, forum discussions, and case studies.
Consolidate recommendations made in this presentation into a practical enterprise policy framework.
From the business and functional point of view, a partitioning strategy is normally identified with a functional goal-seeking perspective, and therefore it needs to be mapped to an Oracle partitioning technical recommendation or specific partitioning strategy matching those business requirements, regulatory compliance , or system’s platform, among others.

15.2 Tips and Techniques

Using multiple block size caches can increase load throughput in DSS, in particular, when using indexes in a block size larger than the table.
This is more important volumes are based on a (Stripe and mirror everything) SAME-approach (i.e., RAID 0+1).
Likewise, performance optimization and contention reduction can be attained in OLTP systems using the same approach, when the appropriate partitioning strategy is being used, in accordance to the strategic recommendations previously made.

15.3 Constraints

As previously stated, there is no support for LONG and LONG RAW data types on any Oracle partitioned object or any partitioning strategy discussed.
Likewise, an encrypted column cannot serve as partitioning key.
When migrating to Oracle11g or any other recent release, consider changing LONG and LONG RAW datatypes into CLOB, BLOB accordingly for current and future release forward compatibility and improved manageability.
A VARRAY of XML data types cannot be set in a partitioned table (via an SQL DDL statement.)
Certain datatypes have size and storage constraints such as LOBs or large VARCHAR2 definitions.

16. Concluding Remarks

Upon completion of this resarch study, the author has arrived to the following remarkable conclusions based on his experience in the field:

Oracle partitioning provides effective strategies to attain time and resource optimization, including CPU and memory.

Oracle Partitioning option is extremely practical to achieve regulatory compliance.
Oracle partitioning is mission-critical to attain most needed scalability, manageability, performance, and high-availability in any system platform.
Oracle partitioning can easily mix with other technologies, in particular, with multiple block size caches, which enable performance optimization by allowing optimal usage of I/O and memory resources with no contention.

Translate

Sunday, July 19, 2009

Tuesday, June 16, 2009

A Comprehensive Guide to Oracle Partitioning with Samples