**Estimating the Optimal Backup Parallel Degree**

The postulates presented here are in the public domain. However, the hypotheses and theorems on which the postulates are based are not in the public domain and will be copyrighted under the ADN Research logo.

Derived from postulate 2, there is a need to establish a mechanism to enhance performance: identifying the optimal degree of parallelism to be used. In RMAN technology, this is specified when configuring the default settings used by an RMAN operation, or by specifying it in an RMAN script.
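As a sketch, parallelism can be set either in the persistent RMAN configuration or per script; the degree of 4 below is only a placeholder value, not a recommendation:

```sql
-- Persistent default: every disk backup uses 4 channels
CONFIGURE DEVICE TYPE DISK PARALLELISM 4 BACKUP TYPE TO BACKUPSET;

-- Or per script: allocate channels explicitly inside a RUN block
RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c2 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c3 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c4 DEVICE TYPE DISK;
  BACKUP DATABASE;
}
```

The persistent setting applies to all subsequent backups; the `RUN` block overrides it for a single operation.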

A rather informal study suggests that the minimum production parallel degree should be at least 4, simply because there is significant improvement over a lower degree such as 1 or 2. However, identifying the optimal parallel degree in a scenario with several huge files (outliers by size) requires a statistical study or a heuristic mathematical formulation. A statistical study should produce a confidence interval at 95% confidence, using a sample model where the mean and the variance are known.

In my experience, a mathematical formulation is possible based on the variance (or standard deviation) and the average size of all files being backed up.

A mathematical formulation proposed for this range should consider the following formula:

where σ is the standard deviation of the data file sizes and k = 1, 2, 3, …, i.e., a positive integer. This means that an appropriate parallel degree should be established on the basis of the variability of the data file sizes rather than on the number of files being backed up.
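As a minimal sketch of this idea in Python: the exact formula is not reproduced above, so the mapping from k·σ to a candidate degree below is an assumption for illustration only, not the author's formulation. What the sketch does show is that the candidate degree depends on the size variability, not on the file count:

```python
import statistics

def parallel_degree_candidates(sizes_gb, k_values=(1, 2, 3)):
    """Return candidate parallel degrees keyed by k.

    Hypothetical mapping: scale k * sigma by the mean size, and never
    go below the informal production minimum of 4. This clamping rule
    is an illustration, not the postulate's exact formula.
    """
    sigma = statistics.pstdev(sizes_gb)  # population standard deviation
    mean = statistics.mean(sizes_gb)
    return {k: max(4, round(k * sigma / max(mean, 1))) for k in k_values}
```

For a uniform set of file sizes, σ = 0 and every candidate collapses to the minimum of 4; adding a few outlier files raises the candidates for larger k.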

I can summarize this new postulate as follows:

**Postulate 3:**

There exists a parallel degree such that backup performance is optimized on the basis of duration. The parallel degree can be established within a closed interval derived from the variability of all (N) data files' sizes. Furthermore, an additional adjustment could take place when the ratio of the largest file to the largest small file is significant, e.g., larger than 1000; such an adjustment could be based on that ratio. In Oracle technology, this is applicable regardless of whether a data file belongs to a SMALLFILE or BIGFILE tablespace, i.e., whether a tablespace allows several data files or only one. Similarly, a parallel degree can be established through a statistical model derived from a sampling model utilizing a 95% confidence interval, where data file sizes are used as input and specific transformations, such as logarithmic transformations, are applied to the model in order to attain reasonable values for the expected parallel degree range,

where k is a positive integer greater than or equal to 2, and σ is the standard deviation of the population of all (N) data files' sizes.

Exhibit. An example to estimate the optimal degree of parallelism (as in an optimal economic model of returns) for a database with 100 files of size 1 GB, 100 files of size 2 GB, 100 files of size 4 GB, 100 files of size 8 GB, 50 files of size 16 GB, 10 files of size 32 TB, 4 files of size 64 TB, and 2 files of size 128 TB.
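The descriptive statistics the postulate relies on for this exhibit can be computed directly. The sketch below assumes binary units (1 TB = 1024 GB) and treats the TB-sized files as the stated outliers; the threshold of 16 GB for "small" files simply reflects the largest GB-sized group in the exhibit:

```python
import statistics

TB = 1024  # GB per TB (assumption: binary units)

# File population from the exhibit, expressed in GB.
sizes = ([1] * 100 + [2] * 100 + [4] * 100 + [8] * 100 + [16] * 50
         + [32 * TB] * 10 + [64 * TB] * 4 + [128 * TB] * 2)

n = len(sizes)                    # N data files in the population
sigma = statistics.pstdev(sizes)  # population standard deviation (GB)
largest = max(sizes)
largest_small = max(s for s in sizes if s <= 16)  # largest non-outlier file
ratio = largest / largest_small   # postulate 3's adjustment applies if > 1000

print(f"N={n}, sigma={sigma:.1f} GB, largest/largest-small ratio={ratio:.0f}")
```

Here the ratio of the largest file to the largest small file is 128 TB / 16 GB = 8192, well beyond the 1000 threshold, so the adjustment described in postulate 3 would apply.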

From postulate 3, it is possible to suggest that the optimal value for the FILESPERSET RMAN parameter is either 1 or 2, in order to also minimize seek time for restore operations. However, MAXSETSIZE should probably either be left at its default (UNLIMITED) or be controlled by the mean size of all data files involved, excluding the outlier files. A different approach, with future capacity planning in mind, could use double the size of the largest file among the small files.
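These two settings can be sketched in RMAN as follows; the FILESPERSET value of 1 and leaving MAXSETSIZE unlimited are the choices discussed above, not universal prescriptions:

```sql
-- Leave the set size uncapped (the default), per the discussion above
CONFIGURE MAXSETSIZE TO UNLIMITED;

RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  -- One data file per backup set, so a restore of a single
  -- file reads only that file's set
  BACKUP DATABASE FILESPERSET 1;
}
```

With FILESPERSET 1, restoring a single data file never has to scan past other files multiplexed into the same backup set, which is the seek-time argument made above.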
