An Open Call to Study and Research
“the Backup Fatigue Syndrome”
During my recent NYOUG presentation at St. John's University, I referred to the relevance of sorting data files by size in relation to backup performance, and to the performance degradation that occurs when significantly larger files are left at the end of the backup, which I will hereby refer to as “the backup fatigue syndrome.” This brief article is a summary of informal statistics on this problem, which I have never formalized into rigorous statistical research. It appears that this syndrome is associated with various factors such as, but not limited to, logic inherent to OS-level I/O, logic associated with generic backupset-driven technology, and the inefficiency of large storage devices in general. The combination of such factors, among others, is particularly critical to the appearance of the backup fatigue syndrome.
For the past twenty years or so, I have observed various scenarios where a backup operation took somewhat longer, or much longer, than expected. These scenarios involved not only various instances of Oracle database backup but also OS-level backups on Windows, Linux, macOS, and Solaris, as well as network-driven backups, and different media types, essentially tape or disk.
These observations led me to systematic testing to verify that the order by size in which files or backup pieces are used to create the corresponding backup sets has a significant impact on the backup duration. This could simply mean that a backup operation is not necessarily commutative with respect to content (data file) size, and that the order (by size) in which data files are backed up does result in different durations; in particular, when significantly large files (huge in comparison to the rest of the data files in the backup) are placed at the end of the backupset, or last in the order in which files are backed up. This means that such an abstraction would not allow me to define an Abelian group, as a mathematical analogy; or more exactly, a class or abstract object that defines such a group, since the order by size does have an impact. Comparing mathematical concepts is quite practical here, although it may appear inappropriate to most people; the toy model below illustrates the point.
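As a minimal sketch of this non-commutativity, consider the following toy model in Python (my own simplification for illustration, not taken from RMAN or any actual backup tool): files are dispatched, in a given order, to whichever parallel channel frees up first, and the total duration is the finish time of the busiest channel. The file sizes, throughput, and channel count are assumed values.

    import heapq

    def makespan(file_sizes_gb, channels=2, throughput_gb_per_min=1.0):
        # Dispatch each file, in the given order, to the channel that
        # becomes free first; return the finish time of the busiest channel.
        finish = [0.0] * channels          # per-channel finish times
        heapq.heapify(finish)
        for size in file_sizes_gb:
            earliest = heapq.heappop(finish)
            heapq.heappush(finish, earliest + size / throughput_gb_per_min)
        return max(finish)

    files = [1, 1, 1, 1, 1, 1, 100]        # many small files, one huge file
    print(makespan(files))                          # huge file last:  103.0
    print(makespan(sorted(files, reverse=True)))    # huge file first: 100.0

The same multiset of files yields different durations depending purely on the order, which is exactly why the duration of a backup fails the commutativity analogy above.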
The purpose of this note is to encourage comprehensive research on this topic. Explicitly, it is important to determine the following:
1. The proportion (of small to large files), if any, at which backup latency appears and results in exponential degradation when only a serial channel is used, and in nearly exponential degradation when parallelism is used.
2. A method to determine such a proportion of small files to large files, and the proportion between the number of small files and the number of large files; for instance, in Oracle, the number of SMALLFILE tablespace datafiles in comparison to the number of BIGFILE tablespace datafiles. The proportion could be established via an actual mathematical ratio, a statistical index or regression model, or a stochastic model with controlled probability via a Bayesian, Markovian, or Lévy process.
3. The smoothing of the degradation impact as various levels of parallelism are used or increased. In Oracle RMAN technology, this explicitly relates to the actual number of channels used out of the number configured.
4. The average ratio of the (rather small) number of large files to the (much larger) number of small files that does not cause the backup fatigue syndrome to occur; otherwise said, whether there is a specific generalized proportion that one can use without the degradation occurring.
5. A (formula-based) method to determine a factor or index that properly estimates the appropriate or custom level of parallelism for each backup operation, as needed; a minimal sketch of one candidate heuristic follows this list.
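As a starting point for item 5, one candidate heuristic (my own suggestion, not an existing RMAN feature) falls out of the toy model shown earlier: the duration can never drop below max(largest_file, total_size / channels), so allocating channels beyond total_size / largest_file buys nothing, because the largest file alone sets the floor.

    import math

    def suggested_channels(file_sizes_gb, max_channels=16):
        # Cap parallelism at the point where the largest file starts
        # to dominate the overall backup duration.
        total = sum(file_sizes_gb)
        largest = max(file_sizes_gb)
        useful = max(1, math.floor(total / largest))
        return min(useful, max_channels)

    print(suggested_channels([10] * 10 + [50]))   # total=150, largest=50 -> 3

Whether such a simple index survives contact with real storage behavior is exactly the kind of question the proposed research should settle.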
Based on analysis derived from a good number of observations while running RMAN and OS-level backups where significantly larger files were left at the end of the backup or backup set, it is possible to postulate the “Backup Fatigue Syndrome” as follows:
Postulate I
When a significantly large file is left at the end of a backup, after a comparatively very large number of much smaller files has already been backed up, there will be a significant degradation in backup performance, and the backup duration will appear to increase exponentially.
Postulate II
Based on consistent, replicated observations, if the order of data files by size is an important factor in attaining optimal backup performance, then for any backup operation there exists an optimal backup order by size, such that the backup duration is minimized.
The following observations are made in relation to the parallel degree used, namely:
· For serial backups (parallel degree 1), based on a good number of observations, sorting the data files by decreasing size should produce an optimal duration for backup operations, under similar environmental factors such as processor, memory, operating system, storage, and various others.
· For parallel backups (parallel degree 2 or greater), based on a good number of observations, distributing the data files, sorted by decreasing size, across the channels so that the resulting backupsets remain of comparable size should produce an optimal duration for backup operations, under similar environmental factors such as processor, memory, operating system, OS clustering, database clustering (RAC, HARD, or other similar), storage, and various others; a sketch of this distribution follows.
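The second bullet corresponds to what scheduling theory calls the longest-processing-time-first (LPT) heuristic. A hedged sketch, again with assumed illustrative sizes rather than measured ones:

    def distribute(file_sizes_gb, channels=2):
        # Sort files by decreasing size and always hand the next file
        # to the least-loaded channel, keeping backupsets balanced.
        loads = [0.0] * channels
        groups = [[] for _ in range(channels)]
        for size in sorted(file_sizes_gb, reverse=True):
            idx = loads.index(min(loads))   # least-loaded channel so far
            loads[idx] += size
            groups[idx].append(size)
        return groups, loads

    groups, loads = distribute([100, 40, 35, 30, 10, 10, 5], channels=2)
    print(groups)   # [[100, 10, 5], [40, 35, 30, 10]]
    print(loads)    # [115.0, 115.0]

Here the huge file is started immediately and the small files fill in around it, so neither channel idles at the end while the other grinds through the largest file.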
Based on these postulates, it is important to consider expanding RMAN backup capabilities, as well as third-party capabilities, to take advantage of the observations driving the backup fatigue syndrome postulates.
Indeed, I believe that the backup fatigue syndrome exists as a combination of factors; in particular, how backup pieces and backupsets are aggregated into a full backup, and various technology factors, such as large storage system management involving file-system-based, library-based, and raw devices.
Although I have a sufficient number of observations of perceived backup fatigue, repeated across similar scenarios, I currently do not have enough resources to conduct more official and formal research on this matter using the appropriate tools, statistical methods, and experiment design; so I would like to invite the storage networking community to participate with me in conjoint research on the backup fatigue syndrome, which could improve the logistics associated with backup technology in the era of big data and truly big data files.
One expectation that can be derived from these perspectives is a sort-by-size capability inherent to RMAN, rather than having to write a script, such as the sketch below, to attain the same goal whenever a large number of small files coexists with a small number of huge files.
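Until such a capability exists, a script along these lines can approximate it. This sketch queries the data files in decreasing size order and emits an explicit RMAN command; the dba_data_files view and the BACKUP DATAFILE syntax are standard Oracle, but the connection details are placeholders and the whole thing is an illustration, not a hardened tool.

    import oracledb  # python-oracledb driver

    # Placeholder credentials and DSN; substitute real values.
    conn = oracledb.connect(user="backup_admin", password="***",
                            dsn="dbhost/ORCLPDB")
    with conn.cursor() as cur:
        cur.execute("SELECT file_id FROM dba_data_files ORDER BY bytes DESC")
        file_ids = [str(row[0]) for row in cur]

    # Emit an RMAN command that backs up the largest data files first.
    print(f"BACKUP DATAFILE {', '.join(file_ids)};")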