Thursday, July 3, 2008

Formulating the Grid Reliability Performance Model (GRPM)

The Oracle Grid Computing reliability model can best be established as a function of direct measurement, the process model, and overall availability, on the basis of alerts generated through the threshold settings and communicated via the Oracle Process Management and Notification Service (OPMN).

A couple of years ago, I attempted to define a time-series-driven aggregation and accounting model, useful in planning grid computing and project accounting, which could be easily linked to broader business models using BPEL, to project management techniques such as PERT, Gantt charts, or CPM, and to overall IT operations and other process control models. Combined, these models are quite useful in planning, forecasting, and business intelligence overall.

While the ARIMA[1]-driven aggregation seasonal models convey these features, after researching grid behavior, I have proposed the following model in order to emulate grid computing:

Considering that reliability, R, can be represented as:

where MTTF is the Mean Time To Failure and MTBF is the Mean Time Between Failures; MTBF can be decomposed into the Mean Time To Failure plus the Mean Time To Recover (MTTR).
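A reading consistent with the definitions above is R = MTTF / MTBF = MTTF / (MTTF + MTTR). The following is a minimal Python sketch under that assumption; the function name and the sample figures (in hours) are hypothetical and chosen only for illustration.

```python
def reliability(mttf: float, mttr: float) -> float:
    """Return MTTF / MTBF, where MTBF = MTTF + MTTR.

    Assumption: R is the ratio of MTTF to MTBF, as the definitions
    in the text suggest. Both inputs must use the same time unit.
    """
    if mttf < 0 or mttr < 0:
        raise ValueError("MTTF and MTTR must be non-negative")
    mtbf = mttf + mttr  # decomposition stated in the text
    if mtbf == 0:
        raise ValueError("MTBF must be positive")
    return mttf / mtbf


if __name__ == "__main__":
    # Hypothetical figures: 720 hours to failure, 4 hours to recover.
    print(f"R = {reliability(720.0, 4.0):.4f}")
```

A shorter recovery time pushes R toward 1, which is why MTTR appears only in the denominator.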

The overall reliability model could be presented as a continuous model as follows:

Lambda is the combined effect of the arrival of both critical and warning alerts defined as:

Similarly, the inter-arrival rate for warning alerts is given by:

The number of test units is expressed in any reasonable time measure. Furthermore, the overall inter-arrival rate Lambda can be modeled as the weighted mean of both critical and warning alerts, namely:
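One plausible form of this weighted mean can be sketched in Python. The sketch assumes that each alert class has a rate equal to its alert count divided by the number of observed test units, and that the weights are chosen by the analyst; the function names, the weights, and the sample counts are all hypothetical.

```python
def alert_rate(alert_count: int, test_units: float) -> float:
    """Alerts per test unit (assumed definition of a per-class rate)."""
    if test_units <= 0:
        raise ValueError("test_units must be positive")
    return alert_count / test_units


def weighted_lambda(lam_critical: float, lam_warning: float,
                    w_critical: float = 1.0, w_warning: float = 1.0) -> float:
    """Weighted mean of the critical and warning alert rates.

    The weights are an assumption; equal weights give the plain average.
    """
    total_weight = w_critical + w_warning
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_critical * lam_critical + w_warning * lam_warning) / total_weight


if __name__ == "__main__":
    # Hypothetical counts over 100 hourly test units.
    lam_c = alert_rate(2, 100.0)   # critical alerts per hour
    lam_w = alert_rate(10, 100.0)  # warning alerts per hour
    # Critical alerts weighted three times as heavily as warnings.
    print(weighted_lambda(lam_c, lam_w, w_critical=3.0, w_warning=1.0))
```

Raising the weight on critical alerts pulls Lambda toward the critical rate, which is one way the threshold settings could shape the model.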

The overall reliability model, i.e., the model relating the average arrival on a probabilistic basis and foundational for the performance model, can be rewritten as:

and the strong model, showing the continuity involved in the process, is given by:

where t0 is the database start time and t1 is the time to failure or, in an aggregation model, the overall mean time to failure, which should converge to its estimated value.
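As an illustration of a continuous model over the interval from t0 to t1, the sketch below assumes that alerts arrive as a homogeneous Poisson process with rate lam. That is the standard exponential form and not necessarily the exact expression intended here. Under that assumption, the probability of no alert in the interval is exp(-lam * (t1 - t0)), and integrating the exponential density over the same interval gives the complementary failure probability. All names and values are hypothetical.

```python
import math


def survival(lam: float, t0: float, t1: float) -> float:
    """Probability of no alert in [t0, t1] under a Poisson assumption."""
    if lam < 0 or t1 < t0:
        raise ValueError("need lam >= 0 and t1 >= t0")
    return math.exp(-lam * (t1 - t0))


def failure_probability(lam: float, t0: float, t1: float,
                        steps: int = 10_000) -> float:
    """Trapezoid-rule integral of lam * exp(-lam * (t - t0)) over [t0, t1]."""
    if lam < 0 or t1 < t0:
        raise ValueError("need lam >= 0 and t1 >= t0")
    if t1 == t0:
        return 0.0

    def density(t: float) -> float:
        return lam * math.exp(-lam * (t - t0))

    h = (t1 - t0) / steps
    total = 0.5 * (density(t0) + density(t1))
    for k in range(1, steps):
        total += density(t0 + k * h)
    return total * h


if __name__ == "__main__":
    lam, start, end = 0.05, 0.0, 24.0  # hypothetical: 0.05 alerts per hour
    print(survival(lam, start, end) + failure_probability(lam, start, end))
```

The two quantities sum to approximately 1, which is a quick consistency check on the integration.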
Similarly, in Real Application Clusters (RAC) technology, i.e., where cluster databases are involved, the reliability model can be defined as the expected value of the following expression:

where i stands for the ith node, weighted through the respective MTTF and MTTR ratio, as presented.

Then, the overall reliability model for a RAC environment would adjust the overall weighted inter-arrival rate, i.e., Lambda, in a similar fashion.

where i indexes the RAC node and j indexes the RAC instance within that node, so that n and Lambda, subscripted by i and j, describe the jth instance in the ith node and that instance's inter-arrival rate, respectively, based on either critical or warning alert arrival rates, accordingly.
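The RAC case can be sketched in Python under two assumptions that go beyond the text: the expected value over nodes is taken as a plain average of each node's MTTF / (MTTF + MTTR) ratio, and n for each instance is used as the weight applied to that instance's rate. The function names and sample data are hypothetical.

```python
from statistics import fmean


def cluster_reliability(nodes: list[tuple[float, float]]) -> float:
    """Average of MTTF / (MTTF + MTTR) over nodes given as (mttf, mttr).

    Assumption: every node is equally likely, so the expected value
    reduces to an arithmetic mean.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    return fmean([mttf / (mttf + mttr) for mttf, mttr in nodes])


def cluster_lambda(instances: list[list[tuple[float, float]]]) -> float:
    """Weighted mean of instance rates.

    instances[i][j] is (n, lam) for the jth instance in the ith node;
    n is treated as the weight (an assumption).
    """
    total_n = sum(n for node in instances for n, _ in node)
    if total_n <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(n * lam for node in instances for n, lam in node)
    return weighted / total_n


if __name__ == "__main__":
    # Hypothetical two-node cluster: the first node runs two instances.
    print(cluster_reliability([(700.0, 2.0), (650.0, 5.0)]))
    print(cluster_lambda([[(10, 0.02), (30, 0.04)], [(20, 0.01)]]))
```

The same functions apply whether the rates come from critical or warning alerts, as long as one class is used consistently per call.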

This can also be presented as the generalized model. From this discrete model, it is easy to derive cubic features for analytic purposes.

Similarly, it is clear from the strong model initially presented that this model can easily be translated from a Poisson stochastic process to a model represented by a Gamma distribution, which is likely to be an ideal representation and is also applicable to a continuous representation of the global model. A Beta distribution model could also be customized in a scenario similar to Gamma, where the two parameters involved are time and inter-arrival rate.
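The link between the Poisson and Gamma views can be made concrete. If alerts arrive as a Poisson process with rate lam, the waiting time until the kth alert follows a Gamma distribution with integer shape k and rate lam (the Erlang case). The sketch below computes the probability that fewer than k alerts have arrived by time t; treating the kth alert as the failure event is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def erlang_survival(k: int, lam: float, t: float) -> float:
    """P(waiting time to the kth alert > t) for a Poisson process.

    Equals the Poisson probability of at most k - 1 arrivals in time t:
    the sum over n = 0 .. k-1 of exp(-lam*t) * (lam*t)**n / n!.
    """
    if k < 1 or lam < 0 or t < 0:
        raise ValueError("need k >= 1, lam >= 0, t >= 0")
    x = lam * t
    term = math.exp(-x)  # n = 0 term
    total = term
    for n in range(1, k):
        term *= x / n    # builds (lam*t)**n / n! incrementally
        total += term
    return total


if __name__ == "__main__":
    # With k = 1 this reduces to the exponential form exp(-lam * t).
    print(erlang_survival(1, 0.05, 24.0), math.exp(-0.05 * 24.0))
    # Hypothetical: tolerate up to two alerts before declaring failure.
    print(erlang_survival(3, 0.05, 24.0))
```

Allowing more alerts before failure (larger k) raises the survival probability for the same rate and time window.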

Thus, for grid computing environments overall, utilizing the reliability model presented is more practical than my original attempt to periodically aggregate grid metrics, such as those involving I/O, performance, waits, etc. The convenience provided by an alert-driven reliability model is that database engineers, architects, and DBAs can control the model thresholds to attain a customized model.

In general, reliability methods can lead to a comprehensive grid cost model, and in turn to practical implementations for business process innovation, project accounting, project engineering, and overall project management.

[1] Auto-Regressive Integrated Moving Average

This article is part of my Mastering Grid Computing Series, and it is based on my early interest in areas such as operations research (Universidad del Norte, Dr. R. Barbosa and Montclair State, Dr. J. Wang), Data Communications Model (NJIT, Drs. Chao and J.T. Wang), my auditing of Probabilistic Calculus (Rutgers University, Dr. C. Zhang), and my study of structured matrices and polynomials at CUNY Graduate Center (Dr. V. Pan). Most importantly, it connects my expertise in grid computing environments with the theory studied.