Thursday, November 10, 2011

Oracle Unveils Solaris 11 in New York


Oracle Solaris 11, simply extraordinary

Yesterday, November 9, I attended the launch of Oracle Solaris 11. The presentations were led by the keynote speakers, Oracle President Mark Hurd and Executive Vice President John Fowler, who introduced the new features of what Oracle now calls the first operating system fully intended for cloud computing.
Among the key features presented, I would highlight the following:
  • Co-engineered with Oracle Stack
  • Native support for SuperCluster, T4, and Exadata/Exalogic servers and the ZFS Storage Appliance, for optimal cloud elasticity
  • ZFS Virtualized Pool Storage
  • Built for Cloud Infrastructures
  • Leading transactional capability through Oracle Sun servers benchmarks
  • Automatic fault-tolerance troubleshooting and monitoring
  • Enhanced built-in security for cloud authorization and authentication
  • Native virtualization for the cloud
  • Stack-driven fault tolerance
  • High-availability capability
  • Unified, transparent upgrade/patch method and strategy
  • Built-in support for Solaris 10 environments
  • Multi-tenancy built-in support
  • Zone/Container Resource Management
  • Unlimited Boot Environments
After the keynote presentations, a panel of Oracle Sun engineers held an interesting discussion of the several years that Oracle Sun Solaris 11 took to complete.

Finally, a customer panel followed the engineers' discussion, in which customers recounted their previous experience. Customers also listed their expectations of how they will upgrade to and work with Solaris 11.

Over 700 customers worldwide had already adopted Solaris 11 for their production environments by the time its official Oracle launch took place at Gotham Center in the City of New York on November 9, 2011.

Excerpts from the Oracle Solaris 11 Launch Event

video
John Fowler gives his keynote speech on Solaris 11.

video
An Oracle Sun engineer discusses a new feature in Solaris 11.

video
Steve Wilson, Oracle VP of Engineering, discusses a new feature in Solaris 11.

video
An Oracle Sun engineer discusses a new feature in Solaris 11.

For a complete view of the New York launch presentations, you are welcome to visit the Oracle website.

Thursday, September 29, 2011

The Present and Future of Data Warehousing

About Oracle Data Integrator
While companies continue to hire novices in the data warehousing field, i.e., IT professionals with three years of experience or less, the set of compliance regulations has grown significantly, giving this field an additional dimension of complexity. This is confirmed by a recent Gartner Research report.

Conventional data warehousing is about extracting, transforming, and loading data from one or several data sources to attain integration through a variety of staging business processes, possibly into a data mart. The ETL (Extract-Transform-Load) process is quite traditional and carries a set of constraints: in most instances, the transformation is performed row by row before the load can take place. E-LT (Extract-Load-Transform) is a modern approach in which a customized bulk load can occur without being constrained by individual row-by-row transformations.
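The difference between the two approaches can be sketched in a few lines. The example below is only an illustration using Python's built-in sqlite3 module, not Oracle Data Integrator itself; the table names (staging, sales_etl, sales_elt) and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount in dollars, stored as text).
SOURCE_ROWS = [(1, "10.50"), (2, "3.25"), (3, "7.00")]


def etl_row_by_row(conn, rows):
    """ETL: transform each row in application code, then load it."""
    conn.execute("CREATE TABLE sales_etl (order_id INTEGER, amount_cents INTEGER)")
    for order_id, amount in rows:
        cents = round(float(amount) * 100)  # transformation happens one row at a time
        conn.execute("INSERT INTO sales_etl VALUES (?, ?)", (order_id, cents))


def elt_bulk(conn, rows):
    """E-LT: bulk-load the raw data first, then transform it with one set-based statement."""
    conn.execute("CREATE TABLE staging (order_id INTEGER, amount TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)  # untransformed bulk load
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT order_id, "
        "CAST(ROUND(CAST(amount AS REAL) * 100) AS INTEGER) AS amount_cents "
        "FROM staging"
    )


conn = sqlite3.connect(":memory:")
etl_row_by_row(conn, SOURCE_ROWS)
elt_bulk(conn, SOURCE_ROWS)
etl_result = conn.execute("SELECT * FROM sales_etl ORDER BY order_id").fetchall()
elt_result = conn.execute("SELECT * FROM sales_elt ORDER BY order_id").fetchall()
```

Both paths produce the same target rows. In the E-LT path the transformation runs inside the database engine after the load, so it can be scheduled at any later point.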

Oracle Data Integrator provides support for up to 10,000 different concurrent data sources regardless of their type, including a variety of structured and unstructured data sources, such as relational and object-relational databases, XML, flat files (including CSV and tab-delimited files, among others), various multi-dimensional data sources, message queuing, and various other data source types.

Oracle Data Integrator is an ideal data warehousing tool to work with technologies such as Exadata and Oracle Solaris SuperCluster T4-4 (just released), as it provides seamless design capabilities and transformations through bulk E-LT. Besides, Oracle Data Integrator is quite versatile in its interaction with other tools such as Oracle GoldenGate, allowing for fast advanced replication, including both full replication and thin provisioning.

Other important capabilities available with Oracle Data Integrator include the possibility of performing both complex traditional ETL (Extract-Transform-Load) and fast bulk E-LT (Extract-Load-Transform). While the former allows row-by-row transformation, the latter allows bulk on-the-fly transformation, so that this stage can essentially occur at any time in the data warehousing timeline. This capability, in conjunction with Oracle's demonstrated data warehousing leadership in quadrant rankings, positions ODI as the data warehousing tool of the future, not only for conventional data warehousing environments but also for the Big Data landscape. This feature also enhances its unique ability to reach optimal data quality while attaining outstanding performance and throughput and minimizing latency.

In addition, as part of Oracle Fusion Middleware, ODI is extremely hot-pluggable. It interacts smoothly with Oracle WebLogic, BPEL Process Manager, TopLink, JDeveloper, and Oracle Identity Management.

Concluding Remarks
Oracle Data Integrator is a comprehensive, solid data warehousing tool that allows the fast and thorough design, implementation, and development of data warehouses and reporting data marts through the appropriate usage of bulk E-LT processing, which is highly relevant for Big Data and general data warehouse performance. Among the top benefits of adopting ODI are reduced data warehouse cost with the best ROI, great flexibility, outstanding data quality, and data integrity and accuracy with increased productivity.

Sunday, May 29, 2011

Solid State/Flash and Optical Storage Technologies

Single-Level Cell Solid State Disks and the Future of Database Technology
The implementation of faster, more resilient database architectures on the basis of flash drives using Single-Level Cell (SLC) technology is key to the engineering of new database technologies. At the present time, database languages like SQL (Oracle, SQL Server, DB2) and QUEL (Ingres) can analyze, query, mine, dredge, project, and forecast based on the behavior of data, i.e., information that is static or persistent by nature, even when it persists for only a few seconds or a fraction of a second. SLC flash will at some point make it possible to track the more dynamic behavior of some data, e.g., the Brownian motion of particles in some kind of physical chaos, such as the agitation of a fluid, or the movement of electrons and other subatomic particles; the real-time measurement of heartbeats and other vital signs could also be captured with a more consistent level of granularity.

However, the most important contribution of SLC, which currently provides the best performance among Solid State Drive technologies, is the possibility of interconnecting with active or dynamic live automata in general, which can mimic the exact behavior of Brownian motion, vital measurements, stock market behavioral simulators, and other live simulation scenarios. Because of this, I believe that iSCSI technology will complete its dominance of the market, as it can provide better interaction between "live dynamic data" stored in the new flash storage technology and other small computer interfaces and devices. Multipathing in conjunction with Oracle technologies like Automatic Storage Management (ASM) will provide further capability and support, such that Fibre Channel, and FICON in general, can also support such new technologies. Perhaps optical storage will add another component for managing "real-time live data", and the LG Hybrid drive released by Hitachi in the last quarter of 2010 is probably the most recent development in that respect.

While there are some Oracle cartridges capable of interacting with SQL, such as Text, Multimedia, and Spatial technology, or XML-driven implementations such as XQuery, there will be a need to provide an API such that SQL can interact with the real-time nature of the dynamic behavior described for such "live data", which could probably be stored in real time as a tuple describing the state and behavior of a live automata data object with greater precision than today's object-relational models.

Friday, May 20, 2011

Thanks to Oracle Professionals at Collaborate11

I would like to break this blog's protocol to thank all attendees of my presentations at Collaborate11 for such great feedback, in particular those attending my April 12 presentation on Intelligent Storage for High-Performance Computing, Engineering Oracle ASM Optimal Storage I/O Performance with Intelligent Data Placement. I look forward to seeing you again at the next great Oracle event.
Once more, thank you!

Wednesday, April 13, 2011

Presenting on Forecasting Analytics at Collaborate11, Orlando, Florida

I will be presenting Perspectives on Forecasting Analytics at Collaborate11 on Thursday, April 14, from 11:00AM until noon. This business intelligence session will take place in Room West 303C.
Everybody is welcome to attend.

Visit adnresearch.com

Monday, April 11, 2011

You are invited to my Intelligent Storage Presentation at Collaborate11

Please join Anthony D Noriega at Collaborate11, at the Orange County Convention Center in Orlando, Florida, on Tuesday, April 12, from 4:30PM to 5:30PM in Room West 308C, for his presentation on Intelligent Storage for High-Performance Computing: "Engineering Oracle ASM Optimal I/O Performance with IDP."


Everyone is welcome!



Wednesday, March 23, 2011

Performance Tuning for the Storage Layer

Intelligent Storage for High-Performance Computing (HPC)


The following is an excerpt from my intelligent storage paper on ASM technologies, involving ASM Cluster File System (ACFS), ASM Dynamic Volume Manager (ADVM), and Intelligent Data Placement (IDP).

The article proposes an array of recommendations and practical strategies to optimize server performance from the storage layer up. The discussion focuses on the usage of traditional hard disk drives (HDD) and Solid State Drives (SSD).


Implementation on Traditional HDD Disk Arrays


This is done by creating a custom template, applying it to the appropriate ASM data file, and subsequently determining its relevance to logical objects. The SQL API is shown in the example below:

ALTER DISKGROUP adndata1 ADD TEMPLATE datafile_hot
ATTRIBUTE ( HOT MIRRORHOT);

ALTER DISKGROUP data MODIFY FILE '+DATA/ADN3/DATAFILE/tools.256.765689507'
ATTRIBUTE ( HOT MIRRORHOT);

Dual Pairing Drive (Hybrid) Solution

This solution seeks the following goals:

SSD for performance

Whenever SSDs become part of the storage performance tuning strategy, it is necessary to plan for more consistent space management, meaning that overprovisioning is a critical requirement to attain optimal storage elasticity. This is primarily due to the asymmetry of read and write speeds on this technology. Depending on the disk controller used by the vendor, an SSD can show a normalized read/write speed ratio of about 10:1 in experimental tests conducted by various vendors in which reads represent about 65% of the entire transactional workload. Overprovisioning, in general, can also reduce the need for manual rebalancing tasks, such as moving an ASM file from one disk group to another via PL/SQL packages or RMAN, which could disrupt production tasks. Similarly, overprovisioning can avoid the need to move logical objects across the database to attain better performance optimization.

Thus, both DRAM SSDs and NAND SSDs have a much better cost per IOPS than high-performance hard disk drives (HDDs). In addition, Flash SSDs are very well suited to read-only applications and mobility applications, and they are highly cost-effective in industries such as media, where the default storage format UFS (Universal File System) is widely utilized.

ASM can in turn help SSD technology by ensuring that the writing of data is spread out evenly across the storage space over time. To facilitate wear leveling, SSD controllers implement a virtual-to-physical mapping of Logical Block Addresses (LBA) to Physical Block Addresses (PBA) on the Flash media, together with sophisticated error correction codes.
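As an illustration of the virtual-to-physical mapping idea, here is a toy sketch of a wear-leveling translation layer. It is a deliberately simplified model, not how any particular SSD controller works; the class name and its fields are hypothetical. Each rewrite of a logical block is redirected to the least-erased free physical block, and the old copy is erased and returned to the free pool.

```python
class ToyFlashTranslationLayer:
    """Toy LBA-to-PBA map: every write lands on the least-erased free physical block."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks  # wear per physical block
        self.lba_to_pba = {}                       # logical -> physical mapping
        self.free = set(range(physical_blocks))    # physical blocks holding no data

    def write(self, lba):
        # Choose the free block with the lowest erase count (ties: lowest address).
        target = min(self.free, key=lambda pba: (self.erase_counts[pba], pba))
        self.free.remove(target)
        old = self.lba_to_pba.get(lba)
        if old is not None:
            # The stale copy is erased and returned to the free pool.
            self.erase_counts[old] += 1
            self.free.add(old)
        self.lba_to_pba[lba] = target
        return target


ftl = ToyFlashTranslationLayer(physical_blocks=4)
for _ in range(8):
    ftl.write(lba=0)  # rewrite the same logical block repeatedly
```

Although the same logical block is rewritten eight times, the erases are spread across all four physical blocks instead of wearing out a single one. The spare physical blocks play the role that overprovisioned capacity plays on a real drive.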

HDD for capacity

Traditional hard disk drives (HDDs) can be used to implement low-cost, high-capacity volumes, since hard disks have a cost per gigabyte that is significantly lower than that of solid state.

Hybrid Pools: Implementation on HHDD

Here, disk groups are transparently built on a mix of these technologies.

In this scenario, it is important to take the following considerations into account:

Establish the hybrid ASM preferred read failure groups

SQL> ALTER SYSTEM SET ASM_PREFERRED_READ_FAILURE_GROUPS= 'HYBRID.HHDD';

Decide how to best use the SSD/Flash technology for the speed and performance of database online redo logs. Set the logs on pure SSD drives. In doing so, the Oracle DBA or architect might need to create at least one disk group, or a set of disk groups, solely on SSD drives.

Consistently, tables can be associated with tablespaces built on ASM data files that have already been set with mirror hot or mirror cold templates. These maintenance activities can occur as a baseline, for instance when the database is created (first loaded), migrated, or upgraded, and systematically whenever a relevant event occurs. For instance, a partitioned table that relies on partition pruning will see its most active partition change when a partition interval or time range rolls over; a different or new partition then becomes the most dynamic segment, which in turn may need to be placed in a mirror hot sector for optimal performance. In general, business algorithms can be customized, implemented, and scheduled for periodic ASM storage maintenance in order to attain optimal performance continuously.

Introducing Best Practices for Oracle ASM IDP and Dynamic Storage Tuning

The models presented here can easily be enabled in conjunction with practical industry cloud storage implementations that apply both the Open Cloud Computing Interface (OCCI) and Cloud Data Management Interface (CDMI) standards. For Oracle practitioners, it is important to apply existing knowledge of the environment, such as that derived from Real Application Testing (RAT), prior to the actual implementation.


The Hot-or-Cached Storage Optimization Model (HoCSOM)

In this model, the purpose is to explicitly customize access and optimize performance for specific database objects, in particular, tables, indexes, materialized views, and all types of object partitions, namely, table, index or materialized view partitions, which have critical current relevance directly impacting performance tuning.

The following rules are reasonably practical for attaining adaptive optimal tuning:

Classify the set of database objects in at least three hierarchical tiers based on segment size and segment utilization levels.

Implement operating procedures to attain a smart database object allocation in the appropriate physical and logical space in an optimal fashion with a dynamic Oracle Automatic Storage Management perspective, as follows:

Tier 1 should include a small number of objects, in comparison to the total database object conglomerate, which require special storage handling due to size and activity.

Tier 2 will include tables that are average in both size and usage, large tables with a low level of usage, and small tables with average activity.

Tier 3 will include smaller tables, and rather inactive tables, in general.
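The three tiers can be expressed as a small classification routine. The sketch below is one possible reading of the rules above, not the paper's actual procedure; the thresholds, the Segment fields, and the object names are hypothetical, and real values would come from workload analysis.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    size_gb: float         # segment size
    reads_per_hour: float  # stand-in for the segment utilization level


# Hypothetical thresholds, chosen only for illustration.
LARGE_GB = 100.0
BUSY_READS = 10_000.0
IDLE_READS = 100.0


def classify(seg):
    """Return tier 1, 2, or 3 for a segment."""
    large = seg.size_gb >= LARGE_GB
    busy = seg.reads_per_hour >= BUSY_READS
    idle = seg.reads_per_hour < IDLE_READS
    if large and busy:
        return 1  # few objects needing special handling due to size and activity
    if large or not idle:
        return 2  # large but quiet, or smaller objects with real activity
    return 3      # smaller, rather inactive objects


segments = [
    Segment("ORDERS", 500.0, 50_000.0),
    Segment("ORDERS_ARCHIVE", 800.0, 20.0),
    Segment("CUSTOMERS", 40.0, 2_000.0),
    Segment("COUNTRY_CODES", 0.1, 5.0),
]
tiers = {seg.name: classify(seg) for seg in segments}
```

Only the Tier 1 objects would then be candidates for tablespaces backed by MIRRORHOT data files.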

Tier-Based Storage Optimization Model (TBSOM)

This model utilizes a combination of SSD (NAND/Flash) and traditional HDD disk arrays in order to implement the relevant object-storage devices (OSD), i.e., Oracle ASM disks and disk groups.

The Flash capability serves partly as a way to overprovision the SSD technology when block updates could become expensive, since they require that the entire block first be erased and then rewritten in full. The Flash acts as a mechanism that buffers transactions to avoid SSD latency. In some scenarios, reducing the actual size of the SSD presented to ASM is useful for both ASM failgroup mirroring and overprovisioning, with the purpose of diminishing the latency caused by writing to SSD devices, in particular for updates, deletes, and inserts on used blocks. It is also important to remember that in most SSD technologies the hot sectors of the drives or volumes are automatically cached; in Oracle technology, that data would also be in the buffer cache or the shared pool.

This model requires setting the ASM_PREFERRED_READ_FAILURE_GROUPS parameter accordingly, but it does not require the implementation and application of customized MIRRORHOT templates.

Combining both Models: Hybrid Tuning Storage Optimization Model (HTSOM)

The two models can be combined, since no constraint prevents it in general. However, this should be done following the recommendations of Oracle Sun or the relevant third-party vendor.
This model will require both the setting of the ASM_PREFERRED_READ_FAILURE_GROUPS parameter and the implementation and application of customized MIRRORHOT templates, as needed for optimized database object allocation and storage performance. When using the combined model, the goal is to set the MIRRORHOT data files on the ASM preferred read failure groups to implement an optimal storage configuration.

Scenarios and Strategies

The following scenarios are applicable to both OLTP and data warehousing, as well as to hybrid databases.

Working with Different Disk Array Pool Types

When this scenario occurs, it is important to establish the ASM preferred read disk groups, which enables a transparent way to establish storage hierarchies.

Working with Large Logical Database Objects

The models presented here allow the physical model to be tuned at the storage level in a highly customized fashion, enabling optimal allocation. Thus, the Hot-or-Cached Storage Optimization Model allows a large logical database object to have the best residence for optimal performance.

Working with Oracle Partitioning

When using partitioning technology, the following operations are possible or likely to occur:

Creating a new partition in a tablespace that uses a MIRRORHOT ASM data file.

Moving a partition to a hot tablespace, i.e., onto an ASM data file previously associated with a MIRRORHOT template.

Exchanging a partition, when it becomes current, so that it is associated with the appropriate MIRRORHOT tablespace and ASM data file.

Working with RMAN Backup and Recovery

RMAN can serve to manage ASM data files and move them between disk groups, as needed to set these files onto Oracle ASM Preferred Read Disk Groups or onto a MIRRORHOT ASM data file.

Using Oracle ASM ACFS and ADVM

These technologies should be enabled in order to apply the Intelligent Data Placement (IDP) API to attain optimal storage performance tuning. This is done via the ASM disk group compatibility attributes COMPATIBLE.RDBMS, COMPATIBLE.ASM, and COMPATIBLE.ADVM. By setting these attributes to 11.2 or later, the Oracle ASM Intelligent Data Placement API becomes fully available for dynamic usage, either manually or in an automated fashion. The compatibility can be set explicitly when creating or altering a disk group via the SQL*Plus command-line interface. Consequently, ASM volumes built on specific disk groups will inherit the settings and properties established for those disk groups, and the principles presented can be extended and applied at the ACFS and ADVM levels.

CREATE DISKGROUP DATA1
NORMAL REDUNDANCY
FAILGROUP ADNFG1
DISK '/dev/sda1', '/dev/sda2','/dev/sda3', '/dev/sda4'
FAILGROUP ADNFG2
DISK '/dev/sdb1', '/dev/sdb2','/dev/sdb3', '/dev/sdb4'
ATTRIBUTE 'compatible.rdbms' = '11.2',
'compatible.asm' = '11.2';

ALTER DISKGROUP DATA1 ATTRIBUTE 'COMPATIBLE.ASM' = '11.2';


The compatible attributes can only be advanced to a more recent version.
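Because the attributes can only move forward, a maintenance script may want to check a requested value before issuing the ALTER DISKGROUP statement. The following is a minimal sketch of such a pre-check; the function names are hypothetical, and it only compares dotted version strings, without querying ASM.

```python
def parse_version(text):
    """Turn a dotted version such as '11.2' or '11.2.0.2' into a comparable tuple."""
    parts = [int(piece) for piece in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # treat '11.2.0' and '11.2' as the same version
    return tuple(parts)


def can_advance(current, requested):
    """A compatibility attribute may only be moved to a more recent version."""
    return parse_version(requested) > parse_version(current)


checks = {
    ("10.1", "11.2"): can_advance("10.1", "11.2"),
    ("11.2", "11.1"): can_advance("11.2", "11.1"),
    ("11.2", "11.2.0"): can_advance("11.2", "11.2.0"),
}
```

A script would skip or reject the change whenever can_advance returns False, since lowering the attribute is not allowed.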

Maintenance Operations

Maintenance operations on ASM IDP for storage optimization can either maximize or optimize performance through the usage of the IDP API. For instance, activating a new large table partition by placing the newly created partition in an Oracle ASM data file residing on a hot sector will have a significant positive performance tuning effect on the physical model and the top active SQL statements.

Heating Up Database Logical Objects

Database logical objects can be heated up at creation time by specifying the appropriate hot tablespace. But they can also be altered and moved to the appropriate tablespace using the MIRRORHOT ASM data file. Likewise, this can happen as well when using object partitions at creation, exchange, or move time.

Changing Logical Objects Tiers

This can happen when the utilization level of a highly active table or index partition changes due to an event such as the start of a new fiscal year: the current partition will no longer be used as often, and the next partition will gain higher relevance. An object partition, such as a table partition, a local or global index partition, or a materialized view partition, could therefore be downgraded to a lower performance tier when it is affected by such an event. This takes advantage of partition pruning so that only the hot sectors are used when appropriate.
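The fiscal-year example can be sketched as a small planning routine that decides which partition should be hot on a given date. This is only an illustration: the partition naming scheme (SALES_FY followed by the calendar year in which the fiscal year starts) and the function names are hypothetical.

```python
from datetime import date


def partition_for(day, fiscal_year_start_month=1):
    """Name of the yearly partition a given day falls into (hypothetical naming)."""
    fiscal_year = day.year if day.month >= fiscal_year_start_month else day.year - 1
    return f"SALES_FY{fiscal_year}"


def plan_tiers(partitions, today, fiscal_year_start_month=1):
    """Mark the current partition 'hot' and every other partition 'cold'."""
    current = partition_for(today, fiscal_year_start_month)
    return {name: ("hot" if name == current else "cold") for name in partitions}


partitions = ["SALES_FY2010", "SALES_FY2011"]
before_rollover = plan_tiers(partitions, date(2011, 6, 30), fiscal_year_start_month=7)
after_rollover = plan_tiers(partitions, date(2011, 7, 1), fiscal_year_start_month=7)
```

Comparing the plan before and after the rollover date shows which partition should be moved off the hot tablespace and which one should take its place.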

Moving ASM Data Files

Moving ASM data files across ASM disk groups is usually a costly task and is normally not recommended unless strictly necessary. However, it can be accomplished using the ASMCMD command-line utility, the RMAN COPY DATAFILE command, or PL/SQL, using the DBMS_FILE_TRANSFER.COPY_FILE and DBMS_FILE_TRANSFER.PUT_FILE procedures or the DBMS_DISKGROUP package.



First, take the data file offline:

RMAN> SQL 'ALTER DATABASE DATAFILE 7 OFFLINE';

Then, for Oracle Managed Files (OMF), the DBA could use a command similar to the one shown below:

RMAN> COPY DATAFILE
'+DATA/ADN3/DATAFILE/tools.256.765689507'
TO '+DATA/ADN7';


For non-Oracle managed files, the DBA must include both the directory and file name when declaring the target in the COPY DATAFILE clause:


RMAN> COPY DATAFILE
'+DATA/ADN3/DATAFILE/tools_01.dbf'
TO '+DATA1/ADN7/DATAFILE/tools_01.dbf';


The DBA will need to rename the file accordingly regardless of the method used to move the file, as shown below:

Using RMAN
RMAN> SWITCH DATAFILE
'+DATA/ADN3/DATAFILE/tools.256.765689507'
TO COPY;

Using SQL

SQL> ALTER DATABASE RENAME FILE
'+DATA/ADN3/DATAFILE/tools.256.765689507'
TO '+DATA1/ADN7/DATAFILE/tools.264.765695815';


When applicable, perform the appropriate recovery:

Recover the new ASM database file:

SQL> RECOVER DATAFILE '+DATA1/ADN7/DATAFILE/tools.264.765695815';

Bring the new ASM database file online:

SQL> ALTER DATABASE DATAFILE '+DATA1/ADN7/DATAFILE/tools.264.765695815' ONLINE;

Delete the old ASM database file from its original location

SQL> ALTER DISKGROUP DATA DROP FILE '+DATA/ADN3/DATAFILE/tools.256.765689507';

Overprovisioning SSD Drives

Overprovisioning is normally already built into SSD technology to compensate for the lack of symmetry between reads and writes. Statistics show that under a typical workload of 65% reads and 35% writes, there is a trend toward a 10:1 read/write performance ratio, meaning that reads are about 10 times faster than writes. In many database scenarios, the latency arises because a block involved in a transaction must be completely erased before it can be rewritten. Performance on an SSD varies with the drive's level of utilization: it will be excellent Fresh Out of Box (FOB), then decay and go through a transient state (Transition) that depends on the manufacturer, and finally attain a Steady State with maturity. Therefore, when building pools of dual drives, i.e., hybrid arrays, the architect should follow the vendor's specific recommendations and not assume that the pros or cons apply in exactly the same way for all vendors.
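To see why write latency dominates even when reads are the majority of operations, the 65/35 mix and the 10:1 ratio quoted above can be plugged into a simple weighted average. This is a back-of-the-envelope sketch, assuming every read costs one time unit and every write costs ten; the function names are hypothetical and no vendor measurements are implied.

```python
def average_service_time(read_fraction, read_time, write_to_read_ratio):
    """Weighted average time per operation for a given read/write mix."""
    write_fraction = 1.0 - read_fraction
    write_time = read_time * write_to_read_ratio
    return read_fraction * read_time + write_fraction * write_time


def write_time_share(read_fraction, write_to_read_ratio):
    """Fraction of total device time that is spent on writes."""
    total = average_service_time(read_fraction, 1.0, write_to_read_ratio)
    return (1.0 - read_fraction) * write_to_read_ratio / total


# 65% reads, writes 10 times slower than reads (read time normalized to 1).
avg = average_service_time(0.65, 1.0, 10.0)
share = write_time_share(0.65, 10.0)
```

Under these assumptions the average operation costs about 4.15 read-equivalents, and writes account for roughly 84% of the device time even though they are only 35% of the operations. That is the latency that overprovisioning and write buffering aim to hide.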

Overprovisioning the drive means that it will have a cache that allows buffering (holding old blocks and writing new blocks) to minimize SSD latency. This cache can be implemented in Flash technology or with the same SSD NAND semiconductor technology.

Overprovisioning on Hybrid Array Pools and SSD Pools

Overprovisioning is usually accomplished by adding a Flash cache to the SSD pool to overcome the latency of rewriting a block, which is much slower than originally reading it. This is usually part of the controller cache. In these architectures, flash memory is used by the controller for the purpose of assisting in garbage collection and alleviating some SSD write-related latency issues.

Overprovisioning on HHDD (Flash and HDD)


In this scenario, a flash cache is used to buffer a certain amount of transactional data as it is written. This is built into many appliances, such as EMC and NetApp storage appliances, and into some systems, such as the newest Apple Macs and Oracle Sun Exadata database machines.

Integration with Other Technologies

The following technologies could be associated to further improve performance in conjunction with the storage optimization model presented, namely:
  • Integration with Cache Management
  • Using Multiple-block-size Database
  • Using Coherence (Oracle Data Grid)
  • Using Oracle TimesTen.

Recently, Hitachi introduced the LG HyDrive, a dual (hybrid) drive using an Optical Disk Drive (ODD) and Flash memory as part of its cache component.

Expected Results

Studies tracing measurements from Jim Gray's five-minute rule up to today's trends show that we can attain not only a 25% improvement with ASM disk groups based on conventional HDD arrays; appliance-driven solutions using dynamic storage management over hybrid solutions have reached at least a 62% improvement, and up to 1000%, i.e., 10 times the performance, based on 100% solid state disk groups. The study was conducted by IBM on OLTP databases using appliance-driven storage management and custom tuning algorithms. This study did not utilize the Hot-or-Cached Storage Optimization Model and was fully guided by hot-block-level tuning. Therefore, the possibilities are endless, and the market is open to mixing, matching, and customizing every possible solution.

There will be a significant cost reduction from the lower power consumption of SSD technology and from the optimal utilization of IOPS.

Concluding Remarks
With Oracle Database 11g Release 2, enterprise storage can now benefit from ASM elasticity and from the ability to customize optimal storage performance for a specific database, by engineering ASM production maintenance practices that optimize object allocation onto hot sectors, using tablespaces built on ASM data files that utilize MIRRORHOT templates.

At the time of writing, for economic reasons, hybrid models using both SSD and HDD technologies are more practical, very convenient, and highly cost-effective for most industries and enterprise sizes. While SSD technology can reduce the cost associated with energy due to various factors, until the price of solid state technologies drops significantly, the hybrid dual disk drive, using SSD/Flash for performance and HDD for capacity, will remain the de facto standard for implementing ASM-based optimal storage architectures and infrastructures. Although there are some industry-wide reliability pros and cons regarding SSDs, it is expected that at some point in the future, when economies of scale allow, SSDs will become the predominant storage technology used with ASM.

In general, Oracle ASM IDP can improve I/O performance with dynamic load balancing. Oracle DBAs and database architects should plan any reorganization tasks for optimal performance giving greater relevance to the object-level rather than ASM file level, while also considering preferred read disk groups as a transparent method to establish storage hierarchies.

N.B. The entire paper can be read at and retrieved from my new website, currently under construction:

http://www.adnresearch.com/resources/2011_726_noriega_ppr.pdf