Thursday, February 4, 2010

7.5 Hash Clusters











 < Day Day Up > 











7.5 Hash Clusters


We can create a cluster of instances, which is primarily what RAC is all about. Similarly we can also cluster together data contained in multiple tables using a common field. For example, in a direct-to-home broadcasting industry that uses smartcards as their primary method to determine any transaction. The data in various tables could be clustered together using for example, the smartcard ID column in the various tables. Unlike a regular indexed table, where to retrieve data from these clustered tables requires multiple I/O operations, one to read the clustered index (to determine the key value) and subsequently to read the row from the primary table. If the cluster is a hash cluster, the I/O operations are considerable reduced, because lookup is not based on index values (as in regular indexed or clustered index retrieval) but using hash values. Hash values are generated using hash functions that generate distributed numeric values. Under this option, to find or store a row, Oracle applies the hash function to the row's cluster key value. The resulting hash value corresponds to a data block in the cluster, which is then read or written to on behalf of the issued statement. Unlike the traditional clustered index, Oracle uses the hash function to locate the row in a hash cluster, thus eliminating the extra I/O operation.


A hash cluster is created, by specifying the HASHKEYS clause in the CREATE CLUSTER statement, for example;


CREATE CLUSTER smartcard_cluster (smartcard_id NUMBER(18,0))
PCTUSED 80
PCTFREE 5
TABLESPACE smartcard_data_p001
STORAGE
(INITIAL 250K
NEXT 50K
MINEXTENTS 1
MAXEXTENTS 3
PCTINCREASE 0)
HASH IS smartcard_id HASKEYS 150;



7.5.1 Sorted hash cluster



A sorted hash cluster is where the rows corresponding to each value of the hash function are sorted on a specified set of columns in ascending order. This additional operation could improve performance or response time where repeated operations are required against the clustered data.


Taking the earlier example of the direct-to-home broadcasting industry where viewers can select events on demand using their remote control, in which case a smartcard could have an unlimited number of on demand user selections which are stored as transactions in the database. These events are stored as they happen and are required to be processed in a first-in, first-out order when bills are generated for each customer.


In this situation, using a sorted hash cluster on the timestamp that the event was viewed, a single query on the clustered key would return all rows/event transactions for the customer who owns the smartcard. For example, with the cluster created using


CREATE CLUSTER smartcard_cluster (smartcard_id NUMBER(18,0),
event_date NUMBER SORT)
PCTUSED 80
PCTFREE 5
TABLESPACE smartcard_data_p001
STORAGE
(INITIAL 250K
NEXT 50K
MINEXTENTS 1
MAXEXTENTS 3
PCTINCREASE 0)
HASH IS smartcard_id HASKEYS 150;

The corresponding clustered table created using


CREATE TABLE EVENT_DETAIL
(smartcard_id NUMBER(18,0)
event_date NUMBER SORT
event_id NUMBER(18,0))
CLUSTER smartcard_cluster
(smartcard_id
event_date);

Given the sorted order of the data, a query on the cluster key would return the event records for the specified hash key (smartcard_id) by the oldest record first.


Since hash keys are created for a given column, they are useful for OLTP applications that use key based retrieval (single-tone selects) compared to using range scans. This also makes it important that considerable attention should be given during the design phase in selecting the appropriate cluster key.



HASH IS parameter in the CREATE CLUSTER statement should only be used if the cluster key is a single column of datatype NUMBER and contains uniformly distributed integers. If this condition does not apply the HASH IS clause should be omitted.



SIZE parameter is specified to set the average amount of space required to hold all rows for any given hash key. Hence it is important to determine the proper SIZE value based on the characteristics, such as number of rows, length of rows etc. of the data.






Oracle 10g 

New Feature: Sorted hash cluster is a new feature introduced in Oracle version 10g to allow faster retrieval of data for applications where data is consumed in the order in which it was inserted.






















 < Day Day Up > 



No comments: