Data Warehousing with ORACLE: Materialized Views / Snapshots, Star Queries, Partitioned Tables, Index-Organized Tables, Bitmapped Indexes, Clusters, Function-Based Indexes, Transportable Tablespaces
Oracle Architectural Components (Logical): Database, Tablespace, DataFile, Oracle Block, O.S. Block, Partition, Owner/Schema, Segment, Extent, Table, Index, Cluster, Snapshot
Partitioning offers these advantages: Partitioning enables data management operations such as data loads, index creation and rebuilding, and backup/recovery at the partition level, rather than on the entire table. This results in significantly reduced times for these operations. Partitioning improves query performance. In many cases, the results of a query can be achieved by accessing a subset of partitions, rather than the entire table. For some queries, this technique (called partition pruning) can provide order-of-magnitude gains in performance. Partitioning can significantly reduce the impact of scheduled downtime for maintenance operations. Partition independence for partition maintenance operations lets you perform concurrent maintenance operations on different partitions of the same table or index.
Partitioning offers these advantages: Partitioning increases the availability of mission-critical databases if critical tables and indexes are divided into partitions to reduce the maintenance windows, recovery times, and impact of failures. Partitioning can be implemented without requiring any modifications to your applications. For example, you could convert a nonpartitioned table to a partitioned table without needing to modify any of the SELECT statements or DML statements which access that table. You do not need to rewrite your application code to take advantage of partitioning.
Partitioning Methods Oracle provides the following partitioning methods: Range Partitioning List Partitioning Hash Partitioning Composite Partitioning Composite partitioning is a combination of other partitioning methods.
Range Partitioning Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition. It is the most common type of partitioning and is often used with dates. When using range partitioning, there are a few rules to keep in mind: 1) Each partition has a VALUES LESS THAN clause, which specifies a non-inclusive upper bound for the partition. Any values of the partition key equal to or higher than this literal are added to the next higher partition. 2) All partitions, except the first, have an implicit lower bound specified by the VALUES LESS THAN clause on the previous partition. 3) A MAXVALUE literal can be defined for the highest partition. MAXVALUE represents a virtual infinite value that sorts higher than any other possible value for the partition key, including the null value.
CREATE TABLE sales_range
 (salesman_id   NUMBER(5),
  salesman_name VARCHAR2(30),
  sales_amount  NUMBER(10),
  sales_date    DATE)
PARTITION BY RANGE(sales_date)
 (PARTITION sales_jan2000 VALUES LESS THAN (TO_DATE('01/02/2000','DD/MM/YYYY')),
  PARTITION sales_feb2000 VALUES LESS THAN (TO_DATE('01/03/2000','DD/MM/YYYY')),
  PARTITION sales_mar2000 VALUES LESS THAN (TO_DATE('01/04/2000','DD/MM/YYYY')),
  PARTITION sales_apr2000 VALUES LESS THAN (TO_DATE('01/05/2000','DD/MM/YYYY')));
List Partitioning
List partitioning enables you to explicitly control how rows map to partitions. You do this by specifying a list of discrete values for the partitioning key in the description of each partition. This is different from range partitioning, where a range of values is associated with a partition, and from hash partitioning, where a hash function controls the row-to-partition mapping. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way.

CREATE TABLE sales_list
 (salesman_id   NUMBER(5),
  salesman_name VARCHAR2(30),
  sales_state   VARCHAR2(20),
  sales_amount  NUMBER(10),
  sales_date    DATE)
PARTITION BY LIST(sales_state)
 (PARTITION sales_west VALUES IN ('California', 'Hawaii'),
  PARTITION sales_east VALUES IN ('New York', 'Virginia', 'Florida'),
  PARTITION sales_cent VALUES IN ('Texas', 'Illinois'));
Hash Partitioning
Hash partitioning enables easy partitioning of data that does not lend itself to range or list partitioning. It does this with a simple syntax and is easy to implement. It is a better choice than range partitioning when: 1) You do not know beforehand how much data will map into a given range 2) The sizes of range partitions would differ quite substantially or would be difficult to balance manually 3) Range partitioning would cause the data to be undesirably clustered 4) Performance features such as parallel DML are important. The concepts of splitting, dropping or merging partitions do not apply to hash partitions. Instead, hash partitions can be added and coalesced.

CREATE TABLE sales_hash
 (salesman_id   NUMBER(5),
  salesman_name VARCHAR2(30),
  sales_amount  NUMBER(10),
  week_no       NUMBER(2))
PARTITION BY HASH(salesman_id)
PARTITIONS 4
STORE IN (tbs1, tbs2, tbs3, tbs4);
Maintaining Partitions This section describes how to perform the following specific partition maintenance operations: Moving Partitions Adding Partitions Dropping Partitions Coalescing Partitions Modifying Partition Default Attributes Truncating Partitions Splitting Partitions Merging Partitions Exchanging Table Partitions
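As a sketch of what these operations look like in SQL, here are a few representative statements against the sales_range table created earlier (partition and staging-table names are illustrative):

-- Add a new partition above the existing ones:
ALTER TABLE sales_range ADD PARTITION sales_may2000
  VALUES LESS THAN (TO_DATE('01/06/2000','DD/MM/YYYY'));
-- Drop or empty a partition:
ALTER TABLE sales_range DROP PARTITION sales_jan2000;
ALTER TABLE sales_range TRUNCATE PARTITION sales_feb2000;
-- Relocate a partition to another tablespace:
ALTER TABLE sales_range MOVE PARTITION sales_mar2000 TABLESPACE tbs2;
-- Split one partition into two at a given key value:
ALTER TABLE sales_range SPLIT PARTITION sales_apr2000
  AT (TO_DATE('15/04/2000','DD/MM/YYYY'))
  INTO (PARTITION sales_apr_a, PARTITION sales_apr_b);
-- Swap a partition's data with a standalone (nonpartitioned) table:
ALTER TABLE sales_range EXCHANGE PARTITION sales_apr_b WITH TABLE sales_apr_stage;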
Coalescing Free Space
Space for tablespace segments is managed using extents, which are made up of a specific number of contiguous data blocks. Continuous allocation and deallocation of free space fragments your tablespace and makes allocation of larger extents more difficult. Neighboring smaller contiguous free extents can, however, be coalesced into one larger free extent. By default, the SMON (system monitor) process incrementally coalesces the free extents of tablespaces in the background.
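Coalescing can also be requested manually; a minimal sketch, assuming a tablespace named user_data:

-- Ask Oracle to coalesce adjacent free extents in a tablespace
-- (user_data is a hypothetical tablespace name).
ALTER TABLESPACE user_data COALESCE;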
Index-Organized Tables [diagram: regular table access via a separate index and ROWID vs. direct IOT access; row layout showing row header, key column, and non-key columns]
Index-Organized Tables An index-organized table has a storage organization that is a variant of a primary B-tree. Unlike an ordinary (heap-organized) table whose data is stored as an unordered collection (heap), data for an index-organized table is stored in a B-tree index structure in a primary key sorted manner. Besides storing the primary key column values of an index-organized table row, each index entry in the B-tree stores the nonkey column values as well.
Index-Organized Tables Compared with Regular Tables Faster key-based access to table data Reduced storage requirements Main restrictions: Must have a primary key Cannot use unique constraints Cannot be clustered
Secondary Indexes on Index-Organized Tables Secondary index support on index-organized tables provides efficient access to an index-organized table using columns that: 1) are not the primary key 2) are not a prefix of the primary key. Oracle constructs secondary indexes on index-organized tables using logical row identifiers (logical rowids) that are based on the table's primary key. A logical rowid can optionally include a physical guess, which identifies the block location of the row. Oracle can use these physical guesses to probe directly into the leaf block of the index-organized table, bypassing the primary key search. Because rows in index-organized tables do not have permanent physical addresses, the physical guesses can become stale when rows are moved to new blocks. A) Without physical guesses, access involves two index scans: a secondary index scan followed by a scan of the primary key index. B) With accurate physical guesses, access involves a secondary index scan and an additional I/O to fetch the data block containing the row. C) With inaccurate physical guesses, access involves a secondary index scan and an I/O to fetch the wrong data block (as indicated by the physical guess), followed by a scan of the primary key index.
Creating Index-Organized Tables
SQL> CREATE TABLE sales
       (office_cd NUMBER(3)
       ,qtr_end   DATE
       ,revenue   NUMBER(10,2)
       ,CONSTRAINT sales_pk
          PRIMARY KEY (office_cd, qtr_end)
       )
     ORGANIZATION INDEX
     TABLESPACE indx
     STORAGE (…);
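A secondary index on an IOT is created with ordinary CREATE INDEX syntax; a minimal sketch against the sales table above (the index name and column choice are illustrative):

-- Secondary index on a non-key column of the index-organized table;
-- Oracle builds it on logical rowids derived from the primary key.
CREATE INDEX sales_revenue_idx ON sales (revenue);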
Bitmap Indexes Bitmap indexes are widely used in data warehousing environments. These environments typically have large amounts of data and ad hoc queries, but a low level of concurrent DML transactions and candidate index columns with low cardinality. For such applications, bitmap indexing provides: 1) Reduced response time for large classes of ad hoc queries 2) Reduced storage requirements compared to other indexing techniques 3) Dramatic performance gains even on hardware with a relatively small number of CPUs or a small amount of memory 4) Efficient maintenance during parallel DML and loads. Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of space because the indexes can be several times larger than the data in the table.
The following shows a portion of a company's customers table.

SELECT cust_id, cust_gender, cust_marital_status, cust_income_level
FROM customers;

CUST_ID C CUST_MARITAL_STATUS CUST_INCOME_LEVEL
------- - ------------------- ---------------------
     70 F                     D: 70,000 - 89,999
     80 F married             H: 150,000 - 169,999
     90 M single              H: 150,000 - 169,999
    100 F                     I: 170,000 - 189,999
    110 F married             C: 50,000 - 69,999
    120 M single              F: 110,000 - 129,999
    130 M                     J: 190,000 - 249,999
    140 M married             G: 130,000 - 149,999

Because cust_gender, cust_marital_status, and cust_income_level are all low-cardinality columns (there are only three possible values for marital status, two possible values for gender, and 12 for income level), bitmap indexes are ideal for these columns. Do not create a bitmap index on cust_id because this is a unique column. Instead, a unique B-tree index on this column provides the most efficient representation and retrieval.
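As a sketch, bitmap indexes on these columns would be created like this (the index names are illustrative):

-- One bitmap index per low-cardinality column of customers.
CREATE BITMAP INDEX cust_gender_bix  ON customers (cust_gender);
CREATE BITMAP INDEX cust_marital_bix ON customers (cust_marital_status);
CREATE BITMAP INDEX cust_income_bix  ON customers (cust_income_level);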
Cluster
Clusters are an optional method of storing table data. A cluster is a group of tables that share the same data blocks because they share common columns and are often used together. Use clusters to store one or more tables that: 1) are primarily queried 2) are not predominantly inserted into or updated 3) are often joined with other tables in the cluster by queries.
Cluster
Because clusters store related rows of different tables together in the same data blocks, properly used clusters offer these benefits: 1) Disk I/O is reduced for joins of clustered tables. 2) Access time improves for joins of clustered tables. 3) In a cluster, a cluster key value is the value of the cluster key columns for a particular row. Each cluster key value is stored only once in the cluster and the cluster index, no matter how many rows of different tables contain the value. Therefore, less storage is required to store related table and index data.
CREATE CLUSTER emp_dept (deptno NUMBER(3))
  PCTFREE 5
  TABLESPACE user_data
  STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0);

CREATE TABLE dept
  (deptno NUMBER(3) PRIMARY KEY,
   . . .)
  CLUSTER emp_dept (deptno);

CREATE TABLE emp
  (empno  NUMBER(5) PRIMARY KEY,
   ename  VARCHAR2(15) NOT NULL,
   . . .
   deptno NUMBER(3) REFERENCES dept)
  CLUSTER emp_dept (deptno);
A cluster index must be created before any rows can be inserted into any clustered table. The following statement creates a cluster index for the emp_dept cluster:

CREATE INDEX emp_dept_index
  ON CLUSTER emp_dept
  TABLESPACE user_indx
  STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0)
  PCTFREE 5;
Viewing Information About Clusters The following data dictionary views display information about clusters: DBA_CLUSTERS, ALL_CLUSTERS, USER_CLUSTERS (cluster definitions) and DBA_CLU_COLUMNS, USER_CLU_COLUMNS (mapping of table columns to cluster key columns).
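For example, a quick look at the clusters in the current schema (a minimal sketch):

-- List clusters owned by the current user, with their tablespace.
SELECT cluster_name, tablespace_name, key_size
FROM   user_clusters;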
Function-Based Indexes Features of Function-Based Indexes. Function-based indexes allow you to: 1) Precompute the value of a computationally intensive function and store it in the index 2) Increase the number of situations where the optimizer can perform a range scan instead of a full table scan. You must have the following initialization parameters defined to create a function-based index: *) QUERY_REWRITE_ENABLED set to TRUE *) COMPATIBLE set to 8.1.0.0.0 or a greater value
Function-Based Indexes Example: Function-Based Index for Case-Insensitive Searches. The following statement creates the function-based index idx on table emp, based on an uppercase evaluation of the ename column:

CREATE INDEX idx ON emp (UPPER(ename));

The following SELECT statement then uses the function-based index on UPPER(ename) to retrieve all employees whose names start with JOH:

SELECT * FROM emp WHERE UPPER(ename) LIKE 'JOH%';

Example: Precomputing Arithmetic Expressions with a Function-Based Index. This statement creates a function-based index on an expression:

CREATE INDEX idx ON t (a + b * (c - 1), a, b);

SELECT statements can use either an index range scan (in the following SELECT statement the expression is a prefix of the index) or an index full scan (preferable when the index specifies a high degree of parallelism).

SELECT a FROM t WHERE a + b * (c - 1) < 100;
Data Transportation: Transportable Tablespaces [diagram: data flows from OLTP systems through a staging area into the data warehouse and on to data marts; information distribution]
Transporting Tablespaces Make tablespace read-only Export metadata from source Copy data files to target system Transfer export file Import metadata into target If necessary, alter the tablespace to read-write
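Steps 1 and 6 are plain DDL; a minimal sketch, assuming the sales_ts tablespace used in the slides that follow:

-- Step 1: make the tablespace read-only on the source database.
ALTER TABLESPACE sales_ts READ ONLY;
-- Step 6: after import on the target, make it writable again if needed.
ALTER TABLESPACE sales_ts READ WRITE;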
Exporting and Importing Metadata

exp sys/… FILE=s980501.dmp TRANSPORT_TABLESPACE=y TABLESPACES=sales_ts TRIGGERS=N CONSTRAINTS=N

imp sys/… FILE=s980501.dmp TRANSPORT_TABLESPACE=y DATAFILES=(/disk1/sales01.dbf, /disk2/sales02.dbf)
Transporting a Tablespace [diagram: on System 1 the read-only SALES_TS tablespace has data files /u/d1/sales1.dbf and /u/d2/sales2.dbf; the export file s980501.dmp and the data files are copied to System 2, where SALES_TS is plugged in as /disk1/sales01.dbf and /disk2/sales02.dbf]
Transportable Tablespaces: Uses Moves entire tablespace data Supports media recovery Source and target databases must: Be on the same operating system Run Oracle8i, release 8.1, or above Have the same block size Use the same character set
Checking the Transport Set

DBMS_TTS.TRANSPORT_SET_CHECK(
  ts_list          => 'SALES_TS',
  incl_constraints => TRUE);

The procedure checks against the data dictionary that the listed tablespaces form a self-contained transport set, including referential integrity; objects with references to objects outside the transport set are listed in the view TRANSPORT_SET_VIOLATIONS.
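Put together, the check is typically run and read back like this (a minimal sketch):

-- Verify that SALES_TS is self-contained, then list any violations.
EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK(ts_list => 'SALES_TS', incl_constraints => TRUE);
SELECT * FROM transport_set_violations;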
Discoverer Architecture [diagram: the Viewer, User, and Administration editions access the database (OLTP, data warehouse, data mart) through the End User Layer and its Business Areas; database complexity is hidden from users]
Tuning
For best results, the tuning phase should be undertaken in parallel with the design activity, without waiting for the development and production phases of the software. The chart below highlights the relationship between benefit (Y) and the time at which tuning starts (X). The opposite trend would appear if, instead of benefits, the Y axis showed the cost of the activity. [chart: benefit decreasing as tuning starts later, across the Design, Development, and Production phases]
Here are the steps to follow, explicitly in this order, to achieve a sound tuning policy:
[diagram: SQL statement processing. Parse (syntax validation, terminology validation, grant/user validation, execution plan, implicit cursor) → Bind → Describe → Execute → Fetch, with OK/KO outcomes at each step]
[diagram: a SQL statement enters the Optimizer, which is driven either by the internal rule manual (rule-based) or by additional statistical information (cost-based), and produces the execution plan]
ANALYZE TABLE | INDEX | CLUSTER name
    COMPUTE STATISTICS
  | ESTIMATE STATISTICS [SAMPLE integer ROWS | PERCENT]
  | DELETE STATISTICS;

DBMS_UTILITY.ANALYZE_SCHEMA(
  schema           VARCHAR2,
  method           VARCHAR2,
  estimate_rows    NUMBER   DEFAULT NULL,
  estimate_percent NUMBER   DEFAULT NULL,
  method_opt       VARCHAR2 DEFAULT NULL);
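For instance (a minimal sketch; the emp table and SCOTT schema are illustrative):

-- Exact statistics on one table:
ANALYZE TABLE emp COMPUTE STATISTICS;
-- Sampled statistics on the same table:
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 20 PERCENT;
-- Whole-schema statistics via the package:
EXECUTE DBMS_UTILITY.ANALYZE_SCHEMA(schema => 'SCOTT', method => 'COMPUTE');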
This methodology can be enabled at the level of the whole database, of a session, or of a single SQL statement.
Initialization parameter OPTIMIZER_MODE: RULE (rule-based), CHOOSE (cost-based), FIRST_ROWS (cost-based, minimizes response time), ALL_ROWS (cost-based, minimizes total execution time).
ALTER SESSION SET OPTIMIZER_MODE command.
SQL statement hint: SELECT /*+ CHOOSE */ columns FROM tables WHERE condition;
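At the session level the switch looks like this (a minimal sketch):

-- Favor fast delivery of the first rows for the current session.
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS;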
[diagram: EXPLAIN PLAN. The SQL statement goes through the optimizer, which writes the chosen execution plan into PLAN_TABLE; selecting from PLAN_TABLE answers "which plan?" and shows its detail]
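In practice the mechanism is driven like this (a minimal sketch; the STATEMENT_ID value and the queried statement are illustrative):

-- Record the plan for a statement into PLAN_TABLE, then read it back.
EXPLAIN PLAN SET STATEMENT_ID = 'test1' FOR
  SELECT * FROM emp WHERE deptno = 10;

SELECT LPAD(' ', 2*(LEVEL-1)) || operation || ' ' || options || ' ' || object_name AS plan
FROM   plan_table
START WITH id = 0 AND statement_id = 'test1'
CONNECT BY PRIOR id = parent_id AND statement_id = 'test1';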
Automated Performance Tuning System
The methodology underlying the strategy is built on four distinct steps:
1 - Identify the aspects of the database to put under analysis, define the rules, and establish the threshold values.
2 - Collecting data. Acquire information related to: database instance, schema, environment, workload.
3 - Viewing the collected data. Present the previously collected data through simple, immediately understandable reports.
4 - Analyzing data / generating recommendations. Whenever the threshold values are not met, the system steps in with a series of useful recommendations, if not outright a set of automatic corrective actions. These checks fall under the name of "Routine Tuning" and should be regarded as prevention, since they help anticipate what could become real problems before those problems show up with all their resolution complexity.
Check 1 - % Library Cache Misses, should stay < 1%
select round(sum(reloads)/sum(pins)*100,2) Col1 from v$librarycache;
If the threshold is exceeded, increase shared_pool_size.

Check 2 - % Data Dictionary Cache Misses, should stay < 10%
select round(sum(getmisses)/sum(gets)*100,2) Col1 from v$rowcache;
If the threshold is exceeded, increase shared_pool_size.
Check 3 - Logical/Physical Hit Ratio, critical if below 80%
select round(((1-(a.value/(b.value+c.value)))*100),2) Col1
from v$sysstat a, v$sysstat b, v$sysstat c
where a.name = 'physical reads'   -- physical reads
and   b.name = 'db block gets'    -- logical reads
and   c.name = 'consistent gets'; -- logical reads
If the ratio falls below the threshold, increase db_block_buffers.
Check 4 - Sort Area Analysis
Alongside the shared memory area (SGA) there is a set of smaller, non-shared memory areas, in a 1:1 relationship with the processes that serve user connections (the PGA), whose main purpose is to support sort and grouping operations. Every statement that performs sorts or grouping uses the PGA to do its work. When the PGA is not large enough to accommodate the activity, the work spills to temporary segments created on disk, with a significant degradation in execution times. There is no fixed limit to compare against: the decision to revisit the initSID.ora parameters is left to the DBA's judgment:
sort_area_retained_size - size in bytes retained in the PGA for potential sorts
sort_area_size - size in bytes allocatable in the PGA for actual sorts.
Two thresholds (min and max) are identified, so as to resolve as much of the activity as possible entirely in memory.
Check 4 - Sort Area Analysis
select name, value from v$sysstat
where name in ('sorts (memory)','sorts (disk)');
Check 5 - Objects with More Than 25 Extents
The checking continues by focusing on the objects (segments) with a critical level of fragmentation (a high number of allocated extents), which can potentially be blamed for a degradation of system performance.
select owner, segment_name, segment_type, tablespace_name, extents
from dba_segments
where extents > 25
and owner not in ('SYS')
order by owner, segment_type, extents desc;
SQL Trace Facility
SQL Trace provides valuable information for every SQL statement involved, generating the following statistics: number of parses, executes, and fetches; CPU time and elapsed time; number of logical reads and physical reads; number of rows processed. It can be activated at the session level (alter session set sql_trace = TRUE) or for the whole instance (initSID.ora configuration parameter sql_trace=true). The statistics are stored, in an internal format, in a file called the trace file.
To present the information recorded in the trace file described above, the file must be fed to a formatter named TKPROF, whose invocation syntax is:
TKPROF file_trace_input file_output
  SORT=(option1,option2,…)  e.g. EXECPU, EXEELA, EXEROW
  PRINT=integer
  INSERT=file_scripts_sql_output
  SYS=boolean
  TABLE=schema.table
  EXPLAIN=user/password
  RECORD=file_record_output
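A typical invocation might look like this (a minimal sketch; file names and credentials are illustrative):

# Format a trace file, sorting statements by CPU time spent executing,
# and let TKPROF add execution plans by connecting as scott.
tkprof ora_12345.trc report.prf SORT=(EXECPU) EXPLAIN=scott/tiger SYS=NO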
Let us examine in detail the techniques used to interpret TKPROF output, an example of which follows.

call     count   cpu    elapsed   disk    query   current   rows
-------  -----  -----  --------  -----  -------  --------  -----
Parse       11   0.08      0.18      0        0         0      0
Execute     11   0.23      0.66      0        3         6      2
Fetch       35   6.70      6.83    100    12326         2    824
-------  -----  -----  --------  -----  -------  --------  -----
Total       57   7.01      7.67    100    12329         8    826