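Oracle Utility: SQL*Loader
[Diagram: SQL*Loader reads the control file and the input data files, loads tables and indexes in the database, and writes a log file, bad files, and discard files.]
[Diagram: record flow in SQL*Loader. Field processing: records that fail (KO) go to the bad file. WHEN clause: records that do not match go to the discard file. Records that pass (OK) are inserted into the database; records the database rejects also go to the bad file.]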
sqlldr ...
Valid Keywords:
    userid     -- ORACLE username/password
    control    -- Control file name
    log        -- Log file name
    bad        -- Bad file name
    data       -- Data file name
    discard    -- Discard file name
    discardmax -- Number of discards to allow (Default all)
    skip       -- Number of logical records to skip (Default 0)
    load       -- Number of logical records to load (Default all)
    errors     -- Number of errors to allow (Default 50)
    rows       -- Number of rows in conventional path bind array or between direct path data saves (Default: Conventional path 64, Direct path all)
    silent     -- Suppress messages during run (header,feedback,errors,discards,partitions)
    direct     -- Use direct path (Default FALSE)
    parfile    -- Parameter file: name of file that contains parameter specifications
-- Loads EMP records from first 23 characters
-- Creates and loads PROJ records for each PROJNO listed
-- for each employee
LOAD DATA
INFILE 'ulcase5.dat'
BADFILE 'ulcase5.bad'
DISCARDFILE 'ulcase5.dsc'
REPLACE
INTO TABLE emp
  (empno   POSITION( 1: 4) INTEGER EXTERNAL,
   ename   POSITION( 6:15) CHAR,
   deptno  POSITION(17:18) CHAR,
   mgr     POSITION(20:23) INTEGER EXTERNAL)
INTO TABLE proj
-- PROJ has two columns, both not null: EMPNO and PROJNO
WHEN projno != '   '
  (empno   POSITION( 1: 4) INTEGER EXTERNAL,
   projno  POSITION(25:27) INTEGER EXTERNAL)   -- 1st proj
INTO TABLE proj
WHEN projno != '   '
  (empno   POSITION( 1: 4) INTEGER EXTERNAL,
   projno  POSITION(29:31) INTEGER EXTERNAL)   -- 2nd proj
INTO TABLE proj
WHEN projno != '   '
  (empno   POSITION( 1: 4) INTEGER EXTERNAL,
   projno  POSITION(33:35) INTEGER EXTERNAL)   -- 3rd proj
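The positional parsing and the WHEN test in this control file can be sketched in Python. This is only an illustration of the logic, not how SQL*Loader is implemented: the helper names (`field`, `make_record`, `load`) and the sample values are hypothetical, and the sketch sends a whole record to the bad list on a conversion error, whereas SQL*Loader rejects rows table by table.

```python
# Column positions taken from the control file (1-based, inclusive).
PROJ_POSITIONS = ((25, 27), (29, 31), (33, 35))


def field(record, start, end):
    """Return the POSITION(start:end) slice of a fixed-width record."""
    return record[start - 1:end]


def make_record(empno, ename, deptno, mgr, projs):
    """Build a hypothetical fixed-width input line in the expected layout."""
    p = [f"{x:>3}" for x in projs]
    return f"{empno:>4} {ename:<10} {deptno:>2} {mgr:>4} {p[0]} {p[1]} {p[2]}"


def load(records):
    """Split records into EMP rows, PROJ rows, and rejected (bad) records."""
    emp, proj, bad = [], [], []
    for rec in records:
        try:
            emp_row = (
                int(field(rec, 1, 4)),          # empno INTEGER EXTERNAL
                field(rec, 6, 15).strip(),      # ename CHAR
                field(rec, 17, 18).strip(),     # deptno CHAR
                int(field(rec, 20, 23)),        # mgr INTEGER EXTERNAL
            )
            proj_rows = [
                (emp_row[0], int(field(rec, s, e)))
                for s, e in PROJ_POSITIONS
                if field(rec, s, e).strip()     # WHEN projno != '   '
            ]
        except ValueError:
            bad.append(rec)                     # simplified stand-in for the bad file
            continue
        emp.append(emp_row)
        proj.extend(proj_rows)
    return emp, proj, bad


RECORDS = [
    make_record(7782, "CLARK", 10, 7839, [101, 102, 103]),
    make_record(7566, "JONES", 20, 7839, [101, "", ""]),
    make_record("12X4", "BAD", 30, 7839, [101, "", ""]),
]
```

Calling `load(RECORDS)` yields two EMP rows, four PROJ rows (three for the first employee, one for the second), and one rejected record. Because the EMP clause has no WHEN condition, no record is discarded in this layout.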
External Tables Concepts
The Oracle9i external tables feature complements existing SQL*Loader functionality. It allows you to access data in external sources as if it were in a table in the database. External tables are read-only: no data manipulation language (DML) operations or index creation are allowed on an external table. Therefore, SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table. To use the external tables feature, you must have some knowledge of the file format and record format of the datafiles on your platform. You must also know enough about SQL to be able to create an external table and execute statements, such as queries or INSERT ... SELECT, that read from it.
CREATE TABLE empxt
  (empno NUMBER(4),
   ename VARCHAR2(10),
   job   VARCHAR2(9),
   mgr   NUMBER(4)
  )
ORGANIZATION EXTERNAL
  (TYPE ORACLE_LOADER
   DEFAULT DIRECTORY dat_dir
   ACCESS PARAMETERS
     (records delimited by newline
      badfile dat_dir:'empxt%a_%p.bad'
      logfile dat_dir:'empxt%a_%p.log'
      fields terminated by ','
      missing field values are null
      (empno, ename, job, mgr)
     )
   LOCATION ('empxt1.dat')
  )
REJECT LIMIT UNLIMITED;
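What the access parameters ask for (newline-delimited records, comma-terminated fields, missing trailing fields read as NULL, rejected rows set aside) can be sketched in Python. This is a rough illustration under simplifying assumptions, not the ORACLE_LOADER driver: the function names and sample data are hypothetical, and only the numeric conversion of empno and mgr is checked.

```python
COLUMNS = ("empno", "ename", "job", "mgr")
NUMERIC = {"empno", "mgr"}


def to_number(value):
    """Convert a text field to int; empty or missing fields become None (NULL)."""
    return int(value) if value not in (None, "") else None


def read_external(text):
    """Parse comma-separated records into rows, collecting rejects separately."""
    rows, bad = [], []
    for line in text.splitlines():             # records delimited by newline
        parts = line.split(",")                # fields terminated by ','
        # missing field values are null: pad short records with None
        parts += [None] * (len(COLUMNS) - len(parts))
        try:
            row = {
                name: to_number(value) if name in NUMERIC else (value or None)
                for name, value in zip(COLUMNS, parts)
            }
        except ValueError:
            bad.append(line)                   # stand-in for the bad file
            continue
        rows.append(row)
    return rows, bad


SAMPLE = "7369,SMITH,CLERK,7902\n7499,ALLEN\nabc,WARD,SALESMAN,7698\n"
```

With `SAMPLE`, the second record comes back with `job` and `mgr` set to `None`, and the third record is rejected because `abc` is not a number.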
Data Warehousing
[Diagram labels: Data Warehouse, Discoverer, Development Tool (Forms & Reports), PL/SQL, Pro*C, SQL.]
Phases of the integration process... of platforms and applications: Intranet. Data integration: Data Warehouse. OLAP on the Web.
Data Warehouse Definition
A DWH collects data from different sources and transforms it into a consistent, homogeneous format. Each point in the “cube” contains “fact data” for one particular combination of n “dimension data” values. In the three-dimensional case shown alongside, statistical data is organized by Product, Market, and Time.
O.L.T.P.: transactional system (data entry, queries and reports). Extraction, Transformation, and Loading feed the Data Warehouse and the Data Marts. O.L.A.P.: reporting environment and Decision Support System (integrated reports and on-line analysis), with tools for on-line data analysis that are efficient, easy to use, and capable of graphical presentation.
[Diagram labels: Transactional system, Reporting tools, Decision Support System, Operational database, Data Warehouse, Metadata; Extraction, Transformation, Transport.]
Typical architecture of a DWH
What happened? Why did it happen? What would happen if?
Operational: standard reporting. IT develops, users consult. “How much did we sell last year?” Tool: Oracle Reports.
Tactical: ad hoc analysis and questions. Advanced users. “Who contributed to the increase in sales?” Tool: Oracle Discoverer.
Strategic: advanced analysis. Analysts. “How much do we expect to sell this year, based on the current trend?” Tool: Oracle Express.
Differences between a data warehouse and an operational database: different hardware, storage, and tuning requirements; different data modeling; different query and reporting tools.
Data Mart
A schema composed of one or more FACT tables, which hold the quantitative element, and several DIMENSION tables, each representing a type of hierarchy. This is the star schema. Several connected stars are called constellations.
Star queries
To join N tables, Oracle builds intermediate results by joining two tables at a time (pair-wise). With star query optimization, Oracle can leave the fact table until last in the join order, because it is the largest table.
Star Schema
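The shape of a star schema and of a star query can be sketched with Python's built-in sqlite3 module. SQLite stands in for Oracle here, so this shows only the schema and the query, not Oracle's star query optimization; the table names, column names, and figures are hypothetical.

```python
import sqlite3


def build_star():
    """Create a small star schema: one fact table and three dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE market  (market_id  INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE period  (period_id  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE sales_fact (
            product_id INTEGER REFERENCES product,
            market_id  INTEGER REFERENCES market,
            period_id  INTEGER REFERENCES period,
            amount     REAL
        );
        INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO market  VALUES (1, 'North'), (2, 'South');
        INSERT INTO period  VALUES (1, 1998, 1), (2, 1998, 2), (3, 1999, 1);
        INSERT INTO sales_fact VALUES
            (1, 1, 1, 100), (1, 1, 2, 150), (1, 2, 1, 80),
            (2, 1, 1, 60),  (1, 1, 3, 500);
    """)
    return conn


def sales_by_region(conn, product_name, year):
    """Star query: filter on the dimensions, aggregate the fact table."""
    return conn.execute(
        """
        SELECT m.region, SUM(f.amount)
        FROM sales_fact f
        JOIN product p ON p.product_id = f.product_id
        JOIN market  m ON m.market_id  = f.market_id
        JOIN period  t ON t.period_id  = f.period_id
        WHERE p.name = ? AND t.year = ?
        GROUP BY m.region
        ORDER BY m.region
        """,
        (product_name, year),
    ).fetchall()
```

The query restricts the small dimension tables first (one product, one year) and only then needs the matching fact rows, which is the access pattern that joining the fact table last is meant to exploit.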
Star Schema Example Oracle9i
Data Warehousing with ORACLE
    Materialized views / Snapshots
    Partitioned tables
    Index-organized tables
    Clusters
    Star queries
    Function-based indexes
    Read-only tablespaces
Oracle Architectural Components (Logical)
[Diagram labels: Database, Tablespace, Datafile, Oracle Block, O.S. Block, Partition, Owner, Schema, Segment, Extent, Table, Index, Cluster, Snapshot.]
CREATE TABLE sales
  (invoice_no NUMBER NOT NULL,
   sale_year  NUMBER NOT NULL,
   sale_month NUMBER NOT NULL,
   sale_day   NUMBER NOT NULL
  )
PARTITION BY RANGE (sale_year, sale_month, sale_day)
  (PARTITION sales_q1 VALUES LESS THAN (1998, 04, 01) TABLESPACE tsa STORAGE (……….),
   PARTITION sales_q2 VALUES LESS THAN (1998, 07, 01) TABLESPACE tsb STORAGE (……….),
   PARTITION sales_q3 VALUES LESS THAN (1998, 10, 01) TABLESPACE tsc STORAGE (……….),
   PARTITION sales_q4 VALUES LESS THAN (1999, 01, 01) TABLESPACE tsd STORAGE (……….)
  );
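How a row is routed to a partition under VALUES LESS THAN can be sketched in Python: the multi-column key is compared to each bound column by column, and the row goes to the first partition whose bound is strictly greater than the key. The function name is hypothetical, and this is an illustration of the rule rather than Oracle's implementation.

```python
from bisect import bisect_right

# Upper bounds and partition names copied from the CREATE TABLE statement.
PARTITIONS = [
    ((1998, 4, 1), "sales_q1"),
    ((1998, 7, 1), "sales_q2"),
    ((1998, 10, 1), "sales_q3"),
    ((1999, 1, 1), "sales_q4"),
]
BOUNDS = [bound for bound, _name in PARTITIONS]


def partition_for(sale_year, sale_month, sale_day):
    """Return the partition whose VALUES LESS THAN bound first exceeds the key."""
    key = (sale_year, sale_month, sale_day)
    # Python compares tuples element by element, like a multi-column range key.
    index = bisect_right(BOUNDS, key)
    if index == len(BOUNDS):
        # Oracle would reject such a row (the key maps to no partition).
        raise ValueError(f"key {key} does not map to any partition")
    return PARTITIONS[index][1]
```

A sale dated 1998-03-31 lands in `sales_q1`, while one dated exactly 1998-04-01 lands in `sales_q2`, because each bound is exclusive. Any date from 1999-01-01 onward has no partition in this definition.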
Maintaining Partitions
This section describes how to perform the following specific partition maintenance operations:
    Moving Partitions
    Adding Partitions
    Dropping Partitions
    Coalescing Partitions
    Modifying Partition Default Attributes
    Truncating Partitions
    Splitting Partitions
    Merging Partitions
    Exchanging Table Partitions
Index-Organized Tables
[Diagram: regular table access goes from the index entry through the ROWID to the table row; IOT access finds the row header, key column, and non-key columns in the index itself.]
Index-Organized Tables Compared with Regular Tables
    Faster key-based access to table data
    Reduced storage requirements
    Main restrictions:
        Must have a primary key
        Cannot use unique constraints
        Cannot be clustered
Creating Index-Organized Tables
SQL> create table sales
     (office_cd number(3)
     ,qtr_end date
     ,revenue number(10,2)
     ,constraint sales_pk
      PRIMARY KEY (office_cd,qtr_end)
     )
     ORGANIZATION INDEX
     tablespace indx
     storage (…);
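The idea of storing rows inside the primary key structure can be tried out with SQLite, whose WITHOUT ROWID tables are a loose analogue of an index-organized table. This is an assumption-laden sketch, not Oracle behavior: SQLite has no tablespace or storage clauses, the date is kept as an ISO string, and the sample revenues are hypothetical.

```python
import sqlite3


def build_sales():
    """Create a table whose rows are stored in the primary key B-tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE sales (
            office_cd INTEGER NOT NULL,
            qtr_end   TEXT    NOT NULL,   -- ISO date string
            revenue   REAL,
            CONSTRAINT sales_pk PRIMARY KEY (office_cd, qtr_end)
        ) WITHOUT ROWID
    """)
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(10, "1998-03-31", 1200.5), (10, "1998-06-30", 1350.0), (20, "1998-03-31", 990.0)],
    )
    return conn


def revenue_for(conn, office_cd, qtr_end):
    """Key-based lookup: the row is found directly in the key structure."""
    row = conn.execute(
        "SELECT revenue FROM sales WHERE office_cd = ? AND qtr_end = ?",
        (office_cd, qtr_end),
    ).fetchone()
    return None if row is None else row[0]
```

As with an index-organized table, a primary key is mandatory here, and a lookup by the full key needs no second hop from an index entry to a separate table row.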
Cluster
Use clusters to store one or more tables that are:
1) Primarily queried
2) Not predominantly inserted into or updated
3) Often joined with other tables in the cluster by queries
CREATE CLUSTER emp_dept (deptno NUMBER(3))
  PCTFREE 5
  TABLESPACE users
  STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0);

CREATE TABLE dept
  (deptno NUMBER(3) PRIMARY KEY,
   . . )
  CLUSTER emp_dept (deptno);

CREATE TABLE emp
  (empno  NUMBER(5) PRIMARY KEY,
   ename  VARCHAR2(15) NOT NULL,
   . . .
   deptno NUMBER(3) REFERENCES dept)
  CLUSTER emp_dept (deptno);
CREATE INDEX emp_dept_index
  ON CLUSTER emp_dept
  TABLESPACE users
  STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0)
  PCTFREE 5;
Function-Based Indexes
Features of Function-Based Indexes
Function-based indexes allow you to:
1) Create more powerful sorts (for example, case-insensitive sorts with UPPER or LOWER)
2) Precompute the value of a computationally intensive function and store it in the index
3) Increase the number of situations where the optimizer can perform a range scan instead of a full table scan
You must have the following initialization parameters defined to create a function-based index:
*) QUERY_REWRITE_ENABLED set to TRUE
*) COMPATIBLE set to 8.1.0.0.0 or a greater value
Function-Based Indexes
Example: Function-Based Index for Case Insensitive Searches
The following statement creates function-based index idx on table emp based on an uppercase evaluation of the ename column:
    CREATE INDEX idx ON emp (UPPER(ename));
The following SELECT statement can now use the function-based index on UPPER(ename) to retrieve all employees with names that start with JOH:
    SELECT * FROM emp WHERE UPPER(ename) LIKE 'JOH%';
Example: Precomputing Arithmetic Expressions with a Function-Based Index
This statement creates a function-based index on an expression:
    CREATE INDEX idx ON t (a + b * (c - 1), a, b);
SELECT statements can use either an index range scan (in the following SELECT statement the expression is a prefix of the index) or index full scan (preferable when the index specifies a high degree of parallelism).
    SELECT a FROM t WHERE a + b * (c - 1) < 100;
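The case-insensitive example can be tried out with Python's built-in sqlite3 module, since SQLite also supports indexes on expressions. SQLite stands in for Oracle here, so its optimizer choices and its LIKE semantics are not Oracle's; the helper names and the sample employee names are hypothetical.

```python
import sqlite3

QUERY = "SELECT ename FROM emp WHERE UPPER(ename) LIKE 'JOH%'"


def build_emp(names):
    """Create emp with an index on the uppercase form of ename."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT)")
    conn.executemany("INSERT INTO emp (ename) VALUES (?)", [(n,) for n in names])
    # The index stores UPPER(ename), so the value is computed once per row.
    conn.execute("CREATE INDEX idx ON emp (UPPER(ename))")
    return conn


def names_starting_with_joh(conn):
    """Run the case-insensitive search from the slide."""
    return sorted(row[0] for row in conn.execute(QUERY))


def query_plan(conn):
    """Return the plan text SQLite reports for the search, for inspection."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
```

The search matches John, JOHNSON, and johansen alike, because the comparison is made on the uppercase value. Printing `query_plan(conn)` shows whether the engine chose the expression index for a given query; that decision belongs to each database's optimizer.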
Discoverer Architecture
[Diagram: the Viewer, User, and Administration editions work through the End User Layer and its Business Areas, which sit on top of the database (OLTP, Data Warehouse, Data Mart).]
Database complexity is hidden from users.
Data Warehouse Tools
Low-end OLAP tools: for simple queries and ad hoc reports. Suited to answering “what?” questions.
Powerful multi-dimensional analysis OLAP tools: support “drill down” into “detail data” to answer “why?” and “how?” questions.
1. The tools should be fast (i.e., targeted to deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds).
2. The tools can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user. The tools should allow the user to define new ad hoc calculations as part of the analysis and to report on the data in any desired way, without having to program.
3. The tools should operate in a shared mode and implement all the security requirements for confidentiality (possibly down to cell level).
4. The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies.
5. The tools can handle all the data and derived information needed, wherever it is and however much is relevant for the application. The tools should be graded on how much input data they can handle, not on how many gigabytes are needed to store it.
OLAP Tools
Slicing the “cube”
Pivot: rotate the view of a point of the cube to obtain a new perspective
Drill-Down
Slicing a Data Cube
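The three operations can be sketched on a tiny cube held in a Python dictionary. This is a toy model with hypothetical names and figures, assuming a cube keyed by product, market, and a (year, quarter) time value; real OLAP tools work on far larger, precomputed structures.

```python
from collections import defaultdict

# Fact data keyed by (product, market, (year, quarter)); values are hypothetical.
CUBE = {
    ("Widget", "North", (1998, 1)): 100,
    ("Widget", "North", (1998, 2)): 150,
    ("Widget", "South", (1998, 1)): 80,
    ("Gadget", "North", (1998, 1)): 60,
    ("Gadget", "South", (1998, 2)): 40,
}


def slice_by_year(cube, year):
    """Slice: fix the time dimension to one year, leaving product x market."""
    table = defaultdict(int)
    for (product, market, (y, _quarter)), amount in cube.items():
        if y == year:
            table[(product, market)] += amount
    return dict(table)


def pivot(table):
    """Pivot: swap the two axes of a two-dimensional table."""
    return {(b, a): value for (a, b), value in table.items()}


def drill_down(cube, product, market, year):
    """Drill-down: expand one yearly cell into its quarterly detail."""
    return {
        quarter: amount
        for (p, m, (y, quarter)), amount in cube.items()
        if (p, m, y) == (product, market, year)
    }
```

Slicing on 1998 gives a product-by-market table; pivoting it gives the same numbers as market-by-product; drilling down on the Widget/North cell breaks its yearly total into the two quarters that make it up.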
Data Reconciliation Steps
Data Reconciliation Process
Capture
    Static - initial load
    Incremental - ongoing update
Scrub or data cleansing
    Pattern recognition and other artificial intelligence techniques
Transform
    Convert the data format from the source to the target system
    Record-Level Functions
        Selection
        Joining
        Aggregation (for data marts)
    Field-Level Functions
        Single-field transformation
        Multi-field transformation
Errors and inconsistencies that are commonly found when scrubbing operational data:
    Misspelled names and addresses.
    Impossible or erroneous dates of birth.
    Fields used for purposes for which they were never intended.
    Mismatched addresses and area codes.
    Missing data.
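Two of the checks listed above, missing data and impossible dates of birth, can be sketched as a small scrub function in Python. The field names (`name`, `birth_date`, `area_code`), the ISO date format, and the rule that a birth date cannot lie in the future are assumptions made for the example; a real cleansing step would cover many more rules.

```python
from datetime import date

REQUIRED_FIELDS = ("name", "birth_date", "area_code")


def scrub(record, today):
    """Return a list of problems found in one operational record."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append(f"missing {name}")
    raw = record.get("birth_date")
    if raw:
        try:
            born = date.fromisoformat(raw)     # rejects dates such as 1975-02-30
        except ValueError:
            problems.append("impossible birth_date")
        else:
            if born > today:
                problems.append("birth_date in the future")
    return problems
```

A record is passed on to the transform step only when `scrub` returns an empty list; otherwise the listed problems can be logged or routed to a manual correction queue.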