10 tricks for working with Oracle

Here are several Oracle practices used at Sberbank that you might find useful. Some of them are probably familiar to you, but we load data not only with ETL tools: Oracle stored procedures are used as well. Oracle PL/SQL is where the most complex algorithms for loading data into the warehouses are implemented, the ones where you need to "feel every byte".



  • Automatic compilation logging
  • What to do if you want to make a view with parameters
  • Using dynamic statistics in queries
  • How to preserve the query plan when inserting data via a database link
  • Running procedures in parallel sessions
  • Carrying balances forward
  • Combining several histories into one
  • Normalizer
  • Rendering in SVG format
  • Oracle Metadata Search Application


Automatic compilation logging



On some Oracle databases at Sberbank there is a compilation trigger that remembers who changed what in the code of server objects, and when. The author of a change can therefore be identified from the compilation log table, and you also get a basic version control system for free: even if a programmer forgot to commit the change to Git, this mechanism acts as a safety net. Let's describe an example implementation of such automatic compilation logging. A simplified version of the compilation trigger that writes its log into the ddl_changes_log table looks like this:



create table DDL_CHANGES_LOG
(
  id               INTEGER,
  change_date      DATE,
  sid              VARCHAR2(100),
  schemaname       VARCHAR2(30),
  machine          VARCHAR2(100),
  program          VARCHAR2(100),
  osuser           VARCHAR2(100),
  obj_owner        VARCHAR2(30),
  obj_type         VARCHAR2(30),
  obj_name         VARCHAR2(30),
  previous_version CLOB,
  changes_script   CLOB
);

-- sequence used by the trigger below to generate log IDs
create sequence seq_ddl_changes_log;

create or replace trigger trig_audit_ddl_trg
  before ddl on database
declare
  v_sysdate              date;
  v_valid                number;
  v_previous_obj_owner   varchar2(30) := '';
  v_previous_obj_type    varchar2(30) := '';
  v_previous_obj_name    varchar2(30) := '';
  v_previous_change_date date;
  v_lob_loc_old          clob := '';
  v_lob_loc_new          clob := '';
  v_n                    number;
  v_sql_text             ora_name_list_t;
  v_sid                  varchar2(100) := '';
  v_schemaname           varchar2(30) := '';
  v_machine              varchar2(100) := '';
  v_program              varchar2(100) := '';
  v_osuser               varchar2(100) := '';
begin
  v_sysdate := sysdate;
  -- find whether compiled object already presents and is valid
  select count(*)
    into v_valid
    from sys.dba_objects
   where owner = ora_dict_obj_owner
     and object_type = ora_dict_obj_type
     and object_name = ora_dict_obj_name
     and status = 'VALID'
     and owner not in ('SYS', 'SPOT', 'WMSYS', 'XDB', 'SYSTEM')
     and object_type in ('TRIGGER', 'PROCEDURE', 'FUNCTION', 'PACKAGE', 'PACKAGE BODY', 'VIEW');
  -- find information about previous compiled object
  select max(obj_owner) keep(dense_rank last order by id),
         max(obj_type) keep(dense_rank last order by id),
         max(obj_name) keep(dense_rank last order by id),
         max(change_date) keep(dense_rank last order by id)
    into v_previous_obj_owner, v_previous_obj_type, v_previous_obj_name, v_previous_change_date
    from ddl_changes_log;
  -- if compile valid object or compile invalid package body broken by previous compilation of package then log it
  if (v_valid = 1 or v_previous_obj_owner = ora_dict_obj_owner and
     (v_previous_obj_type = 'PACKAGE' and ora_dict_obj_type = 'PACKAGE BODY' or
     v_previous_obj_type = 'PACKAGE BODY' and ora_dict_obj_type = 'PACKAGE') and
     v_previous_obj_name = ora_dict_obj_name and
     v_sysdate - v_previous_change_date <= 1 / 24 / 60 / 2) and
     ora_sysevent in ('CREATE', 'ALTER') then
    -- store previous version of object (before compilation) from dba_source or dba_views in v_lob_loc_old
    if ora_dict_obj_type <> 'VIEW' then
      for z in (select substr(text, 1, length(text) - 1) || chr(13) || chr(10) as text
                  from sys.dba_source
                 where owner = ora_dict_obj_owner
                   and type = ora_dict_obj_type
                   and name = ora_dict_obj_name
                 order by line) loop
        v_lob_loc_old := v_lob_loc_old || z.text;
      end loop;
    else
      select sys.dbms_metadata_util.long2clob(v.textlength, 'SYS.VIEW$', 'TEXT', v.rowid) into v_lob_loc_old
        from sys."_CURRENT_EDITION_OBJ" o, sys.view$ v, sys.user$ u
       where o.obj# = v.obj#
         and o.owner# = u.user#
         and u.name = ora_dict_obj_owner
         and o.name = ora_dict_obj_name;
    end if;
    -- store new version of object (after compilation) from v_sql_text in v_lob_loc_new
    v_n := ora_sql_txt(v_sql_text);
    for i in 1 .. v_n loop
      v_lob_loc_new := v_lob_loc_new || replace(v_sql_text(i), chr(10), chr(13) || chr(10));
    end loop;
    -- find information about session that changed this object
    select max(to_char(sid)), max(schemaname), max(machine), max(program), max(osuser)
      into v_sid, v_schemaname, v_machine, v_program, v_osuser
      from v$session
     where audsid = userenv('sessionid');
    -- store changes in ddl_changes_log
    insert into ddl_changes_log
      (id, change_date, sid, schemaname, machine, program, osuser,
       obj_owner, obj_type, obj_name, previous_version, changes_script)
    values
      (seq_ddl_changes_log.nextval, v_sysdate, v_sid, v_schemaname, v_machine, v_program, v_osuser,
       ora_dict_obj_owner, ora_dict_obj_type, ora_dict_obj_name, v_lob_loc_old, v_lob_loc_new);
  end if;
exception
  when others then
    null;
end;


In this trigger, the name and the new text of the compiled object are captured, supplemented with the previous text taken from the data dictionary, and written to the change log.
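
For example, to find out who last changed a given object and from which machine, you might query the log like this (the object name here is just an illustration):

select change_date, osuser, machine, program,
       obj_owner, obj_type, obj_name
  from ddl_changes_log
 where obj_name = 'SOME_PACKAGE'
 order by change_date desc;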



What to do if you want to make a view with parameters



This desire often visits an Oracle developer. Why is it possible to create a procedure or a function with parameters, but not a view with input parameters that could be used in calculations? Oracle has something to offer in place of this, in our opinion, missing concept.

Let's look at an example. Let there be a table with sales by division for each day.



create table DIVISION_SALES
(
  division_id INTEGER,
  dt          DATE,
  sales_amt   NUMBER
);


This query compares sales by division between two days, in this case 30.04.2020 and 11.09.2020.



select t1.division_id,
       t1.dt          dt1,
       t2.dt          dt2,
       t1.sales_amt   sales_amt1,
       t2.sales_amt   sales_amt2
  from (select dt, division_id, sales_amt
          from division_sales
         where dt = to_date('30.04.2020', 'dd.mm.yyyy')) t1,
       (select dt, division_id, sales_amt
          from division_sales
         where dt = to_date('11.09.2020', 'dd.mm.yyyy')) t2
 where t1.division_id = t2.division_id;


Here is the view one would like to write to generalize such a query, passing the dates as parameters. However, the syntax does not allow it.



create or replace view vw_division_sales_report(in_dt1 date, in_dt2 date) as
select t1.division_id,
       t1.dt          dt1,
       t2.dt          dt2,
       t1.sales_amt   sales_amt1,
       t2.sales_amt   sales_amt2
  from (select dt, division_id, sales_amt
          from division_sales
         where dt = in_dt1) t1,
       (select dt, division_id, sales_amt
          from division_sales
         where dt = in_dt2) t2
 where t1.division_id = t2.division_id;


Here is a suggested workaround. First, let's create an object type for a row of this view.



create type t_division_sales_report as object
(
  division_id INTEGER,
  dt1         DATE,
  dt2         DATE,
  sales_amt1  NUMBER,
  sales_amt2  NUMBER
);


And a collection type for a table of such rows.



create type t_division_sales_report_table as table of t_division_sales_report;


Instead of a view, let's write a pipelined function with date input parameters.



create or replace function func_division_sales(in_dt1 date, in_dt2 date)
  return t_division_sales_report_table
  pipelined as
begin
  for z in (select t1.division_id,
                   t1.dt          dt1,
                   t2.dt          dt2,
                   t1.sales_amt   sales_amt1,
                   t2.sales_amt   sales_amt2
              from (select dt, division_id, sales_amt
                      from division_sales
                     where dt = in_dt1) t1,
                   (select dt, division_id, sales_amt
                      from division_sales
                     where dt = in_dt2) t2
             where t1.division_id = t2.division_id) loop
    pipe row(t_division_sales_report(z.division_id,
                                     z.dt1,
                                     z.dt2,
                                     z.sales_amt1,
                                     z.sales_amt2));
  end loop;
  return;
end;


You can refer to it like this:



select *
  from table(func_division_sales(to_date('30.04.2020', 'dd.mm.yyyy'),
                                 to_date('11.09.2020', 'dd.mm.yyyy')));


This query returns the same result as the query at the beginning of this section with explicitly substituted dates.

Pipelined functions can also be useful when you need to pass a parameter deep inside a complex query.

For example, consider a complex view in which field1, the field you want to filter by, is hidden somewhere deep inside the view.



create or replace view complex_view as
 select field1, ...
   from (select field1, ...
           from (select field1, ... from deep_table), table1
          where ...),
        table2
  where ...;


A query against this view with a fixed value of field1 may get a bad execution plan.



select field1, ... from complex_view
 where field1 = 'myvalue';


That is, instead of first filtering deep_table by the condition field1 = 'myvalue', the query may first join all the tables, processing an unnecessarily large amount of data, and only then filter the result by field1 = 'myvalue'. This can be avoided by writing a pipelined function with a parameter assigned to field1, instead of the view.
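
A minimal sketch of this idea with hypothetical names (deep_table is assumed to have just the columns field1 and amount, and the joins of the original view are omitted for brevity):

create type t_complex_row as object
(
  field1 VARCHAR2(100),
  amount NUMBER
);

create type t_complex_table as table of t_complex_row;

create or replace function func_complex_view(in_field1 varchar2)
  return t_complex_table
  pipelined as
begin
  -- the filter on field1 is applied to deep_table right away,
  -- so the rest of the query works with a small set of rows
  for z in (select field1, sum(amount) amount
              from deep_table
             where field1 = in_field1
             group by field1) loop
    pipe row(t_complex_row(z.field1, z.amount));
  end loop;
  return;
end;

It is queried in the same way as before: select * from table(func_complex_view('myvalue'));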



Using dynamic statistics in queries



It happens that the same query in an Oracle database processes a different amount of data each time it runs, both in the tables and in the subqueries it uses. How do you get the optimizer to figure out, each time, which join method and which indexes to use? Consider, for example, a query that joins the portion of account balances changed since the last load to the account directory. The portion of changed balances varies greatly from load to load, sometimes hundreds of rows, sometimes millions. Depending on the size of this portion, the changed balances should be joined to the accounts either with /*+ use_nl */ or with /*+ use_hash */. Re-gathering statistics before every run is inconvenient, especially if the number of rows changes from load to load not in a joined table but in a joined subquery. The /*+ dynamic_sampling() */ hint can come to the rescue here. Let's show its effect on an example query. Let the table change_balances contain the balance changes and accounts be the account directory; we join these tables by the account_id column present in both. For the experiment we fill these tables with a large number of rows and keep their contents unchanged.

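For reproducibility, a possible test setup might look like this (the article does not give the exact DDL or row counts, so the column sets and volumes below are assumptions):

create table accounts
(
  account_id     INTEGER,
  account_number VARCHAR2(30)
);

create table change_balances
(
  account_id     INTEGER,
  balance_amount NUMBER
);

create index ix_accounts_account_id on accounts(account_id);

-- about 10 million accounts and 1 million changed balances
insert /*+ append */ into accounts(account_id, account_number)
  select level, 'ACC' || to_char(level)
    from dual
  connect by level <= 10000000;
commit;

insert /*+ append */ into change_balances(account_id, balance_amount)
  select level, round(dbms_random.value(0, 1000000), 2)
    from dual
  connect by level <= 1000000;
commit;
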
First, let's take 10% of the balance changes from the change_balances table and see what the plan looks like with dynamic_sampling:



SQL> EXPLAIN PLAN
  2   SET statement_id = 'test1'
  3   INTO plan_table
  4  FOR  with c as
  5   (select /*+ dynamic_sampling(change_balances 2)*/
  6     account_id, balance_amount
  7      from change_balances
  8     where mod(account_id, 10) = 0)
  9  select a.account_id, a.account_number, c.balance_amount
 10    from c, accounts a
 11   where c.account_id = a.account_id;

Explained.

SQL>
SQL> SELECT * FROM table (DBMS_XPLAN.DISPLAY);
Plan hash value: 874320301

----------------------------------------------------------------------------------------------
| Id  | Operation          | Name            | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |  9951K|   493M|       |   140K  (1)| 00:28:10 |
|*  1 |  HASH JOIN         |                 |  9951K|   493M|  3240K|   140K  (1)| 00:28:10 |
|*  2 |   TABLE ACCESS FULL| CHANGE_BALANCES |   100K|  2057K|       |  7172   (1)| 00:01:27 |
|   3 |   TABLE ACCESS FULL| ACCOUNTS        |    10M|   295M|       |   113K  (1)| 00:22:37 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ACCOUNT_ID"="A"."ACCOUNT_ID")
   2 - filter(MOD("ACCOUNT_ID",10)=0)

Note
-----
   - dynamic sampling used for this statement (level=2)

20 rows selected.


So we see that it is proposed to read both change_balances and accounts with full scans and join them with a hash join.

Now let's drastically reduce the sample from change_balances: take 0.1% of the balance changes and see what the plan looks like with dynamic_sampling:



SQL> EXPLAIN PLAN
  2   SET statement_id = 'test2'
  3   INTO plan_table
  4  FOR  with c as
  5   (select /*+ dynamic_sampling(change_balances 2)*/
  6     account_id, balance_amount
  7      from change_balances
  8     where mod(account_id, 1000) = 0)
  9  select a.account_id, a.account_number, c.balance_amount
 10    from c, accounts a
 11   where c.account_id = a.account_id;

Explained.

SQL>
SQL> SELECT * FROM table (DBMS_XPLAN.DISPLAY);
Plan hash value: 2360715730

-------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                        | 73714 |  3743K| 16452   (1)| 00:03:18 |
|   1 |  NESTED LOOPS                |                        |       |       |            |          |
|   2 |   NESTED LOOPS               |                        | 73714 |  3743K| 16452   (1)| 00:03:18 |
|*  3 |    TABLE ACCESS FULL         | CHANGE_BALANCES        |   743 | 15603 |  7172   (1)| 00:01:27 |
|*  4 |    INDEX RANGE SCAN          | IX_ACCOUNTS_ACCOUNT_ID |   104 |       |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| ACCOUNTS               |    99 |  3069 |   106   (0)| 00:00:02 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(MOD("ACCOUNT_ID",1000)=0)
   4 - access("ACCOUNT_ID"="A"."ACCOUNT_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

22 rows selected.


This time the accounts table is joined to change_balances with nested loops, and an index is used to read rows from accounts.

If the dynamic_sampling hint is removed, in the second case the plan stays the same as in the first case, which is not optimal.

Details on the dynamic_sampling hint and the possible values of its numeric argument can be found in the documentation.
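
Note that the hint can be applied either to the whole statement or to a specific table, for example:

-- statement-level: dynamic sampling for all tables of the query
select /*+ dynamic_sampling(4) */ count(*)
  from change_balances;

-- table-level: dynamic sampling only for the change_balances table
select /*+ dynamic_sampling(change_balances 2) */ count(*)
  from change_balances
 where mod(account_id, 10) = 0;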



How to preserve the query plan when inserting data via a database link



Consider the following problem. On the data source server there are tables that need to be joined and loaded into the data warehouse. Suppose a view has been written on the source server that contains all the necessary ETL transformation logic. The view is written optimally: it contains hints telling the optimizer how to join the tables and which indexes to use. On the data warehouse server side you need to do a simple thing: insert the data from that view into the target table. And this is where difficulties can arise. If you insert into the target table with a command like



insert into dwh_table
  (field1, field2)
  select field1, field2 from vw_for_dwh_table@xe_link;


then all the query plan logic contained in the view from which we read data over the database link may be ignored, and all the hints embedded in that view may be disregarded:



SQL> EXPLAIN PLAN
  2   SET statement_id = 'test'
  3   INTO plan_table
  4  FOR  insert into dwh_table
  5    (field1, field2)
  6    select field1, field2 from vw_for_dwh_table@xe_link;

Explained.

SQL>
SQL> SELECT * FROM table (DBMS_XPLAN.DISPLAY);
Plan hash value: 1788691278

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                  |     1 |  2015 |     2   (0)| 00:00:01 |        |      |
|   1 |  LOAD TABLE CONVENTIONAL | DWH_TABLE        |       |       |            |          |        |      |
|   2 |   REMOTE                 | VW_FOR_DWH_TABLE |     1 |  2015 |     2   (0)| 00:00:01 | XE_LI~ | R->S |
-------------------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------

   2 - SELECT /*+ OPAQUE_TRANSFORM */ "FIELD1","FIELD2" FROM "VW_FOR_DWH_TABLE" "VW_FOR_DWH_TABLE"
       (accessing 'XE_LINK' )


16 rows selected.


To preserve the query plan built into the view, you can insert the data into the target table from a cursor:



declare
  cursor cr is
    select field1, field2 from vw_for_dwh_table@xe_link;
  cr_row cr%rowtype;
begin
  open cr;
  loop
    fetch cr
      into cr_row;
    exit when cr%notfound;
    insert into dwh_table
      (field1, field2)
    values
      (cr_row.field1, cr_row.field2);
  end loop;
  close cr;
end;


The cursor query



select field1, field2 from vw_for_dwh_table@xe_link;


unlike the insert



insert into dwh_table
  (field1, field2)
  select field1, field2 from vw_for_dwh_table@xe_link;


preserves the query plan built into the view on the source server.
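
Fetching and inserting one row at a time can be slow on large volumes. A bulk variant of the same approach is possible (a sketch; the SELECT text stays identical, so the plan built into the remote view is still preserved):

declare
  cursor cr is
    select field1, field2 from vw_for_dwh_table@xe_link;
  type t_field1_tab is table of dwh_table.field1%type;
  type t_field2_tab is table of dwh_table.field2%type;
  v_field1 t_field1_tab;
  v_field2 t_field2_tab;
begin
  open cr;
  loop
    fetch cr bulk collect
      into v_field1, v_field2 limit 1000;
    forall i in 1 .. v_field1.count
      insert into dwh_table
        (field1, field2)
      values
        (v_field1(i), v_field2(i));
    exit when cr%notfound;
  end loop;
  close cr;
  commit;
end;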



Running procedures in parallel sessions



A common task is to start several parallel computations from a parent procedure and, after waiting for all of them to complete, continue executing the parent procedure. This can be useful for parallelizing calculations if the server resources allow it. There are many ways to do this.

Let's describe a very simple implementation of such a mechanism. The parallel procedures are executed in parallel "one-off" jobs, while the parent procedure waits in a loop for all these jobs to complete.

Let's create tables with metadata for this mechanism. To begin with, let's make a table with groups of procedures running in parallel:



create table PARALLEL_PROC_GROUP_LIST
(
  group_id   INTEGER,
  group_name VARCHAR2(4000)
);
comment on column PARALLEL_PROC_GROUP_LIST.group_id
  is 'ID of the group of procedures run in parallel';
comment on column PARALLEL_PROC_GROUP_LIST.group_name
  is 'Name of the group';


Next, we will create a table with scripts that will be executed in parallel in groups. The filling of this table can be either static or created dynamically:



create table PARALLEL_PROC_LIST
(
  group_id    INTEGER,
  proc_script VARCHAR2(4000),
  is_active   CHAR(1) default 'Y'
);
comment on column PARALLEL_PROC_LIST.group_id
  is 'ID of the group of procedures run in parallel';
comment on column PARALLEL_PROC_LIST.proc_script
  is 'PL/SQL block to be executed in a separate job';
comment on column PARALLEL_PROC_LIST.is_active
  is 'Y - active, N - inactive (the script is skipped)';


And we will make a log table, where we will collect a log of which procedure was launched in which job:



create table PARALLEL_PROC_LOG
(
  run_id      INTEGER,
  group_id    INTEGER,
  proc_script VARCHAR2(4000),
  job_id      INTEGER,
  start_time  DATE,
  end_time    DATE
);
comment on column PARALLEL_PROC_LOG.run_id
  is 'ID of the run_in_parallel execution';
comment on column PARALLEL_PROC_LOG.group_id
  is 'ID of the group of procedures run in parallel';
comment on column PARALLEL_PROC_LOG.proc_script
  is 'PL/SQL block executed in a separate job';
comment on column PARALLEL_PROC_LOG.job_id
  is 'Job_id of the job in which the script was run';
comment on column PARALLEL_PROC_LOG.start_time
  is 'Script start time';
comment on column PARALLEL_PROC_LOG.end_time
  is 'Script end time';

create sequence Seq_Parallel_Proc_Log;


Now let's give the code of the procedure for starting parallel streams:



create or replace procedure run_in_parallel(in_group_id integer) as
  -- runs the pl/sql scripts from parallel_proc_list for the given group
  -- in parallel jobs and waits for all of them to finish
  v_run_id             integer;
  v_job_id             integer;
  v_job_id_list        varchar2(32767);
  v_job_id_list_ext    varchar2(32767);
  v_running_jobs_count integer;
begin
  select seq_parallel_proc_log.nextval into v_run_id from dual;
  -- submit jobs for scripts with the given parallel_proc_list.group_id
  -- store JOB_IDs separated with ',' in v_job_id_list
  v_job_id_list     := null;
  v_job_id_list_ext := null;
  for z in (select pt.proc_script
              from parallel_proc_list pt
             where pt.group_id = in_group_id
               and pt.is_active = 'Y') loop
    dbms_job.submit(v_job_id, z.proc_script);
    insert into parallel_proc_log
      (run_id, group_id, proc_script, job_id, start_time, end_time)
    values
      (v_run_id, in_group_id, z.proc_script, v_job_id, sysdate, null);
    v_job_id_list     := v_job_id_list || ',' || to_char(v_job_id);
    v_job_id_list_ext := v_job_id_list_ext || ' union all select ' ||
                         to_char(v_job_id) || ' job_id from dual';
  end loop;
  commit;
  v_job_id_list     := substr(v_job_id_list, 2);
  v_job_id_list_ext := substr(v_job_id_list_ext, 12);
  -- loop while not all jobs finished
  loop
    -- set parallel_proc_log.end_time for finished jobs
    execute immediate 'update parallel_proc_log set end_time = sysdate where job_id in (' ||
                      v_job_id_list_ext ||
                      ' minus select job from user_jobs where job in (' ||
                      v_job_id_list ||
                      ') minus select job_id from parallel_proc_log where job_id in (' ||
                      v_job_id_list || ') and end_time is not null)';
    commit;
    -- check whether all jobs finished
    execute immediate 'select count(1) from user_jobs where job in (' ||
                      v_job_id_list || ')'
      into v_running_jobs_count;
    -- if all jobs finished then exit
    exit when v_running_jobs_count = 0;
    -- sleep a little
    sys.dbms_lock.sleep(0.1);
  end loop;
end;


Let's check how the run_in_parallel procedure works. Let's create a test procedure that we will call in parallel sessions.



create or replace procedure sleep(in_seconds integer) as
begin
  sys.Dbms_Lock.Sleep(in_seconds);
end;


Fill in the name of the group and the table with scripts that will be executed in parallel.



insert into PARALLEL_PROC_GROUP_LIST(group_id, group_name) values(1, 'test group');

insert into PARALLEL_PROC_LIST(group_id, proc_script, is_active) values(1, 'begin sleep(5); end;', 'Y');
insert into PARALLEL_PROC_LIST(group_id, proc_script, is_active) values(1, 'begin sleep(10); end;', 'Y');


Let's start a group of parallel procedures.



begin
  run_in_parallel(1);
end;


When finished, let's see the log.



select * from PARALLEL_PROC_LOG;


RUN_ID  GROUP_ID  PROC_SCRIPT            JOB_ID  START_TIME           END_TIME
1       1         begin sleep(5); end;   1       11.09.2020 15:00:51  11.09.2020 15:00:56
1       1         begin sleep(10); end;  2       11.09.2020 15:00:51  11.09.2020 15:01:01


We see that the execution time of the test procedure instances meets expectations.
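
While a group is running, progress can also be watched from another session through the same log table, for example:

-- jobs of the latest runs that have not finished yet
select run_id, proc_script, job_id, start_time
  from parallel_proc_log
 where end_time is null
 order by run_id desc, start_time;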



Carrying balances forward



Let us describe a solution to a fairly typical banking problem of "carrying balances forward". Suppose there is a table of facts of account balance changes. It is required to produce the account balance for each day of the calendar (the last balance within each day); such information is often needed in data warehouses. If there were no movements on an account on some day, the last known balance must be repeated. If the amount of data and the computing power of the server allow, this problem can be solved with a single SQL query, without resorting to PL/SQL. The function last_value(* ignore nulls) over (partition by * order by *) helps here: it carries the last known balance forward to the subsequent dates on which there were no changes.

Let's create a table and fill it with test data.



create table ACCOUNT_BALANCE
(
  dt           DATE,
  account_id   INTEGER,
  balance_amt  NUMBER,
  turnover_amt NUMBER
);
comment on column ACCOUNT_BALANCE.dt
  is 'Date and time of the balance change';
comment on column ACCOUNT_BALANCE.account_id
  is 'Account ID';
comment on column ACCOUNT_BALANCE.balance_amt
  is 'Account balance';
comment on column ACCOUNT_BALANCE.turnover_amt
  is 'Turnover amount';

insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('01.01.2020 00:00:00','dd.mm.yyyy hh24.mi.ss'), 1, 23, 23);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('05.01.2020 01:00:00','dd.mm.yyyy hh24.mi.ss'), 1, 45, 22);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('05.01.2020 20:00:00','dd.mm.yyyy hh24.mi.ss'), 1, 44, -1);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('05.01.2020 00:00:00','dd.mm.yyyy hh24.mi.ss'), 2, 67, 67);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('05.01.2020 20:00:00','dd.mm.yyyy hh24.mi.ss'), 2, 77, 10);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('07.01.2020 00:00:00','dd.mm.yyyy hh24.mi.ss'), 2, 72, -5);


The query below solves the problem. The subquery 'cld' contains a calendar of dates; in the subquery 'ab' we group the balances by day; in the subquery 'a' we collect the list of all accounts and the history start date of each account; in the subquery 'pre' we build, for each account, a calendar of days from the start of its history. The final query joins the last balances of each day to the calendar of active days of each account and carries them forward to the days on which there were no changes.



with cld as
 (select /*+ materialize*/
   to_date('01.01.2020', 'dd.mm.yyyy') + level - 1 dt
    from dual
  connect by level <= 10),
ab as
 (select trunc(dt) dt,
         account_id,
         max(balance_amt) keep(dense_rank last order by dt) balance_amt,
         sum(turnover_amt) turnover_amt
    from account_balance
   group by trunc(dt), account_id),
a as
 (select min(dt) min_dt, account_id from ab group by account_id),
pre as
 (select cld.dt, a.account_id from cld left join a on cld.dt >= a.min_dt)
select pre.dt,
       pre.account_id,
       last_value(ab.balance_amt ignore nulls) over(partition by pre.account_id order by pre.dt) balance_amt,
       nvl(ab.turnover_amt, 0) turnover_amt
  from pre
  left join ab
    on pre.dt = ab.dt
   and pre.account_id = ab.account_id
 order by 2, 1;


The result of the query is as expected.

DT          ACCOUNT_ID  BALANCE_AMT  TURNOVER_AMT
01.01.2020  1           23           23
02.01.2020  1           23           0
03.01.2020  1           23           0
04.01.2020  1           23           0
05.01.2020  1           44           21
06.01.2020  1           44           0
07.01.2020  1           44           0
08.01.2020  1           44           0
09.01.2020  1           44           0
10.01.2020  1           44           0
05.01.2020  2           77           77
06.01.2020  2           77           0
07.01.2020  2           72           -5
08.01.2020  2           72           0
09.01.2020  2           72           0
10.01.2020  2           72           0


Combining several histories into one



When loading data into warehouses, a common task is to build a single history for an entity out of separate histories of its attributes that came from different sources. Suppose there is an entity with primary key primary_key_id, for which the history (start_dt - end_dt) of its three different attributes is kept in three different tables.



create table HIST1
(
  primary_key_id INTEGER,
  start_dt       DATE,
  attribute1     NUMBER
);

insert into HIST1(primary_key_id, start_dt, attribute1) values(1, to_date('2014-01-01','yyyy-mm-dd'), 7);
insert into HIST1(primary_key_id, start_dt, attribute1) values(1, to_date('2015-01-01','yyyy-mm-dd'), 8);
insert into HIST1(primary_key_id, start_dt, attribute1) values(1, to_date('2016-01-01','yyyy-mm-dd'), 9);
insert into HIST1(primary_key_id, start_dt, attribute1) values(2, to_date('2014-01-01','yyyy-mm-dd'), 17);
insert into HIST1(primary_key_id, start_dt, attribute1) values(2, to_date('2015-01-01','yyyy-mm-dd'), 18);
insert into HIST1(primary_key_id, start_dt, attribute1) values(2, to_date('2016-01-01','yyyy-mm-dd'), 19);

create table HIST2
(
  primary_key_id INTEGER,
  start_dt       DATE,
  attribute2     NUMBER
);
 
insert into HIST2(primary_key_id, start_dt, attribute2) values(1, to_date('2015-01-01','yyyy-mm-dd'), 4);
insert into HIST2(primary_key_id, start_dt, attribute2) values(1, to_date('2016-01-01','yyyy-mm-dd'), 5);
insert into HIST2(primary_key_id, start_dt, attribute2) values(1, to_date('2017-01-01','yyyy-mm-dd'), 6);
insert into HIST2(primary_key_id, start_dt, attribute2) values(2, to_date('2015-01-01','yyyy-mm-dd'), 14);
insert into HIST2(primary_key_id, start_dt, attribute2) values(2, to_date('2016-01-01','yyyy-mm-dd'), 15);
insert into HIST2(primary_key_id, start_dt, attribute2) values(2, to_date('2017-01-01','yyyy-mm-dd'), 16);

create table HIST3
(
  primary_key_id INTEGER,
  start_dt       DATE,
  attribute3     NUMBER
);
 
insert into HIST3(primary_key_id, start_dt, attribute3) values(1, to_date('2016-01-01','yyyy-mm-dd'), 10);
insert into HIST3(primary_key_id, start_dt, attribute3) values(1, to_date('2017-01-01','yyyy-mm-dd'), 20);
insert into HIST3(primary_key_id, start_dt, attribute3) values(1, to_date('2018-01-01','yyyy-mm-dd'), 30);
insert into HIST3(primary_key_id, start_dt, attribute3) values(2, to_date('2016-01-01','yyyy-mm-dd'), 110);
insert into HIST3(primary_key_id, start_dt, attribute3) values(2, to_date('2017-01-01','yyyy-mm-dd'), 120);
insert into HIST3(primary_key_id, start_dt, attribute3) values(2, to_date('2018-01-01','yyyy-mm-dd'), 130);


The goal is to load a single change history of three attributes into one table.

Below is a query that solves this problem. It first builds a diagonal table q1 with the data from the different sources for the different attributes (attributes absent in a given source are filled with nulls). Then, using the last_value(* ignore nulls) function, the diagonal table is collapsed into a single history, and the last known attribute values are carried forward to the dates on which there were no changes for them:



select primary_key_id,
       start_dt,
       nvl(lead(start_dt - 1)
           over(partition by primary_key_id order by start_dt),
           to_date('9999-12-31', 'yyyy-mm-dd')) as end_dt,
       last_value(attribute1 ignore nulls) over(partition by primary_key_id order by start_dt) as attribute1,
       last_value(attribute2 ignore nulls) over(partition by primary_key_id order by start_dt) as attribute2,
       last_value(attribute3 ignore nulls) over(partition by primary_key_id order by start_dt) as attribute3
  from (select primary_key_id,
               start_dt,
               max(attribute1) as attribute1,
               max(attribute2) as attribute2,
               max(attribute3) as attribute3
          from (select primary_key_id,
                       start_dt,
                       attribute1,
                       cast(null as number) attribute2,
                       cast(null as number) attribute3
                  from hist1
                union all
                select primary_key_id,
                       start_dt,
                       cast(null as number) attribute1,
                       attribute2,
                       cast(null as number) attribute3
                  from hist2
                union all
                select primary_key_id,
                       start_dt,
                       cast(null as number) attribute1,
                       cast(null as number) attribute2,
                       attribute3
                  from hist3) q1
         group by primary_key_id, start_dt) q2
 order by primary_key_id, start_dt;


The result is like this:

PRIMARY_KEY_ID  START_DT    END_DT      ATTRIBUTE1  ATTRIBUTE2  ATTRIBUTE3
1               01.01.2014  31.12.2014  7           NULL        NULL
1               01.01.2015  31.12.2015  8           4           NULL
1               01.01.2016  31.12.2016  9           5           10
1               01.01.2017  31.12.2017  9           6           20
1               01.01.2018  31.12.9999  9           6           30
2               01.01.2014  31.12.2014  17          NULL        NULL
2               01.01.2015  31.12.2015  18          14          NULL
2               01.01.2016  31.12.2016  19          15          110
2               01.01.2017  31.12.2017  19          16          120
2               01.01.2018  31.12.9999  19          16          130


Normalizer



Sometimes the problem arises of normalizing data that arrived as a delimiter-separated field. For example, in a table like this:



create table DENORMALIZED_TABLE
(
  id  INTEGER,
  val VARCHAR2(4000)
);

insert into DENORMALIZED_TABLE(id, val) values(1, 'aaa,cccc,bb');
insert into DENORMALIZED_TABLE(id, val) values(2, 'ddd');
insert into DENORMALIZED_TABLE(id, val) values(3, 'fffff,e');


This query normalizes the data by splitting the comma-separated values into separate rows:



select id, regexp_substr(val, '[^,]+', 1, column_value) val, column_value
  from denormalized_table,
       table(cast(multiset
                  (select level
                     from dual
                   connect by regexp_instr(val, '[^,]+', 1, level) > 0) as
                  sys.odcinumberlist))
 order by id, column_value;


The result is like this:

ID VAL COLUMN_VALUE
1 aaa 1
1 cccc 2
1 bb 3
2 ddd 1
3 fffff 1
3 e 2


Rendering in SVG format



Often there is a desire to somehow visualize the numeric indicators stored in the database: to build graphs, histograms, charts. Specialized tools such as Oracle BI can help, but licenses for such tools may cost money, and configuring them may take more time than writing a quick-and-dirty SQL query against Oracle that returns a finished picture. Let's demonstrate with an example how to quickly draw such a picture in SVG format with a single query.

Suppose we have a table with data



create table graph_data(dt date, val number, radius number);

insert into graph_data(dt, val, radius) values (to_date('01.01.2020','dd.mm.yyyy'), 12, 3);
insert into graph_data(dt, val, radius) values (to_date('02.01.2020','dd.mm.yyyy'), 15, 4);
insert into graph_data(dt, val, radius) values (to_date('05.01.2020','dd.mm.yyyy'), 17, 5);
insert into graph_data(dt, val, radius) values (to_date('06.01.2020','dd.mm.yyyy'), 13, 6);
insert into graph_data(dt, val, radius) values (to_date('08.01.2020','dd.mm.yyyy'),  3, 7);
insert into graph_data(dt, val, radius) values (to_date('10.01.2020','dd.mm.yyyy'), 20, 8);
insert into graph_data(dt, val, radius) values (to_date('11.01.2020','dd.mm.yyyy'), 18, 9);


dt is the date of relevance,

val is a numerical indicator, the dynamics of which we visualize over time,

radius is another numerical indicator that we will draw in the form of a circle with such a radius.

Let's say a few words about the SVG format. It is a vector graphics format that can be viewed in modern browsers and converted to other graphic formats. In it, among other things, you can draw lines, circles and write text:



<line x1="94" x2="94" y1="15" y2="675" style="stroke:rgb(150,255,255); stroke-width:1px"/>
<circle cx="30" cy="279" r="3" style="fill:rgb(255,0,0)"/>
<text x="7" y="688" font-size="10" fill="rgb(0,150,255)">2020-01-01</text>


Below is the SQL query against Oracle that plots a graph from the data of this table. The const subquery contains the various constant settings: image size, number of labels on the chart axes, line and circle colors, font sizes, and so on. In the gd1 subquery we convert the data from the graph_data table into x and y coordinates in the picture. The gd2 subquery remembers, for each point, the previous point from which a line must be drawn to it. The 'header' block is the image header with a white background. The 'vertical lines' block draws vertical lines. The 'dates under vertical lines' block labels the dates on the x axis. The 'horizontal lines' block draws horizontal lines. The 'values near horizontal lines' block labels the values on the y axis. The 'circles' block draws circles of the radius specified in graph_data. The 'graph data' block builds the line graph of the val indicator from graph_data. The 'footer' block adds the closing tag.



with const as
 (select 700 viewbox_width,
         700 viewbox_height,
         30 left_margin,
         30 right_margin,
         15 top_margin,
         25 bottom_margin,
         max(dt) - min(dt) + 1 num_vertical_lines,
         11 num_horizontal_lines,
         'rgb(150,255,255)' stroke_vertical_lines,
         '1px' stroke_width_vertical_lines,
         10 font_size_dates,
         'rgb(0,150,255)' fill_dates,
         23 x_dates_pad,
         13 y_dates_pad,
         'rgb(150,255,255)' stroke_horizontal_lines,
         '1px' stroke_width_horizontal_lines,
         10 font_size_values,
         'rgb(0,150,255)' fill_values,
         4 x_values_pad,
         2 y_values_pad,
         'rgb(255,0,0)' fill_circles,
         'rgb(51,102,0)' stroke_graph,
         '1px' stroke_width_graph,
         min(dt) min_dt,
         max(dt) max_dt,
         max(val) max_val
    from graph_data),
gd1 as
 (select graph_data.dt,
         const.left_margin +
         (const.viewbox_width - const.left_margin - const.right_margin) *
         (graph_data.dt - const.min_dt) / (const.max_dt - const.min_dt) x,
         const.viewbox_height - const.bottom_margin -
         (const.viewbox_height - const.top_margin - const.bottom_margin) *
         graph_data.val / const.max_val y,
         graph_data.radius
    from graph_data, const),
gd2 as
 (select dt,
         round(nvl(lag(x) over(order by dt), x)) prev_x,
         round(x) x,
         round(nvl(lag(y) over(order by dt), y)) prev_y,
         round(y) y,
         radius
    from gd1)
/* header */
select '<?xml version="1.0" encoding="UTF-8" standalone="no"?>' txt
  from dual
union all
select '<svg version="1.1" width="' || viewbox_width || '" height="' ||
       viewbox_height || '" viewBox="0 0 ' || viewbox_width || ' ' ||
       viewbox_height ||
       '" style="background:yellow" baseProfile="full" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ev="http://www.w3.org/2001/xml-events">'
  from const
union all
select '<title>Test graph</title>'
  from dual
union all
select '<desc>Test graph</desc>'
  from dual
union all
select '<rect width="' || viewbox_width || '" height="' || viewbox_height ||
       '" style="fill:white" />'
  from const
union all
/* vertical lines */
select '<line x1="' ||
       to_char(round(left_margin +
                     (viewbox_width - left_margin - right_margin) *
                     (level - 1) / (num_vertical_lines - 1))) || '" x2="' ||
       to_char(round(left_margin +
                     (viewbox_width - left_margin - right_margin) *
                     (level - 1) / (num_vertical_lines - 1))) || '" y1="' ||
       to_char(round(top_margin)) || '" y2="' ||
       to_char(round(viewbox_height - bottom_margin)) || '" style="stroke:' ||
       const.stroke_vertical_lines || '; stroke-width:' ||
       const.stroke_width_vertical_lines || '"/>'
  from const
connect by level <= num_vertical_lines
union all
/* dates under vertical lines */
select '<text x="' ||
       to_char(round(left_margin +
                     (viewbox_width - left_margin - right_margin) *
                     (level - 1) / (num_vertical_lines - 1) - x_dates_pad)) ||
       '" y="' ||
       to_char(round(viewbox_height - bottom_margin + y_dates_pad)) ||
       '" font-size="' || font_size_dates || '" fill="' || fill_dates || '">' ||
       to_char(min_dt + level - 1, 'yyyy-mm-dd') || '</text>'
  from const
connect by level <= num_vertical_lines
union all
/* horizontal lines */
select '<line x1="' || to_char(round(left_margin)) || '" x2="' ||
       to_char(round(viewbox_width - right_margin)) || '" y1="' ||
       to_char(round(top_margin +
                     (viewbox_height - top_margin - bottom_margin) *
                     (level - 1) / (num_horizontal_lines - 1))) || '" y2="' ||
       to_char(round(top_margin +
                     (viewbox_height - top_margin - bottom_margin) *
                     (level - 1) / (num_horizontal_lines - 1))) ||
       '" style="stroke:' || const.stroke_horizontal_lines ||
       '; stroke-width:' || const.stroke_width_horizontal_lines || '"/>'
  from const
connect by level <= num_horizontal_lines
union all
/* values near horizontal lines */
select '<text text-anchor="end" x="' ||
       to_char(round(left_margin - x_values_pad)) || '" y="' ||
       to_char(round(viewbox_height - bottom_margin -
                     (viewbox_height - top_margin - bottom_margin) *
                     (level - 1) / (num_horizontal_lines - 1) +
                     y_values_pad)) || '" font-size="' || font_size_values ||
       '" fill="' || fill_values || '">' ||
       to_char(round(max_val / (num_horizontal_lines - 1) * (level - 1), 2)) ||
       '</text>'
  from const
connect by level <= num_horizontal_lines
union all
/* circles */
select '<circle cx="' || to_char(gd2.x) || '" cy="' || to_char(gd2.y) ||
       '" r="' || gd2.radius || '" style="fill:' || const.fill_circles ||
       '"/>'
  from gd2, const
union all
/* graph data */
select '<line x1="' || to_char(gd2.prev_x) || '" x2="' || to_char(gd2.x) ||
       '" y1="' || to_char(gd2.prev_y) || '" y2="' || to_char(gd2.y) ||
       '" style="stroke:' || const.stroke_graph || '; stroke-width:' ||
       const.stroke_width_graph || '"/>'
  from gd2, const
union all
/* footer */
select '</svg>' from dual;


The query result can be saved to a file with the .svg extension and viewed in a browser. If desired, it can be converted to other graphic formats with any of the available utilities, embedded in the web pages of your application, and so on.
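
For example, one simple way to save the result into a file is to spool it from SQL*Plus (the file and script names below are arbitrary):

set pagesize 0
set linesize 32767
set trimspool on
set feedback off
set heading off
spool graph.svg
@draw_graph.sql
spool off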

The result is the following picture:







Oracle Metadata Search Application



Imagine that you need to find something in Oracle source code, looking at information on several servers at once. This is about searching through Oracle data dictionary objects. The workplace for the search is a web interface where the user (a programmer) enters a search string and selects with checkboxes the Oracle servers on which to run the search.

The web search engine can look for a string in Oracle server objects in several different databases of the bank at once. For example, you can search for:

  • On which Oracle server, and in which object, does the identifier 61209 occur?
  • Where is the accounts table used, including via database links?
  • Which procedure raises the error ORA-20001 with a particular message text?
  • In which SQL queries is the index IX_CLIENTID used?
  • Where is a given table read, inserted into, updated or deleted, including via database links?
  • Where are functions removed from newer Oracle versions still used, for example wm_concat?
  • Where are rarely used Oracle features applied, for example sys_connect_by_path, regexp_instr or push_subq?


Based on the search results, the user is shown on which server, and in the code of which functions, procedures, packages, triggers, views, and so on, the required string was found.

Let's describe how such a search engine is implemented.



The client side is not complicated. The web interface receives the search string entered by the user, the list of servers to search, and the user's login, and the web page passes them to an Oracle stored procedure on the handler server. The history of requests to the search engine, i.e. who ran which query, is logged just in case.



After receiving a search query, the server side on the Oracle search server runs, in parallel jobs, several procedures that scan the following data dictionary views over database links on the selected Oracle servers, looking for the desired string: dba_col_comments, dba_jobs, dba_mviews, dba_objects, dba_scheduler_jobs, dba_source, dba_tab_comments, dba_views. Each procedure, if it finds something, writes its findings into the search results table (with the corresponding search query ID).
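
A simplified sketch of one such search procedure might look like this (the results table, the database link name and the server name are assumptions for illustration, not the actual implementation):

create or replace procedure search_in_dba_source(in_search_id   integer,
                                                 in_search_text varchar2) as
begin
  -- scan the source code on a remote server over a database link
  -- and store the hits together with the search request ID
  insert into search_results
    (search_id, server_name, owner, object_type, object_name, line, text)
    select in_search_id, 'SOME_SERVER', owner, type, name, line, text
      from dba_source@some_server_link
     where upper(text) like '%' || upper(in_search_text) || '%';
  commit;
end;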



When all search procedures have finished, the client part gives the user everything that is written in the search results table with the corresponding search query ID.

But that's not all. In addition to searching the Oracle data dictionary, a search over the Informatica PowerCenter repository was also bolted onto the described mechanism. Informatica PowerCenter is a popular ETL tool that Sberbank uses to load various information into data warehouses. Informatica PowerCenter has an open, well-documented repository structure, and this repository can be searched for information in the same way as the Oracle data dictionary. Which tables and fields are used in load code developed with Informatica PowerCenter? What can be found in port transformations and explicit SQL queries? All this information is available in the repository structures and can be found. For PowerCenter connoisseurs, I'll mention that our search engine scans the following repository locations looking for mappings, sessions or workflows that contain the search string somewhere: sql override, mapplet attributes, ports, source definitions in mappings, source definitions, target definitions in mappings, target_definitions, mappings, mapplets, workflows, worklets, sessions, commands, expression ports, session instances, source definition fields, target definition fields, email tasks.





