- Automatic compilation logging
- What to do if you want to make a view with parameters
- Using dynamic statistics in queries
- How to save query plan when inserting data via database link
- Running procedures in parallel sessions
- Carrying balances forward
- Combining multiple histories into one
- Normalizer
- Rendering in SVG format
- Oracle Metadata Search Application
Automatic compilation logging
On some of Sberbank's Oracle databases there is a compilation trigger that records who changed what in the code of server objects, and when. This makes it possible to identify the author of a change from the compilation log table, and it also acts as a rudimentary automatic version control system: if a programmer forgot to commit changes to Git, this mechanism serves as a safety net. Let's describe an example implementation of such automatic compilation logging. A simplified version of the compilation trigger, which writes its log to the ddl_changes_log table, looks like this:
create table DDL_CHANGES_LOG
(
id INTEGER,
change_date DATE,
sid VARCHAR2(100),
schemaname VARCHAR2(30),
machine VARCHAR2(100),
program VARCHAR2(100),
osuser VARCHAR2(100),
obj_owner VARCHAR2(30),
obj_type VARCHAR2(30),
obj_name VARCHAR2(30),
previous_version CLOB,
changes_script CLOB
);
create sequence seq_ddl_changes_log;
create or replace trigger trig_audit_ddl_trg
before ddl on database
declare
v_sysdate date;
v_valid number;
v_previous_obj_owner varchar2(30) := '';
v_previous_obj_type varchar2(30) := '';
v_previous_obj_name varchar2(30) := '';
v_previous_change_date date;
v_lob_loc_old clob := '';
v_lob_loc_new clob := '';
v_n number;
v_sql_text ora_name_list_t;
v_sid varchar2(100) := '';
v_schemaname varchar2(30) := '';
v_machine varchar2(100) := '';
v_program varchar2(100) := '';
v_osuser varchar2(100) := '';
begin
v_sysdate := sysdate;
-- check whether the compiled object already exists and is valid
select count(*)
into v_valid
from sys.dba_objects
where owner = ora_dict_obj_owner
and object_type = ora_dict_obj_type
and object_name = ora_dict_obj_name
and status = 'VALID'
and owner not in ('SYS', 'SPOT', 'WMSYS', 'XDB', 'SYSTEM')
and object_type in ('TRIGGER', 'PROCEDURE', 'FUNCTION', 'PACKAGE', 'PACKAGE BODY', 'VIEW');
-- find information about previous compiled object
select max(obj_owner) keep(dense_rank last order by id),
max(obj_type) keep(dense_rank last order by id),
max(obj_name) keep(dense_rank last order by id),
max(change_date) keep(dense_rank last order by id)
into v_previous_obj_owner, v_previous_obj_type, v_previous_obj_name, v_previous_change_date
from ddl_changes_log;
-- log it if we compile a valid object, or if we compile a package body/spec invalidated by the immediately preceding compilation of its counterpart
if (v_valid = 1 or v_previous_obj_owner = ora_dict_obj_owner and
(v_previous_obj_type = 'PACKAGE' and ora_dict_obj_type = 'PACKAGE BODY' or
v_previous_obj_type = 'PACKAGE BODY' and ora_dict_obj_type = 'PACKAGE') and
v_previous_obj_name = ora_dict_obj_name and
v_sysdate - v_previous_change_date <= 1 / 24 / 60 / 2) and
ora_sysevent in ('CREATE', 'ALTER') then
-- store previous version of object (before compilation) from dba_source or dba_views in v_lob_loc_old
if ora_dict_obj_type <> 'VIEW' then
for z in (select substr(text, 1, length(text) - 1) || chr(13) || chr(10) as text
from sys.dba_source
where owner = ora_dict_obj_owner
and type = ora_dict_obj_type
and name = ora_dict_obj_name
order by line) loop
v_lob_loc_old := v_lob_loc_old || z.text;
end loop;
else
select sys.dbms_metadata_util.long2clob(v.textlength, 'SYS.VIEW$', 'TEXT', v.rowid) into v_lob_loc_old
from sys."_CURRENT_EDITION_OBJ" o, sys.view$ v, sys.user$ u
where o.obj# = v.obj#
and o.owner# = u.user#
and u.name = ora_dict_obj_owner
and o.name = ora_dict_obj_name;
end if;
-- store new version of object (after compilation) from v_sql_text in v_lob_loc_new
v_n := ora_sql_txt(v_sql_text);
for i in 1 .. v_n loop
v_lob_loc_new := v_lob_loc_new || replace(v_sql_text(i), chr(10), chr(13) || chr(10));
end loop;
-- find information about session that changed this object
select max(to_char(sid)), max(schemaname), max(machine), max(program), max(osuser)
into v_sid, v_schemaname, v_machine, v_program, v_osuser
from v$session
where audsid = userenv('sessionid');
-- store changes in ddl_changes_log
insert into ddl_changes_log
(id, change_date, sid, schemaname, machine, program, osuser,
obj_owner, obj_type, obj_name, previous_version, changes_script)
values
(seq_ddl_changes_log.nextval, v_sysdate, v_sid, v_schemaname, v_machine, v_program, v_osuser,
ora_dict_obj_owner, ora_dict_obj_type, ora_dict_obj_name, v_lob_loc_old, v_lob_loc_new);
end if;
exception
when others then
null;
end;
In this trigger, the name and new contents of the compiled object are obtained, supplemented with the previous contents from the data dictionary, and written to the change log.
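For example, a query like the following against ddl_changes_log (a minimal sketch; the schema and object names here are hypothetical placeholders) shows the most recent recorded changes of a given object, including who compiled it and from which machine:
select l.change_date, l.osuser, l.machine, l.program,
       l.obj_owner, l.obj_type, l.obj_name
  from ddl_changes_log l
 where l.obj_owner = 'MY_SCHEMA'  -- hypothetical schema
   and l.obj_name = 'MY_PACKAGE'  -- hypothetical object name
 order by l.id desc;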
What to do if you want to make a view with parameters
Oracle developers often run into this wish. Why can you create a procedure or a function with parameters, but not a view with input parameters that could be used in calculations? Oracle offers a substitute for this concept, which in our view is missing from the language.
Let's look at an example. Let there be a table with sales by division for each day.
create table DIVISION_SALES
(
division_id INTEGER,
dt DATE,
sales_amt NUMBER
);
This query compares sales by division on two dates, in this case April 30, 2020 and September 11, 2020.
select t1.division_id,
t1.dt dt1,
t2.dt dt2,
t1.sales_amt sales_amt1,
t2.sales_amt sales_amt2
from (select dt, division_id, sales_amt
from division_sales
where dt = to_date('30.04.2020', 'dd.mm.yyyy')) t1,
(select dt, division_id, sales_amt
from division_sales
where dt = to_date('11.09.2020', 'dd.mm.yyyy')) t2
where t1.division_id = t2.division_id;
Here is the view one would like to write to generalize such a query, passing the dates as parameters. However, the syntax does not allow it.
create or replace view vw_division_sales_report(in_dt1 date, in_dt2 date) as
select t1.division_id,
t1.dt dt1,
t2.dt dt2,
t1.sales_amt sales_amt1,
t2.sales_amt sales_amt2
from (select dt, division_id, sales_amt
from division_sales
where dt = in_dt1) t1,
(select dt, division_id, sales_amt
from division_sales
where dt = in_dt2) t2
where t1.division_id = t2.division_id;
The following workaround is suggested. Let's create an object type for a row of this view.
create type t_division_sales_report as object
(
division_id INTEGER,
dt1 DATE,
dt2 DATE,
sales_amt1 NUMBER,
sales_amt2 NUMBER
);
And a collection type for a table of such rows.
create type t_division_sales_report_table as table of t_division_sales_report;
Instead of a view, let's write a pipelined function with date input parameters.
create or replace function func_division_sales(in_dt1 date, in_dt2 date)
return t_division_sales_report_table
pipelined as
begin
for z in (select t1.division_id,
t1.dt dt1,
t2.dt dt2,
t1.sales_amt sales_amt1,
t2.sales_amt sales_amt2
from (select dt, division_id, sales_amt
from division_sales
where dt = in_dt1) t1,
(select dt, division_id, sales_amt
from division_sales
where dt = in_dt2) t2
where t1.division_id = t2.division_id) loop
pipe row(t_division_sales_report(z.division_id,
z.dt1,
z.dt2,
z.sales_amt1,
z.sales_amt2));
end loop;
return;
end;
You can query it like this:
select *
from table(func_division_sales(to_date('30.04.2020', 'dd.mm.yyyy'),
to_date('11.09.2020', 'dd.mm.yyyy')));
This query returns the same result as the query at the beginning of this section with the dates substituted explicitly.
Pipelined functions can also be useful when you need to push a parameter deep inside a complex query.
For example, consider a complex view in which field1, the column you want to filter on, is hidden somewhere deep inside the view.
create or replace view complex_view as
select field1, ...
from (select field1, ...
from (select field1, ... from deep_table), table1
where ...),
table2
where ...;
A query against this view with a fixed value of field1 may get a bad execution plan.
select field1, ... from complex_view
where field1 = 'myvalue';
That is, instead of first filtering deep_table on the condition field1 = 'myvalue', the query may first join all the tables, processing an unnecessarily large amount of data, and only then filter the result on field1 = 'myvalue'. This can be avoided by replacing the view with a pipelined function that takes field1 as a parameter and pushes it down to deep_table.
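A minimal sketch of this approach is shown below. Since complex_view above is only schematic, the columns some_id and some_field and the join with table1 are illustrative assumptions; the point is that the filter on field1 is applied in the innermost subquery on deep_table:
create type t_cv_row as object (field1 VARCHAR2(100), field2 NUMBER);
create type t_cv_row_table as table of t_cv_row;
create or replace function func_complex_view(in_field1 varchar2)
return t_cv_row_table
pipelined as
begin
  for z in (select d.field1, t1.some_field as field2
              from (select field1, some_id
                      from deep_table
                     where field1 = in_field1) d -- filter applied to deep_table first
              join table1 t1
                on t1.some_id = d.some_id) loop
    pipe row(t_cv_row(z.field1, z.field2));
  end loop;
  return;
end;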
Using dynamic statistics in queries
It happens that the same query in an Oracle database processes a different amount of data each time it runs, in the tables and subqueries it uses. How do you make the optimizer figure out, on each run, which join method and which indexes to use this time? Consider, for example, a query that joins the portion of account balances changed since the last load to the account directory. The portion of changed balances varies greatly from load to load, from hundreds of rows to millions of rows. Depending on the size of this portion, the changed balances need to be joined to the accounts either with the /*+ use_nl */ method or with the /*+ use_hash */ method. Regathering statistics before every run is inconvenient, especially when the number of rows changes from load to load not in a joined table but in a joined subquery. The /*+ dynamic_sampling() */ hint can come to the rescue here. Let's show its effect using an example query. Let the table change_balances contain the balance changes, and accounts be the account directory. We join these tables on the account_id column present in both. At the start of the experiment we insert a fairly large number of rows into these tables and do not change their contents afterwards.
First, let's take 10% of the rows in the change_balances table and see what plan we get with dynamic_sampling:
SQL> EXPLAIN PLAN
2 SET statement_id = 'test1'
3 INTO plan_table
4 FOR with c as
5 (select /*+ dynamic_sampling(change_balances 2)*/
6 account_id, balance_amount
7 from change_balances
8 where mod(account_id, 10) = 0)
9 select a.account_id, a.account_number, c.balance_amount
10 from c, accounts a
11 where c.account_id = a.account_id;
Explained.
SQL>
SQL> SELECT * FROM table (DBMS_XPLAN.DISPLAY);
Plan hash value: 874320301
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9951K| 493M| | 140K (1)| 00:28:10 |
|* 1 | HASH JOIN | | 9951K| 493M| 3240K| 140K (1)| 00:28:10 |
|* 2 | TABLE ACCESS FULL| CHANGE_BALANCES | 100K| 2057K| | 7172 (1)| 00:01:27 |
| 3 | TABLE ACCESS FULL| ACCOUNTS | 10M| 295M| | 113K (1)| 00:22:37 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ACCOUNT_ID"="A"."ACCOUNT_ID")
2 - filter(MOD("ACCOUNT_ID",10)=0)
Note
-----
- dynamic sampling used for this statement (level=2)
20 rows selected.
So, we see that the optimizer proposes to full-scan both change_balances and accounts and join them with a hash join.
Now let's drastically reduce the sample from change_balances: take 0.1% of the rows and see what plan we get with dynamic_sampling:
SQL> EXPLAIN PLAN
2 SET statement_id = 'test2'
3 INTO plan_table
4 FOR with c as
5 (select /*+ dynamic_sampling(change_balances 2)*/
6 account_id, balance_amount
7 from change_balances
8 where mod(account_id, 1000) = 0)
9 select a.account_id, a.account_number, c.balance_amount
10 from c, accounts a
11 where c.account_id = a.account_id;
Explained.
SQL>
SQL> SELECT * FROM table (DBMS_XPLAN.DISPLAY);
Plan hash value: 2360715730
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 73714 | 3743K| 16452 (1)| 00:03:18 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 73714 | 3743K| 16452 (1)| 00:03:18 |
|* 3 | TABLE ACCESS FULL | CHANGE_BALANCES | 743 | 15603 | 7172 (1)| 00:01:27 |
|* 4 | INDEX RANGE SCAN | IX_ACCOUNTS_ACCOUNT_ID | 104 | | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| ACCOUNTS | 99 | 3069 | 106 (0)| 00:00:02 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(MOD("ACCOUNT_ID",1000)=0)
4 - access("ACCOUNT_ID"="A"."ACCOUNT_ID")
Note
-----
- dynamic sampling used for this statement (level=2)
22 rows selected.
This time the accounts table is joined to change_balances with nested loops, and an index is used to read the rows from accounts.
If the dynamic_sampling hint is removed, then in the second case the plan will remain the same as in the first case, and this is not optimal.
Details on the dynamic_sampling hint and the possible values of its numeric argument can be found in the documentation.
How to save query plan when inserting data via database link
Consider the following problem. The source database server has tables that need to be joined and loaded into a data warehouse. Suppose a view containing all the necessary ETL transformation logic has been written on the source server. The view is written optimally: it contains hints that tell the optimizer how to join the tables and which indexes to use. On the data warehouse side, all that remains is a simple thing - insert the data from the view into the target table. And this is where difficulties can arise. If you insert into the target table with a command like
insert into dwh_table
(field1, field2)
select field1, field2 from vw_for_dwh_table@xe_link;
then all the query-plan logic contained in the view that we read over the database link can be ignored; all the hints embedded in that view can be ignored.
SQL> EXPLAIN PLAN
2 SET statement_id = 'test'
3 INTO plan_table
4 FOR insert into dwh_table
5 (field1, field2)
6 select field1, field2 from vw_for_dwh_table@xe_link;
Explained.
SQL>
SQL> SELECT * FROM table (DBMS_XPLAN.DISPLAY);
Plan hash value: 1788691278
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
-------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | 2015 | 2 (0)| 00:00:01 | | |
| 1 | LOAD TABLE CONVENTIONAL | DWH_TABLE | | | | | | |
| 2 | REMOTE | VW_FOR_DWH_TABLE | 1 | 2015 | 2 (0)| 00:00:01 | XE_LI~ | R->S |
-------------------------------------------------------------------------------------------------------------
Remote SQL Information (identified by operation id):
----------------------------------------------------
2 - SELECT /*+ OPAQUE_TRANSFORM */ "FIELD1","FIELD2" FROM "VW_FOR_DWH_TABLE" "VW_FOR_DWH_TABLE"
(accessing 'XE_LINK' )
16 rows selected.
To preserve the query plan from the view, you can insert data into the target table from a cursor:
declare
cursor cr is
select field1, field2 from vw_for_dwh_table@xe_link;
cr_row cr%rowtype;
begin
open cr;
loop
fetch cr
into cr_row;
exit when cr%notfound;
insert into dwh_table
(field1, field2)
values
(cr_row.field1, cr_row.field2);
end loop;
close cr;
end;
The query from the cursor
select field1, field2 from vw_for_dwh_table@xe_link;
unlike the insert
insert into dwh_table
(field1, field2)
select field1, field2 from vw_for_dwh_table@xe_link;
preserves the query plan built into the view on the source server.
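For large volumes the row-by-row cursor loop can be slow. A possible refinement (a sketch, not part of the original approach) is to fetch from the same cursor in batches with bulk collect and insert them with forall, which keeps the remote query plan while reducing the number of insert executions; the batch size of 10000 is an arbitrary choice:
declare
  cursor cr is
    select field1, field2 from vw_for_dwh_table@xe_link;
  type t_field1_tab is table of dwh_table.field1%type;
  type t_field2_tab is table of dwh_table.field2%type;
  v_field1 t_field1_tab;
  v_field2 t_field2_tab;
begin
  open cr;
  loop
    -- fetch the next batch of rows from the remote view
    fetch cr bulk collect into v_field1, v_field2 limit 10000;
    exit when v_field1.count = 0;
    -- insert the whole batch with a single forall statement
    forall i in 1 .. v_field1.count
      insert into dwh_table (field1, field2)
      values (v_field1(i), v_field2(i));
    commit;
  end loop;
  close cr;
end;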
Running procedures in parallel sessions
A common task is to start several computations in parallel from a parent procedure and, after waiting for all of them to complete, continue executing the parent procedure. This can be useful when server resources allow parallel processing. There are many ways to do this.
Let's describe a very simple implementation of such a mechanism. The parallel procedures will run in parallel one-off jobs, while the parent procedure waits in a loop for all of these jobs to complete.
Let's create tables with metadata for this mechanism. To begin with, let's make a table with groups of procedures running in parallel:
create table PARALLEL_PROC_GROUP_LIST
(
group_id INTEGER,
group_name VARCHAR2(4000)
);
comment on column PARALLEL_PROC_GROUP_LIST.group_id
is 'Group ID';
comment on column PARALLEL_PROC_GROUP_LIST.group_name
is 'Group name';
Next, let's create a table with the scripts that will be executed in parallel within a group. This table can be filled statically or generated dynamically (an example of dynamic filling is sketched after the table definition):
create table PARALLEL_PROC_LIST
(
group_id INTEGER,
proc_script VARCHAR2(4000),
is_active CHAR(1) default 'Y'
);
comment on column PARALLEL_PROC_LIST.group_id
is 'Group ID';
comment on column PARALLEL_PROC_LIST.proc_script
is 'PL/SQL block to execute';
comment on column PARALLEL_PROC_LIST.is_active
is 'Y - active, N - inactive';
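For example, the table could be filled dynamically like this (a sketch; the load_stg_table procedure and the STG_% naming convention are purely hypothetical):
-- generate one parallel script per staging table
insert into parallel_proc_list (group_id, proc_script, is_active)
select 2,
       'begin load_stg_table(''' || table_name || '''); end;',
       'Y'
  from user_tables
 where table_name like 'STG\_%' escape '\';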
And let's make a log table recording which procedure was launched in which job:
create table PARALLEL_PROC_LOG
(
run_id INTEGER,
group_id INTEGER,
proc_script VARCHAR2(4000),
job_id INTEGER,
start_time DATE,
end_time DATE
);
comment on column PARALLEL_PROC_LOG.run_id
is 'ID of the run_in_parallel call';
comment on column PARALLEL_PROC_LOG.group_id
is 'Group ID';
comment on column PARALLEL_PROC_LOG.proc_script
is 'PL/SQL block that was executed';
comment on column PARALLEL_PROC_LOG.job_id
is 'ID of the job in which the block was executed';
comment on column PARALLEL_PROC_LOG.start_time
is 'Start time';
comment on column PARALLEL_PROC_LOG.end_time
is 'End time';
create sequence Seq_Parallel_Proc_Log;
Now here is the code of the procedure that launches the parallel jobs:
create or replace procedure run_in_parallel(in_group_id integer) as
-- launches in parallel jobs all active scripts from parallel_proc_list
-- that belong to the group in_group_id and waits for all of them to finish
v_run_id integer;
v_job_id integer;
v_job_id_list varchar2(32767);
v_job_id_list_ext varchar2(32767);
v_running_jobs_count integer;
begin
select seq_parallel_proc_log.nextval into v_run_id from dual;
-- submit jobs with the same parallel_proc_list.in_group_id
-- store the JOB_IDs separated with ',' in v_job_id_list
v_job_id_list := null;
v_job_id_list_ext := null;
for z in (select pt.proc_script
from parallel_proc_list pt
where pt.group_id = in_group_id
and pt.is_active = 'Y') loop
dbms_job.submit(v_job_id, z.proc_script);
insert into parallel_proc_log
(run_id, group_id, proc_script, job_id, start_time, end_time)
values
(v_run_id, in_group_id, z.proc_script, v_job_id, sysdate, null);
v_job_id_list := v_job_id_list || ',' || to_char(v_job_id);
v_job_id_list_ext := v_job_id_list_ext || ' union all select ' ||
to_char(v_job_id) || ' job_id from dual';
end loop;
commit;
v_job_id_list := substr(v_job_id_list, 2);
v_job_id_list_ext := substr(v_job_id_list_ext, 12);
-- loop while not all jobs finished
loop
-- set parallel_proc_log.end_time for finished jobs
execute immediate 'update parallel_proc_log set end_time = sysdate where job_id in (' ||
v_job_id_list_ext ||
' minus select job from user_jobs where job in (' ||
v_job_id_list ||
') minus select job_id from parallel_proc_log where job_id in (' ||
v_job_id_list || ') and end_time is not null)';
commit;
-- check whether all jobs finished
execute immediate 'select count(1) from user_jobs where job in (' ||
v_job_id_list || ')'
into v_running_jobs_count;
-- if all jobs finished then exit
exit when v_running_jobs_count = 0;
-- sleep a little
sys.dbms_lock.sleep(0.1);
end loop;
end;
Let's check how the run_in_parallel procedure works. Let's create a test procedure that we will call in parallel sessions.
create or replace procedure sleep(in_seconds integer) as
begin
sys.Dbms_Lock.Sleep(in_seconds);
end;
Let's fill in the group name and the table with the scripts to be executed in parallel.
insert into PARALLEL_PROC_GROUP_LIST(group_id, group_name) values(1, 'Test group');
insert into PARALLEL_PROC_LIST(group_id, proc_script, is_active) values(1, 'begin sleep(5); end;', 'Y');
insert into PARALLEL_PROC_LIST(group_id, proc_script, is_active) values(1, 'begin sleep(10); end;', 'Y');
Let's start a group of parallel procedures.
begin
run_in_parallel(1);
end;
When finished, let's see the log.
select * from PARALLEL_PROC_LOG;
RUN_ID | GROUP_ID | PROC_SCRIPT | JOB_ID | START_TIME | END_TIME |
1 | 1 | begin sleep(5); end; | 1 | 11.09.2020 15:00:51 | 11.09.2020 15:00:56 |
1 | 1 | begin sleep(10); end; | 2 | 11.09.2020 15:00:51 | 11.09.2020 15:01:01 |
We see that the execution time of the test procedure instances meets expectations.
Carrying balances forward
Let us describe a variant of solving a fairly typical banking problem of "carrying a balance forward". Suppose there is a table of facts recording changes of account balances. The task is to provide the current balance of each account for each calendar day (the last balance of the day). Such information is often needed in data warehouses. If there were no movements on an account on some day, the last known balance must be repeated. If the data volume and the computing power of the server allow it, this problem can be solved with a single SQL query, without resorting to PL/SQL. The function last_value(... ignore nulls) over (partition by ... order by ...) helps here: it stretches the last known balance to the subsequent dates on which there were no changes.
Let's create a table and fill it with test data.
create table ACCOUNT_BALANCE
(
dt DATE,
account_id INTEGER,
balance_amt NUMBER,
turnover_amt NUMBER
);
comment on column ACCOUNT_BALANCE.dt
is 'Date and time of the balance';
comment on column ACCOUNT_BALANCE.account_id
is 'Account ID';
comment on column ACCOUNT_BALANCE.balance_amt
is 'Account balance';
comment on column ACCOUNT_BALANCE.turnover_amt
is 'Balance change (turnover)';
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('01.01.2020 00:00:00','dd.mm.yyyy hh24.mi.ss'), 1, 23, 23);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('05.01.2020 01:00:00','dd.mm.yyyy hh24.mi.ss'), 1, 45, 22);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('05.01.2020 20:00:00','dd.mm.yyyy hh24.mi.ss'), 1, 44, -1);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('05.01.2020 00:00:00','dd.mm.yyyy hh24.mi.ss'), 2, 67, 67);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('05.01.2020 20:00:00','dd.mm.yyyy hh24.mi.ss'), 2, 77, 10);
insert into account_balance(dt, account_id, balance_amt, turnover_amt) values(to_date('07.01.2020 00:00:00','dd.mm.yyyy hh24.mi.ss'), 2, 72, -5);
The query below solves the problem. The subquery cld contains the calendar of dates; in the subquery ab we aggregate the balances per day; the subquery a remembers the list of accounts and the start of history for each account; the subquery pre builds, for each account, the calendar of days from the beginning of its history. The final query joins the last daily balances to the calendar of active days of each account and extends them to the days on which there were no changes.
with cld as
(select /*+ materialize*/
to_date('01.01.2020', 'dd.mm.yyyy') + level - 1 dt
from dual
connect by level <= 10),
ab as
(select trunc(dt) dt,
account_id,
max(balance_amt) keep(dense_rank last order by dt) balance_amt,
sum(turnover_amt) turnover_amt
from account_balance
group by trunc(dt), account_id),
a as
(select min(dt) min_dt, account_id from ab group by account_id),
pre as
(select cld.dt, a.account_id from cld left join a on cld.dt >= a.min_dt)
select pre.dt,
pre.account_id,
last_value(ab.balance_amt ignore nulls) over(partition by pre.account_id order by pre.dt) balance_amt,
nvl(ab.turnover_amt, 0) turnover_amt
from pre
left join ab
on pre.dt = ab.dt
and pre.account_id = ab.account_id
order by 2, 1;
The result of the query is as expected.
DT | ACCOUNT_ID | BALANCE_AMT | TURNOVER_AMT |
01.01.2020 | 1 | 23 | 23 |
02.01.2020 | 1 | 23 | 0 |
03.01.2020 | 1 | 23 | 0 |
04.01.2020 | 1 | 23 | 0 |
05.01.2020 | 1 | 44 | 21 |
06.01.2020 | 1 | 44 | 0 |
07.01.2020 | 1 | 44 | 0 |
08.01.2020 | 1 | 44 | 0 |
09.01.2020 | 1 | 44 | 0 |
10.01.2020 | 1 | 44 | 0 |
05.01.2020 | 2 | 77 | 77 |
06.01.2020 | 2 | 77 | 0 |
07.01.2020 | 2 | 72 | -5 |
08.01.2020 | 2 | 72 | 0 |
09.01.2020 | 2 | 72 | 0 |
10.01.2020 | 2 | 72 | 0 |
Combining multiple histories into one
When loading data into warehouses, one often has to build a single history for an entity from separate histories of its attributes that come from different sources. Suppose there is an entity with primary key primary_key_id, for which the history (start_dt - end_dt) of three different attributes is known, each stored in its own table.
create table HIST1
(
primary_key_id INTEGER,
start_dt DATE,
attribute1 NUMBER
);
insert into HIST1(primary_key_id, start_dt, attribute1) values(1, to_date('2014-01-01','yyyy-mm-dd'), 7);
insert into HIST1(primary_key_id, start_dt, attribute1) values(1, to_date('2015-01-01','yyyy-mm-dd'), 8);
insert into HIST1(primary_key_id, start_dt, attribute1) values(1, to_date('2016-01-01','yyyy-mm-dd'), 9);
insert into HIST1(primary_key_id, start_dt, attribute1) values(2, to_date('2014-01-01','yyyy-mm-dd'), 17);
insert into HIST1(primary_key_id, start_dt, attribute1) values(2, to_date('2015-01-01','yyyy-mm-dd'), 18);
insert into HIST1(primary_key_id, start_dt, attribute1) values(2, to_date('2016-01-01','yyyy-mm-dd'), 19);
create table HIST2
(
primary_key_id INTEGER,
start_dt DATE,
attribute2 NUMBER
);
insert into HIST2(primary_key_id, start_dt, attribute2) values(1, to_date('2015-01-01','yyyy-mm-dd'), 4);
insert into HIST2(primary_key_id, start_dt, attribute2) values(1, to_date('2016-01-01','yyyy-mm-dd'), 5);
insert into HIST2(primary_key_id, start_dt, attribute2) values(1, to_date('2017-01-01','yyyy-mm-dd'), 6);
insert into HIST2(primary_key_id, start_dt, attribute2) values(2, to_date('2015-01-01','yyyy-mm-dd'), 14);
insert into HIST2(primary_key_id, start_dt, attribute2) values(2, to_date('2016-01-01','yyyy-mm-dd'), 15);
insert into HIST2(primary_key_id, start_dt, attribute2) values(2, to_date('2017-01-01','yyyy-mm-dd'), 16);
create table HIST3
(
primary_key_id INTEGER,
start_dt DATE,
attribute3 NUMBER
);
insert into HIST3(primary_key_id, start_dt, attribute3) values(1, to_date('2016-01-01','yyyy-mm-dd'), 10);
insert into HIST3(primary_key_id, start_dt, attribute3) values(1, to_date('2017-01-01','yyyy-mm-dd'), 20);
insert into HIST3(primary_key_id, start_dt, attribute3) values(1, to_date('2018-01-01','yyyy-mm-dd'), 30);
insert into HIST3(primary_key_id, start_dt, attribute3) values(2, to_date('2016-01-01','yyyy-mm-dd'), 110);
insert into HIST3(primary_key_id, start_dt, attribute3) values(2, to_date('2017-01-01','yyyy-mm-dd'), 120);
insert into HIST3(primary_key_id, start_dt, attribute3) values(2, to_date('2018-01-01','yyyy-mm-dd'), 130);
The goal is to load a single change history of three attributes into one table.
Below is a query that solves this problem. It first builds a diagonal table q1 with data from the different sources for the different attributes (attributes absent in a given source are filled with nulls). Then, using last_value(... ignore nulls), the diagonal table is collapsed into a single history, and the last known attribute values are carried forward to the dates on which there were no changes for them:
select primary_key_id,
start_dt,
nvl(lead(start_dt - 1)
over(partition by primary_key_id order by start_dt),
to_date('9999-12-31', 'yyyy-mm-dd')) as end_dt,
last_value(attribute1 ignore nulls) over(partition by primary_key_id order by start_dt) as attribute1,
last_value(attribute2 ignore nulls) over(partition by primary_key_id order by start_dt) as attribute2,
last_value(attribute3 ignore nulls) over(partition by primary_key_id order by start_dt) as attribute3
from (select primary_key_id,
start_dt,
max(attribute1) as attribute1,
max(attribute2) as attribute2,
max(attribute3) as attribute3
from (select primary_key_id,
start_dt,
attribute1,
cast(null as number) attribute2,
cast(null as number) attribute3
from hist1
union all
select primary_key_id,
start_dt,
cast(null as number) attribute1,
attribute2,
cast(null as number) attribute3
from hist2
union all
select primary_key_id,
start_dt,
cast(null as number) attribute1,
cast(null as number) attribute2,
attribute3
from hist3) q1
group by primary_key_id, start_dt) q2
order by primary_key_id, start_dt;
The result is like this:
PRIMARY_KEY_ID | START_DT | END_DT | ATTRIBUTE1 | ATTRIBUTE2 | ATTRIBUTE3 |
1 | 01.01.2014 | 31.12.2014 | 7 | NULL | NULL |
1 | 01.01.2015 | 31.12.2015 | 8 | 4 | NULL |
1 | 01.01.2016 | 31.12.2016 | 9 | 5 | 10 |
1 | 01.01.2017 | 31.12.2017 | 9 | 6 | 20 |
1 | 01.01.2018 | 31.12.9999 | 9 | 6 | 30 |
2 | 01.01.2014 | 31.12.2014 | 17 | NULL | NULL |
2 | 01.01.2015 | 31.12.2015 | 18 | 14 | NULL |
2 | 01.01.2016 | 31.12.2016 | 19 | 15 | 110 |
2 | 01.01.2017 | 31.12.2017 | 19 | 16 | 120 |
2 | 01.01.2018 | 31.12.9999 | 19 | 16 | 130 |
Normalizer
Sometimes you need to normalize data that arrives as a delimiter-separated field, for example in a table like this:
create table DENORMALIZED_TABLE
(
id INTEGER,
val VARCHAR2(4000)
);
insert into DENORMALIZED_TABLE(id, val) values(1, 'aaa,cccc,bb');
insert into DENORMALIZED_TABLE(id, val) values(2, 'ddd');
insert into DENORMALIZED_TABLE(id, val) values(3, 'fffff,e');
This query normalizes the data by splitting the comma-separated values into separate rows:
select id, regexp_substr(val, '[^,]+', 1, column_value) val, column_value
from denormalized_table,
table(cast(multiset
(select level
from dual
connect by regexp_instr(val, '[^,]+', 1, level) > 0) as
sys.odcinumberlist))
order by id, column_value;
The result is like this:
ID | VAL | COLUMN_VALUE |
1 | aaa | 1 |
1 | cccc | 2 |
1 | bb | 3 |
2 | ddd | 1 |
3 | fffff | 1 |
3 | e | 2 |
Rendering in SVG format
Often there is a desire to visualize the numerical indicators stored in the database: to build graphs, histograms, charts. Specialized tools such as Oracle BI can help, but their licenses may cost money, and configuring them can take longer than writing a quick-and-dirty SQL query that returns the finished picture. Let's demonstrate with an example how to quickly draw such a picture in SVG format using a single query.
Suppose we have a table with data
create table graph_data(dt date, val number, radius number);
insert into graph_data(dt, val, radius) values (to_date('01.01.2020','dd.mm.yyyy'), 12, 3);
insert into graph_data(dt, val, radius) values (to_date('02.01.2020','dd.mm.yyyy'), 15, 4);
insert into graph_data(dt, val, radius) values (to_date('05.01.2020','dd.mm.yyyy'), 17, 5);
insert into graph_data(dt, val, radius) values (to_date('06.01.2020','dd.mm.yyyy'), 13, 6);
insert into graph_data(dt, val, radius) values (to_date('08.01.2020','dd.mm.yyyy'), 3, 7);
insert into graph_data(dt, val, radius) values (to_date('10.01.2020','dd.mm.yyyy'), 20, 8);
insert into graph_data(dt, val, radius) values (to_date('11.01.2020','dd.mm.yyyy'), 18, 9);
Here:
- dt is the date,
- val is the numerical indicator whose dynamics over time we visualize,
- radius is another numerical indicator, drawn as a circle of that radius.
Let's say a few words about the SVG format. It is a vector graphics format that can be viewed in modern browsers and converted to other graphic formats. In it, among other things, you can draw lines, circles and write text:
<line x1="94" x2="94" y1="15" y2="675" style="stroke:rgb(150,255,255); stroke-width:1px"/>
<circle cx="30" cy="279" r="3" style="fill:rgb(255,0,0)"/>
<text x="7" y="688" font-size="10" fill="rgb(0,150,255)">2020-01-01</text>
Below is an SQL query for Oracle that plots a graph from the data in this table. The const subquery contains the various constant settings: image size, number of labels on the chart axes, colors of lines and circles, font sizes, and so on. In the gd1 subquery we convert the data from graph_data into the x and y coordinates of the picture. The gd2 subquery remembers, for each point, the previous point from which a line must be drawn. The 'header' block produces the image header with a white background. The 'vertical lines' block draws vertical lines. The 'dates under vertical lines' block labels the dates on the x-axis. The 'horizontal lines' block draws horizontal lines. The 'values near horizontal lines' block labels the values on the y-axis. The 'circles' block draws circles with the radii specified in graph_data. The 'graph data' block draws the line graph of the val indicator from graph_data. The 'footer' block adds the closing tag.
with const as
(select 700 viewbox_width,
700 viewbox_height,
30 left_margin,
30 right_margin,
15 top_margin,
25 bottom_margin,
max(dt) - min(dt) + 1 num_vertical_lines,
11 num_horizontal_lines,
'rgb(150,255,255)' stroke_vertical_lines,
'1px' stroke_width_vertical_lines,
10 font_size_dates,
'rgb(0,150,255)' fill_dates,
23 x_dates_pad,
13 y_dates_pad,
'rgb(150,255,255)' stroke_horizontal_lines,
'1px' stroke_width_horizontal_lines,
10 font_size_values,
'rgb(0,150,255)' fill_values,
4 x_values_pad,
2 y_values_pad,
'rgb(255,0,0)' fill_circles,
'rgb(51,102,0)' stroke_graph,
'1px' stroke_width_graph,
min(dt) min_dt,
max(dt) max_dt,
max(val) max_val
from graph_data),
gd1 as
(select graph_data.dt,
const.left_margin +
(const.viewbox_width - const.left_margin - const.right_margin) *
(graph_data.dt - const.min_dt) / (const.max_dt - const.min_dt) x,
const.viewbox_height - const.bottom_margin -
(const.viewbox_height - const.top_margin - const.bottom_margin) *
graph_data.val / const.max_val y,
graph_data.radius
from graph_data, const),
gd2 as
(select dt,
round(nvl(lag(x) over(order by dt), x)) prev_x,
round(x) x,
round(nvl(lag(y) over(order by dt), y)) prev_y,
round(y) y,
radius
from gd1)
/* header */
select '<?xml version="1.0" encoding="UTF-8" standalone="no"?>' txt
from dual
union all
select '<svg version="1.1" width="' || viewbox_width || '" height="' ||
viewbox_height || '" viewBox="0 0 ' || viewbox_width || ' ' ||
viewbox_height ||
'" style="background:yellow" baseProfile="full" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ev="http://www.w3.org/2001/xml-events">'
from const
union all
select '<title>Test graph</title>'
from dual
union all
select '<desc>Test graph</desc>'
from dual
union all
select '<rect width="' || viewbox_width || '" height="' || viewbox_height ||
'" style="fill:white" />'
from const
union all
/* vertical lines */
select '<line x1="' ||
to_char(round(left_margin +
(viewbox_width - left_margin - right_margin) *
(level - 1) / (num_vertical_lines - 1))) || '" x2="' ||
to_char(round(left_margin +
(viewbox_width - left_margin - right_margin) *
(level - 1) / (num_vertical_lines - 1))) || '" y1="' ||
to_char(round(top_margin)) || '" y2="' ||
to_char(round(viewbox_height - bottom_margin)) || '" style="stroke:' ||
const.stroke_vertical_lines || '; stroke-width:' ||
const.stroke_width_vertical_lines || '"/>'
from const
connect by level <= num_vertical_lines
union all
/* dates under vertical lines */
select '<text x="' ||
to_char(round(left_margin +
(viewbox_width - left_margin - right_margin) *
(level - 1) / (num_vertical_lines - 1) - x_dates_pad)) ||
'" y="' ||
to_char(round(viewbox_height - bottom_margin + y_dates_pad)) ||
'" font-size="' || font_size_dates || '" fill="' || fill_dates || '">' ||
to_char(min_dt + level - 1, 'yyyy-mm-dd') || '</text>'
from const
connect by level <= num_vertical_lines
union all
/* horizontal lines */
select '<line x1="' || to_char(round(left_margin)) || '" x2="' ||
to_char(round(viewbox_width - right_margin)) || '" y1="' ||
to_char(round(top_margin +
(viewbox_height - top_margin - bottom_margin) *
(level - 1) / (num_horizontal_lines - 1))) || '" y2="' ||
to_char(round(top_margin +
(viewbox_height - top_margin - bottom_margin) *
(level - 1) / (num_horizontal_lines - 1))) ||
'" style="stroke:' || const.stroke_horizontal_lines ||
'; stroke-width:' || const.stroke_width_horizontal_lines || '"/>'
from const
connect by level <= num_horizontal_lines
union all
/* values near horizontal lines */
select '<text text-anchor="end" x="' ||
to_char(round(left_margin - x_values_pad)) || '" y="' ||
to_char(round(viewbox_height - bottom_margin -
(viewbox_height - top_margin - bottom_margin) *
(level - 1) / (num_horizontal_lines - 1) +
y_values_pad)) || '" font-size="' || font_size_values ||
'" fill="' || fill_values || '">' ||
to_char(round(max_val / (num_horizontal_lines - 1) * (level - 1), 2)) ||
'</text>'
from const
connect by level <= num_horizontal_lines
union all
/* circles */
select '<circle cx="' || to_char(gd2.x) || '" cy="' || to_char(gd2.y) ||
'" r="' || gd2.radius || '" style="fill:' || const.fill_circles ||
'"/>'
from gd2, const
union all
/* graph data */
select '<line x1="' || to_char(gd2.prev_x) || '" x2="' || to_char(gd2.x) ||
'" y1="' || to_char(gd2.prev_y) || '" y2="' || to_char(gd2.y) ||
'" style="stroke:' || const.stroke_graph || '; stroke-width:' ||
const.stroke_width_graph || '"/>'
from gd2, const
union all
/* footer */
select '</svg>' from dual;
The query result can be saved to a file with the *.svg extension and viewed in a browser. If you like, you can convert it to other graphic formats with any suitable utility, embed it in the web pages of your application, and so on.
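For example, in SQL*Plus the result can be spooled to a file roughly like this (a sketch; the file name is arbitrary):
set pagesize 0 linesize 32767 trimspool on heading off feedback off
spool graph.svg
-- run the SVG-generating query shown above here
spool off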
The result is the following picture:
Oracle Metadata Search Application
Imagine that you need to find something in Oracle source code, looking across several servers at once. This is about searching through Oracle data dictionary objects. The search workspace is a web interface where the developer enters a search string and ticks checkboxes selecting the Oracle servers on which to perform the search.
The web search engine can look for a string in Oracle server objects across several different databases of the bank simultaneously. For example, you can search for answers to questions such as:
- Where in the Oracle code is account 61209 used?
- Where is the accounts table used (including via database links)?
- Which procedure raises the error ORA-20001 with a particular message text?
- Is the IX_CLIENTID index mentioned in any SQL hints?
- Who reads from or writes to a particular table (including via database links), apart from the procedures already known to us?
- Has anyone already solved a similar problem? Search by keywords.
- What will break when migrating to a new Oracle version? For example, where is wm_concat used, which has been removed from Oracle?
- Colleagues' code can also be a source of knowledge about rarely used constructs, such as sys_connect_by_path, regexp_instr or the push_subq hint.
Based on the search results, the user is shown on which server and in the code of which functions, procedures, packages, triggers, views and so on the required string was found.
Let's describe how such a search engine is implemented.
The client side is not complicated. The web interface receives the search string entered by the user, the list of servers to search, and the user's login, and passes them to an Oracle stored procedure on the handler server. The history of requests to the search engine (who ran which query) is logged, just in case.
After receiving a search query, the server side on the search Oracle server runs several procedures in parallel jobs. Over database links to the selected Oracle servers, they scan data dictionary views such as dba_col_comments, dba_jobs, dba_mviews, dba_objects, dba_scheduler_jobs, dba_source, dba_tab_comments and dba_views, looking for the desired string. Each procedure, if it finds something, writes what it found to the search results table (tagged with the corresponding search query ID).
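The core of each such procedure is a simple scan of a dictionary view over a database link. A minimal sketch for dba_source might look like this (the search_results table, the server1_link name and the bind variables are illustrative assumptions):
insert into search_results
  (search_id, server_name, owner, name, type, line, text)
  select :search_id, 'SERVER1', s.owner, s.name, s.type, s.line, s.text
    from dba_source@server1_link s
   where upper(s.text) like '%' || upper(:search_string) || '%';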
When all search procedures have finished, the client part gives the user everything that is written in the search results table with the corresponding search query ID.
But that's not all. In addition to searching the Oracle data dictionary, search across the Informatica PowerCenter repository was also wired into the described mechanism. Informatica PowerCenter is a popular ETL tool that Sberbank uses to load various information into data warehouses. It has an open, well-documented repository structure, which can be searched in the same way as the Oracle data dictionary. Which tables and fields are used in load code developed with Informatica PowerCenter? What can be found in port transformations and explicit SQL overrides? All this information is available in the repository structures and can be found. For PowerCenter connoisseurs, I'll note that our search engine scans the following repository locations for mappings, sessions or workflows containing the search string somewhere: sql override, mapplet attributes, ports, source definitions in mappings, source definitions, target definitions in mappings, target definitions, mappings, mapplets, workflows, worklets, sessions, commands, expression ports, session instances, source definition fields, target definition fields, email tasks.
In conclusion: if this area is close to you, you are welcome to join the SberProfi DWH/BigData professional community.
The SberProfi DWH/BigData community covers areas such as the Hadoop ecosystem, Teradata, Oracle DB, GreenPlum, and the BI tools Qlik, SAP BO, Tableau and others.