Seven Practical Tips for Bulk Loading Data in PostgreSQL

Free translation of the article "7 Best Practice Tips for PostgreSQL Bulk Data Loading"



Sometimes it becomes necessary to load a large amount of data into a PostgreSQL database in a few simple steps. This practice is commonly referred to as bulk import, where one or more large files serve as the data source. The process can sometimes be unacceptably slow, and there are several possible reasons for the poor performance: indexes, triggers, foreign and primary keys, and even WAL writes can all cause delays.



In this article, we will provide some practical tips for bulk importing data into PostgreSQL databases. However, there may be situations where none of them is an effective solution to the problem. We encourage readers to weigh the pros and cons of any method before applying it.



Tip 1. Putting the target table into unlogged mode



In PostgreSQL 9.5 and later, the target table can be set to unlogged mode for the load and switched back to logged mode once the data is in.



ALTER TABLE <target table> SET UNLOGGED;
<bulk data insert operations…>
ALTER TABLE <target table> SET LOGGED;


In unlogged mode, PostgreSQL does not write table changes to the write-ahead log (WAL). This makes the load significantly faster. However, since the operations are not logged, the data cannot be recovered if a crash or unclean shutdown occurs during the load: after a restart, PostgreSQL automatically truncates any unlogged table.



Besides, unlogged tables are not replicated to standby servers. In that case, existing replication has to be removed before the load and recreated afterwards. Depending on the amount of data in the primary node and the number of standbys, recreating replication may take quite a long time, which may be unacceptable under high-availability requirements.



We recommend the following best practices when bulk inserting data into unlogged tables:



  • make a backup of the table and its data before switching it to unlogged mode;
  • recreate any replication to standby servers once the load is complete;
  • use unlogged bulk inserts only for tables that can easily be repopulated (for example, large lookup or dimension tables).


Tip 2. Dropping and recreating indexes



Existing indexes can cause significant delays during bulk data inserts. As each new row is added, the corresponding entries in every index on the table have to be updated as well.



Therefore, where possible, we recommend dropping the indexes on the target table before starting the bulk insert and recreating them once the load is complete. Creating indexes on a large table takes time, but it is generally faster than updating them row by row during the load.



DROP INDEX <index_name1>, <index_name2> … <index_name_n>;
<bulk data insert operations…>
CREATE INDEX <index_name> ON <target_table>(column1, …, column_n);


Just before creating the indexes, it may be worthwhile to temporarily increase the maintenance_work_mem configuration parameter. The extra working memory helps build the indexes faster.



Another safe option is to make a copy of the target table in the same database, including its data and indexes. The copy can then be bulk-loaded under both scenarios: dropping and recreating the indexes versus updating them dynamically. The method that shows the better performance can then be applied to the live table.
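As a sketch of this tip, assuming a hypothetical table sales_staging with an index on customer_id (all names are illustrative):

```sql
-- Hypothetical index and table names.
DROP INDEX IF EXISTS idx_sales_customer_id;

-- <bulk data insert operations…>

-- Give the index build extra working memory for this session only.
SET maintenance_work_mem = '1GB';
CREATE INDEX idx_sales_customer_id ON sales_staging (customer_id);
RESET maintenance_work_mem;
```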



Tip 3. Dropping and recreating foreign keys



Like indexes, foreign key constraints can also hurt bulk load performance, because each foreign key value in every inserted row has to be checked against the corresponding primary key. Behind the scenes, PostgreSQL uses a trigger to perform this check, and when a large number of rows is loaded, that trigger fires for every one of them, adding to the overhead.



Unless business rules prohibit it, we recommend dropping all foreign keys on the target table, loading the data in a single transaction, and then recreating the foreign keys after the transaction commits.



ALTER TABLE <target_table> 
    DROP CONSTRAINT <foreign_key_constraint>;
BEGIN TRANSACTION;
    <bulk data insert operations…>
COMMIT;
ALTER TABLE <target_table> 
    ADD CONSTRAINT <foreign key constraint>  
    FOREIGN KEY (<foreign_key_field>) 
    REFERENCES <parent_table>(<primary key field>)...;


Once again, increasing the maintenance_work_mem configuration parameter can speed up the recreation of foreign key constraints.
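A concrete version of the template above, with hypothetical orders and customers tables, could look like this:

```sql
-- Hypothetical table and constraint names.
ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey;

BEGIN TRANSACTION;
    -- <bulk data insert operations…>
COMMIT;

-- Extra working memory helps validate the recreated constraint faster.
SET maintenance_work_mem = '1GB';
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id);
RESET maintenance_work_mem;
```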



Tip 4. Disabling triggers



INSERT and DELETE triggers (if the load process also deletes rows from the target table) can delay bulk loading. Each trigger contains logic that has to be checked, and operations that have to complete, immediately after every row is inserted or deleted.



We recommend disabling all triggers on the target table before the bulk load and enabling them again after it finishes. Note that disabling ALL triggers also disables the system triggers that enforce foreign key constraint checks.



ALTER TABLE <target table> DISABLE TRIGGER ALL;
<bulk data insert operations…>
ALTER TABLE <target table> ENABLE TRIGGER ALL;


Tip 5. Using the COPY command



The best tool for bulk loading data in PostgreSQL is the COPY command. COPY is optimized for exactly this scenario. It is more efficient than a large number of individual INSERT statements, and even than multi-valued INSERTs.



COPY <target table> [( column1>, … , <column_n>)]
    FROM  '<file_name_and_path>' 
    WITH  (<option1>, <option2>, … , <option_n>)


Other benefits of using COPY include:



  • it supports importing both text and binary files;
  • it is transactional in nature;
  • it allows specifying the structure of the input files;
  • it can load data conditionally using a WHERE clause (PostgreSQL 12 and later).
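For example, a CSV load with explicit options and a row filter might look like this (the file path, table, and column names are hypothetical; the WHERE clause requires PostgreSQL 12 or later):

```sql
-- Hypothetical table, columns, and file path.
COPY sales_staging (sale_id, customer_id, amount)
    FROM '/tmp/sales.csv'
    WITH (FORMAT csv, HEADER true, DELIMITER ',')
    WHERE amount > 0;  -- rows failing the condition are skipped (PostgreSQL 12+)
```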


Tip 6. Using multi-valued INSERTs



Running thousands of individual INSERTs is a poor choice for bulk loading. Each INSERT statement has to be parsed and planned by the query optimizer, pass all constraint checks, run as a separate transaction, and be logged in the WAL.



A single multi-valued INSERT statement saves most of this overhead.



INSERT INTO <target_table> (<column1>, <column2>, …, <column_n>) 
VALUES 
    (<value a>, <value b>, …, <value x>),
    (<value 1>, <value 2>, …, <value n>),
    (<value A>, <value B>, …, <value Z>),
    (<value i>, <value ii>, …, <value L>),
    ...;


Multi-valued INSERT performance is affected by existing indexes. We recommend dropping the indexes before running the command and recreating them afterwards.



Another thing to watch is the amount of memory available to PostgreSQL for running multi-valued INSERTs. When such a statement runs, a large number of input values has to fit in RAM, and unless enough memory is available, the process may fail.



We recommend setting the effective_cache_size parameter to 50% and the shared_buffers parameter to 25% of the machine's total RAM. Also, to be safe, it is better to run a series of multi-valued INSERTs with each statement carrying values for no more than 1000 rows.
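As a sketch, on a hypothetical machine with 16 GB of RAM the settings and a 1000-row batch would look like this (note that shared_buffers can only be changed in postgresql.conf and requires a server restart; the table and values are illustrative):

```sql
-- postgresql.conf (restart required for shared_buffers):
--   shared_buffers = '4GB'         -- ~25% of 16 GB RAM
--   effective_cache_size = '8GB'   -- ~50% of 16 GB RAM

-- One statement per batch of at most 1000 rows; hypothetical table.
INSERT INTO sales_staging (sale_id, amount) VALUES
    (1, 10.50),
    (2, 23.00),
    -- … up to 1000 rows per statement …
    (1000, 7.25);
```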



Tip 7. Running ANALYZE



This final tip is not about making the bulk import itself faster, but we strongly recommend running the ANALYZE command on the target table immediately after the import. A large number of new rows significantly skews the data distribution in the columns and makes any existing table statistics stale. When the query optimizer relies on stale statistics, query performance can be unacceptably poor. Running ANALYZE ensures the statistics are brought up to date.
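A minimal example, using the hypothetical table from earlier sketches; the pg_stat_user_tables query is an optional way to confirm that the statistics were refreshed:

```sql
-- Hypothetical table name.
ANALYZE sales_staging;

-- Optionally check when the table was last analyzed:
SELECT relname, last_analyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'sales_staging';
```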





Bulk imports of data do not happen every day for a database application, but when they do, they affect query performance. That is why it is important to reduce the loading time as much as possible. One thing a DBA can do to minimize the chance of surprises is to test the load optimizations in a staging environment with a similar server and a similarly configured PostgreSQL. Data loading scenarios differ, and it is best to try each method and choose the one that works well.



