Spring: Speeding Up Database Writing with XML

Hello!



This article will discuss how to speed up the writing of large amounts of information to a relational database for applications written using Spring Boot. When writing a large number of rows at a time, Hibernate inserts them one at a time, which leads to significant waiting if there are many rows. Let's consider a case of how to get around this.



We are using the Spring Boot application. As a DBMS -> MS SQL Server, as a programming language - Kotlin. Of course, there will be no difference for Java.



Entity for the data we need to write:



@Entity
@Table(schema = BaseEntity.schemaName, name = GoodsPrice.tableName)
data class GoodsPrice(

        @Id
        @Column(name = "GoodsPriceId")
        @GeneratedValue(strategy =  GenerationType.IDENTITY)
        override val id: Long,

        @Column(name = "GoodsId")
        val goodsId: Long,

        @Column(name = "Price")
        val price: BigDecimal,

        @Column(name = "PriceDate")
        val priceDate: LocalDate
): BaseEntity(id) {
        companion object {
                const val tableName: String = "GoodsPrice"
        }
}


SQL:



CREATE TABLE [dbo].[GoodsPrice](
	[GoodsPriceId] [int] IDENTITY(1,1) NOT NULL,
	[GoodsId] [int] NOT NULL,
	[Price] [numeric](18, 2) NOT NULL,
	[PriceDate] nvarchar(10) NOT NULL,
 CONSTRAINT [PK_GoodsPrice] PRIMARY KEY(GoodsPriceId))


As a demo example, we will assume that we need to record 20,000 and 50,000 records each.



Let's create a controller that will generate data and transfer it for recording and log the time:



@RestController
@RequestMapping("/api")
class SaveDataController(private val goodsPriceService: GoodsPriceService) {

    @PostMapping("/saveViaJPA")
    fun saveDataViaJPA(@RequestParam count: Int) {
        val timeStart = System.currentTimeMillis()
        goodsPriceService.saveAll(prepareData(count))
        val secSpent = (System.currentTimeMillis() - timeStart) / 60
        logger.info("Seconds spent : $secSpent")
    }

    private fun prepareData(count: Int) : List<GoodsPrice> {
        val prices = mutableListOf<GoodsPrice>()
        for (i in 1..count) {
            prices.add(GoodsPrice(
                    id = 0L,
                    priceDate = LocalDate.now().minusDays(i.toLong()),
                    goodsId = 1L,
                    price = BigDecimal.TEN
            ))
        }
        return prices
    }
    companion object {
        private val logger = LoggerFactory.getLogger(SaveDataController::class.java)
    }
}


We will also create a service for writing data and a repository GoodsPriceRepository



@Service
class GoodsPriceService(
        private val goodsPriceRepository: GoodsPriceRepository
) {

    private val xmlMapper: XmlMapper = XmlMapper()

    fun saveAll(prices: List<GoodsPrice>) {
        goodsPriceRepository.saveAll(prices)
    }
}


After that, we will sequentially call our saveDataViaJPA method for 20,000 records and 50,000 records.



Console:



Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
Hibernate: insert into dbo.GoodsPrice (GoodsId, Price, PriceDate) values (?, ?, ?)
2020-11-10 19:11:58.886  INFO 10364 --- [  restartedMain] xmlsave.controller.SaveDataController    : Seconds spent : 63


The problem is that Hibernate tried to insert each row in a separate query, that is, 20,000 times. And on my machine it took 63 seconds.



For 50,000 entries 166 sec.



Solution



What can be done? The main idea is that we will write through the buffer table:



@Entity
@Table(schema = BaseEntity.schemaName, name = SaveBuffer.tableName)
data class SaveBuffer(

        @Id
        @Column(name = "BufferId")
        @GeneratedValue(strategy =  GenerationType.IDENTITY)
        override val id: Long,

        @Column(name = "UUID")
        val uuid: String,

        @Column(name = "xmlData")
        val xmlData: String
): BaseEntity(id) {
        companion object {
                const val tableName: String = "SaveBuffer"
        }
}


SQL script for table in database



CREATE TABLE [dbo].[SaveBuffer](
	[BufferId] [int] IDENTITY NOT NULL,
	[UUID] [varchar](64) NOT NULL,
	[xmlData] [xml] NULL,
 CONSTRAINT [PK_SaveBuffer] PRIMARY KEY (BufferId))


Add a method to SaveDataController:



@PostMapping("/saveViaBuffer")
    fun saveViaBuffer(@RequestParam count: Int) {
        val timeStart = System.currentTimeMillis()
        goodsPriceService.saveViaBuffer(prepareData(count))
        val secSpent = (System.currentTimeMillis() - timeStart) / 60
        logger.info("Seconds spent : $secSpent")
    }


Let's also add a method to the GoodsPriceService:



@Transactional
    fun saveViaBuffer(prices: List<GoodsPrice>) {
        val uuid = UUID.randomUUID().toString()
        val values = prices.map {
            BufferDTO(
                    goodsId = it.goodsId,
                    priceDate = it.priceDate.format(DateTimeFormatter.ISO_DATE),
                    price = it.price.stripTrailingZeros().toPlainString()
            )
        }
        bufferRepository.save(
                    SaveBuffer(
                            id = 0L,
                            uuid = uuid,
                            xmlData = xmlMapper.writeValueAsString(values)
                    )
            )
        goodsPriceRepository.saveViaBuffer(uuid)
        bufferRepository.deleteAllByUuid(uuid)
    }


To write, we will first generate a unique uuid to distinguish the current data that we are writing. Next, we write our data into the created buffer with text in the form of xml. That is, there will be not 20,000 inserts, but only 1.



And after that we transfer data from the buffer to the GoodsPrice table with one query like Insert into ... select.



GoodsPriceRepository with saveViaBuffer method:



@Repository
interface GoodsPriceRepository: JpaRepository<GoodsPrice, Long> {
    @Modifying
    @Query("""
    insert into dbo.GoodsPrice(
	GoodsId,
	Price,
	PriceDate
	)
	select res.*
	from dbo.SaveBuffer buffer
		cross apply(select temp.n.value('goodsId[1]', 'int') as GoodsId
			, temp.n.value('price[1]', 'numeric(18, 2)') as Price
			, temp.n.value('priceDate[1]', 'nvarchar(10)') as PriceDate
			from buffer.xmlData.nodes('/ArrayList/item') temp(n)) res
			where buffer.UUID = :uuid
    """, nativeQuery = true)
    fun saveViaBuffer(uuid: String)
}


And at the end, in order not to store duplicate information in the database, we delete the data from the buffer by uuid.



Let's call our saveViaBuffer method for 20,000 lines and 50,000 lines:



Hibernate: insert into dbo.SaveBuffer (UUID, xmlData) values (?, ?)
Hibernate: insert into dbo.SaveBuffer (UUID, xmlData) values (?, ?)
Hibernate: insert into dbo.SaveBuffer (UUID, xmlData) values (?, ?)
Hibernate: insert into dbo.SaveBuffer (UUID, xmlData) values (?, ?)
Hibernate: 
    insert into dbo.GoodsPrice(
	GoodsId,
	Price,
	PriceDate
	)
	select res.*
	from dbo.SaveBuffer buffer
		cross apply(select temp.n.value('goodsId[1]', 'int') as GoodsId
			, temp.n.value('price[1]', 'numeric(18, 2)') as Price
			, temp.n.value('priceDate[1]', 'nvarchar(10)') as PriceDate
			from buffer.xmlData.nodes('/ArrayList/item') temp(n)) res
			where buffer.UUID = ?
    
Hibernate: select savebuffer0_.BufferId as bufferid1_1_, savebuffer0_.UUID as uuid2_1_, savebuffer0_.xmlData as xmldata3_1_ from dbo.SaveBuffer savebuffer0_ where savebuffer0_.UUID=?
Hibernate: delete from dbo.SaveBuffer where BufferId=?
Hibernate: delete from dbo.SaveBuffer where BufferId=?
Hibernate: delete from dbo.SaveBuffer where BufferId=?
Hibernate: delete from dbo.SaveBuffer where BufferId=?
2020-11-10 20:01:58.788  INFO 7224 --- [  restartedMain] xmlsave.controller.SaveDataController    : Seconds spent : 13


As you can see from the results, we got a significant acceleration of data recording.

For 20,000 records, 13 seconds was 63.

For 50,000 records, 27 seconds were 166.



Link to test project



All Articles