Parallel requests in Kotlin to automate data collection

Hello! I often use Kotlin for automation at work. My job is not directly related to programming, but Kotlin makes many routine tasks much simpler.





Recently I needed to collect a fairly large amount of data for analysis, so I decided to write a small script that fetches the data and saves it to Excel. The saving part caused no problems: I read about Apache POI, took a couple of examples from the official documentation and adapted them. The same cannot be said about the HTTP requests.





The source returned JSON in batches, and I needed to collect these batches quickly, convert them to text, and write them to a file as a table.





Asynchronous method

I decided to start with a simple asynchronous approach. After poking around HttpURLConnection for a while, I gave up on it and switched to HttpClient from the Java standard library.





For testing I used the service https://jsonplaceholder.typicode.com/, which a developer I know suggested. I saved the link that returns JSON with comments in a constant so that I would not have to repeat it, and started testing.





const val URL = "https://jsonplaceholder.typicode.com/comments"
      
      



The function was ready and it even worked: the data came in.





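import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse.BodyHandlers

fun getDataAsync(url: String): String? {
    val httpClient = HttpClient.newBuilder()
        .build()
    val httpRequest = HttpRequest.newBuilder()
        .uri(URI.create(url)).build()

    // join() waits for the response, so the caller still blocks here
    return httpClient.sendAsync(httpRequest, BodyHandlers.ofString())
        .join().body()
}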
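Because the function calls join() right after sendAsync, every call waits for its own response before the next request can start. A minimal sketch of one way around that, assuming it is acceptable to hand back futures: start all requests first and join them only afterwards. The helper names fetchFuture and joinAll are hypothetical and not part of the original script.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse.BodyHandlers
import java.util.concurrent.CompletableFuture

// Hypothetical helper: starts a request and returns a future immediately.
fun fetchFuture(client: HttpClient, url: String): CompletableFuture<String> {
    val request = HttpRequest.newBuilder().uri(URI.create(url)).build()
    return client.sendAsync(request, BodyHandlers.ofString())
        .thenApply { it.body() }
}

// Hypothetical helper: waits for every future, keeping the input order.
fun <T> joinAll(futures: List<CompletableFuture<T>>): List<T> =
    futures.map { it.join() }

fun main() {
    val client = HttpClient.newHttpClient()
    val base = "https://jsonplaceholder.typicode.com/comments"
    // All ten requests are started before the first join() is reached.
    val futures = (1..10).map { fetchFuture(client, "$base/$it") }
    joinAll(futures).forEach { println(it) }
}
```

The sketch also reuses a single HttpClient for all requests instead of building a new one per call.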
      
      



Next I needed to check how fast it was. Armed with measureTimeMillis, I ran the code.





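import kotlin.system.measureTimeMillis

val asyncTime = measureTimeMillis {
    // The requests run one after another: each call waits for its response
    val res = (1..10)
        .toList()
        .map { getDataAsync("$URL/$it") }
    res.forEach { println(it) }
}
println("Asynchronous method: $asyncTime ms")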
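To see what such a measurement captures, here is a small sketch that replaces the HTTP call with a hypothetical fakeRequest function that only sleeps for a fixed time. The function name and the 100 ms latency are assumptions made for illustration.

```kotlin
import kotlin.system.measureTimeMillis

// Hypothetical stand-in for a blocking HTTP call: sleeps, then returns a fake body.
fun fakeRequest(id: Int, latencyMs: Long = 100): String {
    Thread.sleep(latencyMs)
    return "comment $id"
}

fun main() {
    var results = emptyList<String>()
    val elapsed = measureTimeMillis {
        // map handles one element at a time, so the waits add up.
        results = (1..5).map { fakeRequest(it) }
    }
    println(results)
    println("Sequential: $elapsed ms")
}
```

With five calls of 100 ms each, the total cannot be much below 500 ms, which is the same pattern as ten real requests in a plain map.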
      
      



Everything worked as it should, but I wanted it to be faster. After a little digging on the Internet, I came across a solution in which the tasks are executed in parallel.





Parallel Map


suspend fun <A, B> Iterable<A>.pmap(f: suspend (A) -> B): List<B> =
    coroutineScope {
        map { async { f(it) } }.awaitAll()
    }
      
      



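This is an extension function called pmap for any collection (any Iterable). It takes a function that is applied to each element of type A. async starts a coroutine for each element, and .awaitAll() waits for all the results. The function is marked suspend, so it has to be called from a coroutine or from another suspend function.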
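The behaviour of pmap is easiest to see without the network. The sketch below repeats the same extension function and feeds it a lambda that only calls delay; the 200 ms value and the squaring are assumptions made for illustration.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

suspend fun <A, B> Iterable<A>.pmap(f: suspend (A) -> B): List<B> =
    coroutineScope {
        // Start one coroutine per element, then wait for all of them.
        map { async { f(it) } }.awaitAll()
    }

fun main() = runBlocking {
    val elapsed = measureTimeMillis {
        val squares = listOf(1, 2, 3, 4, 5).pmap { n ->
            delay(200) // suspends without blocking the thread
            n * n
        }
        println(squares) // [1, 4, 9, 16, 25]: results keep the input order
    }
    println("pmap with delay: $elapsed ms")
}
```

Since delay suspends instead of blocking, the five waits overlap and the total should stay close to a single delay rather than five of them.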






val parmapTime = measureTimeMillis {
    runBlocking {
        val res = (1..10)
            .toList()
            .pmap { getDataAsync("$URL/$it") }
        println(res)
    }
}
println("pmap: $parmapTime ms")
      
      



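The result was 1523 ms, so nothing got faster. Each call is wrapped in async inside map, but the requests apparently still ran one after another.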
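One likely explanation for the pmap timing, offered as an assumption since only the total time was measured: coroutines started with a plain async inside runBlocking all share the calling thread, and a blocking call such as join() holds that thread until it returns. The sketch below compares a blocking Thread.sleep with a suspending delay; timePmap is a hypothetical helper.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

suspend fun <A, B> Iterable<A>.pmap(f: suspend (A) -> B): List<B> =
    coroutineScope { map { async { f(it) } }.awaitAll() }

// Hypothetical helper: times a pmap run over five elements inside runBlocking.
fun timePmap(work: suspend (Int) -> Int): Long = measureTimeMillis {
    runBlocking { (1..5).pmap(work) }
}

fun main() {
    // Thread.sleep blocks the only thread, so the coroutines run one by one.
    val blocking = timePmap { Thread.sleep(100); it }
    // delay suspends instead, so all five waits overlap.
    val suspending = timePmap { delay(100); it }
    println("blocking: $blocking ms, suspending: $suspending ms")
}
```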





Parallel Map v 2.0


suspend fun <T, V> Iterable<T>.parMap(func: suspend (T) -> V): Iterable<V> =
    coroutineScope {
        map { element ->
            // Dispatchers.IO moves the blocking calls onto a shared pool of IO threads
            async(Dispatchers.IO) { func(element) }
        }.awaitAll()
    }

val parMapTime = measureTimeMillis {
    runBlocking {
        val res = (1..10)
            .toList()
            .parMap { getDataAsync("$URL/$it") }
        println(res) // res is only visible inside runBlocking
    }
}
println("parMap: $parMapTime ms")
      
      



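With Dispatchers.IO the run became about 2 times faster: ~610 ms. Great! (The timing covers only the requests, not the later steps such as writing to Excel.)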
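Why Dispatchers.IO helps with blocking calls can be sketched with a sleep-based stand-in for the request. The sleep duration and the idea of returning thread names are assumptions made for illustration; parMap here returns List<V> only to keep the example short.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

suspend fun <T, V> Iterable<T>.parMap(func: suspend (T) -> V): List<V> =
    coroutineScope {
        map { async(Dispatchers.IO) { func(it) } }.awaitAll()
    }

fun main() {
    val elapsed = measureTimeMillis {
        runBlocking {
            // Each blocking sleep runs on a thread from the IO pool, so the waits overlap.
            val threadNames = (1..5).parMap {
                Thread.sleep(100)
                Thread.currentThread().name
            }
            println(threadNames.toSet())
        }
    }
    println("parMap on Dispatchers.IO: $elapsed ms")
}
```

By default Dispatchers.IO allows up to 64 threads (or the number of cores, if that is larger), so ten blocking requests fit comfortably; a much larger batch would start queuing.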





Java ParallelStream

Then, on Stack Overflow, I came across parallelStream, so I tried it out in IDEA.





val javaParallelTime = measureTimeMillis {
    val res = (1..10).toList()
        .parallelStream()
        .map { getDataAsync("$URL/$it") } // lazy: nothing runs until forEach below
    res.forEach { println(it) }
}
println("Java parallelStream: $javaParallelTime ms")
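A stream only does its work when a terminal operation is called, and forEach on a parallel stream does not guarantee the order of the output. A minimal sketch, assuming an ordered list of results is wanted, collects the stream instead; slowLookup is a hypothetical stand-in for the blocking request.

```kotlin
import java.util.stream.Collectors
import kotlin.system.measureTimeMillis

// Hypothetical stand-in for a blocking HTTP call.
fun slowLookup(id: Int): String {
    Thread.sleep(100)
    return "item $id"
}

fun main() {
    val elapsed = measureTimeMillis {
        val items: List<String> = (1..8).toList()
            .parallelStream()
            .map { slowLookup(it) }       // lazy: nothing runs yet
            .collect(Collectors.toList()) // terminal operation starts the work
        println(items) // collect keeps the order of the source list
    }
    println("parallelStream: $elapsed ms")
}
```

Parallel streams run on the common ForkJoinPool by default, whose size follows the number of CPU cores, so the speed-up for blocking calls depends on the machine.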
      
      



The stream version finished in 578 ms.






The results are in the table below. For myself, I decided to stick with async/await, mainly because error handling is simpler, and there is no need to go beyond coroutines here.





Method                              Time (ms)
Asynchronous method                 1487
pmap implementation from the web    1523
My version (parMap)                 610
Java parallelStream                 578
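On the error-handling point, here is one possible sketch of how a coroutine-based map could report failures per element instead of cancelling the whole batch. parMapCatching is a hypothetical variant of parMap, not something from the original script, and the failing element is simulated.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Hypothetical variant of parMap: wraps each element's outcome in a Result.
suspend fun <T, V> Iterable<T>.parMapCatching(func: suspend (T) -> V): List<Result<V>> =
    coroutineScope {
        map { element ->
            async(Dispatchers.IO) { runCatching { func(element) } }
        }.awaitAll()
    }

fun main() = runBlocking {
    val results = (1..4).parMapCatching { id ->
        if (id == 3) error("request $id failed") // simulated failure
        "body $id"
    }
    results.forEachIndexed { index, result ->
        result
            .onSuccess { println("${index + 1}: $it") }
            .onFailure { println("${index + 1}: ${it.message}") }
    }
}
```

Note that runCatching also catches CancellationException, so a production version would probably rethrow it; the sketch leaves that out for brevity.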





In the future I am thinking of turning this into a small library for personal use and, of course, rewriting the messy code into something readable, as far as my skills allow. And then deploying it all to a VDS.





I hope my experience is useful to someone. I would be glad to receive constructive criticism and advice! Thanks, everyone!







