Hi. I read a file line by line and do some processing for each line. If I were to use Async to read line by line, instead of reading the file ordinarily via `In_channel.input_line`, would there be any difference in performance? Since there is a scheduler running, I wonder if it anticipates that there will be another file read and lets the system wait for a line read while doing some processing in parallel? That is, the application reads the file and processes lines in parallel. I think that since the file reading is buffered, this effect probably isn't large, since the system reads ahead several lines of the file into memory anyway.
It depends on whether reading the file is the only thing your application is trying to do at that time.
If reading the file is all you do at that point, then in principle running an additional scheduler (e.g. Async, Lwt, Eio, Miou) will cause a slowdown, but since IO is very slow compared to CPU, the slowdown is negligible. There is no benefit in using Async or anything alike in this case, but no tangible disadvantage either.
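For reference, the plain blocking version looks roughly like this; a minimal sketch where `process` stands in for whatever per-line work your application does:

```ocaml
(* Plain blocking line-by-line reading with the standard library.
   [process] is a stand-in for the per-line work; the channel's own
   buffering already reads ahead from the file in larger blocks. *)
let process_lines path process =
  In_channel.with_open_text path (fun ic ->
    let rec loop n =
      match In_channel.input_line ic with
      | None -> n (* EOF: return the number of lines handled *)
      | Some line -> process line; loop (n + 1)
    in
    loop 0)
```

While `process` runs here, no reading happens, and while `In_channel.input_line` blocks, no processing happens; that sequential hand-off is what the scheduler question is really about.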
If reading the file is not all you do at that point, and the way you wrote your application allows other parts of the code to progress while the read operation is blocking on IO, then it's likely you will get something done in "parallel", timeline-wise.
Whether everything combined improves the overall performance or not depends on the actual workload pattern of your pipeline, i.e. you have `Reading -> Processing`:

- If the `Processing` stage processes at least as fast as `Reading` can read, then the pipeline is working as fast as it can, i.e. the pipeline is IO bound
  - You generally cannot mitigate this without changing the underlying storage layer or minimising the IO required (e.g. swapping to a more compact format, or loading compressed data into memory and then decompressing it)
- If the `Processing` stage processes slower than the `Reading` stage can read, then the pipeline is not optimal, i.e. the pipeline is CPU bound
  - To mitigate this you can scale your code vertically (make it more performant directly), scale it horizontally (multithreading), or both
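To make the horizontal-scaling option concrete, here is a minimal sketch using OCaml 5 domains; `parallel_map` and the chunking strategy are illustrative, not a standard API, and it assumes the per-line function is pure:

```ocaml
(* Hypothetical sketch of scaling a CPU-bound Processing stage
   horizontally: split the items into contiguous chunks and let each
   domain fill its own disjoint slice of the result array. *)
let parallel_map ~domains f xs =
  let arr = Array.of_list xs in
  let n = Array.length arr in
  let out = Array.make n None in
  let chunk = if domains = 0 then n else (n + domains - 1) / domains in
  let worker d =
    (* Each domain writes only its own index range, so there is no
       data race on individual array cells. *)
    let lo = d * chunk and hi = min n ((d + 1) * chunk) in
    for i = lo to hi - 1 do out.(i) <- Some (f arr.(i)) done
  in
  let ds = List.init domains (fun d -> Domain.spawn (fun () -> worker d)) in
  List.iter Domain.join ds;
  Array.to_list (Array.map Option.get out)
```

This only pays off when `f` is genuinely CPU heavy; for trivial per-line work the domain spawn and join overhead dominates.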
If your application is consistently IO bound or CPU bound, then in principle buffer size doesn't impact your performance consideration that much. But if the workload pattern is not constant, i.e. some lines might be more CPU heavy than other lines, stalling the `Processing` stage, then a buffer can help avoid stalling the entire pipeline.
- If the `Processing` stage cannot catch up with `Reading` sporadically, then the buffer can allow `Reading` to continue if it has enough free space
- If the `Reading` stage cannot catch up with `Processing` sporadically, then the buffer can allow `Processing` to continue if it has enough backlog built up
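Sketching that buffered hand-off explicitly (assuming OCaml 5; `bounded_queue`, `push`, `pop`, and `close` are illustrative names, not a library API): the `Reading` stage pushes into a fixed-capacity queue and blocks only when it is full, while the `Processing` stage pops and blocks only when it is empty.

```ocaml
(* A bounded buffer between the Reading and Processing stages.
   Capacity bounds memory use; the two condition variables let each
   side block only when the buffer is full (producer) or empty
   (consumer). *)
type 'a bounded_queue = {
  q : 'a Queue.t;
  mutable closed : bool;
  capacity : int;
  mutex : Mutex.t;
  not_empty : Condition.t;
  not_full : Condition.t;
}

let make capacity = {
  q = Queue.create (); closed = false; capacity;
  mutex = Mutex.create ();
  not_empty = Condition.create ();
  not_full = Condition.create ();
}

let push b x =
  Mutex.lock b.mutex;
  while Queue.length b.q >= b.capacity do
    Condition.wait b.not_full b.mutex
  done;
  Queue.push x b.q;
  Condition.signal b.not_empty;
  Mutex.unlock b.mutex

let close b =
  Mutex.lock b.mutex;
  b.closed <- true;
  Condition.broadcast b.not_empty;
  Mutex.unlock b.mutex

let pop b =
  Mutex.lock b.mutex;
  let rec wait () =
    if not (Queue.is_empty b.q) then Some (Queue.pop b.q)
    else if b.closed then None
    else (Condition.wait b.not_empty b.mutex; wait ())
  in
  let r = wait () in
  Condition.signal b.not_full;
  Mutex.unlock b.mutex;
  r

(* Reading runs in its own domain; Processing drains the buffer here. *)
let run_pipeline lines process =
  let buf = make 8 in
  let reader = Domain.spawn (fun () ->
    List.iter (fun l -> push buf l) lines;
    close buf)
  in
  let rec consume acc =
    match pop buf with
    | None -> List.rev acc
    | Some l -> consume (process l :: acc)
  in
  let out = consume [] in
  Domain.join reader;
  out
```

The capacity (8 here, arbitrary) is exactly the "free space" and "backlog" from the two bullets above: a sporadically slow `Processing` stage only stalls `Reading` once the queue fills, and a sporadically slow `Reading` stage only stalls `Processing` once it drains.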
> That is, the application reads the file and processes lines in parallel. I think that since the file reading is buffered, this effect probably isn't large, since the system reads ahead several lines of the file into memory anyway.
I'm not quite sure what you mean by "effect" here.