PixInsight Performance and Swap Files · Pleiades Astrophoto PixInsight · Jim Raskett

Juno16
Hi PI users,

I know that many AP'ers here use PixInsight (PI) for image processing. I just want to pass on something I learned recently about PI performance that you may or may not know.

This might be old news to most of you, but it was a very cool find for me. 

PI users know that PI consumes quite a bit of system resources. High-end CPUs, RAM, big video cards, and fast storage drives all contribute to PI's performance.

The performance of PI is so important that many folks spend many thousands of dollars on a stacked high-performance PC (laptop or desktop) to run it.
Understandably so: some PI processes and scripts take many minutes to hours to run.

My son and I built my desktop about 2½ years ago on a budget.
I kept the budget at $700, but I really just needed the "box", as I already had a monitor, keyboard, and a 500 GB SATA SSD.
I have added a few things over the past few years: a ton of storage (10.5 TB total), including a nice 2 TB NVMe M.2 (Gen3) boot drive. I'm still using the same 4½-year-old CPU.

Recently (Christmas) I received a really nice gift (that I asked for): 64 GB of DDR4-3200 RAM. It was an upgrade from the 32 GB of DDR4-2667 I had before. Faster and double the capacity.

Many of you PI users are familiar with the benchmark script in PI for checking performance. It is called PixInsight Benchmark and is nested under Script > Benchmarks.

I don't have a benchmark from before the RAM upgrade; I believe the gain was modest (+6-8%). But I have learned a little bit about PI swap files. It seems that even though PI thrives on huge, fast RAM, it also depends heavily on swap files, which seems odd when large amounts of RAM are free.
Swap files are temporary files/folders that PI uses (separate from RAM) to process and temporarily store data. The default PI swap file is created on the local C: drive.

I read several posts on CN about increasing the number of swap files. It seems strange, but if you do an internet search, you will find many references to using multiple swap files to increase PI performance.
It helps to have a fast storage drive (NVMe or SSD) to create the swap files on, but it also seems that PI performance can be enhanced by creating multiple swap files even on conventional spinning drives.

Below are some tests that I did recently, and the results did surprise me.
Yes, spending lots of money on better, faster hardware can surely increase PI's performance, but for free, you can create extra swap files that can increase performance too. Sometimes significantly.

Under the EDIT drop-down menu, you will find GLOBAL PREFERENCES. In PREFERENCES, select Directories and Network. From there, you can see the default swap storage directories.
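For anyone who wants to script the folder creation, here is a minimal sketch in Python. The C:\PI_Swap location and folder names are just examples, not anything PI requires:

```python
import os

# Hypothetical location and names; any folder on a fast drive works.
swap_dirs = [rf"C:\PI_Swap\swap{i}" for i in range(1, 7)]

for d in swap_dirs:
    os.makedirs(d, exist_ok=True)  # create the folder if it doesn't already exist
    print(d)

# Then list each path under Edit > Global Preferences >
# Directories and Network > Swap storage directories.
```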


This was my PI benchmark score using the single default swap file directory: 12456.
I created six folders on my C: drive (NVMe) to use as temporary swap file directories (I really don't think it matters where you create the folders).
Here is the benchmark with no change except using six swap storage directories: 15635. A 26% increase in the PI benchmark over the single default swap file directory.
I then created a 16 GB ramdrive (out of my 64 GB of total RAM) using free software and dedicated it to swap file directories.
The benchmark went up to 16236, a 30% increase over the single default swap file directory. A very significant increase for free!
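As a quick check of the arithmetic (scores taken from the runs above; rounding to the nearest percent gives the 26% and 30% figures):

```python
# Percent gain of each configuration over the single-directory baseline.
baseline = 12456

for label, score in [("six swap dirs on NVMe", 15635),
                     ("ramdrive swap dirs", 16236)]:
    gain = (score - baseline) / baseline * 100
    print(f"{label}: {score} (+{gain:.1f}%)")

# six swap dirs on NVMe: 15635 (+25.5%)
# ramdrive swap dirs: 16236 (+30.3%)
```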

I have since tested both the NVMe drive and the ramdrive with four swap file directories, and the performance was basically the same as with six. So, some fine-tuning can help.

Your mileage may vary depending on your hardware.

Hope that this is helpful to someone!

Jim
DalePenkala
I currently use 8 swap files myself. I had read in the past that 6 was a good happy medium, but you could play around with that number. I did some tests myself and ended up with 8, and now my benchmark is consistently in the 25k-26k range. For me this is fast, but I know many here are in the 30k range.
I'm sure there are other tweaks one could do to squeak out a bit more, but I was happy with my performance boost and just left it.

Dale
Juno16 (Topic starter)
Dale Penkala:
I currently use 8 swap files myself. I had read in the past that 6 was a good happy medium, but you could play around with that number. I did some tests myself and ended up with 8, and now my benchmark is consistently in the 25k-26k range. For me this is fast, but I know many here are in the 30k range.
I'm sure there are other tweaks one could do to squeak out a bit more, but I was happy with my performance boost and just left it.

Dale

Thanks Dale. I really didn't have any idea about increasing the number of swap files until I upgraded my RAM recently.
I did some internet searches and read about performance increases from increasing the number of swap files and creating a ram drive.
I just wanted to let anyone who was unaware (like myself) know about the benefits of increasing the number of swap files from the default PI setting of one. Heck, can't beat the price!
Thanks for sharing your experience. I will try 8 just to see.

Jim
DalePenkala
Jim Raskett:
Thanks for sharing your experience. I will try 8 just to see.

Just thought I'd add to your post, Jim. Let us know what your change is after adding the extra 2 folders. It will be interesting to see what they do for you as well.

Agreed! Can't beat the price!

Dale
jiberjaber
I've been playing around with a system I have just built. My conclusion is that a combination of changing the threads for read & write and some RAM disks for swap seems to be the best I can get at the moment. The new system is 11x faster in the benchmark, going from about 3,000 to about 35,000 total, so I am more than happy, but of course there's always that little bit of interest in what's the max I can get it to!

So, from my experiments with combinations of swap directory counts and locations: my optimum is 4 swap directories with 25 read and 25 write max threads. This gives me an average total of 36,340 over 3 benchmark runs.


The problem is that if I repeat this later on, it might give a lower 32,000 for some reason.
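Since single runs can swing by a few thousand points like that, averaging several runs and looking at the spread tells you more than any one total. A minimal sketch (the run scores here are made up for illustration):

```python
from statistics import mean, stdev

# Hypothetical totals from repeated runs of the same configuration.
runs = [36340, 35410, 32000]

print(f"mean = {mean(runs):.0f}, stdev = {stdev(runs):.0f}")
print(f"spread = {(max(runs) - min(runs)) / mean(runs) * 100:.1f}% of the mean")
```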

4 separate directories on an M.2 drive is close but still gives less performance; spreading them over multiple M.2 drives seems to be the killer solution, though (not my system here): https://pixinsight.com/benchmark/benchmark-report.php?sn=H21148386W4QCLU7KCNI4D1IF78TZ09V

Adding or subtracting swap directories, or increasing the disk size, has a small to medium detrimental impact on the total score for me.

Here's one of my benchmarks with 8 GB swap drives (average score over 3 tests was 35,410); I can't find the 4 GB one I ran at the moment.
https://pixinsight.com/benchmark/benchmark-report.php?sn=B7JW915K202BIG8Q41YH0OTJ6CHV0S07

I am tempted to set up a dual-boot Linux and see what that brings; I understand it is quite a performance improvement.

Jason
WhooptieDo
Jason:
I've been playing around with a system I have just built. My conclusion is that a combination of changing the threads for read & write and some RAM disks for swap seems to be the best I can get at the moment. The new system is 11x faster in the benchmark, going from about 3,000 to about 35,000 total. [...]



That's a way better benchmark than I've ever seen. I played around a lot with it when I learned about this, and found that some hard drives slow down the overall performance. I have all SSDs in my system, a lot of them in fact, but some are SATA. Adding swap directories on those killed the performance. I kept all the swap files on the NVMe drives and we're golden. My current system's 12900K is about a generation behind now, but the scores are still very reasonable. 32 GB RAM and an RTX 3090 for the AI stuff. Gotta be honest though, I barely felt a difference in performance after optimizing the swap files.


Ryzen definitely works a lot better for this setup, for obvious reasons. And it seems the best-benched setups are on Linux.
Warhen
Swap files primarily help with maintaining the histories of open views or projects, using the swap directory rather than RAM. Once PI is closed, swap files may be deleted; unfortunately, PI doesn't do this automatically. Periodically check your swap folder; it may be huge! An NVMe or SSD should be used. It's recommended to use a single directory/location, entered multiple times, tuning the count against the Benchmark script. Having worked on a few people's systems including my own, this seems to be a great place to start: double the physical cores and add another dozen (IOW, my i9 has 8 physical cores, so 28 total). YMMV; give it a try and good luck!
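As a worked example of that starting-point rule of thumb (the function name here is just for illustration):

```python
def suggested_thread_count(physical_cores: int) -> int:
    # Warren's starting point: double the physical cores, then add a dozen.
    return 2 * physical_cores + 12

print(suggested_thread_count(8))   # 28, the i9 example above
print(suggested_thread_count(16))  # 44 for a 16-core CPU
```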
Jeff_Reitzel
Definitely use the benchmark to see how many swap files you need. Performance will start to drop after a certain number unique to your system. My old Ryzen 7 system with a pair of 1 TB Samsung 980 SSDs peaked with 8 swap files. The new system, using a 13th-gen Core i9 and a pair of 4 TB Samsung 990 SSDs, peaks with 5 swap files. Lots of cores and fast SSDs are really the key components to getting the best PI performance you can. The Nvidia 4090 GPU sure helps on processes CUDA acceleration affects, but that is only a few add-on processes. My current directory setup looks like this:
CS,
Jeff
[Attached image: Swap Setup.jpg]
pfile
Warren A. Keller:
Swap files primarily help with maintaining the histories of open views or projects, using the swap directory rather than RAM. Once PI is closed, swap files may be deleted; unfortunately, PI doesn't do this automatically. Periodically check your swap folder; it may be huge! An NVMe or SSD should be used. It's recommended to use a single directory/location, entered multiple times, tuning the count against the Benchmark script. Having worked on a few people's systems including my own, this seems to be a great place to start: double the physical cores and add another dozen (IOW, my i9 has 8 physical cores, so 28 total). YMMV; give it a try and good luck!

well, more precisely, PI is **supposed** to delete the swap files when exiting, but sometimes it doesn't, particularly of course when PI crashes. it is good advice to check the swap director(ies) from time to time (while PI is not running!!) and delete any orphaned swap files.
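a minimal sketch of such a check in Python; the swap paths are hypothetical placeholders for whatever directories you configured in PI, and it should only be run while PI is closed:

```python
import time
from pathlib import Path

# Hypothetical paths; substitute the swap directories configured in PI.
swap_dirs = [Path(rf"C:\PI_Swap\swap{i}") for i in range(1, 7)]

# Run only while PixInsight is NOT running, so every file found is orphaned.
for d in swap_dirs:
    for f in sorted(d.glob("*")):
        if f.is_file():
            size_mib = f.stat().st_size / 2**20
            age_h = (time.time() - f.stat().st_mtime) / 3600
            print(f"{f}  {size_mib:.1f} MiB  {age_h:.1f} h old")
```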
Alan_Brunelle
Warren A. Keller:
Swap files primarily help with maintaining the histories of open views or projects, using the swap directory rather than RAM. [...]

In my experience, I second Warren's comments. I tried the RAM drives to see what I could achieve. Same result. I don't consider a 20-30% increase in score that meaningful. Computer nerds and gamers seem willing to pay big extra $$$ for that kind of increase, but for processing images, count me out. The scores are great when we tune our machines, as you are finding out. I actually find the times to completion for CPU and swap to be a more intuitive read on performance. And when you see the difference of 18 sec vs. 22 sec, you can judge just how important that difference is. Otherwise it's kind of like gear heads standing around bragging about how their upgraded valves have improved their top speed when they don't live anywhere near a road where they can step on the accelerator!

I returned to using only my NVMe drives for PI. The reason is that after a while, I noticed a bit more instability in my system when running PI with the RAM drives. I also use 64 GB of RAM, and the issue may be that cutting into the 64 doesn't leave enough for PI. In any case, the gain in acceleration was almost unnoticeable and not worth the pain of having to repeat even one failed integration. The comments on the PI page suggest that 32 GB is basically a minimum, and I was not aware of that at the time.
Edited ...
Alan_Brunelle
Jason:
I've been playing around with a system I have just built. My conclusion is that a combination of changing the threads for read & write and some RAM disks for swap seems to be the best I can get at the moment. The new system is 11x faster in the benchmark, going from about 3,000 to about 35,000 total, so I am more than happy, but of course there's always that little bit of interest in what's the max I can get it to!


Jason, yes, there are a lot of PI users who are not aware of the need to do this, if they care. You have a nice machine; it is good that you discovered how to get the best out of it. If you review the many benchmarks on the PI benchmark page, you can see a good number of high-end machines getting pretty poor scores, and most of it is down to the fact that many users are not aware of the benefit of optimizing swap files. My machine is 5 years old, a 9th-generation i9 with only 16 logical processors and only DDR4 running at 3200, and it still beats a good number of late-generation machines listed. If you look at the numbers there, you will see that it is almost all due to the swap performance.
pfile
Alan Brunelle:
In my experience, I second Warren's comments. I tried the RAM drives to see what I could achieve. Same result. [...] I returned to using only my NVMe drives for PI. [...] The comments on the PI page suggest that 32 GB is basically a minimum, and I was not aware of that at the time.

the benchmark isn't super accurate in the first place. if you read the code you'll see that each process that runs reports its wall clock time. the script then computes the wall clock time from the start of the script to the end of the script, adds up all the process wall times, and subtracts that sum from the script runtime. it then attributes that number to disk. but in reality that number contains OS overheads and other delays that have nothing to do with the disk.

the big drawback of using a RAMdisk for swap space is that usually you can't allocate more than some double-digit number of GB. since PI stores the processing history of all open views in swap, you can pretty easily fill up the ramdisk, especially if you are using a big CMOS sensor. PI usually doesn't recover very gracefully from a full swap folder. and i agree, the performance delta between that and a modern, fast SSD is not really enough to justify the risk and small swap space a RAMdisk gives you.

in the end what matters is real performance. tuning your system to the nth degree based on the benchmark is not really going to yield real-world results. i think it's useful for figuring out how many cpu/io threads to use and how many swap directories to use, but stressing out about tiny improvements is not necessary.
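a toy illustration of that attribution, with made-up numbers:

```python
# Everything the script can't account for as process time gets
# booked as "swap/disk" time, including OS overhead.
script_wall_time = 60.0                      # whole-benchmark wall time, seconds
process_wall_times = [18.0, 12.0, 9.0, 6.0]  # per-process self-reported times

attributed_to_disk = script_wall_time - sum(process_wall_times)
print(f"reported as disk/swap: {attributed_to_disk:.1f} s")
# The 15 s here also contains scheduling delays, cache effects, and anything
# else that happened between processes, not just disk I/O.
```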
Alan_Brunelle
Jim Raskett:
PI users know that PI consumes quite a bit of system resources. High-end CPUs, RAM, big video cards, and fast storage drives all contribute to PI's performance.

The performance of PI is so important that many folks spend many thousands of dollars on a stacked high-performance PC (laptop or desktop) to run it.


As with most computers, processing efficiency between old CPUs and the latest, much more expensive CPUs does not seem to scale better than linearly. In fact, compared to the current prices of the CPUs, the performance per dollar spent is actually quite poor. Notice that each release of a new iX processor generation is heralded with announcements of big increases in allowed clock speed, etc. Yet benchmark comparisons in real use, such as can be found online, show increases in actual performance of a few tens of percent at best. Guaranteed, you will be paying far more than a few tens of percent extra for a new-generation processor vs. the previous one! That is why computer-building articles advise never to buy the latest-release generation of CPU, and the same likely applies to other hardware, such as GPUs (more on that later). Good advice. For PI users (who do not game), the differences in CPU process times on the PI benchmark page pretty much confirm this. The biggest gains by far come from ensuring that the swap folders and drives are optimized, as you find here.

Regarding "big video cards", I have serious doubts. My old computer runs an RTX 2070, not even close to the best at the time I built it. Only a few modules/scripts in PI can use GPU acceleration at this time (nothing else in PI benefits from the GPU, understand that), and I am not seeing any significant benefit from those "big video cards". I would like to see a comparison of job completion speeds between my lowly 2070 and newer, faster GPUs. But every time I see someone using those newer cards, they all say that it takes StarXTerminator about 3 minutes to complete a job. Well, mine does that on a 12,000 x 8,000 pixel image in under 3 minutes. So I believe that the makers of these modules have not designed their software to take full advantage of the newer cards, and therefore there is no significant advantage in buying expensive hardware in this regard. It may be that the makers want to ensure backward compatibility of the modules with older machines and do not want to redesign them around newer functions while maintaining that compatibility. What *is* certain is that for these modules in PI, having any compatible (Nvidia) GPU typically means a 9x decrease in job completion time vs. not having that acceleration.

I laud you for bringing these issues up here, since this has been discussed in another current forum topic about computers, where I brought up the fact that so many PI users are not aware of the swap file feature. Your experience should be repeated by everyone.

You state that this has been discussed in forums on multiple astrophotography sites. This is true, but be aware that PI also has information about the use of swap files in the application and in its web site archives. I would go to those primary sources first before going anywhere else; there are important tidbits. In fact, I believe (perhaps incorrectly, because of faulty memory) that the PI information includes advice not to set up swap files on classical rotating disks, or at least not multiple folders on one spinning disk, the reason being that PI activity may harm the disk. Don't anyone quote me on that! Confirm it yourselves. Also, they discuss the optimum number of folders, drives, etc. It is not determinative information, but it gives ideas of where to start and how to use your resources to arrive at your best configuration.

Best and CS!
Alan_Brunelle
pfile:
the benchmark isn't super accurate in the first place. if you read the code you'll see that each process that runs reports its wall clock time. [...] in the end what matters is real performance. tuning your system to the nth degree based on the benchmark is not really going to yield real-world results. i think it's useful for figuring out how many cpu/io threads to use and how many swap directories to use, but stressing out about tiny improvements is not necessary.

I completely agree that tuning to the nth degree is a waste of time. The nice thing is that you can compare your results to other similar machines and get an idea pretty easily of whether you are working close to the best of those. If so, be done with it and go process. One thing I have noticed is that there seems to be a rough correlation between CPU times and swap performance times, and the ratio of those numbers seems to be similar for the listed machines that are working at their best. It seems to roughly hold regardless of the raw CPU number. That may be one way to tell if you are there, without having to compare your results with the rest of the world.
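As a sketch of that idea (the times here are invented, not taken from real reports):

```python
# CPU-to-swap time ratio from two benchmark reports; hypothetical numbers.
machines = {"machine A": (18.0, 9.5), "machine B": (22.0, 11.5)}

for name, (cpu_s, swap_s) in machines.items():
    print(f"{name}: CPU/swap ratio = {cpu_s / swap_s:.2f}")

# If well-tuned systems cluster around a similar ratio, a large deviation
# on your machine hints at which side (swap or CPU) still needs tuning.
```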
pfile
Alan Brunelle:
I completely agree that tuning to the nth degree is a waste of time. [...] One thing I have noticed is that there seems to be a rough correlation between CPU times and swap performance times, and the ratio of those numbers seems to be similar for the listed machines that are working at their best. [...]

my guess is that SSDs are so fast that the amount of CPU time spent on the I/O transaction starts to become relevant, vs. rotational disks, which take forever and a day to complete compared to the CPU time spent.

on the other hand, almost all such IO is done using DMA nowadays, but there might still sometimes be a data copy from kernel memory space to user memory space (though zero-copy DMA is also a thing). hard to know for sure without instrumenting everything and doing a real performance analysis. people report much higher IO speeds on linux (vs. windows), even when running PI under WSL. maybe this putative CPU bottleneck can be explained by that, if somehow linux spends much less CPU time on IO operations (and the benchmarks you have noted are on windows). all just a guess, though.
Alan_Brunelle
pfile:
my guess is that SSDs are so fast that the amount of CPU time spent on the I/O transaction starts to become relevant, vs. rotational disks, which take forever and a day to complete compared to the CPU time spent. [...] all just a guess, though.

I am no expert on these matters beyond my experience, and here I am stretching it! I have noticed on the PI benchmark page that some of the better numbers reported (some from cloud-computing machines) include data from the same machine under both Linux and Windows. Linux seems to win most of the time, but not always by a huge difference. BTW, the best numbers seem to be reported from Oracle cloud-computing machines and the server that PixInsight uses in support of their MARS program. Clearly, the folks at PI know how to configure their machines for their software!