Optimal data flow from remote observatory? · Generic equipment discussions · Willem Jan Drijfhout

Wjdrijfhout
In a note attached to one of his latest great images, @John Hayes described a recent improvement in his data flow. After sharing some thoughts back and forth, we thought this might be a good topic to bring to the forum for a wider discussion and sharing of experience.

There are many different approaches. Some do preprocessing remotely, some keep all their data, others keep only the final products; data is stored on NAS-based solutions, SSDs, the cloud, etc. Looking forward to reading about other approaches.


Let me kick off the discussion by describing my own workflow. 

When I started imaging remotely, there were two main new challenges when it comes to data management. The first is the volume of data. In my situation, I collect roughly 5-10 GB of data per night of imaging for roughly 200-225 nights per year, so approximately 1.5-2 TB. The second is that I can be away from home for extended periods, and since imaging continues, I'd like to be able to work on the data remotely just as well as I can at home. I've gone through a few iterations, but have currently settled on the following data flow.
(To keep things simple, <> = syncing, > = moving).
  1. Remote PC  <> Cloud storage <> Home PC
  2. Home PC > SSD
  3. Editing on the SSD
  4. When working remotely, the Home PC is replaced by a Laptop, and I bring the SSD along. Same workflow otherwise.
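For illustration, step 2 can be sketched in a few lines of Python. The folder layout, file extensions and function name here are hypothetical, not my actual tooling; the point is that moving (rather than copying) is what keeps the cloud folder acting as a small transfer buffer:

```python
import shutil
from pathlib import Path

def move_reviewed(cloud_dir: Path, ssd_dir: Path,
                  suffixes=(".fits", ".xisf")) -> list[Path]:
    """Move reviewed image files from the synced cloud folder to the SSD.

    Because the cloud folder is synced, moving a file out of it also
    clears it from the Remote PC and the cloud after the next sync.
    """
    ssd_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(cloud_dir.rglob("*")):
        if f.is_file() and f.suffix.lower() in suffixes:
            target = ssd_dir / f.relative_to(cloud_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target))
            moved.append(target)
    return moved
```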


Ad 2. This happens during the initial data review process. I just like to see each image before I put it in the 'to be processed' pile.

A couple of points of attention:
- Speed. SSDs these days are very fast. I use the SanDisk Pro-Blade system: small sticks that hold up to 4 TB each. You can put four into a dock simultaneously, at speeds of ~2,000 MB/s. Connect an individual stick to the PC and the speed is still ~1,000 MB/s, very workable.
- Backup. None of the steps above are backups; this is only a workflow. To safeguard data it is important to have a good backup strategy. In my case, my home PC and any SSD connected to it are automatically backed up both to a local copy (spinning drive) and to the cloud (a dedicated cloud backup service, very different from the cloud storage above).
- Cost. Cloud storage can get quite expensive. In this method the cloud is just a transfer mechanism: data stays there only until I have had time to review it and move it to an SSD. If I don't review data for two weeks, it contains two weeks of data; if I review every day, it never contains more than one night of imaging. And since all of it is synced, the same is true for my Remote PC and Home PC. SSDs have come down in price a lot in recent years and are very reasonably priced nowadays. So far I have everything on SSDs, but of course you can always opt to move older data from the SSD to spinning drives, which are much cheaper and have much higher capacity.
- The weak point in this system is still the backup when working remotely. One solution is to copy to the SSD instead of moving to it when working remotely; in that case the data will remain on the Home PC and its backups. But it means that the Remote PC, Cloud and Home PC slowly build up more data. For just a few weeks that is no problem, but when coming back home this data needs to be manually deleted again.
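To make the cost point concrete, the volumes are easy to sanity-check. The GB figures below are the ones from this post; the helper functions are just for illustration:

```python
def yearly_volume_tb(gb_per_night: float, nights_per_year: int) -> float:
    """Approximate yearly data volume in TB (taking 1 TB = 1000 GB)."""
    return gb_per_night * nights_per_year / 1000

def cloud_buffer_gb(gb_per_night: float, days_until_review: int) -> float:
    """Worst-case cloud footprint: the cloud only holds data that has not
    yet been reviewed and moved to the SSD."""
    return gb_per_night * days_until_review

# 5-10 GB/night over 200-225 nights/year gives roughly 1-2.25 TB per year,
# while a two-week review backlog keeps at most ~70-140 GB in the cloud.
```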

Looking forward to hearing about other setups, and insights into the pros and cons of different solutions.
hbastro
I drive 3 hours up, plug the SSD into each of three computers, download the data, and drive 3 hours back, once or twice a month. Most of the time I pull maintenance and stay overnight. If guests come along with their rigs we'll stay a few days. It is very pleasant driving there and being up there… DSL is just too slow to download; it takes longer to download the data than to integrate it. Now that I have added more solar panels for power, I plan on bringing Starlink online, and that will be a game changer…
carted2
I have a NAS and a Mac processing computer on site at my remote installation. I save everything to the NAS, and once I have gathered all my data I process it on the on-site Mac. I then upload my master files to Dropbox and do all my final processing at home. When I visit the remote location I back up the data from the NAS, bring all my sub-frames home, and save them on my home NAS. The internet connection at my remote installation is Starlink and it isn't the fastest, so I don't upload all my sub-frames. I have two scopes that both back up to the same NAS, so uploading all of my data nightly isn't feasible for two full-frame cameras.
danieldh206
The remote site has Starlink, so I upload each image to the cloud as it is taken. When I use the QHY268 with the IMX571, I configure NINA to save the files in XISF and use Zstandard compression to make the files smaller. For the Player One Ares with the IMX533, I simply save in FITS, as the file size is small. As long as my exposure time is 180 seconds or longer, the file usually finishes uploading before the exposure completes. I also configure the cloud storage software to upload at a maximum of 1 Mb. This way I don't DDoS my own remote access or other observatory users' access. If I ever move to an IMX455 or IMX461, I will use XISF with Zstandard compression, just as I do with my QHY268. I don't do any scientific work, so I don't need FITS.
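Whether a frame finishes uploading within the exposure is simple arithmetic. The compressed file size and upload rate below are illustrative assumptions, not measurements from my setup:

```python
def upload_seconds(file_megabytes: float, rate_megabytes_per_s: float) -> float:
    """Time in seconds to upload a file at a capped transfer rate."""
    return file_megabytes / rate_megabytes_per_s

# Assume (hypothetically) that a ~52 MB 16-bit IMX571 frame compresses to
# ~30 MB as Zstandard XISF, and a 1 MB/s upload cap: ~30 s per frame,
# comfortably inside a 180 s exposure.
fits_in_exposure = upload_seconds(30, 1.0) <= 180
```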

For storage, I keep the original, uncalibrated light FITS or XISF images, along with the matching calibration masters, for approximately 18 months on my NAS. I used to keep the calibrated lights as well, but 32-bit files use up too much storage space. I also blink the images and delete the bad ones.
jhayes_tucson
Willem,
Thanks for opening up the subject for a more general discussion.  Again, here's what I do and why.  First of all, I have a substantial investment in imaging equipment located in Chile, where it cost a lot to get it down there and where there are ongoing leasing and maintenance costs.  I don't even want to think about how much each of my images actually costs, but the point is that what I'm ultimately paying for is the data itself.  I personally love owning and fiddling with my own gear, but all of that is just a means to the ultimate end: image data.  Therefore, I want a very reliable and easy-to-use means to download the data, along with a very safe way to store it.

Since I first started remote imaging around 8 years ago, I have used a number of different data management schemes.  My original method was simply to automatically back up my data to Google Drive.  The files from my ML16803 camera were relatively small and Google Drive wasn't all that expensive.  Incoming data was uploaded to Google Drive, from where I could download it to a local USB drive for processing as soon as I was ready.  The download ritual was normally done manually every morning.  That scheme was put to bed when Google Drive eliminated their backup service and replaced it with file syncing.  I didn't have a huge drive on my observatory PC, so I had to be very careful about what I downloaded and where it went, to avoid erasing any raw data.  After losing some stuff due to circumstances I won't go into, I knew I needed a better method.

When I installed my CDK20 system in Chile, I set up a Synology NAS system in my house in Oregon, along with a backup NAS in an airplane hangar across town.  Synology has a really nice backup agent that basically backs up any new file that appears in the data buffer folder on my observatory PC directly to my NAS system 7,000 miles away.  The bandwidth at the observatory is so fast that there's very little latency to this process: new images generally arrive only minutes after they are captured.  At that point, I simply walk downstairs to my NAS system, plug in a USB drive, and download whatever data I feel like processing…and that's worked well for a couple of years.  Still, it's not perfect, and I'm filling gobs of USB drives by keeping everything in sight on them.  The other problem is that I'm spending more of my winters full time in Tucson, where I have to rely on internet downloads from the NAS to get at my data, and that is a huge bottleneck.  One way to solve that problem might be to move my backup NAS from Bend to Tucson, but then it's not accessible if my main system were to get zapped.  I could put another backup NAS in Tucson, but that's more expensive and complicated than necessary.  I could also just bring the NAS system with me, but I really don't want to be carrying it around.

Instead, I've purchased a MinisForum UM890 Pro mini-PC (basically a NUC) with a Ryzen 9 processor, 64 GB of RAM and a 1 TB SSD.  This thing was recently on sale for right around $700, so it's not super pricey.  I've loaded it up with a super-fast 4 TB SSD to use as a data buffer and connected it through a network cable to the same router feeding my NAS system, so downloads are reasonably fast.  It's not as fast as a direct download to a USB drive, but it's WAY faster than downloading over the web.  I've installed PI on this machine and I run the whole thing headless using Chrome Remote Desktop.  With this setup, I can download all the image data for a single object, run it through WBPP, and create master RGB and master Lum files that can be very quickly downloaded over the web.  Once I'm done processing, I can erase all of the calibrated files left over from WBPP to free up drive space.  An added benefit is that I can process one image on my MacBook Pro while preprocessing another on the UM890 simultaneously.  The PI benchmark shows the UM890 to be slightly more than twice as fast as my old M1-powered MacBook Pro, so this completely eliminates tying up my main "work" PC with long preprocessing runs.

This whole scheme checks all of my boxes:  
1) Data from the telescopes is seamlessly uploaded to my NAS system with no operator intervention.  The amount of data can range from 4-18GB/night depending on how many scopes are running and how long the nights are.
2) My data is stored on a multi-disk RAID drive system so it is mostly protected against drive hardware failure.
3) All of the data is backed up to an offsite location.  If my house burns down in a wildfire, I shouldn't lose any data.
4)  I no longer have a bottleneck getting data remotely from the NAS.  Of course this assumes no power failures or internet interruptions.   Pacific Power is now cutting power to areas where wildfire risk is extreme so this could become a significant problem as we move into fire season.  The other night they cut my power for about four hours, which was longer than my UPS system could handle so my whole system eventually went down.  Either way, there isn't much I can do about this short of buying a large, whole house backup power system.
5) I can run the UM890 remotely to pre-process data, which frees up my main processing PC.

Now, I simply have to get a LOT more rigorous about erasing unnecessary calibration files once I've processed an image.  I just need to spend some time with all my old drives erasing and consolidating files and I think that I'll wind up with around 5-6 large capacity USB drives that I can reuse.

John
Wjdrijfhout (topic starter)
That Mini-PC you got, John, seems like a real winner. Amazing how much performance you get out of such a small computer. It's tempting to have a separate processing PC, freeing up the main PC for editing at all times. On the other hand, it's yet another device to manage, keep up to date, etc. Also, I like keeping everything from a single target together (originals, masters, logs, processed files, etc.): it's just one folder to move around, and when processing is done the only thing left to do is delete all the intermediate files, after which the whole folder is moved to its 'archive' location.

You are completely right: at the end of the day, the data is the most valuable thing we get out of all our equipment investments. That is something to be very precious about. When I had a 5-bay RAID drive completely fail on me many years ago, I was very lucky to have an online backup. And while it was a PITA to restore so much data, at least nothing was lost. But it has changed my approach to backup forever. I now always have a local backup (a Mac mini in the basement with as many drives connected as I want) and an online backup (Backblaze, unlimited data, including all connected drives, such as my SanDisk Blades). The local backup is for technical issues: back in operation in no time. The online backup is for disasters (fire, etc.): more time-consuming, but under those circumstances that's the least of anyone's worries.

Interesting to read how many quite different approaches there are to the data management aspects of running a remote observatory. It looks like there is a lot of difference between approaches for locations that you can drive to in a few hours, vs places that take days of travel to get to.
morefield
My process is this:

1) Raw data is captured to a NUC PC and saved there until the project is completed.
2) After culling bad subs, data are transferred daily to my processing computer via the cloud.
3) Data for ongoing projects reside on a 4 TB NVMe SSD in the processing computer.
4) When a project is complete, the raw subs, cosmetized subs, and the masters are transferred to a large RAID5 NAS for long-term storage.
5) At this point the original subs are deleted from the NUC at the remote observatory.

I’ve never had a problem with the volume of in-process subs being too large for the 2 TB drive on the data-capture NUC.  Holding them there temporarily provides a backup data set while I’m processing.

I’m relying on the RAID5 NAS as the only home for the raw data after project completion.  I know that’s not a 100% solution but I’m OK with that.  

Kevin
jhayes_tucson
Kevin,
Your process sounds good.  I tend to let my 2 TB “buffer drive” at the observatory fill to about 80-90% capacity before erasing about 6 months of data at a time.  That typically leaves roughly the last 4-6 months of data sitting on the observatory computer at all times.
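As a rough check on that policy, the retention follows from the nightly volume. The 4-18 GB/night range is from my earlier post; the 10 GB/night average used below is just an assumption:

```python
def nights_in_buffer(buffer_tb: float, gb_per_night: float,
                     fill_fraction: float = 0.9) -> int:
    """Nights of data the buffer holds before hitting the fill threshold."""
    return int(buffer_tb * 1000 * fill_fraction // gb_per_night)

# At an assumed 10 GB/night average, a 2 TB buffer filled to 90% holds
# 180 nights, i.e. roughly six months of nightly data.
```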

John
jhayes_tucson
Willem Jan Drijfhout:
That Mini-PC you got, John, seems like a real winner. Amazing how much performance you get out of such a small computer. …

Thanks Willem.  We’ll see how it works out but managing the “NAS-PC” should add very little overhead.  The goal is to balance convenience/usefulness against any increased hassle.  Managing the data from a remote observatory is something that deserves a lot more thought than most of us give it when we first jump into the whole thing.  Transferring it, saving it, backing it up, and processing it requires a decent plan and a bit more organizational effort than you realize up front.  


John
jhayes_tucson
Dave Erickson:
I drive 3 hours up, plug the SSD into each of three computers, download the data, and drive 3 hours back, once or twice a month. …

Wow Dave…that’s quite a process, even if it is a pleasant drive!   It sounds like Starlink should be a total game changer.   Good luck with it! 

- John
hbastro
John Hayes:
Wow Dave…that’s quite a process, even if it is a pleasant drive! It sounds like Starlink should be a total game changer. Good luck with it!


Tough to do anything when the remote location has no services: no power, no water, no sewer, and the nearest store is an 80-mile round trip. But I remotely operate three domes on clear, moonless nights under Bortle 1-2 skies at my DIY facility. So the drive, and being there, is a pleasure…
danieldh206
John Hayes:
Wow Dave…that’s quite a process, even if it is a pleasant drive! It sounds like Starlink should be a total game changer. Good luck with it!

The Starlink settings include a standby timer function that allows you to conserve power.  Starlink doesn't provide public IP addresses, but you can use Tailscale to gain direct access to your equipment. Tailscale also works with many NAS systems, so that all your gear is on the same VPN network.
hbastro
danieldh206:
The Starlink settings include a standby timer function that allows you to conserve power.  Starlink doesn't provide public IP addresses, but you can use Tailscale to gain direct access to your equipment. Tailscale also works with many NAS systems, so that all your gear is on the same VPN network.

Yes, I built a remote control system that allows me to turn it, and 40 other instruments, on and off over the DSL to conserve power. There are 1,600 amp-hours of batteries, with shared switchable solar panels, at the 10' dome 100' distant. So I can remotely switch to charge either the 10' dome batteries or these AGM batteries as needed.

Good info on Tailscale, I'll check that out, Thanks!!
jhayes_tucson
Dave Erickson:
Tough to do anything when the remote location has no services… So the drive, and being there, is a pleasure…

It’s a hike to get down to work on my scopes in Chile but I feel the same way.  I really love being down there.  So, I get it.

John
Wjdrijfhout (topic starter)
danieldh206:
Starlink doesn't provide public IP addresses, but you can use Tailscale to gain direct access to your equipment.

Is Tailscale comparable to ZeroTier?
Wjdrijfhout (topic starter)
John Hayes:
Managing the data from a remote observatory is something that deserves a lot more thought than most of us give it when we first jump into the whole thing.  Transferring it, saving it, backing it up, and processing it requires a decent plan and a bit more organizational effort than you realize up front.  


John

That is so true!
danieldh206
Willem Jan Drijfhout:
ZeroTier

Tailscale looks similar to ZeroTier.