We’re currently experiencing some latency issues with our internal backend messaging system. As a result, some of you (hi @Patryk) might have noticed that Jobs/Runs are left hanging for hours.
We apologise for this and thank you for bearing with us while we work through this issue. We are going to do what we can to recover any lost data, but unfortunately some runs might be lost.
Feel free to comment below with any questions/concerns/feedback. You can tag me or @tyler if something urgent comes up!
Hi @Patryk, that actually gave us a very good case to see how far we can go before something breaks. Thanks!
I think you can just let them be. We might need to cancel some of them on your behalf and you can re-run them later, but let’s see where we get with this. Thanks again!
Hi @Patryk, I am looking into this now. As @Mostapha said, we may need to cancel some on your behalf, but we will follow up here afterwards. Thanks again for giving us a great use case!
So I ended up having to cancel the jobs that had started >= 5 runs in order to reduce the load on our database. The issue is with how we are storing/updating the jobs in the database, which became a bottleneck for a couple of different services. We will make this optimization a high priority.
Some of the jobs will continue to show a status of ‘Running’ until we can do a more thorough clean-up of the DB. As a workaround, you could try starting these large jobs one at a time while we implement a fix. Thanks again for your patience and for the stress test!
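In case it helps, here is a minimal sketch of that "one at a time" workaround in Python. To be clear, this is not our client API: `submit_job` and `get_job_status` below are dummy stand-ins for whatever submit/status calls you already use; the only point is the pattern of waiting for one job to finish before starting the next.

```python
import time
import random

# Dummy stand-ins -- swap these for the real submit/status calls in your own client.
def submit_job(job_name):
    print(f"submitting {job_name}")
    return job_name  # stand-in "job id"

def get_job_status(job_id):
    # pretend each job eventually finishes
    return random.choice(["Running", "Succeeded"])

def run_jobs_sequentially(jobs, poll_seconds=5):
    """Start each job only after the previous one has finished."""
    for job in jobs:
        job_id = submit_job(job)
        status = get_job_status(job_id)
        while status == "Running":
            time.sleep(poll_seconds)          # wait before polling again
            status = get_job_status(job_id)
        print(f"{job_id} finished with status: {status}")

run_jobs_sequentially(["study-1", "study-2"])
```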
Just a quick update that we have been working on this and have made some good progress. Most of the changes have already been merged to the production server. We are working through a few remaining side effects of these changes, and we will post a more comprehensive update once the issue is fully resolved!