Monitoring system and service uptime with heartbeats?

IridiuM

Well-Known Member
Joined
Oct 19, 2006
Messages
142
How do all the 1337 developers out there monitor their system and service uptime? :unsure:

I worked on https://heartbeat.sh, a free service that does just that (complete with dark mode :thumbsup:). It may be useful for anybody looking to monitor the execution of scripts, cronjobs, server uptime, backups, processes etc. Hell, just about anything... (did the gate motor close?)

Shell script: https://github.com/heartbeat-sh/heartbeat.sh
Python client (PIP): https://github.com/heartbeat-sh/heartbeat.py
JS client (NPM): https://github.com/heartbeat-sh/heartbeat.js

Let me know what you think, or help me make it better if you're keen to jump in and make use of it!

^_0
 

Little Mac

Honorary Master
Joined
Jul 18, 2008
Messages
53,522
Looks promising! Can it be self-hosted? If not, could you detail the basics of what is stored on the server end, how authentication details are stored and how redundancy is handled? Do you have a lot of clients? I guess I'm trying to get a feel for whether this is better for internal use or whether I could use it with client services. I don't want to start unless it's a sustainable model that I can rely on in 5 years' time.

I took a brief look at the documents and thought what might be missing are some use cases explaining how it could be used...
e.g.
How do I approach it if I want it to monitor my daily backup routine...
I'm thinking I may want to send a heartbeat at backup start and another at backup end, with the second one signaling that the backup completed... would the best approach be to create a backup:start and a backup:end heartbeat? I assume exception reporting and alerting should happen on my end,
e.g. to check whether the backup worked, I could query the start heartbeat about a minute after the expected start and the end heartbeat maybe 30 minutes after the expected end time, and send out my own alerts if there's an anomaly?
Or perhaps the approach is to set up a daily backup:start and backup:end, each with a 24 hour timeout, and just poll the API about a minute after the expected start of the backup for failures?
I assume there's no alert mechanism in the system.
Thanks!
 

IridiuM

Well-Known Member
Joined
Oct 19, 2006
Messages
142
Hi Eric (on behalf of ariedebok - who is banned for no reason :) -> plz someone unban him)

Thanks for your feedback! I can see that you gave this some serious thought, and I will definitely implement some of your suggestions. Adding a notification system is my next development goal.

We use the heartbeat system for our in-house monitoring at Direct Debit and SnapBill. We created heartbeat.sh because the system works well for us, and we want to make it available as a service to the public. For now it is a "beta" version to gauge the public's reaction, but the idea is to make it permanent and to grow it according to the feedback we receive.

We host heartbeat.sh on AWS, and we don't have a self-hosted version available currently. Redundancy-wise we have 2 app servers. The heartbeats themselves are stored in a Redis instance with redundancy spread over 3 nodes. We store user details in a Postgres DB on Amazon's RDS service. We make daily snapshots of the DB.

As for the information we keep: we store heartbeat details (timestamp of the last beat, configured timeouts, name, subdomain) and login details for the users that registered an account. The user passwords are hashed in our db, and every password is hashed with a randomly generated salt (as opposed to using the same salt for every password). Our hashing algorithm is cryptographically secure.

I've started to write some how-tos on our documentation page. For most cases, it is good enough to just send a heartbeat at the end of every script run, making the timeout a bit longer than the interval between runs.

The approach of sending start and end heartbeats is good for mission-critical tasks, where you want to be notified early. I would keep one heartbeat, and just swap out the timeouts. If you expect a backup to take 30 minutes, send a heartbeat with a 30-minute timeout at the start. Once the backup is done, send a heartbeat with the same name, but a 24-hour timeout.

You can also send a heartbeat with a 0-second timeout in case something went wrong, and then delete the heartbeat once you have fixed the issue.
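To make the timeout swapping concrete, here is a minimal sketch in Python. It only shows the order of the beats; how a beat is actually transmitted is left to a `send_beat(name, timeout_seconds)` callable, which is a hypothetical helper and not part of any heartbeat.sh client. The function name `run_with_heartbeat` and its parameters are also made up for illustration.

```python
from typing import Callable

# Hypothetical sender signature: send_beat(name, timeout_seconds).
# The transport (shell script, Python client, plain HTTPS) is up to the caller.
SendBeat = Callable[[str, int], None]


def run_with_heartbeat(
    name: str,
    job: Callable[[], None],
    send_beat: SendBeat,
    expected_runtime_s: int = 30 * 60,   # e.g. a backup expected to take 30 minutes
    interval_s: int = 24 * 60 * 60,      # 24 hours until the next run
) -> None:
    """Run job() while keeping a single heartbeat up to date."""
    # Start: if the job hangs or dies silently, this beat times out
    # once the expected runtime has passed.
    send_beat(name, expected_runtime_s)
    try:
        job()
    except Exception:
        # Failure: a 0 second timeout flags the problem right away.
        send_beat(name, 0)
        raise
    # Success: same name, long timeout that covers the wait for the next run.
    send_beat(name, interval_s)
```

Calling it with a job that succeeds produces two beats under the same name (short timeout, then long timeout); a job that raises produces the short beat followed by the 0 second one, and the exception is passed on to the caller.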

Our alert mechanism is still under development. For now, polling the API is best. If you know your way around the shell, I would recommend polling https://{subdomain}.heartbeat.sh/heartbeats/text, as that gives a response that can easily be parsed with awk. If you prefer to work with JSON, you could just use /heartbeats/json instead.
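As a rough illustration of the polling approach, the sketch below fetches the JSON endpoint mentioned above and lists the heartbeats that have outlived their timeout. The shape of the response is not described in this thread, so the field names `name`, `last_beat` (Unix seconds) and `timeout` (seconds) are assumptions chosen to match the details the service is said to store; adjust them to whatever the API really returns.

```python
import json
import time
from urllib.request import urlopen


def heartbeats_url(subdomain: str) -> str:
    # URL pattern taken from the post above.
    return f"https://{subdomain}.heartbeat.sh/heartbeats/json"


def fetch_heartbeats(subdomain: str, http_timeout: float = 10.0):
    """Download and decode the heartbeat list for one subdomain."""
    with urlopen(heartbeats_url(subdomain), timeout=http_timeout) as resp:
        return json.load(resp)


def overdue(heartbeats, now=None):
    """Return the names of heartbeats whose timeout has expired.

    ASSUMPTION: each entry is a dict with "name", "last_beat" (Unix
    seconds) and "timeout" (seconds). These field names are hypothetical.
    """
    now = time.time() if now is None else now
    return [hb["name"] for hb in heartbeats if now - hb["last_beat"] > hb["timeout"]]
```

A cron job could call `overdue(fetch_heartbeats("mysubdomain"))` every few minutes and send its own alert (mail, chat message, traffic light) whenever the list is not empty.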

For our in-house monitoring, we hooked up a robot (yes, a traffic light) to a Raspberry Pi. The Pi polls our API, and changes the robot's colour according to the status.

Thanks again for your feedback!
Arie, heartbeat.sh developer
 

Little Mac

Honorary Master
Joined
Jul 18, 2008
Messages
53,522
Hahaha, that traffic light is an uber cool idea. The kid in me wants one for my office now.
Thanks for the feedback. I love the service. I haven't set anything up full time yet but will consider this for the next project we start. Personally I think the sooner you get a pricing model going and some use cases detailed for anyone interested, the better. With the pricing model it will feel a little more sustainable and would put more minds at rest.
Great work!
 

Anthro

Expert Member
Joined
Jun 13, 2006
Messages
3,202
Someone call the Government, this man stole a light from a random intersection!
(What I'm actually asking is... where did you "buy" that from?)
 

Little Mac

Honorary Master
Joined
Jul 18, 2008
Messages
53,522
It's an old school incandescent one too. Epic!
 

Gnome

Executive Member
Joined
Sep 19, 2005
Messages
6,486
Are you new to software development?

Heartbeat, the way you phrase it, sounds much more like metrics to me.

Anyway, the most common way to monitor is metrics and alarming. The alarm can either send an email or push a notification. In my company we have a paging app installed on our phones, and it sets off an alarm for the on-duty engineer to check in and fix the problem.

If you are "anti cloud" then you can build it yourself and get something less useful and less reliable for more money I guess.

Traffic lights are great if you only care about having your app running when people are in the office. But having worked in companies where 99.99% uptime is a requirement, I see the traffic light thing as a bit amateur hour.

"Heartbeat", the phrase, is more commonly applied to a pattern in leader-election-based multi-server applications, for example using Paxos or a number of other leader election strategies. The leader heartbeats so that another agent can take over if it dies.
 

Little Mac

Honorary Master
Joined
Jul 18, 2008
Messages
53,522
You sound like you're 12 years old.
You have no idea what the product is so rather stop. There are links in the OP, you didn't even have to Google.

This is actually pretty innovative. A central portal you can use for monitoring any service or script uptime, where you don't need inbound access to the service or special routines to do that. Your system merely needs outbound HTTPS and can transmit a 'heartbeat' using any number of existing libraries, scripts or code. You can send heartbeats to mark the start and end of a process, or just to announce "I'm alive and well". You set a timeout on the heartbeat, and if the process takes longer than expected, you can receive alerts any which way you like to announce there's an issue. Perfect for retrofitting legacy systems or a quick solution with very little effort on a small project.

It's a great tool for resolving some false negatives in your monitoring since you push the heartbeat from the service side.

The traffic light? It's a gimmick that's a bit of fun. But fun seems like your middle name so you should have got that LMAO.
 

IridiuM

Well-Known Member
Joined
Oct 19, 2006
Messages
142
What Eric said :)...
 