Archive for October, 2013

Monitoring – the pitfalls that no one wants to talk about

October 27th, 2013

Over the years of dealing with various monitoring systems, I have learned a lesson or two.  I am sharing some of them in the hope that you can avoid some of the pitfalls and ultimately end up with fewer scars than I have.

  1. Every alert needs to be actionable.
  2. False positives will quickly overload the team and drive up OPEX like no other operational line item will.
  3. Hard-coded thresholds are a maintenance nightmare and require dedicated staff to maintain them.
  4. Event and alert naming is crucial.  Each name needs to include the data center, device name, unique identifier, and a brief human-readable description.  Ideally it also includes a link to a runbook and a reference to the automation that did or didn’t catch the issue.
  5. To ease troubleshooting, all monitoring systems need to use the same time zone (UTC is recommended).
  6. As of October 2013, I have not seen any commercial solution that works properly.  In fact, there are quite a few commercial monitoring technologies that just do not work the moment you move beyond the basics.  To validate them, ask the vendor to show you their test case matrix, especially for storage devices.
  7. Most engineers will avoid working on the monitoring definitions because they don’t see the value and, based on their experience, expect it to create more work without helping them.  As such, you will need a strong automation capability, mindset, and understanding in the team in advance in order to keep things under control.
  8. People do not like to wake other people up in the middle of the night and therefore will avoid it.
  9. Most people do not answer their phones when called the first time.  Based on my experience, only 30% will answer on the first call.  So use an automated system to notify people and don’t rely on the on-call engineers to call other people.
  10. Most people require approximately 7 minutes to wake up when called.
  11. When the on-call people are called for trivial things, it really irritates them.  As such everything needs to be done to minimize trivial notifications.
  12. The on-call rotation needs a clean handover from the previous on-call rotation.  In my experience, handing over a physical item helps with the hand over.
  13. Contact lists and on-call rosters ensure that the appropriate roles are contacted when needed.  These lists need to be easily accessible from multiple locations; my recommendation is at least 5.  Each list needs to contain at least the name, subject matter, contact details, and the primary and secondary on-call roster.
  14. Predefining escalation criteria is often overlooked, and this delays getting the correct people onto the issue.
  15. Averaging metrics will hide issues because the average smooths away the high and low outliers that signal a problem.
  16. What will be monitoring the monitoring system?  This is almost always overlooked and it is critical to know when elements of the monitoring system have failed.  This is one of the reasons why I do not believe in a single monolithic monitoring system that a vendor claims will solve all monitoring problems.
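
To make points 4 and 5 concrete, here is a minimal sketch of building an alert title from the fields listed above, with the timestamp normalized to UTC. The class name, field names, title layout, and the example.com runbook URL are all my own assumptions for illustration, not the format of any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record carrying the fields from point 4."""
    data_center: str
    device: str
    alert_id: str
    summary: str
    runbook_url: str
    raised_at: datetime  # must be timezone-aware

    def __post_init__(self):
        # Refuse naive timestamps: without a zone we cannot convert to UTC safely.
        if self.raised_at.tzinfo is None:
            raise ValueError("raised_at must be timezone-aware")

    def title(self) -> str:
        # Normalize to UTC so alerts from every system line up (point 5).
        stamp = self.raised_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return (f"[{self.data_center}] {self.device} {self.alert_id}: "
                f"{self.summary} ({stamp}) runbook: {self.runbook_url}")


# Example: an alert raised at 03:15 local time in a UTC+2 zone.
example = Alert(
    data_center="DC1",
    device="sw-core-01",
    alert_id="NET-0042",
    summary="uplink down",
    runbook_url="https://runbooks.example.com/NET-0042",
    raised_at=datetime(2013, 10, 27, 3, 15, tzinfo=timezone(timedelta(hours=2))),
)
print(example.title())
```

Whatever layout you pick, the point is that every system produces the same one, so an engineer reading it at 3 a.m. knows where the device is and where the runbook lives without having to search.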
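
Points 9, 10 and 14 can be combined into a predefined escalation schedule that an automated notifier follows. The sketch below only computes who gets called and when; the function name, the contact labels, and the number of attempts per contact are assumptions of mine, and the 7 minute gap simply reuses the wake-up time from point 10.

```python
def escalation_schedule(contacts, attempts_per_contact=3, minutes_between_attempts=7):
    """Return a list of (minute_offset, contact, attempt_number) tuples.

    Each contact is tried several times before moving to the next one,
    because most people do not answer the first call.
    """
    schedule = []
    minute = 0
    for contact in contacts:
        for attempt in range(1, attempts_per_contact + 1):
            schedule.append((minute, contact, attempt))
            # Leave time for the person to wake up and respond before retrying.
            minute += minutes_between_attempts
    return schedule


# Hypothetical roster: primary, then secondary, then the team lead.
plan = escalation_schedule(
    ["primary on-call", "secondary on-call", "team lead"],
    attempts_per_contact=2,
)
for minute, contact, attempt in plan:
    print(f"T+{minute:02d} min: call {contact} (attempt {attempt})")
```

A real notifier would stop as soon as someone acknowledges the alert. The value of writing the schedule down in advance is that nobody has to decide at night whether it is acceptable to wake the next person.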
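
Point 15 is easy to show with numbers. The sketch below uses made-up latency samples (an assumption, not real data) and compares the mean with the median, a nearest-rank 95th percentile, and the maximum.

```python
import statistics

# Hypothetical response times in milliseconds: 18 fast requests and 2 very slow ones.
latencies_ms = [20, 22, 19, 21, 23, 20, 18, 22, 21, 5000,
                20, 19, 4800, 21, 22, 20, 19, 23, 21, 20]


def summarize(samples):
    """Return mean, median, nearest-rank 95th percentile, and max of the samples."""
    ordered = sorted(samples)
    # Nearest-rank p95: ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


summary = summarize(latencies_ms)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these samples the mean comes out at roughly 509 ms, a value that no request actually experienced. The median shows that a typical request took 21 ms, and the 95th percentile and maximum show that a few requests took around 5 seconds, which is the issue worth alerting on.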







Categories: Technology Tags:

So I switched to Mac from Windows

October 19th, 2013

I have been a Mac user for a while, but only because the photography apps are soooo much better on the Mac.  For the last year or so, I have been using a Mac exclusively for work too.  Based on conversations, I know that a lot of people are considering the same move, so I am sharing my experiences.

The Equipment

Hardware:  MacBook Air with 8GB RAM and a 27 inch Thunderbolt monitor.

Accessories:  External Apple Bluetooth keyboard and trackpad, Logitech headphones

Phone: iPhone 5 – upgrade from iPhone 4 and a 3 before that.

Tablet: iPad Mini – my iPad 3 arrived 3 days after they announced the Mini and so I used the Apple 14 day satisfaction policy to return it.  Thanks Apple.

The Cons

  1. Outlook for the Mac.  It crashes at least twice a day, rules don’t work properly (wildcards in particular), you cannot attach a file to a meeting request, and voting buttons do not exist in the Mac version, so they are lost when someone sends you an email that uses them.  That is just what came to mind as I was typing this blog post.  Basically it sucks!  Are there any competitors out there?
  2. Microsoft Excel crashes 70% of the time when opening a spreadsheet stored on SharePoint.  I have to download it and then open it.  Pivot tables are different from Windows, although I’ve come to prefer how it’s done in the Mac version.
  3. No Visio for Mac.   I am using ConceptDraw and I like some of the features that it has that are not in Visio.  Here is a link to the product website.
  4. I have not found a decent and affordable scp GUI client that supports drag and drop and password caching or Keychain integration.  Basically there is nothing comparable to WinSCP.
  5. iTunes – I am still getting used to the new UI but I still don’t like it.  If there was an alternative, I would more than likely use it.
  6. Lync. It crashes 3 or 4 times a day. Logs itself out another 3 or 4 times a day. I switched to Skype as my primary chat client and phone application.


The Pros

  1. It’s consistently fast.  There are no slowdowns in the middle of the day like I experienced with Windows 7.
  2. Ah yes, the reboots because of all the Windows patches.  I’ve had to reboot my Mac twice in the last 3 months.
  3. The startup time.  I flip open the screen and within 10 seconds I’m typing in my password.
  4. The Thunderbolt monitor is superb.  Almost no eye strain with massive amounts of real estate in addition to point #6.
  5. The retina screen on the MacBook Air has the same quality as the external monitor, just smaller.  Therefore it is just as usable as my main monitor.
  6. Virtual desktops.  I don’t miss the real estate when using the laptop screen, and I don’t have a single cluttered screen or NEED multiple screens like Microsoft employees do.
  7. Initially, I used Windows Live Mesh because it has a Mac client that works really nicely, so I could sync my files between machines without uploading them to the cloud.  Now that I only use one machine, I don’t need it anymore.
  8. All my devices sync with no issues using iCloud.
  9. No IE!!  I no longer have to fight with my browser.
  10. Built-in firewall that does not blackhole traffic without telling you.  The Apple firewall has great logging which makes debugging an issue a breeze.
  11. AppleScript – pretty easy to learn and allows you to automate simple tasks on your desktop.
  12. Mail.  The built-in mail client works well and supports multiple email accounts from a single UI.
  13. The MacBook Air form factor.  Simply awesome, light, and portable, with 6+ hours of battery life even with my screen brightness turned up and power savings off.
  14. Oh, yeah, it’s cool!


Oh, yeah, before I get the flame mails telling me to use Windows 8: I have, and I find the Metro interface horrible without a touchscreen.  When I use it on a touchscreen, I hate the fingerprints all over the screen.  I’m constantly cleaning my screen.


Categories: Technology Tags:

© 2008-2023 Gavin McMurdo aka SparkPilot All Rights Reserved