16 May 2012 by Kevin
System administrators don’t have an easy life. Not at all. Especially when working for a large IT or telecommunications company, as these tend to have quite a lot of hardware keeping the place warm...
Let’s take a large telecoms company as an example. Modern telecommunication infrastructure capable of serving 10 million subscribers requires 9-20 full-size 19” racks filled with hardware to the very last U. These include computing nodes for real-time processing, large data storage solutions for keeping all the data generated while processing subscribers’ requests, as well as fancy UPS solutions to help ensure there won’t be any power-related system failures. And we have our poor system administrator who has to keep an eye on everything that’s happening in the datacentre.
A lot of that requires browsing log files generated by each service running on every machine. For daily supervision we only need log entries generated during the last 24 or so hours. Retrospective addresses this particular need with a search feature for historical data and a ‘tail’ feature for looking at live stuff. Regular searching is good when we need to troubleshoot some kind of event that took place in a distant but well-defined past. For that we simply define a profile containing the data sources that we want to search (if we haven’t done that already), then specify the search filters (date and time, maybe error codes or interesting “known” strings) and start searching.
Tailing isn’t that much different. You don’t need to make any configuration changes whatsoever. You simply create a search profile or use an already existing one and add any additional log files which you need to tail. Since the tailing function is all about processing log files in real time, it makes the most sense to use it with logs which are constantly updated with new entries. Once the source files are defined, all that’s left to do is (optionally) define a search filter and then enable the tailing feature with a single click on the [Start Tail] button. And that’s pretty much it! With tailing activated, your search results are updated in real time whenever a new entry containing the defined search phrase is added to the log file. Wasn’t that easy?!
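Conceptually, a filtered tail works like `tail -f` piped through `grep`: remember how far into the file you have read, and whenever new lines are appended, report only those matching the search phrase. Here is a minimal Python sketch of that idea — the function name and approach are my own illustration, not Retrospective’s actual implementation:

```python
import time

def read_new_matches(path, phrase, offset):
    """Read lines appended to `path` after `offset`; return the lines
    containing `phrase` plus the new offset to resume from next time."""
    with open(path, "r") as f:
        f.seek(offset)              # skip everything already processed
        lines = f.readlines()       # only the newly appended lines remain
        new_offset = f.tell()       # remember where to resume
    matches = [ln.rstrip("\n") for ln in lines if phrase in ln]
    return matches, new_offset

def tail(path, phrase, poll_interval=1.0):
    """Endless loop: print new matching entries as they appear."""
    offset = 0
    while True:
        matches, offset = read_new_matches(path, phrase, offset)
        for entry in matches:
            print(entry)
        time.sleep(poll_interval)
```

The key point is that each pass touches only the bytes appended since the last pass, which is why tailing stays cheap no matter how large the log file grows.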
If you no longer need to keep track of live events, you simply click [Stop Tail] and the search results are no longer updated. You can always resume tailing whenever you want.
Just imagine if the [Start Tail] button wasn’t there. All you have is the basic search capability. You configure your profile, add data sources, define search filters and click the [Start Search] button. After a certain time, which depends on the data volume to be processed, you get your results. But the second they are displayed on the screen, it’s very likely that they are already outdated, as the log files have been updated with new entries. So a minute later you click the [Start Search] button again to check whether there are new entries matching your previously defined search criteria. After a few seconds you have the search results available for browsing and viewing. But, yet again, you cannot be sure that these are 100% accurate. Having to repeat the search every few minutes isn’t very convenient for two reasons. First of all, manually triggering the search engine can be really annoying for the user. Secondly, searching the data sources all over again takes time. The longer it takes, the more probable it is that the log files, especially those which were processed first, have already been updated by the time you get your result set. This is a real deal-breaker for monitoring infrastructure hosting real-time services which require high availability.
I’m not a system admin, but I can’t really imagine being one without a proper log browsing solution with a tail search option. That, and a big coffee mug :)