Evaluation Logging

Last modified by Jakob Ruckel on 2019/03/21 12:55

Introduction

Why do you need to evaluate logfiles?

Data logging is a good way to trace the footprints of an operating system. By recording events in a log file on the operating system, logging gives you the opportunity to improve development and to review what has happened on gateways and servers over the last days. While the evaluations by Evaluation-Offline-Control and monitoring tell you whether errors exist and how good the data quality is, log files can reveal the exact kind of error in the case of unexpected behaviour. More information about Data Logging and analysis: https://community.ogema-source.net/xwiki/bin/view/Tutorial%20Collection/Data%20logging%20and%20analysis/

For this purpose we will use EventLog together with the recorded log files. The sections below explain which events should be found in the log files and how to develop EventLog.

Event Log

The following events should be detected during evaluation.

  • Homematic Errors: Homematic errors are among the most frequently occurring errors. They can be fixed by developers and are reported with the logging text “discarding write to”.

  • No Homematic Data: Besides Homematic errors, a further problem is when there is no Homematic data to evaluate at all. This is also an error; its logging text is “PING failed”.
  • Framework Restart and Database Closing: The framework on a gateway can be restarted, and this should also be checked during evaluation. Its logging text is "Flushing Data every: ". In this case it is important to find out why the framework was restarted. It could have been restarted by the user removing and re-plugging the connector, or the device itself could have restarted. In both of these cases the framework restarts without a proper shutdown, so no shutdown text appears in the log file before the restart; the EventLog should then report that the framework was restarted, possibly by the user or by the device itself, without a shutdown. In the other case, a developer restarts the device while developing or testing something; the developer shuts down the framework first and then starts it again. The device itself can also restart after properly shutting down the framework. Shutting down includes closing the database, so the logging text "Closing FendoDB data/slotsdb" should then be found in the log file.
  • Update Server no Connection: It is also important to check whether an update failed because of a connection failure.
  • Inactive Bundle: Some bundles on a gateway may be inactive, and you would not find them without checking each gateway's bundle states. A gateway with inactive bundles will have a problem somewhere in its functionality without you noticing it. Using the Inactive Bundle EventLog you can find inactive bundles as well. When an inactive bundle is found, the gateway system reports “Inactive bundle found” in the log file. This event will not be found often during evaluation.
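The event types above can be summarized as a mapping from incident name to log-text signature. The sketch below is a self-contained illustration of this mapping, assuming a simple substring match against each log line; the incident names used here are illustrative shorthand, not the exact OGEMA identifiers. (The Update Server event is omitted because no logging text is given for it above.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EventSignatures {

    // Log-text signatures for the incident types described above.
    static final Map<String, String> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("HOMEMATIC_ERROR", "discarding write to");
        SIGNATURES.put("NO_HOMEMATIC_DATA", "PING failed");
        SIGNATURES.put("FW_RESTART", "Flushing Data every: ");
        SIGNATURES.put("SHUTDOWN_DB", "Closing FendoDB data/slotsdb");
        SIGNATURES.put("INACTIVE_BUNDLE", "Inactive bundle found");
    }

    /** Returns the incident type whose signature occurs in the log line, or null. */
    static String classify(String logLine) {
        for (Map.Entry<String, String> e : SIGNATURES.entrySet()) {
            if (logLine.contains(e.getValue())) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("2019-03-20 12:00 PING failed for device xyz"));
    }
}
```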

How to develop the EventLog?

Get the eventlogdataprovider on your local PC: https://github.com/ogema/ogema-widgets-extended/tree/master/src/timeseries-multieval-eventlogdataprovider

Configuration of the types of incidents to be searched takes place in org.ogema.timeseries.eval.eventlog.incident.EventLogIncidents.addDefaultTypes().

Tutorial: Detecting Framework restarts

For this guide, we'll be looking at the steps needed to implement detection of framework restarts. To differentiate between forceful and proper (i. e. `stop 0`) restarts, we'll need to make extensive use of incident filters.

After the OGEMA Framework restarts, it logs "Flushing Data every: " to its log file.

Thus, we add the following default type to the types List inside of addDefaultTypes():

types.add(new EventLogIncidentType("FW_RESTART", "The framework (was) restarted", "Flushing Data every: "));

Now, the Eventlog Evaluation will look for and count restarts.

There are, as explained above, two types of FW restarts. In order to determine how exactly the FW was restarted, we need to look for shutdowns of the Fendo Database:

EventLogIncidentType shutdownDB = new EventLogIncidentType("SHUTDOWN_DB", "FendoDB shutdown", "Closing FendoDB data/slotsdb");
types.add(shutdownDB);

This is only a "helper incident" and by itself irrelevant for evaluating Gateway performance. Thus, we don't want it to show up on the KPI Page:

shutdownDB.display = false;

We also don't want it to count towards the total incident count - more about that later.

In order to let a FW_RESTART filter determine whether or not a SHUTDOWN_DB has occurred, SHUTDOWN_DB needs to set a flag after having been detected. The flag is set in the HashMap EventLogIncidents.flags. To set a flag indicating the SHUTDOWN_DB's occurrence, we need to use an OccurrenceFlagFilter like this:

shutdownDB.filter = new IncidentFilter.OccurrenceFlagFilter(flags, false);

The second argument (false) makes the filter reject the incident, so it will not be counted towards the total.

Now, after a SHUTDOWN_DB has occurred, a flag labeled has_occurred_SHUTDOWN_DB will be set to true by the OccurrenceFlagFilter.

Next, the incident type FW_RESTART is split up and a CheckOccurrenceFlagFilter is added:

EventLogIncidentType fwResP = new EventLogIncidentType("FW_RES_PROPER", "proper restart", "Flushing Data every: ");
fwResP.filter = new IncidentFilter.CheckOccurrenceFlagFilter(flags, "SHUTDOWN_DB");
types.add(fwResP);

EventLogIncidentType fwResF = new EventLogIncidentType("FW_RES_FORCE", "forceful restart", "Flushing Data every: ");
fwResF.filter = new IncidentFilter.CheckOccurrenceFlagFilter(flags, "SHUTDOWN_DB");
fwResF.reverseFilter = true;
types.add(fwResF);

This filter checks whether the has_occurred_SHUTDOWN_DB flag is true and lets FW_RES_PROPER be counted only if the database was shut down beforehand. Note that by setting EventLogIncidentType.reverseFilter for FW_RES_FORCE, it will only be counted if a database shutdown has not occurred.
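The interplay of the two filters can be illustrated with a self-contained sketch. The class below is a simplified stand-in for the OGEMA classes, not the actual EventLogIncidents implementation: it sets the occurrence flag when the database-shutdown line is seen (as OccurrenceFlagFilter does) and, on each restart line, classifies the restart as proper or forceful depending on that flag (as CheckOccurrenceFlagFilter with and without reverseFilter does). The flag reset after each restart is an assumption made to keep consecutive restarts independent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestartClassifier {

    // Simplified stand-in for EventLogIncidents.flags
    static final Map<String, Boolean> flags = new HashMap<>();

    /** Classifies each restart line as proper or forceful. */
    static List<String> classify(List<String> logLines) {
        List<String> restarts = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains("Closing FendoDB data/slotsdb")) {
                // OccurrenceFlagFilter: remember that the database was shut down
                flags.put("has_occurred_SHUTDOWN_DB", true);
            } else if (line.contains("Flushing Data every: ")) {
                // CheckOccurrenceFlagFilter: proper restart only if the flag is set
                boolean proper = flags.getOrDefault("has_occurred_SHUTDOWN_DB", false);
                restarts.add(proper ? "FW_RES_PROPER" : "FW_RES_FORCE");
                // reset so the next restart is judged on its own shutdown
                flags.put("has_occurred_SHUTDOWN_DB", false);
            }
        }
        return restarts;
    }
}
```

A restart preceded by "Closing FendoDB data/slotsdb" is classified as FW_RES_PROPER, one without it as FW_RES_FORCE.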

...

CSV Output

The EventlogDataprovider outputs CSV files in <Rundir>/EventLogEvaluationResults. To process them further with spreadsheet software such as Calc or Excel, they need to be cleaned manually, which can be done with a terminal command like this (substitute the file names accordingly):

sort -ur EventLog_2019-03-20.csv > EventLog_2019-03-20.clean.csv

(On Windows, this requires Git/Git Bash: right-click the EventLogEvaluationResults directory in Explorer and select "Git Bash Here".)
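If neither sort nor Git Bash is available, the same cleanup can be sketched in Java. The class below is an illustrative alternative, not part of the EventlogDataprovider: it reproduces what `sort -ur` does, i.e. it keeps unique lines in reverse lexicographic order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class CsvCleaner {

    /** Unique lines in reverse lexicographic order, like `sort -ur`. */
    static List<String> clean(List<String> lines) {
        TreeSet<String> unique = new TreeSet<>(Comparator.reverseOrder());
        unique.addAll(lines);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) throws IOException {
        // e.g. java CsvCleaner EventLog_2019-03-20.csv
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[0].replace(".csv", ".clean.csv"));
        Files.write(out, clean(Files.readAllLines(in)));
    }
}
```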

Created by Su Hyun Hwang on 2019/03/13 17:54