Clustering machines based on event logs within the MANTIS project

Introduction

Liebherr participates in the MANTIS project as an industrial partner through its hydraulic excavators division. Liebherr's main expertise lies in developing and optimizing excavators, drawing on a variety of information sources. After delivery to the customer, however, every excavator automatically generates event (message) data. At present these data are used mainly for fault diagnostics and not extensively for further analysis.

Among other things, the event data logger records:

  • timestamp, when an event occurs
  • type of event, e.g. info, warning or error
  • unique message identifier of this event class

In combination with anonymized data on the service partner and the customer, the following questions are relevant:

  • Is there a relation between the message patterns and the corresponding anonymized service partner?
  • Is there a relation between the message patterns and the anonymized customer?

Analysis approach for clustering

The analysis was performed by the University of Groningen (RUG), a research partner in the MANTIS project, which treated each excavator as a stochastic message generator. During preprocessing, the messages were first counted per excavator and then normalized by the total number of occurrences of each unique message identifier.
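The preprocessing step can be illustrated with a minimal sketch. It assumes the log has already been reduced to (machine, message identifier) pairs, and it reads the normalization literally as dividing each count by the total occurrences of that message identifier over all machines. The function name and the input layout are hypothetical and not taken from the RUG implementation.

```python
from collections import Counter, defaultdict


def message_profiles(events):
    """Build one normalized message-count vector per machine.

    events: iterable of (machine_id, message_id) pairs, one per logged event
            (hypothetical input layout).
    Returns the sorted message identifiers and a dict mapping each machine
    to its vector, ordered like the identifiers.
    """
    counts = defaultdict(Counter)  # machine -> message -> count
    totals = Counter()             # message -> total count over all machines

    for machine_id, message_id in events:
        counts[machine_id][message_id] += 1
        totals[message_id] += 1

    message_ids = sorted(totals)
    profiles = {
        # Counter returns 0 for messages a machine never produced.
        machine: [counts[machine][m] / totals[m] for m in message_ids]
        for machine in counts
    }
    return message_ids, profiles
```

Each resulting vector has one entry per message identifier, so all machines share the same feature space regardless of which messages they actually produced.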

Based on the computed message probabilities per machine, a k-means clustering was performed. To reduce the influence of initialization, the clustering was repeated 100 times with random initialization. For each k, the relationship between the cluster assignment of each excavator and the corresponding service partner or customer was then examined with the chi-square test. The average significance over the 100 model estimations for each k served as the quality function.
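The procedure described above could be sketched roughly as follows. This is an illustration under assumptions, not the code used in the project: it relies on scikit-learn's KMeans and SciPy's chi2_contingency, and the function name, the seed handling and the argument layout are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans


def mean_p_value(features, groups, k, n_runs=100, seed=0):
    """Average chi-square p-value of cluster vs. group over repeated k-means runs.

    features: array-like, one row of message probabilities per machine.
    groups:   one label per machine (e.g. anonymized customer).
    """
    features = np.asarray(features, dtype=float)
    _, group_idx = np.unique(groups, return_inverse=True)

    p_values = []
    for run in range(n_runs):
        # One randomly initialized k-means run per iteration.
        km = KMeans(n_clusters=k, init="random", n_init=1, random_state=seed + run)
        labels = km.fit_predict(features)

        # Contingency table: clusters (rows) against groups (columns).
        _, cluster_idx = np.unique(labels, return_inverse=True)
        table = np.zeros((cluster_idx.max() + 1, group_idx.max() + 1), dtype=int)
        np.add.at(table, (cluster_idx, group_idx), 1)

        # Index 1 of the result is the p-value of the independence test.
        p_values.append(chi2_contingency(table)[1])

    return float(np.mean(p_values))
```

Evaluating this function over a range of k values gives one average p-value per k. A lower value suggests that the cluster assignment is less likely to be independent of the grouping.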

Results of cluster analysis

As figure 1 shows, there is no indication of a relationship between the service partner and the messages per excavator. The average significance level is clearly above 0.05, and all individual levels have nearly the same magnitude.

Figure 1: Average p-significance levels as a function of k (number of groups), for the interaction clusters versus service partner.

In contrast to figure 1, figure 2 shows a clear minimum at k = 7, indicating that for this number of groups the distribution of machines over customers is unlikely to be random. Although the p_signif value of 0.0588 is slightly above the significance level of 0.05, it is clearly lower at k = 7 than at the other k values.

Figure 2: Average p-significance levels as a function of k (number of groups), for the interaction clusters versus customers.

To explain the minimum at k = 7, Liebherr decoded the anonymized customers and tried to find a description of the clusters manually. This laborious work did not yield the expected result, namely short descriptions of the clusters, but instead revealed mismatches in the customer data.

In summary, the analysis showed that, with skillful use of analysis algorithms, data that appear unmanageable at first glance can yield insights. However, a basic requirement for any later use of the results is proper preparation of the data.