A simple, practical example of the fuzzy c-means algorithm


I am writing my master's thesis on dynamic keystroke authentication. To support ongoing research, I am writing code to test different methods of feature extraction and feature matching.

My current approach is simple: it checks that the typed keycodes match the reference password's keycodes, and that the key press times (dwell) and the key-to-key times (flight) are within a tolerance of +/- 100 ms of the reference times. This is of course very limited, and I want to extend it with some sort of fuzzy c-means pattern matching.
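For concreteness, the tolerance check described above might look roughly like the following. This is only a sketch: the `Keystroke` tuple layout, the function name `matches_reference`, and the sample timings are made up for illustration.

```python
from typing import List, Tuple

# One keystroke as (keycode, dwell_ms, flight_ms). The layout is an assumption.
Keystroke = Tuple[int, float, float]

TOLERANCE_MS = 100.0  # the +/- 100 ms tolerance mentioned in the text


def matches_reference(reference: List[Keystroke],
                      attempt: List[Keystroke],
                      tolerance_ms: float = TOLERANCE_MS) -> bool:
    """True if keycodes match exactly and every timing is within tolerance."""
    if len(reference) != len(attempt):
        return False
    for (ref_code, ref_dwell, ref_flight), (code, dwell, flight) in zip(reference, attempt):
        if code != ref_code:
            return False  # keycodes must be identical
        if abs(dwell - ref_dwell) > tolerance_ms:
            return False  # dwell time too far from the reference
        if abs(flight - ref_flight) > tolerance_ms:
            return False  # flight time too far from the reference
    return True


# Made-up sample data: three keys of a reference password and two attempts.
reference = [(80, 95.0, 0.0), (65, 110.0, 180.0), (83, 88.0, 150.0)]
attempt_ok = [(80, 120.0, 0.0), (65, 90.0, 240.0), (83, 100.0, 130.0)]
attempt_slow = [(80, 120.0, 0.0), (65, 90.0, 400.0), (83, 100.0, 130.0)]

print(matches_reference(reference, attempt_ok))    # True
print(matches_reference(reference, attempt_slow))  # False
```

The hard cut-off is the limitation: a timing that is off by 99 ms counts exactly as much as a perfect match, while 101 ms rejects the whole attempt.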

For each key, the features are: keycode, dwell time, and flight time (the first flight time is always 0).
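As an illustration of how such features could be derived from raw key events, here is a minimal sketch. The `KeyEvent` and `KeyFeature` types and the `extract_features` helper are hypothetical names, and flight time is assumed to be measured from the release of the previous key to the press of the next one (other definitions, such as press-to-press, are equally possible).

```python
from typing import List, NamedTuple


class KeyEvent(NamedTuple):
    """Raw event for one key (hypothetical layout): timestamps in milliseconds."""
    keycode: int
    press_ms: float
    release_ms: float


class KeyFeature(NamedTuple):
    """Per-key features as listed above."""
    keycode: int
    dwell_ms: float
    flight_ms: float


def extract_features(events: List[KeyEvent]) -> List[KeyFeature]:
    """Turn press/release timestamps into (keycode, dwell, flight) features."""
    features = []
    prev_release = None
    for ev in events:
        dwell = ev.release_ms - ev.press_ms
        # Assumption: flight = previous release -> current press; 0 for the first key.
        flight = 0.0 if prev_release is None else ev.press_ms - prev_release
        features.append(KeyFeature(ev.keycode, dwell, flight))
        prev_release = ev.release_ms
    return features


# Made-up timestamps for three keys.
events = [
    KeyEvent(80, 0.0, 95.0),
    KeyEvent(65, 275.0, 385.0),
    KeyEvent(83, 535.0, 623.0),
]
features = extract_features(events)
for f in features:
    print(f)
```

With this definition the flight time becomes negative when the next key is pressed before the previous one is released, which may or may not be what you want.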

Obviously the keycodes can be left out of the fuzzy algorithm because they have to match exactly. In this context, what would a practical implementation of fuzzy c-means look like?


There are 2 best solutions below

1
On

Generally, you would do the following:

  1. Determine how many clusters you want (2? "Authentic" and "Fake"?)
  2. Determine what elements you want to cluster (individual keystrokes? login attempts?)
  3. Determine what your feature vectors will look like (dwell time, flight time?)
  4. Determine what distance metric you will be using (how will you measure the distance of each sample from each cluster?)
  5. Create exemplar training data for each cluster type (what does an authentic login look like?)
  6. Run the FCM algorithm on the training data to generate the clusters
  7. To create the membership vector for any given login attempt sample, run it through the FCM algorithm using the clusters you found in step 6
  8. Use the resulting membership vector to determine (based on some threshold criteria) whether the login attempt is authentic
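The steps above can be sketched with plain NumPy. This is a minimal illustration of the standard FCM update rules under several assumptions: two clusters, Euclidean distance, fuzzifier m = 2, one feature vector per login attempt (three dwell times followed by two flight times), and entirely synthetic training data. The function names `memberships` and `fit_fcm` and the 0.8 threshold are made up for this example.

```python
import numpy as np


def memberships(X, centers, m=2.0, eps=1e-9):
    """Fuzzy membership of each row of X in each cluster (each row sums to 1)."""
    # Euclidean distance from every sample to every center, shape (n, c).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # avoid division by zero if a sample sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fit_fcm(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate center and membership updates until memberships stop changing."""
    rng = np.random.default_rng(seed)
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]  # weighted means
        new_u = memberships(X, centers, m)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


# Steps 5-6: synthetic training data (made up), 20 samples per class.
rng = np.random.default_rng(42)
authentic_mean = np.array([95.0, 110.0, 88.0, 180.0, 150.0])
impostor_mean = np.array([140.0, 160.0, 150.0, 320.0, 300.0])
X = np.vstack([
    authentic_mean + rng.normal(0, 10, size=(20, 5)),
    impostor_mean + rng.normal(0, 10, size=(20, 5)),
])
centers, u = fit_fcm(X, n_clusters=2)

# Find which center corresponds to the authentic training samples.
auth_idx = int(np.argmin(np.linalg.norm(centers - X[:20].mean(axis=0), axis=1)))

# Steps 7-8: membership vector for a new attempt, then a threshold decision.
attempt = np.array([[100.0, 105.0, 92.0, 175.0, 160.0]])
score = float(memberships(attempt, centers)[0, auth_idx])
accepted = score >= 0.8  # threshold is an arbitrary choice for illustration
print(accepted)
```

One property worth keeping in mind: the memberships of a sample always sum to 1, so an attempt that is far from both centers still receives a full membership vector, and may even look "authentic" if it is merely less far from that center.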

I'm not an expert, but this seems like an odd approach to determining whether a login attempt is authentic. I've seen FCM used for pattern recognition (e.g. which facial expression am I making?), which makes sense because you're dealing with several categories (e.g. happy, sad, angry) that each have defining characteristics. In your case, you really only have one category (authentic) with defining characteristics. Non-authentic keystrokes are simply "not like" authentic keystrokes, so they won't form a cluster of their own.
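To make the "only one category" point concrete, here is a rough sketch of a one-class check. This is not FCM, just a plain statistical profile swapped in for illustration: it measures how far an attempt lies from the authentic samples instead of assigning it to one of several clusters. The helper names, the sample data, and the threshold of 2.0 are all assumptions.

```python
import numpy as np


def build_profile(samples):
    """Per-feature mean and standard deviation of the authentic samples."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)
    return mean, np.maximum(std, 1e-6)  # guard against zero spread


def anomaly_score(attempt, profile):
    """Mean absolute z-score: how many standard deviations off, on average."""
    mean, std = profile
    return float(np.mean(np.abs((attempt - mean) / std)))


# Made-up authentic logins: three dwell times followed by two flight times (ms).
authentic = np.array([
    [95.0, 110.0, 88.0, 180.0, 150.0],
    [100.0, 105.0, 92.0, 175.0, 160.0],
    [90.0, 115.0, 85.0, 190.0, 145.0],
    [98.0, 108.0, 90.0, 185.0, 155.0],
    [93.0, 112.0, 86.0, 170.0, 148.0],
])
profile = build_profile(authentic)

THRESHOLD = 2.0  # arbitrary cut-off for illustration

genuine_score = anomaly_score(np.array([96.0, 109.0, 89.0, 182.0, 152.0]), profile)
impostor_score = anomaly_score(np.array([140.0, 160.0, 150.0, 320.0, 300.0]), profile)

print(genuine_score < THRESHOLD)   # True
print(impostor_score < THRESHOLD)  # False
```

Only authentic samples are needed to build the profile, which matches the situation described here: there is no meaningful "fake" cluster to train on.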

Perhaps I am missing something?

0
On

I don't think you really want to do clustering here. You might, however, want to do some proper fuzzy matching instead of just allowing a fixed delta on each value.
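One possible reading of "proper fuzzy matching" is to replace the hard pass/fail tolerance with a graded degree of closeness per timing value and then aggregate those degrees. The sketch below uses a triangular membership function and a simple average; the function names, the 100 ms spread, and the sample data are assumptions for illustration only.

```python
def closeness(value, reference, spread_ms=100.0):
    """Triangular membership: 1.0 at an exact match, falling linearly to 0.0 at +/- spread_ms."""
    return max(0.0, 1.0 - abs(value - reference) / spread_ms)


def fuzzy_match_score(reference, attempt, spread_ms=100.0):
    """Average closeness over all dwell and flight times; 0.0 if the keycodes differ."""
    if len(reference) != len(attempt):
        return 0.0
    degrees = []
    for (ref_code, ref_dwell, ref_flight), (code, dwell, flight) in zip(reference, attempt):
        if code != ref_code:
            return 0.0  # keycodes still have to match exactly
        degrees.append(closeness(dwell, ref_dwell, spread_ms))
        degrees.append(closeness(flight, ref_flight, spread_ms))
    return sum(degrees) / len(degrees) if degrees else 0.0


# Made-up data: (keycode, dwell_ms, flight_ms) per key.
reference = [(80, 95.0, 0.0), (65, 110.0, 180.0), (83, 88.0, 150.0)]
attempt_close = [(80, 100.0, 0.0), (65, 105.0, 190.0), (83, 90.0, 155.0)]
# Every timing is 90 ms off: this passes a hard +/- 100 ms check but scores low here.
attempt_borderline = [(80, 185.0, 0.0), (65, 200.0, 270.0), (83, 178.0, 240.0)]

score_close = fuzzy_match_score(reference, attempt_close)
score_borderline = fuzzy_match_score(reference, attempt_borderline)
print(score_close > score_borderline)  # True
```

Averaging is only one aggregation choice; taking the minimum instead would make a single badly timed key decisive.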

For clustering, you need many data points. Additionally, you need to know the right number of clusters (means) in advance.

But what are these multiple objects meant to be? You have one data point for every keycode. You don't want to make the user type the password 100 times to see whether they can do it consistently. And even then, what do you expect the clusters to be? You already know which keycode comes at which position; you don't want to find out which keycodes the user uses for their password...

Sorry, I really don't see any clustering here. The term "fuzzy" seems to have misled you into picking this clustering algorithm. Try "fuzzy logic" instead.