Tuesday, December 07, 2004

Pattern Discovery in Time Series

It's an algorithm that takes a time series (discrete time, nominal values, meaning things like sequences of bits or letters) and returns a best-fit Markov model for it. NB: a special kind of Markov model, which the paper calls "deterministic". Here "deterministic" doesn't mean that knowing the state of the model lets you predict the model's output. It means that knowing the state of the model plus one output symbol lets you predict the next state of the model.
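To make the distinction concrete, here is a minimal sketch (my own illustration, not the paper's code) of such a "deterministic" model. The two states 'A' and 'B', the alphabet {'0', '1'}, and the probabilities are all made up for the example: emissions from a state are random, so the state alone does not determine the output, but the pair (state, emitted symbol) always determines the next state.

```python
import random

# Hypothetical two-state model: state 'A' emits '0' or '1' with equal
# probability; state 'B' always emits '1'.
EMIT = {
    'A': {'0': 0.5, '1': 0.5},   # output is NOT predictable from the state
    'B': {'1': 1.0},
}
# The transition table is keyed on (state, symbol): next state IS
# predictable from the current state plus one output symbol.
NEXT = {
    ('A', '0'): 'A',
    ('A', '1'): 'B',
    ('B', '1'): 'A',
}

def step(state):
    """Emit one symbol from `state` and return (symbol, next state)."""
    symbols = list(EMIT[state])
    weights = [EMIT[state][s] for s in symbols]
    sym = random.choices(symbols, weights=weights)[0]
    return sym, NEXT[(state, sym)]

def generate(state, n):
    """Generate n symbols starting from `state`."""
    out = []
    for _ in range(n):
        sym, state = step(state)
        out.append(sym)
    return ''.join(out)
```

An ordinary Markov chain also has this property trivially (the state is the output); the point here is that the emission can be stochastic while the state transitions stay deterministic given the observed symbol.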

It's better than the hidden Markov model training algorithms because the number of states is not fixed in advance. Also, the code is GPLed, available, and easy to compile, which is very nice. I didn't give it any tricky data, but it did fine on the periodic sequences I tried it on, with a pleasant dot diagram of the result.
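For periodic input there is an obvious right answer to check against: a sequence that repeats with period p needs exactly p states, one per phase, each emitting a single symbol. A toy sketch (not the paper's algorithm) of that target model:

```python
def minimal_periodic_machine(pattern):
    """Build the state -> (emitted symbol, next state) table for the
    process that repeats `pattern` forever. One state per phase."""
    p = len(pattern)
    return {i: (pattern[i], (i + 1) % p) for i in range(p)}

# The period-2 process "0101..." needs exactly two states.
machine = minimal_periodic_machine("01")
```

"Did fine" here means the learned model matches this minimal machine rather than splitting phases into redundant extra states.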

I think that some people have worked hard on DFA-learning, which is similar to this, though of course they mean truly deterministic automata.

Cons:
The sentence "Given data produced by a process, how can one extract meaningful, predictive patterns from it, without fixing in advance the kind of patterns one looks for?" overpromises: you can't. This algorithm looks for patterns expressible by a small \epsilon-machine, which is itself a choice of pattern class fixed in advance.
Overusing the word "causal" and the phrase "computational mechanics".
Not enough examples of \epsilon-machines (though the draft of part II, apparently abandoned, has a satisfactory number).
The best answer I know of for "how do you know the correct number of states for an HMM?" is "use the minimum message length principle". This doesn't seem to be mentioned at all. On the other hand, I can't find an MML-based algorithm to download.
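The MML idea is worth spelling out: pick the model size that minimizes (bits to state the model) + (bits to encode the data given the model). The sketch below is my own crude stand-in, not a proper MML calculation and not from the paper: the candidate models are Markov chains of order k over a binary alphabet, and the per-parameter cost uses a BIC-style (1/2) log2 n penalty.

```python
import math
from collections import Counter

def message_length(data, k):
    """Approximate two-part code length, in bits, for modelling the
    binary string `data` with an order-k Markov chain."""
    n = len(data) - k
    counts = Counter()          # (context, next symbol) -> frequency
    context_counts = Counter()  # context -> frequency
    for i in range(k, len(data)):
        ctx = data[i - k:i]
        counts[(ctx, data[i])] += 1
        context_counts[ctx] += 1
    # Data cost: empirical negative log-likelihood under the fitted chain.
    data_bits = 0.0
    for (ctx, sym), c in counts.items():
        p = c / context_counts[ctx]
        data_bits += -c * math.log2(p)
    # Model cost: one free parameter per possible context (binary
    # alphabet => 2**k contexts), at roughly (1/2) log2 n bits each.
    model_bits = (2 ** k) * 0.5 * math.log2(max(n, 2))
    return data_bits + model_bits

def best_order(data, max_k=4):
    """Pick the order minimizing the two-part code length."""
    return min(range(max_k + 1), key=lambda k: message_length(data, k))
```

On a periodic string like "0101...", order 1 already makes the data cost zero, and higher orders only add model cost, so the penalized score picks the smallest adequate model, which is the behaviour you would want from an MML-based HMM state-count selector.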

Overall, I recommend this paper.
