I had to pry myself away from Lucene to explore WEKA, an open-source machine learning tool that has been around a while and, like all good open source, just keeps getting better and better. And the book that introduces the concepts, theory, and practice couldn’t be better written.
If you’re like me, when you think of data mining you think of databases, but that’s only a part of what data mining is. WEKA is about machine learning. It’s like having your own Magic 8 Ball. Ask a question, shake the ball, and out comes the answer.
Here’s my first question.
If I only know the counts of key words in a legacy source code module, can I determine the functionality within that code?
So I retrieved a legacy application, counted the key words in each module, classified the key words by functionality, calculated the percentages, and told WEKA what the outcomes were.
WEKA crunched the numbers and found actual rules I could use to classify future modules:
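To make the counting step concrete, here is a minimal Python sketch of how one module could be turned into a row of percentages. The key words, the category names, and the sample source line are all hypothetical stand-ins, since the actual lists aren't given here; treat it as an illustration of the idea, not the tool I used.

```python
import re
from collections import Counter

# Hypothetical key words and functional categories, chosen only for
# illustration; the real lists used for the experiment are not shown here.
KEYWORD_CATEGORIES = {
    "OPEN": "file_io",
    "READ": "file_io",
    "WRITE": "file_io",
    "COMPUTE": "calculation",
    "ADD": "calculation",
    "DISPLAY": "screen",
    "ACCEPT": "screen",
}


def category_percentages(source_text):
    """Return, per category, the percentage of matched key words in the text."""
    # Split the source into alphabetic tokens and normalize to upper case.
    words = re.findall(r"[A-Za-z]+", source_text.upper())
    # Count only tokens that are known key words, grouped by category.
    counts = Counter(
        KEYWORD_CATEGORIES[w] for w in words if w in KEYWORD_CATEGORIES
    )
    total = sum(counts.values())
    categories = sorted(set(KEYWORD_CATEGORIES.values()))
    if total == 0:
        # No key words found: report zero for every category.
        return {c: 0.0 for c in categories}
    return {c: 100.0 * counts[c] / total for c in categories}


# Made-up snippet standing in for one legacy module.
sample = "OPEN INFILE. READ INFILE. COMPUTE TOTAL = A + B. DISPLAY TOTAL."
row = category_percentages(sample)
print(row)
```

Each module would produce one such row. Add the known outcome as a label and you have the kind of training data a classifier needs.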
Correctly Classified Instances 116 65.9091 %
Incorrectly Classified Instances 60 34.0909 %
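The two percentages follow directly from the counts. A quick check in Python, using only the numbers reported above:

```python
# Counts taken from the summary lines above.
correct = 116
incorrect = 60

total = correct + incorrect            # number of instances evaluated
accuracy = 100.0 * correct / total     # percent classified correctly
error_rate = 100.0 * incorrect / total # percent classified incorrectly

print(f"{total} instances: {accuracy:.4f} % correct, {error_rate:.4f} % incorrect")
```

So the summary covers 176 instances in total.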
WEKA even built a graphical decision tree for me:
How’d WEKA do? It correctly classified 66% of the cases. Can it get better? Sure, as I collect more and more classification data, WEKA will learn to produce better rules.
The best part is that we can build WEKA into VI Explorer. More to come on this exciting technology….