This is the first note in my research on using R to analyze data sets with a very large number of rows, and on finding the best way to set it up as a data mining tool.
First of all, it is clear that R is not well suited to data sets with many rows when there is not enough RAM (http://datamining.togaware.com/). The immediate solution is to increase the RAM to 32 GB (ideally on a 64-bit system). But I'm looking for other solutions...
As for data mining suites, I'm investigating which of these is the best option:
- Weka
- RapidMiner
- KNIME
- Rattle
I may also need to consider standards such as PMML (SourceForge); I found an integration with Pentaho.
I'm looking for something like WebFOCUS RStat, but open source.
The ROOT system can work with very large compressed data files on the fly:
http://root.cern.ch/drupal/content/documentation