Wednesday, September 16, 2009

My R Research

This is the first note in my research on using R to analyze data with a very large number of rows, and on the best way to set it up as a data mining tool.

First of all, it's clear that R is not well suited to data with many rows when you don't have enough RAM. The immediate solution is to increase the memory to 32GB (ideally on a 64-bit system). But I'm looking for other solutions...
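
To get a rough feel for the memory problem, here is a back-of-the-envelope sketch in Python. It assumes a purely numeric table held entirely in memory as 8-byte doubles (which is how R stores numeric vectors) and ignores per-object overhead and any temporary copies made during analysis, so it is only a lower bound. The function name and the example dimensions are made up for illustration.

```python
BYTES_PER_DOUBLE = 8  # R stores numeric values as 8-byte doubles


def estimate_gib(rows, cols, bytes_per_value=BYTES_PER_DOUBLE):
    """Rough lower bound on the RAM (in GiB) needed to hold a numeric table."""
    return rows * cols * bytes_per_value / 1024**3


if __name__ == "__main__":
    # Hypothetical dataset sizes, not taken from any real project.
    for rows, cols in [(1_000_000, 40), (100_000_000, 40)]:
        gib = estimate_gib(rows, cols)
        print(f"{rows:>11,} rows x {cols} cols ~ {gib:.1f} GiB")
```

Under these assumptions, a table of 100 million rows and 40 numeric columns already needs close to 30 GiB just to sit in memory, which is why adding RAM alone only postpones the problem.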

Talking about a data mining suite, I'm investigating the best possible options among these:
- Weka
- RapidMiner
- Rattle

Maybe I need to consider standards like PMML (on SourceForge); I found an integration with Pentaho.

I'm looking for something like WebFOCUS RStat but open source.

Bayesian vs. Classical Statistics


Medical Search Engine

Culling through 10 million health articles and sorting search results on two types of data, "conditions" and "treatments," into manageable subsets, HealthBase includes "causes of," "treatments for," "complications of," and "pros and cons of treatment." Content sources are also provided and ranked.