Loading data via rClr to create a data frame is slow

Apr 29, 2014 at 2:32 AM
Hi,

I am reporting something that reached me through another channel:
Example: loading a time series with thousands of records.
The data itself loads in seconds, but going from the data to a data frame via the rClr calls is incredibly slow.
Is there any way I can create a data frame on the native side using .NET and just send it back to R?
Transferring numeric/integer/boolean data via rClr runs at around 20-30 million numbers per second. Character vectors are slower, though I cannot recall the exact rate. That said, once R.NET is activated (setRDotNet(TRUE)), the speed may drop considerably. R.NET had some inefficient mechanisms, which I have since fixed, but I cannot recall whether those fixes are in the latest rClr release.
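If it helps, a rough timing sketch along these lines could isolate where the time goes. The class and method names below are hypothetical placeholders for any static .NET method returning a double[] of the requested length:

library(rClr)
n <- 1e7

# 'MyBench.Helper' and 'GetLargeVector' are hypothetical placeholders.
t1 <- system.time(x <- clrCallStatic('MyBench.Helper', 'GetLargeVector', as.integer(n)))['elapsed']
cat(sprintf('Default converters: %.0f numerics in %.2f s (%.1f million/s)\n', n, t1, n / t1 / 1e6))

# Repeat with the R.NET-based conversion enabled to compare throughput.
setRDotNet(TRUE)
t2 <- system.time(x <- clrCallStatic('MyBench.Helper', 'GetLargeVector', as.integer(n)))['elapsed']
cat(sprintf('With setRDotNet(TRUE): %.2f s\n', t2))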

Do you have a self contained sample code illustrating the issue?

Cheers
Apr 29, 2014 at 1:12 PM
Edited Apr 29, 2014 at 2:29 PM
I've noticed that with every call I make, the loop gets slower. Ideally, I could just push this entire data structure into R somehow.
# Dict maps each symbol to an enumerator over arrays of "RBar" objects
dict = clrCallStatic('MyAssebmly.R', 'loadDailyDatasForR', symbols, as.Date(startDate), as.Date(endDate))

bars = data.frame()   # accumulator for all symbols

for (i in 1:length(symbols))
{
  if (clrCall(dict, 'ContainsKey', symbols[i]))
  {
    iter = clrCall(dict, 'get_Item', symbols[i])
    # walk the enumerator one bar at a time
    while (clrCall(iter, 'MoveNext'))
    {
      dat = clrGet(iter, 'Current')
      dt = clrGet(dat, 'EndTime')
      open = clrGet(dat, 'Open')
      high = clrGet(dat, 'High')
      low = clrGet(dat, 'Low')
      close = clrGet(dat, 'Close')
      volume = clrGet(dat, 'Volume')
      df = data.frame(Symbol=symbols[i], EndTime=dt, Open=open, High=high, Low=low, Close=close, Volume=volume)
      bars = rbind(bars, df)   # grows the data frame on every iteration
    }
    cat(paste('Loaded: ', symbols[i], '\n'))
  }
}
So the solution seems to be to put everything into array form on the native side and return an object which contains the relevant fields.

E.g.
type MultipleRBars = { Symbol:string[]; EndTime:DateTime[]; Open:float[]; High:float[]; Low:float[]; Close:float[]; Volume:float[] }
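On the R side, consuming such an object might look like the sketch below. The loadDailyBarsAsArrays method name is hypothetical (the class name is reused from the snippet above), and it assumes rClr converts the numeric and string arrays into R vectors (and DateTime[] into a date-time vector), so each clrGet() transfers a whole column at once instead of one value per bar:

res = clrCallStatic('MyAssebmly.R', 'loadDailyBarsAsArrays', symbols, as.Date(startDate), as.Date(endDate))

# One clrGet() per field rather than one per value.
bars = data.frame(
  Symbol  = clrGet(res, 'Symbol'),
  EndTime = clrGet(res, 'EndTime'),
  Open    = clrGet(res, 'Open'),
  High    = clrGet(res, 'High'),
  Low     = clrGet(res, 'Low'),
  Close   = clrGet(res, 'Close'),
  Volume  = clrGet(res, 'Volume')
)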
Apr 29, 2014 at 11:28 PM
Hi,

You are right that passing longer arrays is more efficient than iterating through items; the calls to clrGet() are necessarily costly.

However, I am not sure the slowdown in your loop is caused by rClr; rbind(bars, df) -> bars is probably the cause, since it copies an increasingly large data frame on every iteration.
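A common way around this is to accumulate the pieces in a list and bind them once at the end; a minimal sketch relative to your loop:

# Minimal sketch: accumulate pieces in a list, then bind once.
pieces <- list()
# ... inside the loop, instead of bars <- rbind(bars, df):
pieces[[length(pieces) + 1]] <- df
# ... after the loop:
bars <- do.call(rbind, pieces)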

I am working on an update of the package, using the latest R.NET as a dependency. It should not be too long before I can upload it.
Jul 30, 2014 at 11:32 PM
Eagerly looking forward to the update of the package!