Polymorphism in the vector subset Rcpp

Here is one R method that I would like to translate to C ++ to speed things up.

setMethod("[[", signature=signature(x="ncdfFlowSet"),
               definition=function(x, i, j, use.exprs = TRUE, ...)
{
  #subset by j
  if(!missing(j)){
    if(is.character(j)){
      j <- match(j, localChNames)
      if(any(is.na(j)))
        stop("subscript out of bounds")
    }  

    fr@parameters <- fr@parameters[j, , drop = FALSE]
    localChNames <- localChNames[j]
  }

  #other stuff

}) 

      

Kevin's good work on the vector subset makes my life a lot easier for this subsetj

    // [[Rcpp::export]]
    Rcpp::S4 readFrame(Rcpp::S4 x
                        , std::string sampleName
                        , Rcpp::RObject j_obj
                        , bool useExpr
                        )
    {
        Rcpp::Environment frEnv = x.slot("frames");
        Rcpp::S4 frObj = frEnv.get(sampleName);
        Rcpp::S4 fr = Rcpp::clone(frObj);

          //get local channel names
          Rcpp::StringVector colnames = x.slot("colnames");

          Rcpp::StringVector ch_selected;
         /*
          * subset by j if applicable
          */
         int j_type = j_obj.sexp_type();
         //creating j index used for subsetting colnames and pdata
         Rcpp::IntegerVector j_indx;

         if(j_type == STRSXP)//when character vector
         {
             ch_selected = Rcpp::StringVector(j_obj.get__());
             unsigned nCol = ch_selected.size();
             j_indx = Rcpp::IntegerVector(nCol);
             //match ch_selected to colnames
            for(unsigned i = 0 ; i < nCol; i ++)
            {
                const Rcpp::internal::string_proxy<STRSXP> &thisCh = ch_selected(i);
                Rcpp::StringVector::iterator match_id = std::find(colnames.begin(), colnames.end(), thisCh);
                if(match_id == colnames.end()){
                    std::string strCh = Rcpp::as<std::string>(thisCh);
                    Rcpp::stop("j subscript out of bounds: " + strCh);
                }else
                {
                    j_indx(i) = match_id - colnames.begin();
                }
            }
         }
         else if(j_type == NILSXP)//j is set to NULL in R when not supplied
         {
             ch_selected = colnames;
         }
         else if(j_type == LGLSXP)
         {
             Rcpp::LogicalVector j_val(j_obj.get__());
             ch_selected = colnames[j_val];
             #to convert numeric indices to integer
         }
         else if(j_type == INTSXP)
         {
             Rcpp::IntegerVector j_val(j_obj.get__());
             j_indx = j_val - 1; //convert to 0-based index
             ch_selected = colnames[j_indx];
         }
         else if(j_type == REALSXP)
         {
             Rcpp::NumericVector j_val(j_obj.get__());
             #to convert numeric indices to integer
         }
         else
             Rcpp::stop("unsupported j expression!");
        /*
         * subset annotationDataFrame (a data frame)
         * 
         */
         if(j_type != NILSXP)
         {
            Rcpp::S4 pheno = fr.slot("parameters");
            Rcpp::DataFrame pData = pheno.slot("data");

            Rcpp::CharacterVector pd_name = pData["name"];
            Rcpp::CharacterVector pd_desc = pData["desc"];
            Rcpp::NumericVector pd_range = pData["range"];
            Rcpp::NumericVector pd_minRange = pData["minRange"];
            Rcpp::NumericVector pd_maxRange = pData["maxRange"];

            Rcpp::DataFrame plist = Rcpp::DataFrame::create(Rcpp::Named("name") = pd_name[j_indx]
                                                        ,Rcpp::Named("desc") = pd_desc[j_indx]
                                                        ,Rcpp::Named("range") = pd_range[j_indx]
                                                        ,Rcpp::Named("minRange") = pd_minRange[j_indx]
                                                        ,Rcpp::Named("maxRange") = pd_maxRange[j_indx]
                                                        );
            pheno.slot("data") = plist;
         }

      

However, j

indexing in R

usually allows different types of input ( character

, logical

or numeric

). I wonder if the same mechanism exists polymorphic

(possibly via an abstract vector pointer / reference) so that the redundant code (simply due to a different Rcpp :: ** Vector type) for [-subsetting

on data.frame

later can be avoided.

+3


source to share


1 answer


We usually advocate separating logic into a submit step and a template function step. Therefore, you should solve your problem something like this:

#include <Rcpp.h>
using namespace Rcpp;

template <typename T>
SEXP readFrame(Rcpp::S4 x, std::string sampleName, T const& j, bool useExpr) { 
    // use the typed 'j' expression
}

// [[Rcpp::export(subset)]]
SEXP readFrame_dispatch(Rcpp::S4 x, std::string sampleName, SEXP j, bool useExpr) 
    switch (TYPEOF(j)) {
    case INTSXP: return readFrame<IntegerVector>(x, sampleName, j, useExpr);
    case REALSXP: return readFrame<NumericVector>(x, sampleName, j, useExpr);
    case STRSXP: return readFrame<CharacterVector>(x, sampleName, j, useExpr);
    case LGLSXP: return readFrame<LogicalVector>(x, sampleName, j, useExpr);
    default: stop("Unsupported SEXP type");
    }
    return R_NilValue;
}

      

One of the design goals in Rcpp is to avoid runtime polymorphism whenever possible, for speed reasons - almost all polymorphism is done statically, and runtime lookups should ideally only be done once (unless we have to call back to R for some routines).



The dispatch code is a bit ugly and mechanical, but it allows for this "style" of programming. The code becomes much more readable if the "dispatch" is also separated from the "implementation", since you can hide the dispatch omission in one place.

I'm really wondering if there is macro magic that could reduce code duplication in the code for submitting this form, though ...

+5


source







All Articles