How many vectors can you add to DataFrame::create(vec1, vec2, ...)?

I am creating a DataFrame to store parsed HAProxy HTTP log files, which have quite a lot of fields (25+).

If I add more than 20 vectors (one for each field), I get a compilation error:

no matching function call to 'create'

      

Create method:

    return DataFrame::create(
      _["clientIp"]     = clientIp,
      _["clientPort"]   = clientPort,
      _["acceptDate"]   = acceptDate,
      _["frontendName"] = frontendName,
      _["backendName"]  = backendName,
      _["serverName"]   = serverName,
      _["tq"]           = tq,
      _["tw"]           = tw,
      _["tc"]           = tc,
      _["tr"]           = tr,
      _["tt"]           = tt,
      _["status_code"]  = statusCode,
      _["bytes_read"]   = bytesRead,

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedRequestCookie"]   = capturedRequestCookie,
#endif     

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedResponseCookie"]   = capturedResponseCookie,
#endif    

      _["terminationState"] = terminationState,
      _["actconn"]          = actconn,
      _["feconn"]           = feconn,
      _["beconn"]           = beconn,
      _["srv_conn"]         = srvConn,
      _["retries"]          = retries,
      _["serverQueue"]      = serverQueue,
      _["backendQueue"]     = backendQueue 
    );

      

Questions

  • Have I hit a hard limit?
  • Is there a workaround that allows me to add more than 20 vectors to the DataFrame?


2 answers


Yes, you are hitting a hard cap in Rcpp. It stems from the C++98 standard, which requires explicit code bloat to support "variadic" arguments. Essentially, a separate overload of create must be generated for each number of arguments it accepts, and to avoid choking the compiler, Rcpp simply provides up to 20.
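To illustrate why the cap exists, here is a minimal standalone sketch (plain C++11, no Rcpp). The names Column, make_frame98 and make_frame11 are hypothetical and are not part of Rcpp; they only contrast the hand-written-overload style with a variadic template.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a named column; this is NOT an Rcpp type.
struct Column {
    std::string name;
    std::vector<double> values;
};

// C++98 style: one hand-written (or macro-generated) overload per
// argument count. A library has to stop at some number of overloads.
inline std::vector<Column> make_frame98(const Column& a) {
    std::vector<Column> out;
    out.push_back(a);
    return out;
}

inline std::vector<Column> make_frame98(const Column& a, const Column& b) {
    std::vector<Column> out;
    out.push_back(a);
    out.push_back(b);
    return out;
}
// A third column would need yet another overload, and so on.

// C++11 style: a single variadic template accepts any number of columns.
template <typename... Cols>
std::vector<Column> make_frame11(const Cols&... cols) {
    return std::vector<Column>{cols...};
}
```

Calling make_frame98 with three columns fails to compile because no matching overload exists, which is the same kind of error the question reports, while make_frame11 accepts any number of arguments.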

A workaround is to use a 'builder' class to which you add elements sequentially, converting to a DataFrame at the end. Here is a simple example of such a class: we create a ListBuilder object and successively add new columns to it. Try running Rcpp::sourceCpp() on this file to see the result.



#include <Rcpp.h>
using namespace Rcpp;

class ListBuilder {

public:

   ListBuilder() {};
   ~ListBuilder() {};

   inline ListBuilder& add(std::string const& name, SEXP x) {
      names.push_back(name);

      // NOTE: we need to protect the SEXPs we pass in; there is
      // probably a nicer way to handle this but ...
      elements.push_back(PROTECT(x));

      return *this;
   }

   inline operator List() const {
      List result(elements.size());
      for (size_t i = 0; i < elements.size(); ++i) {
         result[i] = elements[i];
      }
      result.attr("names") = wrap(names);
      UNPROTECT(elements.size());
      return result;
   }

   inline operator DataFrame() const {
      List result = static_cast<List>(*this);
      result.attr("class") = "data.frame";
      result.attr("row.names") = IntegerVector::create(NA_INTEGER, XLENGTH(elements[0]));
      return result;
   }

private:

   std::vector<std::string> names;
   std::vector<SEXP> elements;

   ListBuilder(ListBuilder const&) {}; // not safe to copy

};

// [[Rcpp::export]]
DataFrame test_builder(SEXP x, SEXP y, SEXP z) {
   return ListBuilder()
      .add("foo", x)
      .add("bar", y)
      .add("baz", z);
}

/*** R
test_builder(1:5, letters[1:5], rnorm(5))
*/

      

PS: With Rcpp11 we have variadic templates, and hence the limit is removed.



The other common approach with Rcpp is to simply use an outer list containing as many DataFrame objects as you need, with each one limited to the number of elements provided by the old-school macro expansion / repetition in the corresponding header.

In (untested) code:



Rcpp::DataFrame a = Rcpp::DataFrame::create(/* ... */);
Rcpp::DataFrame b = Rcpp::DataFrame::create(/* ... */);
Rcpp::DataFrame c = Rcpp::DataFrame::create(/* ... */);

return Rcpp::List::create(Rcpp::Named("a") = a,
                          Rcpp::Named("b") = b,
                          Rcpp::Named("c") = c);

      
