How many vectors can you add to `DataFrame::create(vec1, vec2, ...)`?
I am creating a DataFrame to store parsed HAProxy HTTP log files, which have quite a lot of fields (25+).
If I add more than 20 vectors (one for each field), I get a compilation error:
no matching function call to 'create'
The `create` call:
```cpp
return DataFrame::create(
    _["clientIp"] = clientIp,
    _["clientPort"] = clientPort,
    _["acceptDate"] = acceptDate,
    _["frontendName"] = frontendName,
    _["backendName"] = backendName,
    _["serverName"] = serverName,
    _["tq"] = tq,
    _["tw"] = tw,
    _["tc"] = tc,
    _["tr"] = tr,
    _["tt"] = tt,
    _["status_code"] = statusCode,
    _["bytes_read"] = bytesRead,
#if CAPTURED_REQUEST_COOKIE_FIELD == 1
    _["capturedRequestCookie"] = capturedRequestCookie,
#endif
#if CAPTURED_REQUEST_COOKIE_FIELD == 1
    _["capturedResponseCookie"] = capturedResponseCookie,
#endif
    _["terminationState"] = terminationState,
    _["actconn"] = actconn,
    _["feconn"] = feconn,
    _["beconn"] = beconn,
    _["srv_conn"] = srvConn,
    _["retries"] = retries,
    _["serverQueue"] = serverQueue,
    _["backendQueue"] = backendQueue
);
```
Questions
- Have I hit a hard limit?
- Is there a workaround that allows me to add more than 20 vectors to the data frame?
Yes, you are hitting a hard limit in Rcpp. It comes from the C++98 standard, which has no variadic templates, so supporting "variadic" arguments requires explicit code bloat: a separate overload of `create` has to be generated for every number of arguments it accepts, and to avoid choking the compiler, Rcpp simply provides overloads for up to 20.
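To make the overload point concrete, here is a toy sketch of the C++98 pattern. It is not Rcpp's actual code: `Column`, `Frame` and this `create` are hypothetical stand-ins. Each argument count needs its own overload, so the accepted arities are fixed by how many overloads were written. Only three exist here, so a four-column call would fail to compile with a "no matching function" error, much like the one in the question.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a named column (not an Rcpp type).
struct Column {
    std::string name;
    std::vector<double> values;
};

typedef std::vector<Column> Frame;

// C++98 has no variadic templates, so every arity is a separate,
// hand-written (or generated) overload. Only 1..3 columns are supported.
inline Frame create(const Column& c1) {
    Frame f;
    f.push_back(c1);
    return f;
}

inline Frame create(const Column& c1, const Column& c2) {
    Frame f = create(c1);
    f.push_back(c2);
    return f;
}

inline Frame create(const Column& c1, const Column& c2, const Column& c3) {
    Frame f = create(c1, c2);
    f.push_back(c3);
    return f;
}
```

Calling `create(c1, c2, c3)` works; a fourth argument will not compile until someone writes (or generates) a four-argument overload, which is exactly the kind of bloat Rcpp stops at 20.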
A workaround is to use a 'builder' class: you add elements one at a time and then convert to a `DataFrame` at the end. Here is a simple example of such a class: we create a `ListBuilder` object and successively `add` new columns to it. Try running `Rcpp::sourceCpp()` on this file to see the result.
```cpp
#include <Rcpp.h>
using namespace Rcpp;

class ListBuilder {

public:

    ListBuilder() {};
    ~ListBuilder() {};

    inline ListBuilder& add(std::string const& name, SEXP x) {
        names.push_back(name);

        // NOTE: we need to protect the SEXPs we pass in; there is
        // probably a nicer way to handle this but ...
        elements.push_back(PROTECT(x));

        return *this;
    }

    inline operator List() const {
        List result(elements.size());
        for (size_t i = 0; i < elements.size(); ++i) {
            result[i] = elements[i];
        }
        result.attr("names") = wrap(names);
        UNPROTECT(elements.size());
        return result;
    }

    inline operator DataFrame() const {
        List result = static_cast<List>(*this);
        result.attr("class") = "data.frame";
        result.attr("row.names") = IntegerVector::create(NA_INTEGER, XLENGTH(elements[0]));
        return result;
    }

private:

    std::vector<std::string> names;
    std::vector<SEXP> elements;

    ListBuilder(ListBuilder const&) {}; // not safe to copy

};

// [[Rcpp::export]]
DataFrame test_builder(SEXP x, SEXP y, SEXP z) {
    return ListBuilder()
        .add("foo", x)
        .add("bar", y)
        .add("baz", z);
}

/*** R
test_builder(1:5, letters[1:5], rnorm(5))
*/
```
PS: With Rcpp11 we have variadic functions, and hence the limit is removed.
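For comparison, a C++11 variadic template covers every argument count with a single definition. This is again a toy sketch with hypothetical `Column`/`Frame` types, not Rcpp11's real implementation; it only illustrates why variadic templates remove the fixed cap.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a named column (not an Rcpp/Rcpp11 type).
struct Column {
    std::string name;
    std::vector<double> values;
};

using Frame = std::vector<Column>;

// Base case: no columns left to append.
inline void append_columns(Frame&) {}

// Recursive case: append the first column, then recurse on the rest.
template <typename... Rest>
void append_columns(Frame& f, Column first, Rest&&... rest) {
    f.push_back(std::move(first));
    append_columns(f, std::forward<Rest>(rest)...);
}

// One template handles any number of columns: no per-arity overloads.
template <typename... Cols>
Frame create(Cols&&... cols) {
    Frame f;
    f.reserve(sizeof...(Cols));
    append_columns(f, std::forward<Cols>(cols)...);
    return f;
}
```

The compiler instantiates exactly the arities that are actually used, so a 23-column call like the one in the question needs no extra hand-written code.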
Another common approach with Rcpp is simply to use an outer list containing as many DataFrame objects as needed, each limited to the number of elements provided by the old-school macro expansion / repetition in the appropriate header.
In (untested) code:
```cpp
// Each create() call stays within the 20-argument limit.
Rcpp::DataFrame a = Rcpp::DataFrame::create(/* ... */);
Rcpp::DataFrame b = Rcpp::DataFrame::create(/* ... */);
Rcpp::DataFrame c = Rcpp::DataFrame::create(/* ... */);
return Rcpp::List::create(Rcpp::Named("a") = a,
                          Rcpp::Named("b") = b,
                          Rcpp::Named("c") = c);
```