Varnish C VRT variables / functions

I am starting to pick up Varnish and have come across references to VRT functions in C code in our configuration (and in examples on the net) for which I cannot find documentation (I realise my C knowledge is non-existent). This is the closest I can find, but these are just prototypes: http://fossies.org/dox/varnish-4.0.2/vrt__obj_8h.html#a7b48e87e48beb191015eedf37489a290

So, here's the example we're using (and which seems to be copied from the net as I've found this many times):

C{
  #include <ctype.h>
  static void strtolower(char *c) {
    for (; *c; c++) {
      if (isupper(*c)) {
        *c = tolower(*c);
      }
    }
  }
}C

sub vcl_recv {
  ...stuff....
  if (req.url ~ "<condition>" && (<another if condition>)) {
    C{
      strtolower((char *)VRT_r_req_url(sp));
    }C
  }
}

      

So my questions are:

  • What is sp? Where does it come from? It is not defined anywhere and I cannot find anything about it.
  • What does VRT_r_req_url do? Why is it prefixed with VRT_, and what is r (I see there are also VRT_l_ functions)? What is the structure from which these functions retrieve their data?
  • Do all VRT functions parallel the variables, such as req.url, that are available outside the C block?
  • Is there any documentation somewhere that says what all of this does? For example, I've seen this several times:

    sub detectmobile {
      C{
        VRT_SetHdr(sp, HDR_BEREQ, "\020X-Varnish-TeraWurfl:", "no1", vrt_magic_string_end);
      }C
     }
    
          

    So what are HDR_BEREQ and vrt_magic_string_end?



1 answer


This will be a rather long answer, because there is a fair bit to say about your question. First, some nits about the C code in your VCL:

  • The strtolower implementation is probably unnecessary; the std vmod has a std.tolower function. If you are using Varnish 3 you should use that instead. (However, the presence of this code suggests that you may be on Varnish 2, so who knows?)
  • Your call to VRT_SetHdr seems unnecessary. I don't see any difference between it and set bereq.http.X-Varnish-TeraWurfl = "no1";

Some of my answers may be imprecise because it is not clear which version of Varnish you are using, but I will make my best guess.

Now to answer your questions:

  • What is sp? Where does it come from? It is not defined anywhere and I cannot find anything about it.

sp is idiomatic in Varnish for a pointer to a session. It points to a struct sess, which contains some context about the request being executed. Depending on which version of Varnish you are using, it may hold more or less context, so its scope is hard to pin down. In Varnish 2, the session contains everything from the workspace to the request state (and more). Varnish 4 has broken it up significantly.

I am assuming you are using Varnish 2 or Varnish 3. In Varnish 4 you would be passing around something called ctx.

Anyway, from a configuration standpoint, the only thing you really need to know about sp is that it is always the first argument to any VRT function.

  • What does VRT_r_req_url do? Why is it prefixed with VRT_, and what is r (I see there are also VRT_l_ functions)? What is the structure from which these functions retrieve their data?

VRT stands for VCL RunTime. It is a set of functions implemented inside the Varnish binary itself. The function signatures and some opaque structures are exposed to VCL through a header file. The VCL compiler uses this header file, along with the C code it generates from your VCL, to build a shared object that can be loaded into Varnish. In addition, there is a Tcl script (Python in Varnish 4) that binds the various VCL builtins and variables to VRT functions.

r and l mean right and left, which refers to where the variable appears in an expression. Since VCL does not allow any "complex" expressions such as addition or subtraction (it is certainly nowhere near Turing complete, unless you set max_restarts to an unlimited value), there are really only two places a variable can appear: on the right-hand side or on the left. For example:

set req.url = req.url + "/";

      

will compile to

VRT_l_req_url(sp, VRT_r_req_url(sp), "/", vrt_magic_string_end);

      

Using req.url on the left-hand side makes the compiler emit a call to VRT_l_req_url, while using it on the right-hand side makes it emit a call to VRT_r_req_url.

An easier way to think about it might be that l means "set" and r means "get" (or "read" if you prefer). But it really means left and right.

To tie this back to your code:

strtolower((char *)VRT_r_req_url(sp));

      

VRT_r_req_url returns a const char * representing the value of req.url. This pointer is cast to char * to remove the const qualifier. (This cast is a bug in your configuration.) The resulting pointer is passed to strtolower, which then lowercases the string in place.

This is a mistake for several reasons. VRT_r_req_url gave you a const char * back, so you really shouldn't modify it. I don't think it will break anything, but it violates the API contract you are given. Also, the way to write to req.url is through the VRT_l_req_url interface, not directly as your strtolower implementation does. So the correct way to do this is either to use std.tolower from the std vmod, or to make a copy of the URL in the session workspace, modify that copy, and then store it back using VRT_l_req_url.

Additionally, the strtolower implementation does not need the if (isupper(*c)) check. This check only serves to confuse the CPU's branch predictor. Basically every implementation of tolower(3) uses a lookup table with no branching, and characters without a lowercase equivalent (such as digits) are returned unchanged.
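The claim about the redundant isupper check can be tried out with a small standalone C sketch. The helper names lower_guarded, lower_plain and same_result are made up for this illustration, and the casts to unsigned char are a defensive choice of this sketch rather than something taken from the configuration above.

```c
#include <ctype.h>
#include <string.h>

/* Variant with the guard, as in the question. */
static void lower_guarded(char *s)
{
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            *s = (char)tolower((unsigned char)*s);
    }
}

/* Variant without the guard: tolower() already returns its argument
 * unchanged when the character has no lowercase counterpart. */
static void lower_plain(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}

/* Returns 1 when both variants produce the same output for `in`.
 * Inputs longer than 127 bytes are truncated for this demo. */
static int same_result(const char *in)
{
    char a[128], b[128];

    strncpy(a, in, sizeof a - 1);
    a[sizeof a - 1] = '\0';
    strncpy(b, in, sizeof b - 1);
    b[sizeof b - 1] = '\0';

    lower_guarded(a);
    lower_plain(b);
    return strcmp(a, b) == 0;
}
```

Calling same_result("/Foo/BAR-42?x=Y") should report that the two variants agree, since digits and punctuation pass through tolower untouched.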

  • Do all VRT functions parallel the variables, such as req.url, that are available outside the C block?

Yes. Every VRT function implements either a VCL function call or a VCL variable. But I think you mean "inside the C block".

  • Is there any documentation somewhere that says what all of this does? For example, I've seen this several times:

    sub detectmobile {
      C{
        VRT_SetHdr(sp, HDR_BEREQ, "\020X-Varnish-TeraWurfl:", "no1", vrt_magic_string_end);
      }C
    }

      

So what are HDR_BEREQ and vrt_magic_string_end?



There is some documentation, but mostly this requires diving into the source. If you can tell me which version of Varnish you are using, I can point you to some files that may help you understand what is going on.

HDR_BEREQ tells VRT_SetHdr to use the specific workspace containing the request that will be sent to the backend server.

vrt_magic_string_end is a sentinel. Basically, every function that can take a string argument can also take a collection of strings to be concatenated together. Varnish handles this by making these functions variadic and passing them multiple char * arguments. Typically, if you have a function with a variable number of arguments that are all pointers, you simply use a NULL pointer to indicate that there are no more arguments. However, NULL is a perfectly valid value that may have to be passed to many of these functions. vrt_magic_string_end is a constant pointer value that cannot be confused with any other pointer, and is therefore a safe way to signal that no more arguments have been passed to the function.

Consider a call to log like:

log req.url + " " + req.http.Wookies + "ha!";

      

This call will be converted to:

VRT_log(sp, VRT_r_req_url(sp), " ", VRT_GetHdr(sp, HDR_REQ, "\10Wookies:"), "ha!", vrt_magic_string_end);

      

If we didn't use vrt_magic_string_end and instead relied on NULL, then whenever the Wookies header is missing (so the lookup yields NULL) we would never be able to tell that "ha!" also needs to be printed.
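The sentinel idea can be sketched in standalone C. Everything here is hypothetical: string_end plays the role of vrt_magic_string_end, join_fragments is an invented helper rather than a Varnish function, and rendering a NULL fragment as "(null)" is just a choice made for this sketch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel: the address of a private object cannot collide
 * with NULL or with any string a caller might pass. */
static const char end_marker = 0;
static const char *const string_end = &end_marker;

/* Joins the fragments into out (capacity cap, always NUL-terminated when
 * cap > 0) and returns the number of fragments seen. A NULL fragment,
 * such as a missing header, is rendered as "(null)" instead of ending
 * the argument list. */
static int join_fragments(char *out, size_t cap, const char *first, ...)
{
    va_list ap;
    const char *p;
    size_t used = 0;
    int count = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    va_start(ap, first);
    for (p = first; p != string_end; p = va_arg(ap, const char *)) {
        const char *s = (p != NULL) ? p : "(null)";
        size_t len = strlen(s);

        if (len > cap - 1 - used)
            len = cap - 1 - used;   /* truncate rather than overflow */
        memcpy(out + used, s, len);
        used += len;
        out[used] = '\0';
        count++;
    }
    va_end(ap);
    return count;
}
```

With a NULL pointer in the third position, a call such as join_fragments(buf, sizeof buf, url, sep, missing, tail, string_end) still reaches the final fragment. A NULL-terminated variant of the same loop would have stopped at the missing value.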

Anyway, that is a lot of answer. I hope it is helpful; please don't hesitate to ask if you have more questions.

Edit: follow-up questions

  • So, all the operations outside the C block are actually just calling C functions under the covers, and hence all functions and variables in the VCL correspond to a VRT function?

Yes, effectively. Technically speaking, VCL doesn't really have variables (or, arguably, functions either). It is not a programming language in the strict sense of the word. It is just a language for configuring the Varnish HTTP state machine.

  1. In VRT_SetHdr you specify a workspace, but in VRT_r_req_url you don't. Why? That is, do I call VRT_r_bereq_url to get the backend URL, or do I need to pass the workspace, something like VRT_r_req_url(sp, BEREQ) (or is that simply not a valid operation because you never look up the backend URL)?
  2. How do I know when I need to pass a workspace or not, and what are they all (i.e. HDR_BEREQ is obviously the backend request headers, but what other workspaces are there)?

The answers to these are related, so I'll answer both together.

This is because, for req.url, the place to look is built into the function name, and it comes down to some general weirdness in how the VCL compiler does things. In HTTP, the URL is not part of the headers, but Varnish appears to treat it as if it were. Likewise, things like beresp.ttl or req.hash_always_miss are not headers. When the bits we are looking at are not headers, they have to be implemented specially.

Indeed, finding where req.url is implemented is difficult because of a rather unfortunate use of a macro without any comments. You are interested in cache_vrt_var.c:64-95.

Either way, headers are dynamic, and you don't know which ones will be present (if any) until you receive a request. When you access headers through any of the interfaces for the different states (req.http.*, bereq.http.*, beresp.http.* and resp.http.*), they have to be looked up for that particular state. To reduce code duplication, any header read or set this way goes through VRT_GetHdr or VRT_SetHdr, respectively. Since these functions are shared across all VCL states, you pass them a hint telling them whether you are working on the req, bereq, beresp or resp headers. So, as you might imagine, there are HDR_REQ, HDR_BEREQ, HDR_BERESP and HDR_RESP.
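A toy model of this dispatch, as a standalone C sketch: one setter and one getter serve all four messages, and a hint selects which header set to touch. The names enum hdr_where, struct txn, set_hdr and get_hdr are hypothetical and do not reflect Varnish's real data structures. For brevity the sketch compares header names case-sensitively and stores the caller's pointers without copying.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hints mirroring HDR_REQ, HDR_BEREQ, HDR_BERESP, HDR_RESP. */
enum hdr_where { H_REQ, H_BEREQ, H_BERESP, H_RESP, H_COUNT };

#define MAX_HDRS 8

struct hdr     { const char *name; const char *value; };
struct hdr_set { struct hdr h[MAX_HDRS]; size_t n; };

/* One header set per message in this toy transaction. */
struct txn { struct hdr_set sets[H_COUNT]; };

/* Sets or replaces a header in the set chosen by `where`.
 * Returns 0 on success, -1 on a bad hint or a full set. */
static int set_hdr(struct txn *t, enum hdr_where where,
                   const char *name, const char *value)
{
    struct hdr_set *s;
    size_t i;

    if ((unsigned)where >= H_COUNT)
        return -1;
    s = &t->sets[where];
    for (i = 0; i < s->n; i++) {
        if (strcmp(s->h[i].name, name) == 0) {
            s->h[i].value = value;
            return 0;
        }
    }
    if (s->n == MAX_HDRS)
        return -1;
    s->h[s->n].name = name;
    s->h[s->n].value = value;
    s->n++;
    return 0;
}

/* Returns the header value, or NULL when that message lacks the header. */
static const char *get_hdr(const struct txn *t, enum hdr_where where,
                           const char *name)
{
    const struct hdr_set *s;
    size_t i;

    if ((unsigned)where >= H_COUNT)
        return NULL;
    s = &t->sets[where];
    for (i = 0; i < s->n; i++) {
        if (strcmp(s->h[i].name, name) == 0)
            return s->h[i].value;
    }
    return NULL;
}
```

Setting a header with H_BEREQ and then reading it back with H_REQ returns NULL in this model, which is the point of the hint: the same name can exist independently in each message.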

  • For learning purposes (ignoring that there is a vmod for this), could you please update the post to show a better way to implement the strtolower function, avoiding modifying const data through the dodgy cast and passing the wrong type to tolower?

To be honest, you can't really do this safely, because the VCL compiler is only given an opaque type for struct sess. Without creating a VMOD, the best you can do is:

#include <ctype.h>
static void
strtolower(char *c)
{
  while (*c != '\0') {
    /* convert first, then advance, so c is not read and modified in one expression */
    *c = tolower(*c);
    c++;
  }
}

      

If you compile with C99 support, you can do this:

C{
  #include <ctype.h>
  #include <string.h>
  static void
  strtolower(const char *c, char *obuf)
  {
    /* obuf must have room for strlen(c) + 1 bytes */
    while (*c != '\0') {
      *obuf++ = tolower(*c++);
    }
    *obuf = '\0';
  }
}C

...

if (req.url ~ "[A-Z]") {
  C{
    const char *url = VRT_r_req_url(sp);
    size_t urllen = strlen(url) + 1;
    char obuf[urllen];  /* C99 variable-length array on the stack */

    strtolower(url, obuf);
    VRT_l_req_url(sp, obuf, vrt_magic_string_end);
  }C
}

      

To be honest, this implementation is also rather poor. You run the risk of blowing the stack when you get a long URL, and you don't want to malloc inside VCL. This strtolower implementation does no bounds checking; it simply requires that you hand it a buffer large enough to hold the string. These are all solvable problems, but I don't want to spend much time on them because this is the wrong way to do it. This is exactly why VMODs were created.

You can see that the toupper / tolower implementation in the std vmod is significantly different: it reserves space from the workspace, copies into the workspace buffer, and then releases the space it didn't use.
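The reserve, copy, release pattern can be sketched in standalone C with a tiny bump allocator. struct arena, arena_reserve, arena_release and arena_tolower are hypothetical names for this illustration; Varnish's real workspace type and functions differ.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bump-allocator workspace. */
struct arena {
    char   buf[256];
    size_t used;      /* bytes handed out so far */
    size_t reserved;  /* size of the open reservation, 0 if none */
};

/* Reserve everything that is left; returns the number of bytes available. */
static size_t arena_reserve(struct arena *a)
{
    a->reserved = sizeof a->buf - a->used;
    return a->reserved;
}

/* Keep `keep` bytes of the open reservation and give the rest back. */
static void arena_release(struct arena *a, size_t keep)
{
    if (keep > a->reserved)
        keep = a->reserved;
    a->used += keep;
    a->reserved = 0;
}

/* Lowercases src into the arena. Returns the copy, or NULL if the
 * string plus its terminator does not fit in the remaining space. */
static const char *arena_tolower(struct arena *a, const char *src)
{
    size_t room = arena_reserve(a);
    size_t len = strlen(src);
    char *dst = a->buf + a->used;
    size_t i;

    if (len + 1 > room) {
        arena_release(a, 0);       /* nothing written, nothing kept */
        return NULL;
    }
    for (i = 0; i < len; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
    dst[len] = '\0';
    arena_release(a, len + 1);     /* keep only what was written */
    return dst;
}
```

Compared with the stack buffer above, the bounds check comes for free from the reservation size, and the result lives as long as the arena does rather than disappearing when the C block ends.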

(P.S. I removed the comments about undefined behavior because I realized that the tolower(3) man page passage I quoted only says that the input must be representable as an unsigned char. That wording is there because tolower(3) takes an int argument, so the value you pass could otherwise be out of range. So it was bad information and I struck it.)
