Dremel - repetition and definition levels

Reading the paper "Dremel: Interactive Analysis of Web-Scale Datasets", I came across the concepts of repetition level and definition level. I understand why the two are needed: to disambiguate occurrences, the format attaches a repetition level and a definition level to each value.

What is not clear to me is how the levels are computed.

As shown in the picture: [image]

The paper says:

Consider field Code in Figure 2. It occurs three times in r1. Occurrences 'en-us' and 'en' are inside the first Name, while 'en-gb' is in the third Name. To disambiguate these occurrences, we attach a repetition level to each value. It tells us at what repeated field in the field's path the value has repeated.

The field path Name.Language.Code contains two repeated fields, Name and Language. Hence, the repetition level of Code ranges between 0 and 2; level 0 denotes the start of a new record. Now suppose we are scanning record r1 top down. When we encounter 'en-us', we have not seen any repeated fields, i.e., the repetition level is 0. When we see 'en', field Language has repeated, so the repetition level is 2.
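To make the quoted rule concrete, here is a minimal sketch of how I read it. It is only an illustration, not the paper's algorithm: the function name `code_repetition_levels` and the nested-list encoding of a record are my own assumptions, and it covers only the Name.Language.Code path.

```python
def code_repetition_levels(names):
    """Return (code, r) pairs for the Name.Language.Code path.

    `names` is a hypothetical encoding of one record: a list of Name groups,
    each Name being a list of Code strings (one per Language).
    """
    if not names:
        # No Name at all: a single NULL marks the start of the record.
        return [(None, 0)]
    out = []
    for i, codes in enumerate(names):
        # The first Name in a record starts at level 0; a later Name means
        # the repeated field at depth 1 (Name) has repeated.
        r_name = 0 if i == 0 else 1
        if not codes:
            # A Name without any Language still emits a NULL placeholder.
            out.append((None, r_name))
            continue
        for j, code in enumerate(codes):
            # A second or later Language inside the same Name means the
            # repeated field at depth 2 (Language) has repeated.
            out.append((code, r_name if j == 0 else 2))
    return out


# r1 as in the paper: the first Name has two Languages, the second Name has
# none, and the third Name has one.
r1 = [["en-us", "en"], [], ["en-gb"]]
print(code_repetition_levels(r1))
# [('en-us', 0), ('en', 2), (None, 1), ('en-gb', 1)]
```

Under this reading, the level is the depth of the most recently repeated field in the path (1 for Name, 2 for Language), not a count of how many fields have repeated.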

I just cannot get my head around it. Name.Language.Code in r1 has the values en-us and en. The first has r = 0 because it is the first occurrence. Is the second r = 2 because two definitions were repeated (Language and Code)?

If the record were:

Name
    Language
       Code: en-us
Name 
    Language
        Code: en
Name
    Language
        Code: en-gb

      

would the (repetition, definition) levels be as follows?

0 2
1 2
2 2 

      


Definition levels. Each value of a field with path p, esp. every NULL, has a definition level specifying how many fields in p that could be undefined (because they are optional or repeated) are actually present in the record.
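A small sketch of how I read that definition, assuming the field kinds from the paper's Document schema (Name and Language repeated, Code required, Country optional). The `SCHEMA` table and the `definition_level` helper are hypothetical names introduced only for illustration.

```python
# Kind of each field along a path, outermost first (assumed from the paper's schema).
SCHEMA = {
    "Name.Language.Code": ["repeated", "repeated", "required"],
    "Name.Language.Country": ["repeated", "repeated", "optional"],
}


def definition_level(path, present_depth):
    """Count the optional/repeated fields among the first `present_depth`
    fields of `path`, i.e. those that could be undefined but are present."""
    kinds = SCHEMA[path][:present_depth]
    # Required fields never add to the level: they cannot be undefined.
    return sum(1 for kind in kinds if kind != "required")


# A Code value that is present: Name and Language count, Code itself does not.
print(definition_level("Name.Language.Code", 3))     # 2
# A Country value that is present: all three fields count.
print(definition_level("Name.Language.Country", 3))  # 3
# Country missing inside an existing Language: only Name and Language count.
print(definition_level("Name.Language.Country", 2))  # 2
# A Name with no Language at all: only Name counts.
print(definition_level("Name.Language.Code", 1))     # 1
```

Read this way, the count runs over the fields of the path itself (Name, Language, Code), not over the siblings under Name.Language, and a required field such as Code does not contribute.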

Why is the definition level 2, then? Doesn't the path Name.Language contain two fields, Code and Country, of which only one is optional/repeated?
