Defining Sequence Notation ... (A), (A>B) and (A)-(A>B)

Question


Hopefully a quick one ...

For the output of seqefsub(), please explain the notation used in each output record.

More specifically: what do the parentheses in (A) mean, what does the greater-than sign in (A>B) mean, and what does the hyphen in (A)-(A>B) mean?

Section 10 of the excellent User Guide contains examples, but I may have missed an unambiguous definition.

To take an example from the tutorial in Section 10.2: what is the conceptual difference between (Parent)-(Parent>Left) and a simple (Parent>Left)?

Thanks,

Dave

Update after Gilbert's comment ....

In trying to clarify what I may have missed on page 106 of the user manual, I think the explanation, or at least the confirmation, that I was looking for is something like the following. Apologies for any awkward presentation.

The context here is the results that seqefsub() prints to the console ....

(A)

This is the number of times that state A appears as the first state, not as any subsequent state. That is, it counts the number of times A appears in the first column. I assume I haven't missed a configuration option that counts the first and all subsequent states of that type. If there is one, please let me know.

(A>B)

This is the number of occurrences of an event (i.e., a state change) from A to B. This count refers to events anywhere in the sequence. I am assuming this is on a slightly different basis from the state count above, provided I have not accidentally misrepresented things. I note that limits can be set to output one or more occurrences.

(A)-(A>B)

This counts the number of times that state A occurs as the first state and the event A>B occurs anywhere in the sequence. This includes A>B events immediately after the first state, and may include other intervening states between the first A state and the A>B event.

Hopefully this helps, and I hope this is a correct set of statements (based on reading done after my original question).

Second update after Gilbert's comment asking for an example.

For a real dataset ... (with J and I in place of A and B)

> data   
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1   I  J  J  I  J  J  I  K  J   D   J
2   G  K  R  I  J  D  J  R  I   J   N
3   K  K  I  R  M  M  K  R  J   K   I
4   R  R  B  R  I  G  R  G  R   G   G
5   J  J  J  J  J  J  J  T  Z   J   Z
6   R  K  R  K  M  R  R  J  J   J   R
7   J  I  I  I  I  I  I  I  I   I   I
8   J  J  J  J  J  J  J  J  J   J   R
9   J  R  J  R  J  R  J  J  I   S   R
10  J  J  J  J  J  I  J  J  J   J   J
11  G  J  J  J  J  I  I  I  R   J   J
12  I  I  D  M  D  I  I  D  I   I   D
13  R  M  R  R  J  J  J  J  J   J   J

then

> dataseq <- seqdef(data)

> dataseqe <- seqecreate(dataseq)

> datasubseq <- seqefsub(dataseqe, pMinSupport = 0.05)

> datasubseq[1:10]

gives

    Subsequence   Support Count
1          (J) 0.3846154     5
2        (J>I) 0.3846154     5
3        (R>J) 0.3846154     5
4        (J>R) 0.3076923     4
5        (I>J) 0.2307692     3
6    (J)-(J>I) 0.2307692     3
7        (K>R) 0.2307692     3
8          (R) 0.2307692     3
9        (D>J) 0.1538462     2
10         (G) 0.1538462     2

So ....

1) The count of 5 for (J) applies only to J-states in the first column / entry and not to any subsequent J-states. There are 57 J-states in total.

2) The count of 5 for (J>I), i.e. events changing from state J to state I, is the total (for this limit option) wherever they occur.

3) The count of 3 for (J)-(J>I) covers a first-state J followed by a J>I event: row 7 (cols 1 and 2), row 9 (col 1 and cols 8 and 9) and finally row 10 (col 1 and cols 5 and 6); the last two cases have intermediate states and/or events between (J) and (J>I).
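As a cross-check, the three counts can be reproduced with a few lines of base R. This is only a sketch: it rebuilds the data from the printout above instead of calling TraMineR, and the helper name has_transition is made up for illustration. It counts sequences (rows) that contain the pattern, which is the reading that matches the Count column.

```r
# Rebuild the 13 sequences from the printout, one string per row.
rows <- c("IJJIJJIKJDJ", "GKRIJDJRIJN", "KKIRMMKRJKI", "RRBRIGRGRGG",
          "JJJJJJJTZJZ", "RKRKMRRJJJR", "JIIIIIIIIII", "JJJJJJJJJJR",
          "JRJRJRJJISR", "JJJJJIJJJJJ", "GJJJJIIIRJJ", "IIDMDIIDIID",
          "RMRRJJJJJJJ")
data <- as.data.frame(do.call(rbind, strsplit(rows, "")),
                      stringsAsFactors = FALSE)

# (J): sequences whose first state is J.
starts_J <- data[[1]] == "J"

# Hypothetical helper: TRUE if 'to' directly follows 'from' somewhere in a row.
has_transition <- function(states, from, to) {
  n <- length(states)
  any(states[-n] == from & states[-1] == to)
}

# (J>I): sequences containing at least one change from J to I.
has_JI <- apply(data, 1, has_transition, from = "J", to = "I")

sum(starts_J)                    # 5, matches (J)
sum(has_JI)                      # 5, matches (J>I)
sum(starts_J & has_JI)           # 3, matches (J)-(J>I)
unname(which(starts_J & has_JI)) # rows 7, 9 and 10
```

Rows 1 and 11 also contain a J>I change but do not start in J, which is why (J>I) has a count of 5 while (J)-(J>I) has a count of 3.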

To return to the question: is this the correct and expected behaviour, and the correct interpretation? If so, why are state counts done on a different basis from the counts of events / state changes?


traminer

Big old dave 29 nov. 14 at 21:24


1 answer

Gilbert · Accepted Answer · 2014-12-02T07:31:14+0000

In your example, the event sequences are derived from the state sequence object dataseq with seqecreate(dataseq). Since you don't provide the tevent argument, the default tevent = "transition" is used (see help(seqecreate)). With this value, events are defined as transitions from a state A to a state B and are labelled A>B. In addition, a specific event labelled A is associated with the start of the sequence, indicating the state at the beginning of the sequence. Thus, although the same symbol is used, this A is an event in an event sequence (the start event) and should not be confused with the A in state sequences, where it is a state.

The above applies to the option tevent="transition". With tevent="state", for example, the events are the starts of spells and are labelled A to indicate the start of a spell in state A. In this case, the event A can occur anywhere in the sequence, not only at the beginning.
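To make the two labelling schemes concrete, here is a small base-R sketch that derives both kinds of event labels for row 1 of the question's data. It only mimics the labelling described above; it is not TraMineR's implementation, it leaves out the time gaps that TraMineR shows between events, and the helper name to_string is made up for illustration.

```r
# One state sequence (row 1 of the question's data).
states <- c("I", "J", "J", "I", "J", "J", "I", "K", "J", "D", "J")

# Collapse runs of identical states into spells.
spells <- rle(states)$values

# "transition" style: a start event, then one A>B event per state change.
# (Assumes the sequence has at least one state change.)
transition_events <- c(spells[1],
                       paste0(spells[-length(spells)], ">", spells[-1]))

# "state" style: one event per spell start, labelled with the state.
state_events <- spells

# Hypothetical helper: print events in the (..)-(..) notation.
to_string <- function(events) paste0("(", events, ")", collapse = "-")

to_string(transition_events)
# "(I)-(I>J)-(J>I)-(I>J)-(J>I)-(I>K)-(K>J)-(J>D)-(D>J)"
to_string(state_events)
# "(I)-(J)-(I)-(J)-(I)-(K)-(J)-(D)-(J)"
```

In the first string a bare state label appears only once, at the start; in the second, (J) appears every time a spell in J begins.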

Now about the parentheses. They indicate transitions (or transactions), where a transition is defined as a set of simultaneous events that provoke a state change. For example:

(a,b) indicates that the two events a and b occur at the same time,

(A>C) means that we have a single event A>C at that time,

(a)-(b) denotes a sequence of length 2, where the event a precedes the event b.
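The reading of the parentheses and the hyphen can be sketched as a small matcher in base R. This is an illustration under stated assumptions, not how TraMineR searches for subsequences: an event sequence is represented as a list of transitions, each transition is a character vector of simultaneous events, and the function name contains_subseq is made up.

```r
# TRUE if 'pattern' occurs in 'sequence' in order, with gaps allowed.
# Each pattern step must be fully contained in a single transition.
contains_subseq <- function(sequence, pattern) {
  pos <- 1
  for (step in pattern) {
    # Advance to the next transition containing all events of this step.
    while (pos <= length(sequence) && !all(step %in% sequence[[pos]])) {
      pos <- pos + 1
    }
    if (pos > length(sequence)) return(FALSE)
    pos <- pos + 1  # the next step must occur strictly later
  }
  TRUE
}

# Row 9 of the question's data, written as "transition" events.
row9 <- list("J", "J>R", "R>J", "J>R", "R>J", "J>R", "R>J", "J>I", "I>S", "S>R")

contains_subseq(row9, list("J", "J>I"))      # (J)-(J>I): TRUE
contains_subseq(row9, list("J>I", "J"))      # (J>I)-(J): FALSE, wrong order
contains_subseq(row9, list(c("J", "J>R")))   # (J,J>R): FALSE, not simultaneous
```

The first call succeeds even though several other events lie between the start event J and the J>I event, which is the situation described for rows 9 and 10 in the question.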

Hope it helps.
