Can I use numeric features in a CRF model?

Is it possible, or helpful, to add numeric features to CRF models? For example, a token's position in the sequence.

I am using CRFsuite. It seems that all features are converted to strings, e.g. 'pos=0', 'pos=1', so they lose their meaning in terms of Euclidean distance.

Or should I use them to train another model, for example an SVM, and then build an ensemble with the CRF models?

+4




3 answers


I found out that CRFsuite does work with numeric features, at least according to this documentation:

  • {'string_key': float_weight, ...} dict, where keys are observed features and values are their weights;
  • {'string_key': bool, ...} dict; True is converted to 1.0 weight, False to 0.0;
  • {'string_key': 'string_value', ...} dict; this is the same as {'string_key=string_value': 1.0, ...};
  • ['string_key1', 'string_key2', ...] list; this is the same as {'string_key1': 1.0, 'string_key2': 1.0, ...};
  • {'string_prefix': {...}} dicts: the nested dict is processed and 'string_prefix' is prepended to each key;
  • {'string_prefix': [...]} dicts: the nested list is processed and 'string_prefix' is prepended to each key;
  • {'string_prefix': set([...])} dicts: the nested set is processed and 'string_prefix' is prepended to each key.
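The conversion rules in this list can be sketched in plain Python. This is only an illustration of the rules as described, not the library's actual code: the helper name `flatten_item` is hypothetical, and the ':' separator used between a prefix and a nested key is an assumption.

```python
def flatten_item(item, prefix=""):
    """Flatten one item into {attribute_name: float_value}.

    Illustrative sketch of the documented rules; the ':' prefix
    separator is an assumption, not taken from the documentation.
    """
    flat = {}
    if isinstance(item, (list, tuple, set, frozenset)):
        # A list or set of keys: each key gets the value 1.0.
        for key in item:
            flat[prefix + key] = 1.0
        return flat
    for key, value in item.items():
        name = prefix + key
        if isinstance(value, bool):
            # Check bool before numbers: bool is a subclass of int.
            flat[name] = 1.0 if value else 0.0
        elif isinstance(value, (int, float)):
            # Numeric values are kept as numbers.
            flat[name] = float(value)
        elif isinstance(value, str):
            # String values become a categorical 'key=value' indicator.
            flat[name + "=" + value] = 1.0
        else:
            # Nested dict, list, or set: the outer key becomes a prefix.
            flat.update(flatten_item(value, prefix=name + ":"))
    return flat


item = {"pos": 0.25, "is_title": True, "shape": "Xx", "prev": {"word": "the"}}
flat = flatten_item(item)
```

Under these assumptions, only the string value turns into a 'key=value' indicator; the float passed for 'pos' stays a number.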


As long as:

  1. I keep the input correctly formatted;
  2. I pass a float rather than the string form of a float;
  3. I normalize it.
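A minimal sketch of those three points for the position feature from the question. The function names and the feature names 'word' and 'pos_rel' are made up for illustration; the position is scaled to [0, 1] and passed as a float.

```python
def token_features(tokens, i):
    """Feature dict for tokens[i]; 'pos_rel' is a normalized float."""
    n = len(tokens)
    # Scale the position to [0, 1]; a single-token sequence gets 0.0.
    rel = i / (n - 1) if n > 1 else 0.0
    return {
        "word": tokens[i].lower(),  # string value: treated as categorical
        "pos_rel": rel,             # float value: kept numeric
        # Pitfall: passing str(rel) instead would create a categorical
        # 'pos_rel=0.5' style key and discard the magnitude.
    }


def sequence_features(tokens):
    """One feature dict per token, in sequence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


feats = sequence_features(["The", "cat", "sat"])
```

A list of dicts like `feats`, together with a label sequence of the same length, is the kind of input that the Python wrapper's `Trainer.append(xseq, yseq)` takes.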
+8




A CRF itself can use numeric features, and you should use them, but if your implementation converts them to strings (binary-encodes them with "one-hot encoding"), they may lose much of their value. I suggest looking for a "purer" CRF implementation that allows continuous variables.



An interesting fact is that a CRF at its core is simply a structured MaxEnt (logistic regression) model, which works in a continuous domain. The string encoding is actually a way to map a categorical value into that continuous domain, so your problem is really a result of CRFsuite being "over-engineered" and forgetting about the real capabilities of the CRF model.
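To make the difference concrete, here is a small sketch of a log-linear score for a single label, where each feature contributes weight × value. The weights are made up, and a real CRF also has per-label and transition weights that are left out here.

```python
def score(features, weights):
    """Linear score: sum of weight * value over the given features."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


# One-hot encoding: each position is its own indicator with its own weight.
one_hot_weights = {"pos=0": 0.9, "pos=1": 0.2}  # made-up weights
# Numeric encoding: a single weight is scaled by the feature value.
numeric_weights = {"pos": -0.5}                 # made-up weight

# A position never seen in training has no weight under one-hot encoding.
unseen_one_hot = score({"pos=7": 1.0}, one_hot_weights)
# With a numeric feature, the contribution grows with the value.
numeric_near = score({"pos": 1.0}, numeric_weights)
numeric_far = score({"pos": 7.0}, numeric_weights)
```

With one-hot encoding, 'pos=7' is unrelated to 'pos=6' or 'pos=8'; with a numeric value, nearby positions get nearby scores.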

+4




Just to clarify the answer above (it is correct, but it might confuse other readers, as it confused me until I tried it). This:

{'string_key': float_weight, ...} dict, where keys are observed features and values are their weights

could be written as

{'feature_template_name': feature_value, ...} dict, where keys are feature names and values are their values

i.e., in doing so you are not setting the CRF's weight for this feature_template, but the value of this feature. I prefer to call them feature templates that have feature values, which is clearer than just "features". The CRF then learns the weight associated with each of the possible feature_values for that feature_template.

0








