Remove field from event by pattern
So I am using the standard ELK stack to parse Apache access logs, which works well, but now I would like to parse the URL query parameters into fields using a kv filter so that I can write better queries.
My problem is that the application I am analyzing has dynamically generated "caching" options, resulting in tens of thousands of fields, each occurring only once. Elasticsearch has serious problems with this, and the fields are of no value to me, so I would like to remove them. Here are two sample requests:
GET /page?rand123PQY=ABC&other_var=something
GET /page?rand987ZDQ=DEF&other_var=something
In the example above, the parameters I want to remove are the ones starting with "rand". Currently my logstash.conf uses grok to extract fields from the access log and then kv to extract the query string parameters:
filter {
  grok {
    # Parse each Apache access log line into named fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  kv {
    # Split the query string on & and ?
    field_split => "&?"
  }
}
Is there a way to filter out any fields that match the pattern rand[A-Z0-9]*=[A-Z0-9]*? Most of the examples I have seen target fields by exact name, which I cannot use here. I did wonder about copying the query string into a new field, running kv on it, and then deleting that field. Would that work?
If the set of fields you are interested in is known and well-defined, you could set target for the kv filter, move the interesting fields to the top level of the message with a mutate filter, and then delete the field containing the nested key/value pairs. I think this is pretty much what you suggested at the end.
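A minimal sketch of that approach, assuming other_var from the sample requests is the only parameter worth keeping, and using params as an arbitrary name for the nested field:

filter {
  kv {
    field_split => "&?"
    # Collect all query parameters under one nested field
    # instead of spraying them across the top level
    target => "params"
  }
  mutate {
    # Promote the known, interesting parameter to the top level
    rename => { "[params][other_var]" => "other_var" }
  }
  mutate {
    # Then drop the nested field, taking the random keys with it
    remove_field => [ "params" ]
  }
}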
Alternatively, you can use a ruby filter:
filter {
  ruby {
    code => "
      # Iterate over all field names on the event and drop
      # every one whose name begins with 'rand'
      event.to_hash.keys.each { |k|
        if k.start_with?('rand')
          event.remove(k)
        end
      }
    "
  }
}
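The prefix check above is a bit looser than the pattern in your question; if that matters, the same filter can test each key against the full regex instead (a sketch, behavior otherwise unchanged):

filter {
  ruby {
    code => "
      event.to_hash.keys.each { |k|
        # Remove only keys that fully match rand[A-Z0-9]*
        event.remove(k) if k =~ /^rand[A-Z0-9]*$/
      }
    "
  }
}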