Parsing a JSON list in Logstash
I have JSON of the form
[
{
"foo":"bar"
}
]
I am trying to parse it with a json filter in Logstash, but it doesn't seem to work: the json filter cannot parse a JSON list directly. Can someone tell me about a workaround for this?
UPDATE
My logs
IP - - 0.000 0.000 [24/May/2015:06:51:13 +0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium+S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT%2B05%3A30&events=%5B%7B%22eV%22%3A%22com.olx.southasia%22%2C%22eC%22%3A%22appUpdate%22%2C%22eA%22%3A%22app_activated%22%2C%22eTz%22%3A%22GMT%2B05%3A30%22%2C%22eT%22%3A%221432386324909%22%2C%22eL%22%3A%22packageName%22%7D%5D * "-" "-" "-"
URL-decoded version of the above log line
IP - - 0.000 0.000 [24/May/2015:06:51:13 0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT+05:30&events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}] * "-" "-" "-"
Below is my config file for the specified logs.
filter {
urldecode{
field => "message"
}
grok {
match => ["message",'%{IP:clientip}%{GREEDYDATA} \[%{GREEDYDATA:timestamp}\] \*"%{WORD:method}%{GREEDYDATA}']
}
kv {
field_split => "&? "
}
json{
source => "events"
}
geoip {
source => "clientip"
}
}
I need to parse events i.e. events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}]
I assume you have the JSON in a file. You are correct: you cannot use the json filter directly. You will have to use a multiline codec and apply the json filter afterwards.
The following configuration works for this input. However, you might have to change it to properly separate your events, depending on your needs and the JSON format of your file.
Logstash config:
input {
file {
codec => multiline
{
pattern => "^\]" # Change to separate events
negate => true
what => previous
}
path => ["/absolute/path/to/your/json/file"]
start_position => "beginning"
sincedb_path => "/dev/null" # This is just for testing
}
}
filter {
mutate {
gsub => [ "message","\[",""]
gsub => [ "message","\n",""]
}
json { source => "message" }
}
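To see what the two gsub rules accomplish: the multiline codec hands the filter a message that still contains the opening bracket and the newlines. Here is the same transformation replayed in plain Ruby (the message value is an assumption based on the sample input above):

```ruby
require 'json'

# Roughly what the multiline codec emits for the sample input (assumption):
message = "[\n{\n\"foo\":\"bar\"\n}"

# The two gsub rules: drop the opening bracket and the newlines.
cleaned = message.gsub('[', '').gsub("\n", '')

# cleaned is now '{"foo":"bar"}', which the json filter can parse.
event = JSON.parse(cleaned)
```

Note that the closing `]` never reaches the filter because the multiline pattern `^\]` starts a new event on that line, which is why only the opening bracket needs a gsub.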
UPDATE
After your update, I think I found the problem. Apparently you are getting a _jsonparsefailure because of the square brackets. As a workaround, you can remove them manually. Add the following mutate filter after your kv and before your json filter:
mutate {
gsub => [ "events","\]",""]
gsub => [ "events","\[",""]
}
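A quick Ruby sketch of why this helps: the json filter chokes on the top-level array, but once the brackets are stripped, a single-element list becomes a plain object (using a shortened events value for illustration):

```ruby
require 'json'

events = '[{"eV":"com.olx.southasia","eC":"appUpdate"}]'  # shortened sample

# Equivalent of the two gsub rules above:
stripped = events.gsub(']', '').gsub('[', '')

# stripped is '{"eV":"com.olx.southasia","eC":"appUpdate"}' -- a parseable object
parsed = JSON.parse(stripped)
</antml_nonexistent>```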
UPDATE 2
Suppose your input looks like this:
[{"foo":"bar"},{"foo":"bar1"}]
Here are five options:
Option a) ugly gsub
An ugly workaround would be another gsub:
gsub => [ "event","\},\{",","]
But it removes the boundaries between the inner objects, so I guess you don't want to do that.
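To illustrate what gets lost, here is that gsub replayed in plain Ruby: the two objects are merged into one, and duplicate keys silently collapse to the last value:

```ruby
require 'json'

raw = '{"foo":"bar"},{"foo":"bar1"}'  # the "event" field after the brackets are stripped

# The ugly gsub: merge the two objects into one.
merged = raw.gsub('},{', ',')  # => '{"foo":"bar","foo":"bar1"}'

# The parser keeps only the last duplicate key; "bar" is gone.
parsed = JSON.parse(merged)
```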
Option b) split
A better approach might be to use a split filter:
split {
field => "event"
terminator => ","
}
mutate {
gsub => [ "event","\]",""]
gsub => [ "event","\[",""]
}
json{
source=> "event"
}
This will create several events. (The first with foo = bar, the second with foo = bar1.)
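In plain Ruby terms, the split/gsub/json chain does roughly the following to the sample value. Note this only splits cleanly when the inner objects contain no commas of their own:

```ruby
require 'json'

raw = '[{"foo":"bar"},{"foo":"bar1"}]'

# split filter with terminator ",": one Logstash event per fragment
fragments = raw.split(',')  # ['[{"foo":"bar"}', '{"foo":"bar1"}]']

# mutate/gsub and then the json filter, applied per event:
events = fragments.map { |f| JSON.parse(f.gsub(']', '').gsub('[', '')) }
```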
Option c) mutate split
You might want to have all the values in one Logstash event. You can use the mutate => split filter to generate an array and parse the JSON for each entry if it exists. Unfortunately, you will need a conditional for each entry, because Logstash does not support loops in its configuration.
mutate {
gsub => [ "event","\]",""]
gsub => [ "event","\[",""]
split => [ "event", "," ]
}
json{
source=> "event[0]"
target => "result[0]"
}
if [event][1] {
json{
source=> "event[1]"
target => "result[1]"
}
if [event][2] {
json{
source=> "event[2]"
target => "result[2]"
}
}
# You would have to specify more conditionals if you expect even more dictionaries
}
Option d) Ruby
As per your comment, I also tried the ruby approach. The following works (after your kv filter):
mutate {
gsub => [ "event","\]",""]
gsub => [ "event","\[",""]
}
ruby {
init => "require 'json'"
code => "
e = event['event'].split(',')
ary = Array.new
e.each do |x|
hash = JSON.parse(x)
hash.each do |key, value|
ary.push( { key => value } )
end
end
event['result'] = ary
"
}
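Outside of Logstash, the ruby block above boils down to the following (same caveat as the split option: split(',') assumes the inner objects are single-key, as in the example):

```ruby
require 'json'

raw = '{"foo":"bar"},{"foo":"bar1"}'  # the "event" field after the mutate above

ary = []
raw.split(',').each do |x|
  # Each fragment is a one-key object; push each pair as its own hash.
  JSON.parse(x).each { |key, value| ary.push({ key => value }) }
end
# ary == [{"foo"=>"bar"}, {"foo"=>"bar1"}], stored as event['result']
```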
Option e) Ruby
Use this approach after the kv filter (it does not need the mutate filter above):
ruby {
init => "require 'json'"
code => "
event['result'] = JSON.parse(event['event'])
"
}
It will parse an event like event=[{"name":"Alex","address":"NewYork"},{"name":"David","address":"NewJersey"}]
into
"result" => [
[0] {
"name" => "Alex",
"address" => "NewYork"
},
[1] {
"name" => "David",
"address" => "NewJersey"
}
]
Keep in mind that the kv filter splits on whitespace, so the values must not contain any. I hope you don't have any in your real inputs, do you?