Elasticsearch crawl and scroll - add to new index
Elasticsearch and newbie command-line programming.
I have Elasticsearch installed locally on my machine and want to pull documents from a server running a different ES version using the scan and scroll API, then add them to my index. I am having trouble figuring out how to string these API calls together.
Right now in the testing phase, I just pull a few documents from the server using the following code (which works):
http MY-OLD-ES.com:9200/INDEX/TYPE/_search?size=1000 | jq -c '.hits.hits[]' | while read x; do
    id=$(echo "$x" | jq -r ._id)
    index=$(echo "$x" | jq -r ._index)
    type=$(echo "$x" | jq -r ._type)
    doc=$(echo "$x" | jq ._source)
    http put "localhost:9200/junk-$index/$type/$id" <<<"$doc"
done
Any clues on how scanning and scrolling works? (Noob here, and a little confused.) So far, I know that I can scroll and get a scroll ID, but I don't know what to do with the scroll ID. If I call
http get 'http://MY-OLD-ES.com:9200/my_index/_search?scroll=1m&search_type=scan&size=10'
(quoted so the shell doesn't interpret the ? and &), I get the scroll ID back. Can its results be submitted and processed the same way? Also, I believe I would need a while loop to keep asking for the next batch. How am I supposed to do this?
Thanks!
The scan and scroll documentation explains this pretty clearly. After you receive the scroll_id (a long base64-encoded string), you pass it in the request body. With curl, the request will look something like this:
curl -XGET 'http://MY-OLD-ES.com:9200/_search/scroll?scroll=1m' -d '
c2Nhbjs1OzExODpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExOTpRNV9aY1VyUVM4U0
NMd2pjWlJ3YWlBOzExNjpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExNzpRNV9aY1Vy
UVM4U0NMd2pjWlJ3YWlBOzEyMDpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzE7dG90YW
xfaGl0czoxOw==
'
Note that while the first request, which opened the scroll, went to /my_index/_search, subsequent requests that read data go to /_search/scroll. Every time you call it with the ?scroll=1m query string, the timeout before the scroll is automatically closed is reset.
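For example, the scroll ID can be pulled out of the response with jq, just as the one-liner in the question extracts fields from hits (the response body below is a made-up sample, not output from a real cluster):

```shell
# Canned example of the kind of response the open-scroll request returns
# (sample data only; a real _scroll_id is much longer).
response='{"_scroll_id":"c2Nhbjs1OzE7dG90YWxfaGl0czoxOw==","hits":{"total":5,"hits":[]}}'

# Extract the scroll ID so it can be sent as the body of the next request.
scroll_id=$(printf '%s' "$response" | jq -r ._scroll_id)
echo "$scroll_id"    # prints c2Nhbjs1OzE7dG90YWxfaGl0czoxOw==
```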
There are two more things to be aware of:

- The size passed when opening the scroll is applied to each shard, so each query returns up to size multiplied by the number of shards in your index.
- Each request to /_search/scroll returns a new scroll_id, which you must pass on the next call to get the next batch of results. You can't just keep calling with the same scroll_id.
The scroll is complete when a scroll request returns no hits.
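Putting it all together, the whole copy could be sketched as the loop below. This is only a sketch under the question's assumptions: the old cluster address and the junk- prefix come from the question, curl and jq must be installed, and error handling is omitted.

```shell
# Sketch: scan-and-scroll from the old cluster into the local one.
# Assumes curl and jq; hosts and the "junk-" prefix follow the question.
reindex_via_scroll() {
    old="$1"    # e.g. http://MY-OLD-ES.com:9200
    new="$2"    # e.g. http://localhost:9200
    index="$3"

    # Open the scroll. In scan mode the first response carries no hits,
    # only the initial scroll_id.
    scroll_id=$(curl -s "$old/$index/_search?scroll=1m&search_type=scan&size=10" \
        | jq -r ._scroll_id)

    while true; do
        # Fetch the next batch, passing the current scroll_id as the body.
        page=$(curl -s -XGET "$old/_search/scroll?scroll=1m" -d "$scroll_id")

        # Every response returns a fresh scroll_id; keep it for the next call.
        scroll_id=$(printf '%s' "$page" | jq -r ._scroll_id)

        # Finished once a scroll request comes back with no hits.
        [ "$(printf '%s' "$page" | jq '.hits.hits | length')" -eq 0 ] && break

        # Reindex each hit locally, as in the one-liner from the question.
        printf '%s\n' "$page" | jq -c '.hits.hits[]' | while read -r x; do
            id=$(printf '%s' "$x" | jq -r ._id)
            type=$(printf '%s' "$x" | jq -r ._type)
            src=$(printf '%s' "$x" | jq -r ._index)
            printf '%s' "$x" | jq ._source \
                | curl -s -XPUT "$new/junk-$src/$type/$id" -d @- >/dev/null
        done
    done
}

# Usage: reindex_via_scroll http://MY-OLD-ES.com:9200 http://localhost:9200 my_index
```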