Elasticsearch crawl and scroll - add to new index

Elasticsearch and newbie command-line programming.

I have Elasticsearch installed locally on my machine and want to pull documents from a server running a different ES version using the scan and scroll API, then add them to my index. I am having trouble figuring out how to do this with the API.

Right now in the testing phase, I just pull a few documents from the server using the following code (which works):

   http MY-OLD-ES.com:9200/INDEX/TYPE/_search?size=1000 \
     | jq -c '.hits.hits[]' \
     | while read -r x; do
         id=$(echo "$x" | jq -r ._id)
         index=$(echo "$x" | jq -r ._index)
         type=$(echo "$x" | jq -r ._type)
         doc=$(echo "$x" | jq ._source)
         http put "localhost:9200/junk-$index/$type/$id" <<<"$doc"
       done

Any clues on how scanning and scrolling work? (I'm a noob and a little confused.) So far I know that I can open a scroll and get back a scroll ID, but I don't know what to do with the scroll ID. If I call

   http get 'http://MY-OLD-ES.com:9200/my_index/_search?scroll=1m&search_type=scan&size=10'

(quoting the URL so the shell doesn't treat & as a background operator), I get a scroll ID back. Can it be submitted and parsed the same way as above? I also believe I'd need a while loop to keep asking for more batches. How am I supposed to do this?

Thanks!



1 answer


The scan and scroll documentation explains this pretty clearly. After you receive the scroll_id (a long base64-encoded string), you pass it in the request body. With curl, the request will look something like this:

curl -XGET 'http://MY-OLD-ES.com:9200/_search/scroll?scroll=1m' -d '
c2Nhbjs1OzExODpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExOTpRNV9aY1VyUVM4U0 
NMd2pjWlJ3YWlBOzExNjpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExNzpRNV9aY1Vy
UVM4U0NMd2pjWlJ3YWlBOzEyMDpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzE7dG90YW
xfaGl0czoxOw==
'


Note that while the first request, which opens the scroll, goes to /my_index/_search, the subsequent requests that read the data go to /_search/scroll. Each of those calls passes ?scroll=1m in the query string, which renews the timeout after which the scroll is automatically closed.
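Concretely, you can capture the _scroll_id from the opening request with jq and feed it back in the body of the next one. A minimal sketch, using the host and index placeholders from the question (adjust both to your setup):

```shell
# Placeholder host/index from the question -- substitute your own.
ES="http://MY-OLD-ES.com:9200"

open_scroll() {
  # Quote the URL: an unquoted '&' would background the command in the shell.
  # With search_type=scan the first response carries no hits, only _scroll_id.
  curl -s "$ES/my_index/_search?scroll=1m&search_type=scan&size=10" \
    | jq -r ._scroll_id
}

fetch_batch() {
  # The scroll id goes in the request body, not in the URL path.
  curl -s -XGET "$ES/_search/scroll?scroll=1m" -d "$1"
}

# scroll_id=$(open_scroll)      # uncomment to run against a real cluster
# fetch_batch "$scroll_id"
```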



There are two more things to be aware of:

  • The size passed when opening the scroll is applied per shard, so each query returns up to size multiplied by the number of shards in your index.
  • Each /_search/scroll request returns a new scroll_id, which you must pass on the next call to get the next batch of results. You can't keep calling with the same scroll_id.

The scroll is finished when a scroll request returns no hits.
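Putting that together with the reindex loop from the question, the whole thing can be sketched roughly like this. This assumes the same ES 1.x-era scan/scroll API; the host, index, and the local "junk-" target are placeholders, not tested values:

```shell
#!/bin/sh
# Sketch: pull every document via scan & scroll and PUT it into a local index.
ES_HOST="http://MY-OLD-ES.com:9200"   # placeholder remote cluster
INDEX="my_index"                      # placeholder source index

scan_and_scroll_reindex() {
  # Open the scroll. With search_type=scan the first response has no hits,
  # only a _scroll_id. Quote the URL so '&' is not interpreted by the shell.
  scroll_id=$(curl -s "$ES_HOST/$INDEX/_search?scroll=1m&search_type=scan&size=10" \
    | jq -r ._scroll_id)

  while :; do
    # The id goes in the request body; ?scroll=1m renews the timeout each call.
    page=$(curl -s -XGET "$ES_HOST/_search/scroll?scroll=1m" -d "$scroll_id")

    # Every response carries a fresh _scroll_id; always use the latest one.
    scroll_id=$(echo "$page" | jq -r ._scroll_id)

    count=$(echo "$page" | jq '.hits.hits | length' 2>/dev/null)
    [ "${count:-0}" -eq 0 ] && break   # no hits left: the scroll is exhausted

    # Same per-document handling as the question's one-liner.
    echo "$page" | jq -c '.hits.hits[]' | while read -r hit; do
      id=$(echo "$hit" | jq -r ._id)
      type=$(echo "$hit" | jq -r ._type)
      doc=$(echo "$hit" | jq ._source)
      curl -s -XPUT "localhost:9200/junk-$INDEX/$type/$id" -d "$doc" >/dev/null
    done
  done
}

# scan_and_scroll_reindex   # uncomment to run against your cluster
```

Remember that each batch can hold up to size × number-of-shards documents, so keep size modest.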
