Elasticsearch: How to Reduce Disk Usage

I have Elasticsearch 1.2.2 installed on a Debian server with ~ 5.3M documents indexed. When I run myindex/_stats

I get the following information:

{
   "_shards": {
      "total": 10,
      "successful": 5,
      "failed": 0
   },
   "_all": {
      "primaries": {
         "docs": {
            "count": 5306837,
            "deleted": 100209
         },
         "store": {
            "size_in_bytes": 32003706527,
            "throttle_time_in_millis": 1657592
         },
  ....
}

      

which tells me the total size of my documents is about 32 GB.

However, the Elasticsearch data folder takes up 72 GB on disk.
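For reference, the 32 GB figure follows directly from store.size_in_bytes in the _stats output. The snippet below is only a small sketch of that conversion; the helper name bytes_to_gb is made up for illustration, and the data directory path in the comment is an assumption (the Debian package default) that may not match your install.

```shell
# Convert a byte count to decimal gigabytes (1 GB = 10^9 bytes).
# bytes_to_gb is a hypothetical helper, not part of Elasticsearch.
bytes_to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1e9 }'
}

# store.size_in_bytes from the _stats output above
bytes_to_gb 32003706527   # prints 32.0

# To compare against the on-disk size, something like the following
# could be used (GNU du; the path is an assumption):
#   bytes_to_gb "$(du -sb /var/lib/elasticsearch | cut -f1)"
```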

Following the Elasticsearch documentation, I tried running:

curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'

      

Running this command

  • reduced the number of deleted documents from 300k to 100k (as reported by the _stats output above), but not to 0 as I expected
  • reduced disk usage from 90 GB to 72 GB, but not to 32 GB, which is the actual size of my documents.

(Note: I also ran this command for all indices with curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true', with no significant difference.)

How can I reduce the size of the data folder to the actual size of my documents?

+3




3 answers


By default, Elasticsearch only merges a segment during an expunge-deletes optimize if more than 10% of its documents are marked as deleted. To remove all documents marked as deleted in the index, set index.merge.policy.expunge_deletes_allowed to 0 in elasticsearch.yml, and then run the optimize command:

curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'



You can take a look at this link for more details on the merge policy.
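To see why the optimize in the question left deletes behind, it may help to put the _stats numbers next to the 10% threshold. The sketch below computes the index-wide share of deleted documents; deleted_pct is a hypothetical helper, and it assumes docs.count reports live documents only. The threshold is applied per segment, so the index-wide figure is only a rough indicator.

```shell
# Percentage of deleted documents, given the live and deleted counts
# reported under "docs" in _stats. deleted_pct is a made-up helper name.
deleted_pct() {
  awk -v live="$1" -v deleted="$2" 'BEGIN {
    total = live + deleted
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", 100 * deleted / total
  }'
}

# Counts from the question's _stats output
deleted_pct 5306837 100209   # prints 1.85
```

At roughly 1.85% overall, most segments are likely to sit below the default 10% cutoff, which is consistent with the remaining 100k deleted documents not being expunged.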

+3




You should run the following:

curl -XPOST 'http://localhost:9200/myindex/_optimize?max_num_segments=1'

      



You may need to run it more than once, because if there are too many segments, it won't merge all of them in one step.
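The "run it more than once" advice could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the function names are made up, ES_URL defaults to the host used in this thread, and max_segments assumes the _segments API reports a "num_search_segments" field for each shard.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask Elasticsearch to merge the given index down to one segment per shard.
run_optimize() {
  curl -s -XPOST "$ES_URL/$1/_optimize?max_num_segments=1" > /dev/null
}

# Print the largest per-shard segment count found in the _segments response.
# Assumes each shard entry carries a "num_search_segments" field.
max_segments() {
  curl -s "$ES_URL/$1/_segments" |
    grep -o '"num_search_segments" *: *[0-9][0-9]*' |
    grep -o '[0-9][0-9]*$' | sort -n | tail -n 1
}

# Repeat the optimize until every shard reports a single segment,
# giving up after max_tries attempts (default 5).
optimize_until_merged() {
  index="$1"; max_tries="${2:-5}"; try=0
  while [ "$try" -lt "$max_tries" ]; do
    run_optimize "$index"
    try=$((try + 1))
    segs="$(max_segments "$index")"
    # An empty result (e.g. request failed) is treated as "not merged yet".
    if [ "${segs:-2}" -le 1 ]; then
      echo "merged after $try run(s)"
      return 0
    fi
  done
  echo "still more than one segment per shard after $try run(s)" >&2
  return 1
}
```

It would be called as optimize_until_merged myindex, optionally with a second argument for the maximum number of attempts.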

0




I think the difference you see in size has to do with indexing and document metadata, which is normal for any database. The size of the indices depends on your mappings. So technically the size of your documents will never be the same as the size of the elasticsearch data folder.

The following links may help explain this better:

Using too much disk space

Elastic blog on storage requirements

0

