Bulk indexing example in Elasticsearch?

I read a tutorial about document indexing in Elasticsearch that includes a bulk-indexing example. My question is whether it is correct that the loop appends two arrays for each document:

for($i = 0; $i < 100; $i++) {
    $params['body'][] = array(
        'index' => array(
            '_id' => $i
        )
    );

    $params['body'][] = array(
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    );
}

Why does the loop append to $params['body'][] twice? Shouldn't the index metadata be set in the same array as my_field?

In other words, why not put everything into one array, with the index information under the 'index' key:

$params['body'][] = array(
    'index' => array(
        '_id' => $i
    ),

    'my_field' => 'my_value',
    'second_field' => 'some more values'
);

Also, when I run a search request, I get this error:

Message: Illegal string offset 'match'

on the line containing:

$query['match']['name'] = $query;

where $query is a string.
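As an aside, the message points at that assignment itself: $query already holds a string, so $query['match'] is treated as a string offset rather than an array key. A minimal sketch, assuming the search text arrives in $query and using a hypothetical $searchParams array and a placeholder index name, builds the request body in a separate array so the string is never overwritten:

```php
<?php
// Sketch only: $query is assumed to hold the user's search text (a string).
// Writing $query['match'][...] would treat that string as an array, so the
// request body is built in a separate array instead.
$query = 'john';

// $searchParams is a hypothetical name; 'my_index' is a placeholder index name.
$searchParams = array(
    'index' => 'my_index',
    'body'  => array(
        'query' => array(
            'match' => array(
                'name' => $query,
            ),
        ),
    ),
);

// With the official elasticsearch-php client this array would be passed to
// $client->search($searchParams); here we only print the JSON body.
echo json_encode($searchParams['body']), PHP_EOL; // {"query":{"match":{"name":"john"}}}
```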

I suspect the error comes from a problem with how the index was created, so I started there.

Here is my code that adds documents to the index:

private function addDocument($data = array(), $type)
{
    if (!empty($data)) {
        foreach ($data as $key => $val) {
            $params['body'][] = array(
                'index' => array(
                    '_id' => $key,
                    '_type' => 'profiles',
                    '_index' => $this->_typeIndex($type)
                )
            );

            $params['body'][] = (array)$val;
        }

        $this->client->bulk($params);
    }
}

Is this correct? I ask because my search request fails with the error described above.

1 answer


For bulk indexing to work, the payload must contain two lines per document: one action line (the command plus metadata such as index, type and document id) followed by one source line (the actual document fields), e.g.:

{"index": {"_id": "1234"}}               <--- command for doc1
{"field1": "value1", "field2": "value2"}  <--- source for doc1
{"index": {"_id": "5678"}}               <--- command for doc2
{"field1": "value1", "field2": "value2"}  <--- source for doc2
...
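To make the line structure concrete, here is a minimal sketch of how a $params['body'] array maps onto the newline-delimited JSON that the _bulk endpoint reads. The helper toBulkNdjson and the sample documents are hypothetical; the elasticsearch-php client performs this serialization itself when you call $client->bulk($params).

```php
<?php
// Hypothetical helper: turns a $params['body'] array (alternating action and
// source entries) into newline-delimited JSON, one JSON object per line.
function toBulkNdjson(array $body): string
{
    $lines = array();
    foreach ($body as $entry) {
        $lines[] = json_encode($entry);
    }
    // The bulk API requires the payload to end with a newline.
    return implode("\n", $lines) . "\n";
}

$params = array('body' => array());
$docs = array(
    '1234' => array('field1' => 'value1', 'field2' => 'value2'),
    '5678' => array('field1' => 'value3', 'field2' => 'value4'),
);
foreach ($docs as $id => $fields) {
    // PHP turns numeric string keys into ints, so cast the id back to a string.
    $params['body'][] = array('index' => array('_id' => (string) $id)); // action line
    $params['body'][] = $fields;                                       // source line
}

echo toBulkNdjson($params['body']);
```

Each document contributes exactly two lines, so the four entries above print as four lines, alternating command and source.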

The PHP example you provided does exactly this:

$params['body'][] = array(
    'index' => array(
        '_id' => $i
    )
);

will create the first (action) line, {"index": {"_id": "0"}}, and

$params['body'][] = array(
    'my_field' => 'my_value',
    'second_field' => 'some more values'
);

will create the second (source) line, {"my_field": "my_value", "second_field": "some more values"}.

The for loop repeats this 100 times, producing a 200-line payload for 100 documents.
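A quick way to confirm this is to run the tutorial loop on its own and inspect the result. The sketch below only adds an explicit initialization of $params and some output; note that json_encode writes the integer $i as a JSON number (0) rather than the string "0".

```php
<?php
// Reproduces the tutorial loop and checks its shape: every document
// contributes one action entry followed by one source entry.
$params = array('body' => array());

for ($i = 0; $i < 100; $i++) {
    // Action/metadata entry: tells the bulk API what to do with the next entry.
    $params['body'][] = array(
        'index' => array(
            '_id' => $i
        )
    );

    // Source entry: the document itself.
    $params['body'][] = array(
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    );
}

echo count($params['body']), PHP_EOL;          // 200
echo json_encode($params['body'][0]), PHP_EOL; // {"index":{"_id":0}}
echo json_encode($params['body'][1]), PHP_EOL; // {"my_field":"my_value","second_field":"some more values"}
```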

If you merge the two into a single array, as you did here:

$params['body'][] = array(
    'index' => array(
        '_id' => $i
    ),

    'my_field' => 'my_value',
    'second_field' => 'some more values'
);

it won't work, because it creates a single line per document, like this:

{"index":{"_id": "0"}, "my_field": "my_value", "second_field": "some more values"}

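Elasticsearch expects each action line to contain only the action and its metadata, with the document source on the following line, so the bulk operation fails. Try it and see.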
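The rule can be expressed as a small structural check. This is a hypothetical helper, not part of Elasticsearch or its PHP client, and it only handles index and create actions (both of which expect exactly one source line after them). It rejects the merged form because the first entry mixes the action key with document fields.

```php
<?php
// Hypothetical checker: even positions must hold a pure action entry with
// exactly one key ('index' or 'create'), each followed by one source entry.
function isWellFormedIndexBody(array $body): bool
{
    $actions = array('index', 'create');
    $entries = array_values($body);
    if (count($entries) % 2 !== 0) {
        return false; // an action is missing its source line, or vice versa
    }
    for ($n = 0; $n < count($entries); $n += 2) {
        if (!is_array($entries[$n])) {
            return false;
        }
        $keys = array_keys($entries[$n]);
        if (count($keys) !== 1 || !in_array($keys[0], $actions, true)) {
            return false; // action entry also carries document fields
        }
    }
    return true;
}

$paired = array(
    array('index' => array('_id' => 0)),
    array('my_field' => 'my_value', 'second_field' => 'some more values'),
);
$merged = array(
    array('index' => array('_id' => 0), 'my_field' => 'my_value', 'second_field' => 'some more values'),
    array('index' => array('_id' => 1), 'my_field' => 'my_value', 'second_field' => 'some more values'),
);

var_dump(isWellFormedIndexBody($paired)); // bool(true)
var_dump(isWellFormedIndexBody($merged)); // bool(false)
```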

UPDATE

It doesn't work because you are adding too many lines. Remove the foreach loop and do it like this instead. I don't know what your id field is called, so the code assumes it is id. It also assumes that the $data array contains the fields of the document being added.

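private function addDocument(array $data, $type)
{
    if (!empty($data)) {
        $params = array('body' => array());

        // Action/metadata line for this single document
        $params['body'][] = array(
            'index' => array(
                '_id' => $data['id'],    // make sure to use the right id field
                // _type is deprecated in Elasticsearch 7.x and removed in 8.x;
                // drop it when targeting a newer cluster
                '_type' => 'profiles',
                '_index' => $this->_typeIndex($type)
            )
        );

        // Source line: the document fields themselves
        $params['body'][] = $data;

        $this->client->bulk($params);
    }
}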
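To see why the foreach version produces "too many lines", here is a sketch that compares the two approaches on one sample document. The sample data, the $wrong/$right names and the id key are assumptions for illustration, and the _index/_type metadata is omitted for brevity. Looping over a single document's fields emits one action per field, and casting each scalar value with (array) turns it into a list like [7] instead of a document object.

```php
<?php
// Sketch assuming $data holds ONE document's fields, with its id under 'id'.
$data = array('id' => 7, 'name' => 'Alice', 'city' => 'Oslo');

// Original approach: one action per FIELD, each scalar value cast to an array.
$wrong = array();
foreach ($data as $key => $val) {
    $wrong[] = array('index' => array('_id' => $key)); // _id becomes the field name
    $wrong[] = (array) $val;                           // e.g. 7 becomes [7]
}

// Fixed approach: one action entry and one source entry for the whole document.
$right = array();
$right[] = array('index' => array('_id' => $data['id']));
$right[] = $data;

echo count($wrong), ' entries vs ', count($right), PHP_EOL; // 6 entries vs 2
echo json_encode($wrong[1]), PHP_EOL; // [7]
echo json_encode($right[1]), PHP_EOL; // {"id":7,"name":"Alice","city":"Oslo"}
```

If $data actually holds several documents keyed by id, the foreach is the right shape, but each $val must then be that document's full field array rather than a single scalar.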
