Solr returns HTTP error 400 or 500

So you are working with Solr: you read data from it, do your processing, and save the updates. It works! Ship it! Then (during testing, thank the FSM) you start getting odd glitches. Sometimes it works, sometimes the Solr server returns a 400 or 500 error. Whiskey Tango Foxtrot?

Say this is a bookstore app, an international one, so you deal with several character sets. The app itself is in American English, so your field names are English, but titles and other text may be in Spanish, in Cyrillic, or, for extra fun, in Hebrew. You have noticed that some (but not all) of the Hebrew titles cause problems.

The process is: query Solr to get the record, update the record, and write the entire record back to Solr. You are updating the Count field from "5" to "4". Some titles update, some don't. Googling turns up all sorts of red herrings: is this a byte-order issue? UTF-8 control characters? A misconfiguration? Maybe. But.

Given an update to the document that looks like this:

<add>
  <doc>
    <field name="StockNumber">1</field>
    <field name="Count">5</field>
    <field name="Title">רוקד עם זאבים</field>
    <field name="Translated_Title">Dances With Smurfs</field>
    <field name="Summary">Our Hero goes to another place, bonds with the Odd Looking Natives, & saves the day.</field>  
  </doc>
</add>

      

The problem is with the Summary field, specifically the "&". It must be escaped as "&amp;", otherwise the XML parser treats whatever follows it as the start of an entity reference rather than as part of the update. Note that the Solr query returned it as "&", not as "&amp;".

So you cannot assume that data returned by a Solr query is already in the right form to send back in an update. Nor can you blindly URL-encode every field you read from Solr before writing it: that would badly mangle the data, since the Hebrew (in our example) would be stored in percent-encoded hex form and returned that way, not as Hebrew, on future requests.
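To make the difference concrete, here is a small Python sketch comparing URL (percent) encoding with XML escaping. The choice of Python and of the standard-library helpers `urllib.parse.quote` and `xml.sax.saxutils.escape` is an assumption for illustration; the original post does not say which client language is in use.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

title = "רוקד עם זאבים"
summary = "bonds with the Odd Looking Natives, & saves the day."

# Percent-encoding turns every non-ASCII byte into %XX hex,
# so no Hebrew characters survive.
print(quote(title))

# XML escaping leaves the Hebrew untouched ...
print(escape(title))

# ... and only rewrites the XML special characters (here, the ampersand).
print(escape(summary))
```

Only the second approach keeps the stored text readable while still producing a well-formed update.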

Solr, however, will store "&amp;" as "&".

"<" and ">" have the same problem, and should be sent as "&lt;" and "&gt;".
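A minimal sketch of building the update with only the XML special characters escaped might look like the following. The helper name `build_add_xml` is hypothetical, and Python is assumed purely for illustration; the field names come from the example above.

```python
from xml.sax.saxutils import escape, quoteattr

def build_add_xml(fields):
    """Build a Solr <add> message from a dict, XML-escaping each value."""
    lines = ["<add>", "  <doc>"]
    for name, value in fields.items():
        # quoteattr() returns the name already wrapped in quotes;
        # escape() replaces &, < and > and leaves non-ASCII text alone.
        lines.append("    <field name=%s>%s</field>"
                     % (quoteattr(name), escape(str(value))))
    lines += ["  </doc>", "</add>"]
    return "\n".join(lines)

doc = build_add_xml({
    "StockNumber": 1,
    "Count": 4,
    "Title": "רוקד עם זאבים",
    "Summary": "bonds with the Odd Looking Natives, & saves the day.",
})
print(doc)
```

The resulting string still has to be sent to Solr's update handler as UTF-8; that part is left out here because it depends on the client library in use.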

1 answer


Try wrapping every field value in a CDATA section in your client application, like this:

<add>
  <doc>
    <field name="StockNumber"><![CDATA[1]]></field>
    <field name="Count"><![CDATA[5]]></field>
    <field name="Title"><![CDATA[רוקד עם זאבים]]></field>
    <field name="Translated_Title"><![CDATA[Dances With Smurfs]]></field>
    <field name="Summary"><![CDATA[Our Hero goes to another place, bonds with the Odd Looking Natives, & saves the day.]]></field>  
  </doc>
</add>

      



Of course, this is not necessary for every field, but if you are building the document dynamically in an application, applying it to all of them is simpler.

The one caveat is to make sure the text does not already contain CDATA markers, in particular the "]]>" terminator. CDATA sections cannot be nested, so a doubled CDATA will cause problems all over the place.
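One common way to guard against that caveat is to split any embedded "]]>" across two CDATA sections. The sketch below assumes a Python client, and the helper name `to_cdata` is hypothetical.

```python
def to_cdata(text):
    """Wrap text in a CDATA section, splitting any embedded ']]>' terminator."""
    # "]]>" would end the section early, so close the section after "]]"
    # and open a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

print(to_cdata("Natives, & saves the day."))
print(to_cdata("tricky ]]> text"))
```

An XML parser reads the two adjacent sections back as one continuous piece of text, so the value round-trips unchanged.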
