Unpublished items displayed in Drupal search results (Google Search Appliance)

I recently inherited a Drupal 5 site and ran into a number of problems. Some of them revolve around search results.

  • Unpublished pages are displayed in search results. Some of these are old pages, others were unpublished recently. All are correctly marked as unpublished in the CMS, yet they still appear.

  • Obsolete pages appear in search results. The URL path structure has been changed, and these items are stale results in the DB.

From what I can tell, the site is using the Google Search Appliance (GSA) for search, not Drupal's default search. Is there a way to confirm that it is using the GSA, other than checking that the module is enabled?

If it is the GSA, it looks like I could ask someone with GSA access to rebuild the site's search results. Is that correct?

If rebuilding the search results is the right way to go, it seems that whenever enough content is removed from the site, I will need to ask someone to rebuild the search. Is there a better, automatic way?

+2




4 answers


It looks like Drupal is handling the search. Google would need DB access to display unpublished nodes. You may be using Views for search and forgot to filter for published nodes only.



If Drupal handles the search, you just need to reset and rebuild the search index. This can be done without too much trouble, as long as you don't have too much content.

+1




The GSA can still display removed content, depending on the data source.

If the content comes from a database feed and is then removed from the query, it will be removed from the index. If the content comes from a natural crawl or through a dedicated connector feed, it will not be dropped from the index as soon as it is removed from the site. Instead, it should drop out of the index naturally, which can take a while.



One way to block a removed URL from showing up is to do it through the front end. In the GSA Admin interface, go to Maintenance > Front Ends, then select your front end and go to the Remove URL tab. You can either list individual URLs or block a group of URLs with regular expressions.
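Before entering a pattern in the admin interface, it can help to check which URLs it would actually catch. The following is only a rough sketch using Python's `re` module as a stand-in: the GSA's own pattern syntax may differ from Python regular expressions, and the `urls_matching` helper and the example.com URLs are hypothetical.

```python
import re


def urls_matching(pattern, urls):
    """Return the URLs in which the regular expression finds a match."""
    compiled = re.compile(pattern)
    return [u for u in urls if compiled.search(u)]


# Hypothetical URLs: two under an old path structure, one current.
known_urls = [
    "http://www.example.com/old-section/page-1",
    "http://www.example.com/old-section/page-2",
    "http://www.example.com/news/page-3",
]

# Candidate pattern meant to block everything under the old path.
blocked = urls_matching(r"^http://www\.example\.com/old-section/", known_urls)
for url in blocked:
    print(url)
```

Running a candidate pattern against a list of known URLs first makes it less likely that a too-broad expression hides pages that should stay searchable.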

+1




I posted an answer to your more general question regarding node access. The problem with the search results could be related to it.

0




To keep your Google Appliance up to date, you can try XmlSiteMap, a module that publishes a correct XML sitemap for all your content.

For a public site, publishing a sitemap is a good way to keep search engines updated, as they can use it to discover new pages and to clean up old ones. I assume the Google Appliance will use it too.
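To illustrate the idea, here is a minimal sketch of a sitemap that lists only published content, using the standard sitemaps.org `urlset` format. It is not how the XmlSiteMap module is implemented; the `build_sitemap` function, the `path` and `status` keys, and the example.com base URL are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(nodes, base_url):
    """Build sitemap XML that lists only published nodes.

    `nodes` is a list of dicts with the hypothetical keys 'path' and
    'status' (1 = published, 0 = unpublished).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for node in nodes:
        if node["status"] != 1:
            continue  # leave unpublished content out of the sitemap
        url = ET.SubElement(urlset, "url")
        loc = base_url.rstrip("/") + "/" + node["path"].lstrip("/")
        ET.SubElement(url, "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical content: one published page, one unpublished page.
nodes = [
    {"path": "about", "status": 1},
    {"path": "old-page", "status": 0},
]
sitemap_xml = build_sitemap(nodes, "http://www.example.com")
print(sitemap_xml)
```

Whether and how quickly a given search engine acts on a page's absence from the sitemap is up to that engine, so this complements rather than replaces removing the stale URLs from the index.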

0

