Why does Googlebot request HTML from JSON-only URLs?

From a page like this: https://medstro.com/groups/nejm-group-open-forum/discussions/61

I have code like this:

$.getJSON("/newsfeeds/61?order=activity&type=discussion", function(response) {
  $(".discussion-post-stream").replaceWith($(response.newsfeed_html));
  $(".stream-posts").before($("<div class=\'newsfeed-sorting-panel generic-12\' data-id=\'61\'>\n<div class=\'newsfeed-type-menu generic-12\'>\n<ul class=\'newsfeed-sorting-buttons\'>\n<li>\n<span>\nShow\n<\/span>\n<\/li>\n<li>\n<select id=\"type\" name=\"type\"><option selected=\"selected\" value=\"discussion\">Show All (15)<\/option>\n<option value=\"discussion_answered\">Answered Questions (15)<\/option>\n<option value=\"discussion_unanswered\">Unanswered Questions (0)<\/option><\/select>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n"));
  Newsfeed.prepare_for_newsfeed_sort($(".newsfeed-sorting-panel"));
});

Googlebot decided that it wants to see whether there is any interesting HTML at /newsfeeds/61?order=activity&type=discussion. So it requests that URL as plain HTML, and my app reports an error: "ActionView::MissingTemplate: Missing template newsfeeds/show ...".
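
To be clear about the mechanics: the action only has a JSON template, so the failure mode boils down to roughly this (a simplified sketch, not the exact code):

# app/controllers/newsfeeds_controller.rb -- simplified sketch
class NewsfeedsController < ApplicationController
  def show
    @newsfeed = Newsfeed.find(params[:id])
    # Implicit render: a JSON request picks up show.json (which builds
    # the newsfeed_html fragment), but there is no show.html template,
    # so an HTML request raises ActionView::MissingTemplate.
  end
end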

  • Why is Googlebot requesting this URL at all? Is it just crawling anything that looks like a link, or is something wrong in my code?
  • What's the best way to handle this in Rails? I don't want to ignore all MissingTemplate errors, because some of them could signal something genuinely wrong in the future; the same goes for blanket-ignoring errors raised by bot requests. Do I have other options?
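
One option I've considered — a sketch against the simplified controller above — is to declare JSON as the only format the action responds to, so a stray HTML request raises ActionController::UnknownFormat (which Rails rescues as a 406 Not Acceptable in production) instead of a MissingTemplate 500:

# app/controllers/newsfeeds_controller.rb -- hypothetical sketch
class NewsfeedsController < ApplicationController
  def show
    @newsfeed = Newsfeed.find(params[:id])

    respond_to do |format|
      # Only JSON is declared. An HTML request (Googlebot's case)
      # raises ActionController::UnknownFormat, which Rails turns
      # into a 406 response instead of a 500.
      format.json # still renders show.json as before
    end
  end
end

That way a genuinely missing template elsewhere would still surface as a MissingTemplate error, while these JSON-only endpoints just answer bots with a 406.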


2 answers


There is nothing wrong with bots trying to find new links on your page; they are just doing their job.

Perhaps you can use one of these meta tags in your view: Is there a way to make robots ignore certain text?



These comments tell Googlebot "don't look here":

<!--googleoff: all-->
<script>
$.getJSON("/newsfeeds/61?order=activity&type=discussion", function(response) {
  $(".discussion-post-stream").replaceWith($(response.newsfeed_html));
  $(".stream-posts").before($("<div class=\'newsfeed-sorting-panel generic-12\' data-id=\'61\'>\n<div class=\'newsfeed-type-menu generic-12\'>\n<ul class=\'newsfeed-sorting-buttons\'>\n<li>\n<span>\nShow\n<\/span>\n<\/li>\n<li>\n<select id=\"type\" name=\"type\"><option selected=\"selected\" value=\"discussion\">Show All (15)<\/option>\n<option value=\"discussion_answered\">Answered Questions (15)<\/option>\n<option value=\"discussion_unanswered\">Unanswered Questions (0)<\/option><\/select>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n"));
  Newsfeed.prepare_for_newsfeed_sort($(".newsfeed-sorting-panel"));
});
</script>
<!--googleon: all-->



Presumably it parsed this URL from the page source and is just trying to crawl your site.

Better to tell Google what to crawl and what not to crawl via your sitemap.xml and robots.txt files.



You can tell Googlebot not to crawl pages with these (or any) GET parameters in your robots.txt file:

User-agent: Googlebot
Disallow: /*?
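
If blocking every parameterized URL is too broad for the rest of the site, a narrower rule — assuming all of these JSON endpoints live under /newsfeeds/ — would be:

User-agent: Googlebot
Disallow: /newsfeeds/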
