From a development perspective, how do indeed.com's URL structure and site work?

On the Webmasters Q&A site, I asked the following:

https://webmasters.stackexchange.com/questions/42730/how-does-indeed-com-make-it-to-the-top-of-every-single-search-for-every-single-c

But I would like to get a little more information on this from a development perspective.

If you search Google for something job-related, like Gastonia Jobs (city + jobs), then besides Indeed's results dominating the first page of Google, you get a URL structure that looks like this:

indeed.com/l-Gastonia,-NC-jobs.html

I'm pretty sure the l stands for location in the URL structure. If you search for a job in a particular industry or at a specific company, you get something like this (Microsoft jobs):

indeed.com/q-Microsoft-jobs.html

There are just over 40,000 cities in the US, so I thought, okay, maybe they simply created a page for each one; that wouldn't be hard for a computer. But the site is obviously dynamic, since each of these pages has 10,000 results split into pages of 10. The q obviously stands for query. I can understand pre-generating locations, but they couldn't have generated a page for every possible query combination, could they?
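
The pagination part at least does not require storing anything per page. As a minimal sketch, assuming a hypothetical `page_window` helper and 1-based page numbers, a single query plus an offset is enough to serve any of the 1,000 pages that 10,000 results split into pages of 10 would need:

```python
import math
from typing import Tuple


def page_window(total_results: int, page: int, per_page: int = 10) -> Tuple[int, int, int]:
    """Return (offset, limit, page_count) for a 1-based page number.

    The offset and limit would be passed to the search backend
    (e.g. SQL's OFFSET/LIMIT) instead of reading a pre-built page.
    """
    page_count = max(1, math.ceil(total_results / per_page))
    if not 1 <= page <= page_count:
        raise ValueError(f"page must be between 1 and {page_count}")
    offset = (page - 1) * per_page
    # The last page may hold fewer than per_page results.
    limit = min(per_page, total_results - offset)
    return offset, limit, page_count
```

For example, `page_window(10000, 1000)` gives an offset of 9990 and a limit of 10 for the last of 1,000 pages.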

Okay, here's where it gets a little weirder. I wanted to see if they had a sitemap, so I Googled "indeed.com sitemap.xml" and got this result:

indeed.com/q-Sitemap-xml-jobs.html

Again, when I searched for "indeed.com url structure" (as I mentioned in my other post on Webmasters), Google returned:

indeed.com/q-change-url-structure-l-Arkansas.html
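
The URLs quoted in this question follow a consistent pattern, so generating one for any query is trivial. As a minimal sketch, assuming rules inferred only from these examples (a `q-` prefix for the query, an `l-` prefix for the location, spaces turned into hyphens, and a `-jobs` suffix only on single-facet pages), a hypothetical `build_slug` helper reproduces them; Indeed's real rules are unknown:

```python
from typing import List, Optional


def _hyphenate(text: str) -> str:
    # Collapse whitespace into single hyphens: "Gastonia, NC" -> "Gastonia,-NC"
    return "-".join(text.split())


def build_slug(query: Optional[str] = None, location: Optional[str] = None) -> str:
    """Build a results-page path in the style of the URLs above (inferred rules)."""
    if not query and not location:
        raise ValueError("need a query, a location, or both")
    parts: List[str] = []
    if query:
        parts.append("q-" + _hyphenate(query))
    if location:
        parts.append("l-" + _hyphenate(location))
    # Single-facet examples end in "-jobs"; the combined example does not.
    suffix = "-jobs" if len(parts) == 1 else ""
    return "-".join(parts) + suffix + ".html"
```

For instance, `build_slug(query="change url structure", location="Arkansas")` yields `q-change-url-structure-l-Arkansas.html`.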

Does indeed.com actually generate a web page on the fly from whatever I type into Google? If not, how can they have a static page for millions upon millions of possible query combinations, paginate them dynamically, and still have all of them dominate the first page of Google results (although that last part may be a better question for the Webmasters Q&A site)?

Is the JavaScript on the page interacting with the URL in some way?

4 answers


Most likely it is not a huge set of static pages. The "actual" page might be http://indeed.com/?referrer=google&searchterm=jobs%20in%20washington. The site then creates a human-readable URL using URL rewriting, fetches the jobs in the database that match the request, and voilà...
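
In production this kind of mapping usually lives in web-server rewrite rules (for example Apache's mod_rewrite) or in the application's router. The sketch below does it in Python instead, assuming the slug scheme seen in the question's examples and a hypothetical internal endpoint `/jobs?q=...&l=...`; it only illustrates the idea and is not how Indeed actually does it:

```python
from typing import Dict
from urllib.parse import urlencode


def _unhyphenate(text: str) -> str:
    # "Gastonia,-NC" -> "Gastonia, NC" (a real hyphen in a query would be lost)
    return text.replace("-", " ")


def parse_slug(slug: str) -> Dict[str, str]:
    """Turn a friendly path like 'q-Microsoft-jobs.html' into search parameters."""
    if not slug.endswith(".html"):
        raise ValueError("not a results page")
    body = slug[: -len(".html")]
    if body.endswith("-jobs"):
        body = body[: -len("-jobs")]
    params: Dict[str, str] = {}
    if body.startswith("q-"):
        # A query may be followed by "-l-<location>".
        query, sep, location = body[2:].partition("-l-")
        params["q"] = _unhyphenate(query)
        if sep:
            params["l"] = _unhyphenate(location)
    elif body.startswith("l-"):
        params["l"] = _unhyphenate(body[2:])
    else:
        raise ValueError("unknown slug format")
    return params


def rewrite(slug: str) -> str:
    """Map the public URL onto the (hypothetical) internal search endpoint."""
    return "/jobs?" + urlencode(parse_slug(slug))
```

With this scheme, `rewrite("q-change-url-structure-l-Arkansas.html")` returns `/jobs?q=change+url+structure&l=Arkansas`. Because hyphens double as spaces, a query containing a literal hyphen or `-l-` cannot be round-tripped; a real site would need some escaping convention.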



Of course, I could be wrong; technically this can probably be solved in many ways. For example, every time a job is added to the site, all the pages that job should appear on could be generated (or regenerated), which creates a huge number of pages for Google to crawl.
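
The pre-generation idea can be sketched as an index that records, for each listing page, which jobs it shows. This assumes a hypothetical `Job` record and assumes each job belongs on exactly three pages (its company, its location, and its title within its location); a real site would derive many more facets:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; a real schema would carry much more.
    job_id: int
    title: str
    company: str
    city: str
    state: str


def _hyphenate(text: str) -> str:
    # "Gastonia, NC" -> "Gastonia,-NC", matching the URLs in the question
    return "-".join(text.split())


def affected_pages(job: Job) -> Set[str]:
    """Slugs of the listing pages a new job should appear on (assumed scheme)."""
    location = f"{job.city}, {job.state}"
    return {
        f"q-{_hyphenate(job.company)}-jobs.html",                    # company page
        f"l-{_hyphenate(location)}-jobs.html",                       # location page
        f"q-{_hyphenate(job.title)}-l-{_hyphenate(location)}.html",  # title in location
    }


class PageIndex:
    """Maps each listing-page slug to the ids of the jobs it should list."""

    def __init__(self) -> None:
        self.pages: Dict[str, List[int]] = defaultdict(list)

    def add_job(self, job: Job) -> Set[str]:
        touched = affected_pages(job)
        for slug in touched:
            self.pages[slug].append(job.job_id)
        # The caller would (re)render these pages and add them to a sitemap.
        return touched
```

Each new job touches only a handful of pages, but across many jobs the set of distinct pages grows very large, which matches the huge number of crawlable pages described above.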

This is a great question, and it remains unanswered, considering that, first, a basic Google search using

site:indeed.com

returns 120MM results and, second, a query like "new york product manager" takes first place in the results. These pages are clearly pre-generated: the copy cached by the search engine (sometimes from several days earlier) shows different results from a live query on the site.

It's simple: when Google's crawler crawls pages on Indeed or any other job search site, those pages are generated dynamically. Here's another site that I run, http://jobuzu.co.uk, which works in a similar way.

PHP is your friend here. And don't just use a standard database for the searches: look at Sphinx or Solr, since they offer full-text search with better performance than MySQL etc.
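
Sphinx and Solr are full search engines, but the core structure behind full-text search is an inverted index: a map from each term to the documents that contain it. The toy sketch below (hypothetical class name; no ranking, stemming, or phrase matching) shows why lookups are fast: a multi-word query becomes an intersection of small sets rather than a scan of every row, which is what a `LIKE '%term%'` query against a plain relational table usually amounts to:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Lowercased alphanumeric runs count as terms.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Term -> set of document ids; queries match documents containing every term."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty entries for unknown terms.
        matches = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(matches)
```

Indexing "Senior Product Manager, New York" and then searching for "new york product manager" finds that document by intersecting four small posting sets.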

They also make good use of rel="canonical" and careful internal linking: http://www.indeed.com/find-jobs.jsp
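
A `rel="canonical"` link tells search engines which URL is the preferred version when several URLs show essentially the same content. As a minimal sketch, assuming hypothetical parameter names (`sort`, `referrer`) that only change ordering or tracking, stripping them lets every variant point at one canonical URL:

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change ordering or tracking but not the content.
NON_CANONICAL_PARAMS = {"sort", "referrer"}


def canonical_url(url: str) -> str:
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
    ]
    # Hostnames are case-insensitive; the fragment never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """The <link> element a results page would put in its <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'
```

Both `...q-Microsoft-jobs.html?sort=date` and `...q-Microsoft-jobs.html?referrer=google` would then declare the bare `q-Microsoft-jobs.html` URL as canonical.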

Note that every page that actually ranks can be reached through this direct internal link structure.
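
One way to make every ranking page reachable is a crawlable browse page that links to each location page, in the spirit of the find-jobs.jsp page linked above (its actual structure is not shown here). A minimal sketch, assuming a hypothetical list of (city, state) pairs and the `l-City,-ST-jobs.html` naming seen in the question:

```python
from collections import defaultdict
from html import escape
from typing import Dict, Iterable, List, Tuple


def location_directory(cities: Iterable[Tuple[str, str]]) -> str:
    """Render an HTML fragment linking every location page, grouped by state."""
    by_state: Dict[str, List[str]] = defaultdict(list)
    for city, state in cities:
        by_state[state].append(city)
    lines: List[str] = []
    for state in sorted(by_state):
        lines.append(f"<h2>{escape(state)}</h2>")
        lines.append("<ul>")
        for city in sorted(by_state[state]):
            # Same naming as the question's example: l-Gastonia,-NC-jobs.html
            slug = "l-" + "-".join(f"{city}, {state}".split()) + "-jobs.html"
            lines.append(f'<li><a href="/{escape(slug)}">{escape(city)} jobs</a></li>')
        lines.append("</ul>")
    return "\n".join(lines)
```

Plain `<a href>` links like these can be followed by a crawler without running any JavaScript, so every listed page is at most a couple of clicks from the directory.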