Chris, article: search engine friendly urls



nohaber
05-21-2004, 06:48 AM
Chris,
in a Google paper and one of the patents, it was mentioned that Google uses the number of "/" in the URL to judge the importance of a page for crawling. Pages with more "/" in the URL are considered less important and get lower priority for crawling. It might also have something to do with ranking.

Now, instead of using mydomain.com/article/parameter1/parameter2, why don't we use mydomain.com/article-parameter1-parameter2 and extract the two parameters from the URL? It should be fairly easy with PHP.
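A minimal sketch of the idea, written in Python rather than PHP purely for illustration. The function name `parse_flat_url` and the rule that the first hyphen-separated piece is the script name are assumptions, not anything taken from the paper or the patent.

```python
from urllib.parse import urlsplit


def parse_flat_url(url):
    """Split a flat URL such as /article-parameter1-parameter2 into
    a script name and a list of parameters.

    Assumption: the first hyphen-separated piece is the script name and
    every later piece is one parameter.
    """
    # urlsplit needs a scheme (http://...) to separate host from path.
    path = urlsplit(url).path
    slug = path.strip("/")
    script, *params = slug.split("-")
    return script, params


script, params = parse_flat_url("http://mydomain.com/article-parameter1-parameter2")
print(script, params)
```

One catch with this scheme: a parameter value that itself contains a hyphen would be split in two, so the values would need to be restricted or encoded in some way.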

Any thoughts?

Kyle
05-21-2004, 06:54 AM
nohaber - I do not believe this and need proof. Please paste links to the documents you reference.

nohaber
05-21-2004, 07:06 AM
http://citeseer.ist.psu.edu/cho98efficient.html
Download the PDF and read the part about "Location Metric".
http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=3&p=1&f=G&l=50&d=PG01&S1=google.AS.&OS=an/google&RS=AN/google

This is a Google patent. Check this part out:

[0036] The frequency of visit score equals log2(1+log(VF)/log(MAXVF)). VF is the number of times that the document was visited (or accessed) in one month, and MAXVF is set to 2000. A small value is used when VF is unknown. If the unique user is less than 10, it equals 0.5*UU/10; otherwise, it equals 0.5*(1+UU/MAXUU). UU is the number of unique hosts/IPs that access the document in one month, and MAXUU is set to 400. A small value is used when UU is unknown. The path length score equals log(K-PL)/log(K). PL is the number of `/` characters in the document's path, and K is set to 20.

----
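To make the two usage scores in the quoted paragraph concrete, here is a rough Python sketch. The function names are made up for illustration, and the "small value" used when VF or UU is unknown is not given in the quote, so that case is left out. The quote also doesn't say what happens when VF exceeds MAXVF, so nothing is clamped here.

```python
import math

MAXVF = 2000  # from the quoted paragraph
MAXUU = 400   # from the quoted paragraph


def frequency_of_visit_score(vf):
    """log2(1 + log(VF) / log(MAXVF)), for a known VF >= 1.

    The base of the inner logs cancels in the ratio, so natural log is fine.
    """
    return math.log2(1 + math.log(vf) / math.log(MAXVF))


def unique_user_score(uu):
    """0.5*UU/10 below 10 unique hosts, otherwise 0.5*(1 + UU/MAXUU)."""
    if uu < 10:
        return 0.5 * uu / 10
    return 0.5 * (1 + uu / MAXUU)


for vf in (1, 50, 2000):
    print("VF", vf, round(frequency_of_visit_score(vf), 3))
for uu in (5, 10, 400):
    print("UU", uu, round(unique_user_score(uu), 3))
```

Both scores reach 1.0 at the caps (VF = 2000, UU = 400). Note that the unique user score jumps from just under 0.5 to just over 0.5 at UU = 10, which is how the formula reads as quoted.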

I also think that GoogleGuy said it in a post, but I'll have to dig it up.

Kyle
05-21-2004, 07:10 AM
Hmm this is interesting. Visit score?

nohaber
05-21-2004, 07:22 AM
Yes, it's interesting. The patent talks about combining all the other scores with a document usage score. Generally, the more users visit/revisit a page, the higher its usage score. I don't think Google actually uses this, since it's a bit hard to implement. Though I think in the PageRank paper they said that they compared PageRank to usage stats to see if they match (PageRank is, after all, a probability score of a user visiting a page). The paper mentioned that they got the usage stats from the logs of some major web servers? :confused:
If they can buy usage stats from backbone web servers, they might implement it some day, though I don't know whether there are legal privacy concerns.
Anyway, the point is that these guys mention that the number of "/" can be used as a measure of the importance of a page. I wouldn't be surprised if they use this info to weight links. For example, a link from mydomain.com might be weighted higher than a link from mydomain.com/sub/sub2/page.html with the same PR and # of links.

Kyle
05-21-2004, 07:31 AM
I would not worry about the /'s. I've seen lots of high ranking sites with many levels.. ESPECIALLY EDUs.. EDUs are the king of this.

chromate
05-21-2004, 10:19 AM
I've seen that shorter URLs get crawled more easily / more often. GoogleGuy has also said something to indicate that this is true. But I don't think it would have any real effect on ranking. Maybe a slight one, but nothing really worth worrying about.

nohaber
05-21-2004, 10:45 AM
According to the path length formula above (log(K-PL)/log(K) with K = 20), a URL with one "/" (the home page) will have a ~2% higher score than a URL with two "/"s (such as /articles/page.htm). The difference between 1 and 3 "/"s is ~4%.
Not a big deal, but that's why it is called "optimization" :)
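For anyone who wants to check those percentages, here is a small Python sketch of the path length score from the patent quote. The helper names `slash_count` and `relative_drop` are just illustrative, and counting "/" in the URL path is my reading of "PL", not something the patent spells out beyond the quote.

```python
import math
from urllib.parse import urlsplit

K = 20  # value given in the patent quote


def slash_count(url):
    """Number of "/" characters in the path part of a URL."""
    path = urlsplit(url).path or "/"  # treat a bare host as the home page
    return path.count("/")


def path_length_score(pl, k=K):
    """log(K - PL) / log(K); the log base cancels out."""
    return math.log(k - pl) / math.log(k)


def relative_drop(pl_short, pl_long):
    """Fraction by which the longer path scores lower than the shorter one."""
    short = path_length_score(pl_short)
    return (short - path_length_score(pl_long)) / short


home = slash_count("http://mydomain.com/")
deep = slash_count("http://mydomain.com/articles/page.htm")
print(home, deep)
print(f"1 vs 2 slashes: {relative_drop(1, 2):.1%}")
print(f"1 vs 3 slashes: {relative_drop(1, 3):.1%}")
```

This comes out at roughly 1.8% and 3.8%, which matches the ~2% and ~4% figures.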