
Thread: Local Rank stuff...

  1. #16
    Registered Mike's Avatar
    Join Date
    May 2003
    Location
    UK
    Posts
    2,755
Yeah, I'm interested in what chromate asked. Does Google check the domain name owner or something?
Don't you just love free internet games?

  2. #17
    Registered
    Join Date
    Jan 2004
    Posts
    183

    google stuff

It's difficult to explain everything (English is not my native language at all).
Why doesn't everyone read the patents, and after that we can discuss them?
Go to: http://www.uspto.gov/patft/index.html
There are two searches, one for issued and one for published patents. Search for "an/google" (assignee "google") and read all eight papers (OK, you can skip the ambiguous-query papers).

Then come back and we can put all the pieces together.
    Happy reading.

    Cheers

  3. #18
    Registered Mike's Avatar
    Join Date
    May 2003
    Location
    UK
    Posts
    2,755
I'll have a read later. Btw, you do well considering your native language isn't English.
Don't you just love free internet games?

  4. #19
    Registered Member incka's Avatar
    Join Date
    Aug 2003
    Location
    Wakefield, UK, EU
    Posts
    3,801
I'm getting confused... Does this mean the links to our sites in places like this won't bring any PR any more?

  5. #20
    Future AstonMartin driver r2d2's Avatar
    Join Date
    Dec 2003
    Location
    UK
    Posts
    1,608
How exactly is Google tracking document usage, nohaber?

  6. #21
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
Originally posted by nohaber
Chris,
Google's patents on detecting duplicate and near-duplicate documents (including query-specific duplicates), as well as the local reranking patent, are 3 years old. Google has been using IPs and similar-documents info for a long time.
First of all, having a patent and using it are two different things entirely. Second of all, unless you're sure of something, you shouldn't state it with such certainty.

Up until November I was #1 on every coupon search I tried, with only incoming links from the same IP address. So unless your definition of "quite some time" is "since December", you're incorrect.

Furthermore, the use of IP addresses makes absolutely no sense when dealing with pages, as opposed to sites. Most internal site pages will have a preponderance of same-IP incoming links. How do you handle that? Google is not out to rank sites; they're out to rank pages. People often forget that.

Google has also publicly stated that they do not like to issue IP address bans because they know how hosting works. So we have a precedent where they do not like to do things by IP address.

Now, my question to you is: where do you get the notion that site activity influences Google's rankings for pages that advertise with AdWords? Unless by AdWords you actually meant AdSense, in which case I'd ask whether Google has persistently lied the many, many times they have vehemently stated that AdWords and AdSense have no bearing on your rankings.
Chris Beasley - My Guide to Building a Successful Website
    Content Sites: ABCDFGHIJKLMNOP|Forums: ABCD EF|Ecommerce: Swords Knives

  7. #22
    Registered Member incka's Avatar
    Join Date
    Aug 2003
    Location
    Wakefield, UK, EU
    Posts
    3,801
I think Google will lean more on sites with links from similar pages, but as Chris said, all the IP stuff seems like B***S***...

The duplicate content thing would mean that sites such as online book sites all get dropped because they carry the same books. You must have got this wrong. Google can't be too harsh on duplicate content, or a lot of the web wouldn't be listed.

  8. #23
    Registered
    Join Date
    Jan 2004
    Posts
    183

    google

Chris,
you are right - having a patent and using it are two different things. The patents I have read are from 3 years ago.

I haven't said that incoming links from the same IP don't count. I am saying that they only count toward the overall PR, not the local score.

I don't understand what you mean by the sites vs. pages issue. I am only saying that Google does not count same-IP3 pages in the local score (which is just one of many factors). Based on the patents, I am 100% sure that Google detects identical or nearly identical documents (like the same articles on other sites) and marks them as such. When Google generates the initial set of candidate pages for a search query, it removes duplicate pages and query-specific duplicate documents (another patent). Otherwise we would see the same most relevant documents for a given query - but we see different pages! Moreover, if a site has too many pages identical to another site's, it is marked as an affiliate, and incoming links from an affiliate site do not count in the local score (this probably also plays into the initial relevance score).
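The duplicate-detection step described above can be illustrated with a simple word-shingling comparison. This is a generic sketch of near-duplicate detection, not the method from Google's patent; the shingle size and the 0.8 threshold are arbitrary choices for illustration:

```python
# Sketch of near-duplicate detection via word shingles and Jaccard overlap.
# Illustrative only - not Google's actual algorithm.

def shingles(text, k=3):
    """Return the set of k-word shingles in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def near_duplicates(a, b, threshold=0.8):
    """True if the two texts overlap above the (arbitrary) threshold."""
    return similarity(a, b) >= threshold
```

A copied article scores near 1.0 and would be flagged; unrelated pages share almost no shingles and score near 0.0.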

I think everyone took the local score to heart too much. Look, DMOZ pages don't have any incoming links from the top results, and they still rank in the top 10 (although they give local rank to other pages). Please don't twist what I am saying.

I never said anything about IP bans. I just say that certain links are given zero weight in the local score (if Google wants, they can also give zero weight in the initial relevance rankings).

Where did I get the usage ranking idea?
    1) Google has a patent on it
    2) Certain pages fall from the search results. I have the following explanation:

When a user submits a query, let's say "fitness software", Google looks at the inverted index for "fitness" and "software". The inverted index contains ALL pages that contain a certain word, and for both of these words there are millions of documents. Do you think Google ranks them all every time a search is initiated? No - that's impossible. If Google tried to find all documents that contain both "fitness" and "software", calculate their relevancy, and include proximity info in the rankings (how close "fitness" is to "software" in a given document), it would be impossible to answer a query quickly.

So Google has to pre-sort the inverted index. Let's say the word "software" is contained in 1,000,000 documents. They will be sorted by some rank into: "software", document1 (highest score), document2, ... and Google will look only into the top documents (not more than 10,000). The question is: how does Google sort its inverted index? It is not only PageRank that matters, because higher-PageRank pages fall off the top 1000 pages. So what additional scores play a part? I am 99% sure that it is the usage score. The usage score is calculated from AdSense and toolbar info. It could be the absolute number of visitors, or a number relative to previous periods (for example, this month you can have 10% more visitors). I can't otherwise explain how a high-PR page falls out of the initial candidate result set (1000 pages). Can you give another explanation? I don't believe in over-optimization penalties.
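The pre-sorted inverted index described above can be sketched roughly like this. The documents, scores, and cutoff N are all invented for illustration; nothing here comes from Google itself:

```python
# Sketch of a score-ordered ("impact-ordered") inverted index: postings for
# each word are pre-sorted by a static per-document score, and a query only
# intersects the top-n postings per word instead of the full lists.
# All data below is made up for illustration.

def build_index(docs, scores):
    """docs: {doc_id: text}; scores: {doc_id: static score (e.g. PR + usage)}."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(doc_id)
    for word in index:
        index[word].sort(key=lambda d: scores[d], reverse=True)
    return index

def candidates(index, query, n=2):
    """Intersect only the top-n postings of each query word."""
    sets = [set(index.get(w, [])[:n]) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()

docs = {
    "a": "fitness software reviews",
    "b": "fitness tips",
    "c": "fitness software downloads",
    "d": "accounting software",
}
scores = {"a": 4, "b": 3, "c": 1, "d": 2}
idx = build_index(docs, scores)
```

With `n=2`, the low-scored doc "c" never enters the candidate set even though it matches both words - which mirrors the claim that a page can match a query yet fall out of the top 1000.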

The inverted index is a very big data structure, and I now think it is rebuilt very rarely, together with PageRank. PageRank calculation is a time-consuming task on a graph of billions of vertices.

Here's another example. Hours after Google updated the PageRank and inverted index, I changed my main page from "training software" and "nutrition software" and optimized it for "diet software" and "fitness software". After that I exchanged links with top "fitness software" pages. I am still not in the top 1000 results. Why? Because Google rebuilds its inverted index once a month or so and uses both PageRank and OTHER factors. Now some PR1 pages are appearing instead of my PR4 page! When Google crawled and then indexed my page (after having sorted the inverted index), it found the page was too different from the previous version and marked it as DIFFERENT, and that's why I currently don't appear in the results. Google is waiting to recalculate the inverted index scores.

I think people are twisting my words here. I am saying that the main factors are:
PR -> ALL incoming links
+ usage info + other factors place a page at a given rank in the inverted index (candidate results, initial relevance score).
For any page to appear in the results, it has to have PR plus the other factors, and it has to wait for the next inverted index update. The initial candidate results are produced from the above index; all duplicate and query-specific duplicate documents are removed, and links from affiliate or same-IP sites probably count for less than links from unaffiliated sites.

After the initial candidate results are calculated using the sorted inverted index (plus proximity for multiple-word queries), Google does the local interconnectivity reranking and shows the results. That's it. That's all I am saying. Every page that has fallen off the results is not in the initial candidate results. Google does not process all documents containing a word, because that would slow down the query. Every page has to do the following:
1) build PR + use the keywords + other on-page factors + be requested (usage stats, in my view) to appear in the candidate results
2) then the local score reranks the pages.
A page can be #1 without any local score; it really depends on the competition.
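A toy version of the two-step process described above (candidate set first, then local reranking) might look like this. The combination formula `base * (1 + local votes)` is my own invention for illustration; the actual LocalRank patent combines the scores differently:

```python
# Sketch of local-interconnectivity reranking: after the candidate set is
# drawn, each page is boosted only by links whose SOURCE page is itself in
# the candidate set ("local" votes). Data and formula are illustrative.

def local_rerank(pages, links):
    """pages: {page: base relevance score}; links: set of (src, dst) pairs.
    Returns page ids sorted best-first after local reranking."""
    reranked = {}
    for page, base in pages.items():
        # Count only local votes: links from pages inside the candidate set.
        local = sum(1 for src, dst in links if dst == page and src in pages)
        reranked[page] = base * (1 + local)
    return sorted(reranked, key=reranked.get, reverse=True)

pages = {"a": 10, "b": 8, "c": 6}
links = {("b", "c"), ("a", "c"), ("x", "a")}  # "x" is outside the candidate set
```

Here "c" collects two local votes and jumps to first place, while the link from outsider "x" does nothing for "a" - which is why a page can still be #1 with no local score when its competitors have none either.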

I hope someone understands me. Please read the original Brin and Page paper on Google, plus all the patents.

    Cheers

  9. #24
    Senior Member chromate's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    2,348
Google doesn't like duplicate content. Fact. To what extent do they look for duplicate content? Who knows.

The IP stuff doesn't seem like BS to me at all. We're not talking about IP-based bans as such; they're just being figured into the ranking algorithms. Of course, that's not to say Google is actually doing this - I'm trying to cover all possibilities here. It makes a lot of sense though. Google is based on a "voting system", right? Any voting system is made weaker by people being able to vote for themselves. I think this is the biggest weakness of the original concept behind PR. Google will almost certainly be trying to diminish this weakness, and looking at the correlation between IPs is one way to do it.

I've read through the local rank patent now and it seems to make a lot of sense. If they have implemented it, I would imagine they've modified it somewhat, but it probably works along the same lines.

I still don't understand why moving the site to another host wouldn't work though. The patent says they would look at the first 3 octets of an IP address, that being the subnet of a class C IP address. I doubt that would have changed. If we move a site to another subnet, why wouldn't that work?
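For reference, the "first 3 octets" (IP3) comparison described in the patent is simple to express; this sketch just groups dotted-quad IPv4 addresses by their first three octets (the /24 prefix):

```python
# Sketch of the IP3 grouping from the LocalRank patent: two addresses are
# treated as one "source" when their first three octets match.

def ip3(addr):
    """Return the first three octets of a dotted-quad IPv4 address."""
    return ".".join(addr.split(".")[:3])

def same_subnet(a, b):
    """True if both addresses share the same /24 (class C) subnet."""
    return ip3(a) == ip3(b)
```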
    Last edited by chromate; 02-03-2004 at 02:43 PM.

  10. #25
    Senior Member chromate's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    2,348
I've just noticed that the current #1, #2 and #3 sites for carbohydrate counter are all listed in that Carbohydrate Counter Yahoo directory category. Looks like I really need to get a spot in that category. Hmmm... could take ages. :/

  11. #26
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    For someone with no proof you've got a lot of wild theories you seem awfully sure of.

    I think that Google is likely drawing up a result set and reranking it based on local-interconnectivity. On that we agree.

    The rest of your post is pretty out there.
Chris Beasley - My Guide to Building a Successful Website
    Content Sites: ABCDFGHIJKLMNOP|Forums: ABCD EF|Ecommerce: Swords Knives

  12. #27
    Registered
    Join Date
    Jan 2004
    Posts
    183

    localrank

First, I'm sorry - wherever I wrote AdWords, it should read AdSense.

Chris,
there's one simple way to test my wild theories, and that's watching a particular stale site for one month.
How do you explain that sites with high PR fall off the Google results? There's something more than PR. The local reranking stuff is done on the final 1000 candidate results, but why don't pages with high PR get into those 1000 candidate results? Local reranking is not new; I have seen old articles noting how DMOZ helps a lot (through local reranking). The patent is also old, and I don't think Google's developers needed 3 years to put it into code.

There's something new at Google. The only other possible explanation I can think of is word stemming (pages with "optimizations" kicking out pages for "optimization" or something).

    Do you have a theory on this one?

On the changing-the-IP stuff:
If sites (pages) A and B have the same IP3, they are labeled in the same affiliate group. If two sites have too much duplicate or near-duplicate content, their pages are also labeled with the same affiliate group.
Now let's say you change the IP of site B. Google will crawl it as a completely new site, and when Google indexes it, it will find that it is a duplicate of the old site B, which still won't have fallen out of Google's data. If a site X is affiliated with Y, and Y is affiliated with Z, then X and Z are also affiliates (by affiliate I mean duplicate content or same IP3). I don't think Google would treat site B's changed IP as a new site, even in the future. Once a site is labeled an affiliate, it shouldn't change its affiliate group unless both its IP and its CONTENT change.
Basically, changing the IP of site B wouldn't break its affiliate connection to site A. I wouldn't pay money for another IP just to find out that Google is smart.
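The transitive grouping claimed here (X~Y and Y~Z implies X~Z) is exactly what a union-find (disjoint-set) structure computes. A hypothetical sketch, assuming affiliation is recorded pairwise as each shared-IP3 or duplicate-content signal is detected:

```python
# Union-find sketch of transitive affiliate grouping. The class and its
# usage are hypothetical illustrations, not anything from Google's patents.

class AffiliateGroups:
    def __init__(self):
        self.parent = {}

    def find(self, site):
        """Return the representative of the site's affiliate group."""
        self.parent.setdefault(site, site)
        while self.parent[site] != site:
            # Path halving keeps the trees shallow.
            self.parent[site] = self.parent[self.parent[site]]
            site = self.parent[site]
        return site

    def mark_affiliated(self, a, b):
        """Merge the two sites' groups (same IP3 or duplicate content)."""
        self.parent[self.find(a)] = self.find(b)

    def affiliated(self, a, b):
        return self.find(a) == self.find(b)

g = AffiliateGroups()
g.mark_affiliated("X", "Y")  # e.g. same IP3
g.mark_affiliated("Y", "Z")  # e.g. duplicate content
```

After these two merges, X and Z land in the same group even though no signal ever connected them directly, which is why changing one site's IP wouldn't break a connection already recorded through content.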

    Cheers

  13. #28
    Registered
    Join Date
    Jan 2004
    Posts
    183

    sorry

    r2d2,
    sorry.
Document usage can be tracked via AdSense and the Google toolbar. Every time Google ads appear, Google may log the IP. Every time you request a page, the Google toolbar requests its PageRank, and that can be logged. From AdSense, very precise document usage can be obtained. From the Google toolbar, usage could be scaled by the percentage of users that have the toolbar or something. Further, the number of times the site is shown in the results could play a role.
The document usage score (if it exists) should play a marginal role, but for sites with a lot of competition it should help. Read the original patent: the main idea was that a site needs time to get a good rank, while usage stats may be combined with PR into a combined rank so good sites appear in the results faster.
Also, Google may use it only for some keywords.
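If toolbar counts really were scaled by toolbar penetration, as speculated above, the arithmetic would be a simple division; the penetration figure below is invented purely for illustration:

```python
# Sketch of scaling observed toolbar pageviews up to an estimated total.
# The 5% penetration rate is a made-up number for the example.

def estimated_visits(toolbar_hits, toolbar_penetration):
    """Estimate total visits from a sampled count and a sampling rate."""
    return toolbar_hits / toolbar_penetration

# 500 toolbar-reported views at 5% penetration -> 10,000 estimated visits
```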

Anyway, the patent is old. If I were Google I would experiment with this and have a human compare the results against the innovation. If it improves relevancy, it could be incorporated into Google; otherwise, it would be just one of the many tried ideas at Google.

That's what I think, based on Google's advice: "Make pages for visitors" (visitors may influence your rankings).

    Cheers

  14. #29
    Senior Member chromate's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    2,348

    Re: localrank

    Originally posted by nohaber
On the changing-the-IP stuff:
If sites (pages) A and B have the same IP3, they are labeled in the same affiliate group. If two sites have too much duplicate or near-duplicate content, their pages are also labeled with the same affiliate group.
Now let's say you change the IP of site B. Google will crawl it as a completely new site, and when Google indexes it, it will find that it is a duplicate of the old site B, which still won't have fallen out of Google's data. If a site X is affiliated with Y, and Y is affiliated with Z, then X and Z are also affiliates (by affiliate I mean duplicate content or same IP3). I don't think Google would treat site B's changed IP as a new site, even in the future. Once a site is labeled an affiliate, it shouldn't change its affiliate group unless both its IP and its CONTENT change.
Basically, changing the IP of site B wouldn't break its affiliate connection to site A. I wouldn't pay money for another IP just to find out that Google is smart.
This doesn't quite seem right. Otherwise every site on the same IP address would be considered the same site. Remember that the crawl is separate from the ranking procedures. Once Google crawls the site, it will find that the domain is associated with a new IP, and that is all. I don't think it would consider it a totally new site purely because it has the same domain. This will be used in the initial result set, and THEN that will be used to find interlinking between affiliated sites (with the same IP3). At least, from looking at the patent, that's how I understand it.

  15. #30
    Registered
    Join Date
    Jan 2004
    Posts
    183

    hm

    chromate,
    read the "detecting duplicate documents" patent.

    cheers

