Prevent stealing content



ogito
04-22-2006, 02:00 AM
Hi,

what do you do to prevent content theft with programs such as Teleport, Vacuum, Titan, WebAuto, etc.?
With Teleport, for example, you can change the user agent, so blocking certain agents via robots.txt or .htaccess doesn't work.

Here is a solution with PHP, but I'm not sure whether it would also block the SE spiders:

http://www.totalchoicehosting.com/forums/index.php?s=&showtopic=1997&view=findpost&p=12734

and

http://www.websitepublisher.net/forums/showpost.php?p=27092&postcount=40

Thanks

KLB
04-22-2006, 08:48 AM
Since most legitimate SEs obey the robots.txt file while most site-ripping programs ignore it, the way I stop these programs is to create a "bad bot" trap. The idea is that I create a page that robots are denied access to via the robots.txt file, then link to that page via hidden links on my pages. Machines will see the links but humans will not, and if a human does stumble onto one, they'll get the point not to go there.
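A minimal sketch of such a trap, assuming a PHP/MySQL site (the trap filename bot-trap.php, the table bad_bots and the connection details are hypothetical placeholders, not my actual setup). First, robots.txt tells obedient robots to stay away from the trap page:

User-agent: *
Disallow: /bot-trap.php

Each real page then carries a link that humans never see but rippers follow:

<a href="/bot-trap.php" style="display:none">&nbsp;</a>

And the trap page itself just logs whoever requests it anyway:

<?php
// bot-trap.php -- disallowed in robots.txt, so the only clients that ever
// reach it are ones that ignored robots.txt. Log who they were and when.
$db = new mysqli('localhost', 'user', 'pass', 'site'); // hypothetical credentials
$ip = $_SERVER['REMOTE_ADDR'];
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$stmt = $db->prepare(
    'INSERT INTO bad_bots (ip, user_agent, hit_at) VALUES (?, ?, NOW())');
$stmt->bind_param('ss', $ip, $ua);
$stmt->execute();
?>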

This bad-bot page is designed to record the user-agent string, the date and time of access, and the IP address of the requesting client to a database table. Each time one of my pages is loaded, that table is queried to see whether the UA/IP-address combination accessed the bad-bot page recently. If it did, the serving of the requested page is slowed to a painful crawl by sleep instructions sprinkled throughout my scripts: instead of taking one or two seconds, my page takes one or two minutes to load for the offending client. The idea is that this gums up the site-caching program and slows its progress. Once the requesting client has requested more than a predetermined number of pages, my site starts feeding back very short, empty pages, again sprinkled with sleep instructions, so that the server dishes them up very slowly to the offending client.
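Hypothetically, the per-page check might look something like the function below (it reuses the bad_bots table from the sketch above; the one-day window, the 100-page threshold and the sleep lengths are made-up numbers, not my real tuning):

<?php
// Called at the top of every page script, before any real work is done.
function throttle_bad_bots(mysqli $db) {
    $ip = $_SERVER['REMOTE_ADDR'];
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // How many logged rows does this UA/IP combination have from the last day?
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM bad_bots
          WHERE ip = ? AND user_agent = ? AND hit_at > NOW() - INTERVAL 1 DAY');
    $stmt->bind_param('ss', $ip, $ua);
    $stmt->execute();
    $stmt->bind_result($hits);
    $stmt->fetch();
    $stmt->close();

    if ($hits == 0) {
        return; // never touched the trap: serve the page at normal speed
    }

    // Log this request too, so the count climbs with every page the bot pulls.
    $ins = $db->prepare(
        'INSERT INTO bad_bots (ip, user_agent, hit_at) VALUES (?, ?, NOW())');
    $ins->bind_param('ss', $ip, $ua);
    $ins->execute();

    if ($hits > 100) {
        // Past the threshold: very slowly dish up an empty page and stop.
        sleep(30);
        exit('<html><body></body></html>');
    }

    // Trapped but under the threshold: stretch a one-second page to a minute.
    for ($i = 0; $i < 30; $i++) {
        sleep(2);
    }
}
?>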

The result is that the bad bot is slowed and contained, preventing it from consuming too many resources, and the individual trying to cache my site is none the wiser until later, when they go back and discover they got a whole pile of blank pages. I'm sure that sometimes someone out there tries to cache my site on their laptop while at work or school so they can review it at home, only to discover when they get there that they got nothing.

To make sure the bot trap isn't picking off legitimate SEs, I occasionally review the bad-bot tables and look for false hits by legitimate SEs; if I find one, I add their IP address to an exceptions table.
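The exception check is the easy part; hypothetically (with a hand-maintained table I'll call se_exceptions), both the trap page and the throttle above can bail out early for whitelisted addresses:

<?php
// Returns true if this IP was manually cleared as a legitimate SE crawler.
function is_whitelisted(mysqli $db, $ip) {
    $stmt = $db->prepare('SELECT 1 FROM se_exceptions WHERE ip = ? LIMIT 1');
    $stmt->bind_param('s', $ip);
    $stmt->execute();
    $stmt->store_result();
    $ok = ($stmt->num_rows > 0);
    $stmt->close();
    return $ok;
}

// e.g. at the top of bot-trap.php and of throttle_bad_bots():
// if (is_whitelisted($db, $_SERVER['REMOTE_ADDR'])) return;
?>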

Marshal
01-03-2009, 07:03 AM
With Website Forge, you can protect everything on your web page, including HTML source code, JavaScript code, text, links and graphics. After protecting your website, people will not be able to view or edit your source code. Your text cannot be copied to the clipboard, your link addresses will not be displayed in the browser's status bar, and your graphics cannot be saved with the "Save image..." function that web browsers provide.
In short, Website Forge gives your web page complete protection.

Mr. Pink
01-04-2009, 11:15 AM
What are the drawbacks of using Website Forge? There must be some; I'm just curious what they might be.