Electronics & Programming

develissimo

Open Source electronics development and programming


#1 Nov. 16, 2005 22:16:43

Richie H.

Preventing Google Web Accelerator from prefetching


Hi,

I've written a small piece of middleware to prevent Google Web Accelerator
(or any other prefetching client) from prefetching URLs. Since this is my
first piece of middleware, I'd appreciate it if those more experienced
than me could tell me whether it looks sensible, or whether it's flawed in
some way. The intended behaviour is to return a 403 Forbidden if the
request carries an "X-Moz: prefetch" header. Here's the code:

from django.utils.httpwrappers import HttpResponseForbidden

class NoPrefetchMiddleware:
    """Prevents prefetching clients (eg. Google Web Accelerator) from
    prefetching URLs. If your site ever changes state in response to a
    GET request (eg. with a Logout link rather than a Logout button), you
    need to suppress prefetch."""

    def process_request(self, request):
        if 'prefetch' in request.META.get('HTTP_X_MOZ', '').lower():
            return HttpResponseForbidden()

Thanks!

--
Richie Hindle
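
For readers outside Django, the same check can be sketched as a plain WSGI wrapper using only the standard library. This is a minimal illustration, not the poster's code: the names NoPrefetchWSGI, hello_app and status_for are hypothetical, and it assumes the prefetching client sends "X-Moz: prefetch" as described above.

```python
from wsgiref.util import setup_testing_defaults


class NoPrefetchWSGI:
    """Hypothetical WSGI wrapper: answer 403 to requests marked as prefetches."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the "X-Moz" request header under the key HTTP_X_MOZ.
        if 'prefetch' in environ.get('HTTP_X_MOZ', '').lower():
            start_response('403 Forbidden', [('Content-Type', 'text/plain')])
            return [b'Prefetching is not allowed.\n']
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in application used only to exercise the wrapper."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello\n']


def status_for(extra_environ):
    """Run one fake request through the wrapped app and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(extra_environ)
    seen = []
    body = NoPrefetchWSGI(hello_app)(
        environ, lambda status, headers: seen.append(status))
    list(body)  # consume the response body
    return seen[0]
```

Calling status_for({'HTTP_X_MOZ': 'prefetch'}) exercises the blocked path, and status_for({}) the normal one.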


#2 Nov. 16, 2005 23:10:50

Jacob K.


On Nov 16, 2005, at 4:16 PM, Richie Hindle wrote:

> I've written a small piece of middleware to prevent Google Web
> Accelerator (or any other prefetching client) from prefetching URLs.
> Since this is my first piece of middleware, I'd appreciate it if those
> more experienced than me could tell me whether it looks sensible, or
> whether it's flawed in some way. The intended behaviour is to return a
> 403 Forbidden if the request carries an "X-Moz: prefetch" header.
> Here's the code:

The code looks good and isn't flawed in any way.

However... the concept is. Developers shouldn't be blocking GWA; we
should be programming web apps that conform to expected HTTP
behavior. GWA *only* issues GET requests, and if an app modifies
data based on a GET, then the app should be considered broken.

As far as Django is concerned, this means your non-idempotent views
should check that they're not being called with GET;
django.views.decorators.http contains a set of easy view decorators
that will check for a given method transparently.

Jacob
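
To make the idea of a method-checking view decorator concrete, here is a framework-free sketch. It is not Django's implementation: FakeRequest, require_methods and the tuple-shaped responses are hypothetical stand-ins. In Django itself the decorators in django.views.decorators.http (for example require_POST) play this role.

```python
import functools


class FakeRequest:
    """Hypothetical stand-in for a framework request object."""

    def __init__(self, method):
        self.method = method


def require_methods(*allowed):
    """Reject calls whose request.method is not in `allowed` (illustrative only)."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            if request.method not in allowed:
                # A real framework would return a 405 response object here.
                return (405, {'Allow': ', '.join(allowed)}, '')
            return view(request, *args, **kwargs)
        return wrapper
    return decorator


@require_methods('POST')
def logout(request):
    # The state-changing work would go here.
    return (200, {}, 'logged out')
```

A GET to logout never reaches the view body, so a GET-only prefetcher cannot trigger the state change.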


#3 Nov. 16, 2005 23:15:22

Luke P.


On Wed, 16 Nov 2005 22:16:23 +0000 Richie Hindle wrote:

>
> Hi,
>
> I've written a small piece of middleware to prevent Google Web
> Accelerator (or any other prefetching client) from prefetching URLs.

Can I ask first of all why you are doing this? If you are trying to
conserve your bandwidth or similar, fine, but I know some people want
to use links (i.e. HTTP GET requests) which have side effects, which is
Bad.

Secondly, there may be some things to consider with web caches. I
think you should add a vary header to indicate that the response will
vary depending on the value of the HTTP_X_MOZ header, or some such.
There is some relevant documentation here:
http://www.djangoproject.com/documentation/cache/

Luke

--
"Mistakes: It could be that the purpose of your life is only to serve
as a warning to others." (despair.com)

Luke Plant || L.Plant.98 (at) cantab.net || http://lukeplant.me.uk/
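
Why a Vary header matters here can be shown with a toy cache key. This is only a sketch of the general idea, not how any particular cache is implemented; cache_key and the '/inbox/' URL are made up for illustration.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    varied = tuple((name.lower(), lowered.get(name.lower(), ''))
                   for name in sorted(vary, key=str.lower))
    return (url, varied)


prefetch = {'X-Moz': 'prefetch'}
normal = {}

# Without Vary, both requests share one key, so a cached 403 could be
# replayed to an ordinary browser request.
same = cache_key('/inbox/', prefetch, []) == cache_key('/inbox/', normal, [])

# With "Vary: X-Moz", the two requests are cached separately.
different = (cache_key('/inbox/', prefetch, ['X-Moz'])
             != cache_key('/inbox/', normal, ['X-Moz']))
```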


#4 Nov. 17, 2005 00:21:54

Richie H.



> I know some people want to use links (i.e. HTTP GET requests) which
> have side effects, which is Bad.


> if an app modifies
> data based on a GET, then the app should be considered broken.

"Logout" is often a link, like it or not. (Amazon, Gmail, Yahoo...)

And yes, server resources are another issue. And Evil, that's another
issue. 8-)

> Secondly, there may be some things to consider with web caches. I
> think you should add a vary header to indicate that the response will
> vary depending on the value of the HTTP_X_MOZ header, or some such.

Good point, thanks:

    def process_request(self, request):
        if 'prefetch' in request.META.get('HTTP_X_MOZ', '').lower():
            response = HttpResponseForbidden()
            response['Vary'] = 'x-moz'
            return response

--
Richie Hindle


#5 Nov. 17, 2005 08:48:05

Simon W.


On 16 Nov 2005, at 23:10, Jacob Kaplan-Moss wrote:

> However... the concept is. Developers shouldn't be blocking GWA; we
> should be programming web apps that conform to expected HTTP
> behavior. GWA *only* issues GET requests, and if an app modifies
> data based on a GET, then the app should be considered broken.

I'm afraid I just don't buy this. It holds for most cases, but there
are some significant ones where it doesn't. My favourite example is
Flickr's internal message system (or any other Webmail). It tells you
at the top of the page if you have any unread messages, and when you
view your inbox it shows unread messages in bold. The act of viewing
a message (by following a GET link) marks that message as read.

Sure, you could require people to click a "mark as read" button that
does a POST, or even have the interface to select a message to read
use POST buttons. That would suck though - it would break the ability
to open a bunch of messages in a new tab for one thing.

Meanwhile, GWA hits your inbox and instantly marks all your unread
messages as read! (That's assuming Flickr doesn't block it - I'll
have to check).

HTTP purity is a nice ideal, but until the HTML form model contains
better support for calling HTTP verbs that reflect what you are
actually trying to do it just isn't practical in every case. It's
those edge cases that make GWA's behaviour a bad idea.

Cheers,

Simon


#6 Nov. 17, 2005 15:59:42

Jeremy D.


On 11/17/05, Simon Willison wrote:
> HTTP purity is a nice ideal, but until the HTML form model contains
> better support for calling HTTP verbs that reflect what you are
> actually trying to do it just isn't practical in every case. It's
> those edge cases that make GWA's behaviour a bad idea.

To pile on here, another "if only" bit is that if app-level auth was
done through HTTP, then GWA could just not prefetch on any page that
would have required auth headers. As it is, GWA can't know what
cookie-based auth is doing.

Following that line, I think GWA could be safer by just not
prefetching any request that would pass along HTTP auth or -any-
cookie. The down-side is obviously less pre-fetching, but it wouldn't
be dangerous.

And if you build a non-safe operation that general robots will trip
over, well, too bad. ;-)

This still leaves open sites which pass auth info in the URL, though.
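
The more conservative prefetch policy suggested above could look roughly like this on the client side. It is a hypothetical sketch (should_prefetch is not part of GWA or any real tool) and it only inspects header names, so it cannot catch auth tokens passed in the URL.

```python
def should_prefetch(request_headers):
    """Return False if the request would carry a cookie or HTTP auth credentials."""
    names = {name.lower() for name in request_headers}
    return not ({'cookie', 'authorization'} & names)
```

For example, should_prefetch({'Cookie': 'sid=1'}) is False, while a request with no credentials at all is allowed.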


#7 Nov. 17, 2005 18:45:31

Luke P.


On Thu, 17 Nov 2005 00:21:29 +0000 Richie Hindle wrote:

>
>
>
> > I know some people want to use links (i.e. HTTP GET requests) which
> > have side effects, which is Bad.
>
>
> > if an app modifies
> > data based on a GET, then the app should be considered broken.
>
> "Logout" is often a link, like it or not. (Amazon, Gmail, Yahoo...)

If you have to make it appear as a link, I would try these alternatives
first:

1) have an <a> link which actually does a javascript submit of a POST
form, and a <noscript> block which has an <input type=submit> which
does the same thing (most people will never browse the site with
javascript off so it doesn't matter that it doesn't look as good)

2) have an <input type=image> that looks like a link but as it is
really an input button it can do a POST form submit.

But I know that developers are not always given the freedom to do the
right thing. At work I was forced to implement a non-idempotent GET
request recently, despite my protests. At the time I didn't have the
example of Google Web Accelerator to make my point more forcefully, or
I might have won the argument. So now, if anyone browses the site we
developed with GWA installed, they will mysteriously find themselves
subscribed to every page they visit...

Luke

--
"My capacity for happiness you could fit into a matchbox without taking
out the matches first." (Marvin the paranoid android)

Luke Plant || L.Plant.98 (at) cantab.net || http://lukeplant.me.uk/


#8 Nov. 19, 2005 00:20:50

Eugene L.


Inline.

Richie Hindle wrote in message:

>
>
>
>> I know some people want to use links (i.e. HTTP GET requests) which
>> have side effects, which is Bad.
>
>
>> if an app modifies
>> data based on a GET, then the app should be considered broken.
>
> "Logout" is often a link, like it or not. (Amazon, Gmail, Yahoo...)

FWIW, Gmail's "Refresh" link is not a link:

<span class="lk" id="refresh">Refresh</span>

It is styled exactly like a link ("lk") with underlined blue text. There is
code that handles the onclick event on this pseudo-link.

"Sign out" is a link with parameters:
http://mail.google.com/mail/?logout&hl=en
GWA doesn't follow links with parameters.

My point is it is possible to prevent GWA and similar systems from following
your links without server-side support.

Thanks,

Eugene
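
The "links with parameters" observation amounts to checking whether a URL has a query string. A small sketch with the standard library follows; has_query is a hypothetical helper, and whether GWA really applies this rule is taken from the post above, not verified here.

```python
from urllib.parse import urlsplit


def has_query(url):
    """Return True if the URL carries a query string (anything after '?')."""
    return bool(urlsplit(url).query)
```

By that rule, the Gmail sign-out URL quoted above has a query string, while the bare http://mail.google.com/mail/ does not.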


#9 Nov. 19, 2005 11:03:25

h.


> behavior. GWA *only* issues GET requests, and if an app modifies
> data based on a GET, then the app should be considered broken.

Actually the problem goes deeper: GWA can crawl areas that normally
can't be crawled, because they are behind logins. So GWA will hit pages
that were never meant to be hit by bots - private pages. But pages that
aren't meant for public consumption have different requirements: you
design them more often for convenience than for "good HTTP behaviour".
So you will find GET-with-sideeffects more often behind logins than
before logins (because those with side-effects on GET will already be
hit by public bots).

GWA is a very bad idea, and it is done in a very bad way. I can't think
of any other google project where they fucked up that often (the last
one being to drop the header that designates that some request is done
by GWA instead of the browser itself). Even if you code your app to
expected HTTP behaviour, GWA itself doesn't always follow it. And we can't
code our apps to the HTTP brokenness of other apps ...

So especially because of its problems it is an absolutely valid request
to know how to block it from a web site.

bye, Georg


#10 Nov. 19, 2005 11:06:30

h.


>> Secondly, there may be some things to consider with web caches. I
>> think you should add a vary header to indicate that the response will
>> vary depending on the value of the HTTP_X_MOZ header, or some such.
>
>Good point, thanks:
>
>     def process_request(self, request):
>         if 'prefetch' in request.META.get('HTTP_X_MOZ', '').lower():
>             response = HttpResponseForbidden()
>             response['Vary'] = 'x-moz'
>             return response

Actually that will break all vary-header handling of Django. Better to
hook into the existing vary-header code. Look at the
django.views.decorators.vary stuff. It's mostly using the
patch_vary_headers from django.utils.cache.

bye, Georg
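
The difference between overwriting a Vary header and merging into it can be sketched without Django. The helper below, merge_vary, is hypothetical and is not Django's code; it only illustrates the kind of append-without-duplicates behaviour that patch_vary_headers in django.utils.cache is meant to provide.

```python
def merge_vary(existing, new_headers):
    """Append header names to a Vary value, skipping names already present."""
    current = [part.strip() for part in existing.split(',') if part.strip()]
    seen = {name.lower() for name in current}
    for name in new_headers:
        if name.lower() not in seen:
            current.append(name)
            seen.add(name.lower())
    return ', '.join(current)
```

If other code has already set "Vary: Cookie", merge_vary('Cookie', ['X-Moz']) keeps that entry instead of replacing it.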
