When requests.get doesn’t work, what do you do?

If you are using the Python requests module to load web content, you may occasionally run into access problems, where you get “Forbidden” (HTTP 403), “Access Denied”, or “Too many requests” (HTTP 429) errors.

These are especially frustrating when you find that issuing the EXACT SAME REQUEST in a web browser results in no problem at all.

It really makes you wonder: why does my web browser work, but my requests.get or requests.post doesn’t? Especially when the requests are coming from the exact same computer?

It isn’t always the case, but often the problem is something called the ‘User-Agent’.

The ‘User-Agent’ is the identifier for a particular kind of “browser-like-thing”. Each web browser (Firefox, Chrome, IE, Safari, and so on) has its own particular signature that it sends along with every request in a header called the ‘User-Agent’.

And when you send a request from the Python requests module without setting a User-Agent, requests sends its own default (something like python-requests/2.x), so you might appear to be some kind of badly behaved bot. This often triggers code on the server that will keep you out!
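To see what requests identifies itself as, you can inspect its default header without sending anything over the network. This is a minimal sketch, assuming a reasonably recent version of requests is installed; the exact version number in the output will vary from machine to machine.

```python
import requests

# The User-Agent string that requests uses when you do not set one yourself.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # something like "python-requests/2.x.y"

# A freshly created Session carries the same value in its default headers.
session = requests.Session()
print(session.headers["User-Agent"])
```

A server that sees this string can tell right away that the request did not come from an ordinary web browser.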

There are a couple of ways to address this.

One is to set a User-Agent that describes who you are and what you are doing. It is best, in this case, to add something to your User-Agent string that differentiates your request from others, especially if you are on a shared machine such as an ACMS machine. For example:

>>> import requests

>>> result = requests.get("http://www.reddit.com/r/ucsd.json")  # default User-Agent; may be refused

>>> result = requests.get("http://www.reddit.com/r/ucsd.json", headers={'User-Agent': 'spis16t3 bot 1.0'})  # identifies you
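If you make many requests, it may be more convenient to set the header once on a Session rather than passing it on every call. The sketch below reuses the URL and bot name from the example above (replace the name with something that describes you). It prepares the request without sending it, so you can confirm which header would go out before touching the network.

```python
import requests

# Illustrative identifier; replace it with one that describes you and your project.
USER_AGENT = "spis16t3 bot 1.0"

# Headers set on a Session are sent with every request made through it.
session = requests.Session()
session.headers.update({"User-Agent": USER_AGENT})

# Build the request without sending it, to check the outgoing header.
request = requests.Request("GET", "http://www.reddit.com/r/ucsd.json")
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])  # spis16t3 bot 1.0

# To actually send it (this needs network access):
# result = session.get("http://www.reddit.com/r/ucsd.json")
# print(result.status_code)
```

Whether the server accepts the request still depends on its own rules, so check result.status_code after sending.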