Forum login using Python requests

I am trying to log into a forum using the Python requests library. This is the forum I'm trying to log in to: http://fans.heat.nba.com/community/

Here's my code:

import requests
import sys

URL = "http://fans.heat.nba.com/community/index.php?app=core&module=global&section=login"

def main():
    session = requests.Session()

    # This is the form data that the page sends when logging in
    login_data = {
        'ips_username': 'username',
        'ips_password': 'password',
        'signin_options': 'submit',
        'redirect':'index.php?'
    }

    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    q = session.get('http://fans.heat.nba.com/community/index.php?app=members&module=messaging&section=view&do=showConversation&topicID=4314&st=20#msg26627')
    print(session.cookies)
    print(r.status_code)
    print(q.status_code)

if __name__ == '__main__':
    main()


The URL is the forum's login page. With the variable "q", the session tries to access a specific page on the forum (the private messenger), which can only be accessed when logged in. However, this request returns status code 403 (Forbidden), which means the login did not succeed.

Why can't I log in? In 'login_data', 'ips_username' and 'ips_password' are the names of the HTML form fields. However, I believe I have the actual login parameters ("signin_options", "redirect") wrong.

Can anyone point me to the correct login parameters?



2 answers


The form contains a hidden input named auth_key:

<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />


So you need to extract its value and include it in the data you POST to the login page. A simple regex is enough:



import re
import requests

def main():
    session = requests.Session()

    # Get the login page that contains the auth_key. Use the session (not
    # requests.get) so any cookies set here are sent along with the login POST.
    r = session.get("http://fans.heat.nba.com/community/index.php?app=core&module=global&section=login")
    # Extract the value of the hidden auth_key field
    auth_key = re.findall("auth_key' value='(.*?)'", r.text)[0]

    # This is the form data that the page sends when logging in
    login_data = {
        'ips_username': 'username',
        'ips_password': 'password',
        'auth_key': auth_key,
    }


And the rest should be the same.
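One caveat with `re.findall(...)[0]`: if the page markup changes, or the server returns an error page, there is no match and you get a bare `IndexError`. Below is a minimal sketch of a helper that fails with a clearer message, using the same pattern as above. The function name `extract_auth_key` is hypothetical, and the sample HTML is just the hidden input quoted in this answer, so it runs offline.

```python
import re

# Same pattern as in the answer: capture everything between value=' and the next '.
AUTH_KEY_RE = re.compile(r"auth_key' value='(.*?)'")


def extract_auth_key(html: str) -> str:
    """Return the auth_key value from the login page HTML (hypothetical helper)."""
    match = AUTH_KEY_RE.search(html)
    if match is None:
        # No hidden field found: probably not the login form, or the markup changed.
        raise ValueError("auth_key field not found in page")
    return match.group(1)


sample = "<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />"
print(extract_auth_key(sample))  # 880ea6a14ea49e853634fbdc5015a024
```

In `main()` you would then call `extract_auth_key(r.text)` in place of the `re.findall(...)[0]` line.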



As @Chaker pointed out in the comments, the login form requires you to submit an auth_key, which you need to read when you first visit the page.

auth_key is a hidden form field with a random value (generated and stored by the server), so every regular web browser sends it along with the POST request. The server then validates the request and requires it to contain an auth_key that it knows is valid (by checking its list of issued auth_keys). So the process should be as follows:

  • Visit the forum's start page (or probably any page below it)
  • Read the value of the hidden field auth_key
  • Send a POST request that includes your credentials and the auth_key
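To make the server side of this concrete, here is a toy sketch of the pattern just described: the server issues a random token, remembers it, and only accepts a login POST that sends back a token it issued. This is a hypothetical, in-memory illustration of the general hidden-token technique, not IP.Board's actual implementation. `TokenStore`, `issue` and `validate` are made-up names, and keying the tokens by session id is an assumption; it is also why the client should fetch the page and POST the login with the same `requests.Session`.

```python
import hmac
import secrets


class TokenStore:
    """Toy server-side store of issued auth_keys (hypothetical, keyed by session id)."""

    def __init__(self) -> None:
        self._issued = {}  # session id -> auth_key handed out to that session

    def issue(self, session_id: str) -> str:
        # 16 random bytes -> 32 hex characters, the same shape as the key in the page.
        key = secrets.token_hex(16)
        self._issued[session_id] = key
        return key

    def validate(self, session_id: str, submitted: str) -> bool:
        # Accept the POST only if it carries the key that was issued to this session.
        expected = self._issued.get(session_id)
        if expected is None:
            return False
        # Constant-time comparison, the usual choice for secret tokens.
        return hmac.compare_digest(expected, submitted)


store = TokenStore()
key = store.issue("session-abc")  # would be rendered into the hidden auth_key input
print(store.validate("session-abc", key))      # True: the browser sent the key back
print(store.validate("session-abc", "wrong"))  # False: e.g. a script that skipped the form
```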

So this works:

import re
import requests

USERNAME = 'username'
PASSWORD = 'password'

# Matches the hidden input and captures its value (non-greedy)
AUTH_KEY = re.compile(r"<input type='hidden' name='auth_key' value='(.*?)' \/>")

BASE_URL = 'http://fans.heat.nba.com/community/'
# BASE_URL already ends with '/', so no leading slash here
LOGIN_URL = BASE_URL + 'index.php?app=core&module=global&section=login&do=process'
SETTINGS_URL = BASE_URL + 'index.php?app=core&module=usercp'

payload = {
    'ips_username': USERNAME,
    'ips_password': PASSWORD,
    'rememberMe': '1',
    'referer': 'http://fans.heat.nba.com/community/',
}

with requests.Session() as session:
    # 1. Visit a page to get a fresh auth_key (and the session cookie)
    response = session.get(BASE_URL)
    match = AUTH_KEY.search(response.text)
    if match is None:
        raise SystemExit("auth_key not found on %s" % BASE_URL)
    auth_key = match.group(1)
    payload['auth_key'] = auth_key
    print("auth_key: %s" % auth_key)

    # 2. POST the credentials together with the auth_key
    response = session.post(LOGIN_URL, data=payload)
    print("Login Response: %s" % response)

    # 3. A page that requires a login should now be accessible
    response = session.get(SETTINGS_URL)
    print("Settings Page Response: %s" % response)

assert "General Account Settings" in response.text




Output:

auth_key: 777777774ea49e853634fbdc77777777
Login Response: <Response [200]>
Settings Page Response: <Response [200]>


AUTH_KEY is a regex that matches any string that looks like <input type='hidden' name='auth_key' value='?????' \/>, where ????? is a group of zero or more characters (non-greedy, which means it looks for the shortest possible match). The documentation of the re module is a good place to start learning about regular expressions. You can also test this regex in an online regex tester, which will explain it and let you experiment with it.
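To see why the non-greedy `(.*?)` matters, here is a small offline example. The HTML is made up (two hidden fields on one line, as can happen in minified pages); the greedy `(.*)` runs on to the last quote on the line instead of stopping at the first one.

```python
import re

# Made-up HTML: two hidden inputs on a single line.
html = ("<input type='hidden' name='auth_key' value='abc123' />"
        "<input type='hidden' name='other' value='xyz' />")

# Non-greedy: stop at the first ' after value='
non_greedy = re.search(r"name='auth_key' value='(.*?)'", html).group(1)
# Greedy: take as much as possible, then backtrack to the last ' on the line
greedy = re.search(r"name='auth_key' value='(.*)'", html).group(1)

print(non_greedy)  # abc123
print(greedy)      # abc123' /><input type='hidden' name='other' value='xyz
```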

Note: If you need to parse (X)HTML in general, you should always use an (X)HTML parser. However, for quick-and-dirty extraction of a single hidden form field like this, a non-greedy regex does the job just fine.
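If you prefer the parser route, Python's standard library ships `html.parser.HTMLParser`. Here is a minimal sketch (the class name `HiddenInputParser` is hypothetical) that collects every `<input type="hidden">` on a page into a dict. Unlike the regex, it does not care about attribute order or quote style. Self-closing tags such as `<input ... />` are handled because the default `handle_startendtag` calls `handle_starttag`.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every hidden <input> in a page (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names are lowercased.
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            name = attributes.get("name")
            if name is not None:
                self.hidden[name] = attributes.get("value") or ""


page = ("<form><input type='hidden' name='auth_key' "
        "value='880ea6a14ea49e853634fbdc5015a024' /></form>")
parser = HiddenInputParser()
parser.feed(page)
print(parser.hidden["auth_key"])  # 880ea6a14ea49e853634fbdc5015a024
```

With requests you would call `parser.feed(response.text)` and then `payload['auth_key'] = parser.hidden['auth_key']`. Whether this forum expects any hidden fields besides auth_key is not something the answers establish.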
