Wednesday, 10 October 2018

Instagram Web Scraper

   Instagram API: the __a=1 query parameter no longer works.

    You can use a web scraper to extract data from public Instagram posts.

    Python 3:

    In Python 3, urllib.request replaces Python 2's urllib2. It is part of the standard library, so there is nothing to install.
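
Because both modules provide a function called urlopen, a script that must run on either version can import whichever one is available. This is a minimal sketch of that idea, not something the original code does:

```python
try:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
except ImportError:
    # Python 2: the same function is provided by urllib2
    from urllib2 import urlopen
```

After this import, urlopen(url, timeout=10.0) can be called the same way on both versions.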

CODE:

import json
import urllib.request

images__url_list = []
url = 'https://www.instagram.com/username/'  # replace "username" with the profile name

response = urllib.request.urlopen(url, timeout=10.0)
instagram_html = response.read()

# The profile data is embedded in the page as JSON between these two markers.
instagram_html_data = instagram_html.decode('utf8').split('"ProfilePage":[')

if len(instagram_html_data) > 1:
    instagram_html_final_data = instagram_html_data[1].split(']},"gatekeepers"')

    if len(instagram_html_final_data) > 1:
        json_dict = json.loads(instagram_html_final_data[0])
        media = json_dict['graphql']['user']['edge_owner_to_timeline_media']['edges']

        for obj in media:
            # The caption edge list can be empty when a post has no caption.
            caption_edges = obj['node']['edge_media_to_caption']['edges']
            images__url_list.append({
                'id': obj['node']['id'],
                'caption': caption_edges[0]['node']['text'] if caption_edges else '',
                'imgurl_standard': obj['node']['display_url'],
                'imgurl_lower': obj['node']['thumbnail_resources'][4]['src'],
                'imgurl_thumb': obj['node']['thumbnail_resources'][3]['src']
            })

images__url_list will then hold one dictionary per post, with its id, caption, and image URLs.

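
The marker-splitting step can be tried without a network request by moving it into a function and feeding it a hand-written page. This is only a sketch: the function name extract_posts, the constants, and the sample data are all made up for illustration, and the sample is not real Instagram output.

```python
import json

START_MARKER = '"ProfilePage":['
END_MARKER = ']},"gatekeepers"'


def extract_posts(html):
    """Return a list of post dicts from profile-page HTML, or [] if a marker is missing."""
    parts = html.split(START_MARKER)
    if len(parts) < 2:
        return []
    tail = parts[1].split(END_MARKER)
    if len(tail) < 2:
        return []
    user = json.loads(tail[0])['graphql']['user']
    posts = []
    for edge in user['edge_owner_to_timeline_media']['edges']:
        node = edge['node']
        captions = node['edge_media_to_caption']['edges']
        posts.append({
            'id': node['id'],
            # Fall back to an empty string when the post has no caption.
            'caption': captions[0]['node']['text'] if captions else '',
            'imgurl_standard': node['display_url'],
        })
    return posts


# Hand-written stand-in for the embedded page data (hypothetical, for illustration).
sample_profile = {
    'graphql': {'user': {'edge_owner_to_timeline_media': {'edges': [
        {'node': {'id': '1', 'display_url': 'https://example.com/a.jpg',
                  'edge_media_to_caption': {'edges': [{'node': {'text': 'hello'}}]}}},
        {'node': {'id': '2', 'display_url': 'https://example.com/b.jpg',
                  'edge_media_to_caption': {'edges': []}}},
    ]}}}
}
sample_html = ('<script>{"entry_data":{"ProfilePage":['
               + json.dumps(sample_profile)
               + ']},"gatekeepers":{}}</script>')

posts = extract_posts(sample_html)
```

Returning an empty list when a marker is missing keeps the caller simple if the page layout changes.
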
    Python 2:

    CODE:

    import json, urllib2

    img_list = []
    url = 'https://www.instagram.com/username/'  # replace "username" with the profile name

    r = urllib2.urlopen(url, timeout=10.0)
    instagram_html = r.read()

    # The profile data is embedded in the page as JSON between these two markers.
    instagram_html_data = instagram_html.split('"ProfilePage":[')

    if len(instagram_html_data) > 1:
        instagram_html_final_data = instagram_html_data[1].split(']},"gatekeepers"')

        if len(instagram_html_final_data) > 1:
            json_dict = json.loads(instagram_html_final_data[0])
            media = json_dict['graphql']['user']['edge_owner_to_timeline_media']['edges']

            for obj in media:
                # The caption edge list can be empty when a post has no caption.
                caption_edges = obj['node']['edge_media_to_caption']['edges']
                img_list.append({
                    'id': obj['node']['id'],
                    'caption': caption_edges[0]['node']['text'] if caption_edges else '',
                    'imgurl_standard': obj['node']['display_url'],
                    'imgurl_lower': obj['node']['thumbnail_resources'][4]['src'],
                    'imgurl_thumb': obj['node']['thumbnail_resources'][3]['src']
                })

    img_list will then hold the id, caption, and image URLs of each post found in the page data.
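
The code above picks thumbnails by fixed position ([3] and [4]), which fails if the list is shorter than expected. One alternative is to choose by width. The sketch below assumes each thumbnail_resources entry carries a config_width field next to src. That field name is an assumption about the page data, and pick_thumbnail and the sample entries are hypothetical.

```python
def pick_thumbnail(thumbnail_resources, min_width):
    """Return the 'src' of the smallest thumbnail at least min_width wide.

    Falls back to the widest thumbnail, or None if the list is empty.
    Assumes each entry has 'src' and 'config_width' keys (an assumption).
    """
    ordered = sorted(thumbnail_resources, key=lambda r: r['config_width'])
    for resource in ordered:
        if resource['config_width'] >= min_width:
            return resource['src']
    return ordered[-1]['src'] if ordered else None


# Made-up sample entries for illustration only.
sample = [
    {'src': 'https://example.com/150.jpg', 'config_width': 150},
    {'src': 'https://example.com/640.jpg', 'config_width': 640},
    {'src': 'https://example.com/320.jpg', 'config_width': 320},
]

medium = pick_thumbnail(sample, 300)
```

In the loop this would be called as pick_thumbnail(obj['node']['thumbnail_resources'], 300) in place of the fixed index.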
