What is urllib?

urllib is a module in Python's standard library that is used for opening URLs. It defines functions and classes that help with URL actions such as opening a connection to a URL and reading the data it returns.
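
In Python 3, urllib is a package whose submodules divide up these URL actions: urllib.request opens URLs, urllib.error defines the exceptions it raises, and urllib.parse splits URLs into their components. As a minimal sketch, assuming Python 3 and using this tutorial's URL purely as sample input, urllib.parse can take a URL apart without any network access:

```python
from urllib.parse import urlsplit

# Sample input: the URL used later in this tutorial.
url = "https://www.youtube.com/user/girishgodage"

# urlsplit returns a named tuple of URL components; nothing is downloaded.
parts = urlsplit(url)

print(parts.scheme)  # https
print(parts.netloc)  # www.youtube.com
print(parts.path)    # /user/girishgodage
```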

With Python you can access and retrieve data from the internet, such as XML, HTML, and JSON, and work with that data directly. In this tutorial we are going to see how to retrieve data from the web. As an example, we use the girishgodage YouTube URL: we access this URL using Python and print the HTML that it returns.

How to Open a URL using urllib

Before we run the code that connects to internet data, we need an import statement for the URL library module: urllib2 in Python 2, or urllib.request in Python 3.

  • Import urllib
  • Define your main function
  • Declare the variable webUrl
  • Then call the urlopen function from the urllib library and assign its result to webUrl
  • The URL we are opening is the girishgodage page on YouTube
  • Next, we are going to print the result code
  • The result code is retrieved by calling the getcode function on the webUrl variable we have created
  • We convert it to a string so that it can be concatenated with our string "result code: "
  • This will be a regular HTTP code "200", indicating that the HTTP request was processed successfully
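
The steps above can be sketched for Python 3 as a small function. This is only an illustration: the name get_result_code and the 10-second timeout are assumptions, not part of the original code. The sketch also covers the case where the server answers with an error status, because urlopen raises urllib.error.HTTPError for those responses instead of returning them.

```python
import urllib.error
import urllib.request


def get_result_code(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived.

    Hypothetical helper for illustration; the name and timeout are assumptions.
    """
    try:
        # The with-block closes the connection when it exits.
        with urllib.request.urlopen(url, timeout=timeout) as web_url:
            return web_url.getcode()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except urllib.error.URLError:
        # No HTTP response at all, e.g. DNS failure or refused connection.
        return None
```

Calling print("result code: " + str(get_result_code("https://www.youtube.com/user/girishgodage"))) would then print the code for the tutorial's URL; the value depends on how the server responds at the time.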

How to get HTML file from URL in Python

You can also read the HTML file by calling the read function on the object returned by urlopen. When you run the code, the HTML will appear in the console.

  • Call the read function on the webUrl variable
  • The read function reads the contents of the response, much like reading the contents of a data file
  • Read the entire content of the URL into a variable called data
  • Run the code; it will print the data, which is the HTML of the page

Here is the complete code

Python 2 Example

                
         #
         # read the data from the URL and print it
         #
         import urllib2

         def main():
             # open a connection to a URL using urllib2
             webUrl = urllib2.urlopen("https://www.youtube.com/user/girishgodage")

             # get the result code and print it
             print "result code: " + str(webUrl.getcode())

             # read the data from the URL and print it
             data = webUrl.read()
             print data

         if __name__ == "__main__":
             main()
        
        

Python 3 Example

                
         #
         # read the data from the URL and print it
         #
         import urllib.request

         # open a connection to a URL using urllib
         webUrl = urllib.request.urlopen('https://www.youtube.com/user/girishgodage')

         # get the result code and print it
         print("result code: " + str(webUrl.getcode()))

         # read the data from the URL (returned as bytes) and print it
         data = webUrl.read()
         print(data)
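
In Python 3, read returns bytes rather than text, so printing data directly shows a b'...' literal. A minimal sketch of turning it into a string follows; the function name read_html, the timeout, and the UTF-8 fallback are assumptions made for illustration. It uses the charset declared in the response's Content-Type header when there is one.

```python
import urllib.request


def read_html(url, timeout=10.0, default_charset="utf-8"):
    """Fetch url and return its body decoded to a str.

    Hypothetical helper for illustration; name and defaults are assumptions.
    """
    with urllib.request.urlopen(url, timeout=timeout) as web_url:
        # read() returns the whole response body as bytes.
        data = web_url.read()
        # Charset from the Content-Type header, or the fallback if absent.
        charset = web_url.headers.get_content_charset() or default_charset
    # Undecodable bytes are replaced instead of raising UnicodeDecodeError.
    return data.decode(charset, errors="replace")
```

With this helper, print(read_html("https://www.youtube.com/user/girishgodage")) would print the page as text instead of a bytes literal.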