I want to scrape pages

gllt · August 30, 2010, 10:37am

I’m stoned and tired, I have heartburn, I’m going to bed

here is a pile of python code that does nothing

I want to turn beautifulsoup and mechanize/clientform/urllib2/pycurl/IDEK LOL into a uhm

here you go

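[CODE]import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

# Build an opener that keeps cookies, so the login session survives later requests
opnr = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(opnr)
# URL-encode the login form fields; passing them as data makes the request a POST
pswd = urllib.urlencode({'name': 'lame', 'pass': 'ass'})
logn = opnr.open('https://www.site.net/login/', pswd)
data = logn.read()
# Parse the returned HTML and dump it
soup = BeautifulSoup(data)
print soup
logn.close()[/CODE]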

[CODE]import pycurl

username = "name"
password = "pass"

pyc = pycurl.Curl()
pyc.setopt(pycurl.SSL_VERIFYHOST, 0)
pyc.setopt(pycurl.CAINFO, 'ca.pem')
pyc.setopt(pycurl.POST, 0)
pyc.setopt(pycurl.URL, "https://www.site.net/login/")
pyc.setopt(pycurl.USERPWD, username + ":" + password)
#pyc.setopt(pyc.HTTPPOST, [('title', title),('tags', tags),('body', body),("media", (pyc.FORM_FILE, file))])
#pyc.setopt(pyc.VERBOSE, 1)
result = pyc.perform()
pyc.close()[/CODE]

When I get up I’ll hack more of this together until it means something

right, no seriously I’m not spamming bbl

Yenairo_old · August 30, 2010, 11:17am

what is this for?

Jatz · August 30, 2010, 4:26pm

It was created for the purpose of doing nothing.

gllt · August 30, 2010, 4:28pm

Page/HTML/XML/Screen Scraping
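
For anyone wondering what the scraping step might look like once a page has been fetched, here is a rough sketch. It is not the BeautifulSoup code from the first post: it swaps in the Python 3 standard library's html.parser so it runs without extra installs. The LinkCollector class, the extract_links helper and the SAMPLE_PAGE markup are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: return all (href, text) pairs found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup standing in for a downloaded page
SAMPLE_PAGE = """
<html><body>
  <a href="/news/1">First story</a>
  <p>filler</p>
  <a href="/news/2">Second <b>story</b></a>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_PAGE):
        print(href, text)
```

In practice the string passed to extract_links would be whatever the login request returned, instead of the sample markup.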

Jatz · August 30, 2010, 4:30pm

I hate python xD

gllt · August 31, 2010, 5:03am

Oh look, Feed43

http://feed43.com/6218602455561226.xml

Whaddaya know
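
Since Feed43 hands back an RSS feed, the output can probably be read with a few lines of standard-library Python. This is only a sketch under assumptions: the SAMPLE_FEED below is invented RSS 2.0 markup, not the contents of the feed linked above, and read_items is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample; a real feed would be downloaded first
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>http://www.site.net/news/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.site.net/news/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Hypothetical helper: return (title, link) for each <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_FEED):
        print(title, link)
```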