Scraping Techniques to Extract Advertisements from Web Pages

by Mirko Urru and Stefano Cotta Ramusino for EuroPython 2011

Online advertising is an emerging research field at the intersection of Information Retrieval, Machine Learning, Optimization, and Microeconomics. Its main goal is to choose the right ads to present to a user engaged in a given task. Its two main forms are Sponsored Search Advertising and Contextual Advertising. The former places ads on the results page that a Web search engine returns for a query; the latter places ads within the content of a generic, third-party Web page. In both cases, the ads are selected and served by automated systems based on the content displayed to the user.

Web scraping is the set of techniques used to extract information from a website automatically instead of copying it by hand. In particular, we're interested in studying and adopting scraping techniques for:
i. accessing tags as object members;
ii. finding tags whose name, contents, or attributes match given selection criteria;
iii. accessing tag attributes with a dictionary-like syntax.
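These three capabilities closely mirror the features described in the Beautiful Soup documentation. The abstract does not name a library, though, so that association is our assumption. To show what each capability means without a third-party dependency, here is a minimal sketch built on the standard library's `html.parser`. The `Tag`, `TreeBuilder`, and `parse` names are hypothetical. The tree builder also ignores many HTML edge cases (implicit end tags, malformed markup) that a real scraping library handles.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they are not kept open on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "wbr"}


class Tag:
    """Hypothetical minimal parse-tree node illustrating the three access styles."""

    def __init__(self, name, attrs=(), parent=None):
        self.name = name
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # child Tag objects and text strings, in document order

    # (i) Tags as object members: node.body.div is the first <div> below <body>.
    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        found = self.find(name)
        if found is None:
            raise AttributeError(name)
        return found

    # (iii) Dictionary-like attribute access: node["href"].
    def __getitem__(self, key):
        return self.attrs[key]

    @property
    def text(self):
        """Concatenated text of this node and all of its descendants."""
        return "".join(c if isinstance(c, str) else c.text for c in self.children)

    def descendants(self):
        for child in self.children:
            if isinstance(child, Tag):
                yield child
                yield from child.descendants()

    # (ii) Tags whose name, attributes, or text content match selection criteria.
    def find_all(self, name=None, attrs=None, text=None):
        matches = []
        for tag in self.descendants():
            if name is not None and tag.name != name:
                continue
            # Exact string comparison; real libraries treat e.g. class as multi-valued.
            if attrs and any(tag.attrs.get(k) != v for k, v in attrs.items()):
                continue
            if text is not None and text not in tag.text:
                continue
            matches.append(tag)
        return matches

    def find(self, name=None, attrs=None, text=None):
        matches = self.find_all(name, attrs, text)
        return matches[0] if matches else None


class TreeBuilder(HTMLParser):
    """Builds a Tag tree from HTMLParser callbacks."""

    def __init__(self):
        super().__init__()
        self.root = Tag("[document]")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Tag(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node

    def handle_startendtag(self, tag, attrs):  # self-closing, e.g. <img ... />
        self.current.children.append(Tag(tag, attrs, self.current))

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not None and node.name != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


page = parse('<html><body><div class="ad"><a href="http://example.com/x">Buy now</a></div>'
             '<p>Article text</p></body></html>')
print(page.body.div.a["href"])                                         # -> http://example.com/x
print([t.text for t in page.find_all("div", attrs={"class": "ad"})])  # -> ['Buy now']
```

Member access (i) is only sugar over `find`, so `page.body.div` and `page.find("body").find("div")` return the same node.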

In this talk, we focus on applying scraping techniques to contextual advertising. In particular, we present a system aimed at finding the most relevant ads for a generic web page p. Starting from p, the system selects a set of its inlinks (i.e., the pages that link to p) and extracts the ads contained in them. Selection is performed by querying the Google search engine, whereas extraction relies on suitable scraping techniques.
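The inlink-then-extract pipeline can be sketched as follows. Everything here is an assumption for illustration. The abstract does not say how Google was queried (for example, with Google's `link:` operator), so `search_inlinks` is a hypothetical, caller-supplied function. Likewise, treating a link as an ad when it points to a known ad-serving host (`AD_HOSTS`) is our own heuristic, not the authors' extraction rule. Many ad networks also insert ads with JavaScript or iframes, which a static HTML parser like this one never sees.

```python
from html.parser import HTMLParser
from itertools import islice
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical heuristic: a link is an ad if its host belongs to one of these domains.
AD_HOSTS = ("googleadservices.com", "doubleclick.net")


class AdLinkExtractor(HTMLParser):
    """Collect (href, anchor text) for <a> links pointing at known ad servers."""

    def __init__(self):
        super().__init__()
        self.ads = []
        self._current = None  # [href, text parts] while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = urlparse(href).hostname or ""
        if any(host == h or host.endswith("." + h) for h in AD_HOSTS):
            self._current = [href, []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, parts = self._current
            self.ads.append((href, " ".join("".join(parts).split())))
            self._current = None


def fetch_page(url, timeout=10):
    """Download a page and decode it using the declared charset (UTF-8 fallback)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def ads_for_page(url, search_inlinks, fetch=fetch_page, max_inlinks=10):
    """Return (inlink, ad href, ad text) triples found on pages linking to url."""
    ads = []
    for inlink in islice(search_inlinks(url), max_inlinks):
        try:
            html = fetch(inlink)
        except (OSError, LookupError):
            continue  # skip unreachable pages and unknown charsets
        parser = AdLinkExtractor()
        parser.feed(html)
        parser.close()
        ads.extend((inlink, href, text) for href, text in parser.ads)
    return ads


# Offline usage: an in-memory "web" stands in for the search engine and HTTP.
fake_pages = {
    "http://blog.example/post":
        '<p>See <a href="http://www.googleadservices.com/pagead/aclk?x=1">Cheap flights</a>'
        ' and <a href="http://other.example/">a normal link</a>.</p>',
}
ads = ads_for_page("http://target.example/",
                   search_inlinks=lambda u: list(fake_pages),
                   fetch=fake_pages.__getitem__)
print(ads)
```

Passing the search and fetch steps in as functions keeps the pipeline testable offline. A real deployment would plug in an actual search client and `fetch_page`.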

Language: EN
Duration: 60 minutes (including Q&A)