Talk:Web scraping

This is the talk page for discussing improvements to the Web scraping article.
This is not a forum for general discussion of the article's subject.

Put new text under old text. Click here to start a new topic.
New to Wikipedia? Welcome! Learn to edit; get help.

Article policies

Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL

Archives: 1: 2 months

Internet C‑class High‑importance

	This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C	This article has been rated as C-class on Wikipedia's content assessment scale.
High	This article has been rated as High-importance on the project's importance scale.

Computing C‑class Mid‑importance

	This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C	This article has been rated as C-class on Wikipedia's content assessment scale.
Mid	This article has been rated as Mid-importance on the project's importance scale.

legal issues

I don't think this article's discussion of the legalities of scraping is correct, and I'm disputing its neutrality. The DMCA prohibits circumventing an effective access control measure. A robot acting like a browser bypasses no such measure, and so doesn't fall afoul of the DMCA. Also, redistributing copyrighted material is illegal regardless of whether the DMCA is invoked.

Furthermore, not all material obtained through screen-scraping is copyrighted. Consider a site that displays film showtimes. The showtimes themselves are not copyrighted any more than the numbers in a phone book are, and can therefore be used by whoever scrapes them without fear of copyright infringement. Wholesale copying of content is illegal, yes, but that isn't an issue specific to "web scraping."

Also, performing an action that violates a site's terms of use is not illegal. It merely violates the terms of use, not any law. It's not even a breach of contract, since the user doesn't have to read the terms, much less agree to them, in order to use the site.

Also, I demand a citation for the "courts have held" claim. I find it unlikely, though not entirely impossible. — Preceding unsigned comment added by Quotemstr (talk • contribs) 03:26, July 27, 2007 (UTC)

legal issues section reworked

The legal issues section made several bold and unsourced claims that could be interpreted as scare-mongering. Can someone check out the reworked section? —The preceding unsigned comment was added by Quotemstr (talk • contribs) 00:06:14, August 20, 2007 (UTC).

Legal issues again

I'm not starting an edit war, I swear. :-)

First of all, I cleaned up and normalized the references a bit, and made some minor phrasing changes that shouldn't be controversial.

I removed the section about legal action occurring out of the public eye. That information isn't just unsourced; it's unverifiable.

The court cases cited in the article hardly count as defeats. In the Ticketmaster case, the court held that the particular instance of scraping mentioned was not a trespass. In the other cases listed, the claim was for a preliminary injunction only. As I understand it, a preliminary injunction does not set case law, and should not be considered with the same weight as a final decision.

As for the aggregate damage section -- is there a specific source? Maybe I just missed it.

I don't see how the DMCA is relevant here either; the cases mentioned in the previous version seem to be covered by normal copyright law. A scraper doesn't necessarily have to circumvent any access restrictions in place on a site, since it can simply act like a browser. Also, doesn't the DMCA specifically allow circumvention for interoperability? —The preceding unsigned comment was added by Quotemstr (talk) (contribs) 02:30, August 21, 2007 (UTC)