
Talk:Web scraping

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Berean Hunter (talk | contribs) at 14:22, 2 December 2018 (OneClickArchiver archived merge to screen scraping to Talk:Web scraping/Archive 1). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
WikiProject Internet (C-class, High-importance)
This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-class on Wikipedia's content assessment scale.
This article has been rated as High-importance on the project's importance scale.
WikiProject Computing (C-class, Mid-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-class on Wikipedia's content assessment scale.
This article has been rated as Mid-importance on the project's importance scale.

I don't think this article's discussion of the legalities of scraping is correct, and I'm disputing its neutrality. The DMCA prohibits circumventing an effective technical access control measure. A robot acting like a browser circumvents no such measure, and therefore doesn't run afoul of the DMCA. Also, redistributing copyrighted material is illegal regardless of whether the DMCA is invoked.

Furthermore, not all material obtained through screen-scraping is copyrighted. Consider the case of a site that displays film showtimes. The showtimes themselves are not copyrighted any more than the numbers in a phone book are, and therefore can be used by whoever scrapes them without fear of copyright infringement. Wholesale copying of content is illegal, yes, but it's not an issue specific to "web scraping."

Also, performing an action that violates a site's terms of use is not illegal. It merely violates the terms of use, not any law. It's not even a breach of contract, since the user doesn't have to read, much less agree to, the terms in order to use the site.

Also, I demand a citation for the "courts have held" claim. I find it unlikely, though not entirely impossible. -- Preceding unsigned comment added by Quotemstr (talk | contribs) 03:26, July 27, 2007 (UTC)

The legal issues section made several bold and unsourced claims that could be interpreted as scare-mongering. Can someone check out the reworked section? -- The preceding unsigned comment was added by Quotemstr (talk | contribs) 00:06:14, August 20, 2007 (UTC).

I'm not starting an edit war, I swear. :-)

First of all, I cleaned up and normalized the references a bit, and made some minor phrasing changes that shouldn't be controversial.

I removed the section about legal action occurring out of the public eye. That information isn't just unsourced: it's unverifiable.

The court cases cited in the article hardly count as defeats. In the Ticketmaster case, the court held that the particular instance of scraping mentioned was not a trespass. In the other cases listed, the claim was for a preliminary injunction only. As I understand it, a preliminary injunction does not set case law, and should not be considered with the same weight as a final decision.

As for the aggregate damage section -- is there a specific source? Maybe I just missed it.

I don't see how the DMCA is relevant here either; the cases mentioned in the previous version seem to be covered by normal copyright law. A scraper doesn't necessarily have to circumvent any access restrictions in place on a site, considering that it can act just like a browser. Also, doesn't the DMCA specifically allow circumvention for interoperability? -- The preceding unsigned comment was added by Quotemstr (talk | contribs) 02:30, August 21, 2007 (UTC)