    README.md
    skipped 108 lines
    109 109   
    110 110  # Example crawl
    111 111   
    112  -A fews seconds of random crawling looks like this:
     112 +After a while, random crawling looks like this:
    113 113   
    114 114  ```
    115  -Added 1 links, 29045 total at url 'https://www.diapers.com/l/best-gifts-for-mom?ref=b_scf_leg_best_gifts_for_mom&icn=b_scf_leg&ici=best_gifts_for_mom'.
    116  -Added 1 links, 29045 total at url 'http://bananarepublic.gap.eu/browse/category.do?cid=1025790'.
    117  -Added 200 links, 29244 total at url 'http://www.bananarepublic.ca/products/mens-suits.jsp'.
    118  -Added 1 links, 29244 total at url 'http://cyworld.com.cy/en/jurisdictions/estonia7'.
    119  -Added 1 links, 29244 total at url 'http://bananarepublic.gap.eu/browse/category.do?cid=1025788'.
    120  -Added 2 links, 29245 total at url 'https://www.osti.gov/scitech/biblio/1337873-cyber-threat-vulnerability-analysis-electric-sector'.
    121  -Added 1 links, 29245 total at url 'https://www.amazon.com/30th-Anniversary-Collection-Time-Greatest/dp/B00000334E/ref=sr_1_9/153-5801643-0200824?ie=UTF8&qid=1491060352&sr=8-9&keywords=Paul+Anka'.
    122  -Added 40 links, 29284 total at url 'http://www.bendixking.com/Products/Displays'.
    123  -Added 1 links, 29284 total at url 'http://www.thefreedictionary.com/arid'.
    124  -Added 47 links, 29330 total at url 'http://www2.beltrailway.com/unemployment-sickness-benefits-for-railroad-employees/'.
     115 +This is ISP Data Pollution 🐙💨, Version 1.1
     116 +Downloading the blacklist… done.
     117 +Display format:
     118 +Downloading: website.com; NNNNN links [in library], H(domain)= B bits [entropy]
     119 +Downloaded: website.com: +LLL/NNNNN links [added], H(domain)= B bits [entropy]
     120 + 
     121 +http://eponymousflower.blogspot.com/2017/02/lu…: +6/32349 links, H(domain)=6.8 b
    125 122  ```
    126 123   
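The new display format reports `H(domain)` in bits. A minimal sketch of how such a figure could be computed is below. It assumes `H(domain)` is the Shannon entropy of how the collected links are distributed across domains; the helper name `domain_entropy_bits` is hypothetical and not taken from the project.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def domain_entropy_bits(urls):
    """Shannon entropy, in bits, of the empirical distribution of domains.

    Assumption: this mirrors what the log calls H(domain); the project's
    actual computation may differ.
    """
    # Count how many collected links fall under each network location.
    counts = Counter(urlsplit(u).netloc for u in urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over domains of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

Under this assumption, a library spread evenly over many domains yields a higher value than one dominated by a single site, so a growing `H(domain)` would suggest the crawl is diversifying.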
    127 124  The screenshot of a randomly crawled web page looks like this. Note that there are no downloaded images.
    skipped 11 lines
    139 136  I like `pip`, so on my machines I would say:
    140 137   
    141 138  ```
    142  -sudo pip-3.4 install numpy requests selenium Faker
     139 +sudo pip-3.4 install numpy requests selenium Faker OpenSSL
    143 140  ```
    144 141   
     142 +## PhantomJS
     143 + 
     144 +It is recommended that the `phantomjs` binary be installed directly from [phantomjs.org](http://phantomjs.org/download.html). Be sure to verify the [checksum](http://phantomjs.org/download.html#checksums) of the downloaded installation.
     145 + 
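The checksum verification mentioned above can be sketched in Python as follows. This assumes the published checksum is a SHA-256 hex digest; the function names and the idea of passing the expected digest as a string are illustrative, not part of the project.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare a file's digest with the published one (assumed SHA-256)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

To use it, copy the digest listed for your platform from the download page and pass it as `expected_hex` along with the path of the downloaded archive; install the binary only if the function returns `True`.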
    145 146  ## macOS
    146 147   
    147 148  The [MacPorts](https://www.macports.org) install command is:
    148 149   
    149 150  ```
    150  -sudo port install py34-numpy py34-requests py34-psutil phantomjs psutil
     151 +sudo port install py34-numpy py34-requests py34-psutil py34-openssl phantomjs psutil
    151 152  ```
    152 153   
    153 154  This is what was also necessary on macOS:
    skipped 65 lines