OnionSearch, commit f4270af8
  • README.md
    skipped 1 line
    2 2  ## Educational purposes only
    3 3  [![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)
    4 4   
    5  -OnionSearch is a script that scrapes urls on different .onion search engines. In 30 minutes you get 10,000 unique urls.
     5 +OnionSearch is a Python3 script that scrapes URLs from various ".onion" search engines.
     6 + 
     7 +In 30 minutes, you can get thousands of unique URLs.
     8 + 
    6 9  ## 💡 Prerequisite
    7 10  [Python 3](https://www.python.org/download/releases/3.0/)
    8 11  
    9  -## 🔍 Search engines used
     12 +## 🔍 Currently supported Search engines
    10 13  - Ahmia
    11  -- Torch
     14 +- TORCH (x2)
    12 15  - Darksearch io
    13 16  - OnionLand
     17 +- not Evil
     18 +- VisiTOR
     19 +- Dark Search Enginer
     20 +- Phobos
     21 +- Onion Search Server
     22 +- Grams (x2)
     23 +- Candle
     24 +- Tor Search Engine (x2)
     25 +- Torgle (x2)
     26 +- Onion Search Engine
     27 +- Tordex
     28 +- Tor66
     29 +- Tormax
     30 +- Haystack
     31 +- Multivac
     32 +- Evo Search
     33 +- Oneirun
     34 +- DeepLink
    14 35   
    15 36  ## 🛠️ Installation
     37 + 
    16 38  ```
    17 39  git clone https://github.com/megadose/OnionSearch.git
    18 40  cd OnionSearch
    19 41  pip3 install -r requirements.txt
     42 +pip3 install 'urllib3[socks]'
    20 43  python3 search.py -h
    21 44  ```
     45 + 
    22 46  ## 📈 Usage
     47 + 
     48 +Help:
    23 49  ```
    24  -python3 search.py [-h] --search "search" [--proxy 127.0.0.1:1337] [--output mylinks.txt]
    25  -python3 search.py --search "computer" --output computer.txt
     50 +usage: search.py [-h] [--proxy PROXY] [--output OUTPUT]
     51 + [--continuous_write CONTINUOUS_WRITE] [--limit LIMIT]
     52 + [--engines [ENGINES [ENGINES ...]]]
     53 + [--exclude [EXCLUDE [EXCLUDE ...]]]
     54 + [--fields [FIELDS [FIELDS ...]]]
     55 + [--field_delimiter FIELD_DELIMITER] [--mp_units MP_UNITS]
     56 + search
     57 + 
     58 +positional arguments:
     59 + search The search string or phrase
     60 + 
     61 +optional arguments:
     62 + -h, --help show this help message and exit
     63 + --proxy PROXY Set Tor proxy (default: 127.0.0.1:9050)
     64 + --output OUTPUT Output File (default: output_$SEARCH_$DATE.txt), where $SEARCH is replaced by the first chars of the search string and $DATE is replaced by the datetime
     65 + --continuous_write CONTINUOUS_WRITE
     66 + Write progressively to output file (default: False)
     67 + --limit LIMIT Set a max number of pages per engine to load
     68 + --engines [ENGINES [ENGINES ...]]
     69 + Engines to request (default: full list)
     70 + --exclude [EXCLUDE [EXCLUDE ...]]
     71 + Engines to exclude (default: none)
     72 + --fields [FIELDS [FIELDS ...]]
     73 + Fields to output to csv file (default: engine name link), available fields are shown below
     74 + --field_delimiter FIELD_DELIMITER
     75 + Delimiter for the CSV fields
     76 + --mp_units MP_UNITS Number of processing units (default: core number minus 1)
     77 + 
     78 +[...]
    26 79  ```
     80 + 
     81 +### Multi-processing behaviour
     82 + 
      83 +By default, the script will run with the parameter `mp_units = cpu_count() - 1`. This means that on a machine with 4 cores,
      84 +it will run 3 scraping functions in parallel. You can force `mp_units` to any value, but it is recommended to leave it at the default.
      85 +You may want to set it to 1 to run all requests sequentially (disabling the multi-processing feature).
     86 + 
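For illustration only, here is a minimal sketch of how this kind of per-engine parallel dispatch can be done with Python's `multiprocessing.Pool`; the engine list and `scrape_engine` function below are placeholders, not the project's actual code:

```
from multiprocessing import Pool, cpu_count

def scrape_engine(engine_name):
    # Placeholder: the real script sends requests to the engine through the
    # Tor proxy and returns a list of result dictionaries.
    return ["{}-result".format(engine_name)]

if __name__ == "__main__":
    engines_to_run = ["ahmia", "torch", "phobos"]  # hypothetical subset
    mp_units = max(cpu_count() - 1, 1)             # default behaviour described above
    with Pool(processes=mp_units) as pool:
        # Each engine is scraped in its own worker process.
        all_results = pool.map(scrape_engine, engines_to_run)
    print(sum(len(r) for r in all_results), "results")
```
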
      87 +Please note that continuous writing to the CSV file has not been *heavily* tested with the multiprocessing feature and therefore
      88 +may not work as expected.
     89 + 
     90 +Please also note that the progress bars may not be properly displayed when `mp_units` is greater than 1.
     91 +**It does not affect the results**, so don't worry.
     92 + 
     93 +### Examples
     94 + 
     95 +To request all the engines for the word "computer":
     96 +```
     97 +python3 search.py "computer"
     98 +```
     99 + 
      100 +To request all the engines except "Ahmia" and "Candle" for the word "computer":
     101 +```
     102 +python3 search.py "computer" --exclude ahmia candle
     103 +```
     104 + 
     105 +To request only "Tor66", "DeepLink" and "Phobos" for the word "computer":
     106 +```
     107 +python3 search.py "computer" --engines tor66 deeplink phobos
     108 +```
     109 + 
      110 +The same as above, but limiting the number of pages to load per engine to 3:
     111 +```
     112 +python3 search.py "computer" --engines tor66 deeplink phobos --limit 3
     113 +```
     114 + 
      115 +Please note that the list of supported engines (and their keys) is given in the script help (`-h`).
     116 + 
     117 + 
     118 +### Output
     119 + 
     120 +#### Default output
     121 + 
      122 +By default, the file is written at the end of the process. The file is CSV formatted and contains the following columns:
     123 +```
     124 +"engine","name of the link","url"
     125 +```
     126 + 
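As a quick illustration, such a file can be read back with Python's standard `csv` module; the filename below is hypothetical:

```
import csv

# Hypothetical filename; the default pattern is output_$SEARCH_$DATE.txt,
# as described in the "Changing filename" section below.
with open("output_computer_20200528.txt", newline="") as f:
    for engine, name, url in csv.reader(f):
        print(engine, url)
```
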
     127 +#### Customizing the output fields
     128 + 
      129 +You can customize what will be written to the output file by using the parameters `--fields` and `--field_delimiter`.
     130 + 
      131 +`--fields` allows you to add, remove, and reorder the output fields. The default set of fields is shown just above. Instead, you can, for instance,
      132 +choose to output:
     133 +```
     134 +"engine","name of the link","url","domain"
     135 +```
     136 +by setting `--fields engine name link domain`.
     137 + 
     138 +Or even, you can choose to output:
     139 +```
     140 +"engine","domain"
     141 +```
     142 +by setting `--fields engine domain`.
     143 + 
      144 +These are just examples; many other combinations are possible.
     145 + 
     146 +Finally, you can also choose to modify the CSV delimiter (comma by default), for instance: `--field_delimiter ";"`.
     147 + 
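For example, to keep only the engine and domain fields and separate them with semicolons (an illustrative command built from the options above):

```
python3 search.py "computer" --fields engine domain --field_delimiter ";"
```
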
     148 +#### Changing filename
     149 + 
      150 +By default, the filename will be set to `output_$SEARCH_$DATE.txt`, where $DATE represents the current datetime and $SEARCH the first
      151 +characters of the search string.
     152 + 
     153 +You can modify this filename by using `--output` when running the script, for instance:
     154 +```
     155 +python3 search.py "computer" --output "\$DATE.csv"
     156 +python3 search.py "computer" --output output.txt
     157 +python3 search.py "computer" --output "\$DATE_\$SEARCH.csv"
     158 +...
     159 +```
     160 +(Note that it might be necessary to escape the dollar character.)
     161 + 
      162 +In the CSV file produced, the name and URL strings are sanitized as much as possible, but there might still be some problems...
     163 + 
     164 +#### Write progressively
     165 + 
      166 +You can choose to write to the output file progressively (instead of all at once at the end), which prevents
      167 +losing the results if something goes wrong. To do so, use `--continuous_write True`, like this:
     168 +```
     169 +python3 search.py "computer" --continuous_write True
     170 +```
     171 +You can then use the `tail -f` (tail follow) Unix command to actively watch or monitor the results of the scraping.
     172 + 
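For example, to write results as they are found and watch them live from another terminal (the output filename here is chosen for illustration):

```
python3 search.py "computer" --continuous_write True --output results.csv
tail -f results.csv
```
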
    27 173  ## 📝 License
    28 174  [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.fr.html)
    29 175   
    skipped 2 lines
  • engines.py
     1 +ENGINES = {
     2 + "ahmia": "http://msydqstlz2kzerdg.onion",
     3 + "torch": "http://xmh57jrzrnw6insl.onion",
     4 + "torch1": "http://mkojmtnv22hpbfxk.onion",
     5 + "darksearchio": "http://darksearch.io",
     6 + "onionland": "http://3bbad7fauom4d6sgppalyqddsqbf5u5p56b5k5uk2zxsy3d6ey2jobad.onion",
     7 + "notevil": "http://hss3uro2hsxfogfq.onion",
     8 + "visitor": "http://visitorfi5kl7q7i.onion",
     9 + "darksearchenginer": "http://7pwy57iklvt6lyhe.onion",
     10 + "phobos": "http://phobosxilamwcg75xt22id7aywkzol6q6rfl2flipcqoc4e4ahima5id.onion",
     11 + "onionsearchserver": "http://oss7wrm7xvoub77o.onion",
     12 + "grams": "http://grams7enqfy4nieo.onion",
     13 + "grams1": "http://grams7ebnju7gwjl.onion",
     14 + "candle": "http://gjobjn7ievumcq6z.onion",
     15 + "torsearchengine": "http://searchcoaupi3csb.onion",
     16 + "torsearchengine1": "http://searcherc3uwk535.onion",
     17 + "torgle": "http://torglejzid2cyoqt.onion",
     18 + "torgle1": "http://torgle5fj664v7pf.onion",
     19 + "onionsearchengine": "http://onionf4j3fwqpeo5.onion",
     20 + "tordex": "http://tordex7iie7z2wcg.onion",
     21 + "tor66": "http://tor66sezptuu2nta.onion",
     22 + "tormax": "http://tormaxunodsbvtgo.onion",
     23 + "haystack": "http://haystakvxad7wbk5.onion",
     24 + "multivac": "http://multivacigqzqqon.onion",
     25 + "evosearch": "http://evo7no6twwwrm63c.onion",
     26 + "oneirun": "http://oneirunda366dmfm.onion",
     27 + "deeplink": "http://deeplinkdeatbml7.onion",
     28 +}
     29 + 
  • requirements.txt
     1 +requests
    1 2  beautifulsoup4
    2 3  tqdm
    3 4  argparse
    4  - 
     5 +html5lib
  • search.py
    1  -import requests,json
    2  -from bs4 import BeautifulSoup
    3 1  import argparse
     2 +import csv
     3 +import math
     4 +import re
     5 +import time
     6 +from datetime import datetime
     7 +from functools import reduce
     8 +from random import choice
     9 + 
     10 +from multiprocessing import Pool, cpu_count, current_process, freeze_support
    4 11  from tqdm import tqdm
    5  -parser = argparse.ArgumentParser()
    6  -required = parser.add_argument_group('required arguments')
     12 + 
     13 +import requests
     14 +import urllib.parse as urlparse
     15 +from urllib.parse import parse_qs
     16 +from urllib.parse import quote
     17 +from urllib.parse import unquote
     18 +from bs4 import BeautifulSoup
     19 +from urllib3.exceptions import ProtocolError
     20 + 
     21 +import engines
     22 + 
     23 +desktop_agents = [
     24 + 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
     25 + 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
     26 + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
     27 + 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
     28 + 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) '
     29 + 'AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14',
     30 + 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
     31 + 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) '
     32 + 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
     33 + 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) '
     34 + 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
     35 + 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
     36 + 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
     37 + 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'
     38 +]
     39 + 
     40 +supported_engines = engines.ENGINES
     41 + 
     42 +available_csv_fields = [
     43 + "engine",
     44 + "name",
     45 + "link",
     46 + "domain"
     47 +]
     48 + 
     49 + 
     50 +def print_epilog():
     51 + epilog = "Available CSV fields: \n\t"
     52 + for f in available_csv_fields:
     53 + epilog += " {}".format(f)
     54 + epilog += "\n"
     55 + epilog += "Supported engines: \n\t"
     56 + for e in supported_engines.keys():
     57 + epilog += " {}".format(e)
     58 + return epilog
     59 + 
     60 + 
     61 +parser = argparse.ArgumentParser(epilog=print_epilog(), formatter_class=argparse.RawTextHelpFormatter)
    7 62  parser.add_argument("--proxy", default='localhost:9050', type=str, help="Set Tor proxy (default: 127.0.0.1:9050)")
    8  -parser.add_argument("--output", default='output.txt', type=str, help="Output File (default: output.txt)")
     63 +parser.add_argument("--output", default='output_$SEARCH_$DATE.txt', type=str,
     64 + help="Output File (default: output_$SEARCH_$DATE.txt), where $SEARCH is replaced by the first "
     65 + "chars of the search string and $DATE is replaced by the datetime")
     66 +parser.add_argument("--continuous_write", type=bool, default=False,
     67 + help="Write progressively to output file (default: False)")
     68 +parser.add_argument("search", type=str, help="The search string or phrase")
     69 +parser.add_argument("--limit", type=int, default=0, help="Set a max number of pages per engine to load")
     70 +parser.add_argument("--engines", type=str, action='append', help='Engines to request (default: full list)', nargs="*")
     71 +parser.add_argument("--exclude", type=str, action='append', help='Engines to exclude (default: none)', nargs="*")
     72 +parser.add_argument("--fields", type=str, action='append',
     73 + help='Fields to output to csv file (default: engine name link), available fields are shown below',
     74 + nargs="*")
     75 +parser.add_argument("--field_delimiter", type=str, default=",", help='Delimiter for the CSV fields')
     76 +parser.add_argument("--mp_units", type=int, default=(cpu_count() - 1), help="Number of processing units (default: "
     77 + "core number minus 1)")
    9 78   
    10  -parser.add_argument("--search",type=str, help="search")
    11 79  args = parser.parse_args()
    12 80  proxies = {'http': 'socks5h://{}'.format(args.proxy), 'https': 'socks5h://{}'.format(args.proxy)}
     81 +filename = args.output
     82 +field_delim = ","
     83 +if args.field_delimiter and len(args.field_delimiter) == 1:
     84 + field_delim = args.field_delimiter
     85 + 
     86 + 
     87 +def random_headers():
     88 + return {'User-Agent': choice(desktop_agents),
     89 + 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
     90 + 
    13 91   
    14 92  def clear(toclear):
    15  - return(toclear.replace("\n","").replace(" ",""))
    16  -def clearn(toclear):
    17  - return(toclear.replace("\n"," "))
     93 + str = toclear.replace("\n", " ")
     94 + str = ' '.join(str.split())
     95 + return str
     96 + 
     97 + 
     98 +def get_parameter(url, parameter_name):
     99 + parsed = urlparse.urlparse(url)
     100 + return parse_qs(parsed.query)[parameter_name][0]
     101 + 
     102 + 
     103 +def get_proc_pos():
     104 + return (current_process()._identity[0]) - 1
     105 + 
     106 + 
     107 +def get_tqdm_desc(e_name, pos):
     108 + return "%20s (#%d)" % (e_name, pos)
     109 + 
     110 + 
     111 +def ahmia(searchstr):
     112 + results = []
     113 + ahmia_url = supported_engines['ahmia'] + "/search/?q={}"
     114 + 
     115 + pos = get_proc_pos()
     116 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("Ahmia", pos), position=pos) as progress_bar:
     117 + response = requests.get(ahmia_url.format(quote(searchstr)), proxies=proxies, headers=random_headers())
     118 + soup = BeautifulSoup(response.text, 'html5lib')
     119 + results = link_finder("ahmia", soup)
     120 + progress_bar.update()
     121 + 
     122 + return results
     123 + 
     124 + 
     125 +def torch(searchstr):
     126 + results = []
     127 + torch_url = supported_engines['torch'] + "/4a1f6b371c/search.cgi?cmd=Search!&np={}&q={}"
     128 + results_per_page = 10
     129 + max_nb_page = 100
     130 + if args.limit != 0:
     131 + max_nb_page = args.limit
     132 + 
     133 + with requests.Session() as s:
     134 + s.proxies = proxies
     135 + s.headers = random_headers()
     136 + 
     137 + req = s.get(torch_url.format(0, quote(searchstr)))
     138 + soup = BeautifulSoup(req.text, 'html5lib')
     139 + 
     140 + page_number = 1
     141 + for i in soup.find("table", attrs={"width": "100%"}).find_all("small"):
     142 + if i.get_text() is not None and "of" in i.get_text():
     143 + page_number = math.ceil(float(clear(i.get_text().split("-")[1].split("of")[1])) / results_per_page)
     144 + if page_number > max_nb_page:
     145 + page_number = max_nb_page
     146 + 
     147 + pos = get_proc_pos()
     148 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("TORCH", pos), position=pos) as progress_bar:
     149 + 
     150 + results = link_finder("torch", soup)
     151 + progress_bar.update()
     152 + 
     153 + # Usually range is 2 to n+1, but TORCH behaves differently
     154 + for n in range(1, page_number):
     155 + req = s.get(torch_url.format(n, quote(searchstr)))
     156 + soup = BeautifulSoup(req.text, 'html5lib')
     157 + results = results + link_finder("torch", soup)
     158 + progress_bar.update()
     159 + 
     160 + return results
     161 + 
     162 + 
     163 +def torch1(searchstr):
     164 + results = []
     165 + torch1_url = supported_engines['torch1'] + "/search?q={}&cmd=Search!"
     166 + 
     167 + pos = get_proc_pos()
     168 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("TORCH 1", pos), position=pos) as progress_bar:
     169 + response = requests.get(torch1_url.format(quote(searchstr)), proxies=proxies, headers=random_headers())
     170 + soup = BeautifulSoup(response.text, 'html5lib')
     171 + results = link_finder("torch1", soup)
     172 + progress_bar.update()
    18 173   
    19  -def scrape():
    20  - result = {}
    21  - ahmia = "http://msydqstlz2kzerdg.onion/search/?q="+args.search
    22  - response = requests.get(ahmia, proxies=proxies)
    23  - #print(response)
    24  - soup = BeautifulSoup(response.text, 'html.parser')
    25  - result['ahmia'] = []
    26  - #pageNumber = clear(soup.find("span", id="pageResultEnd").get_text())
    27  - for i in tqdm(soup.findAll('li', attrs = {'class' : 'result'}),desc="Ahmia"):
    28  - i = i.find('h4')
    29  - result['ahmia'].append({"name":clear(i.get_text()),"link":i.find('a')['href'].replace("/search/search/redirect?search_term=search&redirect_url=","")})
     174 + return results
    30 175   
    31  - urlTorchNumber = "http://xmh57jrzrnw6insl.onion/4a1f6b371c/search.cgi?cmd=Search!&np=1&q="
    32  - req = requests.get(urlTorchNumber+args.search,proxies=proxies)
    33  - soup = BeautifulSoup(req.text, 'html.parser')
    34  - result['urlTorch'] = []
    35  - pageNumber = ""
    36  - for i in soup.find("table",attrs={"width":"100%"}).findAll("small"):
    37  - if("of"in i.get_text()):
    38  - pageNumber = i.get_text()
    39  - pageNumber = round(float(clear(pageNumber.split("-")[1].split("of")[1]))/10)
    40  - if(pageNumber>99):
    41  - pageNumber=99
    42  - result['urlTorch'] = []
    43  - for n in tqdm(range(1,pageNumber+1),desc="Torch"):
    44  - urlTorch = "http://xmh57jrzrnw6insl.onion/4a1f6b371c/search.cgi?cmd=Search!&np={}&q={}".format(n,args.search)
    45  - #print(urlTorch)
    46  - try:
    47  - req = requests.get(urlTorchNumber+args.search,proxies=proxies)
    48  - soup = BeautifulSoup(req.text, 'html.parser')
    49  - for i in soup.findAll('dl'):
    50  - result['urlTorch'].append({"name":clear(i.find('a').get_text()),"link":i.find('a')['href']})
    51  - except:
    52  - pass
     176 + 
     177 +def darksearchio(searchstr):
     178 + results = []
     179 + darksearchio_url = supported_engines['darksearchio'] + "/api/search?query={}&page={}"
     180 + max_nb_page = 30
     181 + if args.limit != 0:
     182 + max_nb_page = args.limit
     183 + 
     184 + with requests.Session() as s:
     185 + s.proxies = proxies
     186 + s.headers = random_headers()
     187 + resp = s.get(darksearchio_url.format(quote(searchstr), 1))
    53 188   
    54  - darksearchnumber = "http://darksearch.io/api/search?query="
    55  - req = requests.get(darksearchnumber+args.search,proxies=proxies)
    56  - cookies = req.cookies
    57  - if(req.status_code==200):
    58  - result['darksearch']=[]
    59  - #print(req)
    60  - req = req.json()
    61  - if(req['last_page']>30):
    62  - pageNumber=30
     189 + page_number = 1
     190 + if resp.status_code == 200:
     191 + resp = resp.json()
     192 + if 'last_page' in resp:
     193 + page_number = resp['last_page']
     194 + if page_number > max_nb_page:
     195 + page_number = max_nb_page
    63 196   else:
    64  - pageNumber=req['last_page']
    65  - #print(pageNumber)
    66  - for i in tqdm(range(1,pageNumber+1),desc="Darksearch io"):
    67  - #print(i)
    68  - darksearch = "http://darksearch.io/api/search?query={}&page=".format(args.search)
    69  - req = requests.get(darksearch+str(pageNumber),proxies=proxies,cookies=cookies)
    70  - if(req.status_code==200):
    71  - for r in req.json()['data']:
    72  - result['darksearch'].append({"name":r["title"],"link":r["link"]})
     197 + return
     198 + 
     199 + pos = get_proc_pos()
     200 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("DarkSearch (.io)", pos), position=pos) \
     201 + as progress_bar:
     202 + 
     203 + results = link_finder("darksearchio", resp['data'])
     204 + progress_bar.update()
     205 + 
     206 + for n in range(2, page_number + 1):
     207 + resp = s.get(darksearchio_url.format(quote(searchstr), n))
     208 + if resp.status_code == 200:
     209 + resp = resp.json()
     210 + results = results + link_finder("darksearchio", resp['data'])
     211 + progress_bar.update()
     212 + else:
     213 + # Current page results will be lost but we will try to continue after a short sleep
     214 + time.sleep(1)
     215 + 
     216 + return results
     217 + 
     218 + 
     219 +def onionland(searchstr):
     220 + results = []
     221 + onionlandv3_url = supported_engines['onionland'] + "/search?q={}&page={}"
     222 + max_nb_page = 100
     223 + if args.limit != 0:
     224 + max_nb_page = args.limit
     225 + 
     226 + with requests.Session() as s:
     227 + s.proxies = proxies
     228 + s.headers = random_headers()
     229 + 
     230 + resp = s.get(onionlandv3_url.format(quote(searchstr), 1))
     231 + soup = BeautifulSoup(resp.text, 'html5lib')
     232 + 
     233 + page_number = 1
     234 + for i in soup.find_all('div', attrs={"class": "search-status"}):
     235 + approx_re = re.match(r"About ([,0-9]+) result(.*)",
     236 + clear(i.find('div', attrs={'class': "col-sm-12"}).get_text()))
     237 + if approx_re is not None:
     238 + nb_res = int((approx_re.group(1)).replace(",", ""))
     239 + results_per_page = 19
     240 + page_number = math.ceil(nb_res / results_per_page)
     241 + if page_number > max_nb_page:
     242 + page_number = max_nb_page
     243 + 
     244 + pos = get_proc_pos()
     245 + with tqdm(total=max_nb_page, initial=0, desc=get_tqdm_desc("OnionLand", pos), position=pos) as progress_bar:
     246 + 
     247 + results = link_finder("onionland", soup)
     248 + progress_bar.update()
     249 + 
     250 + for n in range(2, page_number + 1):
     251 + resp = s.get(onionlandv3_url.format(quote(searchstr), n))
     252 + soup = BeautifulSoup(resp.text, 'html5lib')
     253 + ret = link_finder("onionland", soup)
     254 + if len(ret) == 0:
     255 + break
     256 + results = results + ret
     257 + progress_bar.update()
     258 + 
     259 + return results
     260 + 
     261 + 
     262 +def notevil(searchstr):
     263 + results = []
     264 + notevil_url1 = supported_engines['notevil'] + "/index.php?q={}"
     265 + notevil_url2 = supported_engines['notevil'] + "/index.php?q={}&hostLimit=20&start={}&numRows={}&template=0"
     266 + max_nb_page = 20
     267 + if args.limit != 0:
     268 + max_nb_page = args.limit
     269 + 
     270 + # Do not use requests.Session() here (by experience less results would be got)
     271 + req = requests.get(notevil_url1.format(quote(searchstr)), proxies=proxies, headers=random_headers())
     272 + soup = BeautifulSoup(req.text, 'html5lib')
     273 + 
     274 + page_number = 1
     275 + last_div = soup.find("div", attrs={"style": "text-align:center"}).find("div", attrs={"style": "text-align:center"})
     276 + if last_div is not None:
     277 + for i in last_div.find_all("a"):
     278 + page_number = int(i.get_text())
     279 + if page_number > max_nb_page:
     280 + page_number = max_nb_page
     281 + 
     282 + pos = get_proc_pos()
     283 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("not Evil", pos), position=pos) as progress_bar:
     284 + num_rows = 20
     285 + results = link_finder("notevil", soup)
     286 + progress_bar.update()
     287 + 
     288 + for n in range(2, page_number + 1):
     289 + start = (int(n - 1) * num_rows)
     290 + req = requests.get(notevil_url2.format(quote(searchstr), start, num_rows),
     291 + proxies=proxies,
     292 + headers=random_headers())
     293 + soup = BeautifulSoup(req.text, 'html5lib')
     294 + results = results + link_finder("notevil", soup)
     295 + progress_bar.update()
     296 + time.sleep(1)
     297 + 
     298 + return results
     299 + 
     300 + 
     301 +def visitor(searchstr):
     302 + results = []
     303 + visitor_url = supported_engines['visitor'] + "/search/?q={}&page={}"
     304 + max_nb_page = 30
     305 + if args.limit != 0:
     306 + max_nb_page = args.limit
     307 + 
     308 + pos = get_proc_pos()
     309 + with tqdm(total=max_nb_page, initial=0, desc=get_tqdm_desc("VisiTOR", pos), position=pos) as progress_bar:
     310 + continue_processing = True
     311 + page_to_request = 1
     312 + 
     313 + with requests.Session() as s:
     314 + s.proxies = proxies
     315 + s.headers = random_headers()
     316 + 
     317 + while continue_processing:
     318 + resp = s.get(visitor_url.format(quote(searchstr), page_to_request))
     319 + soup = BeautifulSoup(resp.text, 'html5lib')
     320 + results = results + link_finder("visitor", soup)
     321 + progress_bar.update()
     322 + 
     323 + next_page = soup.find('a', text="Next »")
     324 + if next_page is None or page_to_request >= max_nb_page:
     325 + continue_processing = False
     326 + 
     327 + page_to_request += 1
     328 + 
     329 + return results
     330 + 
     331 + 
     332 +def darksearchenginer(searchstr):
     333 + results = []
     334 + darksearchenginer_url = supported_engines['darksearchenginer']
     335 + max_nb_page = 20
     336 + if args.limit != 0:
     337 + max_nb_page = args.limit
     338 + page_number = 1
     339 + 
     340 + with requests.Session() as s:
     341 + s.proxies = proxies
     342 + s.headers = random_headers()
     343 + 
     344 + # Note that this search engine is very likely to timeout
     345 + resp = s.post(darksearchenginer_url, data={"search[keyword]": searchstr, "page": page_number})
     346 + soup = BeautifulSoup(resp.text, 'html5lib')
     347 + 
     348 + pages_input = soup.find_all("input", attrs={"name": "page"})
     349 + for i in pages_input:
     350 + page_number = int(i['value'])
     351 + if page_number > max_nb_page:
     352 + page_number = max_nb_page
     353 + 
     354 + pos = get_proc_pos()
     355 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Dark Search Enginer", pos), position=pos) \
     356 + as progress_bar:
     357 + 
     358 + results = link_finder("darksearchenginer", soup)
     359 + progress_bar.update()
     360 + 
     361 + for n in range(2, page_number + 1):
     362 + resp = s.post(darksearchenginer_url, data={"search[keyword]": searchstr, "page": str(n)})
     363 + soup = BeautifulSoup(resp.text, 'html5lib')
     364 + results = results + link_finder("darksearchenginer", soup)
     365 + progress_bar.update()
     366 + 
     367 + return results
     368 + 
     369 + 
     370 +def phobos(searchstr):
     371 + results = []
     372 + phobos_url = supported_engines['phobos'] + "/search?query={}&p={}"
     373 + max_nb_page = 100
     374 + if args.limit != 0:
     375 + max_nb_page = args.limit
     376 + 
     377 + with requests.Session() as s:
     378 + s.proxies = proxies
     379 + s.headers = random_headers()
     380 + 
     381 + resp = s.get(phobos_url.format(quote(searchstr), 1), proxies=proxies, headers=random_headers())
     382 + soup = BeautifulSoup(resp.text, 'html5lib')
     383 + 
     384 + page_number = 1
     385 + pages = soup.find("div", attrs={"class": "pages"}).find_all('a')
     386 + if pages is not None:
     387 + for i in pages:
     388 + page_number = int(i.get_text())
     389 + if page_number > max_nb_page:
     390 + page_number = max_nb_page
     391 + 
     392 + pos = get_proc_pos()
     393 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Phobos", pos), position=pos) as progress_bar:
     394 + results = link_finder("phobos", soup)
     395 + progress_bar.update()
     396 + 
     397 + for n in range(2, page_number + 1):
     398 + resp = s.get(phobos_url.format(quote(searchstr), n), proxies=proxies, headers=random_headers())
     399 + soup = BeautifulSoup(resp.text, 'html5lib')
     400 + results = results + link_finder("phobos", soup)
     401 + progress_bar.update()
     402 + 
     403 + return results
     404 + 
     405 + 
     406 +def onionsearchserver(searchstr):
     407 + results = []
     408 + onionsearchserver_url1 = supported_engines['onionsearchserver'] + "/oss/"
     409 + onionsearchserver_url2 = None
     410 + results_per_page = 10
     411 + max_nb_page = 100
     412 + if args.limit != 0:
     413 + max_nb_page = args.limit
     414 + 
     415 + with requests.Session() as s:
     416 + s.proxies = proxies
     417 + s.headers = random_headers()
     418 + 
     419 + resp = s.get(onionsearchserver_url1)
     420 + soup = BeautifulSoup(resp.text, 'html5lib')
     421 + for i in soup.find_all('iframe', attrs={"style": "display:none;"}):
     422 + onionsearchserver_url2 = i['src'] + "{}&page={}"
     423 + 
     424 + if onionsearchserver_url2 is None:
     425 + return results
     426 + 
     427 + resp = s.get(onionsearchserver_url2.format(quote(searchstr), 1))
     428 + soup = BeautifulSoup(resp.text, 'html5lib')
     429 + 
     430 + page_number = 1
     431 + pages = soup.find_all("div", attrs={"class": "osscmnrdr ossnumfound"})
     432 + if pages is not None and not str(pages[0].get_text()).startswith("No"):
     433 + total_results = float(str.split(clear(pages[0].get_text()))[0])
     434 + page_number = math.ceil(total_results / results_per_page)
     435 + if page_number > max_nb_page:
     436 + page_number = max_nb_page
     437 + 
     438 + pos = get_proc_pos()
     439 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Onion Search Server", pos), position=pos) \
     440 + as progress_bar:
     441 + 
     442 + results = link_finder("onionsearchserver", soup)
     443 + progress_bar.update()
     444 + 
     445 + for n in range(2, page_number + 1):
     446 + resp = s.get(onionsearchserver_url2.format(quote(searchstr), n))
     447 + soup = BeautifulSoup(resp.text, 'html5lib')
     448 + results = results + link_finder("onionsearchserver", soup)
     449 + progress_bar.update()
     450 + 
     451 + return results
     452 + 
     453 + 
     454 +def grams(searchstr):
     455 + results = []
     456 + # No multi pages handling as it is very hard to get many results on this engine
     457 + grams_url1 = supported_engines['grams']
     458 + grams_url2 = supported_engines['grams'] + "/results"
     459 + 
     460 + with requests.Session() as s:
     461 + s.proxies = proxies
     462 + s.headers = random_headers()
     463 + 
     464 + resp = s.get(grams_url1)
     465 + soup = BeautifulSoup(resp.text, 'html5lib')
     466 + token = soup.find('input', attrs={'name': '_token'})['value']
     467 + 
     468 + pos = get_proc_pos()
     469 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("Grams", pos), position=pos) as progress_bar:
     470 + resp = s.post(grams_url2, data={"req": searchstr, "_token": token})
     471 + soup = BeautifulSoup(resp.text, 'html5lib')
     472 + results = link_finder("grams", soup)
     473 + progress_bar.update()
     474 + 
     475 + return results
     476 + 
     477 + 
     478 +def candle(searchstr):
     479 + results = []
     480 + candle_url = supported_engines['candle'] + "/?q={}"
     481 + 
     482 + pos = get_proc_pos()
     483 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("Candle", pos), position=pos) as progress_bar:
     484 + response = requests.get(candle_url.format(quote(searchstr)), proxies=proxies, headers=random_headers())
     485 + soup = BeautifulSoup(response.text, 'html5lib')
     486 + results = link_finder("candle", soup)
     487 + progress_bar.update()
     488 + 
     489 + return results
     490 + 
     491 + 
     492 +def torsearchengine(searchstr):
     493 + results = []
     494 + torsearchengine_url = supported_engines['torsearchengine'] + "/search/move/?q={}&pn={}&num=10&sdh=&"
     495 + max_nb_page = 100
     496 + if args.limit != 0:
     497 + max_nb_page = args.limit
     498 + 
     499 + with requests.Session() as s:
     500 + s.proxies = proxies
     501 + s.headers = random_headers()
     502 + 
     503 + resp = s.get(torsearchengine_url.format(quote(searchstr), 1))
     504 + soup = BeautifulSoup(resp.text, 'html5lib')
     505 + 
     506 + page_number = 1
     507 + for i in soup.find_all('div', attrs={"id": "subheader"}):
     508 + if i.get_text() is not None and "of" in i.get_text():
     509 + total_results = int(i.find('p').find_all('b')[2].get_text().replace(",", ""))
     510 + results_per_page = 10
     511 + page_number = math.ceil(total_results / results_per_page)
     512 + if page_number > max_nb_page:
     513 + page_number = max_nb_page
     514 + 
     515 + pos = get_proc_pos()
     516 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Tor Search Engine", pos), position=pos) \
     517 + as progress_bar:
     518 + 
     519 + results = link_finder("torsearchengine", soup)
     520 + progress_bar.update()
     521 + 
     522 + for n in range(2, page_number + 1):
     523 + resp = s.get(torsearchengine_url.format(quote(searchstr), n))
     524 + soup = BeautifulSoup(resp.text, 'html5lib')
     525 + results = results + link_finder("torsearchengine", soup)
     526 + progress_bar.update()
     527 + 
     528 + return results
     529 + 
     530 + 
     531 +def torgle(searchstr):
     532 + results = []
     533 + torgle_url = supported_engines['torgle'] + "/search.php?term={}"
     534 + 
     535 + pos = get_proc_pos()
     536 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("Torgle", pos), position=pos) as progress_bar:
     537 + response = requests.get(torgle_url.format(quote(searchstr)), proxies=proxies, headers=random_headers())
     538 + soup = BeautifulSoup(response.text, 'html5lib')
     539 + results = link_finder("torgle", soup)
     540 + progress_bar.update()
     541 + 
     542 + return results
     543 + 
     544 + 
     545 +def onionsearchengine(searchstr):
     546 + results = []
     547 + onionsearchengine_url = supported_engines['onionsearchengine'] + "/search.php?search={}&submit=Search&page={}"
     548 + # same as onionsearchengine_url = "http://5u56fjmxu63xcmbk.onion/search.php?search={}&submit=Search&page={}"
     549 + max_nb_page = 100
     550 + if args.limit != 0:
     551 + max_nb_page = args.limit
     552 + 
     553 + with requests.Session() as s:
     554 + s.proxies = proxies
     555 + s.headers = random_headers()
     556 + 
     557 + resp = s.get(onionsearchengine_url.format(quote(searchstr), 1))
     558 + soup = BeautifulSoup(resp.text, 'html5lib')
     559 + 
     560 + page_number = 1
     561 + approx_re = re.search(r"\s([0-9]+)\sresult[s]?\sfound\s!.*", clear(soup.find('body').get_text()))
     562 + if approx_re is not None:
     563 + nb_res = int(approx_re.group(1))
     564 + results_per_page = 9
     565 + page_number = math.ceil(float(nb_res / results_per_page))
     566 + if page_number > max_nb_page:
     567 + page_number = max_nb_page
     568 + 
     569 + pos = get_proc_pos()
     570 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Onion Search Engine", pos), position=pos) \
     571 + as progress_bar:
     572 + 
     573 + results = link_finder("onionsearchengine", soup)
     574 + progress_bar.update()
     575 + 
     576 + for n in range(2, page_number + 1):
     577 + resp = s.get(onionsearchengine_url.format(quote(searchstr), n))
     578 + soup = BeautifulSoup(resp.text, 'html5lib')
     579 + results = results + link_finder("onionsearchengine", soup)
     580 + progress_bar.update()
     581 + 
     582 + return results
     583 + 
     584 + 
     585 +def tordex(searchstr):
     586 + results = []
     587 + tordex_url = supported_engines['tordex'] + "/search?query={}&page={}"
     588 + max_nb_page = 100
     589 + if args.limit != 0:
     590 + max_nb_page = args.limit
     591 + 
     592 + with requests.Session() as s:
     593 + s.proxies = proxies
     594 + s.headers = random_headers()
     595 + 
     596 + resp = s.get(tordex_url.format(quote(searchstr), 1))
     597 + soup = BeautifulSoup(resp.text, 'html5lib')
     598 + 
     599 + page_number = 1
     600 + pages = soup.find_all("li", attrs={"class": "page-item"})
     601 + if pages is not None:
     602 + for i in pages:
     603 + if i.get_text() != "...":
     604 + page_number = int(i.get_text())
     605 + if page_number > max_nb_page:
     606 + page_number = max_nb_page
    73 607   
     608 + pos = get_proc_pos()
     609 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Tordex", pos), position=pos) as progress_bar:
     610 + 
     611 + results = link_finder("tordex", soup)
     612 + progress_bar.update()
     613 + 
     614 + for n in range(2, page_number + 1):
     615 + resp = s.get(tordex_url.format(quote(searchstr), n))
     616 + soup = BeautifulSoup(resp.text, 'html5lib')
     617 + results = results + link_finder("tordex", soup)
     618 + progress_bar.update()
     619 + 
     620 + return results
     621 + 
     622 + 
     623 +def tor66(searchstr):
     624 + results = []
     625 + tor66_url = supported_engines['tor66'] + "/search?q={}&sorttype=rel&page={}"
     626 + max_nb_page = 30
     627 + if args.limit != 0:
     628 + max_nb_page = args.limit
     629 + 
     630 + with requests.Session() as s:
     631 + s.proxies = proxies
     632 + s.headers = random_headers()
     633 + 
     634 + resp = s.get(tor66_url.format(quote(searchstr), 1))
     635 + soup = BeautifulSoup(resp.text, 'html5lib')
     636 + 
     637 + page_number = 1
     638 + approx_re = re.search(r"\.Onion\ssites\sfound\s:\s([0-9]+)", resp.text)
     639 + if approx_re is not None:
     640 + nb_res = int(approx_re.group(1))
     641 + results_per_page = 20
     642 + page_number = math.ceil(float(nb_res / results_per_page))
     643 + if page_number > max_nb_page:
     644 + page_number = max_nb_page
     645 + 
     646 + pos = get_proc_pos()
     647 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Tor66", pos), position=pos) as progress_bar:
     648 + 
     649 + results = link_finder("tor66", soup)
     650 + progress_bar.update()
     651 + 
     652 + for n in range(2, page_number + 1):
     653 + resp = s.get(tor66_url.format(quote(searchstr), n))
     654 + soup = BeautifulSoup(resp.text, 'html5lib')
     655 + results = results + link_finder("tor66", soup)
     656 + progress_bar.update()
     657 + 
     658 + return results
     659 + 
     660 + 
     661 +def tormax(searchstr):
     662 + results = []
     663 + tormax_url = supported_engines['tormax'] + "/tormax/search?q={}"
     664 + 
     665 + pos = get_proc_pos()
     666 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("Tormax", pos), position=pos) as progress_bar:
     667 + response = requests.get(tormax_url.format(quote(searchstr)), proxies=proxies, headers=random_headers())
     668 + soup = BeautifulSoup(response.text, 'html5lib')
     669 + results = link_finder("tormax", soup)
     670 + progress_bar.update()
     671 + 
     672 + return results
     673 + 
     674 + 
     675 +def haystack(searchstr):
     676 + results = []
     677 + haystack_url = supported_engines['haystack'] + "/?q={}&offset={}"
     678 + # At the 52nd page, it timeouts 100% of the time
     679 + max_nb_page = 50
     680 + if args.limit != 0:
     681 + max_nb_page = args.limit
     682 + offset_coeff = 20
     683 + 
     684 + with requests.Session() as s:
     685 + s.proxies = proxies
     686 + s.headers = random_headers()
     687 + 
     688 + req = s.get(haystack_url.format(quote(searchstr), 0))
     689 + soup = BeautifulSoup(req.text, 'html5lib')
     690 + 
     691 + pos = get_proc_pos()
     692 + with tqdm(total=max_nb_page, initial=0, desc=get_tqdm_desc("Haystack", pos), position=pos) as progress_bar:
     693 + continue_processing = True
     694 + ret = link_finder("haystack", soup)
     695 + results = results + ret
     696 + progress_bar.update()
     697 + if len(ret) == 0:
     698 + continue_processing = False
     699 + 
     700 + it = 1
     701 + while continue_processing:
     702 + offset = int(it * offset_coeff)
     703 + req = s.get(haystack_url.format(quote(searchstr), offset))
     704 + soup = BeautifulSoup(req.text, 'html5lib')
     705 + ret = link_finder("haystack", soup)
     706 + results = results + ret
     707 + progress_bar.update()
     708 + it += 1
     709 + if it >= max_nb_page or len(ret) == 0:
     710 + continue_processing = False
     711 + 
     712 + return results
     713 + 
     714 + 
     715 +def multivac(searchstr):
     716 + results = []
     717 + multivac_url = supported_engines['multivac'] + "/?q={}&page={}"
     718 + max_nb_page = 10
     719 + if args.limit != 0:
     720 + max_nb_page = args.limit
     721 + 
     722 + with requests.Session() as s:
     723 + s.proxies = proxies
     724 + s.headers = random_headers()
     725 + 
     726 + page_to_request = 1
     727 + req = s.get(multivac_url.format(quote(searchstr), page_to_request))
     728 + soup = BeautifulSoup(req.text, 'html5lib')
     729 + 
     730 + pos = get_proc_pos()
     731 + with tqdm(total=max_nb_page, initial=0, desc=get_tqdm_desc("Multivac", pos), position=pos) as progress_bar:
     732 + continue_processing = True
     733 + ret = link_finder("multivac", soup)
     734 + results = results + ret
     735 + progress_bar.update()
     736 + if len(ret) == 0 or page_to_request >= max_nb_page:
     737 + continue_processing = False
     738 + 
     739 + while continue_processing:
     740 + page_to_request += 1
     741 + req = s.get(multivac_url.format(quote(searchstr), page_to_request))
     742 + soup = BeautifulSoup(req.text, 'html5lib')
     743 + ret = link_finder("multivac", soup)
     744 + results = results + ret
     745 + progress_bar.update()
     746 + if len(ret) == 0 or page_to_request >= max_nb_page:
     747 + continue_processing = False
     748 + 
     749 + return results
     750 + 
     751 + 
     752 +def evosearch(searchstr):
     753 + results = []
     754 + evosearch_url = supported_engines['evosearch'] + "/evo/search.php?" \
     755 + "query={}&" \
     756 + "start={}&" \
     757 + "search=1&type=and&mark=bold+text&" \
     758 + "results={}"
     759 + results_per_page = 50
     760 + max_nb_page = 30
     761 + if args.limit != 0:
     762 + max_nb_page = args.limit
     763 + 
     764 + with requests.Session() as s:
     765 + s.proxies = proxies
     766 + s.headers = random_headers()
     767 + 
     768 + req = s.get(evosearch_url.format(quote(searchstr), 1, results_per_page))
     769 + soup = BeautifulSoup(req.text, 'html5lib')
     770 + 
     771 + page_number = 1
     772 + i = soup.find("p", attrs={"class": "cntr"})
     773 + if i is not None:
     774 + if i.get_text() is not None and "of" in i.get_text():
     775 + nb_res = float(clear(str.split(i.get_text().split("-")[1].split("of")[1])[0]))
     776 + page_number = math.ceil(nb_res / results_per_page)
     777 + if page_number > max_nb_page:
     778 + page_number = max_nb_page
     779 + 
     780 + pos = get_proc_pos()
     781 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Evo Search", pos), position=pos) as progress_bar:
     782 + results = link_finder("evosearch", soup)
     783 + progress_bar.update()
     784 + 
     785 + for n in range(2, page_number + 1):
     786 + resp = s.get(evosearch_url.format(quote(searchstr), n, results_per_page))
     787 + soup = BeautifulSoup(resp.text, 'html5lib')
     788 + results = results + link_finder("evosearch", soup)
     789 + progress_bar.update()
     790 + 
     791 + return results
     792 + 
     793 + 
     794 +def oneirun(searchstr):
     795 + results = []
     796 + oneirun_url = supported_engines['oneirun'] + "/Home/IndexEn"
     797 + 
     798 + with requests.Session() as s:
     799 + s.proxies = proxies
     800 + s.headers = random_headers()
     801 + 
     802 + resp = s.get(oneirun_url)
     803 + soup = BeautifulSoup(resp.text, 'html5lib')
     804 + token = soup.find('input', attrs={"name": "__RequestVerificationToken"})['value']
     805 + 
     806 + pos = get_proc_pos()
     807 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("Oneirun", pos), position=pos) as progress_bar:
     808 + response = s.post(oneirun_url.format(quote(searchstr)), data={
     809 + "searchString": searchstr,
     810 + "__RequestVerificationToken": token
     811 + })
     812 + soup = BeautifulSoup(response.text, 'html5lib')
     813 + results = link_finder("oneirun", soup)
     814 + progress_bar.update()
     815 + 
     816 + return results
     817 + 
     818 + 
     819 +def deeplink(searchstr):
     820 + results = []
     821 + deeplink_url1 = supported_engines['deeplink'] + "/index.php"
     822 + deeplink_url2 = supported_engines['deeplink'] + "/?search={}&type=verified"
     823 + 
     824 + with requests.Session() as s:
     825 + s.proxies = proxies
     826 + s.headers = random_headers()
     827 + s.get(deeplink_url1)
     828 + 
     829 + pos = get_proc_pos()
     830 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("DeepLink", pos), position=pos) as progress_bar:
     831 + response = s.get(deeplink_url2.format(quote(searchstr)))
     832 + soup = BeautifulSoup(response.text, 'html5lib')
     833 + results = link_finder("deeplink", soup)
     834 + progress_bar.update()
     835 + 
     836 + return results
     837 + 
     838 + 
     839 +def torsearchengine1(searchstr):
     840 + results = []
     841 + torsearchengine1_url1 = supported_engines['torsearchengine1']
     842 + torsearchengine1_url2 = supported_engines['torsearchengine1'] + "/index.php"
     843 + 
     844 + with requests.Session() as s:
     845 + s.proxies = proxies
     846 + s.headers = random_headers()
     847 + s.get(torsearchengine1_url1)
     848 + 
     849 + pos = get_proc_pos()
     850 + with tqdm(total=1, initial=0, desc=get_tqdm_desc("TOR Search Engine 1", pos), position=pos) as progress_bar:
     851 + response = s.post(torsearchengine1_url2, {'search': searchstr, 'search2': ''})
     852 + soup = BeautifulSoup(response.text, 'html5lib')
     853 + results = link_finder("torsearchengine1", soup)
     854 + progress_bar.update()
     855 + 
     856 + return results
     857 + 
     858 + 
     859 +def torgle1(searchstr):
     860 + results = []
     861 + torgle1_url = supported_engines['torgle1'] + "/torgle/index-frame.php?query={}&search=1&engine-ver=2&isframe=0{}"
     862 + results_per_page = 10
     863 + max_nb_page = 30
     864 + if args.limit != 0:
     865 + max_nb_page = args.limit
     866 + 
     867 + with requests.Session() as s:
     868 + s.proxies = proxies
     869 + s.headers = random_headers()
     870 + 
     871 + resp = s.get(torgle1_url.format(quote(searchstr), ""))
     872 + soup = BeautifulSoup(resp.text, 'html5lib')
     873 + 
     874 + page_number = 1
     875 + i = soup.find('div', attrs={"id": "result_report"})
     876 + if i is not None:
     877 + if i.get_text() is not None and "of" in i.get_text():
     878 + res_re = re.match(r".*of\s([0-9]+)\s.*", clear(i.get_text()))
     879 + total_results = int(res_re.group(1))
     880 + page_number = math.ceil(total_results / results_per_page)
     881 + if page_number > max_nb_page:
     882 + page_number = max_nb_page
     883 + 
     884 + pos = get_proc_pos()
     885 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Torgle 1", pos), position=pos) as progress_bar:
     886 + results = link_finder("torgle1", soup)
     887 + progress_bar.update()
     888 + 
     889 + for n in range(2, page_number + 1):
     890 + start_page_param = "&start={}".format(n)
     891 + resp = s.get(torgle1_url.format(quote(searchstr), start_page_param))
     892 + soup = BeautifulSoup(resp.text, 'html5lib')
     893 + results = results + link_finder("torgle1", soup)
     894 + progress_bar.update()
     895 + 
     896 + return results
     897 + 
     898 + 
     899 +def grams1(searchstr):
     900 + results = []
     901 + grams1_url = supported_engines['grams1'] + "/results/index.php?page={}&searchstr={}"
     902 + results_per_page = 25
     903 + max_nb_page = 30
     904 + if args.limit != 0:
     905 + max_nb_page = args.limit
     906 + 
     907 + with requests.Session() as s:
     908 + s.proxies = proxies
     909 + s.headers = random_headers()
     910 + 
     911 + resp = s.get(grams1_url.format(1, quote(searchstr)))
     912 + soup = BeautifulSoup(resp.text, 'html5lib')
     913 + 
     914 + page_number = 1
     915 + pages = soup.find_all('div', attrs={"class": "result-text"})
     916 + if pages is not None:
     917 + res_re = re.match(r"About ([0-9]+) result(.*)", clear(pages[0].get_text()))
     918 + total_results = int(res_re.group(1))
     919 + page_number = math.ceil(total_results / results_per_page)
     920 + if page_number > max_nb_page:
     921 + page_number = max_nb_page
     922 + 
     923 + pos = get_proc_pos()
     924 + with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Grams 1", pos), position=pos) as progress_bar:
     925 + results = link_finder("grams1", soup)
     926 + progress_bar.update()
     927 + 
     928 + for n in range(2, page_number + 1):
     929 + resp = s.get(grams1_url.format(n, quote(searchstr)))
     930 + soup = BeautifulSoup(resp.text, 'html5lib')
     931 + results = results + link_finder("grams1", soup)
     932 + progress_bar.update()
     933 + 
     934 + return results
     935 + 
     936 + 
     937 +def get_domain_from_url(link):
     938 + fqdn_re = r"^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&'()*+,;=:]+\])"
     939 + domain_re = re.match(fqdn_re, link)
     940 + if domain_re is not None:
     941 + if domain_re.lastindex == 2:
     942 + return domain_re.group(2)
     943 + return None
     944 + 
     945 + 
     946 +def write_to_csv(csv_writer, fields):
     947 + line_to_write = []
     948 + if args.fields and len(args.fields) > 0:
     949 + for f in args.fields[0]:
     950 + if f in fields:
     951 + line_to_write.append(fields[f])
     952 + if f == "domain":
     953 + domain = get_domain_from_url(fields['link'])
     954 + line_to_write.append(domain)
     955 + csv_writer.writerow(line_to_write)
    74 956   else:
    75  - print("Rate limit darksearch.io !")
     957 + # Default output mode
     958 + line_to_write.append(fields['engine'])
     959 + line_to_write.append(fields['name'])
     960 + line_to_write.append(fields['link'])
     961 + csv_writer.writerow(line_to_write)
     962 + 
     963 + 
     964 +def link_finder(engine_str, data_obj):
     965 + global filename
     966 + name = ""
     967 + link = ""
     968 + csv_file = None
     969 + found_links = []
     970 + 
     971 + if args.continuous_write:
     972 + csv_file = open(filename, 'a', newline='')
     973 + 
     974 + def add_link():
     975 + found_links.append({"engine": engine_str, "name": name, "link": link})
     976 + 
     977 + if args.continuous_write and csv_file.writable():
     978 + csv_writer = csv.writer(csv_file, delimiter=field_delim, quoting=csv.QUOTE_ALL)
     979 + fields = {"engine": engine_str, "name": name, "link": link}
     980 + write_to_csv(csv_writer, fields)
     981 + 
     982 + if engine_str == "ahmia":
     983 + for r in data_obj.select('li.result h4'):
     984 + name = clear(r.get_text())
     985 + link = r.find('a')['href'].split('redirect_url=')[1]
     986 + add_link()
     987 + 
     988 + if engine_str == "candle":
     989 + for r in data_obj.select("body h2 a"):
     990 + if str(r['href']).startswith("http"):
     991 + name = clear(r.get_text())
     992 + link = clear(r['href'])
     993 + add_link()
     994 + 
     995 + if engine_str == "darksearchenginer":
     996 + for r in data_obj.select('.table-responsive a'):
     997 + name = clear(r.get_text())
     998 + link = clear(r['href'])
     999 + add_link()
     1000 + 
     1001 + if engine_str == "darksearchio":
     1002 + for r in data_obj:
     1003 + name = clear(r["title"])
     1004 + link = clear(r["link"])
     1005 + add_link()
     1006 + 
     1007 + if engine_str == "deeplink":
     1008 + for tr in data_obj.find_all('tr'):
     1009 + cels = tr.find_all('td')
     1010 + if cels is not None and len(cels) == 4:
     1011 + name = clear(cels[1].get_text())
     1012 + link = clear(cels[0].find('a')['href'])
     1013 + add_link()
     1014 + 
     1015 + if engine_str == "evosearch":
     1016 + for r in data_obj.select("#results .title a"):
     1017 + name = clear(r.get_text())
     1018 + link = get_parameter(r['href'], 'url')
     1019 + add_link()
+
+    if engine_str == "grams":
+        for i in data_obj.find_all("div", attrs={"class": "media-body"}):
+            if not i.find('span'):
+                for r in i.select(".searchlinks a"):
+                    name = clear(r.get_text())
+                    link = clear(r['href'])
+                    add_link()
+
+    if engine_str == "grams1":
+        for r in data_obj.select(".searchlinks a"):
+            name = clear(r.get_text())
+            link = clear(r['href'])
+            add_link()
+
+    if engine_str == "haystack":
+        for r in data_obj.select(".result b a"):
+            name = clear(r.get_text())
+            link = get_parameter(r['href'], 'url')
+            add_link()
+
+    if engine_str == "multivac":
+        for r in data_obj.select("dl dt a"):
+            if r['href'] != "":
+                name = clear(r.get_text())
+                link = clear(r['href'])
+                add_link()
+            else:
+                break
+
+    if engine_str == "notevil":
+        for r in data_obj.select('#content > div > p > a:not([target])'):
+            name = clear(r.get_text())
+            link = get_parameter(r['href'], 'url')
+            add_link()
+
+    if engine_str == "oneirun":
+        for td in data_obj.find_all('td', attrs={"style": "vertical-align: top;"}):
+            name = clear(td.find('h5').get_text())
+            link = clear(td.find('a')['href'])
+            add_link()
+
+    if engine_str == "onionland":
+        for r in data_obj.select('.result-block .title a'):
+            if not r['href'].startswith('/ads/'):
+                name = clear(r.get_text())
+                link = unquote(unquote(get_parameter(r['href'], 'l')))
+                add_link()
+
+    if engine_str == "onionsearchengine":
+        for r in data_obj.select("table a b"):
+            name = clear(r.get_text())
+            link = get_parameter(r.parent['href'], 'u')
+            add_link()
+
+    if engine_str == "onionsearchserver":
+        for r in data_obj.select('.osscmnrdr.ossfieldrdr1 a'):
+            name = clear(r.get_text())
+            link = clear(r['href'])
+            add_link()
+
+    if engine_str == "phobos":
+        for r in data_obj.select('.serp .titles'):
+            name = clear(r.get_text())
+            link = clear(r['href'])
+            add_link()
+
+    if engine_str == "tor66":
+        for i in data_obj.find('hr').find_all_next('b'):
+            if i.find('a'):
+                name = clear(i.find('a').get_text())
+                link = clear(i.find('a')['href'])
+                add_link()
+
+    if engine_str == "torch":
+        for r in data_obj.select("dl > dt > a"):
+            name = clear(r.get_text())
+            link = clear(r['href'])
+            add_link()
+
+    if engine_str == "torch1":
+        for r in data_obj.select("dl > dt > a"):
+            name = clear(r.get_text())
+            link = clear(r['href'])
+            add_link()
+
+    if engine_str == "tordex":
+        for r in data_obj.select('.container h5 a'):
+            name = clear(r.get_text())
+            link = clear(r['href'])
+            add_link()
+
+    if engine_str == "torgle":
+        for i in data_obj.find_all('ul', attrs={"id": "page"}):
+            for j in i.find_all('a'):
+                if str(j.get_text()).startswith("http"):
+                    link = clear(j.get_text())
+                else:
+                    name = clear(j.get_text())
+            add_link()
+
+    if engine_str == "torgle1":
+        for r in data_obj.select("#results a.title"):
+            name = clear(r.get_text())
+            link = clear(r['href'])
+            add_link()
+
+    if engine_str == "tormax":
+        for r in data_obj.select("#search-results article a.title"):
+            name = clear(r.get_text())
+            link = clear(r.find_next_sibling('div', {'class': 'url'}).get_text())
+            add_link()
+
+    if engine_str == "torsearchengine":
+        for i in data_obj.find_all('h3', attrs={'class': 'title text-truncate'}):
+            name = clear(i.find('a').get_text())
+            link = i.find('a')['data-uri']
+            add_link()
+
+    if engine_str == "torsearchengine1":
+        for r in data_obj.find_all('span', {'style': 'font-size:1.2em;font-weight:bold;color:#1a0dab'}):
+            name = clear(r.get_text())
+            link = r.find_next_sibling('a')['href']
+            add_link()
+
+    if engine_str == "visitor":
+        for r in data_obj.select(".hs_site h3 a"):
+            name = clear(r.get_text())
+            link = clear(r['href'])
+            add_link()
+
+    if args.continuous_write and not csv_file.closed:
+        csv_file.close()
+
+    return found_links
+
+
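# Editor's note: add_link() (defined near the top of link_finder, outside this excerpt) is
# assumed to append a {"engine": ..., "name": ..., "link": ...} dict to found_links and,
# when --continuous_write is set, to write the row to the CSV file immediately.
# run_method() below receives strings of the form "<engine>:<search term>" built by
# scrape() and dispatches to the module-level request function of the same name via
# globals(); note that split(':')[1] keeps only the part of the search term that precedes
# any further ':'.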
+def run_method(method_name_and_argument):
+    method_name = method_name_and_argument.split(':')[0]
+    argument = method_name_and_argument.split(':')[1]
+    ret = []
+    try:
+        ret = globals()[method_name](argument)
+    except ConnectionError:
+        print("Error: unable to connect")
+    except OSError:
+        print("Error: unable to connect")
+    except ProtocolError:
+        print("Error: unable to connect")
+    return ret
+
+
+def scrape():
+    global filename
+
+    start_time = datetime.now()
+
+    # Building the filename
+    filename = str(filename).replace("$DATE", start_time.strftime("%Y%m%d%H%M%S"))
+    search = str(args.search).replace(" ", "")
+    if len(search) > 10:
+        search = search[0:9]
+    filename = str(filename).replace("$SEARCH", search)
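# Editor's note (illustrative example): for the search "computer wiki" started on
# 2020-06-01 10:20:30, $SEARCH becomes "computerw" (spaces removed, truncated to the
# first 9 characters when longer than 10) and $DATE becomes "20200601102030".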
+
+    func_args = []
+    stats_dict = {}
+    if args.engines and len(args.engines) > 0:
+        eng = args.engines[0]
+        for e in eng:
+            try:
+                if not (args.exclude and len(args.exclude) > 0 and e in args.exclude[0]):
+                    func_args.append("{}:{}".format(e, args.search))
+                    stats_dict[e] = 0
+            except KeyError:
+                print("Error: search engine {} not in the list of supported engines".format(e))
+    else:
+        for e in supported_engines.keys():
+            if not (args.exclude and len(args.exclude) > 0 and e in args.exclude[0]):
+                func_args.append("{}:{}".format(e, args.search))
+                stats_dict[e] = 0
+
+    # Doing multiprocessing
+    units = min((cpu_count() - 1), len(func_args))
+    if args.mp_units and args.mp_units > 0:
+        units = min(args.mp_units, len(func_args))
+    print("search.py started with {} processing units...".format(units))
+    freeze_support()
+
+    results = {}
+    with Pool(units, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as p:
+        results_map = p.map(run_method, func_args)
+        results = reduce(lambda a, b: a + b if b is not None else a, results_map)
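# Editor's note: p.map() returns one list of result dicts per engine; reduce() concatenates
# them into a single flat list, skipping any engine whose worker returned None.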

-    result['onionland'] = []
-    for n in tqdm(range(1,400),desc="OnionLand"):
-        onionland = "http://3bbaaaccczcbdddz.onion/search?q={}&page={}".format(args.search,n)
-        #print(urlTorch)
-        req = requests.get(onionland,proxies=proxies)
-        if(req.status_code==200):
-            soup = BeautifulSoup(req.text, 'html.parser')
-            for i in soup.findAll('div',attrs={"class":"result-block"}):
-                if('''<span class="label-ad">Ad</span>''' not in i):
-                    #print({"name":i.find('div',attrs={'class':"title"}).get_text(),"link":clear(i.find('div',attrs={'class':"link"}).get_text())})
-                    result['onionland'].append({"name":i.find('div',attrs={'class':"title"}).get_text(),"link":clear(i.find('div',attrs={'class':"link"}).get_text())})
-                else:
-                    break
+    stop_time = datetime.now()
+
+    if not args.continuous_write:
+        with open(filename, 'w', newline='') as csv_file:
+            csv_writer = csv.writer(csv_file, delimiter=field_delim, quoting=csv.QUOTE_ALL)
+            for r in results:
+                write_to_csv(csv_writer, r)
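# Editor's note: write_to_csv() is defined earlier in search.py (outside this excerpt); it is
# assumed to write the columns selected with --fields for one result row.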
+
+    total = 0
+    print("\nReport:")
+    print(" Execution time: %s seconds" % (stop_time - start_time))
+    print(" Results per engine:")
+    for r in results:
+        stats_dict[r['engine']] += 1
+    for s in stats_dict:
+        n = stats_dict[s]
+        print(" {}: {}".format(s, str(n)))
+        total += n
+    print(" Total: {} links written to {}".format(str(total), filename))

-    print("Ahmia : " + str(len(result['ahmia'])))
-    print("Torch : "+str(len(result['urlTorch'])))
-    print("Darksearch io : "+str(len(result['darksearch'])))
-    print("Onionland : "+str(len(result['onionland'])))
-    print("Total of {} links !\nExported to {}".format(str(len(result['ahmia'])+len(result['urlTorch'])+len(result['darksearch'])+len(result['onionland'])),args.output))
-    f= open(args.output,"w+")
-    for i in result['urlTorch']:
-        f.write("name : {} link: {}\n".format(clearn(i["name"]),i["link"]))
-    for i in result['onionland']:
-        f.write("name: {} link : {}\n".format(clearn(i["name"]),i["link"]))
-    for i in result['ahmia']:
-        f.write("name : {} link : {}\n".format(clearn(i["name"]),i["link"]))
-    for i in result['darksearch']:
-        f.write("name : {} link : {}\n".format(clearn(i["name"]),i["link"]))

-    f.close()
-scrape()
+if __name__ == "__main__":
+    scrape()
