1 | | - | import requests,json |
2 | | - | from bs4 import BeautifulSoup |
3 | 1 | | import argparse |
| 2 | + | import csv |
| 3 | + | import math |
| 4 | + | import re |
| 5 | + | import time |
| 6 | + | from datetime import datetime |
| 7 | + | from functools import reduce |
| 8 | + | from random import choice |
| 9 | + | |
| 10 | + | from multiprocessing import Pool, cpu_count, current_process, freeze_support |
4 | 11 | | from tqdm import tqdm |
5 | | - | parser = argparse.ArgumentParser() |
6 | | - | required = parser.add_argument_group('required arguments') |
| 12 | + | |
| 13 | + | import requests |
| 14 | + | import urllib.parse as urlparse |
| 15 | + | from urllib.parse import parse_qs |
| 16 | + | from urllib.parse import quote |
| 17 | + | from urllib.parse import unquote |
| 18 | + | from bs4 import BeautifulSoup |
| 19 | + | from urllib3.exceptions import ProtocolError |
| 20 | + | |
| 21 | + | import engines |
| 22 | + | |
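| | + | # Desktop User-Agent strings: random_headers() picks one of these at random for every request.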
| 23 | + | desktop_agents = [ |
| 24 | + | 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36', |
| 25 | + | 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36', |
| 26 | + | 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' |
| 27 | + | 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36', |
| 28 | + | 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) ' |
| 29 | + | 'AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14', |
| 30 | + | 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36', |
| 31 | + | 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) ' |
| 32 | + | 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36', |
| 33 | + | 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) ' |
| 34 | + | 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36', |
| 35 | + | 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36', |
| 36 | + | 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36', |
| 37 | + | 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0' |
| 38 | + | ] |
| 39 | + | |
| 40 | + | supported_engines = engines.ENGINES |
| 41 | + | |
| 42 | + | available_csv_fields = [ |
| 43 | + | "engine", |
| 44 | + | "name", |
| 45 | + | "link", |
| 46 | + | "domain" |
| 47 | + | ] |
| 48 | + | |
| 49 | + | |
| 50 | + | def print_epilog(): |
| 51 | + | epilog = "Available CSV fields: \n\t" |
| 52 | + | for f in available_csv_fields: |
| 53 | + | epilog += " {}".format(f) |
| 54 | + | epilog += "\n" |
| 55 | + | epilog += "Supported engines: \n\t" |
| 56 | + | for e in supported_engines.keys(): |
| 57 | + | epilog += " {}".format(e) |
| 58 | + | return epilog |
| 59 | + | |
| 60 | + | |
| 61 | + | parser = argparse.ArgumentParser(epilog=print_epilog(), formatter_class=argparse.RawTextHelpFormatter) |
7 | | - | parser.add_argument("--proxy", default='localhost:9050', type=str, help="Set Tor proxy (default: 127.0.0.1:9050)")
| 62 | + | parser.add_argument("--proxy", default='localhost:9050', type=str, help="Set Tor proxy (default: localhost:9050)")
8 | | - | parser.add_argument("--output", default='output.txt', type=str, help="Output File (default: output.txt)") |
| 63 | + | parser.add_argument("--output", default='output_$SEARCH_$DATE.txt', type=str,
| 64 | + | help="Output file (default: output_$SEARCH_$DATE.txt), where $SEARCH is replaced by the first "
| 65 | + | "characters of the search string and $DATE by the current datetime")
| 66 | + | parser.add_argument("--continuous_write", action='store_true', default=False,
| 67 | + | help="Write progressively to the output file (default: False)")
| 68 | + | parser.add_argument("search", type=str, help="The search string or phrase") |
| 69 | + | parser.add_argument("--limit", type=int, default=0, help="Set a max number of pages per engine to load") |
| 70 | + | parser.add_argument("--engines", type=str, action='append', help='Engines to request (default: full list)', nargs="*") |
| 71 | + | parser.add_argument("--exclude", type=str, action='append', help='Engines to exclude (default: none)', nargs="*") |
| 72 | + | parser.add_argument("--fields", type=str, action='append', |
| 73 | + | help='Fields to output to csv file (default: engine name link), available fields are shown below', |
| 74 | + | nargs="*") |
| 75 | + | parser.add_argument("--field_delimiter", type=str, default=",", help='Delimiter for the CSV fields') |
| 76 | + | parser.add_argument("--mp_units", type=int, default=(cpu_count() - 1), help="Number of processing units (default: " |
| 77 | + | "core number minus 1)") |
9 | 78 | | |
10 | | - | parser.add_argument("--search",type=str, help="search") |
11 | 79 | | args = parser.parse_args() |
12 | 80 | | proxies = {'http': 'socks5h://{}'.format(args.proxy), 'https': 'socks5h://{}'.format(args.proxy)} |
| 81 | + | filename = args.output |
| 82 | + | field_delim = "," |
| 83 | + | if args.field_delimiter and len(args.field_delimiter) == 1: |
| 84 | + | field_delim = args.field_delimiter |
| 85 | + | |
| 86 | + | |
| 87 | + | def random_headers(): |
| 88 | + | return {'User-Agent': choice(desktop_agents), |
| 89 | + | 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'} |
| 90 | + | |
13 | 91 | | |
14 | 92 | | def clear(toclear): |
15 | | - | return(toclear.replace("\n","").replace(" ","")) |
16 | | - | def clearn(toclear): |
17 | | - | return(toclear.replace("\n"," ")) |
| 93 | + | cleaned = toclear.replace("\n", " ")
| 94 | + | cleaned = ' '.join(cleaned.split())
| 95 | + | return cleaned
| 96 | + | |
| 97 | + | |
| 98 | + | def get_parameter(url, parameter_name): |
| 99 | + | parsed = urlparse.urlparse(url) |
| 100 | + | return parse_qs(parsed.query)[parameter_name][0] |
| 101 | + | |
| 102 | + | |
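| | + | # Zero-based index of the current worker process, used as the position of its tqdm progress bar.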
| 103 | + | def get_proc_pos(): |
| 104 | + | return (current_process()._identity[0]) - 1 |
| 105 | + | |
| 106 | + | |
| 107 | + | def get_tqdm_desc(e_name, pos): |
| 108 | + | return "%20s (#%d)" % (e_name, pos) |
| 109 | + | |
| 110 | + | |
| 111 | + | def ahmia(searchstr): |
| 112 | + | results = [] |
| 113 | + | ahmia_url = supported_engines['ahmia'] + "/search/?q={}" |
| 114 | + | |
| 115 | + | pos = get_proc_pos() |
| 116 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("Ahmia", pos), position=pos) as progress_bar: |
| 117 | + | response = requests.get(ahmia_url.format(quote(searchstr)), proxies=proxies, headers=random_headers()) |
| 118 | + | soup = BeautifulSoup(response.text, 'html5lib') |
| 119 | + | results = link_finder("ahmia", soup) |
| 120 | + | progress_bar.update() |
| 121 | + | |
| 122 | + | return results |
| 123 | + | |
| 124 | + | |
| 125 | + | def torch(searchstr): |
| 126 | + | results = [] |
| 127 | + | torch_url = supported_engines['torch'] + "/4a1f6b371c/search.cgi?cmd=Search!&np={}&q={}" |
| 128 | + | results_per_page = 10 |
| 129 | + | max_nb_page = 100 |
| 130 | + | if args.limit != 0: |
| 131 | + | max_nb_page = args.limit |
| 132 | + | |
| 133 | + | with requests.Session() as s: |
| 134 | + | s.proxies = proxies |
| 135 | + | s.headers = random_headers() |
| 136 | + | |
| 137 | + | req = s.get(torch_url.format(0, quote(searchstr))) |
| 138 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 139 | + | |
| 140 | + | page_number = 1 |
| 141 | + | for i in soup.find("table", attrs={"width": "100%"}).find_all("small"): |
| 142 | + | if i.get_text() is not None and "of" in i.get_text(): |
| 143 | + | page_number = math.ceil(float(clear(i.get_text().split("-")[1].split("of")[1])) / results_per_page) |
| 144 | + | if page_number > max_nb_page: |
| 145 | + | page_number = max_nb_page |
| 146 | + | |
| 147 | + | pos = get_proc_pos() |
| 148 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("TORCH", pos), position=pos) as progress_bar: |
| 149 | + | |
| 150 | + | results = link_finder("torch", soup) |
| 151 | + | progress_bar.update() |
| 152 | + | |
| 153 | + | # Usually range is 2 to n+1, but TORCH behaves differently |
| 154 | + | for n in range(1, page_number): |
| 155 | + | req = s.get(torch_url.format(n, quote(searchstr))) |
| 156 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 157 | + | results = results + link_finder("torch", soup) |
| 158 | + | progress_bar.update() |
| 159 | + | |
| 160 | + | return results |
| 161 | + | |
| 162 | + | |
| 163 | + | def torch1(searchstr): |
| 164 | + | results = [] |
| 165 | + | torch1_url = supported_engines['torch1'] + "/search?q={}&cmd=Search!" |
| 166 | + | |
| 167 | + | pos = get_proc_pos() |
| 168 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("TORCH 1", pos), position=pos) as progress_bar: |
| 169 | + | response = requests.get(torch1_url.format(quote(searchstr)), proxies=proxies, headers=random_headers()) |
| 170 | + | soup = BeautifulSoup(response.text, 'html5lib') |
| 171 | + | results = link_finder("torch1", soup) |
| 172 | + | progress_bar.update() |
18 | 173 | | |
19 | | - | def scrape(): |
20 | | - | result = {} |
21 | | - | ahmia = "http://msydqstlz2kzerdg.onion/search/?q="+args.search |
22 | | - | response = requests.get(ahmia, proxies=proxies) |
23 | | - | #print(response) |
24 | | - | soup = BeautifulSoup(response.text, 'html.parser') |
25 | | - | result['ahmia'] = [] |
26 | | - | #pageNumber = clear(soup.find("span", id="pageResultEnd").get_text()) |
27 | | - | for i in tqdm(soup.findAll('li', attrs = {'class' : 'result'}),desc="Ahmia"): |
28 | | - | i = i.find('h4') |
29 | | - | result['ahmia'].append({"name":clear(i.get_text()),"link":i.find('a')['href'].replace("/search/search/redirect?search_term=search&redirect_url=","")}) |
| 174 | + | return results |
30 | 175 | | |
31 | | - | urlTorchNumber = "http://xmh57jrzrnw6insl.onion/4a1f6b371c/search.cgi?cmd=Search!&np=1&q=" |
32 | | - | req = requests.get(urlTorchNumber+args.search,proxies=proxies) |
33 | | - | soup = BeautifulSoup(req.text, 'html.parser') |
34 | | - | result['urlTorch'] = [] |
35 | | - | pageNumber = "" |
36 | | - | for i in soup.find("table",attrs={"width":"100%"}).findAll("small"): |
37 | | - | if("of"in i.get_text()): |
38 | | - | pageNumber = i.get_text() |
39 | | - | pageNumber = round(float(clear(pageNumber.split("-")[1].split("of")[1]))/10) |
40 | | - | if(pageNumber>99): |
41 | | - | pageNumber=99 |
42 | | - | result['urlTorch'] = [] |
43 | | - | for n in tqdm(range(1,pageNumber+1),desc="Torch"): |
44 | | - | urlTorch = "http://xmh57jrzrnw6insl.onion/4a1f6b371c/search.cgi?cmd=Search!&np={}&q={}".format(n,args.search) |
45 | | - | #print(urlTorch) |
46 | | - | try: |
47 | | - | req = requests.get(urlTorchNumber+args.search,proxies=proxies) |
48 | | - | soup = BeautifulSoup(req.text, 'html.parser') |
49 | | - | for i in soup.findAll('dl'): |
50 | | - | result['urlTorch'].append({"name":clear(i.find('a').get_text()),"link":i.find('a')['href']}) |
51 | | - | except: |
52 | | - | pass |
| 176 | + | |
| 177 | + | def darksearchio(searchstr): |
| 178 | + | results = [] |
| 179 | + | darksearchio_url = supported_engines['darksearchio'] + "/api/search?query={}&page={}" |
| 180 | + | max_nb_page = 30 |
| 181 | + | if args.limit != 0: |
| 182 | + | max_nb_page = args.limit |
| 183 | + | |
| 184 | + | with requests.Session() as s: |
| 185 | + | s.proxies = proxies |
| 186 | + | s.headers = random_headers() |
| 187 | + | resp = s.get(darksearchio_url.format(quote(searchstr), 1)) |
53 | 188 | | |
54 | | - | darksearchnumber = "http://darksearch.io/api/search?query=" |
55 | | - | req = requests.get(darksearchnumber+args.search,proxies=proxies) |
56 | | - | cookies = req.cookies |
57 | | - | if(req.status_code==200): |
58 | | - | result['darksearch']=[] |
59 | | - | #print(req) |
60 | | - | req = req.json() |
61 | | - | if(req['last_page']>30): |
62 | | - | pageNumber=30 |
| 189 | + | page_number = 1 |
| 190 | + | if resp.status_code == 200: |
| 191 | + | resp = resp.json() |
| 192 | + | if 'last_page' in resp: |
| 193 | + | page_number = resp['last_page'] |
| 194 | + | if page_number > max_nb_page: |
| 195 | + | page_number = max_nb_page |
63 | 196 | | else: |
64 | | - | pageNumber=req['last_page'] |
65 | | - | #print(pageNumber) |
66 | | - | for i in tqdm(range(1,pageNumber+1),desc="Darksearch io"): |
67 | | - | #print(i) |
68 | | - | darksearch = "http://darksearch.io/api/search?query={}&page=".format(args.search) |
69 | | - | req = requests.get(darksearch+str(pageNumber),proxies=proxies,cookies=cookies) |
70 | | - | if(req.status_code==200): |
71 | | - | for r in req.json()['data']: |
72 | | - | result['darksearch'].append({"name":r["title"],"link":r["link"]}) |
| 197 | + | return |
| 198 | + | |
| 199 | + | pos = get_proc_pos() |
| 200 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("DarkSearch (.io)", pos), position=pos) \ |
| 201 | + | as progress_bar: |
| 202 | + | |
| 203 | + | results = link_finder("darksearchio", resp['data']) |
| 204 | + | progress_bar.update() |
| 205 | + | |
| 206 | + | for n in range(2, page_number + 1): |
| 207 | + | resp = s.get(darksearchio_url.format(quote(searchstr), n)) |
| 208 | + | if resp.status_code == 200: |
| 209 | + | resp = resp.json() |
| 210 | + | results = results + link_finder("darksearchio", resp['data']) |
| 211 | + | progress_bar.update() |
| 212 | + | else: |
| 213 | + | # Current page results will be lost but we will try to continue after a short sleep |
| 214 | + | time.sleep(1) |
| 215 | + | |
| 216 | + | return results |
| 217 | + | |
| 218 | + | |
| 219 | + | def onionland(searchstr): |
| 220 | + | results = [] |
| 221 | + | onionlandv3_url = supported_engines['onionland'] + "/search?q={}&page={}" |
| 222 | + | max_nb_page = 100 |
| 223 | + | if args.limit != 0: |
| 224 | + | max_nb_page = args.limit |
| 225 | + | |
| 226 | + | with requests.Session() as s: |
| 227 | + | s.proxies = proxies |
| 228 | + | s.headers = random_headers() |
| 229 | + | |
| 230 | + | resp = s.get(onionlandv3_url.format(quote(searchstr), 1)) |
| 231 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 232 | + | |
| 233 | + | page_number = 1 |
| 234 | + | for i in soup.find_all('div', attrs={"class": "search-status"}): |
| 235 | + | approx_re = re.match(r"About ([,0-9]+) result(.*)", |
| 236 | + | clear(i.find('div', attrs={'class': "col-sm-12"}).get_text())) |
| 237 | + | if approx_re is not None: |
| 238 | + | nb_res = int((approx_re.group(1)).replace(",", "")) |
| 239 | + | results_per_page = 19 |
| 240 | + | page_number = math.ceil(nb_res / results_per_page) |
| 241 | + | if page_number > max_nb_page: |
| 242 | + | page_number = max_nb_page |
| 243 | + | |
| 244 | + | pos = get_proc_pos() |
| 245 | + | with tqdm(total=max_nb_page, initial=0, desc=get_tqdm_desc("OnionLand", pos), position=pos) as progress_bar: |
| 246 | + | |
| 247 | + | results = link_finder("onionland", soup) |
| 248 | + | progress_bar.update() |
| 249 | + | |
| 250 | + | for n in range(2, page_number + 1): |
| 251 | + | resp = s.get(onionlandv3_url.format(quote(searchstr), n)) |
| 252 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 253 | + | ret = link_finder("onionland", soup) |
| 254 | + | if len(ret) == 0: |
| 255 | + | break |
| 256 | + | results = results + ret |
| 257 | + | progress_bar.update() |
| 258 | + | |
| 259 | + | return results |
| 260 | + | |
| 261 | + | |
| 262 | + | def notevil(searchstr): |
| 263 | + | results = [] |
| 264 | + | notevil_url1 = supported_engines['notevil'] + "/index.php?q={}" |
| 265 | + | notevil_url2 = supported_engines['notevil'] + "/index.php?q={}&hostLimit=20&start={}&numRows={}&template=0" |
| 266 | + | max_nb_page = 20 |
| 267 | + | if args.limit != 0: |
| 268 | + | max_nb_page = args.limit |
| 269 | + | |
| 270 | + | # Do not use requests.Session() here (in our experience it returns fewer results)
| 271 | + | req = requests.get(notevil_url1.format(quote(searchstr)), proxies=proxies, headers=random_headers()) |
| 272 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 273 | + | |
| 274 | + | page_number = 1 |
| 275 | + | last_div = soup.find("div", attrs={"style": "text-align:center"}).find("div", attrs={"style": "text-align:center"}) |
| 276 | + | if last_div is not None: |
| 277 | + | for i in last_div.find_all("a"): |
| 278 | + | page_number = int(i.get_text()) |
| 279 | + | if page_number > max_nb_page: |
| 280 | + | page_number = max_nb_page |
| 281 | + | |
| 282 | + | pos = get_proc_pos() |
| 283 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("not Evil", pos), position=pos) as progress_bar: |
| 284 | + | num_rows = 20 |
| 285 | + | results = link_finder("notevil", soup) |
| 286 | + | progress_bar.update() |
| 287 | + | |
| 288 | + | for n in range(2, page_number + 1): |
| 289 | + | start = (int(n - 1) * num_rows) |
| 290 | + | req = requests.get(notevil_url2.format(quote(searchstr), start, num_rows), |
| 291 | + | proxies=proxies, |
| 292 | + | headers=random_headers()) |
| 293 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 294 | + | results = results + link_finder("notevil", soup) |
| 295 | + | progress_bar.update() |
| 296 | + | time.sleep(1) |
| 297 | + | |
| 298 | + | return results |
| 299 | + | |
| 300 | + | |
| 301 | + | def visitor(searchstr): |
| 302 | + | results = [] |
| 303 | + | visitor_url = supported_engines['visitor'] + "/search/?q={}&page={}" |
| 304 | + | max_nb_page = 30 |
| 305 | + | if args.limit != 0: |
| 306 | + | max_nb_page = args.limit |
| 307 | + | |
| 308 | + | pos = get_proc_pos() |
| 309 | + | with tqdm(total=max_nb_page, initial=0, desc=get_tqdm_desc("VisiTOR", pos), position=pos) as progress_bar: |
| 310 | + | continue_processing = True |
| 311 | + | page_to_request = 1 |
| 312 | + | |
| 313 | + | with requests.Session() as s: |
| 314 | + | s.proxies = proxies |
| 315 | + | s.headers = random_headers() |
| 316 | + | |
| 317 | + | while continue_processing: |
| 318 | + | resp = s.get(visitor_url.format(quote(searchstr), page_to_request)) |
| 319 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 320 | + | results = results + link_finder("visitor", soup) |
| 321 | + | progress_bar.update() |
| 322 | + | |
| 323 | + | next_page = soup.find('a', text="Next »") |
| 324 | + | if next_page is None or page_to_request >= max_nb_page: |
| 325 | + | continue_processing = False |
| 326 | + | |
| 327 | + | page_to_request += 1 |
| 328 | + | |
| 329 | + | return results |
| 330 | + | |
| 331 | + | |
| 332 | + | def darksearchenginer(searchstr): |
| 333 | + | results = [] |
| 334 | + | darksearchenginer_url = supported_engines['darksearchenginer'] |
| 335 | + | max_nb_page = 20 |
| 336 | + | if args.limit != 0: |
| 337 | + | max_nb_page = args.limit |
| 338 | + | page_number = 1 |
| 339 | + | |
| 340 | + | with requests.Session() as s: |
| 341 | + | s.proxies = proxies |
| 342 | + | s.headers = random_headers() |
| 343 | + | |
| 344 | + | # Note that this search engine is very likely to time out
| 345 | + | resp = s.post(darksearchenginer_url, data={"search[keyword]": searchstr, "page": page_number}) |
| 346 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 347 | + | |
| 348 | + | pages_input = soup.find_all("input", attrs={"name": "page"}) |
| 349 | + | for i in pages_input: |
| 350 | + | page_number = int(i['value']) |
| 351 | + | if page_number > max_nb_page: |
| 352 | + | page_number = max_nb_page |
| 353 | + | |
| 354 | + | pos = get_proc_pos() |
| 355 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Dark Search Enginer", pos), position=pos) \ |
| 356 | + | as progress_bar: |
| 357 | + | |
| 358 | + | results = link_finder("darksearchenginer", soup) |
| 359 | + | progress_bar.update() |
| 360 | + | |
| 361 | + | for n in range(2, page_number + 1): |
| 362 | + | resp = s.post(darksearchenginer_url, data={"search[keyword]": searchstr, "page": str(n)}) |
| 363 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 364 | + | results = results + link_finder("darksearchenginer", soup) |
| 365 | + | progress_bar.update() |
| 366 | + | |
| 367 | + | return results |
| 368 | + | |
| 369 | + | |
| 370 | + | def phobos(searchstr): |
| 371 | + | results = [] |
| 372 | + | phobos_url = supported_engines['phobos'] + "/search?query={}&p={}" |
| 373 | + | max_nb_page = 100 |
| 374 | + | if args.limit != 0: |
| 375 | + | max_nb_page = args.limit |
| 376 | + | |
| 377 | + | with requests.Session() as s: |
| 378 | + | s.proxies = proxies |
| 379 | + | s.headers = random_headers() |
| 380 | + | |
| 381 | + | resp = s.get(phobos_url.format(quote(searchstr), 1), proxies=proxies, headers=random_headers()) |
| 382 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 383 | + | |
| 384 | + | page_number = 1 |
| 385 | + | pages = soup.find("div", attrs={"class": "pages"}).find_all('a') |
| 386 | + | if pages is not None: |
| 387 | + | for i in pages: |
| 388 | + | page_number = int(i.get_text()) |
| 389 | + | if page_number > max_nb_page: |
| 390 | + | page_number = max_nb_page |
| 391 | + | |
| 392 | + | pos = get_proc_pos() |
| 393 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Phobos", pos), position=pos) as progress_bar: |
| 394 | + | results = link_finder("phobos", soup) |
| 395 | + | progress_bar.update() |
| 396 | + | |
| 397 | + | for n in range(2, page_number + 1): |
| 398 | + | resp = s.get(phobos_url.format(quote(searchstr), n), proxies=proxies, headers=random_headers()) |
| 399 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 400 | + | results = results + link_finder("phobos", soup) |
| 401 | + | progress_bar.update() |
| 402 | + | |
| 403 | + | return results |
| 404 | + | |
| 405 | + | |
| 406 | + | def onionsearchserver(searchstr): |
| 407 | + | results = [] |
| 408 | + | onionsearchserver_url1 = supported_engines['onionsearchserver'] + "/oss/" |
| 409 | + | onionsearchserver_url2 = None |
| 410 | + | results_per_page = 10 |
| 411 | + | max_nb_page = 100 |
| 412 | + | if args.limit != 0: |
| 413 | + | max_nb_page = args.limit |
| 414 | + | |
| 415 | + | with requests.Session() as s: |
| 416 | + | s.proxies = proxies |
| 417 | + | s.headers = random_headers() |
| 418 | + | |
| 419 | + | resp = s.get(onionsearchserver_url1) |
| 420 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 421 | + | for i in soup.find_all('iframe', attrs={"style": "display:none;"}): |
| 422 | + | onionsearchserver_url2 = i['src'] + "{}&page={}" |
| 423 | + | |
| 424 | + | if onionsearchserver_url2 is None: |
| 425 | + | return results |
| 426 | + | |
| 427 | + | resp = s.get(onionsearchserver_url2.format(quote(searchstr), 1)) |
| 428 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 429 | + | |
| 430 | + | page_number = 1 |
| 431 | + | pages = soup.find_all("div", attrs={"class": "osscmnrdr ossnumfound"}) |
| 432 | + | if pages is not None and not str(pages[0].get_text()).startswith("No"): |
| 433 | + | total_results = float(str.split(clear(pages[0].get_text()))[0]) |
| 434 | + | page_number = math.ceil(total_results / results_per_page) |
| 435 | + | if page_number > max_nb_page: |
| 436 | + | page_number = max_nb_page |
| 437 | + | |
| 438 | + | pos = get_proc_pos() |
| 439 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Onion Search Server", pos), position=pos) \ |
| 440 | + | as progress_bar: |
| 441 | + | |
| 442 | + | results = link_finder("onionsearchserver", soup) |
| 443 | + | progress_bar.update() |
| 444 | + | |
| 445 | + | for n in range(2, page_number + 1): |
| 446 | + | resp = s.get(onionsearchserver_url2.format(quote(searchstr), n)) |
| 447 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 448 | + | results = results + link_finder("onionsearchserver", soup) |
| 449 | + | progress_bar.update() |
| 450 | + | |
| 451 | + | return results |
| 452 | + | |
| 453 | + | |
| 454 | + | def grams(searchstr): |
| 455 | + | results = [] |
| 456 | + | # No multi-page handling, as it is very hard to get many results from this engine
| 457 | + | grams_url1 = supported_engines['grams'] |
| 458 | + | grams_url2 = supported_engines['grams'] + "/results" |
| 459 | + | |
| 460 | + | with requests.Session() as s: |
| 461 | + | s.proxies = proxies |
| 462 | + | s.headers = random_headers() |
| 463 | + | |
| 464 | + | resp = s.get(grams_url1) |
| 465 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 466 | + | token = soup.find('input', attrs={'name': '_token'})['value'] |
| 467 | + | |
| 468 | + | pos = get_proc_pos() |
| 469 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("Grams", pos), position=pos) as progress_bar: |
| 470 | + | resp = s.post(grams_url2, data={"req": searchstr, "_token": token}) |
| 471 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 472 | + | results = link_finder("grams", soup) |
| 473 | + | progress_bar.update() |
| 474 | + | |
| 475 | + | return results |
| 476 | + | |
| 477 | + | |
| 478 | + | def candle(searchstr): |
| 479 | + | results = [] |
| 480 | + | candle_url = supported_engines['candle'] + "/?q={}" |
| 481 | + | |
| 482 | + | pos = get_proc_pos() |
| 483 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("Candle", pos), position=pos) as progress_bar: |
| 484 | + | response = requests.get(candle_url.format(quote(searchstr)), proxies=proxies, headers=random_headers()) |
| 485 | + | soup = BeautifulSoup(response.text, 'html5lib') |
| 486 | + | results = link_finder("candle", soup) |
| 487 | + | progress_bar.update() |
| 488 | + | |
| 489 | + | return results |
| 490 | + | |
| 491 | + | |
| 492 | + | def torsearchengine(searchstr): |
| 493 | + | results = [] |
| 494 | + | torsearchengine_url = supported_engines['torsearchengine'] + "/search/move/?q={}&pn={}&num=10&sdh=&" |
| 495 | + | max_nb_page = 100 |
| 496 | + | if args.limit != 0: |
| 497 | + | max_nb_page = args.limit |
| 498 | + | |
| 499 | + | with requests.Session() as s: |
| 500 | + | s.proxies = proxies |
| 501 | + | s.headers = random_headers() |
| 502 | + | |
| 503 | + | resp = s.get(torsearchengine_url.format(quote(searchstr), 1)) |
| 504 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 505 | + | |
| 506 | + | page_number = 1 |
| 507 | + | for i in soup.find_all('div', attrs={"id": "subheader"}): |
| 508 | + | if i.get_text() is not None and "of" in i.get_text(): |
| 509 | + | total_results = int(i.find('p').find_all('b')[2].get_text().replace(",", "")) |
| 510 | + | results_per_page = 10 |
| 511 | + | page_number = math.ceil(total_results / results_per_page) |
| 512 | + | if page_number > max_nb_page: |
| 513 | + | page_number = max_nb_page |
| 514 | + | |
| 515 | + | pos = get_proc_pos() |
| 516 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Tor Search Engine", pos), position=pos) \ |
| 517 | + | as progress_bar: |
| 518 | + | |
| 519 | + | results = link_finder("torsearchengine", soup) |
| 520 | + | progress_bar.update() |
| 521 | + | |
| 522 | + | for n in range(2, page_number + 1): |
| 523 | + | resp = s.get(torsearchengine_url.format(quote(searchstr), n)) |
| 524 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 525 | + | results = results + link_finder("torsearchengine", soup) |
| 526 | + | progress_bar.update() |
| 527 | + | |
| 528 | + | return results |
| 529 | + | |
| 530 | + | |
| 531 | + | def torgle(searchstr): |
| 532 | + | results = [] |
| 533 | + | torgle_url = supported_engines['torgle'] + "/search.php?term={}" |
| 534 | + | |
| 535 | + | pos = get_proc_pos() |
| 536 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("Torgle", pos), position=pos) as progress_bar: |
| 537 | + | response = requests.get(torgle_url.format(quote(searchstr)), proxies=proxies, headers=random_headers()) |
| 538 | + | soup = BeautifulSoup(response.text, 'html5lib') |
| 539 | + | results = link_finder("torgle", soup) |
| 540 | + | progress_bar.update() |
| 541 | + | |
| 542 | + | return results |
| 543 | + | |
| 544 | + | |
| 545 | + | def onionsearchengine(searchstr): |
| 546 | + | results = [] |
| 547 | + | onionsearchengine_url = supported_engines['onionsearchengine'] + "/search.php?search={}&submit=Search&page={}" |
| 548 | + | # same as onionsearchengine_url = "http://5u56fjmxu63xcmbk.onion/search.php?search={}&submit=Search&page={}" |
| 549 | + | max_nb_page = 100 |
| 550 | + | if args.limit != 0: |
| 551 | + | max_nb_page = args.limit |
| 552 | + | |
| 553 | + | with requests.Session() as s: |
| 554 | + | s.proxies = proxies |
| 555 | + | s.headers = random_headers() |
| 556 | + | |
| 557 | + | resp = s.get(onionsearchengine_url.format(quote(searchstr), 1)) |
| 558 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 559 | + | |
| 560 | + | page_number = 1 |
| 561 | + | approx_re = re.search(r"\s([0-9]+)\sresult[s]?\sfound\s!.*", clear(soup.find('body').get_text())) |
| 562 | + | if approx_re is not None: |
| 563 | + | nb_res = int(approx_re.group(1)) |
| 564 | + | results_per_page = 9 |
| 565 | + | page_number = math.ceil(float(nb_res / results_per_page)) |
| 566 | + | if page_number > max_nb_page: |
| 567 | + | page_number = max_nb_page |
| 568 | + | |
| 569 | + | pos = get_proc_pos() |
| 570 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Onion Search Engine", pos), position=pos) \ |
| 571 | + | as progress_bar: |
| 572 | + | |
| 573 | + | results = link_finder("onionsearchengine", soup) |
| 574 | + | progress_bar.update() |
| 575 | + | |
| 576 | + | for n in range(2, page_number + 1): |
| 577 | + | resp = s.get(onionsearchengine_url.format(quote(searchstr), n)) |
| 578 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 579 | + | results = results + link_finder("onionsearchengine", soup) |
| 580 | + | progress_bar.update() |
| 581 | + | |
| 582 | + | return results |
| 583 | + | |
| 584 | + | |
| 585 | + | def tordex(searchstr): |
| 586 | + | results = [] |
| 587 | + | tordex_url = supported_engines['tordex'] + "/search?query={}&page={}" |
| 588 | + | max_nb_page = 100 |
| 589 | + | if args.limit != 0: |
| 590 | + | max_nb_page = args.limit |
| 591 | + | |
| 592 | + | with requests.Session() as s: |
| 593 | + | s.proxies = proxies |
| 594 | + | s.headers = random_headers() |
| 595 | + | |
| 596 | + | resp = s.get(tordex_url.format(quote(searchstr), 1)) |
| 597 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 598 | + | |
| 599 | + | page_number = 1 |
| 600 | + | pages = soup.find_all("li", attrs={"class": "page-item"}) |
| 601 | + | if pages is not None: |
| 602 | + | for i in pages: |
| 603 | + | if i.get_text() != "...": |
| 604 | + | page_number = int(i.get_text()) |
| 605 | + | if page_number > max_nb_page: |
| 606 | + | page_number = max_nb_page |
73 | 607 | | |
| 608 | + | pos = get_proc_pos() |
| 609 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Tordex", pos), position=pos) as progress_bar: |
| 610 | + | |
| 611 | + | results = link_finder("tordex", soup) |
| 612 | + | progress_bar.update() |
| 613 | + | |
| 614 | + | for n in range(2, page_number + 1): |
| 615 | + | resp = s.get(tordex_url.format(quote(searchstr), n)) |
| 616 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 617 | + | results = results + link_finder("tordex", soup) |
| 618 | + | progress_bar.update() |
| 619 | + | |
| 620 | + | return results |
| 621 | + | |
| 622 | + | |
| 623 | + | def tor66(searchstr): |
| 624 | + | results = [] |
| 625 | + | tor66_url = supported_engines['tor66'] + "/search?q={}&sorttype=rel&page={}" |
| 626 | + | max_nb_page = 30 |
| 627 | + | if args.limit != 0: |
| 628 | + | max_nb_page = args.limit |
| 629 | + | |
| 630 | + | with requests.Session() as s: |
| 631 | + | s.proxies = proxies |
| 632 | + | s.headers = random_headers() |
| 633 | + | |
| 634 | + | resp = s.get(tor66_url.format(quote(searchstr), 1)) |
| 635 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 636 | + | |
| 637 | + | page_number = 1 |
| 638 | + | approx_re = re.search(r"\.Onion\ssites\sfound\s:\s([0-9]+)", resp.text) |
| 639 | + | if approx_re is not None: |
| 640 | + | nb_res = int(approx_re.group(1)) |
| 641 | + | results_per_page = 20 |
| 642 | + | page_number = math.ceil(float(nb_res / results_per_page)) |
| 643 | + | if page_number > max_nb_page: |
| 644 | + | page_number = max_nb_page |
| 645 | + | |
| 646 | + | pos = get_proc_pos() |
| 647 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Tor66", pos), position=pos) as progress_bar: |
| 648 | + | |
| 649 | + | results = link_finder("tor66", soup) |
| 650 | + | progress_bar.update() |
| 651 | + | |
| 652 | + | for n in range(2, page_number + 1): |
| 653 | + | resp = s.get(tor66_url.format(quote(searchstr), n)) |
| 654 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 655 | + | results = results + link_finder("tor66", soup) |
| 656 | + | progress_bar.update() |
| 657 | + | |
| 658 | + | return results |
| 659 | + | |
| 660 | + | |
| 661 | + | def tormax(searchstr): |
| 662 | + | results = [] |
| 663 | + | tormax_url = supported_engines['tormax'] + "/tormax/search?q={}" |
| 664 | + | |
| 665 | + | pos = get_proc_pos() |
| 666 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("Tormax", pos), position=pos) as progress_bar: |
| 667 | + | response = requests.get(tormax_url.format(quote(searchstr)), proxies=proxies, headers=random_headers()) |
| 668 | + | soup = BeautifulSoup(response.text, 'html5lib') |
| 669 | + | results = link_finder("tormax", soup) |
| 670 | + | progress_bar.update() |
| 671 | + | |
| 672 | + | return results |
| 673 | + | |
| 674 | + | |
| 675 | + | def haystack(searchstr): |
| 676 | + | results = [] |
| 677 | + | haystack_url = supported_engines['haystack'] + "/?q={}&offset={}" |
| 678 | + | # At the 52nd page it times out 100% of the time
| 679 | + | max_nb_page = 50 |
| 680 | + | if args.limit != 0: |
| 681 | + | max_nb_page = args.limit |
| 682 | + | offset_coeff = 20 |
| 683 | + | |
| 684 | + | with requests.Session() as s: |
| 685 | + | s.proxies = proxies |
| 686 | + | s.headers = random_headers() |
| 687 | + | |
| 688 | + | req = s.get(haystack_url.format(quote(searchstr), 0)) |
| 689 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 690 | + | |
| 691 | + | pos = get_proc_pos() |
| 692 | + | with tqdm(total=max_nb_page, initial=0, desc=get_tqdm_desc("Haystack", pos), position=pos) as progress_bar: |
| 693 | + | continue_processing = True |
| 694 | + | ret = link_finder("haystack", soup) |
| 695 | + | results = results + ret |
| 696 | + | progress_bar.update() |
| 697 | + | if len(ret) == 0: |
| 698 | + | continue_processing = False |
| 699 | + | |
| 700 | + | it = 1 |
| 701 | + | while continue_processing: |
| 702 | + | offset = int(it * offset_coeff) |
| 703 | + | req = s.get(haystack_url.format(quote(searchstr), offset)) |
| 704 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 705 | + | ret = link_finder("haystack", soup) |
| 706 | + | results = results + ret |
| 707 | + | progress_bar.update() |
| 708 | + | it += 1 |
| 709 | + | if it >= max_nb_page or len(ret) == 0: |
| 710 | + | continue_processing = False |
| 711 | + | |
| 712 | + | return results |
| 713 | + | |
| 714 | + | |
| 715 | + | def multivac(searchstr): |
| 716 | + | results = [] |
| 717 | + | multivac_url = supported_engines['multivac'] + "/?q={}&page={}" |
| 718 | + | max_nb_page = 10 |
| 719 | + | if args.limit != 0: |
| 720 | + | max_nb_page = args.limit |
| 721 | + | |
| 722 | + | with requests.Session() as s: |
| 723 | + | s.proxies = proxies |
| 724 | + | s.headers = random_headers() |
| 725 | + | |
| 726 | + | page_to_request = 1 |
| 727 | + | req = s.get(multivac_url.format(quote(searchstr), page_to_request)) |
| 728 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 729 | + | |
| 730 | + | pos = get_proc_pos() |
| 731 | + | with tqdm(total=max_nb_page, initial=0, desc=get_tqdm_desc("Multivac", pos), position=pos) as progress_bar: |
| 732 | + | continue_processing = True |
| 733 | + | ret = link_finder("multivac", soup) |
| 734 | + | results = results + ret |
| 735 | + | progress_bar.update() |
| 736 | + | if len(ret) == 0 or page_to_request >= max_nb_page: |
| 737 | + | continue_processing = False |
| 738 | + | |
| 739 | + | while continue_processing: |
| 740 | + | page_to_request += 1 |
| 741 | + | req = s.get(multivac_url.format(quote(searchstr), page_to_request)) |
| 742 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 743 | + | ret = link_finder("multivac", soup) |
| 744 | + | results = results + ret |
| 745 | + | progress_bar.update() |
| 746 | + | if len(ret) == 0 or page_to_request >= max_nb_page: |
| 747 | + | continue_processing = False |
| 748 | + | |
| 749 | + | return results |
| 750 | + | |
| 751 | + | |
| 752 | + | def evosearch(searchstr): |
| 753 | + | results = [] |
| 754 | + | evosearch_url = supported_engines['evosearch'] + "/evo/search.php?" \ |
| 755 | + | "query={}&" \ |
| 756 | + | "start={}&" \ |
| 757 | + | "search=1&type=and&mark=bold+text&" \ |
| 758 | + | "results={}" |
| 759 | + | results_per_page = 50 |
| 760 | + | max_nb_page = 30 |
| 761 | + | if args.limit != 0: |
| 762 | + | max_nb_page = args.limit |
| 763 | + | |
| 764 | + | with requests.Session() as s: |
| 765 | + | s.proxies = proxies |
| 766 | + | s.headers = random_headers() |
| 767 | + | |
| 768 | + | req = s.get(evosearch_url.format(quote(searchstr), 1, results_per_page)) |
| 769 | + | soup = BeautifulSoup(req.text, 'html5lib') |
| 770 | + | |
| 771 | + | page_number = 1 |
| 772 | + | i = soup.find("p", attrs={"class": "cntr"}) |
| 773 | + | if i is not None: |
| 774 | + | if i.get_text() is not None and "of" in i.get_text(): |
| 775 | + | nb_res = float(clear(str.split(i.get_text().split("-")[1].split("of")[1])[0])) |
| 776 | + | page_number = math.ceil(nb_res / results_per_page) |
| 777 | + | if page_number > max_nb_page: |
| 778 | + | page_number = max_nb_page |
| 779 | + | |
| 780 | + | pos = get_proc_pos() |
| 781 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Evo Search", pos), position=pos) as progress_bar: |
| 782 | + | results = link_finder("evosearch", soup) |
| 783 | + | progress_bar.update() |
| 784 | + | |
| 785 | + | for n in range(2, page_number + 1): |
| 786 | + | resp = s.get(evosearch_url.format(quote(searchstr), n, results_per_page)) |
| 787 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 788 | + | results = results + link_finder("evosearch", soup) |
| 789 | + | progress_bar.update() |
| 790 | + | |
| 791 | + | return results |
| 792 | + | |
| 793 | + | |
| 794 | + | def oneirun(searchstr): |
| 795 | + | results = [] |
| 796 | + | oneirun_url = supported_engines['oneirun'] + "/Home/IndexEn" |
| 797 | + | |
| 798 | + | with requests.Session() as s: |
| 799 | + | s.proxies = proxies |
| 800 | + | s.headers = random_headers() |
| 801 | + | |
| 802 | + | resp = s.get(oneirun_url) |
| 803 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 804 | + | token = soup.find('input', attrs={"name": "__RequestVerificationToken"})['value'] |
| 805 | + | |
| 806 | + | pos = get_proc_pos() |
| 807 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("Oneirun", pos), position=pos) as progress_bar: |
| 808 | + | response = s.post(oneirun_url, data={
| 809 | + | "searchString": searchstr, |
| 810 | + | "__RequestVerificationToken": token |
| 811 | + | }) |
| 812 | + | soup = BeautifulSoup(response.text, 'html5lib') |
| 813 | + | results = link_finder("oneirun", soup) |
| 814 | + | progress_bar.update() |
| 815 | + | |
| 816 | + | return results |
| 817 | + | |
| 818 | + | |
| 819 | + | def deeplink(searchstr): |
| 820 | + | results = [] |
| 821 | + | deeplink_url1 = supported_engines['deeplink'] + "/index.php" |
| 822 | + | deeplink_url2 = supported_engines['deeplink'] + "/?search={}&type=verified" |
| 823 | + | |
| 824 | + | with requests.Session() as s: |
| 825 | + | s.proxies = proxies |
| 826 | + | s.headers = random_headers() |
| 827 | + | s.get(deeplink_url1) |
| 828 | + | |
| 829 | + | pos = get_proc_pos() |
| 830 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("DeepLink", pos), position=pos) as progress_bar: |
| 831 | + | response = s.get(deeplink_url2.format(quote(searchstr))) |
| 832 | + | soup = BeautifulSoup(response.text, 'html5lib') |
| 833 | + | results = link_finder("deeplink", soup) |
| 834 | + | progress_bar.update() |
| 835 | + | |
| 836 | + | return results |
| 837 | + | |
| 838 | + | |
| 839 | + | def torsearchengine1(searchstr): |
| 840 | + | results = [] |
| 841 | + | torsearchengine1_url1 = supported_engines['torsearchengine1'] |
| 842 | + | torsearchengine1_url2 = supported_engines['torsearchengine1'] + "/index.php" |
| 843 | + | |
| 844 | + | with requests.Session() as s: |
| 845 | + | s.proxies = proxies |
| 846 | + | s.headers = random_headers() |
| 847 | + | s.get(torsearchengine1_url1) |
| 848 | + | |
| 849 | + | pos = get_proc_pos() |
| 850 | + | with tqdm(total=1, initial=0, desc=get_tqdm_desc("TOR Search Engine 1", pos), position=pos) as progress_bar: |
| 851 | + | response = s.post(torsearchengine1_url2, {'search': searchstr, 'search2': ''}) |
| 852 | + | soup = BeautifulSoup(response.text, 'html5lib') |
| 853 | + | results = link_finder("torsearchengine1", soup) |
| 854 | + | progress_bar.update() |
| 855 | + | |
| 856 | + | return results |
| 857 | + | |
| 858 | + | |
| 859 | + | def torgle1(searchstr): |
| 860 | + | results = [] |
| 861 | + | torgle1_url = supported_engines['torgle1'] + "/torgle/index-frame.php?query={}&search=1&engine-ver=2&isframe=0{}" |
| 862 | + | results_per_page = 10 |
| 863 | + | max_nb_page = 30 |
| 864 | + | if args.limit != 0: |
| 865 | + | max_nb_page = args.limit |
| 866 | + | |
| 867 | + | with requests.Session() as s: |
| 868 | + | s.proxies = proxies |
| 869 | + | s.headers = random_headers() |
| 870 | + | |
| 871 | + | resp = s.get(torgle1_url.format(quote(searchstr), "")) |
| 872 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 873 | + | |
| 874 | + | page_number = 1 |
| 875 | + | i = soup.find('div', attrs={"id": "result_report"}) |
| 876 | + | if i is not None: |
| 877 | + | if i.get_text() is not None and "of" in i.get_text(): |
| 878 | + | res_re = re.match(r".*of\s([0-9]+)\s.*", clear(i.get_text())) |
| 879 | + | total_results = int(res_re.group(1)) |
| 880 | + | page_number = math.ceil(total_results / results_per_page) |
| 881 | + | if page_number > max_nb_page: |
| 882 | + | page_number = max_nb_page |
| 883 | + | |
| 884 | + | pos = get_proc_pos() |
| 885 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Torgle 1", pos), position=pos) as progress_bar: |
| 886 | + | results = link_finder("torgle1", soup) |
| 887 | + | progress_bar.update() |
| 888 | + | |
| 889 | + | for n in range(2, page_number + 1): |
| 890 | + | start_page_param = "&start={}".format(n) |
| 891 | + | resp = s.get(torgle1_url.format(quote(searchstr), start_page_param)) |
| 892 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 893 | + | results = results + link_finder("torgle1", soup) |
| 894 | + | progress_bar.update() |
| 895 | + | |
| 896 | + | return results |
| 897 | + | |
| 898 | + | |
| 899 | + | def grams1(searchstr): |
| 900 | + | results = [] |
| 901 | + | grams1_url = supported_engines['grams1'] + "/results/index.php?page={}&searchstr={}" |
| 902 | + | results_per_page = 25 |
| 903 | + | max_nb_page = 30 |
| 904 | + | if args.limit != 0: |
| 905 | + | max_nb_page = args.limit |
| 906 | + | |
| 907 | + | with requests.Session() as s: |
| 908 | + | s.proxies = proxies |
| 909 | + | s.headers = random_headers() |
| 910 | + | |
| 911 | + | resp = s.get(grams1_url.format(1, quote(searchstr))) |
| 912 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 913 | + | |
| 914 | + | page_number = 1 |
| 915 | + | pages = soup.find_all('div', attrs={"class": "result-text"}) |
| 916 | + | if pages is not None: |
| 917 | + | res_re = re.match(r"About ([0-9]+) result(.*)", clear(pages[0].get_text())) |
| 918 | + | total_results = int(res_re.group(1)) |
| 919 | + | page_number = math.ceil(total_results / results_per_page) |
| 920 | + | if page_number > max_nb_page: |
| 921 | + | page_number = max_nb_page |
| 922 | + | |
| 923 | + | pos = get_proc_pos() |
| 924 | + | with tqdm(total=page_number, initial=0, desc=get_tqdm_desc("Grams 1", pos), position=pos) as progress_bar: |
| 925 | + | results = link_finder("grams1", soup) |
| 926 | + | progress_bar.update() |
| 927 | + | |
| 928 | + | for n in range(2, page_number + 1): |
| 929 | + | resp = s.get(grams1_url.format(n, quote(searchstr))) |
| 930 | + | soup = BeautifulSoup(resp.text, 'html5lib') |
| 931 | + | results = results + link_finder("grams1", soup) |
| 932 | + | progress_bar.update() |
| 933 | + | |
| 934 | + | return results |
| 935 | + | |
| 936 | + | |
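| | + | # Extract the host part (the onion domain) from a result URL; returns None if the URL cannot be parsed.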
| 937 | + | def get_domain_from_url(link): |
| 938 | + | fqdn_re = r"^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&'()*+,;=:]+\])" |
| 939 | + | domain_re = re.match(fqdn_re, link) |
| 940 | + | if domain_re is not None: |
| 941 | + | if domain_re.lastindex == 2: |
| 942 | + | return domain_re.group(2) |
| 943 | + | return None |
| 944 | + | |
| 945 | + | |
| 946 | + | def write_to_csv(csv_writer, fields): |
| 947 | + | line_to_write = [] |
| 948 | + | if args.fields and len(args.fields) > 0: |
| 949 | + | for f in args.fields[0]: |
| 950 | + | if f in fields: |
| 951 | + | line_to_write.append(fields[f]) |
| 952 | + | if f == "domain": |
| 953 | + | domain = get_domain_from_url(fields['link']) |
| 954 | + | line_to_write.append(domain) |
| 955 | + | csv_writer.writerow(line_to_write) |
74 | 956 | | else: |
75 | | - | print("Rate limit darksearch.io !") |
| 957 | + | # Default output mode |
| 958 | + | line_to_write.append(fields['engine']) |
| 959 | + | line_to_write.append(fields['name']) |
| 960 | + | line_to_write.append(fields['link']) |
| 961 | + | csv_writer.writerow(line_to_write) |
| 962 | + | |
| 963 | + | |
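| | + | # Parse one page of results for the given engine (a BeautifulSoup document, or the JSON data for darksearchio)
| | + | # and return a list of {engine, name, link} dicts; with --continuous_write, each hit is also appended to the CSV.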
| 964 | + | def link_finder(engine_str, data_obj): |
| 965 | + | global filename |
| 966 | + | name = "" |
| 967 | + | link = "" |
| 968 | + | csv_file = None |
| 969 | + | found_links = [] |
| 970 | + | |
| 971 | + | if args.continuous_write: |
| 972 | + | csv_file = open(filename, 'a', newline='') |
| 973 | + | |
| 974 | + | def add_link(): |
| 975 | + | found_links.append({"engine": engine_str, "name": name, "link": link}) |
| 976 | + | |
| 977 | + | if args.continuous_write and csv_file.writable(): |
| 978 | + | csv_writer = csv.writer(csv_file, delimiter=field_delim, quoting=csv.QUOTE_ALL) |
| 979 | + | fields = {"engine": engine_str, "name": name, "link": link} |
| 980 | + | write_to_csv(csv_writer, fields) |
| 981 | + | |
| 982 | + | if engine_str == "ahmia": |
| 983 | + | for r in data_obj.select('li.result h4'): |
| 984 | + | name = clear(r.get_text()) |
| 985 | + | link = r.find('a')['href'].split('redirect_url=')[1] |
| 986 | + | add_link() |
| 987 | + | |
| 988 | + | if engine_str == "candle": |
| 989 | + | for r in data_obj.select("body h2 a"): |
| 990 | + | if str(r['href']).startswith("http"): |
| 991 | + | name = clear(r.get_text()) |
| 992 | + | link = clear(r['href']) |
| 993 | + | add_link() |
| 994 | + | |
| 995 | + | if engine_str == "darksearchenginer": |
| 996 | + | for r in data_obj.select('.table-responsive a'): |
| 997 | + | name = clear(r.get_text()) |
| 998 | + | link = clear(r['href']) |
| 999 | + | add_link() |
| 1000 | + | |
| 1001 | + | if engine_str == "darksearchio": |
| 1002 | + | for r in data_obj: |
| 1003 | + | name = clear(r["title"]) |
| 1004 | + | link = clear(r["link"]) |
| 1005 | + | add_link() |
| 1006 | + | |
| 1007 | + | if engine_str == "deeplink": |
| 1008 | + | for tr in data_obj.find_all('tr'): |
| 1009 | + | cels = tr.find_all('td') |
| 1010 | + | if cels is not None and len(cels) == 4: |
| 1011 | + | name = clear(cels[1].get_text()) |
| 1012 | + | link = clear(cels[0].find('a')['href']) |
| 1013 | + | add_link() |
| 1014 | + | |
| 1015 | + | if engine_str == "evosearch": |
| 1016 | + | for r in data_obj.select("#results .title a"): |
| 1017 | + | name = clear(r.get_text()) |
| 1018 | + | link = get_parameter(r['href'], 'url') |
| 1019 | + | add_link() |
| 1020 | + | |
| 1021 | + | if engine_str == "grams": |
| 1022 | + | for i in data_obj.find_all("div", attrs={"class": "media-body"}): |
| 1023 | + | if not i.find('span'): |
| 1024 | + | for r in i.select(".searchlinks a"): |
| 1025 | + | name = clear(r.get_text()) |
| 1026 | + | link = clear(r['href']) |
| 1027 | + | add_link() |
| 1028 | + | |
| 1029 | + | if engine_str == "grams1": |
| 1030 | + | for r in data_obj.select(".searchlinks a"): |
| 1031 | + | name = clear(r.get_text()) |
| 1032 | + | link = clear(r['href']) |
| 1033 | + | add_link() |
| 1034 | + | |
| 1035 | + | if engine_str == "haystack": |
| 1036 | + | for r in data_obj.select(".result b a"): |
| 1037 | + | name = clear(r.get_text()) |
| 1038 | + | link = get_parameter(r['href'], 'url') |
| 1039 | + | add_link() |
| 1040 | + | |
| 1041 | + | if engine_str == "multivac": |
| 1042 | + | for r in data_obj.select("dl dt a"): |
| 1043 | + | if r['href'] != "": |
| 1044 | + | name = clear(r.get_text()) |
| 1045 | + | link = clear(r['href']) |
| 1046 | + | add_link() |
| 1047 | + | else: |
| 1048 | + | break |
| 1049 | + | |
| 1050 | + | if engine_str == "notevil": |
| 1051 | + | for r in data_obj.select('#content > div > p > a:not([target])'): |
| 1052 | + | name = clear(r.get_text()) |
| 1053 | + | link = get_parameter(r['href'], 'url') |
| 1054 | + | add_link() |
| 1055 | + | |
| 1056 | + | if engine_str == "oneirun": |
| 1057 | + | for td in data_obj.find_all('td', attrs={"style": "vertical-align: top;"}): |
| 1058 | + | name = clear(td.find('h5').get_text()) |
| 1059 | + | link = clear(td.find('a')['href']) |
| 1060 | + | add_link() |
| 1061 | + | |
| 1062 | + | if engine_str == "onionland": |
| 1063 | + | for r in data_obj.select('.result-block .title a'): |
| 1064 | + | if not r['href'].startswith('/ads/'): |
| 1065 | + | name = clear(r.get_text()) |
| 1066 | + | link = unquote(unquote(get_parameter(r['href'], 'l'))) |
| 1067 | + | add_link() |
| 1068 | + | |
| 1069 | + | if engine_str == "onionsearchengine": |
| 1070 | + | for r in data_obj.select("table a b"): |
| 1071 | + | name = clear(r.get_text()) |
| 1072 | + | link = get_parameter(r.parent['href'], 'u') |
| 1073 | + | add_link() |
| 1074 | + | |
| 1075 | + | if engine_str == "onionsearchserver": |
| 1076 | + | for r in data_obj.select('.osscmnrdr.ossfieldrdr1 a'): |
| 1077 | + | name = clear(r.get_text()) |
| 1078 | + | link = clear(r['href']) |
| 1079 | + | add_link() |
| 1080 | + | |
| 1081 | + | if engine_str == "phobos": |
| 1082 | + | for r in data_obj.select('.serp .titles'): |
| 1083 | + | name = clear(r.get_text()) |
| 1084 | + | link = clear(r['href']) |
| 1085 | + | add_link() |
| 1086 | + | |
| 1087 | + | if engine_str == "tor66": |
| 1088 | + | for i in data_obj.find('hr').find_all_next('b'): |
| 1089 | + | if i.find('a'): |
| 1090 | + | name = clear(i.find('a').get_text()) |
| 1091 | + | link = clear(i.find('a')['href']) |
| 1092 | + | add_link() |
| 1093 | + | |
| 1094 | + | if engine_str == "torch": |
| 1095 | + | for r in data_obj.select("dl > dt > a"): |
| 1096 | + | name = clear(r.get_text()) |
| 1097 | + | link = clear(r['href']) |
| 1098 | + | add_link() |
| 1099 | + | |
| 1100 | + | if engine_str == "torch1": |
| 1101 | + | for r in data_obj.select("dl > dt > a"): |
| 1102 | + | name = clear(r.get_text()) |
| 1103 | + | link = clear(r['href']) |
| 1104 | + | add_link() |
| 1105 | + | |
| 1106 | + | if engine_str == "tordex": |
| 1107 | + | for r in data_obj.select('.container h5 a'): |
| 1108 | + | name = clear(r.get_text()) |
| 1109 | + | link = clear(r['href']) |
| 1110 | + | add_link() |
| 1111 | + | |
| 1112 | + | if engine_str == "torgle": |
| 1113 | + | for i in data_obj.find_all('ul', attrs={"id": "page"}): |
| 1114 | + | for j in i.find_all('a'): |
| 1115 | + | if str(j.get_text()).startswith("http"): |
| 1116 | + | link = clear(j.get_text()) |
| 1117 | + | else: |
| 1118 | + | name = clear(j.get_text()) |
| 1119 | + | add_link() |
| 1120 | + | |
| 1121 | + | if engine_str == "torgle1": |
| 1122 | + | for r in data_obj.select("#results a.title"): |
| 1123 | + | name = clear(r.get_text()) |
| 1124 | + | link = clear(r['href']) |
| 1125 | + | add_link() |
| 1126 | + | |
| 1127 | + | if engine_str == "tormax": |
| 1128 | + | for r in data_obj.select("#search-results article a.title"): |
| 1129 | + | name = clear(r.get_text()) |
| 1130 | + | link = clear(r.find_next_sibling('div', {'class': 'url'}).get_text()) |
| 1131 | + | add_link() |
| 1132 | + | |
| 1133 | + | if engine_str == "torsearchengine": |
| 1134 | + | for i in data_obj.find_all('h3', attrs={'class': 'title text-truncate'}): |
| 1135 | + | name = clear(i.find('a').get_text()) |
| 1136 | + | link = i.find('a')['data-uri'] |
| 1137 | + | add_link() |
| 1138 | + | |
| 1139 | + | if engine_str == "torsearchengine1": |
| 1140 | + | for r in data_obj.find_all('span', {'style': 'font-size:1.2em;font-weight:bold;color:#1a0dab'}): |
| 1141 | + | name = clear(r.get_text()) |
| 1142 | + | link = r.find_next_sibling('a')['href'] |
| 1143 | + | add_link() |
| 1144 | + | |
| 1145 | + | if engine_str == "visitor": |
| 1146 | + | for r in data_obj.select(".hs_site h3 a"): |
| 1147 | + | name = clear(r.get_text()) |
| 1148 | + | link = clear(r['href']) |
| 1149 | + | add_link() |
| 1150 | + | |
| 1151 | + | if args.continuous_write and not csv_file.closed: |
| 1152 | + | csv_file.close() |
| 1153 | + | |
| 1154 | + | return found_links |
| 1155 | + | |
| 1156 | + | |
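| | + | # Pool worker: unpack "engine:searchstring", call the matching engine function, and report connection errors.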
| 1157 | + | def run_method(method_name_and_argument): |
| 1158 | + | # Split only on the first ':' so that search strings containing colons are kept intact
| 1159 | + | method_name, argument = method_name_and_argument.split(':', 1)
| 1160 | + | ret = [] |
| 1161 | + | try: |
| 1162 | + | ret = globals()[method_name](argument) |
| 1163 | + | except ConnectionError: |
| 1164 | + | print("Error: unable to connect") |
| 1165 | + | except OSError: |
| 1166 | + | print("Error: unable to connect") |
| 1167 | + | except ProtocolError: |
| 1168 | + | print("Error: unable to connect") |
| 1169 | + | return ret |
| 1170 | + | |
| 1171 | + | |
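| | + | # Main routine: build the output filename, queue one task per engine, run them in a process pool,
| | + | # then write the CSV (unless written continuously) and print per-engine statistics.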
| 1172 | + | def scrape(): |
| 1173 | + | global filename |
| 1174 | + | |
| 1175 | + | start_time = datetime.now() |
| 1176 | + | |
| 1177 | + | # Building the filename |
| 1178 | + | filename = str(filename).replace("$DATE", start_time.strftime("%Y%m%d%H%M%S")) |
| 1179 | + | search = str(args.search).replace(" ", "") |
| 1180 | + | if len(search) > 10: |
| 1181 | + | search = search[0:9] |
| 1182 | + | filename = str(filename).replace("$SEARCH", search) |
| 1183 | + | |
| 1184 | + | func_args = [] |
| 1185 | + | stats_dict = {} |
| 1186 | + | if args.engines and len(args.engines) > 0: |
| 1187 | + | eng = args.engines[0] |
| 1188 | + | for e in eng: |
| 1189 | + | if e not in supported_engines:
| 1190 | + | print("Error: search engine {} not in the list of supported engines".format(e))
| 1191 | + | continue
| 1192 | + | if not (args.exclude and len(args.exclude) > 0 and e in args.exclude[0]):
| 1193 | + | func_args.append("{}:{}".format(e, args.search))
| 1194 | + | stats_dict[e] = 0
| 1195 | + | else: |
| 1196 | + | for e in supported_engines.keys(): |
| 1197 | + | if not (args.exclude and len(args.exclude) > 0 and e in args.exclude[0]): |
| 1198 | + | func_args.append("{}:{}".format(e, args.search)) |
| 1199 | + | stats_dict[e] = 0 |
| 1200 | + | |
| 1201 | + | # Fan the engine queries out over a multiprocessing pool
| 1202 | + | units = min((cpu_count() - 1), len(func_args)) |
| 1203 | + | if args.mp_units and args.mp_units > 0: |
| 1204 | + | units = min(args.mp_units, len(func_args)) |
| 1205 | + | print("search.py started with {} processing units...".format(units)) |
| 1206 | + | freeze_support() |
| 1207 | + | |
| 1208 | + | results = []
| 1209 | + | with Pool(units, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as p: |
| 1210 | + | results_map = p.map(run_method, func_args) |
| 1211 | + | results = reduce(lambda a, b: a + b if b is not None else a, results_map) |
76 | 1212 | | |
77 | | - | result['onionland'] = [] |
78 | | - | for n in tqdm(range(1,400),desc="OnionLand"): |
79 | | - | onionland = "http://3bbaaaccczcbdddz.onion/search?q={}&page={}".format(args.search,n) |
80 | | - | #print(urlTorch) |
81 | | - | req = requests.get(onionland,proxies=proxies) |
82 | | - | if(req.status_code==200): |
83 | | - | soup = BeautifulSoup(req.text, 'html.parser') |
84 | | - | for i in soup.findAll('div',attrs={"class":"result-block"}): |
85 | | - | if('''<span class="label-ad">Ad</span>''' not in i): |
86 | | - | #print({"name":i.find('div',attrs={'class':"title"}).get_text(),"link":clear(i.find('div',attrs={'class':"link"}).get_text())}) |
87 | | - | result['onionland'].append({"name":i.find('div',attrs={'class':"title"}).get_text(),"link":clear(i.find('div',attrs={'class':"link"}).get_text())}) |
88 | | - | else: |
89 | | - | break |
| 1213 | + | stop_time = datetime.now() |
| 1214 | + | |
| 1215 | + | if not args.continuous_write: |
| 1216 | + | with open(filename, 'w', newline='') as csv_file: |
| 1217 | + | csv_writer = csv.writer(csv_file, delimiter=field_delim, quoting=csv.QUOTE_ALL) |
| 1218 | + | for r in results: |
| 1219 | + | write_to_csv(csv_writer, r) |
| 1220 | + | |
| 1221 | + | total = 0 |
| 1222 | + | print("\nReport:") |
| 1223 | + | print(" Execution time: %s" % (stop_time - start_time))
| 1224 | + | print(" Results per engine:") |
| 1225 | + | for r in results: |
| 1226 | + | stats_dict[r['engine']] += 1 |
| 1227 | + | for s in stats_dict: |
| 1228 | + | n = stats_dict[s] |
| 1229 | + | print(" {}: {}".format(s, str(n))) |
| 1230 | + | total += n |
| 1231 | + | print(" Total: {} links written to {}".format(str(total), filename)) |
90 | 1232 | | |
91 | | - | print("Ahmia : " + str(len(result['ahmia']))) |
92 | | - | print("Torch : "+str(len(result['urlTorch']))) |
93 | | - | print("Darksearch io : "+str(len(result['darksearch']))) |
94 | | - | print("Onionland : "+str(len(result['onionland']))) |
95 | | - | print("Total of {} links !\nExported to {}".format(str(len(result['ahmia'])+len(result['urlTorch'])+len(result['darksearch'])+len(result['onionland'])),args.output)) |
96 | | - | f= open(args.output,"w+") |
97 | | - | for i in result['urlTorch']: |
98 | | - | f.write("name : {} link: {}\n".format(clearn(i["name"]),i["link"])) |
99 | | - | for i in result['onionland']: |
100 | | - | f.write("name: {} link : {}\n".format(clearn(i["name"]),i["link"])) |
101 | | - | for i in result['ahmia']: |
102 | | - | f.write("name : {} link : {}\n".format(clearn(i["name"]),i["link"])) |
103 | | - | for i in result['darksearch']: |
104 | | - | f.write("name : {} link : {}\n".format(clearn(i["name"]),i["link"])) |
105 | 1233 | | |
106 | | - | f.close() |
107 | | - | scrape() |
| 1234 | + | if __name__ == "__main__": |
| 1235 | + | scrape() |
108 | 1236 | | |