| skipped 2 lines |
3 | 3 | | [![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/) |
4 | 4 | | |
5 | 5 | | OnionSearch is a Python3 script that scrapes urls on different ".onion" search engines. |
| 6 | + | |
6 | 7 | | In 30 minutes you get thousands of unique urls. |
7 | 8 | | |
8 | 9 | | ## 💡 Prerequisite |
| skipped 37 lines |
46 | 47 | | |
47 | 48 | | ``` |
48 | 49 | | usage: search.py [-h] [--proxy PROXY] [--output OUTPUT] [--limit LIMIT] |
49 | | - | [--barmode BARMODE] [--engines [ENGINES [ENGINES ...]]] |
50 | | - | [--exclude [EXCLUDE [EXCLUDE ...]]] |
51 | | - | search |
| 50 | + | [--barmode BARMODE] [--engines [ENGINES [ENGINES ...]]] |
| 51 | + | [--exclude [EXCLUDE [EXCLUDE ...]]] |
| 52 | + | search |
52 | 53 | | |
53 | 54 | | positional arguments: |
54 | 55 | | search The search string or phrase |
| skipped 1 lines |
56 | 57 | | optional arguments: |
57 | 58 | | -h, --help show this help message and exit |
58 | 59 | | --proxy PROXY Set Tor proxy (default: 127.0.0.1:9050) |
59 | | - | --output OUTPUT Output File (default: output.txt) |
| 60 | + | --output OUTPUT Output File (default: output_$SEARCH_$DATE.txt), where |
| 61 | + | $SEARCH is replaced by the first chars of the search |
| 62 | + | string and $DATE is replaced by the datetime |
60 | 63 | | --limit LIMIT Set a max number of pages per engine to load |
61 | 64 | | --barmode BARMODE Can be 'fixed' (default) or 'unknown' |
62 | 65 | | --engines [ENGINES [ENGINES ...]] |
63 | 66 | | Engines to request (default: full list) |
64 | 67 | | --exclude [EXCLUDE [EXCLUDE ...]] |
65 | 68 | | Engines to exclude (default: none) |
| 69 | + | |
| 70 | + | [...] |
66 | 71 | | ``` |
67 | 72 | | |
68 | 73 | | ### Examples |
69 | 74 | | |
70 | | - | To request the string "computer" on all the engines to default file: |
| 75 | + | To request all the engines for the word "computer": |
71 | 76 | | ``` |
72 | 77 | | python3 search.py "computer" |
73 | 78 | | ``` |
74 | 79 | | |
75 | | - | To request all the engines but "Ahmia" and "Candle": |
| 80 | + | To request all the engines except "Ahmia" and "Candle" for the word "computer": |
76 | 81 | | ``` |
77 | | - | python3 search.py "computer" --proxy 127.0.0.1:1337 --exclude ahmia candle |
| 82 | + | python3 search.py "computer" --exclude ahmia candle |
78 | 83 | | ``` |
79 | 84 | | |
80 | | - | To request only "Tor66", "DeepLink" and "Phobos": |
| 85 | + | To request only "Tor66", "DeepLink" and "Phobos" for the word "computer": |
81 | 86 | | ``` |
82 | | - | python3 search.py "computer" --proxy 127.0.0.1:1337 --engines tor66 deeplink phobos |
| 87 | + | python3 search.py "computer" --engines tor66 deeplink phobos |
83 | 88 | | ``` |
84 | 89 | | |
85 | | - | The same but limiting the number of page per engine to load to 3: |
| 90 | + | Same as the previous example, but limiting the number of pages loaded per engine to 3: |
86 | 91 | | ``` |
87 | | - | python3 search.py "computer" --proxy 127.0.0.1:1337 --engines tor66 deeplink phobos --limit 3 |
| 92 | + | python3 search.py "computer" --engines tor66 deeplink phobos --limit 3 |
88 | 93 | | ``` |
89 | 94 | | |
90 | 95 | | Please kindly note that the list of supported engines (and their keys) is given in the script help (-h). |
91 | 96 | | |
| 97 | + | |
92 | 98 | | ### Output |
93 | 99 | | |
94 | 100 | | The file written at the end of the process will be a csv containing the following columns: |
| skipped 1 lines |
96 | 102 | | "engine","name of the link","url" |
97 | 103 | | ``` |
98 | 104 | | |
99 | | - | The name and url strings are sanitized as much as possible, but there might still be some problems. |
| 105 | + | The filename will be set by default to `output_$DATE_$SEARCH.txt`, where $DATE represents the current datetime and $SEARCH the first |
| 106 | + | characters of the search string. |
| 107 | + | |
| 108 | + | You can modify this filename by using `--output` when running the script, for instance: |
| 109 | + | ``` |
| 110 | + | python3 search.py "computer" --output "\$DATE.csv" |
| 111 | + | python3 search.py "computer" --output output.txt |
| 112 | + | python3 search.py "computer" --output "\$DATE_\$SEARCH.csv" |
| 113 | + | ... |
| 114 | + | ``` |
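| 115 | + | (Note that in most shells the dollar character must be escaped inside double quotes, as above; otherwise the shell tries to expand `$DATE` and `$SEARCH` as its own variables.) |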
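
The placeholder substitution described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function name `expand_output_name`, the `max_chars` cut-off of 10 characters, and the `%Y%m%d%H%M%S` date format are all assumptions made for illustration.

```python
from datetime import datetime


def expand_output_name(template, search, now, max_chars=10):
    """Fill $DATE and $SEARCH in a filename template (hypothetical helper)."""
    # Assumed date format; the real script may format the datetime differently.
    date_part = now.strftime("%Y%m%d%H%M%S")
    # Assumed cut-off; the README only says "the first characters".
    search_part = search[:max_chars]
    return template.replace("$DATE", date_part).replace("$SEARCH", search_part)


# A fixed datetime keeps the example reproducible.
name = expand_output_name("$DATE_$SEARCH.csv", "computer", datetime(2020, 1, 2, 3, 4, 5))
print(name)
```

A template without placeholders, such as `output.txt`, passes through this sketch unchanged.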
| 116 | + | |
| 117 | + | In the csv file produced, the name and url strings are sanitized as much as possible, but there might still be some problems. |
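
Since some rows may still be malformed, a reader of the output file may want to skip anything that does not have exactly three columns. The sketch below uses Python's standard `csv` module. It assumes the file has no header row and that fields are double-quoted as shown above. `load_results` is a hypothetical helper, not part of OnionSearch.

```python
import csv
import io


def load_results(handle):
    """Return (engine, name, url) tuples from an open CSV handle."""
    rows = []
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip rows that do not match the three-column layout
        rows.append(tuple(row))
    return rows


# In-memory sample standing in for an output file; the values are made up.
sample = io.StringIO(
    '"ahmia","Example name","http://example.onion/"\n'
    '"broken row"\n'
)
results = load_results(sample)
print(results)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `load_results`. The encoding here is an assumption.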
100 | 118 | | |
101 | 119 | | |
102 | 120 | | ## 📝 License |
| skipped 4 lines |