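  • fixed issue where the Telegram scraper terminated early because some pages didn't have a next-page link (added a reasonable default: when the link is missing, request the current URL's before value minus 20, and stop once that value would drop to 20 or below)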

  • Tristan Lee committed 2 years ago
    1e4e0c27
    1 parent babcddda
    snscrape/modules/telegram.py
```diff
@@ -214,8 +214,13 @@
 yield from self._soup_to_items(soup, r.url)
 pageLink = soup.find('a', attrs = {'class': 'tme_messages_more', 'data-before': True})
 if not pageLink:
-    break
+    nextPostIndex = int(nextPageUrl.split('=')[-1]) - 20
+    if nextPostIndex > 20:
+        pageLink = {'href': nextPageUrl.split('=')[0] + f'={nextPostIndex}'}
+    else:
+        break
 nextPageUrl = urllib.parse.urljoin(r.url, pageLink['href'])
+print(f'nextPageUrl: {nextPageUrl}')
 r = self._get(nextPageUrl, headers = self._headers, responseOkCallback = telegramResponseOkCallback)
 if r.status_code != 200:
     raise snscrape.base.ScraperException(f'Got status code {r.status_code}')
```
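The fallback in this commit builds the next URL by splitting the previous page URL on '='. That only works if `before` is the last (and only) `=`-bearing query parameter. The sketch below shows the same idea using `urllib.parse` to rewrite the `before` parameter directly. It keeps the commit's step of 20 and its stop condition (continue only while the new value is greater than 20).

Everything named here is hypothetical and not part of snscrape: `fallback_next_page_url`, `iter_page_urls`, `find_more_href`, and `PAGE_STEP`. The sketch also assumes page URLs of the form `https://t.me/s/<channel>?before=<id>`. `find_more_href` stands in for fetching a page and reading the `tme_messages_more` link's `href`, so the loop can be exercised without network access.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit, urlunsplit

# Step size copied from the commit; an assumption, not a documented Telegram value.
PAGE_STEP = 20


def fallback_next_page_url(current_url: str, step: int = PAGE_STEP) -> Optional[str]:
    """Hypothetical helper: guess the next (older) page by lowering `before`.

    Returns None when the URL has no numeric `before` parameter (for example,
    a channel's first page) or when the lowered value would be <= step,
    mirroring the commit's `nextPostIndex > 20` condition.
    """
    parts = urlsplit(current_url)
    query = parse_qs(parts.query)
    values = query.get('before')
    if not values or not values[0].isdigit():
        return None
    next_before = int(values[0]) - step
    if next_before <= step:
        return None
    query['before'] = [str(next_before)]
    # Rebuild the URL with only the `before` value changed; other params are kept.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def iter_page_urls(start_url: str,
                   find_more_href: Callable[[str], Optional[str]]) -> Iterator[str]:
    """Hypothetical loop: follow the 'more' link if present, else the fallback."""
    url: Optional[str] = start_url
    while url is not None:
        yield url
        href = find_more_href(url)
        # urljoin resolves a relative href against the current page URL.
        url = urljoin(url, href) if href else fallback_next_page_url(url)
```

For example, if no page ever has a "more" link, starting from `https://t.me/s/example?before=100` visits `before=100`, `80`, `60`, and `40`, then stops, because 40 - 20 = 20 is not greater than 20.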