https://medium.com/@lmeyer./get-an-error-free-e-commerce-web-site-using-sentry-b6061264efc8...
Lire la suite
Webscraping d'un site protégé par mot de passe avec Scrapy
distrispider.py
import scrapy
from scrapy.selector import Selector
from distrifil.items import DistrifilItem
from pprint import pprint
class DistrispiderSpider(scrapy.Spider):
    """Log into distrifil.com, then crawl the catalogue:
    categories -> sub-categories -> product pages, yielding one
    DistrifilItem per product page."""

    name = 'distrispider'
    allowed_domains = ['distrifil.com']
    start_urls = ['https://distrifil.com/fr/login']

    def parse(self, response):
        """Submit the login form found on the start URL."""
        data = {
            'login': 'login_a_mofifier',
            'code_client': 'code_a_mofifier',
            'pass': 'password_a_mofifier',
        }
        yield scrapy.FormRequest.from_response(
            response,
            formdata=data,
            callback=self.after_login,
            method="POST",
        )

    def after_login(self, response):
        """Once authenticated, navigate to the catalogue page."""
        yield scrapy.Request(
            url="https://distrifil.com/fr/catalogue",
            callback=self.parse_cats,
        )

    def parse_cats(self, response):
        """Extract category URLs embedded in div onclick attributes.

        BUG FIX: the original used `.extract() or [None][0]`, which
        evaluates to `.extract() or None` ([None][0] is just None), so
        iterating the result raised TypeError when no category matched.
        A plain `.extract()` returns [] and iterates safely.
        """
        onclicks = response.xpath(
            '//div[@class="col-xs-6 col-sm-6 col-md-5columns vignette-categorie"]/@onclick'
        ).extract()
        for onclick in onclicks:
            # onclick text looks like: document.location.href='<url>'
            url = onclick.replace('document.location.href=', '').replace("'", '')
            yield scrapy.Request(url=url, callback=self.parse_sous_cats)

    def parse_sous_cats(self, response):
        """Follow each sub-category link to its product listing.

        Same `or [None][0]` fix as parse_cats: an empty extract() result
        must yield no requests, not crash the spider.
        """
        url_cats = response.xpath('//a[@class="lien-categorie"]/@href').extract()
        for url_cat in url_cats:
            yield scrapy.Request(url=url_cat, callback=self.parse_product)

    def parse_product(self, response):
        """Build a DistrifilItem from a product page.

        BUG FIX: `.extract() or [None][0]` stored the *entire list* of
        XPath matches (or None) in each field.  `(... or [None])[0]`
        yields the first match or None — the evident intent.
        """
        sel = Selector(response)
        item = DistrifilItem()
        item['product_name'] = (sel.xpath(
            '//div[@class="produit_designation"]/text()').extract() or [None])[0]
        item['product_reference'] = (sel.xpath(
            '//div[@class="panel-body"]/div/strong/text()').extract() or [None])[0]
        item['product_price'] = (sel.xpath(
            '//div[@class="col-xs-5 prix btn btn-default masque_tarif"]/text()'
        ).extract() or [None])[0]
        item['product_status'] = (sel.xpath(
            '//div[@class="not-sold-available"]/span[@class="not-sold-available-text"]/text()'
        ).extract() or [None])[0]
        item['product_url'] = (sel.xpath(
            '//img/@data-remote').extract() or [None])[0]
        return item
Posted in:
Web Scraping
Leave a comment