How to store crawled data from Scrapy to FTP as csv?


My Scrapy settings.py:

from datetime import datetime
file_name = datetime.today().strftime('%Y-%m-%d_%H%M_')
save_name = file_name + 'Mobile_Nshopping'
FEED_URI = 'ftp://myusername:[email protected]/uploads/%(save_name)s.csv'

When I run my spider with scrapy crawl my_project_name, I get the error below. Do I have to create a pipeline?

\scrapy\extensions\feedexport.py:247: ScrapyDeprecationWarning: The `FEED_URI` and `FEED_FORMAT` settings have been deprecated in favor of the `FEEDS` setting. Please see the `FEEDS` setting docs for more details
 exporter = cls(crawler)
Traceback (most recent call last):
 File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 194, in _run_module_as_main
   return _run_code(code, main_globals, None,
 File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
   exec(code, run_globals)
 File "C:\Users\viren\AppData\Local\Programs\Python\Python38\Scripts\scrapy.exe\__main__.py", line 7, in <module>
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 145, in execute
   _run_print_help(parser, _run_command, cmd, args, opts)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
   func(*a, **kw)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 153, in _run_command
   cmd.run(args, opts)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\commands\crawl.py", line 22, in run
   crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 191, in crawl
   crawler = self.create_crawler(crawler_or_spidercls)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 224, in create_crawler
   return self._create_crawler(crawler_or_spidercls)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 229, in _create_crawler
   return Crawler(spidercls, self.settings)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 72, in __init__
   self.extensions = ExtensionManager.from_crawler(self)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
   return cls.from_settings(crawler.settings, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
   mw = create_instance(mwcls, settings, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\utils\misc.py", line 167, in create_instance
   instance = objcls.from_crawler(crawler, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 247, in from_crawler
   exporter = cls(crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 282, in __init__
   if not self._storage_supported(uri, feed_options):
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 427, in _storage_supported
   self._get_storage(uri, feed_options)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 458, in _get_storage
   instance = build_instance(feedcls.from_crawler, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 455, in build_instance
   return build_storage(builder, uri, feed_options=feed_options, preargs=preargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
   return builder(*preargs, uri, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 201, in from_crawler
   return build_storage(
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
   return builder(*preargs, uri, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 192, in __init__
   self.port = int(u.port or '21')
 File "c:\users\viren\appdata\local\programs\python\python38\lib\urllib\parse.py", line 174, in port
raise ValueError(message) from None
ValueError: Port could not be cast to integer value as 'Edh=)9sd'

I don't know how to store the CSV on FTP. Is the error coming because my password is being read as an integer? Is there anything I forgot to write?

There are 2 answers below.

Answer 1:

Do I have to create a pipeline?

Yes, you probably should create a pipeline. As shown in the Scrapy Architecture Diagram, the basic concept is this: requests are sent, responses come back and are processed by the spider, and finally the pipeline does something with the items the spider returns. In your case, you could create a pipeline that saves the data in a CSV file and uploads it to an FTP server. See Scrapy's Item Pipeline documentation for more information.
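A minimal sketch of such a pipeline, assuming the host, credentials, and remote path are placeholders you would replace with your own. It buffers rows as CSV in memory and uploads the result with the standard library's ftplib when the spider closes:

```python
import csv
import io
from ftplib import FTP


class CsvFtpPipeline:
    """Collect items into an in-memory CSV, then upload it over FTP."""

    def open_spider(self, spider):
        self.buffer = io.StringIO()
        self.writer = None

    def process_item(self, item, spider):
        row = dict(item)
        if self.writer is None:
            # Use the first item's fields as the CSV header.
            self.writer = csv.DictWriter(self.buffer, fieldnames=row.keys())
            self.writer.writeheader()
        self.writer.writerow(row)
        return item

    def close_spider(self, spider):
        data = io.BytesIO(self.buffer.getvalue().encode('utf-8'))
        with FTP('ftp.example.com') as ftp:          # placeholder host
            ftp.login('myusername', 'mypassword')    # placeholder credentials
            ftp.storbinary('STOR uploads/output.csv', data)
```

Enable it in settings.py with something like ITEM_PIPELINES = {'myproject.pipelines.CsvFtpPipeline': 300} (adjust the module path to your project).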

I don't know how to store the CSV on FTP. Is the error coming because my password is being read as an integer? Is there anything I forgot to write?

I believe this is due to the deprecation warning shown at the top of the output you provided: ScrapyDeprecationWarning: The FEED_URI and FEED_FORMAT settings have been deprecated in favor of the FEEDS setting. Please see the FEEDS setting docs for more details.

Try replacing FEED_URI with FEEDS; see the Scrapy documentation on FEEDS.
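As a sketch, settings.py could look like the following with FEEDS (Scrapy 2.1+). The host name and password here are placeholders, and the password is percent-encoded so characters like '=', ')' or '@' cannot break URI parsing:

```python
# settings.py -- sketch using the FEEDS setting instead of FEED_URI.
from datetime import datetime
from urllib.parse import quote

file_name = datetime.today().strftime('%Y-%m-%d_%H%M_')
save_name = file_name + 'Mobile_Nshopping'

# Percent-encode the password (placeholder value shown here).
password = quote('p@ss)word', safe='')

FEEDS = {
    f'ftp://myusername:{password}@ftp.example.com/uploads/{save_name}.csv': {
        'format': 'csv',
    },
}
```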

Answer 2:

You need to specify the port as well. You can set it in your settings.

See also the class definition from the Scrapy source:

from urllib.parse import urlparse

class FTPFilesStore:

    FTP_USERNAME = None
    FTP_PASSWORD = None
    USE_ACTIVE_MODE = None

    def __init__(self, uri):
        if not uri.startswith("ftp://"):
            raise ValueError(f"Incorrect URI scheme in {uri}, expected 'ftp'")
        u = urlparse(uri)
        self.port = u.port
        self.host = u.hostname
        self.port = int(u.port or 21)  # falls back to port 21 when none is given
        self.username = u.username or self.FTP_USERNAME
        self.password = u.password or self.FTP_PASSWORD
        self.basedir = u.path.rstrip('/')
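The "Port could not be cast to integer" error from the question can be reproduced with urlparse alone. This is a hypothetical reconstruction: if the '@' separating the credentials from the host is missing, urlparse reads everything after the last ':' as the port. Percent-encoding the password (the host below is a placeholder) keeps the URI parseable:

```python
from urllib.parse import urlparse, quote

# With no '@' before the host, urlparse treats 'Edh=)9sd' as the port.
broken = urlparse('ftp://myusername:Edh=)9sd/uploads/out.csv')
try:
    broken.port
except ValueError as e:
    print(e)  # Port could not be cast to integer value as 'Edh=)9sd'

# Percent-encoding the password and keeping the '@' before the host
# lets urlparse succeed; Scrapy then falls back to port 21.
password = quote('Edh=)9sd', safe='')
ok = urlparse(f'ftp://myusername:{password}@ftp.example.com/uploads/out.csv')
print(ok.hostname, ok.port or 21)  # ftp.example.com 21
```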