How to store crawled data from Scrapy to FTP as csv?

Question

Asked by Viren Ramani on 28 April 2021 at 12:46 · 259 views

My Scrapy settings.py:

from datetime import datetime
file_name = datetime.today().strftime('%Y-%m-%d_%H%M_')
save_name = file_name + 'Mobile_Nshopping'
FEED_URI = 'ftp://myusername:[email protected]/uploads/%(save_name)s.csv'

When I run my spider with scrapy crawl my_project_name, I get the error below. Do I have to create a pipeline?

\scrapy\extensions\feedexport.py:247: ScrapyDeprecationWarning: The `FEED_URI` and `FEED_FORMAT` settings have been deprecated in favor of the `FEEDS` setting. Please see the `FEEDS` setting docs for more details
 exporter = cls(crawler)
Traceback (most recent call last):
 File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 194, in _run_module_as_main
   return _run_code(code, main_globals, None,
 File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
   exec(code, run_globals)
 File "C:\Users\viren\AppData\Local\Programs\Python\Python38\Scripts\scrapy.exe\__main__.py", line 7, in <module>
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 145, in execute
   _run_print_help(parser, _run_command, cmd, args, opts)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
   func(*a, **kw)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 153, in _run_command
   cmd.run(args, opts)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\commands\crawl.py", line 22, in run
   crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 191, in crawl
   crawler = self.create_crawler(crawler_or_spidercls)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 224, in create_crawler
   return self._create_crawler(crawler_or_spidercls)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 229, in _create_crawler
   return Crawler(spidercls, self.settings)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 72, in __init__
   self.extensions = ExtensionManager.from_crawler(self)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
   return cls.from_settings(crawler.settings, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
   mw = create_instance(mwcls, settings, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\utils\misc.py", line 167, in create_instance
   instance = objcls.from_crawler(crawler, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 247, in from_crawler
   exporter = cls(crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 282, in __init__
   if not self._storage_supported(uri, feed_options):
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 427, in _storage_supported
   self._get_storage(uri, feed_options)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 458, in _get_storage
   instance = build_instance(feedcls.from_crawler, crawler)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 455, in build_instance
   return build_storage(builder, uri, feed_options=feed_options, preargs=preargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
   return builder(*preargs, uri, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 201, in from_crawler
   return build_storage(
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
   return builder(*preargs, uri, *args, **kwargs)
 File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 192, in __init__
   self.port = int(u.port or '21')
 File "c:\users\viren\appdata\local\programs\python\python38\lib\urllib\parse.py", line 174, in port
    raise ValueError(message) from None
ValueError: Port could not be cast to integer value as 'Edh=)9sd'

I don't know how to store a CSV file on FTP. Is the error caused by my password being parsed as an int? Is there anything I forgot to write?

**zmike** · Answer 1 · 2021-04-29T05:17:59.493000

> Can I have to create a pipeline?

Yes, you probably should create a pipeline. As shown in the Scrapy architecture diagram, the basic flow is this: requests are sent, responses come back and are processed by the spider, and finally the pipeline does something with the items the spider returns. In your case, you could create a pipeline that saves the data in a CSV file and uploads it to an FTP server. See Scrapy's Item Pipeline documentation for more information.
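A pipeline along those lines could be sketched as follows. This is only a minimal illustration, not an official Scrapy recipe: the class name `CsvFtpPipeline`, the host, the credentials, and the remote path are all hypothetical placeholders, and it assumes every item has the same fields as the first one.

```python
import csv
import io
from ftplib import FTP


class CsvFtpPipeline:
    """Collect items into an in-memory CSV and upload it when the spider closes."""

    # Hypothetical connection details; replace them with your own.
    host = "ftp.example.com"
    port = 21
    username = "myusername"
    password = "mypassword"
    remote_path = "uploads/items.csv"

    def open_spider(self, spider):
        self.buffer = io.StringIO()
        self.writer = None  # created on the first item, once field names are known

    def process_item(self, item, spider):
        row = dict(item)
        if self.writer is None:
            self.writer = csv.DictWriter(self.buffer, fieldnames=list(row))
            self.writer.writeheader()
        self.writer.writerow(row)
        return item

    def csv_bytes(self):
        """Return the CSV collected so far as UTF-8 bytes."""
        return self.buffer.getvalue().encode("utf-8")

    def close_spider(self, spider):
        # Upload the finished CSV in one go over plain FTP.
        with FTP() as ftp:
            ftp.connect(self.host, self.port)
            ftp.login(self.username, self.password)
            ftp.storbinary(f"STOR {self.remote_path}", io.BytesIO(self.csv_bytes()))
```

To try it, the class would be registered in `ITEM_PIPELINES` in settings.py under whatever module path it lives in (for example a `pipelines.py` in your project). Scrapy's built-in feed exports can also write to `ftp://` URIs directly, so a custom pipeline is one option and not a requirement.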

> I don't know how to store CSV into FTP. error is coming because my password is int? Is there anything I forget to write?

I believe this is due to the deprecation warning shown at the top of the output you provided: `ScrapyDeprecationWarning: The FEED_URI and FEED_FORMAT settings have been deprecated in favor of the FEEDS setting. Please see the FEEDS setting docs for more details.`

Try replacing FEED_URI with FEEDS; see the Scrapy documentation on FEEDS.
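As a rough sketch of what that replacement might look like in settings.py: the host `ftp.example.com` and the password `mypassword` below are placeholders, not values from the question. Scrapy fills `%(name)s` placeholders in feed URIs from spider attributes, so a variable defined in settings.py is inserted here with an f-string instead.

```python
from datetime import datetime

# Timestamped file name, built the same way as in the question.
save_name = datetime.today().strftime("%Y-%m-%d_%H%M_") + "Mobile_Nshopping"

# Placeholder host and credentials; substitute your own.
FEEDS = {
    f"ftp://myusername:mypassword@ftp.example.com/uploads/{save_name}.csv": {
        "format": "csv",
    },
}
```

Each key of `FEEDS` is a feed URI and each value holds the options for that feed, so `FEED_FORMAT` is no longer needed once `"format"` is set here.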

**Dr Pi** · Answer 2 · 2021-04-30T08:09:01.890000

You need to specify the port as well.

You can specify this in settings.

See also the FTPFilesStore class definition from the Scrapy docs:

from urllib.parse import urlparse


class FTPFilesStore:

    FTP_USERNAME = None
    FTP_PASSWORD = None
    USE_ACTIVE_MODE = None

    def __init__(self, uri):
        if not uri.startswith("ftp://"):
            raise ValueError(f"Incorrect URI scheme in {uri}, expected 'ftp'")
        u = urlparse(uri)
        # u.port raises ValueError if the port part of the URI is not numeric
        self.port = u.port
        self.host = u.hostname
        # fall back to the default FTP port when the URI has none
        self.port = int(u.port or 21)
        self.username = u.username or self.FTP_USERNAME
        self.password = u.password or self.FTP_PASSWORD
        self.basedir = u.path.rstrip('/')
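The traceback in the question shows `urlparse` treating `'Edh=)9sd'` as the port. One way that can happen is when the password contains a reserved character such as `/`, `?` or `#`, which cuts the host part of the URI short. The sketch below reproduces that with a made-up password and shows percent-encoding as a possible workaround; `build_ftp_uri`, the password, and the host are all hypothetical.

```python
from urllib.parse import quote, unquote, urlparse

# Hypothetical credentials; the '#' stands in for whatever reserved
# character ('/', '?' or '#') a real password might contain.
USERNAME = "myusername"
PASSWORD = "Edh=)9sd#tail"
HOST = "ftp.example.com"


def build_ftp_uri(username, password, host, path):
    """Return an ftp:// URI with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"ftp://{user}:{pwd}@{host}/{path.lstrip('/')}"


raw_uri = f"ftp://{USERNAME}:{PASSWORD}@{HOST}/uploads/items.csv"
safe_uri = build_ftp_uri(USERNAME, PASSWORD, HOST, "uploads/items.csv")

# The unencoded URI is cut at '#', so the text after the first ':' is read as a port.
try:
    urlparse(raw_uri).port
except ValueError as exc:
    print("raw URI:", exc)

# The encoded URI parses cleanly; the password can be decoded again with unquote().
parsed = urlparse(safe_uri)
print("encoded URI host:", parsed.hostname, "port:", parsed.port)
print("decoded password matches:", unquote(parsed.password) == PASSWORD)
```

Two caveats to verify against your Scrapy version: whether the FTP storage decodes the password before logging in, and whether a literal `%` in a feed URI has to be doubled as `%%`, since Scrapy also applies `%(name)s`-style substitution to feed URIs.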
