I have a program using the spreadsheet gem to create a CSV file; I have not been able to find the way to configure the functionality that I need.
This is what I would like the gem to do: The model number and additional_image
field should be "in sync", that is, each additional image written to the spreadsheet doc should be a new line and should not be wrapped.
Here are some snippets of the desired output in contrast with the current. These fields are defined by XPath objects that are screen scraped using another gem. The program won't know for sure how many objects it will encounter in the additional image field but due to business logic the number of objects in the additional image field should mirror the number of model number objects that are written to the spreadsheet.
model
168868837a
168868837a
168868837a
168868837a
168868837a
168868837a
additional_image
1688688371.jpg
1688688372.jpg
1688688373.jpg
1688688374.jpg
1688688375.jpg
1688688376.jpg
This is the current code:
require "capybara/dsl"
require "spreadsheet"
require "fileutils"
require "open-uri"
LOCAL_DIR = 'data-hold/images'
FileUtils.makedirs(LOCAL_DIR) unless File.exists?LOCAL_DIR
Capybara.run_server = false
Capybara.default_driver = :selenium
Capybara.default_selector = :xpath
Spreadsheet.client_encoding = 'UTF-8'
class Tomtop
include Capybara::DSL
def initialize
@excel = Spreadsheet::Workbook.new
@work_list = @excel.create_worksheet
@row = 0
end
def go
visit_main_link
end
def retryable(options = {}, &block)
opts = { :tries => 1, :on => Exception }.merge(options)
retry_exception, retries = opts[:on], opts[:tries]
begin
return yield
rescue retry_exception
retry if (retries -= 1) > 0
end
yield
end
def visit_main_link
retryable(:tries => 1, :on => OpenURI::HTTPError) do
visit "http://www.example.com/clothing-accessories?dir=asc&limit=72&order=position"
results = all("//h5/a[contains(@onclick, 'analyticsLog')]")
item = []
results.each do |a|
item << a[:href]
end
item.each do |link|
visit link
save_item
end
@excel.write "inventory.csv"
end
end
def save_item
data = all("//*[@id='content-wrapper']/div[2]/div/div")
data.each do |info|
@work_list[@row, 0] = info.find("//*[@id='productright']/div/div[1]/h1").text
price = info.first("//div[contains(@class, 'price font left')]")
@work_list[@row, 1] = (price.text.to_f * 1.33).round(2) if price
@work_list[@row, 2] = info.find("//*[@id='productright']/div/div[11]").text
@work_list[@row, 3] = info.find("//*[@id='tabcontent1']/div/div").text.strip
color = info.all("//dd[1]//select[contains(@name, 'options')]//*[@price='0']")
@work_list[@row, 4] = color.collect(&:text).join(', ')
size = info.all("//dd[2]//select[contains(@name, 'options')]//*[@price='0']")
@work_list[@row, 5] = size.collect(&:text).join(', ')
model = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
@work_list[@row, 6] = model.gsub!(/\D/, "")
@work_list[@row, 7] = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
additional_image = info.all("//*[@rel='lightbox[rotation]']")
@work_list[@row, 8] = additional_image.map { |link| File.basename(link['href']) }.join(', ')
images = imagelink.map { |link| link['href'] }
images.each do |image|
File.open(File.basename("#{LOCAL_DIR}/#{image}"), 'w') do |f|
f.write(open(image).read)
end
end
@row = @row + 1
end
end
end
tomtop = Tomtop.new
tomtop.go
I would like this to do two things that I'm not sure how to do:
- Each additional image should print to a new line (currently it prints all in one cell).
- I would like the model field to be duplicated exactly as many times as there are
additional_images
in the same new line manner.
Use the CSV gem. I took the long way of writing this so you can see how it works.
The csv file should look like this
for your last question
OR
NIL error in array