I am trying to scrape player information from MLS sites to create a map of where the players come from, as well as other information. I am as new to this as it gets.
So far I have used this code:
require 'HTTParty'
require 'Nokogiri'
require 'JSON'
require 'Pry'
require 'csv'
page = HTTParty.get('https://www.atlutd.com/players')
parse_page = Nokogiri::HTML(page)
players_array = []
parse_page.css('.player_list.list-reset').css('.row').css('.player_info').map do |a|
player_info = a.text
players_array.push(player_info)
end
#CSV.open('atlantaplayers.csv', 'w') do |csv|
# csv << players_array
#end
pry.start(binding)
The output of the pry function is:
"Miguel Almirón10\nMidfielder\n-\nAsunción, ParaguayAge:\n23\nHT:\n5' 9\"\nWT:\n140\n"
Which when put into the csv creates this in a single cell:
"Miguel Almirón10
Midfielder
-
Asunción, ParaguayAge:
23
HT:
5' 9""
WT:
140
"
I've looked into things and have determined that it is possible nodes (\n)? that is throwing off the formatting.
My desired outcome here is to figure out how to get the pry output into the array as follows:
Miguel, Almiron, 10, Midfielder, Asuncion, Paraguay, 23, 5'9", 140
Bonus points if you can help with the accent marks on names. Also if there is going to be an issue with height, is there a way to convert it to metric?
Thank you in advance!
Yes that's why it's showing in this odd format, you can
strip
the rendered text to remove extra spaces/lines then your text will show without the\n
sThis will only remove the
\n
if you wish to store them in a CSV in this orderMiguel, Almiron, 10, Midfielder, Asuncion, Paraguay, 23, 5'9", 140
then you might want to split by spaces and then create an array for each row so when pushing the line to the CSV file it will look like this:you can use the
transliterate
method which will remove accentsSee http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-transliterate and you might want to
require 'rails'
for this