Using Ruby-on-Rails, Sphinx or UltraSphinx and an HTML source (not a database)

The documentation for sphinx-0.9.9-rc2 states:

The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.

However, I can't find any documentation on setting up a source other than SQL. The config file doesn't seem to indicate that the source can be anything but a database. Does anyone have any helpful links for setting up Sphinx with an HTML source?

Are you looking for the xmlpipe (now called xmlpipe2) feature on Sphinx? I've tried it out for XML files and it works just like it does for SQL.
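For anyone landing here, this is roughly what an xmlpipe2 stream looks like, produced by a small Ruby method. As I understand it, a source declared with `type = xmlpipe2` runs the command given in `xmlpipe_command` and reads this XML from that command's standard output. The field names (`title`, `content`), the `published_at` timestamp attribute and the method names are made up for illustration, so check the schema elements against the docs for your Sphinx version.

```ruby
# A minimal xmlpipe2 writer (illustrative sketch, not part of Sphinx or Rails).
# Each document is a Hash with :id (a positive integer), :title, :content and a
# hypothetical :published_at attribute (Unix timestamp).

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Replace control characters that XML 1.0 forbids, then escape markup characters.
def xml_escape(text)
  text.to_s.gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F]/, " ").gsub(/[&<>"]/, XML_ESCAPES)
end

def xmlpipe2_stream(docs)
  lines = [%(<?xml version="1.0" encoding="utf-8"?>), "<sphinx:docset>"]
  # The schema has to appear before the first document.
  lines << "<sphinx:schema>"
  lines << %(<sphinx:field name="title"/>)
  lines << %(<sphinx:field name="content"/>)
  lines << %(<sphinx:attr name="published_at" type="timestamp"/>)
  lines << "</sphinx:schema>"
  docs.each do |doc|
    # Integer() raises if an id or timestamp is missing or not numeric.
    lines << %(<sphinx:document id="#{Integer(doc[:id])}">)
    lines << "<title>#{xml_escape(doc[:title])}</title>"
    lines << "<content>#{xml_escape(doc[:content])}</content>"
    lines << "<published_at>#{Integer(doc[:published_at])}</published_at>"
    lines << "</sphinx:document>"
  end
  lines << "</sphinx:docset>"
  "#{lines.join("\n")}\n"
end

if __FILE__ == $PROGRAM_NAME
  # Sample data only; a real script would load pages or records here.
  docs = [{ id: 1, title: "Tom & Jerry", content: "Cats <3 mice", published_at: 1_230_000_000 }]
  $stdout.write(xmlpipe2_stream(docs))
end
```

Saved as a script, it could then be referenced from the source block with something like `xmlpipe_command = ruby /path/to/your_script.rb` (path hypothetical), and `indexer` would read whatever the script prints.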

I haven't tried Sphinx with plain HTML files, so my guess is that you'll need to parse the HTML, generate XML containing the fields and attributes you want indexed, and feed that to Sphinx through xmlpipe.
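A rough sketch of the parsing step in plain Ruby (no gems): it pulls out the `<title>`, throws away comments and `<head>`, `<script>` and `<style>` blocks, and reduces the remaining markup to whitespace-separated text that could then go into xmlpipe2 fields. The method names are hypothetical, only a handful of entities are decoded, and regex tag-stripping is crude; an HTML parser such as Nokogiri would be the more robust choice on real-world pages.

```ruby
# Crude HTML -> fields extraction with plain Ruby regexes (no gems).
# Assumes reasonably well-formed pages; a real parser such as Nokogiri copes
# better with messy markup. Method names here are hypothetical.

# Minimal entity table; other named entities are left untouched.
ENTITIES = { "amp" => "&", "lt" => "<", "gt" => ">", "quot" => '"', "apos" => "'", "nbsp" => " " }.freeze

def decode_entities(text)
  text.gsub(/&(#x\h+|#\d+|[a-z]+);/i) do
    ref = Regexp.last_match(1)
    if ref.match?(/\A#x/i)
      [ref[2..].to_i(16)].pack("U") # hexadecimal character reference
    elsif ref.start_with?("#")
      [ref[1..].to_i].pack("U")     # decimal character reference
    else
      ENTITIES.fetch(ref, "&#{ref};")
    end
  end
end

# Collapse runs of whitespace into single spaces.
def squish(text)
  text.gsub(/\s+/, " ").strip
end

# Returns { title:, content: } for one HTML page.
def html_to_fields(html)
  title = html[%r{<title[^>]*>(.*?)</title\s*>}im, 1].to_s
  body = html.gsub(/<!--.*?-->/m, " ")                                 # drop comments
  body = body.gsub(%r{<(script|style|head)\b[^>]*>.*?</\1\s*>}im, " ") # drop non-content blocks (head holds the title)
  body = body.gsub(/<[^>]+>/, " ")                                     # drop remaining tags
  { title: squish(decode_entities(title)), content: squish(decode_entities(body)) }
end
```

Looping over the pages (for example with `Dir.glob("public/**/*.html")`, a hypothetical location) and numbering them from 1 would give each page the unique, non-zero integer document id that Sphinx expects.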

You can see here and here for more.

HTH