The following XPath selects the div element with class ajaxcourseindentfix, splits its text at "Prerequisite: ", and gives me all the content after it.
div = soup.select("div.ajaxcourseindentfix")[0]
" ".join([word for word in div.stripped_strings]).split("Prerequisite: ")[-1]
My div can have not only prerequisite but also the following splitting points:
Prerequisites
Corequisite
Corequisites
Whenever the text contains Prerequisite, the above XPath works fine, but whenever any of the three above appears instead, the XPath fails and gives me the whole text.
Is there a way to put multiple delimiters in XPath? Or how do I solve it?
Sample pages:
Corequisite URL: http://catalog.fullerton.edu/ajax/preview_course.php?catoid=16&coid=96106&show
Prerequisite URL: http://catalog.fullerton.edu/ajax/preview_course.php?catoid=16&coid=96564&show
Both: http://catalog.fullerton.edu/ajax/preview_course.php?catoid=16&coid=98590&show
[Old Thread] - How to get text which has no HTML tag
This code solves your problem unless you specifically need XPath. I would also suggest reviewing the BeautifulSoup documentation on the methods I've used.
.next_element and .next_sibling can be very useful in these cases. With .next_elements we get a generator, which we either have to convert to a list or consume in a way that works with generators.
This solves both issues: we don't have to use CSS selectors or those weird list manipulations. Everything is organic and works well.
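If splitting the text is still preferred over walking the tree, the "multiple delimiters" part of the question can be handled with a regular expression. The following is only a minimal sketch using the standard library: the function name text_after_requisite and the sample string are made up for illustration, and it assumes each label is followed by a colon, as in the question's "Prerequisite: ". Note that it splits at the first label it finds, so on a page that has both labels the corequisite part stays in the result.

```python
import re

# Matches "Prerequisite", "Prerequisites", "Corequisite" or "Corequisites",
# followed by a colon and any whitespace (assumed label format).
DELIMITER = re.compile(r"\b(?:Pre|Co)requisites?:\s*")


def text_after_requisite(text):
    """Return the text after the first requisite label.

    If no label is present, the whole text is returned unchanged.
    """
    parts = DELIMITER.split(text, maxsplit=1)
    return parts[-1]


# Hypothetical sample text, not taken from the catalog pages.
sample = "Some Course (3) Prerequisites: COURSE 101. Corequisite: COURSE 102."
print(text_after_requisite(sample))
```

With the div from the question, this could be called as text_after_requisite(" ".join(div.stripped_strings)).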