I crawled the URL below with Nutch 2.x and then ran solrindex to index the parsed data into Solr, but the query returns the JSON below. I want to get the actual content of the page instead. How should I set up my seed URL file and regex-urlfilter.txt?

Response (incorrect data):

    {
      "response": {
        "numFound": 1,
        "start": 0,
        "numFoundExact": true,
        "docs": [
          {
            "tstamp": "2023-04-22T09:54:19.129Z",
            "digest": "def97ee1241655c3980bba6bdde9d3ea",
            "boost": 1.0177004,
            "id": "http://www.iwencai.com/unifiedwap/result?tid=stockpick&qs=box_main_ths&w=A%E8%82%A1%E4%B8%BB%E6%9D%BF%3B%28%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma10-%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%29%3E%28%E5%BD%93%E5%89%8Dma10-%E5%BD%93%E5%89%8Dma5%29%20%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5KDJ%E5%8D%B3%E5%B0%86%E9%87%91%E5%8F%89%E6%88%96%E9%87%91%E5%8F%89%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5%E6%9C%80%E9%AB%98%E4%BB%B7%3C%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%3C%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma20%20%3B%E5%BD%93%E5%89%8Dma10%3Ema5%3B%E7%8E%B0%E4%BB%B7%3E%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5%E6%94%B6%E7%9B%98%E4%BB%B7",
            "title": "同花顺问财",
            "url": "http://www.iwencai.com/unifiedwap/result?tid=stockpick&qs=box_main_ths&w=A%E8%82%A1%E4%B8%BB%E6%9D%BF%3B%28%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma10-%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%29%3E%28%E5%BD%93%E5%89%8Dma10-%E5%BD%93%E5%89%8Dma5%29%20%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5KDJ%E5%8D%B3%E5%B0%86%E9%87%91%E5%8F%89%E6%88%96%E9%87%91%E5%8F%89%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5%E6%9C%80%E9%AB%98%E4%BB%B7%3C%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%3B%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma5%3C%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5ma20%20%3B%E5%BD%93%E5%89%8Dma10%3Ema5%3B%E7%8E%B0%E4%BB%B7%3E%E4%B8%8A%E4%B8%80%E4%B8%AA%E4%BA%A4%E6%98%93%E6%97%A5%E6%94%B6%E7%9B%98%E4%BB%B7",
            "content": "同花顺问财\nWe're sorry but 同花顺问财选股 doesn't work properly without JavaScript enabled. Please enable it to continue..\n",
            "version": 1763869842097569792
          }
        ]
      }
    }
Nutch 2.x response content : doesn't work properly without JavaScript enabled. Please enable it to continue
Asked by xinshouke
To make Nutch properly crawl sites that do not function without JavaScript enabled, you can use one of the Selenium-based protocol plugins. See the README of protocol-selenium.
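As a rough illustration of what enabling the plugin involves, the sketch below swaps protocol-selenium in for protocol-http in the plugin.includes property of conf/nutch-site.xml. It assumes the plugin is bundled with your Nutch build. Every other entry in the value is only illustrative and should be taken from your own existing plugin.includes setting, including the indexer plugin you already use for Solr.

```xml
<!-- conf/nutch-site.xml: a sketch, not a complete configuration -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- protocol-selenium takes the place of protocol-http here.
         The remaining entries are illustrative assumptions; keep whatever
         your current plugin.includes list already contains. -->
    <value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
  </property>
</configuration>
```

The Selenium plugins drive a real browser, so a browser and a matching driver have to be available on the machine running the fetch. The plugin README lists the exact requirements and the additional properties it reads.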