I am using the open-source Apache ManifoldCF project to index documents from Google Drive into Solr. I have often found its indexing to be inconsistent, and even a small number of documents takes a long time to show up in Solr. Is it really a good option for indexing Google Drive?
Is ManifoldCF a good option for Google Drive indexing?
416 Views Asked by Saurabh Chaturvedi At
Shashank Raj
ManifoldCF is good for crawling a file system. If you are interested in web crawling, you can go for Apache Nutch.
Yes, ManifoldCF does take a long time to reflect even a small number of documents. It also has very little documentation. However, you can join the mailing list, where you can ask questions of the lead developer, Karl. He is very helpful and usually answers within a few hours.
P.S.: I worked with ManifoldCF on a project for 10 months.
It is currently a bit on the slow side, due to response times and throttling constraints from Google Drive itself. This limit can probably be relieved if you buy additional bandwidth from Google. With the current setup, if you are looking to index a large set of documents in Google Drive, it may not be as quick as you expect.
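To give a feel for why throttling slows a crawl down, here is a minimal Python sketch of the usual client-side reaction to rate limiting: wait, then retry with exponentially growing delays. This is not ManifoldCF's code and it does not call the Google Drive API; `ThrottledError` and `call_with_backoff` are hypothetical names made up for the illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error standing in for a rate-limit response from a remote API."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(), retrying with exponential backoff plus jitter when throttled."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries, let the caller see the failure
            # Wait 1x, 2x, 4x, ... the base delay, plus up to one base delay of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

Every throttled request adds seconds of waiting before the document is even fetched, so a crawl that hits the limit often can take much longer than the raw document count suggests.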