I have used gensim.utils.simple_preprocess(str(sentence) to create a dictionary of words that I want to use for topic modelling. However, this is also filtering important numbers (house resolutions, bill no, etc) that I really need. How did I overcome this? Possibly by replacing digits with their word form. How do i go about it, though?
How do i retain numbers while preprocessing data using gensim in python?
653 Views Asked by piñatabreaker At
1
There are 1 best solutions below
Related Questions in NLP
- Is it possible to use ES5 JavaScript with Angular 2 instead of TypeScript?
- Module '"angular2/angular2"' has no exported member 'For'
- import syntax in typescript creating another js file in visual studio
- Separate ts file for imports
- How to use an AngularJS 2 component multiple times in the same page?
- injectables not working in angular 2.0 latest build 26
- Does angular2 bootstrap have a way to dynamically target elements like it does in angular 1.x
- Import {} from location is not found in VS Code using TypeScript and Angular 2
- Angular 2/Typescript: require not found
- ng-switch in Angular2
Related Questions in GENSIM
- Is it possible to use ES5 JavaScript with Angular 2 instead of TypeScript?
- Module '"angular2/angular2"' has no exported member 'For'
- import syntax in typescript creating another js file in visual studio
- Separate ts file for imports
- How to use an AngularJS 2 component multiple times in the same page?
- injectables not working in angular 2.0 latest build 26
- Does angular2 bootstrap have a way to dynamically target elements like it does in angular 1.x
- Import {} from location is not found in VS Code using TypeScript and Angular 2
- Angular 2/Typescript: require not found
- ng-switch in Angular2
Related Questions in PREPROCESSOR
- Is it possible to use ES5 JavaScript with Angular 2 instead of TypeScript?
- Module '"angular2/angular2"' has no exported member 'For'
- import syntax in typescript creating another js file in visual studio
- Separate ts file for imports
- How to use an AngularJS 2 component multiple times in the same page?
- injectables not working in angular 2.0 latest build 26
- Does angular2 bootstrap have a way to dynamically target elements like it does in angular 1.x
- Import {} from location is not found in VS Code using TypeScript and Angular 2
- Angular 2/Typescript: require not found
- ng-switch in Angular2
Related Questions in LDA
- Is it possible to use ES5 JavaScript with Angular 2 instead of TypeScript?
- Module '"angular2/angular2"' has no exported member 'For'
- import syntax in typescript creating another js file in visual studio
- Separate ts file for imports
- How to use an AngularJS 2 component multiple times in the same page?
- injectables not working in angular 2.0 latest build 26
- Does angular2 bootstrap have a way to dynamically target elements like it does in angular 1.x
- Import {} from location is not found in VS Code using TypeScript and Angular 2
- Angular 2/Typescript: require not found
- ng-switch in Angular2
Related Questions in LATENT-SEMANTIC-ANALYSIS
- Is it possible to use ES5 JavaScript with Angular 2 instead of TypeScript?
- Module '"angular2/angular2"' has no exported member 'For'
- import syntax in typescript creating another js file in visual studio
- Separate ts file for imports
- How to use an AngularJS 2 component multiple times in the same page?
- injectables not working in angular 2.0 latest build 26
- Does angular2 bootstrap have a way to dynamically target elements like it does in angular 1.x
- Import {} from location is not found in VS Code using TypeScript and Angular 2
- Angular 2/Typescript: require not found
- ng-switch in Angular2
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
Popular # Hahtags
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
You don't have to use
simple_preprocess()
- it's not doing much, it's not that configurable or sophisticated, and typically the other Gensim algorithms just need lists-of-tokens.So, choose your own tokenization - which in some cases, depnding on your source data, could be as simple as a
.split()
on whitespace.If you want to look at what
simple_preprocess()
does, as a model, you can view its Python source at:https://github.com/RaRe-Technologies/gensim/blob/351456b4f7d597e5a4522e71acedf785b2128ca1/gensim/utils.py#L288