We have a requirement to extract dark data from unstructured sources such as letters, rad reports, etc. Please suggest an Azure resource to extract data from common document formats (DOC, DOCX, PDF, RTF, TXT, HTML, etc.) and then analyze the extracted data.
Azure resource to handle unstructured data sources
532 Views Asked by 191180rk At
It sounds like you just want to extract raw text or images from these rich-text documents. If that is all you need, what you really want is a set of libraries for parsing the different document formats.
Here are some libraries in Java or Python that do that. If you are using .NET, which I'm not familiar with, you can search Google or Bing for .NET alternatives.
1. Apache POI is a good library for extracting data from MS Office files. For Python, there does not seem to be a package for this, other than using a COM object like Word.Application, or IronPython (see "Reading/Writing MS Word files in Python") on .NET on Windows.
2. Apache PDFBox and jPDFText for Java, and PyPDF2 for Python.
3. javax.swing.text.rtf.RTFEditorKit, for which you can find sample code via search. As with #1, there seems to be nothing for Python.
4. jsoup for Java, and BeautifulSoup and HTMLParser for Python, are best for extracting data from HTML.
5. Stanford NLP for Java and NLTK for Python are useful. The Azure Text Analytics API of Cognitive Services can also help with tasks like key phrase extraction and language detection.
6. Tess4J, or other libraries you can find on GitHub.

Almost all of the above rely on third-party dev kits rather than Azure resources. However, you can store the documents in Azure Storage and process them on an Azure VM or the Batch service, and even analyze the extracted data in Azure Jupyter Notebook or use Azure ML for deeper research.
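As a rough illustration of the HTML case, here is a minimal sketch that pulls the visible text out of an HTML document using only the HTMLParser class from the Python standard library. The names TextExtractor and extract_text, and the sample markup, are made up for this example; skipping script and style content is an assumption about what counts as "text" for your documents. BeautifulSoup would do the same job with less code.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}  # assumption: these never hold useful text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><head><style>p {color: red}</style></head>"
        "<body><h1>Report</h1><p>No acute findings.</p></body></html>"
    )
    print(extract_text(sample))
```

The extracted string can then be passed to whichever analysis step you choose, such as NLTK or the Text Analytics API mentioned above.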