I have a BSON file, xyz.bson, full of useful data, and I'd like to query/process the data using Python. Is there a simple example or tutorial I can get started with?
I don't understand this one.
If you want to stream the data as though it were a flat JSON file on disk rather than loading it into a mongod, you can use this small python-bson-streaming library:
https://github.com/bauman/python-bson-streaming
    from sys import argv

    from bsonstream import KeyValueBSONInput

    for path in argv[1:]:
        # BSON is a binary format, so open the file in binary mode
        with open(path, 'rb') as f:
            # remove fast_string_prematch if you don't need it
            stream = KeyValueBSONInput(fh=f, fast_string_prematch="something")
            for id, dict_data in stream:
                if id:
                    ...  # process dict_data here
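If it helps to see why streaming works at all: a BSON dump is just documents written back to back, and each document starts with a 4-byte little-endian length that counts the whole document, prefix included. Below is a minimal standard-library sketch of that framing. It is not how python-bson-streaming is implemented, the name iter_raw_bson is made up for illustration, and it only splits the file into raw documents; you would still hand each chunk to a BSON decoder to get a dict.

```python
import io
import struct


def iter_raw_bson(fh):
    """Yield each BSON document in fh as raw bytes, one at a time.

    Hypothetical helper: it only splits on the length prefix and does
    not decode the documents.
    """
    while True:
        header = fh.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        # int32, little-endian; the size includes these 4 bytes
        (size,) = struct.unpack("<i", header)
        if size < 5:
            raise ValueError(f"invalid document size: {size}")
        body = fh.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated document")
        yield header + body


# Two hand-built documents, {} and {"a": 1}, standing in for a real dump file.
sample = b"\x05\x00\x00\x00\x00" + b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
docs = list(iter_raw_bson(io.BytesIO(sample)))
```

With a real file you would pass open('xyz.bson', 'rb') instead of the BytesIO object, and only one document is held in memory at a time.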
You can use sonq to query a .bson file directly from bash, or you can import it and use the library from Python.
A few examples:
Query a .bson file
sonq -f '{"name": "Stark"}' source.bson
Convert query results to a newline-separated .json file
sonq -f '{"name": {"$ne": "Stark"}}' -o target.json source.bson
Query a .bson file in Python
from sonq.operation import query_son
record_list = list(query_son('source.bson', filters={"name": {"$in": ["Stark"]}}))
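The filters in these examples use MongoDB-style query syntax. As a rough illustration of what the three forms mean (plain equality, $ne and $in), here is a small pure-Python sketch. It is not sonq's implementation: the matches function and the sample records are made up, and it ignores arrays, nested fields and every other operator.

```python
_MISSING = object()  # marks a field that is absent from the document


def matches(doc, filters):
    """Return True if doc satisfies every condition in filters.

    Hypothetical helper covering only equality, $ne and $in.
    """
    for field, cond in filters.items():
        value = doc.get(field, _MISSING)
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$ne":
                    ok = value != arg  # a missing field counts as "not equal"
                elif op == "$in":
                    ok = value in arg
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != cond:
            return False
    return True


# Made-up records, purely for illustration.
records = [
    {"name": "Stark", "house": "Winterfell"},
    {"name": "Lannister"},
    {"house": "Tyrell"},
]

equal = [r for r in records if matches(r, {"name": "Stark"})]
not_equal = [r for r in records if matches(r, {"name": {"$ne": "Stark"}})]
in_list = [r for r in records if matches(r, {"name": {"$in": ["Stark"]}})]
```

Here equal and in_list both contain only the first record, while not_equal contains the other two, including the one with no name field at all.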
You could use the mongorestore command to import the data into a MongoDB server and then query it by connecting to that server.