ElasticSearch: How to query by multiple conditions in different locations?

324 Views Asked by At

I've been trying to build this ElasticSearch Query on the Danish CVR database API so far without success. Basically I'm trying to find companies where

  1. The company has a relationship with "deltager" (participant) with "enhedsNummer" (ID) equal NUMBER
  2. The relationship is still active, i.e. the "end of period" field is null

How do I construct a query that has multiple conditions like this?

    'query': {
        'bool': {
            'must': [
                {
                    'term': {'Vrvirksomhed.deltagerRelation.deltager.enhedsNummer': NUMBER},
        AND            
                    'term': {'Vrvirksomhed.deltagerRelation.organisationer.attributter.vaerdier.periode.gyldigTil': null}
                },
            ],
        },
    },
}

FYI: database mapping may be found at http://distribution.virk.dk/cvr-permanent/_mapping

2

There are 2 best solutions below

0
Jonatata On BEST ANSWER

It doesn't appear like ElasticSearch Queries are as dynamic as I had wanted (or I don't know how use them). Instead, it appears that the Python code below is the best choice for generating the desired outcome:

import requests
import pandas as pd

# creation of empty lists:
virksomhedsnavne = []
virksomhedscvr = []
relation_fra = []
relation_til = []

# Pulling data (apparently limited to 3000 elements at a time):
for i in range(20):

    if i == 0:
        highestcvrnummer = 0
    else:
        highestcvrnummer = max(virksomhedscvr)
    
    headers = {
        'Content-Type': 'application/json',
    }

    json_data = {

        "_source": ["Vrvirksomhed.cvrNummer", "Vrvirksomhed.navne", "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn", "Vrvirksomhed.deltagerRelation"],

        "sort" : [{"Vrvirksomhed.cvrNummer" : {"order":"asc"}}],
      
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {
                            "Vrvirksomhed.deltagerRelation.deltager.enhedsNummer": "some_value"
                        }
                    },
                    {
                        "range":{
                            "Vrvirksomhed.cvrNummer": {
                                "gt": highestcvrnummer
                            }
                        
                        }
                    }
                ]
            }
        },
        'size': 3000
    }

    response = requests.post('http://distribution.virk.dk/cvr-permanent/virksomhed/_search', headers=headers, json=json_data, auth=('USERNAME', 'PASSWORD'))
    json_data = response.json()['hits']['hits']

    # Aggregate and format data neatly
    for data in json_data:
        virksomhed_data = data['_source']['Vrvirksomhed']
        virksomhedscvr.append(virksomhed_data['cvrNummer'])
        try:
            virksomhedsnavne.append(virksomhed_data['virksomhedMetadata']['nyesteNavn']['navn'])
        except:
            virksomhedsnavne.append(virksomhed_data['navne'][0]['navn'])
        
        # Loop through all "deltagere" and find match with value
        for relation in virksomhed_data['deltagerRelation']:

            # If match found
            if relation['deltager']['enhedsNummer'] == some_value:
                
                # Make sure most recent period is chosen
                antalopdateringer = len(relation['organisationer'])-1
                relation_gyldig = relation['organisationer'][antalopdateringer]['medlemsData'][0]['attributter'][0]['vaerdier'][0]['periode']
                relation_fra.append(relation_gyldig['gyldigFra'])
                relation_til.append(relation_gyldig['gyldigTil'])
                break

#export to excel
dict = {'CVR nummer':virksomhedscvr, 'navn':virksomhedsnavne, 'Relation fra':relation_fra, 'Relation til':relation_til}
df = pd.DataFrame(dict)
df.to_excel("output.xlsx")

If anyone else is working with the Danish CVR register's API, I hope this helps!

Also, if you find a better solution, please let me know :)

1
Alex Baidan On

You can try:

GET /cvr-permanent/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "Vrvirksomhed.deltagerRelation.deltager.enhedsNummer": {
              "value": "your_value_here"
        }
      }
    }
  ],
  "must_not": [
    {
      "exists": {
        "field": "Vrvirksomhed.deltagerRelation.organisationer.attributter.vaerdier.periode.gyldigTil"
       }
     }
   ]
  }
 }
}

Trick here is to use must_not/exist for nil values.

P.S. I cannot check it because it requires authorisation.