I'm trying to analyse data with Elasticsearch. I started working with Elasticsearch and NEST about four months ago, so I might have missed something obvious. All examples are simplified or altered, but the core is the same.
The data contains an array of nested objects, each of which contains an array of nested objects, each of which in turn contains an array of nested objects. It comes from an information request that contains XML messages. The messages are parsed, and every element that contains one or more text elements is saved under the message name, together with its element name, its location, and an array holding the names and values of all its text elements. I thought this set-up might make the data easier to analyse.
Mapping example:
{
  "data" : {
    "properties" : {
      "id" : { "type" : "string" },
      "action" : { "type" : "string" },
      "result" : { "type" : "string" },
      "details" : {
        "type" : "nested",
        "properties" : {
          "description" : { "type" : "string" },
          "message" : {
            "type" : "nested",
            "properties" : {
              "name" : { "type" : "string" },
              "nodes" : {
                "type" : "nested",
                "properties" : {
                  "name" : { "type" : "string" },
                  "value" : { "type" : "string" }
                }
              },
              "source" : { "type" : "string" }
            }
          }
        }
      }
    }
  }
}
Data example:
{
  "id" : "123456789",
  "action" : "GetInformation",
  "result" : "Success",
  "details" : [{
    "description" : "Request",
    "message" : [{
      "name" : "Body",
      "source" : "Message|Body",
      "nodes" : [
        { "name" : "Action", "value" : "GetInformation" },
        { "name" : "Identity", "value" : "1234" }
      ]
    }]
  }, {
    "description" : "Response",
    "message" : [{
      "name" : "Object",
      "source" : "Message|Body|Object",
      "nodes" : [
        { "name" : "ID", "value" : "123" },
        { "name" : "Name", "value" : "Jim" }
      ]
    }, {
      "name" : "Information",
      "source" : "Message|Body|Information",
      "nodes" : [
        { "name" : "Type", "value" : "Birth City" },
        { "name" : "City", "value" : "Los Angeles" }
      ]
    }, {
      "name" : "Information",
      "source" : "Message|Body|Information",
      "nodes" : [
        { "name" : "Type", "value" : "City of Residence" },
        { "name" : "City", "value" : "New York" }
      ]
    }]
  }]
}
XML example:
<Message>
  <Body>
    <Object>
      <ID>123</ID>
      <Name>Jim</Name>
    </Object>
    <Information>
      <Type>Birth City</Type>
      <City>Los Angeles</City>
    </Information>
    <Information>
      <Type>City of Residence</Type>
      <City>New York</City>
    </Information>
  </Body>
</Message>
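To illustrate the flattening described at the top, here is a minimal sketch using `System.Xml.Linq` and C# 9 top-level statements. It assumes the parser keeps one entry per element whose children are all text-only elements and uses the `|`-joined ancestor names as the location. The tuple shape and the `Flatten` helper are hypothetical stand-ins, not the actual parser used here. Run on the XML above, it produces the `Object` entry and two `Information` entries with identical name and source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// The XML example from above (Response message only).
var root = XElement.Parse(@"<Message><Body>
  <Object><ID>123</ID><Name>Jim</Name></Object>
  <Information><Type>Birth City</Type><City>Los Angeles</City></Information>
  <Information><Type>City of Residence</Type><City>New York</City></Information>
</Body></Message>");

var messages = Flatten(root);
foreach (var m in messages)
{
    Console.WriteLine($"{m.Name} | {m.Source} | " +
        string.Join(", ", m.Nodes.Select(n => $"{n.Name}={n.Value}")));
}

// Hypothetical flattening: one entry per element whose children are all text-only elements.
static List<(string Name, string Source, List<(string Name, string Value)> Nodes)> Flatten(XElement xmlRoot)
{
    var result = new List<(string Name, string Source, List<(string Name, string Value)> Nodes)>();
    foreach (var element in xmlRoot.DescendantsAndSelf())
    {
        var children = element.Elements().ToList();
        if (children.Count == 0 || children.Any(c => c.HasElements))
        {
            continue; // a text-only leaf, or a container of other containers
        }
        // Location: element names from the root down to this element, joined with '|'.
        var source = string.Join("|",
            element.AncestorsAndSelf().Reverse().Select(a => a.Name.LocalName));
        var nodes = children
            .Select(c => (Name: c.Name.LocalName, Value: c.Value))
            .ToList();
        result.Add((element.Name.LocalName, source, nodes));
    }
    return result;
}
```

Because both `Information` entries share the same name and source, the only thing linking a `City` node to its `Type` node is that both sit in the same message object.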
I want to analyse the name and value properties of the nodes so that I get an overview of every city in the index that appears as a birthplace, together with how many people were born there. Something like:
var birthCities = new Dictionary<string, int> {
    { "Los Angeles", 400 }, { "New York", 800 },
    { "Detroit", 500 }, { "Michigan", 700 } };
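To pin down what should be counted: a city counts as a birthplace when its `City` node sits in the same message object as a `Type` node whose value is `Birth City`. The sketch below applies that rule client-side to the message entries from the data example above, using hypothetical tuples in place of the real `Data` classes (C# 9 top-level statements assumed). It only illustrates the intended result; it is not a way to avoid loading the documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The "Response" message entries from the data example: (Name, Source, Nodes).
var messages = new List<(string Name, string Source, (string Name, string Value)[] Nodes)>
{
    ("Object", "Message|Body|Object", new[] { ("ID", "123"), ("Name", "Jim") }),
    ("Information", "Message|Body|Information", new[] { ("Type", "Birth City"), ("City", "Los Angeles") }),
    ("Information", "Message|Body|Information", new[] { ("Type", "City of Residence"), ("City", "New York") }),
};

var birthCities = new Dictionary<string, int>();
foreach (var message in messages)
{
    // Type and City are related only by living in the same message object.
    bool isBirthCity = message.Nodes.Any(n => n.Name == "Type" && n.Value == "Birth City");
    if (!isBirthCity)
    {
        continue;
    }
    foreach (var node in message.Nodes.Where(n => n.Name == "City"))
    {
        birthCities.TryGetValue(node.Value, out int current);
        birthCities[node.Value] = current + 1;
    }
}

foreach (var (city, total) in birthCities)
{
    Console.WriteLine($"{city}: {total}");
}
```

On this single document only "Los Angeles" is counted; "New York" is skipped because its message carries `City of Residence`.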
The code I have so far:
var response = client.Search<Data>(search => search
    // Only GetInformation requests.
    .Query(query => query
        .Match(match => match
            .OnField(data => data.Action)
            .Query("GetInformation")
        )
    )
    .Aggregations(a1 => a1
        // Step into the nested message objects...
        .Nested("Messages", messages => messages
            .Path(data => data.details.FirstOrDefault().Message)
            .Aggregations(a2 => a2
                // ...bucket them by source...
                .Terms("Sources", termSource => termSource
                    .Field(data => data.details.FirstOrDefault().Message.FirstOrDefault().Source)
                    .Aggregations(a3 => a3
                        // ...step into the nested nodes of each message...
                        .Nested("Nodes", nodes => nodes
                            .Path(data => data.details.FirstOrDefault().Message.FirstOrDefault().Nodes)
                            .Aggregations(a4 => a4
                                // ...and bucket the nodes by name, then by value.
                                .Terms("Names", termName => termName
                                    .Field(data => data.details.FirstOrDefault().Message.FirstOrDefault().Nodes.FirstOrDefault().Name)
                                    .Aggregations(a5 => a5
                                        .Terms("Values", termValue => termValue
                                            .Field(data => data.details.FirstOrDefault().Message.FirstOrDefault().Nodes.FirstOrDefault().Value)
                                        )
                                    )
                                )
                            )
                        )
                    )
                )
            )
        )
    )
);
// Walk the buckets: Sources -> Nodes -> Names -> Values.
// Keys are lowercased tokens (e.g. "information" from "Message|Body|Information")
// because the string fields are analyzed.
var dict = new Dictionary<string, long>();
var sAggr = response.Aggs.Nested("Messages").Terms("Sources");
foreach (var item in sAggr.Items)
{
    if (item.Key.Equals("information"))
    {
        var nAggr = item.Nested("Nodes").Terms("Names");
        foreach (var nItem in nAggr.Items)
        {
            if (nItem.Key.Equals("city"))
            {
                // Sum the document counts per city value.
                var vAgg = nItem.Terms("Values");
                foreach (var vItem in vAgg.Items)
                {
                    if (!dict.ContainsKey(vItem.Key))
                    {
                        dict.Add(vItem.Key, 0);
                    }
                    dict[vItem.Key] += vItem.DocCount;
                }
            }
        }
    }
}
This code gives me every city and how many times it occurs, but because birth cities and cities of residence are saved with the same element name and at the same location (neither of which I can change), I have found no way to tell them apart.
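The loss of that link can be reproduced in memory. Once the aggregation steps into the nested nodes, each node is bucketed on its own: the `Names` then `Values` chain counts `City` nodes by value without any reference to the `Type` node beside them. The LINQ grouping below is a hypothetical stand-in for those buckets (it is not what NEST or Elasticsearch executes), applied to the two `Information` entries from the data example, assuming C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The two "Information" entries from the data example: (Source, Nodes).
var messages = new List<(string Source, (string Name, string Value)[] Nodes)>
{
    ("Message|Body|Information", new[] { ("Type", "Birth City"), ("City", "Los Angeles") }),
    ("Message|Body|Information", new[] { ("Type", "City of Residence"), ("City", "New York") }),
};

// Stand-in for Sources -> Nodes -> Names -> Values: flatten to individual nodes,
// then count per name and per value. The owning message is discarded at SelectMany.
var valueCounts = messages
    .Where(m => m.Source == "Message|Body|Information")
    .SelectMany(m => m.Nodes)
    .GroupBy(n => n.Name)
    .ToDictionary(
        byName => byName.Key,
        byName => byName
            .GroupBy(n => n.Value)
            .ToDictionary(byValue => byValue.Key, byValue => byValue.Count()));

// Both cities land in the "City" bucket; nothing records which one was a birth city.
foreach (var (city, count) in valueCounts["City"])
{
    Console.WriteLine($"{city}: {count}");
}
```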
Unfortunately, specific types for each action are not an option. So my question is: how can I count all occurrences of each city name whose Type is Birth City, preferably without having to fetch and iterate over all documents?