Context index generation for Meilisearch

162 Views Asked by ViktorasCNC At 27 July 2025 at 16:25

I've been using all sorts of hacks to generate file indexes from SMB shares, and that works fine for basic file path plus metadata indexing.

The next step I want to implement is an algorithm combining some Unix-like utilities and PHP to index specific context from within files.

The first step in this context generation is something like this:

while IFS= read -r p; do grep -ErH '^;|\(|^\(|\)$' "$p"; done < textual.txt > text_context_search.txt

This regex is specific to my purpose of indexing the contents of programs: it extracts lines that are whole comments, or that contain comments, from CNC program files.

The resulting output looks like this:

file_path:regex_hit

Obviously most programs have more than one comment, so there's too much redundancy: the file path is repeated on every line, and an exhaustive context index is about a gigabyte in size.

I am now working on a script that would compact the redundancy in this pattern:
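To make the pattern concrete, here is a minimal sketch that runs the same regex against a made-up sample program. The file name `sample.nc`, its contents, the temporary directory and the `extract_comments` helper are assumptions for illustration only; `grep -E` is used as the current spelling of `egrep`, and `IFS= read -r` keeps paths with backslashes or leading spaces intact.

```shell
#!/bin/sh
# Hypothetical sample CNC program; name and contents are illustrative only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.nc" <<'EOF'
; facing operation
G0 X0 Y0
G1 X10 F200 (rough pass)
(tool change)
M30
EOF

# One path per line, mirroring the role of textual.txt.
printf '%s\n' "$tmpdir/sample.nc" > "$tmpdir/textual.txt"

# Read the path list in $1 and print every comment-bearing line,
# prefixed with the file it came from (-H).
extract_comments() {
    while IFS= read -r p; do
        grep -ErH '^;|\(|^\(|\)$' "$p"
    done < "$1"
}

extract_comments "$tmpdir/textual.txt" > "$tmpdir/text_context_search.txt"
cat "$tmpdir/text_context_search.txt"
```

With this sample, the three lines containing `;` or parentheses are kept and the plain `G0`/`M30` lines are dropped. Note that the `\(` alternative already matches every line that `^\(` would, so the pattern could probably be shortened without changing the result.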

file_path_1:regex_hit_1
file_path_1:regex_hit_2
file_path_1:regex_hit_3
...

would become:

file_path_1:regex_hit_1,regex_hit_2,regex_hit_3

If I can do this efficiently, all is well.

The question is whether I'm going about this the right way. Maybe I should be using different tools to generate such a context index in the first place?
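One possible way to do this compaction as a stream, sketched with awk: because grep emits all hits for one file consecutively, it is enough to compare each line's path with the previous one, so nothing but the current line has to be held in memory. The `merge_hits` function name and the sample lines are hypothetical, and the sketch assumes paths contain no `:` so that the first colon separates the path from the hit.

```shell
#!/bin/sh
# Merge consecutive "path:hit" lines that share a path into "path:hit1,hit2,...".
# Assumption: paths contain no ':' so the first colon splits path from hit.
merge_hits() {
    awk '
    {
        i = index($0, ":")
        if (i == 0) next                  # skip lines without a separator
        path = substr($0, 1, i - 1)
        hit  = substr($0, i + 1)
        if (started && path == prev) {
            printf ",%s", hit             # same file: append to current line
        } else {
            if (started) printf "\n"      # new file: finish previous line
            printf "%s:%s", path, hit
            started = 1
        }
        prev = path
    }
    END { if (started) printf "\n" }
    ' "$@"
}

# Illustrative input only.
printf '%s\n' \
    'a.nc:; first' \
    'a.nc:(second)' \
    'b.nc:(third)' | merge_hits
```

On a real index this could be invoked as `merge_hits text_context_search.txt > merged.txt` (the output name is made up). Comments that themselves contain commas would make the merged line ambiguous, so a different separator may be a better fit depending on how the result is consumed.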

EDIT

After further copying and pasting from Stack Overflow and thinking about it, I glued together a solution, using code that is not mine, that almost entirely solves the issue described above.

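<?php
// Source: https://stackoverflow.com/questions/26238299/merging-csv-lines-where-column-value-is-the-same

// Parse each line of the input file into an array of CSV fields
$rows = array_map('str_getcsv', file('text_context_search2.1.txt'));
// Debug output (the original printed an undefined $csv variable here)
//echo '<pre>';
//print_r($rows);
//echo '</pre>';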
// Array for output
$concatenated = array();

// Key to organize over
$sortKey = '0';

// Key to concatenate
$concatenateKey = '1';

// Separator string
$separator = ' ';

foreach($rows as $row) {

    // Guard against invalid rows
    if (!isset($row[$sortKey]) || !isset($row[$concatenateKey])) {
        continue;
    }

    // Current identifier
    $identifier = $row[$sortKey];

    if (!isset($concatenated[$identifier])) {
        // If no matching row has been found yet, create a new item in the
        // concatenated output array
        $concatenated[$identifier] = $row;
    } else {
        // An array has already been set, append the concatenate value
        $concatenated[$identifier][$concatenateKey] .= $separator . $row[$concatenateKey];
    }
}

// Debug output
//var_dump($concatenated);
//echo json_encode($concatenated)."\n";

// Write the merged rows to a CSV file
$fp = fopen('exemplar.csv', 'w');

foreach ($concatenated as $fields) {
    fputcsv($fp, $fields);
}

fclose($fp);
