How to extract LangChain output


I want to know how I can get all the contents of a LangChain similarity search output. I need to extract the page_content, metadata, source, page number and score.

Here's the output I get when I print the result directly:

[
  [
    "Document(page_content='LEC-11: Swapping | Context-Switching | Orphan process | Zombie process \\n1. Swapping\\na. Time- sharing system may have medium term schedular (MTS ).\\nb. Remove processes from memory to reduce degree of multi -programming.\\nc. These removed processes  can be re introduce d into memory, and its e xecution can be con tinued\\nwhere it left off . This is called Swapping.\\nd. Swap- out and swap- in is done by MTS.\\ne. Swapping is necessary to improve process mix or because a change in  memory requirement s has\\novercommitted avail able memory, requiring memory to be freed u p.\\nf. Swapping is a mechanism in which a process can be swapped temporarily out of main memory (or\\nmove) to secondary storage (disk) and make that memory available to other processes. At some\\nlater time, the system swaps back the process from the secondary storage to main memory.\\n2. Context -Switching\\na. Switching the CPU to another process require s performing a state save of the current process and a\\nstate restore of a different process.\\nb. Whe\\nn this occurs, the kern el saves the context of the old process in its PCB and loads the saved\\ncontext of the new process scheduled to r un.\\nc. It is pu\\nre overhead, because the system does no useful work whi le switching.\\nd. Speed varies from machine t o machine, depending on the mem ory speed, the number of registers\\nthat must be copied etc.\\n3. Orpha\\nn proce ss\\na. The process whose parent process ha s been terminated and it is still running.\\nb. Orphan processes are adopted by init process.\\nc. Init is the first process of OS.\\n4. Zombie process / Defunct process\\na. A zombie process is a process whose execution is completed but it still has an entry in the process\\ntable.\\nb. Zombie processes usually occur for child processes, as the parent process still needs to read its\\nchild’s exit status.  
Once  this is done using the wait system call, the zombie process is eliminated\\nfrom the process table. This is known as reaping  the zombie process.\\nc. It is because  parent process may  call wait  () on child process for a longer time dur ation and child\\nprocess got terminated much earlier.\\nd. As \\nentry in the process table can only be removed, after the parent process reads the exit status of\\nchild process. Hence, the child process remains a zombie till it is removed from the process table.  \\nCodeHelp', metadata={'source': 'docs\\\\dataset\\\\OS_Full_Notes.pdf', 'page': 15})",
    "0.6975052"
  ],
  [
    "Document(page_content='LEC-22: Deadlock Part-2 \\n1. Deadlock Avoidance: Idea is, the kernel be given in advance info concerning which resources will\\nuse in its lifetime.\\nBy this, system can decide for each request whether the process should wait.To decide whether the current request can be satisfied or delayed, the system must consider the\\nresources currently available, resources currently allocated to each process in the system and the\\nfuture requests and releases of each process.\\na. Schedule process and its resources allocation in such a way that the DL never occur.b. Safe state: A state is safe if the system can allocate resources to each process (up to its\\nmax.) in some order and still avoid DL.\\nA system is in safe state only if there exists a safe sequence.\\nc. In an Unsafe state, the operating system cannot prevent processes from requesting\\nresources in such a way that any deadlock occurs. It is not necessary that all unsafe states\\nare deadlocks; an unsafe state may lead to a deadlock.\\nd. The main key of the deadlock avoidance method is whenever the request is made for\\nresources then the request must only be approved only in the case if the resulting state is asafe state.\\ne. In a case, if the system is unable to fulfill the request of all processes, then the state of the\\nsystem is called unsafe.\\nf. Scheduling algorithm using which DL can be avoided by finding safe state. (Banker\\nAlgorithm)\\n2. Banker Algorithm\\na. When a process requests a set of resources, the system must determine whether allocating\\nthese resources will leave the system in a safe state. If yes, then the resources may beallocated to the process. If not, then the process must wait till other processes release\\nenough resources.\\n3. Deadlock Detection: Systems haven’t implemented deadlock-prevention or a deadlock avoidance\\ntechnique, then they may employ DL detection then, recovery technique.\\na. 
Single Instance of Each resource type (wait-for graph method)\\ni. A deadlock exists in the system if and only if there is a cycle in the wait-for graph.\\nIn order to detect the deadlock, the system needs to maintain the wait-for graph\\nand periodically system invokes an algorithm that searches for the cycle in the\\nwait-for graph.\\nb. Multiple instances for each resource type\\ni. Banker Algorithm\\n4. Recovery from Deadlock\\na. Process termination\\ni. Abort all DL processes\\nii. Abort one process at a time until DL cycle is eliminated.\\nb. Resource preemption\\ni. To eliminate DL, we successively preempt some resources from processes and\\ngive these resources to other processes until DL cycle is broken.\\nCodeHelp', metadata={'source': 'docs\\\\dataset\\\\OS_Full_Notes.pdf', 'page': 27})",
    "0.9011409"
  ],
  [
    "Document(page_content='LEC-21: Deadlock Part-1 \\n1. In Multi-programming environment, we have several processes competing for finite number of\\nresources\\n2. Process requests a resource (R), if R is not available (taken by other process), process enters in a\\nwaiting state. Sometimes that waiting process is never able to change its state because the resource,\\nit has requested is busy (forever), called DEADLOCK (DL)\\n3. Two or more processes are waiting on some resource’s availability, which will never be available as\\nit is also busy with some other process. The Processes are said to be in Deadlock.\\n4. DL is a bug present in the process/thread synchronization method.\\n5. In DL, processes never finish executing, and the system resources are tied up, preventing other jobs\\nfrom starting.\\n6. Example of resources: Memory space, CPU cycles, files, locks, sockets, IO devices etc.\\n7. Single resource can have multiple instances of that. E.g., CPU is a resource, and a system can have 2\\nCPUs.\\n8. How a process/thread utilize a resource?\\na. Request: Request the R, if R is free Lock it, else wait till it is available.b. Use\\nc. Release: Release resource instance and make it available for other processes\\n9. Deadlock Necessary Condition: 4 Condition should hold simultaneously.\\na. Mutual Exclusion\\ni.Only 1 process at a time can use the resource, if another process requests that  \\nresource, the requesting process must wait until the resource has been released.\\nb. Hold & Wait\\ni.A process must be holding at least one resource & waiting to acquire additional  \\nresources that are currently being held by other processes.\\nc.No-preemption\\ni.Resource must be voluntarily released by the process after completion of  \\nexecution. (No resource preemption)\\nd. Circular wait\\ni.A set {P0, P1, … ,Pn} of waiting processes must exist such that P0 is waiting for a  \\nresource held by P1, P1 is waiting for a resource held by P2, and so on.\\n10. 
Methods for handling Deadlocks:\\na. Use a protocol to prevent or avoid deadlocks, ensuring that the system will never enter a \\ndeadlocked state.\\nb. Allow the system to enter a deadlocked state, detect it, and recover.\\nCodeHelp', metadata={'source': 'docs\\\\dataset\\\\OS_Full_Notes.pdf', 'page': 25})",
    "0.9442713"
  ]
]

As you can see, it is a list of lists. How can I extract all of this data?

Thanks
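For reference, each string above is just the printed `repr` of a result. Assuming the list comes from a call such as `vectorstore.similarity_search_with_score(query)` (the question doesn't show the call, so this is an assumption), each entry is a `(Document, score)` pair and the fields can be read as attributes instead of being parsed out of text. The sketch below uses a small stand-in `Document` dataclass (hypothetical, for illustration only) with the same `page_content` and `metadata` attributes that LangChain's `Document` exposes; with real results you would drop the stand-in and pass the returned list to the hypothetical `extract_results` helper.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def extract_results(results):
    """Flatten (Document, score) pairs into plain dicts."""
    extracted = []
    for doc, score in results:
        extracted.append({
            "page_content": doc.page_content,
            "metadata": doc.metadata,
            # source and page are ordinary keys inside the metadata dict
            "source": doc.metadata.get("source"),
            "page": doc.metadata.get("page"),
            "score": float(score),
        })
    return extracted


# Sample data shaped like the output in the question (content shortened).
results = [
    (Document("LEC-11: Swapping | Context-Switching ...",
              {"source": "docs\\dataset\\OS_Full_Notes.pdf", "page": 15}), 0.6975052),
    (Document("LEC-22: Deadlock Part-2 ...",
              {"source": "docs\\dataset\\OS_Full_Notes.pdf", "page": 27}), 0.9011409),
]

for row in extract_results(results):
    print(row["source"], row["page"], row["score"])
    print(row["page_content"][:40])
```

When reading the score, keep in mind that for many vector stores it is a distance, so a lower value means a closer match; check your store's documentation before treating it as a similarity.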

There are 2 best solutions below


I suggest storing the metadata in a separate field and then inserting it into the vector database. The approach in @ZKS's answer may be useful in certain cases, but it might not be sufficient for implementing proper metadata filtering. If you have the time, see https://python.langchain.com/docs/integrations/retrievers/self_query/pinecone for more details.
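As a plain-Python illustration of why keeping `source` and `page` as separate metadata fields helps (this is not Pinecone's or LangChain's filter API, only a sketch over already-extracted rows; `filter_by_metadata` is a hypothetical helper), filtering becomes a dictionary lookup rather than string parsing:

```python
def filter_by_metadata(rows, **criteria):
    """Keep rows whose metadata matches every key/value pair in criteria."""
    return [
        row for row in rows
        if all(row["metadata"].get(key) == value for key, value in criteria.items())
    ]


# Rows shaped like the extracted results in the question (content shortened).
rows = [
    {"page_content": "LEC-11: Swapping ...", "score": 0.6975052,
     "metadata": {"source": "docs\\dataset\\OS_Full_Notes.pdf", "page": 15}},
    {"page_content": "LEC-22: Deadlock Part-2 ...", "score": 0.9011409,
     "metadata": {"source": "docs\\dataset\\OS_Full_Notes.pdf", "page": 27}},
]

for row in filter_by_metadata(rows, page=27):
    print(row["page_content"], row["score"])
```

In a real vector store the same idea is usually pushed into the search itself as a metadata filter, which is what the linked self-query page covers.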


You can try the code below:

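    import ast

    # your_data is the nested list shown in the question: [[document_string, score_string], ...]
    extracted_data = []
    # Build a list of dicts holding page_content, metadata and score
    for item in your_data:
        document_string = item[0]

        # Text between "page_content='" and the next single quote
        # (this stops early if the content itself contains a straight single quote)
        content_start = document_string.find("page_content='") + len("page_content='")
        content_end = document_string.find("'", content_start)
        page_content = document_string[content_start:content_end]

        # Dict literal between "metadata=" and the closing "})"
        metadata_start = document_string.find("metadata=") + len("metadata=")
        metadata_end = document_string.find("})", metadata_start) + 1
        metadata_str = document_string[metadata_start:metadata_end]

        # literal_eval only accepts Python literals, so unlike eval it cannot run arbitrary code
        metadata = ast.literal_eval(metadata_str)

        # The score was printed as a string, so convert it to a number
        score = float(item[1])

        extracted_data.append({
            'page_content': page_content,
            'metadata': metadata,
            'score': score
        })

    # Iterate over the list; source and page live inside metadata
    for item in extracted_data:
        print(item['page_content'])
        print(item['metadata']['source'], item['metadata']['page'])
        print(item['score'])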
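
If you only have the strings (for example, because the results were serialized to JSON as in the question), a sketch that is more robust than searching for quote characters is to parse each string as a Python call expression with the standard-library `ast` module and evaluate only the keyword arguments with `ast.literal_eval`. This assumes each string is a plain `Document(...)` repr whose argument values are literals, as in the output above; the helper name `parse_document_repr` and the sample `your_data` are hypothetical.

```python
import ast


def parse_document_repr(document_string):
    """Turn "Document(page_content='...', metadata={...})" into a dict of its keyword arguments."""
    call = ast.parse(document_string, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a Document(...) call expression")
    # literal_eval accepts an AST node and only evaluates literals, so nothing is executed
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


# Hypothetical sample in the same shape as the question's output (content shortened).
your_data = [
    ["Document(page_content='LEC-11: Swapping \\n1. Swapping', "
     "metadata={'source': 'docs\\\\dataset\\\\OS_Full_Notes.pdf', 'page': 15})",
     "0.6975052"],
]

extracted_data = []
for document_string, score in your_data:
    fields = parse_document_repr(document_string)
    metadata = fields.get("metadata", {})
    extracted_data.append({
        "page_content": fields.get("page_content", ""),
        "metadata": metadata,
        "source": metadata.get("source"),
        "page": metadata.get("page"),
        "score": float(score),
    })

for item in extracted_data:
    print(item["source"], item["page"], item["score"])
    print(item["page_content"])
```

Because the keyword arguments are read by name, this does not depend on field order or on the content being free of quote characters, and escape sequences such as `\n` are decoded into real newlines. If an argument is not a literal, `ast.literal_eval` raises `ValueError` instead of executing it.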