I have a collection of PDF files that I downloaded by web scraping. I can extract the tables from these PDFs and display them in a Jupyter notebook like this:
```python
import os
import camelot.io as camelot

folder = r'D:\Test'
arr = os.listdir(folder)  # arr is the list of PDF file names
for n, item in enumerate(arr, start=1):
    # os.listdir returns bare file names, so prepend the folder path
    tables = camelot.read_pdf(os.path.join(folder, item), pages='all', split_text=True)
    print(f'DATENBLATT {n}: {item}\n')
    for tabs in tables:
        print(tabs.df, "\n==============================================================================\n")
```
This gives me the results for two of the PDF files (the output is shown below).

Now I would like to know how I can get only specific data from the tables, for example the "Voltage" and "Current" entries. More specifically, I would like to extract user-defined (targeted) values and make charts from them instead of printing the whole tables.
Thanks in advance.
```text
DATENBLATT 1: HY-Energy-Plus-Peak-Pack-HYP-00-2972-R2.pdf

    0                       1
0   Part Number             HYP-00-2972
1   Voltage Nominal         51.8V
2   Voltage Range Min/Max   43.4V/58.1V
3   Charge Current          160A maximum \nDe-rated by BMS message over CA...
4   Discharge Current       300A maximum \nDe-rated by BMS message over CA...
5   Maximum Capacity        5.76kWh/111.4Ah
6   Maximum Energy Density  164Wh/kg
7   Useable capacity        Limited to 90% by BMS to improve cell life
8   Dimensions              W: 243 x L: 352 x H: 300.5mm
9   Weight                  37kg
10  Mounting Fixtures       4x M8 mounting points for easy secure mounting
11
==============================================================================
    0                                1
0   Communication Protocol           CAN bus at user selectable baud rate (propriet...
1   Reported Information             Cell Temperatures and Voltages, Pack Current, ...
2   Pack Protection Mechanism        Interlock to control external protection devic...
3   Balancing Method                 Actively controlled dissipative balancing
4   Multi-Pack Behaviour             BMS implements a single master and multi-slave...
5   Compatible Chargers as standard  Zivan, Victron, Delta-Q, TC-Charger, SPE. For ...
6   Charger Control                  Direct current control based on cell voltage/t...
7   Auxiliary Connectors             Binder 720-Series 8-way male & female
8   Power connectors                 4x Amphenol SurLok Plus 8mm \nWhen using batte...
9
==============================================================================
    0                             1
0   Max no of packs in series     10
1   Max Number of Parallel Packs  127
2   External System Requirements  External Protection Device (e.g. Contactor) co...
3
==============================================================================
DATENBLATT 2: HY-Energy-Standard-Pack-HYP-00-2889-R2.pdf

    0                       1
0   Part Number             HYP-00-2889
1   Voltage Nominal         44.4V
2   Voltage Range Min/Max   37.2V/49.8V
3   Charge Current          132A maximum \nDe-rated by BMS message over CA...
4   Discharge Current       132A maximum \nDe-rated by BMS message over CA...
5   Maximum Capacity        4.94kWh/111Ah
6   Maximum Energy Density  152Wh/kg
7   Useable capacity        Limited to 90% by BMS to improve cell life
8   Dimensions              W: 243 x L: 352 x H: 265mm
9   Weight                  32kg
10  Mounting Fixtures       4x M8 mounting points for easy secure mounting
11
==============================================================================
    0                                1
0   Communication Protocol           CAN bus at user selectable baud rate (propriet...
1   Reported Information             Cell Temperatures and Voltages, Pack Current, ...
2   Pack Protection Mechanism        Interlock to control external protection devic...
3   Balancing Method                 Actively controlled dissipative balancing
4   Multi-Pack Behaviour             BMS implements a single master and multi-slave...
5   Compatible Chargers as standard  Zivan, Delta-Q, TC-Charger, SPE, Victron, Bass...
6   Charger Control                  Direct current control based on cell voltage/t...
7   Auxiliary Connectors             Binder 720-Series 8-way male & female
8   Power connectors                 4x Amphenol SurLok Plus 8mm \nWhen using batte...
9
==============================================================================
    0                             1
0   Max no of packs in series     12
1   Max Number of Parallel Packs  127
2   External System Requirements  External Protection Device (e.g. Contactor) co...
3
==============================================================================
```
You can define a list of the strings you are interested in and then keep only the tables that contain at least one of them.
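A minimal sketch of that idea, assuming each camelot table is turned into a DataFrame through its `.df` attribute (as in the question's loop). The function name `tables_with_keywords` is hypothetical. It joins the keywords into one case-insensitive regular expression and keeps a table if any cell matches:

```python
import re
from typing import Iterable, List

import pandas as pd


def tables_with_keywords(dfs: Iterable[pd.DataFrame], keywords: List[str]) -> List[pd.DataFrame]:
    """Return only the DataFrames in which at least one cell contains a keyword."""
    # Build one pattern such as "Voltage|Current"; re.escape stops characters
    # like "(" or "." in a keyword from being read as regex syntax.
    pattern = "|".join(re.escape(k) for k in keywords)
    selected = []
    for df in dfs:
        # astype(str) makes every cell searchable; str.contains runs per column
        hits = df.astype(str).apply(lambda col: col.str.contains(pattern, case=False, regex=True))
        if hits.to_numpy().any():
            selected.append(df)
    return selected
```

With camelot this could be called as `tables_with_keywords([t.df for t in tables], ["Voltage", "Current"])`. Because every cell is searched, the second table of each datasheet is also selected: its value column mentions "Voltages" and "Pack Current".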
If you want to search for these strings only in specific places (for example, in the first column), you can use pandas DataFrame properties such as `iloc`.
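To get from there to individual values and charts, here is one possible sketch, not a definitive implementation. It assumes, as in the datasheets above, that column 0 holds the label and column 1 the value, and uses `iloc[:, 0]` so only the label column is searched. The helper names (`extract_rows`, `first_number`, `build_summary`, `plot_summary`) are hypothetical. `first_number` keeps only the first number in a cell, so "43.4V/58.1V" becomes 43.4. Plotting relies on `DataFrame.plot`, which needs matplotlib installed.

```python
import re
from typing import Dict, Iterable, List

import pandas as pd


def extract_rows(dfs: Iterable[pd.DataFrame], keywords: List[str]) -> Dict[str, str]:
    """Return {label: value} for rows whose FIRST column matches a keyword.

    Assumes label/value tables: label in column 0, value in column 1.
    """
    pattern = "|".join(re.escape(k) for k in keywords)
    found: Dict[str, str] = {}
    for df in dfs:
        if df.shape[1] < 2:
            continue  # skip tables without a value column
        labels = df.iloc[:, 0].astype(str)
        values = df.iloc[:, 1].astype(str)
        # Search only the label column, so text inside the values cannot match
        mask = labels.str.contains(pattern, case=False, regex=True)
        for label, value in zip(labels[mask], values[mask]):
            found[label.strip()] = value
    return found


def first_number(text: str) -> float:
    """First number in a cell, e.g. '51.8V' -> 51.8; NaN if the cell has none."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else float("nan")


def build_summary(per_file: Dict[str, List[pd.DataFrame]], keywords: List[str]) -> pd.DataFrame:
    """One row per PDF file, one numeric column per matched label."""
    rows = {
        name: {label: first_number(value) for label, value in extract_rows(dfs, keywords).items()}
        for name, dfs in per_file.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")


def plot_summary(summary: pd.DataFrame) -> None:
    """One bar chart per column, since the columns can have different units."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for plotting

    summary.plot(kind="bar", subplots=True, rot=0)
    plt.tight_layout()
    plt.show()
```

In the question's loop, `per_file` could be filled as `per_file[item] = [t.df for t in tables]`; then `plot_summary(build_summary(per_file, ["Voltage Nominal", "Charge Current", "Discharge Current"]))` draws one chart per matched label with one bar per datasheet.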