I'm new to Kotlin. I am creating an application in which, after the user picks a PDF file, they see fragments of the extracted text. Unfortunately, every time I read text from the PDF file, it is unreadable, e.g.:
%PDF-1.4
%����
1 0 obj
<</Title (MojPdf)
/Producer (Skia/PDF m123 Google Docs Renderer)>>
endobj
3 0 obj
<</ca 1
/BM /Normal>>
endobj
5 0 obj
<</Filter /FlateDecode
/Length 326>> stream
x��SQN�0��)r�evllj��`l���&!����D�n*sBk�6ʋ_^�m�P��'s������~������x��arػ����5\{�H���v��Ac{+�K�c����[n*������+���J�w�d��*1e߽�??�[߽9�!BV�\�Db�䀂:���n��!�\�ϋ��R(',�)���� Z�V=P�KB.4و��Q3F�:b}9�Ιe�!wCa@�Z��4��tDV�B�??%J,�M��??P,*z.��+�����qm5e�����ej��F5��d��l9��m�@�_u�Q�v#����[�}V�(;
and so on...
I have already tried various approaches: different charsets (UTF-8 and others), Reader and BufferedReader... This is the method I use to get the text from the PDF:
val result = remember { mutableStateOf<Uri?>(null) }
// Compose state, so the UI recomposes when these values change
var stringResult by remember { mutableStateOf("") }
var stringDienst by remember { mutableStateOf("") }
val applicationContext = LocalContext.current
val contentResolver = applicationContext.contentResolver

@Throws(IOException::class)
fun readTextFromUri(uri: Uri): String {
    // Opens the picked file and decodes its raw bytes as UTF-8 text
    val text = contentResolver.openInputStream(uri)?.use { inputStream ->
        BufferedReader(InputStreamReader(inputStream, "UTF-8")).use { reader ->
            reader.readText() // reads the whole file in one call
        }
    } ?: ""
    Log.d(TAG, "text: $text")
    return text
}

val launcher = rememberLauncherForActivityResult(ActivityResultContracts.OpenDocument()) { uri ->
    result.value = uri
    if (uri != null) {
        stringResult = readTextFromUri(uri)
    }
}

Column {
    Row {
        Button(onClick = {
            launcher.launch(arrayOf("application/pdf"))
        }) {
            Text(text = "Select Document")
        }
    }
    Row {
        Text(text = "stringDienst: $stringDienst")
    }
}
After selecting the file and running the method, the text is completely unreadable. Thanks for any help.
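PDF is not a plain-text format. It is a binary container of numbered objects, and the page text usually lives inside compressed content streams (note the /Filter /FlateDecode in your output), drawn with font-specific character codes. Decoding the file's bytes with any charset therefore produces garbage, which is why switching from UTF-8 to another encoding makes no difference.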
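A small, self-contained sketch of why the charset cannot fix this: it compresses a made-up content stream with zlib (the format behind /FlateDecode) and then looks at the bytes as UTF-8. The function names flateEncode and flateDecode and the sample content string are hypothetical, chosen only for illustration.

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.DeflaterOutputStream
import java.util.zip.InflaterInputStream

// Compresses bytes with zlib, as a PDF writer does for a /FlateDecode stream.
fun flateEncode(data: ByteArray): ByteArray {
    val out = ByteArrayOutputStream()
    // The default Deflater writes a zlib header starting with byte 0x78 ('x')
    DeflaterOutputStream(out).use { it.write(data) }
    return out.toByteArray()
}

// Reverses the compression: zlib-inflates the stream bytes.
fun flateDecode(data: ByteArray): ByteArray =
    InflaterInputStream(data.inputStream()).use { it.readBytes() }
```

With a content such as "BT /F1 12 Tf 72 712 Td (Hello) Tj ET", String(flateEncode(content), Charsets.UTF_8) starts with 'x' followed by U+FFFD (the replacement character shown as � in the question), much like the stream line in the output above. That decoding is also lossy: encoding the string back to UTF-8 does not give the original bytes. Only inflating recovers the content, and even then it is a list of drawing operators, not plain text.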
If you want to parse it yourself, you can find the specification here: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf
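To give a feel for what parsing by hand involves, here is a deliberately naive sketch, not a real extractor. It assumes the text sits in Flate-compressed stream objects (data between the stream and endstream keywords, as in the output above) and is shown with the Tj operator on simple literal strings. It ignores TJ arrays, hex strings, escape sequences, font encodings, object streams and encryption, so it will miss text in many real files. The names inflateOrNull and naiveExtractText are hypothetical.

```kotlin
import java.io.IOException
import java.util.zip.InflaterInputStream

// Inflates zlib (FlateDecode) data; returns null if the bytes are not valid zlib.
fun inflateOrNull(data: ByteArray): ByteArray? = try {
    InflaterInputStream(data.inputStream()).use { it.readBytes() }
} catch (e: IOException) {
    null
}

// Naive sketch: inflates every "stream ... endstream" block and collects the
// literal strings drawn with the Tj operator.
fun naiveExtractText(pdf: ByteArray): List<String> {
    // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets
    val raw = String(pdf, Charsets.ISO_8859_1)
    val tj = Regex("""\(([^()\\]*)\)\s*Tj""")
    val found = mutableListOf<String>()
    var pos = 0
    while (true) {
        val keyword = raw.indexOf("stream", pos)
        if (keyword < 0) break
        // The "stream" keyword is followed by CRLF or LF before the data starts
        var dataStart = keyword + "stream".length
        dataStart += when {
            raw.startsWith("\r\n", dataStart) -> 2
            raw.startsWith("\n", dataStart) -> 1
            else -> 0
        }
        val end = raw.indexOf("endstream", dataStart)
        if (end < 0) break
        // Non-Flate streams (e.g. images) fail to inflate and are skipped
        inflateOrNull(pdf.copyOfRange(dataStart, end))?.let { bytes ->
            val content = String(bytes, Charsets.ISO_8859_1)
            tj.findAll(content).forEach { found += it.groupValues[1] }
        }
        pos = end + "endstream".length
    }
    return found
}
```

In many real files the string bytes are font-specific glyph codes rather than characters, so even a correct version of this loop has to map them through each font's encoding. That extra work is the main reason a library is the more practical route.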
Or, preferably, import one of the many Java/Kotlin PDF libraries (for example Apache PDFBox, or its Android port PdfBox-Android) and use it to extract the text.