android

How We Parse Bank SMS On-Device Without Sending Your Data Anywhere

Most finance apps send your transactions to a server for processing. We built a regex-based SMS parser that runs entirely on your phone — here's exactly how it works.

By byterey
4 min read

Most expense tracker apps have a dirty secret: when they "auto-detect" your bank transactions, they're sending your SMS messages to their servers first.

We refused to do that with Artha. Every transaction SMS is parsed locally, on your device, with no network call. Here's the engineering behind it.

The Problem With Cloud SMS Parsing

Bank SMS messages look like this:

PHP 1,240.00 debited from a/c XX1234 on 19-Apr-26. Txn ID: UPI/Swiggy/123456

To extract the amount, merchant, and direction (debit vs credit) you need text parsing logic. The shortcut most apps take is shipping the raw SMS to a backend NLP service. Convenient — but your raw financial SMS is now on someone else's server.

We took a different route.

A Regex-Based Parser That Covers 95% of Cases

Philippine bank SMS messages follow predictable patterns per bank. We built an SmsParser class with a rule set per institution:

data class ParsedTransaction(
    val amount: Double,
    val isCredit: Boolean,
    val merchant: String?,
    val reference: String?,
)

class SmsParser {
    private val rules = listOf(
        BdoRule(),
        BpiRule(),
        UnionBankRule(),
        MetrobankRule(),
        GenericRule(),
    )

    fun parse(sender: String, body: String): ParsedTransaction? =
        rules.firstOrNull { it.matches(sender) }?.parse(body)
}
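
The parser depends on an SmsRule type that isn't shown here. A minimal sketch of what that contract might look like, assuming only the two methods SmsParser calls. CatchAllRule is a hypothetical stand-in for GenericRule, whose real implementation may differ, and ParsedTransaction is repeated so the snippet compiles on its own.

```kotlin
// Repeated from above so the snippet is self-contained.
data class ParsedTransaction(
    val amount: Double,
    val isCredit: Boolean,
    val merchant: String?,
    val reference: String?,
)

// Assumed shape of the rule contract: one method to claim a sender,
// one to turn a message body into a transaction (or null if it doesn't fit).
interface SmsRule {
    fun matches(sender: String): Boolean
    fun parse(body: String): ParsedTransaction?
}

// Hypothetical catch-all rule, placed last so bank-specific rules win.
class CatchAllRule : SmsRule {
    private val pattern = Regex(
        """PHP\s?([\d,]+\.?\d*)\s+(debited|credited)""",
        RegexOption.IGNORE_CASE
    )

    // Accepts any sender; it is only reached if no earlier rule claimed the message.
    override fun matches(sender: String) = true

    override fun parse(body: String): ParsedTransaction? {
        val match = pattern.find(body) ?: return null
        val amount = match.groupValues[1].replace(",", "").toDoubleOrNull() ?: return null
        return ParsedTransaction(
            amount = amount,
            isCredit = match.groupValues[2].equals("credited", ignoreCase = true),
            merchant = null,   // merchant extraction is left out of this sketch
            reference = null,  // likewise for the reference
        )
    }
}
```

Because parse() picks the first rule whose matches() returns true, a catch-all like this has to sit at the end of the list or it would shadow every bank-specific rule.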

Each Rule is a small, testable class:

class BdoRule : SmsRule {
    override fun matches(sender: String) =
        sender.contains("BDO", ignoreCase = true)

    private val debitPattern = Regex(
        """PHP\s?([\d,]+\.?\d*)\s+debited""",
        RegexOption.IGNORE_CASE
    )
    private val creditPattern = Regex(
        """PHP\s?([\d,]+\.?\d*)\s+credited""",
        RegexOption.IGNORE_CASE
    )

    override fun parse(body: String): ParsedTransaction? {
        val debit = debitPattern.find(body)
        val credit = creditPattern.find(body)
        val match = debit ?: credit ?: return null
        val amount = match.groupValues[1].replace(",", "").toDoubleOrNull() ?: return null
        return ParsedTransaction(
            amount = amount,
            isCredit = debit == null, // match prefers debit, so it is a credit only when no debit matched
            merchant = extractMerchant(body),
            reference = extractReference(body),
        )
    }
}
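
BdoRule calls extractReference, which isn't shown here. One way it could look, assuming the reference always follows a "Txn ID:" label as in the sample message. The pattern and the TXN_ID name are illustrative, not the production ones.

```kotlin
// Assumed format: the reference follows a "Txn ID:" label, as in the sample SMS.
val TXN_ID = Regex("""Txn\s*ID:\s*(\S+)""", RegexOption.IGNORE_CASE)

// Returns the first whitespace-delimited token after the label,
// minus any trailing sentence punctuation, or null if there is no label.
fun extractReference(body: String): String? =
    TXN_ID.find(body)?.groupValues?.get(1)?.trimEnd('.', ',')
```

Banks that label the reference differently would need their own pattern, which is one more reason to keep this logic inside each rule.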

Merchant Extraction Is the Hard Part

Amounts are easy — they're always numeric. Merchants are not standardised at all. We use a combination of:

  1. Known merchant alias map: a local Map<String, String> of ~400 merchant identifiers to display names, compiled from user-reported SMS samples
  2. Heuristic extraction: if the alias map misses, we fall back to the token that follows a known keyword (at, merchant:, or the UPI handle segment)
  3. Graceful fallback: if we still can't identify the merchant, we surface "Unknown merchant" and let the user label it once. That label is stored locally and reused next time

// Compiled once; building a Regex on every call is wasted work.
private val UPI_MERCHANT = Regex("""UPI/([^/]+)/""")
// \b keeps "at" from matching inside words such as "that".
private val AT_MERCHANT = Regex("""\bat\s+([A-Z][A-Za-z\s]+)""")

fun extractMerchant(body: String): String? {
    // Prefer the UPI handle segment, then fall back to the token after "at".
    val raw = UPI_MERCHANT.find(body)?.groupValues?.get(1)
        ?: AT_MERCHANT.find(body)?.groupValues?.get(1)?.trim()
        ?: return null
    // Map the raw identifier to a display name when the alias map knows it.
    return MERCHANT_ALIASES[raw] ?: raw
}
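
The "label it once" fallback can be sketched as a small lookup layer in front of the extracted token. This is an illustration under assumptions, not Artha's implementation: MerchantLabelStore is a hypothetical name, labels are assumed to be keyed by the raw extracted token, and an in-memory map stands in for whatever local storage the app uses.

```kotlin
// Hypothetical store for user-supplied labels. An in-memory map stands in
// for the app's real local persistence.
class MerchantLabelStore {
    private val labels = mutableMapOf<String, String>()

    // Normalise keys so " SWGY123 " and "swgy123" resolve to the same label.
    private fun key(raw: String) = raw.trim().lowercase()

    // Called once, when the user names a merchant the parser could not.
    fun label(rawMerchant: String, displayName: String) {
        labels[key(rawMerchant)] = displayName
    }

    // User label first, then the raw token, then the placeholder
    // when nothing was extracted at all.
    fun resolve(rawMerchant: String?): String =
        rawMerchant?.let { labels[key(it)] ?: it } ?: UNKNOWN

    companion object {
        const val UNKNOWN = "Unknown merchant"
    }
}
```

Since the store only ever sees the extracted token and the user's label, it does not need the raw SMS body.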

Why Not Use ML?

We evaluated on-device ML (TFLite text classification) for merchant categorisation. The result: 87% accuracy, a 4 MB model, and 120 ms cold inference time.

The regex and alias map approach reached 94% accuracy on our test corpus, with 0 MB of extra app size and a 2 ms parse time.

For a well-scoped problem like parsing structured bank SMS, rules beat models. We can also fix a misparse with a one-line alias update rather than retraining a model and shipping an update.

Testing the Parser

Every bank rule has a test corpus of real (anonymised) SMS samples:

@Test
fun `BDO debit SMS extracts correct amount and merchant`() {
    val result = BdoRule().parse(
        "PHP 1,240.00 debited from a/c XX1234 on 19-Apr-26. Txn ID: UPI/Swiggy/123456"
    )
    assertNotNull(result)
    assertEquals(1240.0, result!!.amount, 0.001) // explicit delta: comparing doubles without one fails under JUnit 4
    assertFalse(result.isCredit)
    assertEquals("Swiggy", result.merchant)
}

The parser is entirely pure Kotlin — no Android dependencies — so the full test suite runs in milliseconds on the JVM without an emulator.
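
A corpus like this lends itself to a table-driven check. The sketch below assumes a simplified amount-only parser (parseAmount) in place of a full rule, and the Sample type and corpus entries are illustrative. A real corpus would hold the anonymised samples mentioned above.

```kotlin
// One corpus entry: a raw SMS body and the amount we expect (null = should not parse).
data class Sample(val body: String, val expectedAmount: Double?)

// Stand-in for a real rule's parse(); only the amount is extracted here.
val AMOUNT = Regex("""PHP\s?([\d,]+\.?\d*)\s+(?:debited|credited)""", RegexOption.IGNORE_CASE)

fun parseAmount(body: String): Double? =
    AMOUNT.find(body)?.groupValues?.get(1)?.replace(",", "")?.toDoubleOrNull()

// Returns the samples whose parsed amount differs from the expectation,
// so one run reports every regression instead of stopping at the first.
fun failures(corpus: List<Sample>): List<Sample> =
    corpus.filter { parseAmount(it.body) != it.expectedAmount }

// Illustrative entries only.
val corpus = listOf(
    Sample("PHP 1,240.00 debited from a/c XX1234 on 19-Apr-26. Txn ID: UPI/Swiggy/123456", 1240.0),
    Sample("PHP500 credited to a/c XX1234", 500.0),
    Sample("Your OTP is 482913. Do not share it.", null),
)
```

Including negative samples such as OTP messages is worth the extra lines, since a rule that parses too eagerly is as much a bug as one that misses a transaction.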

The Privacy Guarantee

Because parsing is synchronous and in-process:

  • No network permission is needed for this feature
  • No SMS content ever leaves the process boundary
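  • The SmsReceiver BroadcastReceiver reads, parses, and discards the raw body in one call: nothing is logged, persisted raw, or transmitted

import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import android.provider.Telephony

class SmsReceiver : BroadcastReceiver() {
    private val parser = SmsParser()

    override fun onReceive(context: Context, intent: Intent) {
        // getMessagesFromIntent returns one SmsMessage per PDU, so a long SMS arrives in parts
        val parts = Telephony.Sms.Intents.getMessagesFromIntent(intent) ?: return
        parts.groupBy { it.originatingAddress.orEmpty() }.forEach { (sender, segments) ->
            val body = segments.joinToString("") { it.messageBody.orEmpty() }
            val parsed = parser.parse(sender, body)
            if (parsed != null) {
                TransactionRepository.get(context).stageParsed(parsed)
            }
            // body is not stored anywhere after this point
        }
    }
}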

What We Learned

Building an on-device parser taught us one thing clearly: the best privacy architecture is one where the sensitive data never gets the chance to leave. Not encryption, not access controls, not legal agreements. Data that never moves can't be intercepted.

For Artha, that design constraint pushed us toward better engineering: smaller, faster, more testable code that also happens to be the most trustworthy option for users.

If you're building a finance app and wrestling with the same trade-off, we'd love to hear how you approached it.

About the Author

byterey

Software engineer building Artha — a privacy-first expense tracker that runs entirely on-device. These essays are the working notes behind the product.