• dbvitapps 9 hours ago
    I've been working on a document processing API suite that solves a few problems I've run into over my career:

    - LLMs are great at answering questions about documents, but not so great when you need bounding boxes for those answers.

    - OCR engines give you bounding boxes, but they don't understand context.

    - Combining the two is a pain.

    So we are using vision models to solve this. The result is an easy-to-use API that answers any question you ask about a document and also returns geometry information for the evidence it found.
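
    To give a rough idea of what "answer plus geometry" could look like on the client side, here is a minimal sketch. The field names (`answer`, `evidence`, `page`, `bbox`, `text`), the sample values, and the normalized 0-1 coordinate convention are all assumptions for illustration, not the actual response schema of the API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Evidence:
    """One piece of supporting evidence (hypothetical shape)."""
    page: int
    bbox: Tuple[float, float, float, float]  # assumed normalized (x0, y0, x1, y1)
    text: str


def to_pixels(bbox, page_width, page_height):
    """Scale a normalized bounding box to pixel coordinates for a rendered page."""
    x0, y0, x1, y1 = bbox
    return (
        round(x0 * page_width),
        round(y0 * page_height),
        round(x1 * page_width),
        round(y1 * page_height),
    )


# Hypothetical response payload; a real one may be structured differently.
response = {
    "answer": "2024-03-15",
    "evidence": [
        {"page": 1, "bbox": [0.1, 0.2, 0.5, 0.25], "text": "Invoice date: 2024-03-15"},
    ],
}

evidence: List[Evidence] = [
    Evidence(page=e["page"], bbox=tuple(e["bbox"]), text=e["text"])
    for e in response["evidence"]
]

# Map each evidence box onto a page rendered at 1000 x 2000 pixels.
pixel_boxes = [to_pixels(ev.bbox, 1000, 2000) for ev in evidence]
```

    With boxes in pixel space, highlighting the evidence on the rendered page is just a matter of drawing rectangles.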

    We have a few more products lined up, but for now this is what you get:

    - Endpoints to ask a single question

    - A dashboard to define a collection of questions

    - AI-enhanced Markdown transforms (our techniques are great for these)

    More improvements are on the way, but it's looking good, and I wish I'd had something like this earlier.

    Let me know what you think. Any feedback is appreciated.

    Thanks!