Launch HN: Captain (YC W26) – Automated RAG for Files (runcaptain.com)
38 points by CMLewis 5 hours ago | 14 comments
- vg_head 4 hours ago
Looks good! I didn't get to watch the video or look at the docs in depth, but do the results trace back to where the answers are located in a document? Say it finds an answer in a PDF and I'd like to know where in that PDF the cited passage is. Is that possible or intended?
- CMLewis 3 hours ago
Great question. We have deterministic page-number citations for PDF results, and exact bounding-box citations are coming very soon.
If you want to check out the Query API response example, here's a link: https://docs.runcaptain.com/api-reference/query/collection-v...
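The thread doesn't describe how Captain produces its citations. As a minimal sketch of one common way to make page citations deterministic, assuming the parser yields text per page: record the page number on each chunk at parse time and return it with the retrieved chunk, so the citation comes from metadata rather than from model-generated text. The `Chunk`, `chunk_pages`, and `search` names are hypothetical, and the word-overlap scoring is a toy stand-in for embedding similarity; none of this is Captain's API.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A piece of text that remembers which PDF page it came from."""
    doc: str
    page: int  # 1-based page number, recorded at parse time
    text: str


def chunk_pages(doc: str, pages: list[str], max_words: int = 50) -> list[Chunk]:
    """Chunk each page separately so no chunk ever spans two pages."""
    chunks = []
    for page_no, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for i in range(0, len(words), max_words):
            chunks.append(Chunk(doc, page_no, " ".join(words[i:i + max_words])))
    return chunks


def search(chunks: list[Chunk], query: str) -> Chunk:
    """Toy retrieval: return the chunk sharing the most words with the query.

    A real pipeline would rank by embedding similarity instead.
    """
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.text.lower().split())))


pages = ["Revenue grew in Q1.", "The refund policy allows returns within 30 days."]
hit = search(chunk_pages("handbook.pdf", pages), "what is the refund policy")
print(f"{hit.doc}, page {hit.page}: {hit.text}")
```

Because no chunk crosses a page boundary, every hit maps to exactly one page; bounding-box citations would presumably also require the parser to keep per-span coordinates.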
- jzig 2 hours ago
This is an interesting product, thanks for sharing. Can you elaborate on some of your competitors in this landscape and what you might do differently compared to each one?
- mchusma 3 hours ago
Having tried this a bit, I really like the single API call for all of it.
I also appreciate the transparent pricing, but I don't have a clear sense of the scale of costs. It would help to give some ballpark examples for each plan, since I'm not sure exactly what I could get out of one. My best guess, after trying hard to figure it out, was that with about 1,000 pages of new/updated content per month, I would pay $295/month and get unlimited queries on top of that. Is that roughly correct?
- edgarbabajanyan 3 hours ago
Yes, we don't charge for queries. For $295, you can index up to 1,000 pages of new content per month into a fully queryable pipeline.
The choice between Advanced and Basic does matter, though. Advanced is for documents with complex graphics or charts; Basic is sufficient for most document workloads.
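The pricing exchange reduces to simple arithmetic. A minimal sketch using only the figures stated in the thread ($295/month, up to 1,000 new pages, queries free) and ignoring anything it doesn't specify, such as overage terms or any price difference between Advanced and Basic: the effective cost per indexed page is the plan price divided by the pages actually indexed. `effective_cost_per_page` is a hypothetical helper.

```python
def effective_cost_per_page(monthly_price: float, pages_indexed: int,
                            allowance: int = 1000) -> float:
    """Plan price spread over the pages actually indexed in a month.

    Queries are free per the thread, so query volume does not appear here.
    """
    if pages_indexed <= 0:
        raise ValueError("pages_indexed must be positive")
    if pages_indexed > allowance:
        # The thread doesn't say what happens past the allowance.
        raise ValueError("beyond the plan's page allowance")
    return monthly_price / pages_indexed


PLAN_PRICE = 295.0  # $/month, from the thread
for pages in (1000, 500, 250):
    print(f"{pages:>5} pages -> ${effective_cost_per_page(PLAN_PRICE, pages):.3f}/page")
```

At the full 1,000-page allowance this works out to $0.295 per page; indexing only 500 pages doubles the effective rate to $0.59.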
- jamiequint 4 hours ago
This is cool, like qmd as a service with real-time integrations where it matters?
How do you handle more structured data like CSV/XLSX/JSON? It would be cool if links (e.g. YouTube, podcasts, arbitrary websites) could be auto-processed into Markdown, à la https://github.com/steipete/summarize (which can pull the full text in addition to summarizing).
- CMLewis 4 hours ago
Thanks! We're just starting to optimize for semi-structured data. So far, we've been parsing tables into Markdown and running them through the contextualized embedding model with no chunk overlap, relying on how the model strings chunks together. That doesn't work well for big files, so we're experimenting with agentic exploration (slow, but good for more structured numerical data) and automated graph creation (promising for more relational data).
Love the auto-process-to-Markdown idea; we'll add it to our roadmap :D
- cleansy 2 hours ago
Just some unfiltered feedback after checking out the website: from what I understand, this is SaaS-only? So I'm basically being asked to upload ALL company docs to a company that has existed for about a minute, backed by a questionable SOC2 report. Soc2 is basically dead as a security artefact, and the data I'm asked to upload is sensitive by nature. I don't see that working.
- piker 2 hours ago
> Soc2 is basically dead as a security artefact
Can you expand on that?
- BoorishBears 2 hours ago
Are you writing the integrations listed there yourselves, or are you using something that manages the data connections?
- edgarbabajanyan 1 hour ago
We've built these integrations ourselves.
For larger enterprises that require governance and additional compliance, we've been relying on trusted partners to help establish a connection to Captain.
- jzig 4 hours ago
> spotty RAG
:O
- maxperience 2 hours ago
Interesting to see solutions still being developed for RAG. We developed a solution similar to yours: automatic indexing from GDrive, SharePoint, etc., followed by advanced hierarchical chunking, context-header-based Markdown conversion, and so on: all the tricks that were published last year while RAG was still the "new" kid in town. We eventually open-sourced everything because the competition from the big players (Notion AI, Google, etc.) was daunting. If anyone is interested, this blog post about all the techniques we tried and what actually works is still relevant and up to date: https://bytevagabond.com/post/how-to-build-enterprise-ai-rag...
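One reading of hierarchical chunking with context headers, sketched minimally (this is not the code from maxperience's project or blog post): split Markdown at its headings and prefix each section with the path of headings above it, so a chunk retrieved on its own still says where it sits in the document. `chunk_with_context_headers` is a hypothetical helper.

```python
import re

# A Markdown ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+)$")


def chunk_with_context_headers(markdown: str) -> list[str]:
    """Split Markdown at headings; prefix each section with its heading path."""
    path: list[str] = []    # heading stack, outermost to innermost
    body: list[str] = []    # lines of the section being collected
    chunks: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            prefix = " > ".join(path)
            chunks.append(f"{prefix}\n\n{text}" if prefix else text)
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before the path changes
            level = len(match.group(1))
            # Truncate the stack to the parent depth (approximate when
            # heading levels are skipped), then push the new heading.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Configure\nEdit the config file.\n"
for chunk in chunk_with_context_headers(doc):
    print(chunk, end="\n---\n")
```

This naive version treats any line starting with `#` and a space as a heading, so it would misread comment lines inside fenced code blocks; a production splitter would parse the Markdown properly.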
- macmac_mac 12 minutes ago
Thank you so much for this! I started reading it a few minutes ago and have already learnt quite a lot.
I like how clean and compressed the info is.