Repopack
I’ve tried several things to keep my Claude projects up to date as I develop software with LLMs. I used to keep a Finder window open with all the relevant files selected, and I’d remove and re-add all the files on a regular cadence.
More recently, I started using Warp to repopack my files (tee hee). I’d give it a prompt like "command that packs all my markdown (.md) files together in a docs.md file," and it would give me back something like: find . -type f -name "*.md" -exec cat {} + > docs.md
And all was right with the world. I’d move that file to the Claude project, and life was easier.
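For comparison, here is a minimal Python sketch of that same pre-Repopack workflow. It is not what Warp produced; pack_markdown and the docs.md default are illustrative names. One detail it handles explicitly is skipping the output file, since a docs.md sitting in the same tree also matches the *.md pattern.

```python
from pathlib import Path


def pack_markdown(root: str, output_name: str = "docs.md") -> Path:
    """Concatenate every .md file under root into root/output_name."""
    root_path = Path(root)
    output_path = root_path / output_name
    # Collect the sources first and skip the output file, which also matches *.md.
    sources = sorted(
        p for p in root_path.rglob("*.md")
        if p.is_file() and p.resolve() != output_path.resolve()
    )
    parts = []
    for source in sources:
        text = source.read_text(encoding="utf-8")
        # Make sure one file's last line does not run into the next file's first line.
        parts.append(text if text.endswith("\n") else text + "\n")
    output_path.write_text("".join(parts), encoding="utf-8")
    return output_path
```

Calling pack_markdown(".") from the repository root would refresh docs.md in place, and running it twice gives the same result because the previous docs.md is never read back in.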
Repopack is designed to solve this problem, and does it well. It keeps my repository synchronized with Claude Projects.
Introduction
For giggles, I used repopack to repopack repopack and put the result in a Claude project. Then I asked Claude to:
Explain to me this application and what it's for
Repopack is a tool designed to pack an entire code repository into a single, AI-friendly file format. Its primary purpose is to facilitate easier analysis and processing of codebases by AI systems like large language models (LLMs), including tools such as Claude, ChatGPT, and Gemini.
Key features of Repopack include:
- AI Optimization: It formats the codebase in a way that’s easy for AI to understand and process.
- Token Counting: Provides token counts for each file and the entire repository, which is useful for managing LLM context limits.
- Customization: Users can configure what to include or exclude from the packed output.
- Git-Aware: Automatically respects .gitignore files.
- Security Focus: Incorporates Secretlint for security checks to detect and prevent the inclusion of sensitive information.
- Multiple Output Formats: Supports plain text, XML, and Markdown output styles.
- Remote Repository Processing: Can process remote Git repositories without manual cloning.
The tool is particularly useful for developers and teams who want to:
- Perform code reviews using AI tools
- Generate documentation for their projects
- Create test cases
- Assess code quality
- Get a high-level overview of a library or project
Repopack can be used via command line interface (CLI) and offers various options for customization through a configuration file (repopack.config.json). It’s designed to be simple to use while offering powerful features for more advanced use cases.
The packed output file typically includes:
- A summary of the repository
- The repository structure
- The content of all included files
- Custom instructions (if provided)
By consolidating an entire codebase into a single, well-structured file, Repopack makes it easier for developers to leverage AI tools for various software development tasks, potentially improving productivity and code quality.
XML output
I immediately saw the value in this repository and started using it to develop serious software with LLMs, because it produces LLM-friendly XML output.
Anecdotally, I can see why this improves the quality of the context to the LLM.
Anthropic even recommends this approach.
The output includes not only your code but also metadata about the repository.
Note that some of this metadata is intended to inform the LLM, which I think is super awesome.
So, I asked the model to:
Analyze the code, and this article and explain to me how and why the repository exports the files with xml. Also talk about the metadata that's included in that xml file (the file tree, etc)
Which it did a fine job of:
- The File Summary (<file_summary>) tag explains the purpose of the packed file
- Repository Structure (<repository_structure>) tag is a tree-like representation of the directory structure that includes all directories and files, even those not included in the packed content (e.g., binary files)
- Repository Files (<repository_files>) contains each file’s content, wrapped in a <file> tag with a path attribute
- Custom Instructions (<instruction>) tag includes custom instructions about the project or how to interpret the code
- Generation Information includes a header with the generation date and tool information
- A User-Provided Header is included in the <user_provided_header> tag within the <additional_info> section
Repopack uses XML because:
- XML provides a hierarchical structure that clearly separates different parts of the repository content. This structure makes it easier for AI models to parse and understand the content.
- XML tags help AI models like Claude parse prompts more accurately. This leads to higher-quality outputs and reduces the chances of misinterpretation.
- XML tags make it easy to find, add, remove, or modify specific parts of the content without rewriting everything. This is beneficial for both humans and AI when working with the packed repository.
- The XML structure makes it easier to extract specific parts of the content through post-processing, which can be useful for various AI-assisted tasks.
- The use of XML aligns with recommended practices for interacting with AI models, as outlined in the article. This makes Repopack’s output more effective when used with AI tools.
The metadata included in the XML file is the set of tagged sections listed above.
This structured approach, combining XML tags with comprehensive metadata, makes Repopack’s output highly suitable for AI consumption. It provides clear context, structure, and content that AI models can easily parse and utilize for various code analysis and processing tasks.
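To make the post-processing point concrete, here is a minimal Python sketch that pulls individual files back out of a packed file. The SAMPLE text is a hypothetical, heavily trimmed stand-in that uses only the tag names mentioned above, not real Repopack output, and extract_files is an illustrative helper. It uses a regular expression rather than an XML parser on the assumption that packed source code may contain unescaped < and & characters.

```python
import re

# Hypothetical sample built from the tag names described above; real output has more.
SAMPLE = """<file_summary>
Packed representation of the repository.
</file_summary>
<repository_structure>
src/
  index.ts
README.md
</repository_structure>
<repository_files>
<file path="src/index.ts">
export const answer = 42;
</file>
<file path="README.md">
# Demo
</file>
</repository_files>
"""

# Match each <file path="..."> block and capture its path and body.
FILE_BLOCK = re.compile(r'<file path="([^"]+)">\n(.*?)\n</file>', re.DOTALL)


def extract_files(packed: str) -> dict[str, str]:
    """Map each packed file path to its content."""
    return {path: body for path, body in FILE_BLOCK.findall(packed)}


files = extract_files(SAMPLE)
print(sorted(files))
```

A file whose own content contains a literal </file> line would be cut short by this pattern, so treat it as a quick inspection aid rather than a faithful unpacker.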
Customization
I will commonly slice and dice the context I feed to Claude in my projects.
So, I need ways to only send certain files to the model, depending on what I’m working on.
Back in the stone age, I’d just drag and drop those files into my project.
Then, a little later, I’d use Warp to say things like "command that puts all my markdown files (.md) into a single md file."
That’s amazing, the world is a better place.
With repopack, I can now define my settings in a configuration file that is applied every time I call repopack.
It also supports a --config argument, so I can define multiple configurations for different use cases.
I can have:
- A config that stuffs my docs into a file
- One for the code
- One for each sub project I’m working on
- One for a specific type of feature I commonly develop
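As a rough sketch of that setup, the Python below writes two separate config files and returns the command for each one. The file names, output paths, and glob patterns are hypothetical, and the nested JSON layout with include as a list is my assumption about how the config file is shaped.

```python
import json
from pathlib import Path

# Hypothetical per-use-case configs; names, paths, and patterns are illustrative.
CONFIGS = {
    "repopack.docs.json": {
        "output": {"filePath": "docs.xml", "style": "xml"},
        "include": ["**/*.md"],
    },
    "repopack.code.json": {
        "output": {"filePath": "code.xml", "style": "xml"},
        "include": ["src/**/*"],
    },
}


def write_configs(directory: str) -> list[str]:
    """Write each config to disk and return the matching CLI commands."""
    commands = []
    for name, config in CONFIGS.items():
        path = Path(directory) / name
        path.write_text(json.dumps(config, indent=2), encoding="utf-8")
        # --config is the flag the article mentions for picking a config file.
        commands.append(f"repopack --config {path}")
    return commands
```

Each returned command can then be run whenever that slice of the project needs refreshing.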
Check out all the configuration it supports:
Output Configuration
- output.filePath: Set the name and path of the output file.
- output.style: Choose the output style (‘plain’, ‘xml’, or ‘markdown’).
- output.headerText: Add custom text to the file header.
- output.instructionFilePath: Specify a file with custom instructions.
- output.removeComments: Toggle comment removal in supported file types.
- output.removeEmptyLines: Toggle empty line removal.
- output.showLineNumbers: Add line numbers to each line in the output.
- output.topFilesLength: Set the number of top files to display in the summary.
Include/Ignore Patterns
- include: Specify patterns of files to include.
- ignore.useGitignore: Toggle using patterns from .gitignore.
- ignore.useDefaultPatterns: Toggle using default ignore patterns.
- ignore.customPatterns: Add custom ignore patterns.
Security Settings
- security.enableSecurityCheck: Toggle the security check feature.
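The dotted names above read like paths into a nested JSON document. Assuming that is how repopack.config.json is laid out, this small Python sketch expands dotted option names into that nested shape. The nest helper is hypothetical, and the values are illustrative rather than Repopack’s defaults.

```python
import json


def nest(dotted: dict) -> dict:
    """Expand dotted option names such as 'output.style' into nested dicts."""
    result: dict = {}
    for key, value in dotted.items():
        *parents, leaf = key.split(".")
        node = result
        for parent in parents:
            # Walk down, creating intermediate objects as needed.
            node = node.setdefault(parent, {})
        node[leaf] = value
    return result


# Illustrative values only; they are not Repopack's defaults.
options = {
    "output.filePath": "repopack-output.xml",
    "output.style": "xml",
    "output.removeComments": False,
    "output.showLineNumbers": True,
    "output.topFilesLength": 5,
    "include": ["**/*.md"],
    "ignore.useGitignore": True,
    "ignore.customPatterns": ["**/*.log"],
    "security.enableSecurityCheck": True,
}

print(json.dumps(nest(options), indent=2))
```

The printed JSON groups everything under output, ignore, and security objects, with include left at the top level.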
Rich CLI
If it’s not worth throwing the config together, and you just want to rip a context for the LLM that you’ll probably use again, repopack has a rich CLI that supports many of the configs.
- -o, --output <file> specifies the name of the output file.
- --include <patterns> lists include patterns (comma-separated) to select specific files or directories.
- -i, --ignore <patterns> specifies additional ignore patterns (comma-separated) to exclude files or directories.
- -c, --config <path> sets the path to a custom config file.
- --top-files-len <number> specifies the number of top files to display in the summary.
- --output-show-line-numbers adds line numbers to each line in the output.
- --style <type> specifies the output style (plain, xml, or markdown).
So you can just run repopack --include "ToDo.Domain/**/*" --output todo-domain.txt to get the one sub-project.
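If you script these one-off runs, a small wrapper can assemble the arguments for you. The sketch below is a hypothetical Python helper (build_command is my own name) that uses only the flags listed above and joins multiple patterns with commas, as the flag descriptions say. It builds the command without running it.

```python
import shlex


def build_command(include=None, ignore=None, output=None, style=None,
                  top_files_len=None, show_line_numbers=False, config=None):
    """Build a repopack argument list from the flags listed above."""
    args = ["repopack"]
    if config:
        args += ["--config", config]
    if include:
        # Multiple patterns are passed as one comma-separated value.
        args += ["--include", ",".join(include)]
    if ignore:
        args += ["--ignore", ",".join(ignore)]
    if output:
        args += ["--output", output]
    if style:
        args += ["--style", style]
    if top_files_len is not None:
        args += ["--top-files-len", str(top_files_len)]
    if show_line_numbers:
        args.append("--output-show-line-numbers")
    return args


command = build_command(include=["ToDo.Domain/**/*"], output="todo-domain.txt")
# shlex.join quotes the glob so the shell does not expand it.
print(shlex.join(command))
```

With repopack installed, the same list could be passed to subprocess.run(command, check=True) instead of being printed.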
Conclusion
I love using LLMs to generate code, and I’m always looking for new ways to bump my productivity and quality.
I’ve only been using this product for one day, and it’s already leveled up my workflow.
Check it out and star it if you like it.
There’s a man behind the legend 😉.