Text and Code Extractor for LLM Context
Fuel Your LLM: Extract Text and Code from Any File
This script is a recursive text miner that simplifies the process of gathering text data from various file types within a directory, making it ready for use with Large Language Models (LLMs).
Given a directory path, the script recursively scans every file in that folder, regardless of type, and extracts both English prose and code snippets using a hybrid approach:
- Heuristics: Rules that identify common patterns in English text and code
- Regular Expressions: Precise pattern matching to capture specific language structures
- Syntax Highlighting: Provides a creative, implicit code identification signal
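To make the first two signals concrete, here is a minimal sketch of how a line might be labeled as prose or code. The patterns, the thresholds, and the name classify_line are illustrative assumptions and not the script's actual rules; the syntax-highlighting signal is not covered here.

```python
import re

# Hypothetical regular expression for code-like lines (an assumption for
# illustration; the real script's patterns are not shown in this document).
CODE_PATTERN = re.compile(
    r"[{};]\s*$"                                          # ends with a brace or semicolon
    r"|^\s*(?:def|class|function|const|import)\s+\w+"     # starts with a declaration keyword
    r"|=>|==|!=|\(\)"                                     # common operators and empty calls
    r"|</?\w+[^>]*>"                                      # HTML/JSX-style tags
)
PROSE_WORD = re.compile(r"[A-Za-z]{2,}")


def classify_line(line: str) -> str:
    """Label a line as 'code', 'prose', 'blank', or 'other' using crude rules."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    # Regular-expression signal: explicit code structures win first.
    if CODE_PATTERN.search(stripped):
        return "code"
    # Heuristic signal: several words and mostly letters/spaces suggests prose.
    words = PROSE_WORD.findall(stripped)
    plain = sum(c.isalpha() or c.isspace() for c in stripped)
    if len(words) >= 3 and plain / len(stripped) >= 0.8:
        return "prose"
    return "other"
```

Rules this simple will misjudge some lines (for example, prose that happens to begin with "class"), which is presumably why the script combines more than one signal.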
Usage
- Extract and combine English prose and code into a single text file.
- Upload your text file into your preferred LLM’s context window.
- Craft prompts that require quick analysis across a large body of text.
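The first step above can be sketched roughly as follows. This is a simplified stand-in, not the script itself: it assumes UTF-8 input, skips anything that looks binary (such as .png files), and concatenates whole files instead of applying the prose and code detection. The names read_text_or_none and combine_folder, and the "=====" separator, are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_text_or_none(path: Path) -> Optional[str]:
    """Return the file's text, or None if it looks binary or is not valid UTF-8."""
    data = path.read_bytes()
    if b"\x00" in data:  # NUL bytes are a common sign of binary data
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def combine_folder(root: Path, output: Path) -> int:
    """Concatenate every readable text file under root into output.

    Returns the number of files that were written.
    """
    written = 0
    with output.open("w", encoding="utf-8") as out:
        for path in sorted(root.rglob("*")):  # rglob walks subfolders recursively
            # Skip directories and the output file itself if it lives under root.
            if not path.is_file() or path.resolve() == output.resolve():
                continue
            text = read_text_or_none(path)
            if text is None:
                continue
            # A separator line keeps track of where each file's text came from.
            out.write(f"\n===== {path.relative_to(root)} =====\n")
            out.write(text)
            written += 1
    return written
```

For example, combine_folder(Path("my_site"), Path("output_text.txt")) would produce a single file ready to upload, where my_site is a placeholder folder name.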
Code
Example
Input
A single folder containing 219 files and 21 folders, with file types .js, .mjs, .md, .png, .scss, .txt, .ts, and .tsx, totaling 2.27 MB
Output
output_text.txt: 534 KB, 58,067 words, and 507,061 characters.
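Figures like these could be reproduced with a few lines of code. The sketch below assumes that a "word" is any whitespace-separated token and that size is measured as UTF-8 bytes; the script's own counting method is not documented here, and text_stats is a hypothetical helper name.

```python
def text_stats(text: str) -> dict:
    """Count words, characters, and UTF-8 bytes in a string."""
    return {
        "words": len(text.split()),           # whitespace-separated tokens
        "characters": len(text),              # Unicode code points
        "bytes": len(text.encode("utf-8")),   # encoded size; can exceed the character count
    }
```

Because non-ASCII characters take more than one byte in UTF-8, the byte count of a file can be larger than its character count.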
Prompt
Text prompt, with output_text.txt uploaded as context:
Provide instructions for removing %string% from my site. Use the manual/instructions uploaded. Identify the source files so that future compilations of the static site generator no longer contain %string% in the footer.
LLM Response
Successful instructions generated
The footer you want to remove is generated by the Footer component, which is defined in the file /components/Footer.tsx. The text content of the footer, including the copyright notice and the links, are customizable through the Options interface that the Footer component constructor accepts.
To remove the %string% text and the links from the footer, you need to modify the layout.ts file. In this file, locate the instantiation of the Footer component within the layout definition. You’ll likely find it within the sharedPageComponents object. Modify the Footer component instantiation to provide an empty links object to its constructor. This will effectively remove the “GitHub” and “Discord Community” links from the footer.
Additionally, you can customize the text content of the footer by providing a modified string to the i18n(cfg.locale).components.footer.createdWith property within the Footer component. If you want to remove the text completely, you can set it to an empty string. The Footer component uses internationalization (i18n) to display the text “Created with”. If you have translations set up for different locales, make sure to update the corresponding translation files to remove or modify this text as well.
If you upgrade, .layout.ts may get overwritten, and you may need to reapply these changes. Consider keeping a backup of your modified layout file or using a version control system like Git to track your changes.