# ReflexiCoder
**Repository Path**: faruba/ReflexiCoder
## Basic Information
- **Project Name**: ReflexiCoder
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-30
- **Last Updated**: 2026-04-30
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning
## Installation
> [!CAUTION]
> This project requires **CUDA 12.4**. If you encounter segmentation faults, please verify your CUDA toolchain via `nvcc --version`.
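
The toolchain check mentioned above can be scripted. The following is a minimal sketch, assuming `nvcc --version` prints a line of the form `release X.Y`; the helper name `cuda_release` is hypothetical and not part of this repository.

```bash
# Hypothetical helper: extract the "release X.Y" number from
# `nvcc --version` output read on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

required="12.4"  # version stated in the caution above

if command -v nvcc >/dev/null 2>&1; then
  found=$(nvcc --version | cuda_release)
  if [ "$found" = "$required" ]; then
    echo "CUDA toolkit $found matches the required version"
  else
    echo "CUDA toolkit ${found:-unknown} found, but $required is required" >&2
  fi
else
  echo "nvcc not found on PATH; install the CUDA $required toolkit or fix PATH" >&2
fi
```

The script only reports a mismatch and does not change the environment.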
### 1) Set up the TRL environment
```bash
conda create -n reflexicoder python=3.11
conda activate reflexicoder
pip install --upgrade pip
pip install vllm==0.8.5.post1
pip install setuptools
pip install flash-attn --no-build-isolation
pip install tensorboard
GIT_LFS_SKIP_SMUDGE=1 pip install -e ".[dev]"
pip install selenium==4.15.2
pip install pillow==10.3.0
```
These steps also install **PyTorch v2.6.0**. Keep this version: the provided vLLM binaries are built against it.
Log in to Hugging Face and Weights & Biases (optional but recommended), then install Git LFS:
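To confirm the pinned PyTorch version after installation, a check along these lines may help. It is a sketch under the assumption that `torch.__version__` reports the release, optionally followed by a local build tag such as `+cu124`; the helper name `has_release` is hypothetical.

```bash
# Hypothetical helper: true when version $1 is release $2, optionally
# followed by a local build tag such as "+cu124".
has_release() {
  case "$1" in
    "$2" | "$2"+*) return 0 ;;
    *) return 1 ;;
  esac
}

expected="2.6.0"  # version named in this README

if installed=$(python -c 'import torch; print(torch.__version__)' 2>/dev/null); then
  if has_release "$installed" "$expected"; then
    echo "PyTorch $installed matches the expected release $expected"
  else
    echo "PyTorch $installed found, expected $expected" >&2
  fi
else
  echo "PyTorch is not importable; activate the reflexicoder environment first" >&2
fi
```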
```bash
huggingface-cli login # Required for pushing datasets/models to the HF Hub
wandb login # Enables experiment tracking during training
sudo apt-get install git-lfs
git-lfs --version
```
### 2) Install the Firejail sandbox
Firejail is an open-source Linux sandbox that isolates processes via namespaces and seccomp, reducing security risk when executing untrusted code.
```bash
git clone https://github.com/netblue30/firejail.git
cd firejail
chmod +x configure
./configure
find . -name "*.sh" -exec chmod +x {} \;
make
sudo make install
```
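Once Firejail is installed, untrusted code can be launched through it. The wrapper below is an illustrative sketch only: the function name `run_sandboxed` and the chosen flags are assumptions, and the options this project actually uses during training may differ.

```bash
# Hypothetical wrapper: run a command inside a restrictive Firejail sandbox.
run_sandboxed() {
  if ! command -v firejail >/dev/null 2>&1; then
    echo "firejail not found on PATH" >&2
    return 127
  fi
  # --net=none      : no network access
  # --private       : throwaway home directory, discarded on exit
  # --caps.drop=all : drop all Linux capabilities
  # --seccomp       : enable the default seccomp syscall filter
  # --timeout       : kill the sandbox after hh:mm:ss
  firejail --quiet --net=none --private --caps.drop=all --seccomp \
    --timeout=00:00:10 -- "$@"
}
```

For example, `run_sandboxed python3 -c 'print(sum(range(10)))'` passes the snippet inline, so it needs no files from the host home directory, which `--private` hides.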
## Data Preparation
For dataset download and preprocessing, follow the **Data** section of the [DeepCoder](https://github.com/agentica-project/rllm/tree/deepcoder) guide.
To avoid redundant preprocessing, we provide the preprocessed `parquet` files under `./data`, which can be used directly for training.
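Before training, it can be useful to confirm that the preprocessed files are present. The sketch below lists them; the helper name `list_parquet` is hypothetical, and the exact file names under `./data` are not assumed.

```bash
# Hypothetical helper: list every parquet file under a directory
# (default: ./data), one path per line, sorted.
list_parquet() {
  dir="${1:-./data}"
  if [ ! -d "$dir" ]; then
    echo "directory not found: $dir" >&2
    return 1
  fi
  find "$dir" -type f -name '*.parquet' | sort
}

# Only list when run from the repository root, where ./data exists.
if [ -d ./data ]; then
  list_parquet ./data
fi
```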
## Training
```bash
GIT_LFS_SKIP_SMUDGE=1 pip install -e ".[dev]"
export TOKENIZERS_PARALLELISM=false
export TIMESTAMP=$(date +"%m-%d-%y-%T")
export CONFIG_GRPO="configs/reflexicoder/config_grpo.yaml"
export MODEL_NAME_OR_PATH="/path_to_your_model/Qwen3-8B"
export DATASET_NAME="./data"
export OUTPUT_DIR="./output/$TIMESTAMP"
export ROLLOUT_FILE="$OUTPUT_DIR"
export LOG_FILE="$OUTPUT_DIR/training.log"
mkdir -p "$OUTPUT_DIR"
ACCELERATE_LOG_LEVEL=info \
accelerate launch --config_file configs/accelerate_configs/zero2.yaml \
    src/open_r1/grpo.py --config "$CONFIG_GRPO" \
    --model_name_or_path "$MODEL_NAME_OR_PATH" \
    --dataset_name "$DATASET_NAME" \
    --output_dir "$OUTPUT_DIR" \
    --vllm_mode colocate 2>&1 | tee "$LOG_FILE"
```
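A long run that fails on a mistyped path wastes startup time, so a pre-flight check before `accelerate launch` may be worthwhile. This is a sketch, not part of the repository; the helper name `preflight` is hypothetical.

```bash
# Hypothetical helper: report every path that does not exist and
# return non-zero if any is missing.
preflight() {
  missing=0
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `preflight "$CONFIG_GRPO" configs/accelerate_configs/zero2.yaml "$DATASET_NAME" || exit 1` right after the `export` lines. Include `"$MODEL_NAME_OR_PATH"` only when it points to a local directory.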
## Evaluation
We evaluate all baselines and RL-trained models on **HumanEval**, **HumanEval+**, **MBPP**, **MBPP+**, **LiveCodeBench_v5**, and **CodeForces** using the **EvalChemy** framework to ensure consistent evaluation.
For the full evaluation pipeline, refer to the official [EvalChemy](https://github.com/mlfoundations/evalchemy) repository and its README.
## Performance & Token Efficiency
## Citation
If you use the data or code in this repo, please consider citing the following paper.
```bibtex
@article{jiang2026reflexicoder,
title={ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning},
author={Jiang, Juyong and Shen, Jiasi and Kim, Sunghun and Yoo, Kang Min and Kim, Jeonghoon and Kim, Sungju},
journal={arXiv preprint arXiv:2603.05863},
year={2026}
}
```