Using Stylometric Features with 1D Convolutional Neural Network and Random Forest
Rather than processing raw token sequences through heavy transformer layers, NEULIF converts each text into a fixed 68-dimensional feature vector and feeds it into a lightweight CNN or Random Forest. This sidesteps the computational cost of sequence models while preserving rich linguistic signal.
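The 68 concrete features are not listed in this section, but stylometric extraction generally follows the pattern sketched below. The specific features (average word length, type-token ratio, punctuation rate, uppercase rate) and the `extract_features` helper are illustrative assumptions, not NEULIF's actual feature set.

```python
import re
import string

def extract_features(text: str) -> list[float]:
    """Illustrative stylometric features; NEULIF's real vector has 68 such values."""
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    return [
        sum(len(w) for w in words) / n_words,                  # average word length
        len({w.lower() for w in words}) / n_words,             # type-token ratio
        sum(c in string.punctuation for c in text) / n_chars,  # punctuation rate
        sum(c.isupper() for c in text) / n_chars,              # uppercase rate
    ]

vec = extract_features("The quick brown fox jumps over the lazy dog.")
```

Each text, regardless of length, collapses to the same fixed-size vector, which is what lets the downstream classifier stay small.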
Total parameters: 2,205,185, orders of magnitude fewer than BERT (110M) or RoBERTa (125M). The input is a 68-dim stylometric feature vector, not a raw token sequence.
| Layer | Type | Output Shape | Parameters | Details |
|---|---|---|---|---|
| Input | Input | (None, 68, 1) | 0 | 68-dim linguistic feature vector |
| Conv1D | Conv | (None, 66, 128) | 512 | 128 filters, kernel=3, ReLU |
| BatchNorm | Norm | (None, 66, 128) | 512 | Stabilizes training convergence |
| Flatten | Reshape | (None, 8448) | 0 | Converts to 1D for dense layers |
| Dense 1 | Dense | (None, 256) | 2,162,944 | ReLU · Dropout 0.4 |
| Dense 2 | Dense | (None, 128) | 32,896 | ReLU ยท Dropout 0.3 |
| Dense 3 | Dense | (None, 64) | 8,256 | ReLU ยท Dropout 0.2 |
| Output | Sigmoid | (None, 1) | 65 | P(AI-generated) ∈ [0, 1] |
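The layer table maps directly onto a Keras `Sequential` model. The sketch below is a reconstruction from the table, not the released training code; the dropout rates and their placement after each dense layer follow the table's annotations.

```python
from tensorflow.keras import layers, models

# Reconstructed from the layer table above (not the official NEULIF code).
model = models.Sequential([
    layers.Input(shape=(68, 1)),                           # 68-dim feature vector, 1 channel
    layers.Conv1D(128, kernel_size=3, activation="relu"),  # -> (66, 128), 512 params
    layers.BatchNormalization(),                           # 512 params
    layers.Flatten(),                                      # -> 8448
    layers.Dense(256, activation="relu"),                  # 2,162,944 params
    layers.Dropout(0.4),
    layers.Dense(128, activation="relu"),                  # 32,896 params
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),                   # 8,256 params
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),                 # 65 params; P(AI-generated)
])
```

`model.count_params()` totals 2,205,185, matching the parameter column above (note that 256 of the BatchNorm parameters, the moving statistics, are non-trainable in Keras).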
NEULIF matches or exceeds heavyweight transformer ensembles at a fraction of the model size and compute cost. Evaluated on the Kaggle AI-vs-Human corpus (1,997 held-out test samples).
| Method | Accuracy | F1 | ROC-AUC | Model Size | Hardware |
|---|---|---|---|---|---|
| NEULIF CNN (ours) | 97% | 0.95 | 99.5% | ~25 MB (lightweight) | CPU |
| NEULIF RF (ours) | 95% | 0.94 | 95.0% | ~10.6 MB (lightweight) | CPU |
| BERT-base (transformer) | ~95% | ~0.93 | N/A | ~440 MB | GPU |
| RoBERTa (transformer) | ~93% | ~0.92 | N/A | ~480 MB | GPU |
| Ghostbuster (ensemble) | ~91% | ~0.90 | N/A | Large | GPU |
| Stylometry RF (Opara 2024) | ~98% | N/A | N/A | Small | CPU |
Transformer baselines are sourced from Antoun et al. (2023), Guo et al. (2024), and Kuznetsov et al. (2024); since these were evaluated on different datasets, direct comparison requires caution.
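The Random Forest variant needs no deep-learning stack at all: it consumes the same 68-dim feature vectors. A minimal scikit-learn sketch follows; the synthetic data and the hyperparameters shown are illustrative, not NEULIF's tuned configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 68))    # stand-in for 68-dim stylometric vectors
y = rng.integers(0, 2, size=200)  # 1 = AI-generated, 0 = human (synthetic labels)

# n_estimators here is an illustrative default, not NEULIF's tuned value.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
proba = rf.predict_proba(X[:5])[:, 1]  # P(AI-generated) for the first 5 samples
```

A fitted forest of this size serializes to a few megabytes and runs inference comfortably on CPU, which is what keeps the RF variant in the ~10.6 MB range reported above.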
If you use NEULIF in your research, please cite: