Oct 09, 2025

CALLAF — Creation, Digitization, Vectorization, Indexing, and CGAN Validation

  • Siraj Allaf¹
  • ¹King Abdulaziz University
Protocol Citation: Siraj Allaf 2025. CALLAF — Creation, Digitization, Vectorization, Indexing, and CGAN Validation. protocols.io https://dx.doi.org/10.17504/protocols.io.n2bvjeqbngk5/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: October 05, 2025
Last Modified: October 09, 2025
Protocol Integer ID: 229039
Keywords: authentic Arabic calligraphy artworks, CALLAF dataset, digitization, CGAN validation
Abstract
This protocol describes the complete, reproducible workflow used to create, digitize, preprocess, vectorize, and index authentic Arabic calligraphy artworks for inclusion in the CALLAF dataset, and to validate the resulting data with a conditional GAN (CGAN).
Materials
Calligraphic material
  • Artists: 13 professional calligraphers (styles: Dewani, Naskh, Reqaa)
  • Tools: traditional Arabic calligraphy pens (reed/qalam), traditional ink
  • Paper: high-quality glossy paper (A4)

Hardware
  • Scanner: high-resolution flatbed, 300 DPI
  • Workstation: MacBook Pro (M4 Pro)

Software
  • Vectorization: Adobe Illustrator, Vector Magic, Potrace, scikit-image
  • ML / CV: PyTorch, OpenCV
  • SVG tooling: svgpathtools
Procedure
Select and write items
Isolated characters and word forms common to the major Arabic calligraphic styles are selected and laid out on A4 sheets. Each calligrapher writes these characters and words in the Dewani, Naskh, and Reqaa styles (recorded as the type label), ensuring precise composition and consistent baselines.
Scan & clean
Scan each sheet at 300 DPI, crop borders, fix orientation, clean artifacts, and convert each item to a transparent grayscale PNG. Name files as type_category_resolution_index (e.g., dew_char_1024_00001.png).
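The naming convention can be enforced in code so that every filename round-trips to its labels. The following is a minimal sketch: the label codes come from the directory layout, while the 5-digit zero padding is inferred from the single example above, and the ItemName, make_name, and parse_name names are hypothetical.

```python
import re
from dataclasses import dataclass

# Label codes follow the protocol's directory layout; the 5-digit zero padding
# is inferred from the example dew_char_1024_00001.png (an assumption).
TYPES = ("dew", "nas", "req")
CATEGORIES = ("char", "word")
RESOLUTIONS = (32, 64, 128, 256, 1024)
NAME_RE = re.compile(r"^(dew|nas|req)_(char|word)_(\d+)_(\d{5})\.(png|svg)$")


@dataclass(frozen=True)
class ItemName:  # hypothetical helper type
    type: str        # dew | nas | req
    category: str    # char | word
    resolution: int  # 32 | 64 | 128 | 256 | 1024
    index: int       # running number of the item
    ext: str = "png"


def make_name(item: ItemName) -> str:
    """Build a type_category_resolution_index filename, rejecting unknown labels."""
    if item.type not in TYPES or item.category not in CATEGORIES:
        raise ValueError(f"unknown type/category: {item.type}/{item.category}")
    if item.resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {item.resolution}")
    return f"{item.type}_{item.category}_{item.resolution}_{item.index:05d}.{item.ext}"


def parse_name(filename: str) -> ItemName:
    """Split a filename back into its fields; raises ValueError if malformed."""
    m = NAME_RE.match(filename)
    if m is None or int(m.group(3)) not in RESOLUTIONS:
        raise ValueError(f"malformed filename: {filename}")
    type_, category, res, idx, ext = m.groups()
    return ItemName(type_, category, int(res), int(idx), ext)


if __name__ == "__main__":
    print(make_name(ItemName("dew", "char", 1024, 1)))  # dew_char_1024_00001.png
```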

Normalize & multi-resolution
Normalize each canvas to 1024×1024 px, then downscale it to 256, 128, 64, and 32 px.
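A sketch of this step with NumPy, assuming each crop is a single 2-D grayscale channel on a white background. The protocol names no resampling filter, so nearest-neighbour resampling to the 1024 px master and box (area) averaging for the power-of-two downscales are assumptions, as are the function names. For the alpha channel of a transparent PNG, the same functions can be applied with background=0.

```python
import numpy as np

MASTER = 1024
TARGETS = (256, 128, 64, 32)


def to_master_canvas(img: np.ndarray, size: int = MASTER, background: int = 255) -> np.ndarray:
    """Center a 2-D grayscale crop on a square canvas, then resample it to size x size.

    Nearest-neighbour resampling is an assumption; the protocol does not name a filter.
    """
    h, w = img.shape
    side = max(h, w)
    canvas = np.full((side, side), background, dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    src = (np.arange(size) * side) // size  # source row/column for each output pixel
    return canvas[np.ix_(src, src)]


def box_downscale(img: np.ndarray, target: int) -> np.ndarray:
    """Area-average a square image by an integer factor (e.g. 1024 -> 256)."""
    factor = img.shape[0] // target
    if factor * target != img.shape[0]:
        raise ValueError("target size must divide the source size")
    # Group pixels into factor x factor blocks and average each block.
    blocks = img.reshape(target, factor, target, factor).astype(np.float64)
    return np.round(blocks.mean(axis=(1, 3))).astype(img.dtype)


def multi_resolution(crop: np.ndarray) -> dict:
    """Return {1024: master, 256: ..., 128: ..., 64: ..., 32: ...}."""
    master = to_master_canvas(crop)
    pyramid = {MASTER: master}
    for target in TARGETS:
        # Every size is derived directly from the master, not from the previous level.
        pyramid[target] = box_downscale(master, target)
    return pyramid
```

Because 1024 is an exact multiple of every target size, area averaging needs no interpolation weights and each output pixel covers a whole block of master pixels.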

Vectorization
From each 1024 px master, generate four SVGs, one each with Adobe Illustrator, Vector Magic, Potrace, and scikit-image. Normalize the viewBox and units so that each SVG overlays its raster exactly.
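One way to normalize the viewBox, sketched with Python's standard xml.etree.ElementTree: wrap the drawing in a group that maps the tool's user space onto a 0..1024 square, then set viewBox, width, and height to match the master raster. This assumes square masters and that lengths without a viewBox are unitless or in px; tools that emit other units (such as pt) would need an extra conversion. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix


def _length(value: str) -> float:
    """Parse a unitless or px length (other units are out of scope for this sketch)."""
    return float(value[:-2] if value.endswith("px") else value)


def normalize_svg(svg_text: str, size: int = 1024) -> str:
    """Map an SVG's user space onto a 0..size square so it overlays the size x size raster."""
    root = ET.fromstring(svg_text)
    if root.get("viewBox"):
        min_x, min_y, w, h = (float(v) for v in root.get("viewBox").replace(",", " ").split())
    else:
        min_x, min_y = 0.0, 0.0
        w = _length(root.get("width", str(size)))
        h = _length(root.get("height", str(size)))
    tx, ty = 0.0 - min_x, 0.0 - min_y
    # The rightmost transform applies first: shift the origin, then scale to the target square.
    transform = f"scale({size / w:.10g} {size / h:.10g}) translate({tx:.10g} {ty:.10g})"
    group = ET.Element(f"{{{SVG_NS}}}g", {"transform": transform})
    for child in list(root):
        root.remove(child)
        group.append(child)
    root.append(group)
    root.set("viewBox", f"0 0 {size} {size}")
    root.set("width", str(size))
    root.set("height", str(size))
    return ET.tostring(root, encoding="unicode")
```

Wrapping the content in a transform group leaves each tool's path data untouched, so the four SVGs for one master can be compared path-for-path after normalization.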

Directory layout
Organize the dataset as follows, with the CSV index files under csv/index_*:
raster/all/
  <resolution>/          # 32 | 64 | 128 | 256 | 1024
    *.png                # named: type_category_resolution_index.png
raster/split/
  <resolution>/          # 32 | 64 | 128 | 256 | 1024
    <type>/              # dew | nas | req
      <category>/        # char | word
        <content>/       # e.g., baa, meem, ain_alef, rahma
          *.png          # named: type_category_resolution_index.png
vector/all/1024/
  <tool>/                # illustrator | vector magic | potrace | skimage
    *.svg                # named: type_category_resolution_index.svg
csv/
  index_*
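The layout above can be expressed as a function that lists every file one written item should produce. This is a sketch assuming that resolution directories are named by the bare number (e.g. 64) and that an item keeps the same index at every resolution; item_paths and stem are hypothetical names.

```python
from pathlib import PurePosixPath

RESOLUTIONS = (32, 64, 128, 256, 1024)
VECTOR_TOOLS = ("illustrator", "vector magic", "potrace", "skimage")


def stem(type_: str, category: str, resolution: int, index: int) -> str:
    # type_category_resolution_index; 5-digit padding inferred from dew_char_1024_00001
    return f"{type_}_{category}_{resolution}_{index:05d}"


def item_paths(type_: str, category: str, content: str, index: int) -> list:
    """All relative paths one written item should occupy in the layout."""
    paths = []
    for res in RESOLUTIONS:
        name = stem(type_, category, res, index) + ".png"
        paths.append(PurePosixPath("raster", "all", str(res), name))
        paths.append(PurePosixPath("raster", "split", str(res), type_, category, content, name))
    for tool in VECTOR_TOOLS:
        # Vectors exist only for the 1024 px masters.
        svg = stem(type_, category, 1024, index) + ".svg"
        paths.append(PurePosixPath("vector", "all", "1024", tool, svg))
    return paths
```

Each item therefore accounts for 14 files: five rasters under raster/all, five under raster/split, and four SVGs.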

CSV indices
Create CSV indices that list, for each file, its filename, type, category, resolution, content, and modality (plus the vectorization tool for SVGs).

Files:
  • index_raster_all.csv
  • index_raster_split.csv
  • index_vector_all.csv
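As an illustration, index_raster_split.csv can be derived from the directory structure alone, because every label except the filename is a path component. This sketch uses the standard csv module; the modality value "raster" and the function names are assumptions, not the dataset's confirmed encoding.

```python
import csv
import io
from pathlib import PurePosixPath

FIELDS = ["filename", "type", "category", "resolution", "content", "modality"]


def split_index_rows(paths):
    """Rows for index_raster_split.csv from raster/split/<res>/<type>/<category>/<content>/<file>."""
    rows = []
    for p in map(PurePosixPath, paths):
        _, _, res, type_, category, content, filename = p.parts
        rows.append({"filename": filename, "type": type_, "category": category,
                     "resolution": int(res), "content": content, "modality": "raster"})
    return rows


def write_index(rows, fields=FIELDS) -> str:
    """Serialize rows as CSV text, sorted by filename for stable diffs."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(sorted(rows, key=lambda r: r["filename"]))
    return buf.getvalue()
```

For index_vector_all.csv, the same writer can be called with FIELDS + ["tool"] and the tool taken from the directory under vector/all/1024/.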

Manual quality control
Visually inspect random batches across all types and modalities. Verify the labels, the PNG-SVG alignment, and the absence of artifacts. Cross-check that the CSV indices cover every file and that every listed path exists.
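The mechanical parts of this check (drawing reproducible random batches and comparing CSV coverage against the files on disk) can be scripted. This is a minimal sketch that assumes CSV rows have already been resolved to relative paths; the grouping key (modality from the top-level directory, type from the filename prefix), the fixed seed, and the function names are illustrative choices.

```python
import random
from collections import defaultdict


def sample_batches(paths, per_group=5, seed=0):
    """Pick a reproducible random batch per (modality, type) group for visual inspection."""
    groups = defaultdict(list)
    for p in paths:
        parts = p.split("/")
        modality = parts[0]               # raster | vector
        type_ = parts[-1].split("_")[0]   # dew | nas | req, from the filename prefix
        groups[(modality, type_)].append(p)
    rng = random.Random(seed)
    return {k: rng.sample(sorted(v), min(per_group, len(v))) for k, v in groups.items()}


def coverage_report(files_on_disk, paths_in_csv):
    """Compare the files present against the paths listed in the CSV indices."""
    disk, listed = set(files_on_disk), set(paths_in_csv)
    return {"missing_from_csv": sorted(disk - listed),
            "missing_on_disk": sorted(listed - disk)}
```

Both lists in the coverage report should be empty before release; the visual inspection itself remains manual.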
CGAN validation
Train a conditional WGAN-GP (with spectral normalization) on the 64×64 grayscale images, conditioned on type (dew, nas, req).
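For reference, the standard WGAN-GP objectives with the type label y as the condition can be written as follows. The protocol does not state the penalty weight; λ = 10 is the value used in the original WGAN-GP formulation and is an assumption here, as is the exact way y enters the critic D and generator G. Spectral normalization constrains the weights of the network it is applied to and does not change these expressions.

```latex
\mathcal{L}_D = \mathbb{E}_{z,\,y}\big[D(G(z \mid y) \mid y)\big]
              - \mathbb{E}_{(x,\,y)}\big[D(x \mid y)\big]
              + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x} \mid y) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\, G(z \mid y),\ \ \epsilon \sim \mathcal{U}[0, 1]

\mathcal{L}_G = -\,\mathbb{E}_{z,\,y}\big[D(G(z \mid y) \mid y)\big],
\qquad y \in \{\mathrm{dew}, \mathrm{nas}, \mathrm{req}\}
```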
Reported metrics: SSIM_mean = 0.616, LPIPS_mean = 0.123, LPIPS_p95 = 0.282 (95th-percentile LPIPS).
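The three reported statistics are aggregates over per-image scores. The sketch below shows only the aggregation step with NumPy. The protocol does not say how images are paired for scoring or which implementations were used; per-pair scores could come from, for example, scikit-image's structural_similarity and the lpips package, but that is an assumption, and summarize_scores is a hypothetical name.

```python
import numpy as np


def summarize_scores(ssim_scores, lpips_scores) -> dict:
    """Aggregate per-pair SSIM and LPIPS scores into the three reported statistics."""
    ssim = np.asarray(ssim_scores, dtype=np.float64)
    lpips = np.asarray(lpips_scores, dtype=np.float64)
    if ssim.size == 0 or lpips.size == 0:
        raise ValueError("need at least one score of each kind")
    return {
        "SSIM_mean": float(ssim.mean()),
        "LPIPS_mean": float(lpips.mean()),
        # 95th percentile using NumPy's default (linear interpolation) method.
        "LPIPS_p95": float(np.percentile(lpips, 95)),
    }
```

Reporting the 95th percentile alongside the mean exposes the tail of poorly matched samples, which a mean alone can hide.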
Acknowledgements
Repository & License
All data and scripts are publicly available under CC0 1.0 Universal at the CALLAF Dataset Repository (DOI: TBD).