
This dataset has been created by Stability AI and LAION.
This dataset contains 12 million 1024x1024 images of handwritten text written on a digital 3D sheet of paper generated using Blender geometry nodes and rendered using Blender Cycles. The text varies in font size, color, and rotation, and the paper was rendered under random lighting conditions. Note that the first 10 million examples are in the root folder of this dataset repository and the remaining 2 million are in ./remaining (due to the limit on the number of files per directory).
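Because the examples are split between the repository root and ./remaining, a download helper may need to look in both places. The following is a minimal sketch, not an official loader: it assumes the `huggingface_hub` package is installed, and the file name passed in is hypothetical, since the naming scheme of the files is not described here.

```python
from typing import List

REPO_ID = "wendlerc/RenderedText"  # dataset repository on the Hugging Face Hub
REMAINING_DIR = "remaining"        # folder holding the last 2 million examples


def candidate_paths(filename: str) -> List[str]:
    """Return the two places a file may live: the root, then ./remaining."""
    return [filename, f"{REMAINING_DIR}/{filename}"]


def download(filename: str) -> str:
    """Download `filename` from whichever location contains it.

    `filename` is a hypothetical name; check the repository for the real ones.
    """
    # Imported lazily so candidate_paths() works without huggingface_hub.
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import EntryNotFoundError

    for path in candidate_paths(filename):
        try:
            return hf_hub_download(
                repo_id=REPO_ID, filename=path, repo_type="dataset"
            )
        except EntryNotFoundError:
            continue  # not in this location, try the next one
    raise FileNotFoundError(
        f"{filename} not found in the root or in {REMAINING_DIR}/"
    )
```

Trying the root first matches the layout described above, where most examples live in the root folder.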
The dataset was generated with the script at https://github.com/GbotHQ/ocr-dataset-rendering/, which uses:
- ~8000 fonts from https://www.urbanfonts.com/free-fonts.htm and https://www.fontspace.com/
- 643 CC0 HDRIs from https://polyhaven.com/
- 1837 CC0 PBR materials from https://ambientcg.com/
- random sentences sampled from https://huggingface.co/datasets/ChristophSchuhmann/wikipedia-en-nov22-1-sentence-level and https://huggingface.co/datasets/ChristophSchuhmann/1-sentence-level-gutenberg-en_arxiv_pubmed_soda to generate example images as shown below.
The dataset contains both line-level and character-level annotations for each example. The annotations are stored in the accompanying JSON files and have the following form:
{
    'ocr_annotation': {
        'bounding_boxes': [[[145.0, 370.0], [788.0, 353.0], [827.0, 633.0], [182.0, 669.0]]],
        'text': ['Joe.'],
        'bb_relative': [[[0.1416015625, 0.361328125], [0.76953125, 0.3447265625], [0.8076171875, 0.6181640625], [0.177734375, 0.6533203125]]],
        'char': ['J', 'o', 'e', '.'],
        'char_idx': [0, 1, 2, 3],
        'bb_character_level': [[[145.0, 370.0], [346.0, 365.0], [382.0, 651.0], [181.0, 662.0]], [[375.0, 438.0], [557.0, 431.0], [585.0, 640.0], [402.0, 650.0]], [[578.0, 440.0], [744.0, 434.0], [771.0, 629.0], [604.0, 638.0]], [[778.0, 591.0], [821.0, 589.0], [827.0, 633.0], [784.0, 635.0]]],
        'font_path': '/fsx/home-wendlerc/blender-dataset/assets/fonts/fontcollection/HelloScribbles-axapm.ttf',
        'font_color': [17, 25, 231],
        'text_rotation_angle': 7
    },
    'width': 1024,
    'height': 1024,
}
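The annotation above can be processed with a few small helpers. The sketch below is illustrative only: the function names are hypothetical, and it assumes that `char` and `bb_character_level` are aligned by position and that each box lists four [x, y] corners, as the example suggests. In this example, `bb_relative` equals `bounding_boxes` divided by the image width and height.

```python
from typing import Dict, List, Tuple

Quad = List[List[float]]  # four [x, y] corner points

# Example record copied from the annotation above (font fields omitted).
record: Dict = {
    "ocr_annotation": {
        "bounding_boxes": [[[145.0, 370.0], [788.0, 353.0],
                            [827.0, 633.0], [182.0, 669.0]]],
        "text": ["Joe."],
        "bb_relative": [[[0.1416015625, 0.361328125],
                         [0.76953125, 0.3447265625],
                         [0.8076171875, 0.6181640625],
                         [0.177734375, 0.6533203125]]],
        "char": ["J", "o", "e", "."],
        "bb_character_level": [
            [[145.0, 370.0], [346.0, 365.0], [382.0, 651.0], [181.0, 662.0]],
            [[375.0, 438.0], [557.0, 431.0], [585.0, 640.0], [402.0, 650.0]],
            [[578.0, 440.0], [744.0, 434.0], [771.0, 629.0], [604.0, 638.0]],
            [[778.0, 591.0], [821.0, 589.0], [827.0, 633.0], [784.0, 635.0]],
        ],
    },
    "width": 1024,
    "height": 1024,
}


def to_relative(quad: Quad, width: int, height: int) -> Quad:
    """Scale pixel coordinates to the [0, 1] range."""
    return [[x / width, y / height] for x, y in quad]


def axis_aligned(quad: Quad) -> Tuple[float, float, float, float]:
    """Smallest upright box (x_min, y_min, x_max, y_max) around a quad."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


def char_boxes(rec: Dict) -> List[Tuple[str, Tuple[float, float, float, float]]]:
    """Pair each character with the upright box around its quad.

    Assumes 'char' and 'bb_character_level' are aligned by position.
    """
    ocr = rec["ocr_annotation"]
    return [(c, axis_aligned(q))
            for c, q in zip(ocr["char"], ocr["bb_character_level"])]


if __name__ == "__main__":
    for char, box in char_boxes(record):
        print(char, box)
```

The upright boxes are useful for detectors that expect axis-aligned targets. The original quadrilaterals preserve the text rotation and are the better choice when orientation matters.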
Browse a few more examples here: https://colab.research.google.com/drive/1o0rZhtY9aeurzNrAbu6nJypULSIIcf1v?authuser=1
