diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d6628c4f..e71d164e 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -11,10 +11,10 @@ Technical details on how to contribute can be found in our [documentation](https
 There are several ways you can contribute to Spotlight:

-* Fix outstanding issues.
-* Implement new features.
-* Submit issues related to bugs or desired new features.
-* Share your use case
+- Fix outstanding issues.
+- Implement new features.
+- Submit issues related to bugs or desired new features.
+- Share your use case

 If you don't know where to start, you might want to have a look at [hacktoberfest issues](https://github.com/Renumics/spotlight/issues?q=is%3Aissue+is%3Aopen+label%3Ahacktoberfest) and our guide on how to create a [new Lens](https://renumics.com/docs/development/lenses).
diff --git a/README.md b/README.md
index 6c68d669..320f9e13 100644
--- a/README.md
+++ b/README.md
@@ -17,9 +17,10 @@

-Spotlight helps you to **understand unstructured datasets** fast. You can quickly create **interactive visualizations** and leverage data enrichments (e.g. embeddings, prediction, uncertainties) to **identify critical clusters** in your data.
+Spotlight helps you to **understand unstructured datasets** fast. You can quickly create **interactive visualizations** and leverage data enrichments (e.g. embeddings, prediction, uncertainties) to **identify critical clusters** in your data.

 Spotlight supports most unstructured data types including **images, audio, text, videos, time-series and geometric data**. You can start from your existing dataframe:
+

 And start Spotlight with just a few lines of code:
@@ -49,7 +50,7 @@ Machine learning and engineering teams use Spotlight to understand and communica
 [Classification] Find Issues in Any Image Classification Dataset
 👨‍💻 📝 🕹️
-
+
 Find data issues in the CIFAR-100 image dataset
 🕹️
@@ -91,7 +92,6 @@ Machine learning and engineering teams use Spotlight to understand and communica
-
 ## ⏱️ Quickstart

 Get started by installing Spotlight and loading your first dataset.
@@ -132,12 +132,11 @@ ds = datasets.load_dataset('renumics/emodb-enriched', split='all')
 layout= spotlight.layouts.debug_classification(label='gender', prediction='m1_gender_prediction', embedding='m1_embedding', features=['age', 'emotion'])
 spotlight.show(ds, layout=layout)
 ```
+
 Here, the data types are discovered automatically from the dataset and we use a pre-defined layout for model debugging. Custom layouts can be built programmatically or via the UI.

 > The `datasets[audio]` package can be installed via pip.

-
-
 #### Usage Tracking

 We have added crash report and performance collection. We do NOT collect user data other than an anonymized Machine Id obtained by py-machineid, and only log our own actions. We do NOT collect folder names, dataset names, or row data of any kind only aggregate performance statistics like total time of a table_load, crash data, etc. Collecting Spotlight crashes will help us improve stability. To opt out of the crash report collection define an environment variable called `SPOTLIGHT_OPT_OUT` and set it to true. e.G.`export SPOTLIGHT_OPT_OUT=true`
@@ -150,9 +149,9 @@ We have added crash report and performance collection. We do NOT collect user da
 ## Learn more about unstructured data workflows

-- 🤗 [Huggingface](https://huggingface.co/renumics) example spaces and datasets
-- 🏀 [Playbook](https://renumics.com/docs/playbook/) for data-centric AI workflows
-- 🍰 [Sliceguard](https://github.com/Renumics/sliceguard) library for automatic slice detection
+- 🤗 [Huggingface](https://huggingface.co/renumics) example spaces and datasets
+- 🏀 [Playbook](https://renumics.com/docs/playbook/) for data-centric AI workflows
+- 🍰 [Sliceguard](https://github.com/Renumics/sliceguard) library for automatic slice detection

 ## Contribute
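The hunks above only change presentation. They switch `*` list markers to `-`, strip trailing whitespace, and collapse runs of blank lines. The following is a rough illustration of those three rules. It makes no assumption about which tool actually produced this change. `normalize_markdown` is a hypothetical helper, not part of Spotlight. It leaves list markers and blank lines inside fenced code blocks untouched. It does not reproduce every hunk; for example, it does not add the blank line after the closing code fence.

```python
import re

# Hypothetical helper: a minimal sketch of the formatting rules seen in this diff.
_BULLET = re.compile(r"^(\s*)\* ")


def normalize_markdown(text: str) -> str:
    """Rewrite '*' bullets as '-', strip trailing whitespace, collapse blank runs."""
    out: list[str] = []
    in_fence = False
    for raw in text.splitlines():
        line = raw.rstrip()  # drop trailing whitespace on every line
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced code block
        elif not in_fence:
            line = _BULLET.sub(r"\1- ", line)  # '* item' -> '- item'
            if line == "" and out and out[-1] == "":
                continue  # keep at most one consecutive blank line
        out.append(line)
    return "\n".join(out) + "\n"
```

Running it on a short sample with a trailing space, `*` bullets, and extra blank lines yields the `-` bullet style and single blank separators used in the new side of the diff.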