Rosette API Text Embeddings Visualization Sample Code

A simple Python script that transforms a corpus of documents into text vectors and writes them out as .tsv files suitable for visualization. It uses the Rosette API's /text-embedding endpoint and the BBC News Corpus. Note that the corpus is only free for research purposes.

Getting started

Clone the repo and open the files in your favorite text editor or Python IDE.
Download the raw text archive, bbc-fulltext.zip, from http://mlg.ucd.ie/datasets/bbc.html and extract it into the project root folder. You should get a folder called "bbc".
Run visualize-embeddings.py from your Python IDE or the command line (replace ROSAPI_KEY with your Rosette API key):
```
 $ python visualize-embeddings.py --key ROSAPI_KEY
```
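If you adapt the script, a `--key` option like the one in the command above could be read with `argparse`. This is only a minimal sketch: the function name `parse_args` and the help text are assumptions for illustration, and visualize-embeddings.py may declare its options differently.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; only --key is modelled in this sketch."""
    parser = argparse.ArgumentParser(
        description="Turn a corpus into TSV files of text embeddings."
    )
    # Hypothetical declaration: the real script may define this differently.
    parser.add_argument("--key", required=True, help="Rosette API key")
    return parser.parse_args(argv)
```

Calling `parse_args(["--key", "ROSAPI_KEY"]).key` returns the key string; passing `argv=None` makes `argparse` read from the actual command line.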

You'll see that the script parses the raw text files of the corpus into a list of documents. Each document consists of three fields:

category
headline
content
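The parsing step can be sketched roughly as follows. This is not the code from visualize-embeddings.py: it assumes the extracted archive is laid out as `bbc/<category>/<number>.txt`, with the headline on the first non-empty line and the article body after it. The name `parse_corpus` is hypothetical.

```python
from pathlib import Path


def parse_corpus(root="bbc"):
    """Return a list of dicts with 'category', 'headline' and 'content' keys."""
    documents = []
    for path in sorted(Path(root).glob("*/*.txt")):
        # Decode leniently in case a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if not lines:
            continue  # skip empty files
        documents.append({
            "category": path.parent.name,    # assumed: folder name is the category
            "headline": lines[0],            # assumed: first line is the headline
            "content": " ".join(lines[1:]),  # remaining lines joined into one string
        })
    return documents
```

To try it with your own data, arrange your files in the same one-folder-per-category layout and pass that folder as `root`.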

The script then creates two files:

embeddings.tsv: a TSV file where each line contains the text vector for a document's content field.
metadata.tsv: a TSV file where each line contains a document's metadata (i.e. category and headline).
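A minimal sketch of writing the two files is shown below, assuming the vectors have already been fetched and are held as lists of floats in the same order as the documents. The function name `write_tsvs` is hypothetical. The sketch also writes a header row to metadata.tsv, because the Embedding Projector treats the first row of a multi-column metadata file as column names; whether the original script does the same is an assumption.

```python
def write_tsvs(documents, vectors,
               embeddings_path="embeddings.tsv",
               metadata_path="metadata.tsv"):
    """Write one vector per line and one metadata row per line, in matching order."""
    if len(documents) != len(vectors):
        raise ValueError("documents and vectors must have the same length")

    with open(embeddings_path, "w", encoding="utf-8", newline="\n") as f:
        for vector in vectors:
            # Tab-separated floats, no header row.
            f.write("\t".join(str(value) for value in vector) + "\n")

    with open(metadata_path, "w", encoding="utf-8", newline="\n") as f:
        # Assumed header row so the projector can label the two columns.
        f.write("category\theadline\n")
        for doc in documents:
            # Replace stray tabs so each row keeps exactly two columns.
            headline = doc["headline"].replace("\t", " ")
            f.write(doc["category"] + "\t" + headline + "\n")
```

Row *n* of embeddings.tsv must describe the same document as row *n* of the metadata (after the header), which is why the sketch checks the lengths before writing anything.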

To visualize the embeddings, load them into Google TensorFlow's Embedding Projector. Turn on color coding by category to really see the vectors in action. You can see our projection at this link.

Customize for your data

Try replacing the BBC News corpus with your own data. And if you find anything interesting, we'd love to hear about it! Find us at community@rosette.com.
