Rosette API Text Embeddings Visualization Sample Code

A simple Python script that transforms a corpus of documents into text vectors and saves them in .tsv format for visualization. It uses the Rosette API's /text-embedding endpoint and the BBC News corpus. Note that the corpus is free for research purposes only.
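The repository's own request code isn't reproduced here. As a rough sketch of what a call to the /text-embedding endpoint can look like using only the standard library: the base URL `https://api.rosette.com/rest/v1/`, the `X-RosetteAPI-Key` header, the `content` request field, and the `embedding` response field are all assumptions to check against the current Rosette API documentation. The function names are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumed endpoint URL; verify against the Rosette API documentation.
ROSETTE_URL = "https://api.rosette.com/rest/v1/text-embedding"


def build_embedding_request(api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for one text's embedding."""
    body = json.dumps({"content": text}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        ROSETTE_URL,
        data=body,
        headers={
            "X-RosetteAPI-Key": api_key,  # assumed authentication header
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def get_embedding(api_key: str, text: str) -> List[float]:
    """Send the request and return the vector from the (assumed) 'embedding' field."""
    with urllib.request.urlopen(build_embedding_request(api_key, text)) as resp:
        return json.loads(resp.read().decode("utf-8"))["embedding"]
```

Splitting request construction from sending keeps the network call in one small function, so the request itself can be inspected without an API key.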

Getting started

1. Clone the repo and open the files in your favorite text editor or Python IDE.
2. Download the raw text files archive, bbc-fulltext.zip, from http://mlg.ucd.ie/datasets/bbc.html and extract it into the project root folder. You should get a folder called "bbc".
3. Run visualize-embeddings.py from your Python IDE or the command line, replacing ROSAPI_KEY with your Rosette API key:
```
 $ python visualize-embeddings.py --key ROSAPI_KEY
```
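The script's actual argument handling may differ. As a minimal sketch, a `--key` flag like the one in the command above can be read with `argparse`. The `--corpus` option and the check that the extracted "bbc" folder exists are hypothetical additions, not features of the original script.

```python
import argparse
import os
import sys


def parse_args(argv=None):
    """Parse command-line flags; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Turn a text corpus into Embedding Projector TSV files."
    )
    parser.add_argument("--key", required=True, help="Rosette API key")
    # Hypothetical flag: location of the extracted corpus folder.
    parser.add_argument("--corpus", default="bbc", help="path to the corpus folder")
    return parser.parse_args(argv)


def main(argv=None):
    """Validate arguments before doing any API work."""
    args = parse_args(argv)
    if not os.path.isdir(args.corpus):
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"Corpus folder {args.corpus!r} not found; extract bbc-fulltext.zip first.")
    return args
```

In a real script you would call `main()` under an `if __name__ == "__main__":` guard. A missing `--key` makes `argparse` print a usage message and exit.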

The script first parses the raw text files of the corpus into a list of documents. Each document consists of three fields:

- category
- headline
- content
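A minimal sketch of this parsing step, assuming the extracted layout is `bbc/<category>/<file>.txt` and that each file's first non-blank line is the headline, with the remaining lines forming the body. `Document` and `load_corpus` are hypothetical names, not taken from visualize-embeddings.py.

```python
import os
from typing import List, NamedTuple


class Document(NamedTuple):
    category: str
    headline: str
    content: str


def load_corpus(root: str = "bbc") -> List[Document]:
    """Read root/<category>/*.txt into Document records.

    Assumes the first non-blank line of each file is the headline and the
    remaining non-blank lines are the article body.
    """
    docs = []
    for category in sorted(os.listdir(root)):
        cat_dir = os.path.join(root, category)
        if not os.path.isdir(cat_dir):
            continue  # skip stray files at the corpus root, e.g. a README
        for name in sorted(os.listdir(cat_dir)):
            if not name.endswith(".txt"):
                continue
            # errors="replace" keeps one badly encoded byte from aborting the run.
            with open(os.path.join(cat_dir, name), encoding="utf-8", errors="replace") as f:
                lines = [line.strip() for line in f]
            lines = [line for line in lines if line]  # drop blank lines
            if lines:
                docs.append(Document(category, lines[0], " ".join(lines[1:])))
    return docs
```

The category comes from the folder name, so no labels need to be stored inside the text files themselves.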

The script then creates two files:

- embeddings.tsv: a TSV file in which each line holds the text vector for one document's content field.
- metadata.tsv: a TSV file in which each line holds the corresponding document's metadata (category and headline).
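A sketch of writing the two files, assuming the vectors and metadata are already in the same document order. The header row on metadata.tsv reflects the Embedding Projector's convention that multi-column metadata files start with column names. The original script may lay out its files differently, and `write_projector_files` is a hypothetical name.

```python
from typing import Sequence, Tuple


def _clean(field: str) -> str:
    """Collapse tabs and newlines so each record stays on one TSV line."""
    return " ".join(field.split())


def write_projector_files(vectors: Sequence[Sequence[float]],
                          metadata: Sequence[Tuple[str, str]],
                          emb_path: str = "embeddings.tsv",
                          meta_path: str = "metadata.tsv") -> None:
    """Write one vector per line and one matching (category, headline) row per line."""
    if len(vectors) != len(metadata):
        raise ValueError("need exactly one metadata row per vector")
    # newline="\n" keeps Unix line endings on every platform.
    with open(emb_path, "w", encoding="utf-8", newline="\n") as f:
        for vec in vectors:
            f.write("\t".join(str(x) for x in vec) + "\n")  # numbers only, no header
    with open(meta_path, "w", encoding="utf-8", newline="\n") as f:
        f.write("category\theadline\n")  # header row for multi-column metadata
        for category, headline in metadata:
            f.write(f"{_clean(category)}\t{_clean(headline)}\n")
```

Line N of embeddings.tsv and row N of the metadata body describe the same document, which is what lets the Projector color points by category.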

To visualize the embeddings, load both files into Google TensorFlow's Embedding Projector. Turn on color coding by category to see the vectors in action. You can see our projection at this link.

Customize for your data

Try replacing the BBC News corpus with your own data. And if you find anything interesting, we'd love to hear about it! Find us at [email protected].
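One way to swap in your own data, assuming it lives in a CSV file with hypothetical `category`, `headline`, and `content` columns, is to replace the corpus-reading step with a loader that produces the same three fields:

```python
import csv
from typing import List, NamedTuple


class Document(NamedTuple):
    category: str
    headline: str
    content: str


def load_csv_corpus(path: str) -> List[Document]:
    """Load documents from a CSV whose header names the three assumed columns."""
    # newline="" lets the csv module handle quoted fields that span lines.
    with open(path, newline="", encoding="utf-8") as f:
        return [Document(row["category"], row["headline"], row["content"])
                for row in csv.DictReader(f)]
```

Because the output has the same shape as the BBC documents, the rest of the pipeline (embedding each content field and writing the two TSV files) would not need to change.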

rosette-api-community/visualize-embeddings

Folders and files

Latest commit

History

Repository files navigation

Rosette API Text Embeddings Visualization Sample Code

Getting started

Customize for your data

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages