This repository has been archived by the owner on Jun 8, 2019. It is now read-only.

Add separate section for Parquet tools
zero323 committed Aug 3, 2016
1 parent b803e6b commit b304bf9
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion 06_data_preparation.md
@@ -190,7 +190,7 @@ model.transform(df).schema[-1].metadata
## 'num_attrs': 2}}
```

-As for now PySpark doesn't support attaching metadata to a single column. It is possible though to use method similar to this one:
+As for now PySpark doesn't support attaching metadata to a single column. It is possible though, to use method similar to this one:

```python
import json
@@ -257,6 +257,8 @@ spark.read.parquet("/tmp/foo").schema.headOption.map(_.metadata)
// Option[org.apache.spark.sql.types.Metadata] = Some({"foo":"bar"})
```

+#### Accessing Metadata Directly
+
Metadata can be also accessed directly using Parquet tools:

```scala