After watching a YouTube video reacting to another developer's experience re-creating Gunnar Morling's Java coding challenge in Go, I decided to attempt it myself. The challenge is to process one billion rows of simply formatted data and print each weather station's name along with its min, max, and average temperature, in alphabetical order, to STDOUT. The data is read from a file in which each row is formatted as `<name of observation point>;<temperature in [-99.9, 99.9]>`, and there are no more than 10,000 unique locations.
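To make the input format and expected output concrete, here is a minimal, unoptimized Python sketch of the aggregation step. The file name `measurements.txt` and the `name=min/average/max` output formatting are my assumptions for illustration, not part of the spec above.

```python
# Naive single-pass aggregation sketch. The file name and the exact
# output formatting are assumptions for illustration only.
from collections import defaultdict


def aggregate(path: str = "measurements.txt") -> None:
    # station name -> [min, max, running sum, count]
    stats: dict[str, list[float]] = defaultdict(
        lambda: [float("inf"), float("-inf"), 0.0, 0]
    )
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            name, _, raw = line.rstrip("\n").partition(";")
            temp = float(raw)
            entry = stats[name]
            entry[0] = min(entry[0], temp)
            entry[1] = max(entry[1], temp)
            entry[2] += temp
            entry[3] += 1
    for name in sorted(stats):
        lo, hi, total, count = stats[name]
        print(f"{name}={lo:.1f}/{total / count:.1f}/{hi:.1f}")


if __name__ == "__main__":
    aggregate()
```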
I am also looking to use this as an introductory project for learning the finer points of Mojo after a few years of writing Python professionally. The topics I am most interested in are SIMD, concurrency, Mojo's data ownership model, and how Mojo interoperates with CPython.
- Tooling to help automate iteration and validation
  - Generate test file (see the sketch after this list)
  - Timing
    - Python
    - Mojo
  - Profiling
    - Python
    - Mojo
  - Validation
    - Python
    - Mojo
  - Logging performance across commits
- Initial naive Python implementation
- Iterate, profile, and validate
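A minimal sketch of the test-file generator in Python, assuming a small hard-coded station list and uniformly distributed temperatures; the official challenge's data generator differs.

```python
# Sketch of a test-data generator. The station names and the uniform
# temperature distribution are assumptions for illustration only.
import random

STATIONS = ["Hamburg", "Toronto", "Abidjan", "Lagos", "Reykjavik"]


def generate(path: str = "measurements.txt", rows: int = 1_000_000) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(rows):
            name = random.choice(STATIONS)
            temp = random.uniform(-99.9, 99.9)
            f.write(f"{name};{temp:.1f}\n")


if __name__ == "__main__":
    generate()
```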
Below is a list of what I expect will help decrease the total runtime of the script; a sketch of the file-reading and output-writing ideas follows the list.
- Converting to Mojo data structures
- Generators
- Interactions with the file
- Data typing and ownership
- Concurrency
- Removing unneeded validation
- Efficiently writing to STDOUT
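As one illustration of the last two items, here is a hedged Python sketch that reads the file in large binary chunks rather than line by line and writes the final report to STDOUT in a single call. The 1 MiB chunk size and the `process_line` callback are my own choices for illustration, not measured optima.

```python
# Sketch: chunked binary reads plus a single buffered write to STDOUT.
# The chunk size and the process_line callback are illustrative assumptions.
import sys
from typing import Callable

CHUNK_SIZE = 1 << 20  # 1 MiB


def for_each_line(path: str, process_line: Callable[[bytes], None]) -> None:
    remainder = b""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            chunk = remainder + chunk
            lines = chunk.split(b"\n")
            remainder = lines.pop()  # the last piece may be a partial line
            for line in lines:
                process_line(line)
    if remainder:
        process_line(remainder)


def write_report(lines: list[str]) -> None:
    # One write call instead of one print() per station.
    sys.stdout.write("\n".join(lines) + "\n")
```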
| Short Commit Id | Row Count | Timestamp | Average Run Time | Runs | Note |
|---|---|---|---|---|---|
| example link to commit | | | 00.0 sec | | Relevant goal reached or implementation made |
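A minimal sketch of how a row like the one above could be appended automatically, assuming the implementation is run as a subprocess and results are logged to a `RESULTS.md` file; the command, commit id, run count, and file name are all assumptions.

```python
# Sketch: time a command over several runs and append a markdown table row.
# The command, commit id, and RESULTS.md path are illustrative assumptions.
import datetime
import statistics
import subprocess
import time


def benchmark(command: list[str], runs: int = 3) -> float:
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def log_row(commit: str, row_count: int, avg_seconds: float, runs: int, note: str) -> None:
    timestamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open("RESULTS.md", "a", encoding="utf-8") as f:
        f.write(
            f"| {commit} | {row_count} | {timestamp} "
            f"| {avg_seconds:.1f} sec | {runs} | {note} |\n"
        )


if __name__ == "__main__":
    average = benchmark(["python", "naive.py"])
    log_row("abc1234", 1_000_000_000, average, 3, "naive Python implementation")
```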
- Install pyenv: `curl https://pyenv.run | bash`
- Follow the instructions printed to STDOUT to add pyenv to your `$PATH`
- Follow this link for instructions to install all build requirements for your machine
- Install Python 3.12.2: `pyenv install 3.12.2`