Releases: PhonologicalCorpusTools/CorpusTools
v1.5.1
Version 1.5.1 was released in May of 2022.
Installation notes for Mac users
- Upon installing and opening PCT for the first time, you might see a warning that Phonological CorpusTools.app can't be opened because Apple cannot check it for malicious software. If this happens, go to System Preferences > Security & Privacy > General, click "Open Anyway" in the lower part of the panel, and run PCT again.
- It is normal for the PCT icon to disappear for a while after it briefly shows up. Please give it ~60 seconds to launch.
Phonological Search
- A summary display option, “List target segments and environments separately in summary results”, was added. By default, the summary results present each input environment as a row; the new option instead shows each target and environment separately. The individual results are identical under both options.
Bug fixes
- Autocompleting feature or segment selection with the tab key now works correctly.
- PCT can now import TextGrid files without issues.
v1.5.0
Version 1.5.0 was released in January of 2022.
Installation notes for Mac users
- Upon installing and opening PCT for the first time, you might see a warning that Phonological CorpusTools.app can't be opened because Apple cannot check it for malicious software. If this happens, go to System Preferences > Security & Privacy > General, click "Open Anyway" in the lower part of the panel, and run PCT again.
- It is normal for the PCT icon to disappear for a while after it briefly shows up. Please give it ~60 seconds to launch.
New features
- Implemented a transitional probability algorithm for calculating the conditional probabilities between segments, which broadens the range of uses for the 'bigram selector' module.
- Entry boxes that should have numeric values have now all been set to accept only numeric entries, to avoid inadvertent crashes with non-numeric values.
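To illustrate the quantity involved, here is a minimal sketch. It assumes that transitional probability is estimated from within-word bigram counts as P(b | a) = count(ab) / count(a followed by any segment). This is not PCT's code: the function name, the `(segments, frequency)` input format, and the token-frequency weighting are all hypothetical.

```python
from collections import Counter

def transitional_probability(corpus, first, second):
    """Estimate P(second | first) from within-word bigrams.

    `corpus` is a hypothetical list of (segments, frequency) pairs,
    e.g. (("k", "a", "t"), 3); bigram counts are weighted by frequency.
    """
    bigram_counts = Counter()
    first_counts = Counter()
    for segments, freq in corpus:
        for a, b in zip(segments, segments[1:]):
            bigram_counts[(a, b)] += freq
            first_counts[a] += freq  # `a` starts a bigram
    if first_counts[first] == 0:
        return 0.0  # `first` never precedes another segment
    return bigram_counts[(first, second)] / first_counts[first]

corpus = [(("k", "a", "t"), 3), (("k", "i", "t"), 1), (("t", "a", "k"), 2)]
print(transitional_probability(corpus, "k", "a"))  # 0.75
```

Here "k" begins four frequency-weighted bigrams, three of which are "ka", so P(a | k) = 0.75.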
Corpora
- There is now a syllabified version of the example corpus.
- Example corpora are now bundled with the executable, though they can also be downloaded manually from https://github.com/PhonologicalCorpusTools/PCT_Fileshare.
- If a new word is added or an existing word is edited to be the same as an existing word, PCT offers options to either create separate items or merge the two, summing frequencies.
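The two options can be pictured with a small sketch. It assumes "the same as an existing word" means an identical spelling and transcription. The lexicon layout (a dict from `(spelling, transcription)` to a list of frequencies, one per separate item) and the `add_word` helper are hypothetical. They are not PCT's data structures.

```python
def add_word(lexicon, spelling, transcription, frequency, merge=True):
    """Add a word to a hypothetical lexicon mapping (spelling, transcription)
    to a list of frequencies, one entry per separate item.

    If an identical word already exists and `merge` is True, the new
    frequency is summed into the existing item; otherwise a separate
    item is created.
    """
    items = lexicon.setdefault((spelling, transcription), [])
    if items and merge:
        items[0] += frequency
    else:
        items.append(frequency)
    return lexicon

merged = {}
add_word(merged, "cat", "kat", 3)
add_word(merged, "cat", "kat", 2)
print(merged)  # {('cat', 'kat'): [5]}

separate = {}
add_word(separate, "cat", "kat", 3)
add_word(separate, "cat", "kat", 2, merge=False)
print(separate)  # {('cat', 'kat'): [3, 2]}
```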
Duplicated Analyses
- In prior versions of PCT, duplicated phonological searches / analyses could produce cumulative results, e.g., reported frequencies that summed over every instance of a repeated search. This has been corrected: users now receive a warning when a search / analysis is duplicated, and either the output table is left unchanged or the same results are repeated as a new line.
Phonological Search
- Searches can be named.
- Searches can include word frequency, phoneme number, and syllable number filters.
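The following sketch shows one way such filters might be applied to search results. It assumes each word stores its syllables as tuples of segments, so that both the phoneme count and the syllable count can be read off the same structure. The `Word` class and `filter_words` function are hypothetical illustrations and are not part of PCT's API.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """Hypothetical corpus entry; each syllable is a tuple of segments."""
    spelling: str
    syllables: tuple
    frequency: float

def filter_words(words, frequency_range=None, phoneme_range=None, syllable_range=None):
    """Return spellings of words whose counts fall within every given range.

    Each range is an inclusive (low, high) tuple; None disables that filter.
    """
    def in_range(value, bounds):
        return bounds is None or bounds[0] <= value <= bounds[1]

    kept = []
    for w in words:
        n_phonemes = sum(len(syllable) for syllable in w.syllables)
        if (in_range(w.frequency, frequency_range)
                and in_range(n_phonemes, phoneme_range)
                and in_range(len(w.syllables), syllable_range)):
            kept.append(w.spelling)
    return kept

words = [
    Word("cat", (("k", "a", "t"),), 10),
    Word("kitten", (("k", "i"), ("t", "e", "n")), 2),
]
print(filter_words(words, syllable_range=(2, 2)))     # ['kitten']
print(filter_words(words, frequency_range=(5, 100)))  # ['cat']
```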
String Similarity and Neighbourhood Density
- Fixed bugs that caused the algorithm to crash when lists of words were added.
- The option to calculate neighbourhood density based on spelling has been removed, to avoid issues with trying to calculate an 'inventory' of spelling symbols. Note that it is still possible to calculate raw string similarity based on spelling, and, if ND based on spelling is required, it is possible to force PCT to read a spelling column as transcription when initially loading the corpus.
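For reference, neighbourhood density is commonly computed as the number of other words within a small edit distance (often 1) of a query word. The sketch below assumes that definition and uses plain Levenshtein distance over sequences of segments. PCT's own distance measures and defaults may differ, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or segment tuples)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]

def neighbourhood_density(query, lexicon, max_distance=1):
    """Count lexicon items, other than the query, within `max_distance` edits."""
    return sum(1 for word in lexicon
               if word != query and edit_distance(query, word) <= max_distance)

lexicon = ["kat", "bat", "kap", "kats", "dog"]
print(neighbourhood_density("kat", lexicon))  # 3
```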
Mutual Information
- Parameters for MI calculations have been clarified.
- Options have been added for calculating MI only within particular specified environments.
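The sketch below shows one common formulation of mutual information between two segments: the pointwise measure log2(P(ab) / (P(a)P(b))) computed over within-word bigrams. The function name, the input format, and the choice to count unigram probabilities over all segments are assumptions made for illustration. PCT's parameters may differ, for example in how it treats word boundaries or restricts counts to specified environments.

```python
import math
from collections import Counter

def mutual_information(words, a, b):
    """Pointwise MI of the ordered bigram (a, b): log2(P(ab) / (P(a) * P(b))).

    `words` is a hypothetical list of segment tuples. Unigram probabilities
    are relative frequencies over all segments; bigram probabilities are
    relative frequencies over all within-word bigrams.
    """
    unigrams = Counter(seg for w in words for seg in w)
    bigrams = Counter(pair for w in words for pair in zip(w, w[1:]))
    p_ab = bigrams[(a, b)] / sum(bigrams.values())
    if p_ab == 0:
        raise ValueError(f"bigram {a}{b} does not occur")
    total = sum(unigrams.values())
    p_a = unigrams[a] / total
    p_b = unigrams[b] / total
    return math.log2(p_ab / (p_a * p_b))

words = [("k", "a", "t"), ("t", "a", "k"), ("k", "a", "k")]
print(round(mutual_information(words, "k", "a"), 2))  # 1.17
```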
Functional Load
- Calculation algorithms have been refactored to make them faster.
- Minimal pairs can now be defined as either only "true" minimal pairs (e.g., "mad" and "pad") or as minimal pairs through neutralization (e.g., "mama" and "papa"). (Prior versions allowed only minimal pairs through neutralization.)
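The two definitions can be contrasted in a short sketch. It assumes words are sequences of segments, and that a "minimal pair through neutralization" is a pair of words that become identical once the two contrasting segments are merged into one symbol. The function names and the placeholder symbol are hypothetical, and PCT's implementation may handle further details.

```python
def is_true_minimal_pair(w1, w2, s1, s2):
    """True if w1 and w2 differ in exactly one position, where one has s1
    and the other has s2."""
    if len(w1) != len(w2):
        return False
    diffs = [(x, y) for x, y in zip(w1, w2) if x != y]
    return len(diffs) == 1 and set(diffs[0]) == {s1, s2}

def is_minimal_pair_by_neutralization(w1, w2, s1, s2, placeholder="#"):
    """True if w1 and w2 differ but become identical once s1 and s2 are merged.

    `placeholder` must not itself be a segment in the inventory.
    """
    def neutralize(word):
        return tuple(placeholder if seg in (s1, s2) else seg for seg in word)
    return w1 != w2 and neutralize(w1) == neutralize(w2)

print(is_true_minimal_pair("mama", "papa", "m", "p"))               # False
print(is_minimal_pair_by_neutralization("mama", "papa", "m", "p"))  # True
```

Under the neutralization definition, "mad" / "pad" also qualifies. Under the "true" definition, "mama" / "papa" does not, because the words differ in two positions.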
Pronunciation Variants
- It has been clarified that all corpora must include canonical pronunciations. It is not possible to have pronunciation variants that are linked to the same lexical item through shared spelling.
Feature Systems
- The feature systems have been updated to be accurate. (As far as we can tell, the original released feature systems were accurate, but got corrupted at some point such that the feature values were all misaligned. We believe this error has now been fixed.)
- Feature / transcription systems are now bundled with the executable, though they can also be downloaded from https://github.com/PhonologicalCorpusTools/PCT_Fileshare.
- Master Excel files of all features / transcription symbols have also been provided at https://github.com/PhonologicalCorpusTools/PCT_Fileshare for transparency and ease of personal modification.
v1.5.0 pre-release
v1.5.0p update dependency information
For screenshots
v1.4.1
This version includes a few fixes and enhancements, such as webpage links, frequency information in the corpus summary window, and inventory chart size.
If you are using macOS, we suggest trying "Phonological.CorpusTools_141.dmg" first, but if you get a disk mounting error, try "Phonological.CorpusTools_141_hybrid.dmg".
PCT v1.4.1 is confirmed to work on macOS 10.13 and higher, but may have issues on earlier versions.
v1.4.0 ("beta")
This release adds a representation for syllables and allows phonological search at the syllable level. The algorithms for calculating functional load have also been corrected.
v1.3.0
v1.2.0
Version 1.2 fixes numerous bugs, as well as providing enhancements in the following areas:
Inventory management -- The tools for categorizing segments into an inventory chart have been updated to allow users to interactively update the chart based on natural (or unnatural) classes, including the ability to add / delete / rearrange columns and rows in the chart. Uncategorized segments are more clearly shown and their features easily examined for reference.
Environments/phonological search -- Environments can be more flexibly defined, e.g., using wildcards and inserting / modifying / deleting segments or classes of segments within a linear string. Within the functional load analysis, functional load can be calculated within individual sets of environments rather than exclusively at the word level.
Small updates for usability -- Numerous small updates have been implemented to aid usability, such as improvements to the ability to select segments based on features, changes to the results window to list features instead of segments (where relevant), updates to the documentation for clarity, the addition of an option for normalizing functional load results, new Preference menu options for overwriting files, and more.
v1.1.1
This is a bugfix release for version 1.1.0.
- Fixed an issue where inventory charts were not properly generated when a feature specifying diphthongs was not present
- Fixed an issue where corpus importing was ignoring user specified corpus names
- Fixed an issue with loading TextGrid and running text corpora with feature systems
- Fixed an issue where inventory charts were sometimes not properly generated for corpora created before 1.1.0
- Fixed an issue where feature pairs could not be selected if a segment in the inventory was unspecified or underspecified
- Added a check for unspecified segments on associating feature systems with corpora
- Added a check for columns named "transcription" that are not parsed as transcription
- Increased initial size of the parsing preview section when importing corpora
v1.1.0
CorpusTools 1.1.0 Release Notes
This is a major version release for Phonological CorpusTools.
The full documentation and manual are available online at http://corpustools.readthedocs.org/en/v1.1.0/ and as a PDF: http://readthedocs.org/projects/corpustools/downloads/pdf/v1.1.0/. Help buttons throughout the GUI also display relevant information.
Importing corpora
- The corpus import functionality in the GUI received a major overhaul
- All types of corpora are imported through a single dialog
- PCT should autodetect many settings based on selected files or directories
- Autodetected settings can be edited and refined by the user
- Basic logging support saves parsing details entered by the user (e.g., multicharacter segments)
- Numbers in transcriptions can be parsed as stress, tone, or a normal character (note that tone and stress are currently not supported in analysis functions or phonological search)
Pronunciation variants
- All algorithms that analyze segments support four strategies for dealing with pronunciation variants: canonical forms, most frequent variants, separated tokens as types, and tokens weighted by their relative frequencies
- Algorithms that analyze words support two strategies for pronunciation variants: canonical forms and most frequent variants
- Corpus export now includes pronunciation variants (and their frequencies)
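The sketch below makes the four segment-level strategies concrete by counting the segments of a single word that has two variants. The data layout and strategy names are hypothetical. The readings of "separated tokens as types" (each attested variant counted as its own type) and "weighted" (each variant counted in proportion to its share of the word's tokens) interpret these notes and do not come from PCT's code.

```python
from collections import Counter

def segment_counts(word, strategy):
    """Count segments in one word under a pronunciation-variant strategy.

    `word` is a hypothetical dict holding a canonical transcription and a
    mapping from variant transcriptions to token frequencies.
    """
    variants = word["variants"]
    counts = Counter()
    if strategy == "canonical":
        counts.update(word["canonical"])
    elif strategy == "most_frequent":
        counts.update(max(variants, key=variants.get))
    elif strategy == "separate_types":
        for variant in variants:  # every attested variant counts as its own type
            counts.update(variant)
    elif strategy == "weighted":
        total = sum(variants.values())
        for variant, freq in variants.items():
            for segment in variant:  # each variant contributes its share of tokens
                counts[segment] += freq / total
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return counts

word = {"canonical": "kat", "variants": {"kat": 3, "kad": 1}}
print(segment_counts(word, "weighted"))
```

With these toy numbers, the weighted strategy credits "t" with 0.75 and "d" with 0.25. The separate-types strategy counts "k" and "a" twice.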
Functional load
- Added support for finding the average functional load of single segments
Phonotactic probability
- Fixed an issue where calculating biphone probabilities on single-segment words would cause errors; such words are now assigned a probability of 0
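These notes do not describe the fix in detail, but an error of this kind typically comes from averaging over an empty list of biphones. The sketch below guards against that case and returns 0 as described. The averaging scheme, the `biphone_prob` callable, and the function name are hypothetical stand-ins for PCT's positional probability model.

```python
def average_biphone_probability(word, biphone_prob):
    """Mean probability of the biphones in `word`, or 0.0 if it has none.

    `biphone_prob(biphone, position)` is a hypothetical callable that
    supplies positional biphone probabilities from a corpus.
    """
    biphones = list(zip(word, word[1:]))
    if not biphones:
        # A single-segment word has no biphones; return 0 instead of dividing by zero.
        return 0.0
    total = sum(biphone_prob(bp, pos) for pos, bp in enumerate(biphones))
    return total / len(biphones)

def toy_prob(biphone, position):
    """Toy model that assigns every biphone the same probability."""
    return 0.5

print(average_biphone_probability("kat", toy_prob))  # 0.5
print(average_biphone_probability("a", toy_prob))    # 0.0
```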
Kullback-Leibler divergence
- Added options to bring KL divergence in line with the other functions
- Added command line script for calculating KL divergence
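KL divergence is typically used here to compare how two segments are distributed across environments, D(P || Q) = Σ P(x) log2(P(x) / Q(x)). The sketch below assumes the environment counts (for example, counts by preceding segment) have already been collected. It applies add-one smoothing so that the ratio is always defined. The smoothing scheme and the function name are assumptions and do not describe PCT's options.

```python
import math
from collections import Counter

def kl_divergence(env_counts_p, env_counts_q, smoothing=1.0):
    """KL divergence D(P || Q), in bits, between two environment distributions.

    Each argument is a Counter mapping an environment (e.g. the preceding
    segment) to how often a segment occurs there. Add-`smoothing` counts
    are applied over the union of environments so that Q(x) is never zero.
    """
    envs = set(env_counts_p) | set(env_counts_q)
    total_p = sum(env_counts_p.values()) + smoothing * len(envs)
    total_q = sum(env_counts_q.values()) + smoothing * len(envs)
    divergence = 0.0
    for env in envs:
        p = (env_counts_p[env] + smoothing) / total_p
        q = (env_counts_q[env] + smoothing) / total_q
        divergence += p * math.log2(p / q)
    return divergence

s1 = Counter({"a": 3, "i": 1})  # environments of one segment
s2 = Counter({"a": 1, "i": 3})  # environments of another segment
print(round(kl_divergence(s1, s2), 3))  # 0.333
```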
GUI
- Added a dialog, accessible from the "View/change feature system" dialog, for editing how segments are categorized into a coherent segment chart via features
- Features can be used as input to the analysis functions, e.g., the functional load of voice in the corpus (segments that are +voice compared to segments that are -voice)
Segment selection
- Segment selection has been redone
- Segments can be selected via the inventory
- Features can be typed into the filter field, which will highlight segments that will be included with that feature selection
- Once a feature specification has been entered, that segment set can be locked in
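Selecting segments from a typed feature specification, such as the +voice set used when computing the functional load of voice, can be sketched as a simple filter over a feature table. The feature table layout, the comma-separated "+voice,-son" syntax, and the function name are hypothetical. They are not PCT's filter-field grammar.

```python
def select_segments(feature_system, specification):
    """Return the segments matching every feature value in `specification`.

    `feature_system` is a hypothetical mapping from segment to a dict of
    feature values ("+" or "-"); `specification` is a comma-separated
    string such as "+voice,-son".
    """
    wanted = []
    for item in specification.split(","):
        item = item.strip()
        wanted.append((item[1:], item[0]))  # "+voice" -> ("voice", "+")
    return sorted(segment for segment, features in feature_system.items()
                  if all(features.get(name) == value for name, value in wanted))

features = {
    "p": {"voice": "-", "son": "-"},
    "b": {"voice": "+", "son": "-"},
    "m": {"voice": "+", "son": "+"},
}
print(select_segments(features, "+voice"))       # ['b', 'm']
print(select_segments(features, "+voice,-son"))  # ['b']
```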
Environments
- Environment creation has been revamped
- Users can select a set of center segments
- Right-hand and left-hand environments can be added, with multiple sets of segments on each side
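The following sketch illustrates environments with multiple segment sets on each side. It matches a target segment against left-hand and right-hand sets in linear order. It makes several simplifying assumptions: words are strings of single-character segments, the function names are hypothetical, and word boundaries are ignored. It does not reproduce PCT's environment syntax.

```python
def matches_environment(word, i, centre, left=(), right=()):
    """Check whether word[i] is a target segment in the specified environment.

    `centre` is a set of target segments. `left` and `right` are sequences
    of segment sets in linear order: left[-1] must match the segment just
    before the target, and right[0] the segment just after it.
    """
    if word[i] not in centre:
        return False
    start = i - len(left)
    end = i + 1 + len(right)
    if start < 0 or end > len(word):
        return False  # not enough segments on one side (word boundaries ignored)
    left_ok = all(word[start + k] in options for k, options in enumerate(left))
    right_ok = all(word[i + 1 + k] in options for k, options in enumerate(right))
    return left_ok and right_ok

def find_targets(word, centre, left=(), right=()):
    """Indices of all positions in `word` that match the environment."""
    return [i for i in range(len(word)) if matches_environment(word, i, centre, left, right)]

vowels = {"a", "i", "u"}
# /t/ or /d/ between vowels
print(find_targets("patida", {"t", "d"}, left=[vowels], right=[vowels]))  # [2, 4]
```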
Known issues
- Help pages for the Mac binary require an internet connection to view, due to issues with including .html files in the .app binary