-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge wh-questions library in its current state #502
Changes from 69 commits
8172546
9238372
1043df2
fbb8809
65b1d7e
7276b55
8fa8099
eb26d39
e8b305b
bb9cae7
4e88d77
7f3b214
70f0308
313d6bc
3d08e70
d539bc5
7d391d8
81860df
5c8233d
41ddfd4
c706822
2a97cbe
e409ec3
3a18f67
ef0a1cf
2f9f222
de79427
4480eed
480c48a
51157e0
1bc7c0d
db1acc1
3244dbb
3e12134
c5b5611
3419703
46b064b
ac090d8
572e3c6
7417c9c
3d5d631
96a1849
804cfdd
7a23532
4ee0d79
3f9fc62
58f83b6
a768134
f7b4f15
e25f423
2eeb3bd
a66152a
8ac4511
3e5cfd6
78041fa
8e9397f
8071f33
bdf9858
9f23592
eb6e4f8
7c6c294
971cb2e
f00ec10
a4b0f49
e09a65e
8385f47
d3a6dc7
cc36bdd
a478c25
3aa9a81
e3477c9
6c07b4e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,13 +11,13 @@ | |
''' | ||
|
||
|
||
|
||
### Choices for values of checkboxes, radio buttons ### | ||
|
||
ON = 'on' # Choice convention for checked box | ||
YES = 'yes' # Choice convention for selected radio button | ||
|
||
ORTH = 'orth' # Choice convention for orthographies associated with stems and affixes. | ||
|
||
### Section names ### | ||
|
||
WORD_ORDER = 'word-order' # Choices section associated with word order subpage | ||
|
@@ -33,3 +33,26 @@ | |
OV_ORDERS = ['sov', 'ovs', 'osv', 'v-final'] | ||
VFINAL = ['sov','osv','v-final'] | ||
VO_ORDERS = ['svo', 'vos', 'vso', 'v-initial'] | ||
|
||
### WH-QUESTIONS | ||
|
||
IN_SITU = 'in-situ' # No question phrase fronting | ||
EMB_INSITU = 'embed-insitu' # Embedded in situ questions (when fronting is also possible; that's rare) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we have a standard for constants names, but I feel like some of these are hard to understand, and could be improved by being replaced with the full words, except like On a more meta level, I remember you suggesting using constants, and while in general I agree, I think constant naming is a problematic issue. Constants add a (often useful) layer of abstraction that can also be a layer of confusion with confusing or misleading names. Name spacing is also hard, and while I don't think it needs to be solved perfectly in this PR, it would be nice to have a strategy, for instance defining library constants in library files (my preferred), using prefixes for library-specific constants, etc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think constants must be used for the sake of minimizing typos (of which we have lots and lots, and occasionally it costs hours of painful debugging). As for what they should be -- that is of course another matter. The familiar tension between line length and explicitness... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment helps, but I agree that constants (and variable names in general) should generally be more descriptive, and There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I renamed the EMB-one but note that we have lots of issues in the code with long lines and inconsistent indentation in the TDL as it is. Renaming constants should in fact also involve indentation/space/linebreak cleanup, which is a separate task. (And to continue hinting at something I think about this issue generally: I do not believe there is any hope for bringing a project like the Matrix to a consistent state, ever. Why? Because in order for sucn initiatives to lead, on average, to anything apart from lengthy discussions, a military-grade authority is needed, which I don't think will ever be present in the project, nor would any of us want it to be, I don't think. You might think right now: oh but it is Olga who is arguing over this, but trust me, if we try to institute any kind of protocol, you will soon find yourself arguing over something you didn't even expect you would.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a major piece of infrastructure that is missing right now (that we should probably make an issue for since it’s not there yet) is PR hooks and pre-commit hooks. The way these things are managed at my work very well across Python, C++, PHP, JavaScript, and more, is when you submit a PR, the server tells you if you’re breaking the rules. You can break the rules if you get a reviewer to accept, but there’s lots of signals about style problems, test failures, typing problems, etc. Yes, they aren’t 100%, but I think Black’s philosophy about no-config reformatting is great because it removes a category of disagreement. There’s of course a balance to find between writing styling rules and using them, but moving towards automated tooling and commit-blocking checks is exactly the “military-grade authority” that will allow an async project like this to maintain style over time. Last note: I have seen people write linter rules to complain about invalid constant names, so it’s possible if we want to do it :-) (and there‘s probably something existing we could use) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't seem to be able to get my "military-grade authority" point across. What I mean is, it can only come from a person who has actual power over the project. Such as the project owner. This project has always emphatically exercised a very different philosophy. People will always have different opinions on any particular style issue. There is nothing inherently right or wrong in any particular style guide. Subscribing to one is either a personal decision, in which case it does not make sense to bring that discussion into a collaborative project at all because it can only lead to endless arguing (during which real bugs will be overlooked), or it is a project-wide decision, but that is only possible if one person who is actually the owner and has actual power over everyone else (as in: if you don't agree, I fire you) ultimately imposes the decision on everybody. Maybe something like this will happen in the Matrix one day but for now I don't see it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That said, I personally wouldn't mind a tool which I could optionally run to get an idea about the degree to which my code complies with some style guide. I would probably run it and would probably even follow (some of) its suggestions. I would be very against anything that actually prevents commits though, in the context of this project. (Not that I have that military-grade say, either!) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I want to note that we already seem to be spending more time arguing over style issues (in a meta-way) than, say, discussing analyses or other contentful aspects of the work. I realize it may seem that I am the instigator but (i) actually, I am not :) I push back on these things but I never initiate them; and (ii) just wait till a project member comes on board who has strong views on style which are different from Mike's and T.J.'s Or even an issue surfaces on which Mike and T.J. happen to disagree. No no, I really am of the opinion that style should be kept firmly at bay and, come to think of it, not even come into CR (because it simply is too distracting). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To conclude my above thoughts, I would be for encouraging using an automated tool which highlights style locally according to some guide (PEP or whatever) and for more or less banning (refraining from discussing) style in code reviews. |
||
MTRX_FRONT = 'front-matrix' # Question phrases fronting in matrix clauses | ||
WH_QUE_PTCL = 'wh-q-part' # Question particle for constituent questions | ||
WH_INFL = 'wh-q-infl' # Inflectional paradigm for constituent questions | ||
MULTI = 'multi' # Multiple (e.g. fronting) | ||
ALL_OBLIG = 'all-oblig' # All question phrases are fronted obligatorily | ||
SG_OBLIG = 'single-oblig' # One question phrase is obligatorily fronted | ||
SINGLE = 'single' | ||
NO_MULTI = 'no-multi-ques' # No multiple questions in one clause | ||
PIED = 'pied-pip' # Pied piping | ||
PIED_ADP = 'pied-pip-adp' # Pied piping of specifically adpositions | ||
OBL_PIP_NOUN = 'oblig-pied-pip-noun' # Obligatory pied piping of nouns | ||
OBL_PIP_ADP = 'oblig-pied-pip-adp' # Obligatory pied piping of adpositions | ||
MTRX_FR_OPT = 'matrix-front-opt' # Optionality of fronting in matrix clauses | ||
|
||
|
||
### LEXICAL TYPE NAMES (SOME); see lexbase.py | ||
WH_PRO = 'wh-pronoun-noun-lex' | ||
INTER = 'inter' # Interrogative words flag |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,7 @@ | |
from gmcs import tdl | ||
from gmcs.choices import ChoicesFile | ||
from gmcs.linglib import adnominal_possession | ||
from gmcs.linglib import adverbs_adpositions | ||
from gmcs.linglib import agreement_features | ||
from gmcs.linglib import argument_optionality | ||
from gmcs.linglib import case | ||
|
@@ -32,6 +33,7 @@ | |
from gmcs.linglib import valence_change | ||
from gmcs.linglib import verbal_features | ||
from gmcs.linglib import word_order | ||
from gmcs.linglib import wh_ques | ||
from gmcs.linglib import yes_no_questions | ||
from gmcs.utils import format_comment_block | ||
|
||
|
@@ -484,6 +486,10 @@ def customize_matrix(path, arch_type, destination=None, force_dest=False): | |
# do on the other TDL files. NOTE: This needs to be called | ||
# before customize_lexicon() and customize_inflection() | ||
lexical_items.insert_ids(ch) | ||
# Currently, yes-no questions need to go before the lexicon | ||
# because the lexicon needs to know whether there are obligatory question particles, | ||
# to determine the head on the question-embedding verb. | ||
yes_no_questions.customize_yesno_questions(mylang, ch, rules, lrules, hierarchies,roots) | ||
|
||
# The following might modify hierarchies in some way, so it's best | ||
# to customize those components and only have them contribute their | ||
|
@@ -522,9 +528,12 @@ def customize_matrix(path, arch_type, destination=None, force_dest=False): | |
valence_change.customize_valence_change(mylang, ch, lexicon, rules, lrules, hierarchies) | ||
word_order.customize_word_order(mylang, ch, rules) | ||
coordination.customize_coordination(mylang, ch, lexicon, rules, irules) | ||
yes_no_questions.customize_yesno_questions(mylang, ch, rules, lrules, hierarchies) | ||
clausalmods.customize_clausalmods(mylang, ch, lexicon, rules, roots, trigger) | ||
clausalcomps.customize_clausalcomps(mylang,ch,lexicon,rules) | ||
adverbs_adpositions.customize_adv_adp(ch,mylang,rules) | ||
wh_ques.customize_wh_ques(mylang,ch,rules,roots) | ||
|
||
# Service customization | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does this comment mean? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't know. I doubt I wrote it; more likely it got reshuffled up or down from somewhere... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like it was introduced in c706822 and wasn't elsewhere before. Maybe you just forgot why you wrote it? In which case it's probably doing no good and can be removed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, sounds good! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I changed the comment; I think a comment is in fact needed there, it is just that perhaps "Service" is not good. |
||
customize_punctuation(grammar_path) | ||
customize_test_sentences(grammar_path) | ||
customize_itsdb(grammar_path) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Attempt to flag specific features which were added to the grammar by a component. | ||
# One such feature is INIT; it is not needed in all grammars but can be added by the Word Order library, | ||
# the clausal complements library, the clausal modifiers library, and the Wh-questions library | ||
# (and possibly other libraries). | ||
|
||
USED_FEATURES = {'INIT':False} | ||
USED_TYPES = {'qdet':False,'qpro':False,'qadv':False,'qverb':False} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like this idea in principle, but I don't think this will work in practice like this forever? The way regression testing happens right now with I think there's a lot of opportunity for refactoring the current system, and this sort of thing seems to fit into that, so I'll open an issue for this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wanted to start collecting some statistics about what features are used how. I did not think about it much in terms of doing it over multiple grammars simultaneously although it would of course be interesting. But even within one grammar is interesting enough. Of course I only started it and didn't really do anything interesting with these dicts yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think T.J. means that in a multithreaded (not multiprocess) use of the Matrix (which we do not currently do but in the future who knows) this global might be modified by different customizations in the same Python process, leading to incorrect results. But, as a side-point, if these are intended to be modified they should not be ALL_CAPS as that indicates they are constants. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (another reason to not use multithreading, as far as I am concerned :) ). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We're not using multithreading There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know :). Good :). (Sorry, I am just messing with ya at this point) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These
full_key
s look wrong, but I'm definitely not an expert on the current choices files (and I really need to finish getting rid of them 🙂).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "look wrong"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #472 (still a draft) I removed the
full_keys()
method but not thefull_key
initialization parameter, but I expect it, too, will go away in the end.Style-wise the spaces around the
=
are not PEP-8 compliant (but so far I haven't been raising style issues because there are so many in the codebase already... these could be fixed in a style-fixes-only kind of PR).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olzama Oh, now that I look at it again, I think I misunderstood what these were doing. These are top level values, so I guess they're correct.
@goodmami in the Choices redo (still not published...) that I've been working on, I have kept
full_key
. I don't think it's inherently bad (though sometimes used poorly), and it is used a lot in the code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dantiston Ok I did
git grep full_key
and yes, it's more prevalent than I thought. Consider merging #472 to get rid offull_keys()
method, which was only (ab)used in the adnominal_possession library. I still think most uses offull_key
are not great. I was hoping we'd just use TOML and use the decoded dict instead of a custom object, but if we're usingfull_key
it might not be that simple.