-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve TypeScript typings #257
Conversation
const fieldValue = extractField(doc, fieldName) | ||
if (fieldValue !== undefined) documentFields[fieldName] = fieldValue | ||
documentFields[fieldName] = doc[fieldName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only change that I made to the logic of the code. Instead of calling extractField
on the field and saving the result to this._storedFields
, this now simply accesses the field directly, which preserves the correct type information.
What was the reason for using extractField
here? I didn't understand why that was happening, which is why I felt that replacing it was the right thing to do. If there's a reason (or edge case) for why this code was using extractField
, please let me know, so I can make sure to test that reason with my change!
@@ -111,7 +111,7 @@ describe('MiniSearch', () => { | |||
expect(extractField).toHaveBeenCalledWith(document, 'title') | |||
expect(extractField).toHaveBeenCalledWith(document, 'pubDate') | |||
expect(extractField).toHaveBeenCalledWith(document, 'author.name') | |||
expect(extractField).toHaveBeenCalledWith(document, 'category') | |||
expect(extractField).not.toHaveBeenCalledWith(document, 'category') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it important for extractField
to be called on fields that are in storeFields
? If so, why?
Hi @singingwolfboy , thanks a lot for your contribution! I agree that preserving the type information as much as possible would be desirable. Unfortunately, this change would introduce a big breaking change on a feature that a lot of users rely upon: in MiniSearch, the document does not necessarily have to be a simple key/value object. That's just the default, but in principle it can be anything, and the application developer is able to control this via the Suppose, for example, that an application is indexing news articles, and each document has a new MiniSearch({
fields: ['publishedAt', /* etc... */],
extractField: (document, fieldName) => {
if (fieldName === 'publishedAt') {
return document[fieldName].toLocaleDateString('en-UK', {
weekday: 'long',
year: 'numeric',
month: 'long',
day: 'numeric',
})
} else {
return document[fieldName]
}
}
}) This feature is widely used, and existed prior to the switch to TypeScript. If types were the most important goal, one could require developers to transform the documents before indexing, to ensure that all fields are simple key/value objects with string values. While Therefore, I am unfortunately not going to merge this pull request, as the cost of the large breaking change surpasses the benefits. TypeScript users, if they are not using a custom That said, this is not the first time this topic comes up: see #229 and #208 for example. I am devising a partial solution that would allow more type safety while not introducing a breaking changes. Thanks again for the contribution, highly appreciated! |
@lucaong I completely understand the value of the The one and only change in functionality that I have made in this pull request (as far as I'm aware) is that the This is useful because when we want to store extra information in the index, I believe it's useful to store that information exactly as it was listed in the document. For example, a Can you explain why we want to call |
Thanks @singingwolfboy for the explanation ! It is true that, most typically, // Assuming that our documents look like:
const documents = [
{ id: 1, title: 'Moby Dick', author: { name: 'Herman Melville' } },
{ id: 2, title: 'Zen and the Art of Motorcycle Maintenance', author: { name: 'Robert Pirsig' } },
{ id: 3, title: 'Neuromancer', author: { name: 'William Gibson' } },
{ id: 4, title: 'Zen in the Art of Archery', author: { name: 'Eugen Herrigel' } },
// ...and more
]
// We can support nested fields (author.name) with a
// custom `extractField` function:
let miniSearch = new MiniSearch({
fields: ['title', 'author.name'],
extractField: (document, fieldName) => {
// Access nested fields
return fieldName.split('.').reduce((doc, key) => doc && doc[key], document)
}
}) In the case above, should one want to store the author name, one has the option to either do What do you think? |
What I think is a backward incompatible change that worries me more, is that if you enforce the type This was not clear in my first example, because I used a field name that actually exists as a key on the document type, but should be clearer in my second example. Sorry for that. Tests pass because they are written in JavaScript, but I believe that TypeScript projects doing what shown in the example above would not compile anymore after the change. I do have an idea to possibly improve type safety without breaking backwards compatibility, I hope I find the time to outline it in an issue at some point soon. If I manage to, I will mention you in the issue, so you can provide your opinion. |
That's not the only way to do it! In fact, I would say it's not even the best way to do it. If the intended use case is supporting dotted paths to store specific fields from nested objects, then we can write the function that handles the
You underestimate how expressive TypeScript is! 😄 In fact, this pull request already accounts for dotted paths like you describe. If you check the I tried to take a conservative approach so far, and I've only used the
Would you like me to rewrite the tests in TypeScript? Would that convince you? |
I really appreciate your pull request, and I am motivated to find a solution to improve the type safety of MiniSearch, at least in the common case of using plain key/value objects as documents without a custom That said, I still think it's a breaking change: the // Assuming that our documents look like:
const documents = [
{ id: 1, title: 'Moby Dick', author: { name: 'Herman Melville' }, pubDate: new Date(1851, 9, 18) },
{ id: 2, title: 'Zen and the Art of Motorcycle Maintenance', author: { name: 'Robert Pirsig' }, pubDate: new Date(1974, 3, 1) },
{ id: 3, title: 'Neuromancer', author: { name: 'William Gibson' }, pubDate: new Date(1984, 6, 1) },
{ id: 4, title: 'Zen in the Art of Archery', author: { name: 'Eugen Herrigel' }, pubDate: new Date(1948, 0, 1) },
// ...and more
]
// We can support nested fields (author.name) and date fields (pubDate) with a
// custom `extractField` function:
let miniSearch = new MiniSearch({
fields: ['title', 'author.name', 'pubYear'],
extractField: (document, fieldName) => {
// If field name is 'pubYear', extract just the year from 'pubDate'
if (fieldName === 'pubYear') {
const pubDate = document['pubDate']
return pubDate && pubDate.getFullYear().toString()
}
// Access nested fields
return fieldName.split('.').reduce((doc, key) => doc && doc[key], document)
}
}) Here, the What I think could be done in the future, is to support a different syntax in the // Future proposal, NOT YET WORKING
new MiniSearch({
fields: [
{
name: 'title',
index: true, // Index this field. Can be set to false (and store to true) when only storing a field
store: true, // equivalent to adding this field to storeFields
tokenize: (s) => s.split(/\s+/), // tokenizer for this field
extractField: (doc, _) => doc.title, // extractor for this field
processTerm: (term) => term.toLowerCase() // processor for this field
},
{ name: 'body', store: true }, // Uses the default tokenizer, extractor, processor, etc.
{ path: 'author.name' }, // When using path instead of name we can be type safe (as in path
// must be a valid path in the doc type). Path is mutually exclusive to
// extractField, and by default is also used as name, if not specified
'info' /* same as { name: 'info' }. If it was the same as { path: 'info' }
* this would be a breaking change when specifying a default
* extractField function, as we cannot require the name to be a key in the
* document type. Maybe we can make it the same as { path: 'info' } if and only
* if there is no custom `extractField` defined? */
],
storeFields: ['title', 'body', 'info'], // In case of conflict with fields it should error
processTerm: (term) => term.toLowerCase(), // Provides a default for all fields
tokenize: (s) => s.split(/\s+/), // Provides a default for all fields
extractField: (doc, fieldName) => doc[fieldName], // Provides a default for all fields,
searchOptions: {
boost: { title: 2 },
prefix: true,
fuzzy: 0.2,
tokenize: (s) => s.split(/\s+/) // Search tokenizer
}
}) For clarity, here I am using all possible combinations to annotate how they interact together, but typically one would either use the Therefore, these would be type safe in the type of field names: new MiniSearch({ fields: [
{ name: 'title' },
{ name: 'body', store: true },
{ path: 'author.name' }
]
})
new MiniSearch({ fields: ['title', 'body', 'info'] }) While these would allow any string as field names (because a custom extractor is specified): new MiniSearch({
fields: [
{ name: 'title', extractField: (doc, _) => doc.title },
]
})
new MiniSearch({
fields: ['title', 'body'],
extractField: (doc, fieldName) => doc[fieldName]
}) The proposal above supports the current syntax with no breaking change, but offers a type safe syntax for users that do not need a custom What do you think? |
Maybe it is even possible to enforce type safety with the current syntax whenever The above proposal would be a further improvement on it, by allowing dotted paths to be specified without an One drawback is that the more complicated types would make the documentation less user friendly: now it is relatively easy to look up the meaning of each option in the docs, but with this change one would have to follow the type union. This can still be solved by explicitly documenting options. |
Hmm, that does present a problem. But not an insurmountable one! We could simply add some extra fields to the type annotation. Here's some example code, that doesn't currently work with this pull request, but we could alter the code to make it work: interface Doc {
id: number;
title: string;
author: { name: string; }
pubDate: Date;
}
// note the two type arguments
const ms = new MiniSearch<Doc, 'pubYear'>({
fields: ['title', 'author.name', 'pubYear'],
extractField: (document, fieldName) => {
// If field name is 'pubYear', extract just the year from 'pubDate'
if (fieldName === 'pubYear') {
const pubDate = document['pubDate']
return pubDate && pubDate.getFullYear().toString()
}
// Access nested fields
return fieldName.split('.').reduce((doc, key) => doc && doc[key], document)
}
}) We can use type arguments to the
I like this, too! I particularly like that you can now pass functions that are scoped by field: no more I'm OK with abandoning my efforts on this pull request, if you think that this other approach works better. Maybe I could help with specifying it and writing the code/tests? |
Yes, contributions like yours are very welcome :) My concern is mostly to avoid big breaking changes with limited benefit. It's acceptable to add some breaking change in a major release, especially if the benefit outweights the cost. Let me think about this a bit more before closing your PR. In any case, I am completely on your side that additional type safety is desirable, and I definitely can use your help on this or alternative proposals. |
I will close this PR, but the conversation is still very relevant to implement the strongly typed interface of the next major release. |
This alters the TypeScript typings to make it clear that the
fields
,idField
, andstoreFields
properties must be keys in the document type. It also improves the typing of theSearchResult
type so that it knows to combine it with the stored fields, and preserves the type information when doing so.Now, in order to use this project with TypeScript, here is an example of the right way to do so: