Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question re: Filters #18

Open
oucil opened this issue May 8, 2015 · 33 comments
Open

Question re: Filters #18

oucil opened this issue May 8, 2015 · 33 comments

Comments

@oucil
Copy link
Contributor

oucil commented May 8, 2015

First of Konstantin, thank you for writing your own bike :)

I have a question about the "Filter" object, specifically the "position" parameter. I'm not clear what that is. The method describes the type as an int, but I'm not sure what the int is in reference to. Is it a reference to the array position of a corresponding field from the parent select statement?

Appreciate your insights!

Kevin.

@oucil
Copy link
Contributor Author

oucil commented May 8, 2015

I should add the following, because I'm starting to see where it may reference in the index's filter column list, which isn't exactly clear when you're using selectByIndex...

                // Open a connection
        $index_id = $this->_read->getIndexId(
                $this->_database,
                $query->get_table(),
                $query->get_index(),
                $query->get_fields()
        );

        // Create the select query
        $query = $this->_read->selectByIndex(
                $index_id,
                $query->get_type(),
                $query->get_key_values(),
                $query->get_limit(),
                $query->get_offset(),
                $filter_values
        );

Would I be right in assuming that I need to add the filter column list to the getIndexId() method? And that the position param used in the Filters themselves reference the associative array key from that array?

@KonstantinKuklin
Copy link
Owner

thanks for choosing my bike :)

for example:

        $reader = $this->getReader();

        $indexId = $reader->getIndexId(
            $this->getDatabase(),
            $this->getTableName(),
            'PRIMARY',
            array('key', 'text'),
            true,
            array(0 => 'num')
        );

here I opened some table and PRIMARY index with columns (key, text). The next parameter returnOnlyId ('true') need us to get only int of opened index number. And the last array is filter column (num), later we can use it when we will need to use something like where in SQL.

        $selectQuery = $reader->selectInByIndex(
            $indexId,
            array(42, 100),
            0,
            99,
            array(new Filter(Comparison::EQUAL, 0, 1))
        );

this HS query is the same like in SQL:
select key, text from table where key in (42, 100) and num = 1

I added one filter with comparison type equal(=), also position is 0 it is position of element inside array filterColumnList when you use getIndexId.
I hope, I could answer your question.

I would recommend you to try QueryBuilder, because it is more human friendly :)

same query via querybuilder:

        $selectQueryBuilder = QueryBuilder::select(array('key', 'text'))
            ->fromDataBase($this->getDatabase())
            ->fromTable($this->getTableName())
            ->whereIn('key', array(42, 100))
            ->andWhere('num', Comparison::EQUAL, 1);

another examples you can find in docs or in tests

@oucil
Copy link
Contributor Author

oucil commented May 8, 2015

Thanks for getting back so quick, appreciate you clearing that up! I gradually started to figure that out, I believe part of the problem was I had fixed composer to 0.1 rather than dev-master so I was working from an older copy and the method hints didn't match up to your examples. ;)

Love the look of the querybuilder, very logical and I'm sure a great help to those testing out handlersocket with only a mysql background. I actually wrote part of a sockets library for handlersocket a couple of years ago but it wasn't clean, and I didn't go much beyond basic the basics. The nice thing is that it was abstracted well and the way you've organized your input params falls right in line with what my own library did so your library has been like a drop in replacement for our own, instantly opening up a lot more functionality! Can't tell you how great (and rare) that is!

We're doing a lot of throughput through handlersocket, averaging about 50k reads / s, and bursting to 300k / s every 10 min on a regularly run maintenance script. My intention though is to triple that now that I'll have the ability to offload a little more thanks to the work you've done in your library. If I can glean anything important or give back to your library I will.

Cheers!
Kevin.

@KonstantinKuklin
Copy link
Owner

thx for your feedback Kevin, now I found that packagist set last stable version to 0.04 tag, but it is not a last stable version in fact. I'll double check it.
It will be great if you`ll find how to improve the library, I'am always open to discuss

@oucil
Copy link
Contributor Author

oucil commented May 9, 2015

Hey Konstantin,

I'm having no luck getting results out of the system, and I'm having a hard time reconciling any of the debug output I've come up with. Maybe you can take a quick peak, you'll probably be able to spot something pretty quickly...

This is my query object that I pass into my nosql/handlersocket adapter...

My Query Object

ECMS_Database_Query_Nosql Object
(
    [_action:ECMS_Database_Query_Nosql:private] => read
    [_index:ECMS_Database_Query_Nosql:private] => sys_domains_domain
    [_table:ECMS_Database_Query_Nosql:private] => sys_domains
    [_keymap:ECMS_Database_Query_Nosql:private] => Array
        (
            [domainName] => ecms.io
        )

    [_filters:ECMS_Database_Query_Nosql:private] => 
    [_fields:ECMS_Database_Query_Nosql:private] => Array
        (
            [0] => cookie_domain
        )

    [_limit:ECMS_Database_Query_Nosql:private] => 100
    [_offset:ECMS_Database_Query_Nosql:private] => 0
    [_type:ECMS_Database_Query_Nosql:private] => =
)

As you can see above, it contains the basics, index name, the _keymap is in the format of index_key => key_value, _fields is the return field list, etc. _filters would contain additional filter

The rest are your own HS objects...

Reader

HS\Reader Object
(
    [countQueries:HS\CommonClient:private] => 3
    [timeQueries:HS\CommonClient:private] => 0
    [stream:HS\CommonClient:private] => Stream\Stream Object
        (
            [stream:Stream\Stream:private] => Resource id #51
            [connection:Stream\Stream:private] => Stream\Connection Object
                (
                    [path:Stream\Connection:private] => ###.###.###.###
                    [protocol:Stream\Connection:private] => tcp
                    [port:Stream\Connection:private] => 9998
                )

            [dataHandler:Stream\Stream:private] => Stream\DataHandler Object
                (
                    [driver:Stream\DataHandler:private] => 
                )

            [readTimeOutSeconds:Stream\Stream:private] => 0
            [readTimeOutMicroSeconds:Stream\Stream:private] => 500000
            [method:Stream\Stream:private] => Stream\ReceiveMethod\StreamGetLineMethod Object
                (
                    [length:Stream\ReceiveMethod\StreamGetLineMethod:private] => 1024
                    [ending:Stream\ReceiveMethod\StreamGetLineMethod:private] => 

                )

        )

    [currentIndexIterator:HS\CommonClient:private] => 1
    [indexList:HS\CommonClient:private] => Array
        (
            [horizonssys_domainssys_domains_domaincookie_domain] => 1
        )

    [queryListNotSent:HS\CommonClient:private] => Array
        (
        )

    [keysList:HS\CommonClient:private] => Array
        (
            [1] => Array
                (
                    [0] => cookie_domain
                )

        )

    [authKey:HS\CommonClient:private] => 
    [debug:HS\CommonClient:private] => 
    [debugResultList] => Array
        (
        )

)

SelectQuery

HS\Query\SelectQuery Object
(
    [comparisonOparation:protected] => =
    [keyList:protected] => Array
        (
            [0] => ecms.io
        )

    [offset:protected] => 100
    [limit:protected] => 0
    [columnList:protected] => Array
        (
            [0] => cookie_domain
        )

    [suffix:protected] => 
    [inKeyList:protected] => 
    [filterList:protected] => Array
        (
        )

    [returnType:protected] => 1
    [resultObject:protected] => HS\Result\SelectResult Object
        (
            [query:protected] => HS\Query\SelectQuery Object
 *RECURSION*
            [code:protected] => 0
            [error:protected] => 
            [data:protected] => Array
                (
                )

            [time:protected] => 0
            [modifyRows:protected] => 
            [openIndexQuery:protected] => 
        )

    [socket:protected] => HS\Reader Object
        (
            [countQueries:HS\CommonClient:private] => 3
            [timeQueries:HS\CommonClient:private] => 0
            [stream:HS\CommonClient:private] => Stream\Stream Object
                (
                    [stream:Stream\Stream:private] => Resource id #51
                    [connection:Stream\Stream:private] => Stream\Connection Object
                        (
                            [path:Stream\Connection:private] => ##.##.##.##
                            [protocol:Stream\Connection:private] => tcp
                            [port:Stream\Connection:private] => 9998
                        )

                    [dataHandler:Stream\Stream:private] => Stream\DataHandler Object
                        (
                            [driver:Stream\DataHandler:private] => 
                        )

                    [readTimeOutSeconds:Stream\Stream:private] => 0
                    [readTimeOutMicroSeconds:Stream\Stream:private] => 500000
                    [method:Stream\Stream:private] => Stream\ReceiveMethod\StreamGetLineMethod Object
                        (
                            [length:Stream\ReceiveMethod\StreamGetLineMethod:private] => 1024
                            [ending:Stream\ReceiveMethod\StreamGetLineMethod:private] => 

                        )

                )

            [currentIndexIterator:HS\CommonClient:private] => 1
            [indexList:HS\CommonClient:private] => Array
                (
                    [horizonssys_domainssys_domains_domaincookie_domain] => 1
                )

            [queryListNotSent:HS\CommonClient:private] => Array
                (
                )

            [keysList:HS\CommonClient:private] => Array
                (
                    [1] => Array
                        (
                            [0] => cookie_domain
                        )

                )

            [authKey:HS\CommonClient:private] => 
            [debug:HS\CommonClient:private] => 
            [debugResultList] => Array
                (
                )

        )

    [indexId:protected] => 1
    [openIndexQuery:protected] => 
    [valueList:protected] => Array
        (
        )

    [queryClassName:protected] => HS\Query\SelectQuery
)

Here's my current setup after looking through some of your test examples. $this->read contains the Reader object, and $query is my object from up above that passes in all the details to the adapter. The methods in use format the parameters appropriately. $filter_keys and $filter_values are empty arrays for this example, which match the defaults cast in your methods.

        // Open a connection
        $index_id = $this->_read->getIndexId(
                $this->_database,
                $query->get_table(),
                $query->get_index(),
                $query->get_fields(),
                TRUE,
                $filter_keys
        );

        // Create the select query
        /* @var $read SelectQuery */
        $read = $this->_read->selectByIndex(
                $index_id,
                $query->get_type(),
                $query->get_key_values(),
                $query->get_limit(),
                $query->get_offset(),
                $filter_values
        );

        // Get the results as an array
        $read->setReturnType(SelectQuery::VECTOR);

        // Run the query and parse the results
        $data = $read->execute()->getResult()->getData();

No matter what I do $data is always an empty array.

Any light you can shed on this would really help!

Thanks,
Kevin.

@KonstantinKuklin
Copy link
Owner

did u try sent something simple to HS with current configuration without my library? HS worked ok?
database, table, index and column are exists?
how u installed hs? (here example how I did it for tests)

@oucil
Copy link
Contributor Author

oucil commented May 10, 2015

I've been using HS for more than two years in production using our own limited hand crafted library, but wanted to expand it's use, thus the move to your library. Our query object abstracts queries from the adapter so different forms of queries can be dispatched to different adapters depending on the context. For instance, a complex query with joins will automatically go to the Native MySQL adapter, whereas simple Key/Val queries will get routed through HS.

All of the queries I'm testing with work in our own HS library, but are coming up with no results through yours. I know it's a simple matter of me injecting something incorrectly which is why I provided the object dumps above. Maybe you can see something missing, or something incorrectly set?

Thanks!

@KonstantinKuklin
Copy link
Owner

very strange, do u use HS internal auth on servers?
also pls run this code

        // Open a connection
        $indexQuery = $reader->getIndexId(
            $this->_database,
            $query->get_table(),
            $query->get_index(),
            $query->get_fields(),
            false,
            $filter_keys
        );

        $openIndexQueryText = $indexQuery->getQueryString();


        // Create the select query
        /* @var $read SelectQuery */
        $read = $reader->selectByIndex(
            1,
            $query->get_type(),
            $query->get_key_values(),
            $query->get_limit(),
            $query->get_offset(),
            $filter_values
        );
        $selectQueryText = $read->getQueryString();

I`am interested in $openIndexQueryText and selectQueryText values, dump it here.
Also pls try to send the values from variables to HS server via telnel will it work?

@oucil
Copy link
Contributor Author

oucil commented May 10, 2015

Thanks for the hint on using the query string output, helped me notice my Limit and Offset were reversed, meaning I was limiting to 0 results. :)

Now I'm getting an exception out of the /HS/Result/ResultAbstract.php, though for another query, looks like it might have something to do with one of three things:

  1. Something in the XML that's being retrieved,
  2. potentially the length of the data, I seem to remember running across something that looked like it might limit the stream data to 1000 or 1024 bytes which would certainly cause problems with some of the results I'd be retrieving, do you know if that's the case? If so is that adjustable, or does your function pull in a loop till it gets an EOL?
  3. the error output looks like the newline characters are still encoded for transfer, does your response handler decode responses?

Thanks,
Kevin.

@oucil
Copy link
Contributor Author

oucil commented May 11, 2015

Haven't made much headway on this, but I do see that the "Driver" does indeed perform encoding/decoding, so not sure where the error is coming from. Here's a sampling of the output from the error....

Uncaught exception 'HS\Errors\UnknownError' with message 'gs/>�M�J  </ectag>�M�J  <ecugm>�M�J    <tier>4</tier>�M�J    <settings/>

This is just a snipit but it shows that the error message is cutting in in the middle of my xml result data and a number of �M�J misfires where there should be \n newlines. The tabs still seem to be in place following the newlines.

Is there an opportunity to output the raw data from the response prior to decoding, and then right after decoding but prior to parsing?

@KonstantinKuklin
Copy link
Owner

  1. I think, I need to give a way to change response length
  2. New line character "\n" coded by HS protocol, I decode it inside the library with Driver configuration - it is not a problem.

You can't get response raw text after query with library now, but u can generate plain text from the query and bring it to HS via telnet, because UnknownError is a problem from HS side not from library.
I didn't have experience to work with XML via HS, but it must work good

@oucil
Copy link
Contributor Author

oucil commented May 11, 2015

Sounds like responses getting truncated is the likely cause then. Here's how I previously handled getting responses...

        $data = NULL;
        while ($buffer = socket_read($this->_connections['read'],2048)) {
            $data .= $buffer;
            if (strpos($buffer, "\n") !== FALSE) {
                $data = substr($data,0,-1);
                break;
            }
        }

It worked 99% of the time, but there were still some odd issues I hadn't yet sorted out. This will probably help. I'll fork the project and see if I can come up with something based on the stream library you're using (i think). To save a little time, can you point me to file/line where you're performing the actual data read?

Thanks,
K.

@KonstantinKuklin
Copy link
Owner

Stream library doesn't has any logic, just read/write
u need this

@oucil
Copy link
Contributor Author

oucil commented May 11, 2015

The culprit is actually here where you've hard coded a line length of 1024. I've followed through your process a bit to the stream library here where you call stream_get_line() using that length.

The PHP docs state the following...

Reading ends when length bytes have been read, when the string specified by ending is found (which is not included in the return value), or on EOF (whichever comes first).

... so it's hitting 1024 bytes received and quitting, truncating the response. I'm not sure you'll be able to get past this using the current function, it looks like it's better suited to retrieving things with a roughly consistent size, which is difficult to count on in all but the most basic scenarios.

In the code I posted above, the socket_read() method will run until it reaches a \n, where as in stream_read_line() it will end as soon as ANY of the supplied conditions are met.

That means that I'm going to have to change 1024 to something like 512000 (500Kb), to ensure that any eventuality will be dealt with successfully. I'm not sure whether that will have any performance hit or not, because, it will still hit a \n condition first and quit, but I'm curious to know if it has to reserve enough memory to handle the possibility of 500Kb?!?!

I'm still working on this and will let you know.

Cheers,
K.

@oucil
Copy link
Contributor Author

oucil commented May 11, 2015

Thought a potential solution may be to use a loop like I provided above but using stream_get_contents() instead. Alas, it looks like unless you know the exact length of the data, the system will wait for a specified timeout before giving up, rather than checking for EOL. Still thinking this through. Not sure stream functions which are high level compared to the low level of sockets are going to cut it.

@oucil
Copy link
Contributor Author

oucil commented May 11, 2015

So after an SO question, I've learned that stream_get_line() operates much the same as socket_read() does and so placing it in a similar loop to the one I supplied earlier should correct the issue....

$data = NULL;
while ($buffer = @stream_get_line($streamResource, $this->length, $this->ending)) {
        $data .= $buffer;
}
return $data;

The only problem is whenever I change the readStream() method that performs the read, I start getting...

Read stream error. Can't read from stream. URL: tcp://127.0.0.1:9998

Maybe you can give this stuff a shot, it's hard for me to debug efficiently without spending a lot more time looking through your stuff.

@oucil
Copy link
Contributor Author

oucil commented May 12, 2015

It sounds like stream_get_line is too high level for this type of scenario, it will block when there isn't enough data to fill out $limit, and when "/n" has been reached and there's no easy way to detect it because it automatically strips out the EOL. Check out this good description of the problem http://www.linuxonly.nl/docs/41/130_The_problem_with_stream_get_line.html

I'm doubtful this is going to be doable without moving over to Sockets.

@KonstantinKuklin
Copy link
Owner

I did a lot of tests with fgets, stream_get_contents and stream_get_line..
stream_get_contents was the fastest, but we have an a problem, how we can know, that socket sent all data and we receive it all.
fgets not so fast as stream_get_line
I think the best way is set length for stream_get_line and check if count($data) == $length then throw exception - some part of data loss.
Today I`ll check it after the work.
approximately how much data you have transferred to the reponse?

@oucil
Copy link
Contributor Author

oucil commented May 12, 2015

socket_read should be comparable if not faster since it's an extension of the BSD Sockets library, whereas stream_* is a php implementation, so I would consider sockets before the others.

That said, you can't test if the data is the same length as the $length setting for the reason outlined in the link above. You don't know if it ended because of $length, or because "\n" was reached (because you can't test for EOL because it's stripped off results automatically), it's impossible to know unless you run one more iteration of the loop which results in the network blocking.

@KonstantinKuklin
Copy link
Owner

agree, that's why I count all operation which I send in HS library.
I always know count of lines which i must get from socket

@oucil
Copy link
Contributor Author

oucil commented May 12, 2015

But it's not the number of lines that we don't know, it's the length of each line. i.e. a simple ID retrieval may only yield 32 bytes, but an XML field could yield 100's of Kb worth of data.

They're not always going to be consistent, and unless length is set to a very high number, we will not be able to retrieve the full line. Also due to the nature, the query will take longer to fill data from the buffer before it can test for the EOL when length is set high.

Being able to loop over smaller chunks of the data and test for EOL manually will be much faster, but impossible to do with stream_get_line.

@KonstantinKuklin
Copy link
Owner

Reading ends when length bytes have been read, when the string specified by ending is found (which is not included in the return value), or on EOF (whichever comes first).
you need to set buffer size to 10mb for example and read such big line if in line data find \n symbol it will end reading and will return data.
if line will be more than 10mb - it will throw exeption, because line with data not read completely.
Here we have something like mysql_buffer_size or nginx/apache buffer sizes - if enough of their size, then all work fine, but if not - you have errors on logs.

@oucil
Copy link
Contributor Author

oucil commented May 12, 2015

That was my assumption as well, but the question then is, why bother building that function with a length parameter at all if you can set it to infinity and have no performance hit? I would bet you that there is a performance decrease the larger that number is.

@KonstantinKuklin
Copy link
Owner

I pushed code to dev_new_reading branch - pls checkout it and let me know will it help you

@oucil
Copy link
Contributor Author

oucil commented May 15, 2015

Thanks Konstantin, I'll pull it down next week and test it out, long weekend here. Cheers!

@oucil
Copy link
Contributor Author

oucil commented Jun 20, 2015

Sorry for the long delay in getting back, the new code seems to be working, thank you!

I did find a problem with your result parsing, but I've already added a pull request #22 with a patch. I'll put this all through it's paces and let you know if I run into anything else!

Cheers!

@KonstantinKuklin
Copy link
Owner

no problem, I`m glad that my library helping you!

@oucil
Copy link
Contributor Author

oucil commented Jun 20, 2015

So it seemed to start off well under load but within about 3 min I started seeing loads of...

Uncaught exception 'HS\Exception\Exception' with message 'Read stream error. Can't read from stream. URL: tcp://#.#.#.#:9998' in /path/to/vendor/konstantin-kuklin/handlersocket-library/src/HS/CommonClient.php:168

I know the error wasn't coming from the data, the same pages I initially tested which were working well, I retested after I noticed the errors piling up and they were all failing after a short while. Not sure how it overwhelmed things so quickly but there's something going on there. Maybe connection caching or something?

I switched back to my own socket_read() / socket_write() code I've been using all along and the connection errors disappeared and everything returned to normal.

@KonstantinKuklin
Copy link
Owner

It`s hard to say, I would try to look to queries count for send and read

@oucil
Copy link
Contributor Author

oucil commented Jun 20, 2015

Do you mean aggregated across the server or read/write per request?

@KonstantinKuklin
Copy link
Owner

I mean the values $countQueries and $queryListNotSent inside CommonClient
countQueries = count to read from socket
count of queryListNotSent = count to sent
Such error message possible, if you try to get from socket more information, that it contain - for example read 8 results with 7 queries

@oucil
Copy link
Contributor Author

oucil commented Oct 7, 2015

Hey, long time! :) I've been testing the library further and everything is running well now, no errors or unexpected results. Works well with large or small data payloads.

The only thing I'm finding now is that it's a bit sluggish. When I use my own lightweight handlersocket reader, I'm able to load a specific page that is heavily HS reliant, and I can get it up consistently in 1.8s - 1.9s, but when I switch the adapter to yours, it's taking 4s+.

I'll have to do more testing, but is there anything you know of that I can switch off, that's on by default that might add a lot of overhead?

@Profforgr
Copy link

Our query object abstracts queries from the adapter so different forms of queries can be dispatched to different adapters depending on the context. For instance, a complex query with joins will automatically go to the Native MySQL adapter, whereas simple Key/Val queries will get routed through HS.

@oucil Can you consider open-sourcing your query object or parts of it? Queries abstraction way sounds cool. If it is 2x faster than HandlerSocketLibrary, your code is definitely worth testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants