Andrew Aksyonoff has been explaining to me the chunk-oriented read process Sphinx uses when it needs to access the Inverted Files (IF). These are the doubts and comments I sent back:
> ... Sphinx will read it in small (256 KB) chunks ...
So, if the query is a two-word phrase, Sphinx will have two windows open simultaneously, that is, two 256 KB buffers at the same time. Is that right?
With more words in the query, there are more simultaneous 256 KB chunks, right?
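The buffering described in these two questions can be sketched as below: one reader per query word, each holding at most one 256 KB chunk in memory at a time. This is a minimal illustration, not Sphinx's code. The on-disk layout (raw little-endian 32-bit docIds) and the names `ChunkedPostingReader`, `encode` and `open_query` are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size quoted above
# Hypothetical on-disk format: each posting is one little-endian unsigned 32-bit docId.
DOCID = struct.Struct("<I")


class ChunkedPostingReader:
    """Streams one word's posting list through a single fixed-size buffer."""

    def __init__(self, stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> None:
        # Round down to whole records so no docId is split across two chunks.
        self.chunk_size = max(DOCID.size, chunk_size - chunk_size % DOCID.size)
        self.stream = stream
        self.chunks_read = 0

    def __iter__(self) -> Iterator[int]:
        while True:
            # Assumes a buffered stream, whose read(n) returns n bytes until EOF.
            buf = self.stream.read(self.chunk_size)  # the only chunk held for this word
            if not buf:
                return
            self.chunks_read += 1
            for (doc_id,) in DOCID.iter_unpack(buf):
                yield doc_id


def encode(doc_ids: Iterable[int]) -> bytes:
    """Serialize docIds in the hypothetical format above (for the example only)."""
    return b"".join(DOCID.pack(d) for d in doc_ids)


def open_query(posting_streams: List[BinaryIO]) -> List[ChunkedPostingReader]:
    # One reader, hence one 256 KB buffer, per query word.
    return [ChunkedPostingReader(s) for s in posting_streams]


if __name__ == "__main__":
    readers = open_query([io.BytesIO(encode(range(0, 200_000, 2))),
                          io.BytesIO(encode(range(0, 200_000, 3)))])
    for r in readers:
        print(sum(1 for _ in r), "docIds in", r.chunks_read, "chunks")
```

Memory per query then grows with the number of words (one chunk each), not with the length of the posting lists.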
And the two most CPU-consuming processes are:
1.- Intersecting the chunks.
2.- Sorting the intersected docIds by the user's criteria.
Right?
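The two phases listed above can be sketched as follows, assuming each posting list is sorted by ascending docId: a linear merge intersection, then a sort of the matches by a user-supplied key. This is not Sphinx's actual implementation; the function names and the `dates` attribute in the example are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, List


def intersect_sorted(a: Iterable[int], b: Iterable[int]) -> Iterator[int]:
    """Linear merge of two ascending docId streams."""
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, None), next(it_b, None)
    while x is not None and y is not None:
        if x == y:
            yield x
            x, y = next(it_a, None), next(it_b, None)
        elif x < y:
            x = next(it_a, None)
        else:
            y = next(it_b, None)


def intersect_all(lists: List[Iterable[int]]) -> Iterator[int]:
    # Chain pairwise merges lazily: an N-word query becomes N-1 merge stages.
    result: Iterable[int] = lists[0]
    for other in lists[1:]:
        result = intersect_sorted(result, other)
    return iter(result)


def rank(doc_ids: Iterable[int], key: Callable[[int], Any], descending: bool = True) -> List[int]:
    """Second phase: order the matches by a user-chosen criterion."""
    return sorted(doc_ids, key=key, reverse=descending)


if __name__ == "__main__":
    dates = {1: 20071030, 4: 20071031, 7: 20071029}  # hypothetical per-document attribute
    matches = intersect_all([[1, 2, 4, 7, 9], [1, 4, 5, 7], [0, 1, 4, 7, 8]])
    print(rank(matches, key=dates.__getitem__))  # newest first: [4, 1, 7]
```

The merge advances through both lists in lockstep, so in the worst case it touches every entry of both.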
And this is my final doubt: is it always necessary to read the corresponding IF indexes from beginning to end when doing the intersection?
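One general technique that avoids scanning the longer list end to end is galloping (exponential) search: for each docId of the shorter list, probe the longer list 1, 2, 4, 8, ... entries ahead of the current position, then binary-search the last gap. The sketch below assumes both lists are sorted, duplicate-free, in memory and randomly accessible. A strictly sequential chunk reader does not allow such jumps, which is why inverted indexes commonly store skip pointers alongside posting lists. Whether and how Sphinx skips is not established here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def gallop(arr: Sequence[int], target: int, lo: int) -> Tuple[int, int]:
    """Return (first index >= lo with arr[index] >= target, number of entries probed)."""
    probes = 0
    hi, step = lo, 1
    # Exponential phase: jump 1, 2, 4, ... entries ahead until we reach the target.
    while hi < len(arr):
        probes += 1
        if arr[hi] >= target:
            break
        lo = hi + 1
        hi += step
        step *= 2
    hi = min(hi, len(arr))
    # Binary phase: the answer lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo, probes


def intersect_galloping(shorter: Sequence[int], longer: Sequence[int]) -> Tuple[List[int], int]:
    """Intersect two ascending docId lists without scanning `longer` linearly."""
    matches: List[int] = []
    pos = probes = 0
    for doc_id in shorter:
        pos, used = gallop(longer, doc_id, pos)
        probes += used
        if pos == len(longer):
            break  # every remaining docId in `shorter` is larger than anything in `longer`
        if longer[pos] == doc_id:
            matches.append(doc_id)
    return matches, probes


if __name__ == "__main__":
    common = list(range(0, 1_000_000, 2))  # 500,000 docIds
    matches, probes = intersect_galloping([10, 11, 500_000, 999_998], common)
    print(matches, "after probing", probes, "of", len(common), "entries")
```

Each search resumes where the previous one stopped, so the longer list is only traversed forward; the probe count shows how little of it is examined when the shorter list is sparse relative to it.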
I have Google in mind when asking this. When searching Google for two common words like "Internet" and "WWW", we obtain these results:
* 2,110 million results for "Internet".
* 9,310 million results for "WWW".
When intersecting them, Sphinx would have to read and intersect 2 billion hits (docId + weight + probably other ranking data) against 9 billion. Right?
Isn't that too slow? Is the time it takes log(N)? And if so, is that log(2 billion), log(9 billion), log(2 billion × 9 billion), or something else?
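For reference, here are the standard bounds for intersecting two sorted lists of lengths $m \le n$, taking the figures above as posting-list lengths ($m \approx 2.11 \times 10^9$, $n \approx 9.31 \times 10^9$). These are general results for comparison-based intersection, not measurements of Sphinx, and the plugged-in values are orders of magnitude with constant factors ignored:

$$
\begin{aligned}
\text{linear merge:}\quad & O(m + n), && m + n \approx 1.14 \times 10^{10} \\
\text{galloping search:}\quad & O\!\left(m \log_2\!\left(\tfrac{n}{m} + 1\right)\right), && 2.11 \times 10^{9} \cdot \log_2(5.41) \approx 5.1 \times 10^{9}
\end{aligned}
$$

Either way the cost grows at least linearly with the shorter list, since each of its $m$ entries has to be checked; it is not a plain $\log N$, and the product $2 \times 10^9 \cdot 9 \times 10^9$ appears in neither bound.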
Thanks again!
Wednesday, October 31, 2007
Reading IF in 256 KB chunks