Speed enhancement #1 #17

Open
ddyok opened this issue Dec 2, 2019 · 1 comment


ddyok commented Dec 2, 2019

FWIW, I've futzed around with reading the games in a PGN file concurrently: pre-scan the file, split it into individual games, and then call NewPGNScanner() on each one.

I ended up allocating a new PGNScanner for each game, but in exchange the parsing runs concurrently. This is just an ugly hack because I needed to shave off some milliseconds, but it turned out to halve the parsing time with 4 workers on my MacBook Pro. It looks something like this (you get the gist, for your consideration):

func ReadConcurrent(workers int, fileName string, gamesMap *cmap.ConcurrentMap) {
	defer TimeTrack(time.Now(), "ReadConcurrent")

	var wg sync.WaitGroup

	log.Println("Reading: " + fileName)
	f, err := os.Open(fileName)
	if err != nil {
		log.Fatal(err)
	}
	// Close the file once the pre-scan and the parsing are done.
	defer f.Close()

	scanner := bufio.NewScanner(f)

	scanner.Split(crunchSplitFunc)

	sa := make([]string, 0)

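	for scanner.Scan() {
		t := scanner.Text()
		if t == "" {
			continue
		}
		// crunchSplitFunc consumes the leading "[" of each game, so put it back.
		sa = append(sa, "["+t)
	}
	// Surface read errors, including bufio.ErrTooLong for a game that is
	// larger than the scanner's default 64 KiB token limit.
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}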

	dbName := getKey(fileName)

	wg.Add(workers)
	go readPool(&wg, dbName, workers, sa, gamesMap)
	wg.Wait()
}

func readPool(wg *sync.WaitGroup, dbName string, workers int, sa []string, gamesMap *cmap.ConcurrentMap) {
	tasksCh := make(chan int)

	for i := 0; i < workers; i++ {
		go readWorker(workers, dbName, sa, gamesMap, tasksCh, wg)
	}

	for i := 0; i < len(sa); i++ {
		tasksCh <- i
	}

	close(tasksCh)
}

func readWorker(workers int, dbName string, sa []string, gamesMap *cmap.ConcurrentMap, tasksCh <-chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	// Ranging over the channel ends the loop once readPool closes it.
	for idx := range tasksCh {
		s := sa[idx]
		// fmt.Println(s)
		ps := NewPGNScanner(strings.NewReader(s))
		for ps.Next() {
			game, err := ps.Scan()
			if err != nil {
				// log.Println(err)
				continue
			}

			if game == nil {
				continue
			}

			BuildUniqueBoard(game)
			BuildPiecesAtPosition(game)
			gamesMap.Set(dbName+":"+strconv.Itoa(idx), game)
		}
	}
}

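// crunchSplitFunc is a bufio.SplitFunc that cuts the input just before each
// `[Event ` tag. The opening "[" is consumed, so the caller has to re-add it.
func crunchSplitFunc(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}

	// Match the trailing space as well, so that tags such as [EventDate do not
	// split a game in two. bytes.Index (needs the "bytes" import) also avoids
	// copying the buffer into a string on every call.
	if i := bytes.Index(data, []byte("[Event ")); i >= 0 {
		return i + 1, data[0:i], nil
	}

	if atEOF {
		return len(data), data, nil
	}

	// No game boundary in the buffer yet: ask the scanner for more data.
	return 0, nil, nil
}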

ddyok commented Dec 2, 2019

(Updated issue to include the crunchSplitFunc function).
