Update readme; update debug prints

xenia 2020-04-11 04:22:05 -04:00
parent f57aef07c0
commit 23f7407366
3 changed files with 19 additions and 6 deletions

@@ -8,12 +8,24 @@
## High level approach
todo
We started by creating robust, abstracted HTTP-handling code, which is located in the `smol-http`
module of this project. The HTTP code implements a subset of HTTP 1.1 that is enough to meet the
requirements for crawling the target web server, and it communicates over plain TCP sockets. We
used Racket standard library functions to parse and manipulate URLs and to parse HTML (as XML,
hopefully it's well-formed!) in order to find the hyperlinks and the flags on each page. We
implemented a high performance Certified Web Scale(tm) crawling scheduler with a distributed work
queue to allow for very high rate crawling. On our machines, the crawler takes minutes to complete
and finds all the flags very quickly.
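As a rough illustration of the link-finding step, here is a minimal sketch that parses a page as XML
and resolves every `<a href>` against the page URL. It is not the project's actual code: the names
`find-hrefs` and `page-links` are hypothetical, and it assumes the page is well-formed XML and that
`xml->xexpr` keeps its default behavior of always emitting an attribute list.

```racket
#lang racket/base
;; Hypothetical sketch, not the project's code: collect absolute link targets
;; from a well-formed page using only the Racket standard library.
(require racket/list racket/match net/url xml)

;; X-expression -> (Listof String)
;; Returns the href value of every <a> element, in document order.
(define (find-hrefs x)
  (match x
    ;; An element: tag symbol, attribute list, then child nodes.
    [(list (? symbol? tag)
           (list (list (? symbol? keys) (? string? vals)) ...)
           body ...)
     (define own
       (if (eq? tag 'a)
           (for/list ([k (in-list keys)]
                      [v (in-list vals)]
                      #:when (eq? k 'href))
             v)
           '()))
     (append own (append-map find-hrefs body))]
    ;; Text, entities, comments, and anything else carry no links.
    [_ '()]))

;; Str Str -> (Listof String)
;; Parses html as XML and resolves each href against the base URL.
(define (page-links base html)
  (define doc (read-xml (open-input-string html)))
  (define xexpr (xml->xexpr (document-element doc)))
  (for/list ([href (in-list (find-hrefs xexpr))])
    (url->string (combine-url/relative (string->url base) href))))
```

Calling `(page-links page-url page-body)` yields absolute URL strings that a scheduler could
deduplicate before queueing. Malformed HTML makes `read-xml` raise an exception, so a real crawler
would need to handle that case.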
## Challenges
todo
The current pandemic situation continues to make this semester difficult. Otherwise, we didn't run
into any major issues during this project.
## Testing
todo
We unit tested the HTTP handling code in `smol-http` and used ad-hoc manual testing against the
target server to test the complete crawling functionality.
We have an additional `-d` flag that prints useful debug info while the crawler runs, which may
be helpful for manual testing.
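One way a `-d` flag like this could be wired up is sketched below. This is an assumption about the
design, not the project's implementation: it supposes debug mode lives in a parameter named
`debug-mode?`, and the program name `"crawler"` and the helpers `parse-args!` and `debug-print` are
hypothetical.

```racket
#lang racket/base
;; Hypothetical sketch: toggle debug output from a -d command-line flag.
(require racket/cmdline)

;; Assumption: debug mode is a parameter, so (debug-mode?) reads its value.
(define debug-mode? (make-parameter #f))

;; (Vectorof Str) ->
;; Parses the arguments and turns debug mode on when -d is present.
(define (parse-args! argv)
  (command-line
   #:program "crawler" ; hypothetical program name
   #:argv argv
   #:once-each
   [("-d") "Print debug info while crawling" (debug-mode? #t)]
   #:args () (void)))

;; Str ->
;; Prints a message to the console, only when debug mode is on.
(define (debug-print msg)
  (when (debug-mode?)
    (printf "~a\n" msg)))
```

Keeping the switch in a parameter lets tests enable it locally with `parameterize` without
affecting other threads' default value.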

@@ -29,9 +29,10 @@
;; ->
;; Prints a completion message to the console, only when debug mode is on
(define (print-complete)
(define (print-complete total-pages num-flags)
(when (debug-mode?)
(printf "\r\x1b[KCrawl complete\n")))
(printf "\r\x1b[KCrawl complete: ~a pages crawled, ~a flags found\n"
total-pages num-flags)))
;; Str ->
;; Prints a flag

@@ -157,7 +157,7 @@
(set-count completed) (unbox num-flags))
(loop)))
(print-complete)
(print-complete (set-count completed) (unbox num-flags))
;; send all workers the shutdown message and wait
(for ([thd (in-vector worker-threads)])
(thread-send thd #f)