Update readme; update debug prints
parent f57aef07c0
commit 23f7407366
18
README.md
@@ -8,12 +8,24 @@
 
 ## High level approach
 
-todo
+We started by creating robust abstracted HTTP-handling code, which is located in the `smol-http`
+module of this project. The HTTP code implements a subset of HTTP 1.1 which is enough to meet the
+requirements for crawling the target web server. It also uses plain TCP sockets to communicate using
+its HTTP implementation. We used Racket standard library functions to parse and manipulate URLs as
+well as parse HTML (as XML, hopefully it's well-formed!) in order to find the hyperlinks on the page
+as well as the flags. We implemented a high performance Certified Web Scale(tm) crawling scheduler
+with a distributed work queue to allow for very high rate crawling, the crawler on our machines
+takes minutes to complete, and finds all the flags very quickly.
 
 ## Challenges
 
-todo
+The current pandemic situation continues to make this semester difficult. Otherwise, we didn't run
+into any major issues during this project.
 
 ## Testing
 
-todo
+We unit tested the HTTP handling code in smol-http, and used ad-hoc manual testing against the
+target server to test the complete crawling functionality.
+
+We have an additional `-d` flag which will print useful debug info during the execution of the
+crawler, which may be helpful for manual testing.
@@ -29,9 +29,10 @@
 
 ;; ->
 ;; Prints a completion message to the console, only when debug mode is on
-(define (print-complete)
+(define (print-complete total-pages num-flags)
   (when (debug-mode?)
-    (printf "\r\x1b[KCrawl complete\n")))
+    (printf "\r\x1b[KCrawl complete: ~a pages crawled, ~a flags found\n"
+            total-pages num-flags)))
 
 ;; Str ->
 ;; Prints a flag
@@ -157,7 +157,7 @@
 (set-count completed) (unbox num-flags))
 
 (loop)))
-(print-complete)
+(print-complete (set-count completed) (unbox num-flags))
 ;; send all workers the shutdown message and wait
 (for ([thd (in-vector worker-threads)])
   (thread-send thd #f)