Code for Pat's dissertation.
Jeff Yates ddb4003e66 added threading to file processing 2020-11-21 19:11:29 -05:00
README.md fixing README table part 2 2020-10-24 14:28:30 -04:00
bins.csv added testing files 2020-07-15 13:42:08 -04:00
sorter.rb added threading to file processing 2020-11-21 19:11:29 -05:00
tester.txt added testing files 2020-07-15 13:42:08 -04:00

pat-dissertation

Code for Pat's dissertation.

sorter.rb usage

Options

sorter.rb takes the following options:

| Option | Usage |
| --- | --- |
| -f, --file | the name of the input file |
| -b, --bin-file | the name of the bin CSV file |
| -t, --type | the type of splitting to perform: "iat" or "pn" |
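
As an illustration of how these three options could be parsed, here is a minimal sketch using Ruby's standard `optparse` library. The helper name `parse_options` and the option keys are assumptions made for this example; the actual parsing code in sorter.rb may differ.

```ruby
require "optparse"

# Hypothetical helper (not taken from sorter.rb): parse the three documented
# options from an argument list and return them as a hash.
def parse_options(argv)
  options = {}
  OptionParser.new do |opts|
    opts.banner = "Usage: sorter.rb [options]"
    opts.on("-f", "--file FILE", "Name of the input file") { |v| options[:file] = v }
    opts.on("-b", "--bin-file FILE", "Name of the bin CSV file") { |v| options[:bin_file] = v }
    # Restricting --type to the two documented values is an assumption;
    # OptionParser raises OptionParser::InvalidArgument for anything else.
    opts.on("-t", "--type TYPE", %w[iat pn], 'Type of splitting: "iat" or "pn"') { |v| options[:type] = v }
  end.parse(argv)
  options
end
```

Calling `parse_options(ARGV)` at the top of a script would give a hash with `:file`, `:bin_file`, and `:type` keys.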

Output

sorter.rb will generate two files:

  • [filename]-out.json
  • [filename]-out.csv

Both files contain the same data, one in JSON format and the other in CSV format.
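
The naming scheme and the dual output can be sketched as follows. This is only an illustration: the helper names `output_paths` and `serialize` are hypothetical, and the row structure (an array of hashes) is an assumption, since the README does not describe the shape of the output data.

```ruby
require "json"
require "csv"

# Hypothetical helper: derive the two output paths by dropping the input
# file's extension and appending "-out.json" / "-out.csv".
def output_paths(input_path)
  base = input_path.delete_suffix(File.extname(input_path))
  { json: "#{base}-out.json", csv: "#{base}-out.csv" }
end

# Hypothetical helper: serialize the same rows as JSON and as CSV.
# Assumes rows is an array of hashes that all share the same keys.
def serialize(rows)
  headers = rows.empty? ? [] : rows.first.keys
  csv = CSV.generate do |out|
    out << headers
    rows.each { |row| out << row.values_at(*headers) }
  end
  { json: JSON.pretty_generate(rows), csv: csv }
end
```

With these helpers, `output_paths("tester.txt")` yields `tester-out.json` and `tester-out.csv`, and each string returned by `serialize` could be written to its path with `File.write`.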

Type options

The program has two filtering modes:

iat

This mode extracts all text in the input file between PLOVEOPENING and PLOVECLOSING. Text before PLOVEOPENING and after PLOVECLOSING is ignored. Only one such section per input file is supported.
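
A minimal sketch of this kind of extraction, assuming the markers appear literally in the text. The helper name `extract_iat`, the whitespace trimming, and the `nil` return for missing markers are assumptions for illustration, not the behavior of sorter.rb itself.

```ruby
# Hypothetical helper: return the text between the first PLOVEOPENING and the
# PLOVECLOSING that follows it, or nil if either marker is missing.
def extract_iat(text)
  opening = "PLOVEOPENING"
  closing = "PLOVECLOSING"
  start = text.index(opening)
  return nil unless start

  start += opening.length
  stop = text.index(closing, start)
  return nil unless stop

  # Trimming surrounding whitespace is an assumption of this sketch.
  text[start...stop].strip
end
```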

pn

This mode extracts each section of text in the input file between Narrative: and Signatures:. A single input file may contain multiple sections.
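
One way to collect multiple sections is a non-greedy regular expression scan, sketched below. The helper name `extract_pn` and the whitespace trimming are assumptions for this example; sorter.rb may implement the split differently.

```ruby
# Hypothetical helper: return every block of text that sits between a
# "Narrative:" marker and the next "Signatures:" marker.
def extract_pn(text)
  # The /m flag lets "." match newlines; ".*?" stops at the nearest
  # "Signatures:" so that each section is captured separately.
  text.scan(/Narrative:(.*?)Signatures:/m).map { |match| match.first.strip }
end
```

Because `scan` returns every match, an input with several Narrative:/Signatures: pairs yields one array element per section, and an input with none yields an empty array.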

Example:

./sorter.rb --file tester.txt --bin-file bins.csv --type iat

The above command reads tester.txt, counts strings according to bins.csv, and processes the input text in iat mode. It creates tester-out.json and tester-out.csv, which contain the output data.