Code for Pat's dissertation.
Jeff Yates ecd02eea4f added .gitignore 2020-11-27 09:36:08 -05:00
.gitignore added .gitignore 2020-11-27 09:36:08 -05:00 fixing README table part 2 2020-10-24 14:28:30 -04:00
bins.csv added testing files 2020-07-15 13:42:08 -04:00
sorter.rb added code to catch error in timestamp 2020-11-22 15:58:47 -05:00
tester.txt added testing files 2020-07-15 13:42:08 -04:00



sorter.rb usage


sorter.rb takes the following options:

| option | usage |
| --- | --- |
| -f, --file | the name of the input file |
| -b, --bin-file | the name of the bin csv file |
| -t, --type | what type of splitting to do, can be "iat" or "pn" |
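As an illustration of how these three options could be read, here is a minimal sketch using Ruby's standard OptionParser. It is not taken from sorter.rb: the helper name parse_options and the option keys :file, :bin_file, and :type are assumptions made for this example.

```ruby
require 'optparse'

# Hypothetical helper (not from sorter.rb): parses the three documented
# options out of an argv-style array and returns them as a hash.
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: sorter.rb [options]'
    opts.on('-f', '--file FILE', 'the name of the input file') do |value|
      options[:file] = value
    end
    opts.on('-b', '--bin-file FILE', 'the name of the bin csv file') do |value|
      options[:bin_file] = value
    end
    # Passing an array restricts the accepted values; anything else raises
    # OptionParser::InvalidArgument.
    opts.on('-t', '--type TYPE', %w[iat pn], 'what type of splitting to do') do |value|
      options[:type] = value
    end
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched
  options
end

p parse_options(%w[--file tester.txt --bin-file bins.csv --type iat])
```

Restricting --type to a fixed list means a mistyped mode fails at parse time rather than partway through processing.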


sorter.rb will generate two files:

  • [filename]-out.json
  • [filename]-out.csv

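Both files contain the same data, one in JSON format and one in CSV format. [filename] is the input file's name without its extension, so tester.txt produces tester-out.json and tester-out.csv.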
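The naming scheme and the "same data, two formats" idea can be sketched as follows. This is an assumption-laden illustration, not the code in sorter.rb: the helper names output_paths, to_json_text, and to_csv_text are hypothetical, and so is the row shape (an array of hashes with "bin" and "count" keys).

```ruby
require 'json'
require 'csv'

# Hypothetical helper: derives both output names from the input name by
# dropping the extension and appending "-out.json" / "-out.csv".
def output_paths(input_path)
  base = input_path.delete_suffix(File.extname(input_path))
  { json: "#{base}-out.json", csv: "#{base}-out.csv" }
end

# Serializes the rows as pretty-printed JSON text.
def to_json_text(rows)
  JSON.pretty_generate(rows)
end

# Serializes the same rows as CSV text, with a header line taken from the
# keys of the first row.
def to_csv_text(rows)
  headers = rows.first.keys
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << row.values_at(*headers) }
  end
end

# Made-up data, only to exercise the helpers.
rows = [{ 'bin' => 'example', 'count' => 3 }]
paths = output_paths('tester.txt')
puts paths[:json]
puts to_json_text(rows)
puts paths[:csv]
puts to_csv_text(rows)
```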

Type options

The -t/--type option selects one of the program's two filtering modes:


This mode extracts all text in the input file between PLOVEOPENING and PLOVECLOSING and ignores everything before PLOVEOPENING and after PLOVECLOSING. It handles only one such section per input file.
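A minimal sketch of this single-section extraction is shown below. The function name extract_single_section is hypothetical, and two behaviors are assumptions rather than documented facts about sorter.rb: the result is stripped of surrounding whitespace, and nil is returned when either marker is missing.

```ruby
# Hypothetical helper: returns the text between the first PLOVEOPENING and
# the next PLOVECLOSING after it, or nil if either marker is absent.
def extract_single_section(text, opening: 'PLOVEOPENING', closing: 'PLOVECLOSING')
  start = text.index(opening)
  return nil if start.nil?

  start += opening.length              # skip past the opening marker itself
  finish = text.index(closing, start)  # search only after the opening marker
  return nil if finish.nil?

  text[start...finish].strip           # stripping is an assumption
end

sample = 'header PLOVEOPENING kept text PLOVECLOSING footer'
puts extract_single_section(sample)
```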


This mode extracts every section of text in the input file that lies between Narrative: and Signatures:. A single input file may contain multiple sections.
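The multi-section behavior could be sketched with a non-greedy regular expression, as below. This is one possible approach, not necessarily the one sorter.rb uses. The name extract_sections is hypothetical, and stripping whitespace from each section is an assumption.

```ruby
# Hypothetical helper: returns every stretch of text that sits between a
# "Narrative:" marker and the next "Signatures:" marker.
def extract_sections(text, opening: 'Narrative:', closing: 'Signatures:')
  # (.*?) is non-greedy so each match stops at the nearest closing marker;
  # the /m flag lets "." match newlines, so sections can span lines.
  pattern = /#{Regexp.escape(opening)}(.*?)#{Regexp.escape(closing)}/m
  text.scan(pattern).map { |match| match.first.strip }
end

sample = <<~TEXT
  Narrative: first section
  Signatures: ignored
  Narrative: second
  section
  Signatures: ignored
TEXT
p extract_sections(sample)
```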


./sorter.rb --file tester.txt --bin-file bins.csv --type iat

The command above reads tester.txt, counts strings according to bins.csv, and processes the input text in iat mode. It writes the results to tester-out.json and tester-out.csv.
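The README does not describe the layout of bins.csv, so the following is only a sketch of what "counting strings according to bins" might look like under an assumed layout: each row holds a bin name followed by the strings that belong to that bin. The helper names load_bins and count_bins, the case-insensitive matching, and the sample data are all assumptions made for illustration.

```ruby
require 'csv'

# Hypothetical layout: each CSV row is "bin_name,string1,string2,...".
# Returns a hash mapping each bin name to its list of strings.
def load_bins(csv_text)
  CSV.parse(csv_text).each_with_object({}) do |row, bins|
    name, *terms = row.compact.map(&:strip)
    bins[name] = terms unless name.nil? || name.empty?
  end
end

# Counts, per bin, how many times any of its strings occurs in the text.
# Matching is case-insensitive and non-overlapping (an assumption).
def count_bins(text, bins)
  haystack = text.downcase
  bins.transform_values do |terms|
    terms.sum { |term| haystack.scan(term.downcase).length }
  end
end

# Made-up bins and text, only to exercise the helpers.
bins = load_bins("colors,red,blue\nanimals,cat\n")
p count_bins('A red cat and a Blue cat.', bins)
```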