pat-dissertation

Code for Pat's dissertation.

sorter.rb usage

Options

sorter.rb takes the following options:

option          usage
-f, --file      the name of the input file
-b, --bin-file  the name of the bin CSV file
-t, --type      the type of splitting to perform, either "iat" or "pn"
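As a rough illustration of how these three options could be read, here is a minimal sketch using Ruby's standard OptionParser. It is not taken from sorter.rb: the helper name parse_options, the hash keys, and the restriction of --type to the two documented values are all assumptions.

```ruby
require "optparse"

# Hypothetical helper (not from sorter.rb): parse the documented options
# into a plain hash keyed by :file, :bin_file and :type.
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: sorter.rb [options]"
    opts.on("-f", "--file FILE", "Name of the input file") { |v| options[:file] = v }
    opts.on("-b", "--bin-file FILE", "Name of the bin CSV file") { |v| options[:bin_file] = v }
    # Assumption: only the two documented modes are accepted.
    opts.on("-t", "--type TYPE", %w[iat pn], "Splitting mode: iat or pn") { |v| options[:type] = v }
  end
  parser.parse(argv) # non-destructive: leaves argv untouched
  options
end
```

Calling parse_options(ARGV) at the top of a script would then give access to the three values by key.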

Output

sorter.rb generates two files, named after the input file without its extension:

  • [filename]-out.json
  • [filename]-out.csv

Both files contain the same data, one as JSON and the other as CSV.
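The naming convention can be sketched as follows. This is an illustration only: the helper names output_paths and write_outputs are hypothetical, and the shape of the data (an array of hashes with identical keys) is an assumption, since the actual output structure is not described here.

```ruby
require "json"
require "csv"

# Hypothetical helper: "tester.txt" -> ["tester-out.json", "tester-out.csv"].
def output_paths(input_path)
  base = input_path.delete_suffix(File.extname(input_path))
  ["#{base}-out.json", "#{base}-out.csv"]
end

# Hypothetical helper: write the same rows to both files.
# Assumes rows is an array of hashes that all share the same keys.
def write_outputs(input_path, rows)
  json_path, csv_path = output_paths(input_path)
  File.write(json_path, JSON.pretty_generate(rows))
  CSV.open(csv_path, "w") do |csv|
    csv << rows.first.keys unless rows.empty? # header row
    rows.each { |row| csv << row.values }
  end
  [json_path, csv_path]
end
```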

Type options

The --type option selects one of two filtering modes:

iat

This mode extracts all text in the input file between PLOVEOPENING and PLOVECLOSING. Everything before PLOVEOPENING and after PLOVECLOSING is ignored. Only a single section per input file is supported.
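A minimal sketch of this kind of extraction, assuming a single regular expression is enough. The helper name extract_iat is hypothetical, and stripping surrounding whitespace and returning nil when a marker is missing are assumptions rather than documented behavior of sorter.rb.

```ruby
# Hypothetical helper: return the text between the first PLOVEOPENING and
# the first PLOVECLOSING after it, or nil if either marker is missing.
def extract_iat(text)
  # /m lets "." match newlines so the section may span several lines.
  match = text.match(/PLOVEOPENING(.*?)PLOVECLOSING/m)
  match && match[1].strip # assumption: surrounding whitespace is dropped
end
```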

pn

This mode extracts every section of text in the input file that lies between Narrative: and Signatures:. A single input file may contain multiple sections.
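A comparable sketch for the multi-section case, again using a regular expression. The helper name extract_pn is hypothetical, and trimming whitespace around each section is an assumption.

```ruby
# Hypothetical helper: return every section found between a "Narrative:"
# marker and the next "Signatures:" marker, in order of appearance.
def extract_pn(text)
  # scan returns one single-element array per match because of the capture group.
  text.scan(/Narrative:(.*?)Signatures:/m).flatten.map(&:strip)
end
```

An input with no Narrative:/Signatures: pair yields an empty array in this sketch.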

Example:

./sorter.rb --file tester.txt --bin-file bins.csv --type iat

This command reads tester.txt, counts strings according to bins.csv, and processes the input text in iat mode. It creates tester-out.json and tester-out.csv, which contain the output data.