Add sequence_tagging example #302
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master     #302   +/-   ##
=======================================
  Coverage   79.76%   79.76%
=======================================
  Files         133      133
  Lines       11122    11122
=======================================
  Hits         8872     8872
  Misses       2250     2250
=======================================

Continue to review full report at Codecov.
import texar.torch as tx

# pylint: disable=redefined-outer-name, unused-variable
Why are these necessary? I can imagine unused-variable being required for the constants, but why redefined-outer-name? It's also better practice to add a corresponding pylint: enable comment after the point where the suppression is no longer needed.
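For reference, a scoped suppression might look like this (the constant names below are placeholders, not taken from the PR):

```python
# pylint: disable=unused-variable
UNK_WORD = "<unk>"   # hypothetical constants kept for readers of the data format
PAD_WORD = "<pad>"
# pylint: enable=unused-variable
```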
DIGIT_RE = re.compile(r"\d")

def create_vocabs(train_path, dev_path, test_path, normalize_digits=True,
Please add type annotations for functions.
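For example, the signature could be annotated roughly as follows; only the first line of the definition is quoted above, so the remaining parameters and the exact return types here are assumptions (glove_dict matches the call site shown further down):

```python
from typing import Dict, Optional, Tuple
import numpy as np

def create_vocabs(
    train_path: str,
    dev_path: str,
    test_path: str,
    normalize_digits: bool = True,
    # Assumed remaining parameter, inferred from the call site below.
    glove_dict: Optional[Dict[str, np.ndarray]] = None,
) -> Tuple[Tuple[Dict[str, int], Dict[str, int], Dict[str, int]],
           Tuple[Dict[int, str], Dict[int, str]]]:
    ...
```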
# Prepares/loads data
if config.load_glove:
    print('loading GloVe embedding...')
    glove_dict = load_glove(embedding_path, EMBEDD_DIM)
else:
    glove_dict = None

(word_vocab, char_vocab, ner_vocab), (i2w, i2n) = create_vocabs(
    train_path, dev_path, test_path, glove_dict=glove_dict)

data_train = read_data(train_path, word_vocab, char_vocab, ner_vocab)
data_dev = read_data(dev_path, word_vocab, char_vocab, ner_vocab)
data_test = read_data(test_path, word_vocab, char_vocab, ner_vocab)

scale = np.sqrt(3.0 / EMBEDD_DIM)
word_vecs = np.random.uniform(
    -scale, scale, [len(word_vocab), EMBEDD_DIM]).astype(np.float32)
if config.load_glove:
    word_vecs = construct_init_word_vecs(word_vocab, word_vecs, glove_dict)

scale = np.sqrt(3.0 / CHAR_DIM)
char_vecs = np.random.uniform(
    -scale, scale, [len(char_vocab), CHAR_DIM]).astype(np.float32)
Consider moving these into main as well.
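A rough sketch of that refactor, with the module-level preparation gathered into functions (the function names prepare_data and main are illustrative, not from the PR):

```python
def prepare_data():
    # GloVe is only loaded when the config asks for it.
    glove_dict = (load_glove(embedding_path, EMBEDD_DIM)
                  if config.load_glove else None)
    (word_vocab, char_vocab, ner_vocab), (i2w, i2n) = create_vocabs(
        train_path, dev_path, test_path, glove_dict=glove_dict)

    # Embedding initialization moves out of module scope as well.
    scale = np.sqrt(3.0 / EMBEDD_DIM)
    word_vecs = np.random.uniform(
        -scale, scale, [len(word_vocab), EMBEDD_DIM]).astype(np.float32)
    if config.load_glove:
        word_vecs = construct_init_word_vecs(word_vocab, word_vecs, glove_dict)
    return (word_vocab, char_vocab, ner_vocab), (i2w, i2n), word_vecs


def main():
    vocabs, index_maps, word_vecs = prepare_data()
    ...  # build the model and run training with these


if __name__ == "__main__":
    main()
```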
return word_vecs


class CoNLLReader:
Can we modify this to use tx.data.Dataset? This would eliminate the need for separate read_data and iterate_batch methods.
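A minimal sketch of that direction, assuming the custom-dataset interface in texar-pytorch (a DatasetBase subclass with process and collate hooks, fed by a data source); the class names, hook names, and the way CoNLL sentences are grouped are assumptions here, and the actual parsing is elided:

```python
import texar.torch as tx

class CoNLLDataset(tx.data.DatasetBase):
    def __init__(self, file_path, word_vocab, char_vocab, ner_vocab,
                 hparams=None, device=None):
        # Assumes a line-based source; grouping lines into sentences is elided.
        source = tx.data.TextLineDataSource(file_path)
        self._vocabs = (word_vocab, char_vocab, ner_vocab)
        super().__init__(source, hparams, device)

    def process(self, raw_example):
        # Map one raw sentence to word ids, char ids, and NER ids (elided).
        ...

    def collate(self, examples):
        # Pad the processed examples and return a batch object (elided).
        ...
```

With something like this, tx.data.DataIterator could presumably take over batching and shuffling, which is what would make the separate read_data and iterate_batch methods unnecessary.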
ner_tags, ner_ids)


class NERInstance:
This and Sentence below can be changed to NamedTuples, which also support custom methods.
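For example (the field names below are guesses based on the surrounding excerpts, not the actual attributes in the PR):

```python
from typing import List, NamedTuple

class NERInstance(NamedTuple):
    word_ids: List[int]
    char_ids: List[List[int]]
    ner_ids: List[int]

    # typing.NamedTuple subclasses can still define custom methods.
    def length(self) -> int:
        return len(self.word_ids)
```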
yield wid_inputs, cid_inputs, nid_inputs, masks, lengths


def load_glove(filename, emb_dim, normalize_digits=True):
I thought we had a load_glove method inside tx.data? What are the differences here?
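For context, a standalone loader of this shape usually just parses the text-format GloVe file into a dict. A minimal sketch of what the local helper presumably does (this is a guess at the example's implementation, not the tx.data API; DIGIT_RE is the module-level pattern shown earlier):

```python
from typing import Dict
import numpy as np

def load_glove(filename: str, emb_dim: int,
               normalize_digits: bool = True) -> Dict[str, np.ndarray]:
    glove_dict = {}
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            tokens = line.rstrip().split(' ')
            word = tokens[0]
            if normalize_digits:
                word = DIGIT_RE.sub('0', word)  # same digit normalization as the vocab
            vec = np.asarray(tokens[1:], dtype=np.float32)
            if vec.shape[0] == emb_dim:         # skip malformed lines
                glove_dict[word] = vec
    return glove_dict
```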
self.dense_2 = nn.Linear(in_features=config.tag_space,
                         out_features=len(ner_vocab))

def forward(self, inputs, chars, targets, masks, seq_lengths, mode):
def start(self, file_path):
    self.__source_file = open(file_path, 'w', encoding='utf-8')

def close(self):
    self.__source_file.close()
We could change these to __enter__ and __exit__ and use the with writer.open(path) as f context manager pattern. It's also fine to keep it as is if you're more comfortable with this.
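A minimal sketch of that pattern; the class name, constructor arguments, and the write method below are assumptions, not the PR's actual writer:

```python
class CoNLLWriter:
    def __init__(self, i2w, i2n):
        self._i2w, self._i2n = i2w, i2n
        self._file = None

    def open(self, file_path):
        self._file = open(file_path, 'w', encoding='utf-8')
        return self  # lets "with writer.open(path) as f:" work directly

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self._file is not None:
            self._file.close()
            self._file = None


# Usage: the file is closed automatically, even if writing raises.
writer = CoNLLWriter(i2w, i2n)
with writer.open(output_path) as f:
    f.write_sentence(...)  # hypothetical write method
```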
Adapted from sequence_tagging in texar-tf.