Need help with querying a table.

2    12 Aug 2015 20:30 by u/Vitruvia

Guys, I'm not a programmer (although I'm trying to learn it slowly in my free time). However, there is one small idea that I had that could help my friends where I study organize a little bit. I just want to get a table from a website and look for people with certain values (the table of people who attempted to join the school and those who actually got in, if you are curious). The table is open for browising and for download at the college's website. An example of what I'm intending to do is, for example, check the laste column "mensagem", where it says if the person was approved or not. If it is true, then add this person's name to a list. That would help us organize a reception for the freshmen with a lot of antecedence. (the list of "approved only" isn't released until much later).

Anyways, I don't know if this even fits here. I know it's a very basic programming thing but it's much more than what I could do right now. I appreciate any help you may give!

12 comments

3

Hey! I'm bored at work and would up to helping. The link you posted doesn't work though. Is the table a table in a database, or excel spreadsheet? Or is it just an HTML table?

0

Not sure... It opens in a PDF Here's a working link

1

Oof! That's basically a brick wall. Parsing a PDF is awful at best. I found this with Google, but can't test it because it is blocked on our network. Check it out: www.zamzar.com/convert/pdf-to-xls/

If we can get it into a spreadsheet we can easily use VBA to remove the unwanted rows.

EDIT:

Some code I shamelessly stole from the MrExcel forums:

Sub DeleteRowWithContents()
'========================================================================
' DELETES ALL ROWS FROM A2 DOWNWARDS WITH THE WORDs "Record Only" IN COLUMN D
'========================================================================
    Last = Cells(Rows.Count, "D").End(xlUp).Row
    For i = Last To 1 Step -1
        If (Cells(i, "D").Value) = "Record Only" Then
    'Cells(i, "A").EntireRow.ClearContents ' USE THIS TO CLEAR CONTENTS BUT NOT DELETE ROW
            Cells(i, "A").EntireRow.Delete
        End If
    Next i
End Sub

Source

We can modify this to remove all rows that contain empty cells in column P, skipping the first as I assume there will be a header row.

Sub DeleteRow()
'========================================================================
' DELETES ALL ROWS FROM A2 DOWNWARDS WITH EMPTY CELLS IN COLUMN P
'========================================================================
    Last = Cells(Rows.Count, "P").End(xlUp).Row 'Get the last row number in spreadsheet
    For i = Last To 1 Step -1 'Loop N times (Last row number to 1. Not inclusive, so row 2 to N)
        If IsEmpty(Cells(i, "P").Value) Then 'If cell in column P on current row is empty
            Cells(i, "A").EntireRow.Delete 'Delete entire row
        End If
    Next i 'Loop
End Sub

This code can be injected as a macro by pressing Alt+F11 to open the VBA editor. Copy and paste the above modified code into the "This Workbook" module in project explorer on the left hand side of the window. Save the workbook, and then run the macro from the developer toolbar.

File > Options > Customize Ribbon > Tick the box next to "Developer".

Open the "Developer" tab and click "Macros", then select the "DeleteRow" Macro and click "Run".

0

I have converted it into a xls file, but there is no storage so I had to download it. Where can I upload that is not blocked on your network? By the way thanks a lot for helping!

0

I edited my earlier post, please check it or some code snippets. I have yet to find a storage/backup site that isn't blocked, so I don't think I'm going to actually be able to look at the spreadsheet.

EDIT: I have gone back and commented the modified code. In VBA a ' denotes a comment, so all text appearing after the ' character is a comment. Modify the code as necessary to fit the spreadsheet, as I am unable to view it.

0

I'll try it right away.. brb

0

It worked! It deleted some rows (not sure which). The list became a document with 27 pages so I'm doing it on every page. Where do we go from here?

0

Yay! Can you verify that it deleted the correct rows? The ones that are left should NOT have empty cells in the "mensagem" column.

The next step would be to copy the data into one master sheet, and run Excels "sort" function (found under the Data" tab) if necessary. That should really be it. Is there anything else you wanted to do with the data?

0

Well.. it seems to have worked only partially... it left some empty rows in the end (but it maked kinda easy to delete them since they are in the end xD). That's about it... After pasting everything in a master sheet I should be able to sort by major and find the people who were accepted into the same as me.

Edit: I'm starting to think the problem was in the conversion. I'll try converting it in another website or program to see what happens

Edit2: Ok, so every time I convert the file it messes up a few rows. Like, one name goes missing for a line, in the other line, there is another empty row. 80-90% of the rows on each page are normal, but the others get some issues that end up messing the whole table. Any ideas on how to solve this? Would a different format, instead of xml, work?

0

Yeah, as I said PDF is a clusterfuck for parsing. Keep trying to find a converter that works is all I can suggest. If there are only a few rows with incomplete data it might honestly be easier to manually repair those rows.

0

Alright I'll do that. Thanks a lot for all your help. It has already save me a lot of time =]

0

Happy to! Let me know if there is anything else I can help with :)